I noticed something interesting about Python:
>>> a=10
>>> b=10
>>> print(a is b)
True
But also,
>>> a=1000
>>> b=1000
>>> print(a is b)
False
The small integer cache
Python (specifically CPython) pre-caches integers in the range -5 to 256 at interpreter startup. These integers are interned, meaning there’s only one shared object for each of them.
So when you do:
>>> a = 42
>>> b = 42
Both a and b point to the same memory address, because 42 is inside the small-int cache. That’s why a is b is True.
But:
>>> x = 1000
>>> y = 1000
Now you’re outside that cached range, so Python creates two separate int objects—even if their values are the same.
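You can see the boundary right at 256 in the interactive interpreter (typed line by line; a script can behave differently, since constants compiled into the same code object may be folded into one):
>>> a = 256
>>> b = 256
>>> a is b
True
>>> a = 257
>>> b = 257
>>> a is b
False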
CPython Implementation
Inside CPython’s source code (Objects/longobject.c), there’s an array called _PyLong_SMALL_INTS. This array is initialized at startup, and every small int (-5 to 256) that Python accesses is returned from it by this function:
static PyObject *
get_small_int(sdigit ival)
{
    assert(IS_SMALL_INT(ival));
    return (PyObject *)&_PyLong_SMALL_INTS[_PY_NSMALLNEGINTS + ival];
}
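Here’s a minimal Python-level sketch of what that lookup does, assuming the documented -5 to 256 range; the names below are illustrative, not CPython’s actual API:

NSMALLNEGINTS = 5      # cached negative values: -5 .. -1
NSMALLPOSINTS = 257    # cached non-negative values: 0 .. 256

# One shared object per small value; -5 lands at index 0, 256 at index 261.
SMALL_INTS = list(range(-NSMALLNEGINTS, NSMALLPOSINTS))

def get_small_int(ival):
    assert -NSMALLNEGINTS <= ival < NSMALLPOSINTS
    return SMALL_INTS[NSMALLNEGINTS + ival]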
This holds even when the integer is created at runtime (e.g., int("10") or 3 + 7): CPython returns the already-initialized object from this array instead of allocating a new one. And since the cache is set up once per Python process and shared by every thread in it, small-int ids remain consistent across threads too.
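For example, all of these produce the same cached object in the REPL (the threading example below then shows the address is stable across threads; the exact id values will differ on your machine):
>>> a = 10
>>> b = int("10")
>>> c = 3 + 7
>>> a is b is c
True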
>>> import threading
>>>
>>> def print_id():
... x = 3
... print(id(x))
...
>>> threads = [threading.Thread(target=print_id) for _ in range(4)]
>>> for t in threads: t.start()
...
4327116120
4327116120
4327116120
4327116120
TL;DR
- Integers from -5 to 256 are cached and reused.
- This is why a is b might be True for small numbers but False for large ones.
- It’s a smart performance trick; just don’t confuse object identity (is) with value equality (==), as the quick check below shows.
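A quick check of that last point, using a value outside the cache:
>>> x = 1000
>>> y = 1000
>>> x == y   # value equality
True
>>> x is y   # object identity
False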