I noticed something interesting about Python:
a=10
b=10
print(a is b)
True
But also, with each line typed separately into the interactive interpreter:
a=1000
b=1000
print(a is b)
False
The small integer cache
Python (specifically CPython) pre-caches integers in the range -5 to 256 at interpreter startup. These integers are interned, meaning there’s only one shared object for each of them.
So when you do:
a = 42
b = 42
Both a and b point to the same memory address, because 42 is inside the small-int cache. That’s why a is b is True.
But:
x = 1000
y = 1000
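Now you’re outside that cached range, so CPython creates two separate int objects, even though their values are equal. (This is what you see when the statements run one at a time, as in the interactive interpreter. Inside a single script, the compiler may reuse one constant for both literals, so x is y can still come out True there.)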
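To see the effect without the compiler merging equal literals, both ints can be built at runtime. The sketch below is only illustrative: same_object is a made-up helper name, and the results depend on running under CPython.

```python
def same_object(value: int) -> bool:
    """Report whether two ints built separately at runtime are one object.

    Hypothetical helper for illustration; relies on CPython behaviour.
    """
    # Going through str() and int() forces each int to be constructed at
    # runtime, so the compiler cannot merge two equal literals into one constant.
    first = int(str(value))
    second = int(str(value))
    return first is second


for n in (42, 1000):
    print(n, same_object(n))
```

With the cache range described above, 42 should report True and 1000 should report False. Other Python implementations may behave differently, since none of this is guaranteed by the language.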
CPython Implementation
Inside CPython’s source code (Objects/longobject.c), there’s an array called _PyLong_SMALL_INTS.
This array is initialised at startup, and every small int (-5 to 256) that Python needs is returned from it by this function:
static PyObject *
get_small_int(sdigit ival)
{
    assert(IS_SMALL_INT(ival));
    return (PyObject *)&_PyLong_SMALL_INTS[_PY_NSMALLNEGINTS + ival];
}
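The index arithmetic in that function can be modelled in a few lines of Python. This is a toy model, not CPython's code: FakeInt and SMALL_INTS are invented for the illustration, and the two constants are assumed to be 5 and 257 so that they match the -5 to 256 range stated above.

```python
# Toy model of the array lookup done by get_small_int.
# Constants are assumptions chosen to match the -5..256 range.
NSMALLNEGINTS = 5    # cached negatives: -5 .. -1
NSMALLPOSINTS = 257  # cached non-negatives: 0 .. 256


class FakeInt:
    """Stand-in for a preallocated int object (hypothetical)."""

    def __init__(self, value: int) -> None:
        self.value = value


# Built once, like the array that is filled at interpreter startup.
SMALL_INTS = [FakeInt(v) for v in range(-NSMALLNEGINTS, NSMALLPOSINTS)]


def is_small_int(ival: int) -> bool:
    return -NSMALLNEGINTS <= ival < NSMALLPOSINTS


def get_small_int(ival: int) -> FakeInt:
    assert is_small_int(ival)
    # Shift by the number of negative entries so that -5 lands on index 0.
    return SMALL_INTS[NSMALLNEGINTS + ival]


print(get_small_int(-5).value, get_small_int(256).value)
print(get_small_int(42) is get_small_int(42))
```

The offset is the whole trick: adding the count of negative entries turns the value -5 into index 0 and 256 into the last index, so every lookup hands back the same preallocated object.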
The same lookup applies to values produced at runtime (e.g., int("10") or 3 + 7): the result is the already-initialised object from this array. And since the cache is set up once and shared within a Python process, small-int ids stay consistent across threads too.
import threading

def print_id():
    x = 3
    print(id(x))

threads = [threading.Thread(target=print_id) for _ in range(4)]
for t in threads:
    t.start()
...
4327116120
4327116120
4327116120
4327116120
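The claim that runtime-produced values come from the same cache can be probed directly. The sketch below builds each value two different ways (parsing text and doing arithmetic) and records where the two results are the same object. probe_cached_range is a hypothetical helper, the scan bounds are arbitrary, and the behaviour is specific to CPython.

```python
def probe_cached_range(lo: int = -20, hi: int = 300) -> tuple[int, int]:
    """Return the smallest and largest n in [lo, hi] whose runtime-built
    ints turn out to be one shared object. Hypothetical helper."""
    shared = []
    for n in range(lo, hi + 1):
        by_parse = int(str(n))  # built by parsing text
        by_math = (n - 1) + 1   # built by arithmetic on a variable
        if by_parse is by_math:
            shared.append(n)
    return shared[0], shared[-1]


low, high = probe_cached_range()
print("shared objects from", low, "to", high)
```

On a CPython build with the cache range described above, this should report -5 and 256. The exact bounds are an implementation detail, so treat the output as an observation about one interpreter rather than a guarantee.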
TL;DR
- Integers from -5 to 256 are cached and reused.
- This is why a is b might be True for small numbers but False for large ones.
- It’s a smart performance trick, just don’t confuse identity (is) with value equality (==).