I noticed something interesting about Python:

>>> a = 10
>>> b = 10
>>> print(a is b)
True

But also,

>>> a = 1000
>>> b = 1000
>>> print(a is b)
False

The small integer cache

Python (specifically CPython) pre-caches integers in the range -5 to 256 at interpreter startup. These integers are interned, meaning there’s only one shared object for each of them.

So when you do:

>>> a = 42
>>> b = 42

Both a and b point to the same object in memory, because 42 falls inside the small-int cache. That’s why a is b is True.

But:

>>> x = 1000
>>> y = 1000

Now you’re outside the cached range, so Python creates two separate int objects, even though their values are equal.
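You can probe the edge of the cache yourself. A quick illustrative check in a CPython interactive session: the cutoffs (-5 and 256) are an implementation detail, and putting the same assignments inside a single script or function can still end up sharing one constant object per compiled unit, so the False result below reflects entering the lines one at a time.

>>> a = 256
>>> b = 256
>>> a is b
True
>>> a = 257
>>> b = 257
>>> a is b
False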

CPython Implementation

Inside CPython’s source code (Objects/longobject.c), there’s an array called _PyLong_SMALL_INTS. This array is initialized at startup, and every small int (-5 to 256) that Python asks for is returned from it via this function:

static PyObject *
get_small_int(sdigit ival)
{
    /* Only values in the cached range (-5..256) may reach this point. */
    assert(IS_SMALL_INT(ival));
    /* _PY_NSMALLNEGINTS is 5, so the offset maps -5..256 onto array
       indices 0..261 and returns the shared, preallocated object. */
    return (PyObject *)&_PyLong_SMALL_INTS[_PY_NSMALLNEGINTS + ival];
}

This stays true even if the value is created at runtime (e.g., int("10") or 3 + 7): CPython returns the already-initialized object from this array instead of allocating a new one.
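Here’s a minimal interactive check of that claim (the True result relies on CPython’s small-int cache, so other Python implementations aren’t guaranteed to match):

>>> a = int("10")   # parsed at runtime
>>> b = 3 + 7       # computed at runtime
>>> c = 10          # literal
>>> a is b is c
True

And since this cache is initialized once per Python process and shared by all of its threads, small-int ids stay consistent across threads too: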

>>> import threading
>>>
>>> def print_id():
...     x = 3
...     print(id(x))
...
>>> threads = [threading.Thread(target=print_id) for _ in range(4)]
>>> for t in threads: t.start()
...
4327116120
4327116120
4327116120
4327116120

TL;DR

  • Integers from -5 to 256 are cached and reused.
  • This is why a is b might be True for small numbers but False for large ones.
  • It’s a smart performance trick; just don’t confuse identity (is) with value equality (==).
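If what you care about is the value, compare with ==. A last quick illustrative check:

>>> a = 1000
>>> b = 1000
>>> a == b      # value equality: what you almost always want
True
>>> a is b      # identity: an implementation detail, don't rely on it
False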