While reading through CPython code for the last post, I came across string interning and how it’s used as a method of optimisation.
What is string interning?
String interning is a CPython optimisation that lets Python reuse immutable string objects instead of creating new ones. Much like the interned integers I wrote about in the last post, interned strings can be compared using pointer checks instead of an actual string comparison.
Without interning
string1 = "apple crumble"
string2 = "apple crumble"
string1 == string2  # compares the contents of the two strings
Python compares the strings character by character until it finds a difference, which makes the worst-case complexity O(n), where n is the length of the strings. (When the string lengths differ, Python exits early without comparing the contents, which is quicker.)
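To make the cost concrete, here is a rough pure-Python model of an equality check. This is only an illustrative sketch: CPython does this work in C, and naive_equal is a hypothetical name I made up for this post, not a real API. The identity shortcut at the top is also why interned strings compare so cheaply.

```python
def naive_equal(left: str, right: str) -> bool:
    """Illustrative model of string equality, not CPython's actual code."""
    # Same object: nothing to compare.
    if left is right:
        return True
    # Different lengths: the strings cannot be equal, so exit early.
    if len(left) != len(right):
        return False
    # Otherwise walk both strings until a difference shows up: O(n) worst case.
    for a, b in zip(left, right):
        if a != b:
            return False
    return True


print(naive_equal("apple crumble", "apple crumble"))  # True
print(naive_equal("apple crumble", "apple crumbly"))  # False, differs at the last character
print(naive_equal("apple", "apple crumble"))          # False, lengths differ
```

The second call is the expensive case: the strings only differ at the very end, so every character has to be checked.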
With interning
import sys
string1 = sys.intern("apple crumble")
string2 = sys.intern("apple crumble")
string1 is string2
Python can just check whether both variables reference the same object, which makes the complexity O(1).
Try printing the id of each variable, e.g. print(id(string1)) and print(id(string2)); the two values should match.
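Two identical literals in the same file can end up as the same object anyway, so a fairer experiment builds the strings at runtime. A minimal sketch, assuming CPython (the variable names are just for illustration):

```python
import sys

# Build two equal strings at runtime so they start out as separate objects.
parts = ["apple", "crumble"]
a = " ".join(parts)
b = " ".join(parts)

print(a == b)  # equal contents
print(a is b)  # separate objects in CPython, so this is an identity miss

# sys.intern returns the one canonical object for this string value.
a_interned = sys.intern(a)
b_interned = sys.intern(b)

print(a_interned is b_interned)  # both names now point at the same object
print(id(a_interned) == id(b_interned))
```

After interning, the `is` check succeeds because sys.intern hands back the same object for equal strings.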
How is this used in Python?
While Python does let you intern things explicitly, like we did in the example above, I’ve not really seen any examples of people explicitly interning strings as an optimisation. Tbh, if someone were to ask me to review a PR where they were explicitly using string interning, 9 out of 10 times I’d call it premature optimisation that hurts readability more than it helps performance. The remaining 1 time would be a case where certain strings are repeated a lot, like when building a tokenizer or parser.
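For that 1-in-10 case, here is a rough sketch of what interning inside a tokenizer could look like. The tokenize function is hypothetical and deliberately naive (it just splits on whitespace); the point is only that repeated tokens end up sharing one object.

```python
import sys


def tokenize(text: str) -> list[str]:
    """Hypothetical whitespace tokenizer that interns every token."""
    return [sys.intern(token) for token in text.split()]


tokens = tokenize("the cat saw the other cat")

# Six tokens, but only four distinct string objects are kept alive.
print(len(tokens))                    # 6
print(len({id(t) for t in tokens}))   # 4
print(tokens[0] is tokens[3])         # True, both are the interned "the"
```

With a large input and a small vocabulary, this saves memory and lets later comparisons between tokens hit the identity shortcut. Whether that is worth the extra call is something to measure rather than assume.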
CPython implicitly interns strings that
- Are valid identifiers (e.g., “variable_name”)
- Are short compile-time constants (the exact length limit is an implementation detail and has changed between CPython versions)
- Appear as names in code: variables, attributes, keywords
- Are repeated in the same module and optimized during compilation
It doesn’t usually intern strings that
- Contain spaces, punctuation, or special characters
- Are constructed at runtime (e.g. via join(), format(), user input)
- Are long or unique, where interning would waste memory
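A quick way to poke at these rules is to compare an identifier-like literal with the same text built at runtime. This is a sketch under the assumption that you are running CPython; the results of the `is` checks are implementation details and may differ between versions, so I have left them for you to observe.

```python
import sys

literal = "variable_name"                  # identifier-like literal in the source
built = "_".join(["variable", "name"])     # same text, constructed at runtime

print(literal.isidentifier())  # True: it fits the "valid identifier" criterion
print(literal == built)        # True: the contents are equal
print(literal is built)        # identity check on the runtime-built string

# Explicitly interning the runtime string maps it to the canonical object.
print(literal is sys.intern(built))
```

str.isidentifier() is a handy check for the first criterion in the list, and sys.intern is the escape hatch for strings that CPython would not intern on its own.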
Try this out with strings that you think would fit the criteria
x = "var1"
y = "var1"
print(id(x))
print(id(y))  # matches id(x) when both names refer to the same object