Clarifications on Object References in Python

Let me begin by thanking everyone at Python Meetup Bangalore for attending my talk. The questions from the audience really tested my own understanding. I hope to reconnect with at least some of you at the next talk and continue the conversation. In the meantime, I am trying to answer some of the questions that were asked.

Questions from the Audience:

  1. Try the following out:

A = 257
id(A)  # 25092720
B = 257
id(B)  # 25092752

The two ids do not point to the same memory location. Why?

Some of the concepts I explored are covered in the following posts:

  1. Dynamic Types of Python
  2. Memory allocations of complex data types

One of my erroneous assumptions came from this observation:

A = 1
id(A)  # 25092720
B = 1
id(B)  # 25092720

A = "ABC"
id(A)  # 43700992
B = "ABC"
id(B)  # 43700992

As can be observed, A and B point to the same object in both cases, whether that object is 1 or “ABC”. But as the question showed, this does not hold for integers above 256, nor for every string (type in a sentence with spaces and you will usually find that the two copies have different ids). Why?

The reason is a concept known as interning.

Every language implementation trades memory against speed. In theory you could keep exactly one object per immutable value and point every reference to it, which would cut memory usage considerably. But then every assignment or object creation would first have to search for an existing object. So Python shares an object when that is cheap and easy, and otherwise simply creates a new one, regardless of whether an equal object already exists.
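To make the trade-off concrete, here is a toy intern table. It is only a sketch: the InternPool class and its intern method are hypothetical names made up for this post, and this is not how CPython implements interning internally. It just shows that sharing one object per value costs a lookup on every creation.

```python
class InternPool:
    """Toy intern table (hypothetical, for illustration only)."""

    def __init__(self):
        self._pool = {}

    def intern(self, value):
        # The first time a value is seen it is stored; afterwards the
        # stored object is returned instead of the newly created one.
        # The price is one dictionary lookup per call.
        return self._pool.setdefault(value, value)


pool = InternPool()

# Build two equal strings at run time, then pass both through the pool.
first = pool.intern("".join(["py", "thon"]))
second = pool.intern("".join(["py", "thon"]))

print(first is second)  # True: both names refer to the pooled object
```

Every value that goes through the pool also stays alive for as long as the pool does, which is the memory side of the same trade-off.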

In the case of integers, CPython pre-allocates the values from -5 to 256 (inclusive), so creating or assigning one of these always returns the same object. Outside that range a new object is usually created each time, because looking up and maintaining shared objects across the whole program would cost more time than it saves. (Equal constants compiled together, for example within one function or one script file, may still end up sharing an object.)
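A small sketch to probe the edges of the cached range, which in CPython runs from -5 to 256 inclusive. The helper name same_object is my own, and the integers are parsed from text at run time so that the compiler cannot quietly share one constant between the two names. The results are a CPython implementation detail, not a language guarantee.

```python
def same_object(text):
    """Parse `text` twice at run time and report whether both results are one object."""
    first = int(text)
    second = int(text)
    # Both objects are alive here, so `is` compares two live objects.
    return first is second


for text in ("-6", "-5", "256", "257"):
    print(text, same_object(text))
```

On CPython I would expect True only for -5 and 256, the two ends of the cached range.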

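For strings, the behaviour is less predictable and depends on the CPython version. In general, CPython follows these rules:

  • The empty string and single-character strings are shared (in CPython, single characters in the Latin-1 range)
  • String constants that look like identifiers (only letters, digits and underscores) are interned at compile time; strings built at run time generally are not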

You can also intern a string explicitly. It is not a keyword but a function: the built-in intern() in Python 2, which moved to sys.intern() in Python 3.
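Here is a short sketch of explicit interning using Python 3's sys.intern(). The variable names and the sample sentence are my own. The strings are assembled at run time so that each starts out as a separate object.

```python
import sys

# Build two equal strings at run time so the compiler cannot share one constant.
left = " ".join(["hello", "meetup", "bangalore"])
right = " ".join(["hello", "meetup", "bangalore"])

print(left == right)  # equal values
print(left is right)  # usually False on CPython: two separate objects

# sys.intern returns the one canonical object for a given string value.
left_interned = sys.intern(left)
right_interned = sys.intern(right)

print(left_interned is right_interned)  # True: both are the interned object
```

Interned strings can be compared by identity, which is why this is mostly useful for strings that are used heavily as dictionary keys.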

Additional Notes

More interesting behavior that can be observed:

a = "a"
b = "b"

print(id(a + b), id(b + a))  # 64865480 64865480

print(id(a + b))  # 64865480

print(id(b + a))  # 64867048

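The example above also means we have to discard the notion that equal ids always identify the same object. An id is only unique among objects that are alive at the same time. In the first print, the temporary string a+b is freed as soon as id() returns, so b+a can be allocated at the same address and report the same id, even though it is a different object with a different value.

In a future post, I will try to go through this confusion in more detail.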
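A minimal sketch of the same experiment with the results kept alive. The names ab and ba are my own additions. Binding both strings to names means they exist at the same time, so their ids cannot collide.

```python
a = "a"
b = "b"

# Bind both results to names so that they are alive at the same time.
ab = a + b
ba = b + a

print(id(ab) != id(ba))  # True: two live objects never share an id

# Without a name, each temporary may be freed right after id() returns,
# so these two ids may or may not be equal (implementation dependent).
print(id(a + b), id(b + a))
```

So comparing ids is only meaningful while you hold references to both objects. If what you care about is the value, compare with == instead.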

 
