Ruby Garbage Collection Deep Dive: Object IDs

Throughout the Ruby GC Deep Dive series, we’ve examined different strategies Ruby uses for garbage collection. In this post, we’ll take a quick detour from GC strategies and instead examine what these strategies mean for object_ids.

Every Ruby object gives us access to an object_id, a unique identifier for that specific instance. The Ruby docs for Object#object_id guarantee both that object_ids are unique and that they stay consistent for a given object. Explicitly, the docs say, “The same number will be returned on all calls to object_id for a given object, and no two active objects will share an id.”

Special Objects

In Ruby, certain types of objects, such as true, false, nil and small Integers, are special cases for Object#object_id. We won’t discuss them in this post because they aren’t relevant to garbage collection. But… they are interesting. If you’re curious, look for the pattern in Object#object_id on Integers.

Before Ruby 2.7

Back on topic. Prior to Ruby 2.7, Object#object_id was derived from an object’s address in memory. This seems simple enough: each object occupies its own unique memory address, so Ruby can use that address to derive a unique object_id.

Specifically, in Ruby versions before 2.7, object_id is the memory address shifted right by one bit. We can actually see this! In an IRB console in Ruby 2.6.3, we can run the following code:

obj = Object.new
=> #<Object:0x00007fb8240a8ca8>

address = obj.inspect.match(/0x([0-9a-f]+)/)[1]
=> "00007fb8240a8ca8"

# The address is in hexadecimal so we use
# String#to_i(16) to convert it to decimal
obj.object_id == address.to_i(16) >> 1
=> true

And we’ve confirmed that shifting the memory address right by one bit gives us the object_id!
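The arithmetic is easy to replay outside IRB. This is a minimal sketch that takes the address printed in the Ruby 2.6.3 session above as a plain Integer literal; legacy_object_id is a hypothetical helper name, not a Ruby API. It also shows why the shift loses nothing: object addresses are aligned, so their lowest bit is always 0.

```ruby
# Hypothetical helper reproducing the pre-2.7 rule: object_id = address >> 1.
def legacy_object_id(address)
  address >> 1
end

# Address copied from the Ruby 2.6.3 IRB session above.
address = 0x00007fb8240a8ca8
id = legacy_object_id(address)

# Object addresses are aligned (low bit is 0), so shifting back left
# recovers the original address exactly.
puts id
puts((id << 1) == address)
```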

Ruby 2.7 Onwards

Ruby 2.7 made a big change to Object#object_id. Why? Ruby 2.7 also introduced compaction.

As we know from the previous post in this series about compaction, compaction can change the memory addresses of objects. So if Ruby kept deriving Object#object_id from memory addresses, it would run into trouble. What would happen when an object moved and a new object occupied its former address? What would happen to the object_id of the moved object? How would Ruby keep the consistency guarantee of an object_id?
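To make those questions concrete, here is a toy simulation (not real Ruby internals) that applies the pre-2.7 rule to a made-up heap, represented as a Hash from hypothetical slot addresses to object names. It moves one object and then allocates another in the vacated slot.

```ruby
# Toy heap for illustration only: made-up slot addresses => object names.
heap = { 0x1000 => :a, 0x1028 => :b }

# The pre-2.7 rule applied to the toy heap: id = current address >> 1.
legacy_id = ->(name) { heap.key(name) >> 1 }

id_a_before = legacy_id.call(:a)

# Simulated compaction: :a moves to a new slot, then :c is allocated
# in the slot :a just vacated.
heap.delete(0x1000)
heap[0x1050] = :a
heap[0x1000] = :c

id_a_after = legacy_id.call(:a)
id_c = legacy_id.call(:c)

puts id_a_after == id_a_before # false: :a's id changed when it moved
puts id_c == id_a_before       # true: :c now reports :a's old id while :a is alive
```

Both outcomes would break code that stored an object_id and expected it to keep identifying the same object.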

Instead, from Ruby 2.7 onwards, an object is only assigned an object_id the first time Object#object_id is called on it. Ruby keeps a counter holding the last assigned object_id and increments it whenever Object#object_id is called on an object that doesn’t have an id yet. Because the counter only ever increases, each object_id is still unique.

Ruby also keeps a map of memory addresses to object_ids, which is how repeated calls on the same object return the same id. After the Ruby heap is compacted, this map is updated with the objects’ new memory locations. So instead of being tied to addresses in memory, from Ruby 2.7 onwards object_ids come from a monotonically increasing counter.
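Ruby’s real bookkeeping lives in C inside the garbage collector, but a small Ruby model helps show the three behaviors described above: lazy assignment from a counter, a stable id per object, and map entries re-keyed when an object moves. The class name ToyIdTable, its methods, the integer "addresses", and the step of 20 (matching this post’s IRB examples) are all illustrative assumptions, not CRuby internals.

```ruby
# Hypothetical model of lazy, counter-based object_ids with an address map.
# Not CRuby's implementation; names and the step size are illustrative.
class ToyIdTable
  STEP = 20

  def initialize(last_id = 0)
    @last_id = last_id
    @ids_by_address = {}
  end

  # Assign an id only on the first request; later calls return the stored id.
  def object_id_for(address)
    @ids_by_address[address] ||= (@last_id += STEP)
  end

  # Called after compaction moves an object: re-key its entry, keeping the id.
  def moved(from, to)
    id = @ids_by_address.delete(from)
    @ids_by_address[to] = id if id
  end
end

table = ToyIdTable.new(240)
first      = table.object_id_for(0x1000)
second     = table.object_id_for(0x1028)
again      = table.object_id_for(0x1000) # same object, same id
table.moved(0x1000, 0x2000)              # "compaction" moves the first object
after_move = table.object_id_for(0x2000) # id follows the object
fresh      = table.object_id_for(0x1000) # a new object in the old slot
p [first, second, again, after_move, fresh] # prints [260, 280, 260, 260, 300]
```

The object allocated into the vacated slot gets a brand-new id from the counter rather than inheriting the old one.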

We can see this in action, too! If we run a little Ruby snippet in an IRB console, we can see that each Object#object_id call on a new object returns an id 20 greater than the last one (the step of 20 is an implementation detail, not part of the documented guarantee):

3.times.map { Object.new.object_id }
=> [260, 280, 300]

In the same console, we can then create a new object without calling Object#object_id on it:

obj = Object.new
=> #<Object:0x00007fb5b9090d00>

Our next call to Object#object_id on a new object then gets the next object_id in the sequence from above, 300 + 20 == 320:

Object.new.object_id
=> 320

If we now call Object#object_id on obj (the object we created earlier), we get the next number in the sequence, 340, even though obj was initialized before the object with object_id == 320.

obj.object_id
=> 340

Pretty neat, huh?
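The same experiment can be written as a small script that doesn’t depend on the exact numbers IRB prints. This sketch assumes CRuby 2.7 or later; it only checks the ordering that lazy, counter-based assignment implies, not any particular step size.

```ruby
# Assumes CRuby 2.7+, where object_ids are assigned lazily from a counter.
obj = Object.new             # created first, but no object_id requested yet

later = Object.new
later_id = later.object_id   # this object is asked for its id first

obj_id = obj.object_id       # obj only receives an id now...

puts obj_id > later_id       # ...so its id is the larger one: true
puts obj.object_id == obj_id # repeated calls return the same id: true
```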

Compaction and object_ids

We can also look at the effect of compaction on object_ids. Hint: there should be none! Compacting the heap should not interfere with object_ids, since Ruby guarantees that each one is unique and stays the same for a given object.

To test this, we’ll create an array with many objects, and then one specific object afterwards. We’ll then set the array to nil so that its elements are no longer referenced, letting the garbage collector free the many heap slots they occupied. When we compact the heap, we’d expect the specific object to change address as it moves into the vacated space. (We can see the address using Object#inspect.) But what we’re really looking for is that the object keeps its initial object_id.

Enough words; I’ll let the code clarify:

array = 100_000.times.map { Object.new }

obj = Object.new
=> #<Object:0x00007f9863aa81d0>
obj.object_id
=> 260

array = nil
GC.compact

# New memory address (7f9863aa81d0 != 7f9864049e40)
obj.inspect
=> "#<Object:0x00007f9864049e40>"

# Same object_id (260 == 260)
obj.object_id
=> 260

Phew! This change to how Object#object_id works in Ruby 2.7+ still gives us the same guarantees on object_ids, even though objects can now change memory addresses.
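For a repeatable version of the experiment outside IRB, here is a sketch written as a script. It assumes CRuby 2.7 or later for GC.compact and treats a missing or unsupported GC.compact as "not compacted" rather than failing. Whether obj actually moves depends on heap layout, so the script only reports that; the object_id check is the part that should always hold.

```ruby
# Sketch of the compaction experiment as a script (assumes CRuby 2.7+).
array = 100_000.times.map { Object.new }
obj = Object.new
id_before = obj.object_id
inspect_before = obj.inspect

array = nil # drop the only reference so the GC can free those objects

compacted =
  begin
    GC.compact
    true
  rescue NotImplementedError, NoMethodError # unsupported platform or Ruby < 2.7
    false
  end

# Moving is likely but not guaranteed, so it is only reported.
puts "compacted: #{compacted}, moved: #{obj.inspect != inspect_before}"
puts obj.object_id == id_before # the id survives compaction either way
```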

TL;DR

Prior to Ruby 2.7, Object#object_id depended on an object’s memory address
Ruby 2.7 introduced compaction, which meant some objects’ memory addresses could change
From Ruby 2.7 onwards, Object#object_id is determined using a monotonically increasing counter
An object is only assigned an object_id when Object#object_id is called on it

For more about Ruby garbage collection, check out the full series.