Ruby Garbage Collection Deep Dive: Object IDs
Throughout the Ruby GC Deep Dive series, we’ve examined the different strategies Ruby uses for garbage collection. In this post, we’ll take a quick detour from GC strategies and instead examine the implications of these strategies for object_ids.
Every Ruby object gives us access to an object_id, a unique identifier for that specific instance. If we read the Ruby docs, we can see that Object#object_id guarantees uniqueness and consistency of object_ids across objects. Explicitly, the docs say, “The same number will be returned on all calls to object_id for a given object, and no two active objects will share an id.”
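Both halves of that guarantee are easy to check. Here is a minimal sketch; the helper name ids_consistent_and_unique? is made up for illustration and is not part of Ruby:

```ruby
# Hypothetical helper: returns true when both documented guarantees
# hold for the two (distinct) objects passed in.
def ids_consistent_and_unique?(first, second)
  stable = first.object_id == first.object_id   # same number on every call
  unique = first.object_id != second.object_id  # two live objects never share an id
  stable && unique
end

puts ids_consistent_and_unique?(Object.new, Object.new) # prints true
```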
Special Objects
In Ruby, certain types of objects are special cases for Object#object_id. Examples are true, false, nil and Integers. We’re not going to discuss Object#object_id calls for these special cases in this post because they’re not relevant. But… they are interesting. If you’re curious, look for the pattern in Object#object_id on Integers.
Before Ruby 2.7
Back on topic. Prior to Ruby 2.7, Object#object_id was derived from an object’s address in memory. This seems simple enough: each object has a unique memory address that only it occupies, so Ruby can use this memory address to derive a unique object_id.
In Ruby versions below 2.7, object_id is derived by shifting the memory address to the right by one bit. We can actually see this! In an IRB console in Ruby 2.6.3, we can run the following code:
obj = Object.new
=> #<Object:0x00007fb8240a8ca8>
address = obj.inspect.match(/0x([0-9a-f]+)/)[1]
=> "00007fb8240a8ca8"
# The address is in hexadecimal so we use
# String#to_i(16) to convert it to decimal
obj.object_id == address.to_i(16) >> 1
=> true
And we’ve confirmed that shifting the memory address to the right by one bit gives us the object_id!
Ruby 2.7 Onwards
Ruby 2.7 introduced a big change to Object#object_id. Why? Compaction was introduced in Ruby 2.7.
As we know from the previous post about compaction in this series, compaction can change the memory addresses of objects. So if we were to keep deriving Object#object_id from memory addresses, we would be stuck. What would happen when an object moved, and a new one occupied its former address? And what would happen to the object_id of the moved object? How would Ruby keep the consistency guarantee of an object_id?
Instead, from Ruby 2.7 onwards, each object is only assigned an object_id at the moment that Object#object_id is first called on it. Ruby simply increments the value of the last assigned object_id whenever Object#object_id is called on an object that doesn’t have one yet. This is how Ruby ensures that each object_id will still be unique.
Ruby also keeps a map of memory addresses to object_ids. After the Ruby heap is compacted, this map is updated based on any new memory locations. So instead of having object_ids tied to addresses in memory, from Ruby 2.7 onwards object_ids are provided by a monotonically increasing counter.
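To make the counter-plus-map idea concrete, here is a toy model in plain Ruby. It is only a sketch of the scheme described above, not how Ruby is implemented internally: the ToyHeap class, its method names, and the fake addresses are all invented for illustration, and the step of 20 is an assumption chosen for the example.

```ruby
# Toy model only: ToyHeap and its methods are hypothetical names, and
# this is not Ruby's real implementation.
class ToyHeap
  ID_STEP = 20 # assumed step size, purely illustrative

  def initialize
    @next_id = ID_STEP
    @ids_by_address = {} # address => object_id, filled in lazily
  end

  # Hand out an id only the first time one is requested for an address.
  def object_id_for(address)
    @ids_by_address[address] ||= begin
      id = @next_id
      @next_id += ID_STEP
      id
    end
  end

  # Simulate compaction moving an object: re-key the map, keep the id.
  def move(from, to)
    id = @ids_by_address.delete(from)
    @ids_by_address[to] = id if id
  end
end

heap = ToyHeap.new
puts heap.object_id_for(0x1000) # first request: 20
puts heap.object_id_for(0x2000) # second request: 40
heap.move(0x1000, 0x3000)       # "compaction" relocates the first object
puts heap.object_id_for(0x3000) # the moved object keeps its id: 20
puts heap.object_id_for(0x1000) # a new object at the old address: 60
```

Because the id lives in the map rather than being computed from the address, a moved object keeps its id, and a newcomer at the vacated address simply gets the next number from the counter.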
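We can again see this in action! If we run a little Ruby snippet in an IRB console, we can see that each Object#object_id call on a new object gives us an object_id which increases by 20. (The step size is an implementation detail, so other Ruby versions may show a different increment.)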
3.times.map { Object.new.object_id }
=> [260, 280, 300]
In the same console, we can then create a new object without calling Object#object_id on it:
obj = Object.new
=> #<Object:0x00007fb5b9090d00>
We’ll then see that our next call to Object#object_id will get the next object_id in the sequence from above, 300 + 20 == 320:
Object.new.object_id
=> 320
If we now call Object#object_id on obj (our previously created object), we’ll get the next number in the sequence, 340, even though it was initialized before the object with object_id == 320.
obj.object_id
=> 340
Pretty neat, huh?
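The exact numbers depend on the Ruby version and on what else has already asked for an object_id, so here is a small sketch that checks only the ordering. The method name ids_in_request_order is made up for this example, and the comparison assumes Ruby 2.7 or later:

```ruby
# Hypothetical helper: creates two objects, then requests their ids in
# the opposite order. Returns [id_of_older_object, id_of_newer_object].
def ids_in_request_order
  older = Object.new
  newer = Object.new
  newer_id = newer.object_id # requested first
  older_id = older.object_id # requested second
  [older_id, newer_id]
end

older_id, newer_id = ids_in_request_order
# On Ruby 2.7+, the id requested later is larger, even though that
# object was created earlier.
puts older_id > newer_id
```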
Compaction and object_ids
We can also look at the effect of compaction on object_ids. Hint: we should see none! Compacting a heap should not interfere with the object_ids since Ruby guarantees us that they will be unique and tied to one specific object.
To test this, we’re going to create an array with many objects, and then one specific object afterwards. We’ll then set the array to nil, so the many slots occupied by the array’s elements in the heap can be reclaimed. If we compact the heap, we should see the specific object change address due to all the space vacated by setting the array to nil. (We can see the address using Object#inspect.) But what we’re really looking for here is that the object will keep its initial object_id.
Enough words, I’ll let code clarify:
array = 100_000.times.map { Object.new }
obj = Object.new
=> #<Object:0x00007f9863aa81d0>
obj.object_id
=> 260
array = nil
GC.compact
# New memory address (7f9863aa81d0 != 7f9864049e40)
obj.inspect
=> "#<Object:0x00007f9864049e40>"
# Same object_id (260 == 260)
obj.object_id
=> 260
Phew! This change to how Object#object_id works in Ruby 2.7+ still gives us the same guarantees on object_ids, even if the objects themselves can change memory addresses.
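If you’d like to run this check as a script rather than in a console, here is a minimal sketch. The method name id_stable_across_compaction? is invented for this example, and guarding with GC.respond_to?(:compact) plus a rescue is just one assumed way to cope with Ruby versions or platforms where compaction isn’t available:

```ruby
# Hypothetical helper: returns true if obj keeps its object_id across a
# compaction attempt.
def id_stable_across_compaction?(obj)
  before = obj.object_id
  # Create and immediately drop some garbage so the heap has gaps.
  Array.new(10_000) { Object.new }
  begin
    GC.compact if GC.respond_to?(:compact)
  rescue NotImplementedError
    # Some platforms do not support compaction; the id check still applies.
  end
  obj.object_id == before
end

puts id_stable_across_compaction?(Object.new) # prints true
```

Whether the object actually moves depends on the state of the heap at that moment, but the object_id should come back unchanged either way.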
TL;DR
- Prior to Ruby 2.7, Object#object_id depended on memory address
- Ruby 2.7 introduced compaction, which meant some objects’ memory addresses could change
- From Ruby 2.7 onwards, Object#object_id is determined using a monotonically increasing counter
- An object is only assigned an object_id when Object#object_id is called on it
For more about Ruby garbage collection, check out the full series.