There’s an old saying that we can learn a lot about people by looking at their trash. The same can hold true about programming languages. More precisely, we can learn a lot about a programming language by looking at its garbage collection. This post kicks off a series of posts (and eventually a book) about Ruby’s GC.

Continuing with the (admittedly not great) metaphor here, we’ll start by looking at what is always in Ruby’s trash. Ruby exposes GC::INTERNAL_CONSTANTS, so we can see what is constant within the garbage collector.

GC::INTERNAL_CONSTANTS

Printing out GC::INTERNAL_CONSTANTS in Ruby 3.0, we see the following:

{
  :DEBUG=>false,
  :RVALUE_SIZE=>40,
  :HEAP_PAGE_OBJ_LIMIT=>409,
  :HEAP_PAGE_BITMAP_SIZE=>56,
  :HEAP_PAGE_BITMAP_PLANES=>4,
  :HEAP_PAGE_SIZE=>16384
}
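To reproduce the listing above on your own machine, something like the following sketch should work. The helper name `gc_internal_constants` is made up for this example, and the keys and values you see will likely differ on Ruby versions other than 3.0 (or on non-CRuby implementations, where the constant may not exist at all).

```ruby
# Hypothetical helper: returns GC::INTERNAL_CONSTANTS when this Ruby defines it,
# and an empty hash otherwise, so the script does not crash on other implementations.
def gc_internal_constants
  GC.const_defined?(:INTERNAL_CONSTANTS) ? GC::INTERNAL_CONSTANTS : {}
end

# Print one constant per line, in the order Ruby reports them.
gc_internal_constants.each do |name, value|
  puts "#{name.inspect} => #{value.inspect}"
end
```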

We’ll go one by one through these constants, and in defining them, learn much more about how Ruby organizes its memory.

:DEBUG => false

The :DEBUG constant is really more useful when working on the Ruby source code itself than when writing Ruby programs. Almost every Ruby program we run will have :DEBUG => false, so we don’t have to worry about it!

For the extremely curious, the only way :DEBUG is true is if we compile Ruby from source with a cppflag that defines GC_DEBUG, for example -DGC_DEBUG=1 (see this example). But, since most of the time we’re not compiling Ruby ourselves, it’ll almost always be false.

:RVALUE_SIZE => 40

It’s important to first note that the units for all of these constants which end in _SIZE are bytes. So this constant is telling us that each RVALUE is 40 bytes.

Neat. But it raises the question: what is an RVALUE? An RVALUE contains basic information about a Ruby object. Each RVALUE is a C struct wrapping a union of the various C representations of Ruby objects.

As part of this information, they either contain the actual values of Ruby objects, or, if the values are too large for the 40 byte limit, they contain pointers to where on the OS heap the Ruby object’s value lives.

(Small aside: strings with 23 characters or fewer have their values stored inside RVALUEs, while strings with more than 23 characters have their values stored on the OS heap. Pat Shaughnessy has a fun blog post explaining this nuance.)
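One way to poke at this from Ruby is the `objspace` standard library. The sketch below is only illustrative: `embedded?` is a helper name invented for this example, and it relies on `ObjectSpace.dump` reporting an `"embedded"` field for strings stored inside their slot. The exact cutoff between embedded and non-embedded strings depends on the Ruby version and platform, so the example compares a short string with a very long one rather than probing the boundary.

```ruby
require "objspace"
require "json"

# Hypothetical helper: true when ObjectSpace.dump reports that the string's
# bytes are stored inside the object's own slot rather than on the OS heap.
def embedded?(string)
  JSON.parse(ObjectSpace.dump(string))["embedded"] == true
end

short_string = "a" * 23
long_string  = "a" * 10_000

# memsize_of reports the bytes Ruby attributes to each object.
puts "23 chars:     embedded=#{embedded?(short_string)} memsize=#{ObjectSpace.memsize_of(short_string)}"
puts "10,000 chars: embedded=#{embedded?(long_string)} memsize=#{ObjectSpace.memsize_of(long_string)}"
```

On a Ruby that matches the post, the short string should report as embedded and the long one should not, with the long string's memsize including its separately allocated buffer.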

:HEAP_PAGE_OBJ_LIMIT => 409

Okay, next up, HEAP_PAGE_OBJ_LIMIT. 409 is the maximum number of objects per HEAP_PAGE. But what is a HEAP and what is a PAGE?

The Ruby HEAP is Ruby’s entire Object Space, or memory. It is where all of our RVALUES live. The HEAP is segmented into PAGES. When the garbage collector needs more memory, it doesn’t request new memory for each individual object from the OS. As we learned above with the RVALUE_SIZE, this would mean requesting more memory in 40 byte segments, which would be wildly inefficient.

Instead, when the HEAP needs more memory, it requests an entire new PAGE. These pages each contain up to 409 objects. Some pages contain slightly fewer than 409 objects; Aaron Patterson has a great blog post which explains why. But, as this constant tells us, 409 is the upper limit of objects (or RVALUEs) per page.
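We can get a rough feel for pages and slots in a running process with `GC.stat`. This is a sketch rather than an exact measurement: `average_slots_per_page` is a name made up for this example, and it assumes the `:heap_allocated_pages` and `:heap_available_slots` keys are present, which may not be true on every Ruby version or implementation (the helper returns nil in that case).

```ruby
# Hypothetical helper: average number of slots per allocated heap page,
# or nil when the expected GC.stat keys are missing or no pages are reported.
def average_slots_per_page(stats = GC.stat)
  pages = stats[:heap_allocated_pages]
  slots = stats[:heap_available_slots]
  return nil unless pages.is_a?(Integer) && slots.is_a?(Integer) && pages > 0

  slots.to_f / pages
end

average = average_slots_per_page
if average
  puts "pages allocated:        #{GC.stat[:heap_allocated_pages]}"
  puts "average slots per page: #{average.round(1)}"
else
  puts "This Ruby does not report page and slot counts through GC.stat"
end
```

On the Ruby version described in this post, the average should sit at or just under the HEAP_PAGE_OBJ_LIMIT.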

:HEAP_PAGE_BITMAP_SIZE => 56

We now know HEAP and PAGE, but what is the BITMAP? Well, each HEAP_PAGE also has a representation of its objects in a bitmap. Each object is represented by one bit.

Okay, so this bitmap takes up 56 bytes of memory. The math here checks out. There are 8 bits in a byte, which means our 56 bytes give us 56 * 8 == 448 bits, which will safely cover all of the 409 objects we want to represent.
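Here is that arithmetic as a small sketch. Strictly speaking, 409 bits would fit in 52 bytes, so why 56? One plausible explanation, which is an assumption on my part and not something this post has established, is that the bitmap is stored in whole 8-byte (64-bit) words and rounded up. The `bitmap_bytes` helper is invented for this example.

```ruby
# Hypothetical helper: bytes needed for a one-bit-per-object bitmap when the
# bitmap is stored in whole words (assumed here to be 8 bytes, i.e. 64 bits).
def bitmap_bytes(object_count, word_bytes: 8)
  word_bits = word_bytes * 8
  words = (object_count + word_bits - 1) / word_bits # integer ceiling division
  words * word_bytes
end

puts "bits available in 56 bytes:  #{56 * 8}"
puts "bytes for 409 bits, packed:  #{bitmap_bytes(409, word_bytes: 1)}"
puts "bytes for 409 bits, 64-bit:  #{bitmap_bytes(409)}"
```

Under that assumption, 409 objects need 7 words of 8 bytes each, which lands on the 56 bytes the constant reports.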

But, I still haven’t answered the question of why we need these bitmaps to represent all of the objects in the first place. This is integral to the actual algorithm Ruby uses for garbage collection. It’s a tri-color mark and sweep algorithm. The details of this algorithm deserve at least one of their own entire blog posts. (When I’ve written that, I’ll link it here.) For now, you’re just going to have to trust that it’s important this bitmap exists per page, and has enough space for each object within a page.
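To make "one bit per object" a little more concrete, here is a toy bitmap in Ruby. To be clear, this is not how Ruby's C implementation is written; `PageBitmap` and its methods are hypothetical names for illustration only. It just shows how a 56-byte bitmap can record a yes/no flag (say, "has this object been marked?") for each slot index on a page.

```ruby
# Toy illustration only: a bitmap that stores one flag bit per slot index.
class PageBitmap
  def initialize(size_in_bytes)
    @bytes = Array.new(size_in_bytes, 0)
  end

  # Turn on the bit for the given slot index.
  def set(slot_index)
    check_range(slot_index)
    @bytes[slot_index / 8] |= (1 << (slot_index % 8))
  end

  # Is the bit for the given slot index turned on?
  def set?(slot_index)
    check_range(slot_index)
    @bytes[slot_index / 8][slot_index % 8] == 1
  end

  # How many bits are currently turned on?
  def count
    @bytes.sum { |byte| byte.to_s(2).count("1") }
  end

  private

  def check_range(slot_index)
    return if slot_index >= 0 && slot_index < @bytes.size * 8

    raise IndexError, "slot #{slot_index} is outside the bitmap"
  end
end

marks = PageBitmap.new(56) # 56 bytes, as in HEAP_PAGE_BITMAP_SIZE
marks.set(0)               # first slot on the page
marks.set(408)             # last of the 409 slots
puts marks.set?(408)
puts marks.set?(1)
puts marks.count
```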

:HEAP_PAGE_BITMAP_PLANES => 4

I must admit, this one stumped me. I searched the Ruby source code and found where it is set to 4, but absolutely no other use of the constant itself. As far as I can tell, this value is unused. I then, admittedly, dove deep into a nostalgic-for-travel googling of airplanes.

Anyways, back to the matter at hand: if anyone knows what HEAP_PAGE_BITMAP_PLANES means, please, please let me know, I am super curious to learn! I have a hunch that it is a remnant from an older Ruby version….

:HEAP_PAGE_SIZE => 16384

Lastly, we have the size of the pages themselves. 16384 bytes means each page is exactly 16 KiB (or a little over 16 KB if we count in powers of ten). These pages have headers with some information, and then all of their RVALUEs. The spaces where the RVALUEs are stored are called slots. So above, when we said the HEAP_PAGE_OBJ_LIMIT was 409, another way to frame this would have been that each page has a maximum of 409 slots.

Again, we can check the math here. There should be enough space within each page for all of its RVALUEs, and a little extra for the header information. We know each page has a maximum of 409 RVALUEs, and each RVALUE is 40 bytes. So 409 * 40 == 16,360 < 16,384, which leaves 24 bytes to spare for the header.
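The same check can be written out in a few lines of Ruby. The `page_layout` helper is a name made up for this sketch, and the numbers plugged in are the Ruby 3.0 constants from the top of this post.

```ruby
# Hypothetical helper: how many bytes of a page the slots occupy,
# and how many bytes are left over for everything else (such as the header).
def page_layout(page_size:, slot_size:, slot_count:)
  slot_bytes = slot_size * slot_count
  { slot_bytes: slot_bytes, leftover_bytes: page_size - slot_bytes }
end

layout = page_layout(page_size: 16_384, slot_size: 40, slot_count: 409)
puts "bytes used by slots: #{layout[:slot_bytes]}"
puts "bytes left over:     #{layout[:leftover_bytes]}"

# A 410th slot would not fit: it would need more bytes than the page has.
puts "would 410 slots fit? #{410 * 40 <= 16_384}"
```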

TL;DR

For any visual learners, here’s a diagram explaining the above:

[Figure: Heap diagram]

And here’s a quick definition-based summary:

  • Heap (Variable size): Ruby’s Object Space, or memory. Where Ruby stores all of its Pages.
  • Page (~16 KB): How Ruby segments its Heap. Heaps contain pages. Pages contain at most 409 slots.
  • Slot (40 bytes): A space on a Page where an RVALUE is stored
  • RVALUE (40 bytes): Ruby’s C representation of objects. Sometimes contains the value of the object, sometimes contains a pointer to where the object lives in the OS heap

That’s all for this post! As I referenced earlier, I will be writing much more about garbage collection (both blog posts and a book!!). If you’re interested in following along and hearing when I have new posts, please leave your email address below.