One question I had when initially learning about the Tri-Color Mark and Sweep algorithm is why Tri-Color? Why not just two colors? (If you haven’t read my mark and sweep post, that intro will be helpful before diving into this one!)
Interestingly enough, before Ruby 2.2, Ruby was using a run-of-the-mill Mark and Sweep algorithm with no colors. The mark phase was simple: Ruby’s GC traversed all RVALUES, starting at the root, and marked any RVALUE it encountered.
So what happened in Ruby 2.2? Why did this become Tri-Color? I’ve eliminated some of the suspense with the title of this blog post: Incremental GC was introduced to Ruby.
But… I’ve also saved some suspense. Why? And how does that relate to Tri-Color?
Ruby’s uses a stop-the-world garbage collector. The world is really just the execution of our program. Each time Ruby’s GC runs, the execution of our program is paused completely. This is okay for most minor GCs, but could be cumbersome for major GCs. (See my previous post about Generational GC for definitions of minor and major GC.) It would be especially cumbersome if these pauses happen at inopportune times.
Imagine a world (cough program execution cough) where a user is loading a webpage that usually takes a hundred milliseconds. But, it just so happens that a big major GC runs at the time of their page load.
Now seemingly out of nowhere, their world is paused for the garbage collector to run and their page takes way longer to run than normal with no explanation to them. Clearly this would be suboptimal.
Our mysterious Tri-Colors afford us a way to pause garbage collection. Specifically, we can thank the grey color for this ability. If we look at a snapshot of RVALUES and their references mid-garbage collection like this, we can clearly see what the garbage collector has to do next:
We know it can pick any one of the grey RVALUES and continue to follow its algorithm of finding references, marking them grey, and then marking the initial RVALUE black. We know exactly which RVALUES the garbage collector has already looked at, and which ones it still must inspect.
Contrast this with the idea of a plain old mark and sweep algorithm. If we paused a non-Tri-Color Mark and Sweep in the middle of its execution of the mark phase, we wouldn’t know what should happen next. It wouldn’t be clear which part of the traversal we were on. Even if we kept a pointer to the current node, or something like that, it wouldn’t be clear which other RVALUES had references which were already traversed or not.
If we interrupted a non-Tri-Color Mark and Sweep in the middle of its execution, we would have to start all over. And here is the real genius of the Tri-Color Mark and Sweep algorithm! Sure, it introduces slighly more cognitive and logical complexity than its predecessor. But in return it allows us to pause and not have to begin again!
This means that we can incrementally run the garbage collector. Or, put another way, it means we can run a little bit of our mark phase at a time. Insted of risking a long stop-the-world pause at an inopportune time, incremental GC allows us to do a little bit of marking interleaved with a little bit of program execution.
Write Barriers, again!
There is one crucial detail we haven’t accounted for yet. What if the world changes while our program is executing and mid-way through this incremental mark phase? Critically, what if an RVALUE we had already marked black now references a new white RVALUE. The garbage collector needs to know about this! Otherwise it would think it had already marked all references from the black RVALUE, never see this new white RVALUE, and sweep it away even though we have a live reference to it.
Write barriers to the rescue! As you may recall from the post about Generational Garbage Collection, write barriers allow us to execute pieces of code whenever an object is written to.
Can you see how this solves our problem? The garbage collector can use write barriers to its advantage in the case described above!
Whenever a black RVALUE is changed while the incremental marking is on one of its pauses, the garbage collector can use a write barrier to re-color the black RVALUE grey. This will force the garbage collector to again look at all of the RVALUES that the now-grey RVALUE is referencing. Critically, this will catch the case where the execution of our program interleaved with GC introduces a white RVALUE referenced by a black RVALUE.
Write Barrier Unprotected Objects
Uh-oh. Some objects are not protected by write barriers, called write barrier unprotected objects. We’ll go into why this happens in future posts, for now it’s just important to understand that our programs can have write barrier unprotected objects. This may seem like a flaw in our above strategy. But, it’s actually not a huge problem at all. The garbage collector simply looks at all write barrier unprotected objects at the very end of its incremental mark.
An immediate concern may be that this would slow down the program. It turns out there are usually not many of these write barrier unprotected objects, and so their mark time does not add significant overhead to the GC algorithm. However, if you ever see a case where minor GC seems like it is taking longer than it should, looking at the number of write barrier unprotected objects is a good place to start investigating.
One other point to notice about the write barrier solution is that it introduces slightly more work for the garbage collector. In the plain Mark and Sweep algorithm, the garbage collector only had to mark and trace references of each RVALUE once. Now, it has to re-mark some RVALUES. This will add time to the total time spent in garbage collecting; the garbage collector is doing more work.
But, the garbage collector is now also never stopping the execution of the program for too long. This is a case where the benefits outweigh the costs. It is more effective to smooth program execution to have these incremental marks interleaved with program execution even if this increases the total time spent marking.
That was a whirlwind tour or incremental GC. Let’s recap some new definitions we’ve learned in this post:
- Stop-the-world: refers to pausing the execution of our program to run garbage collection
- Incremental gc: a strategy to do a little bit of marking at a time and cause less interference with the execution of the program
- write barrier unprotected: when we don’t have access to run snippets of code every time an object is written to.
This post is part of a Ruby GC Deep Dive Series I’m writing. Other posts in the series so far are:
If you’d like to stay up to date on the series and the book I’m also writing, please drop your email address below! Can promise more jokes and (hopefully) much more learning.