This is the third post in the Ruby GC Deep Dive series. It’ll make the most sense if you read the previous two posts first.

For this post, we’ll learn about Generational GC. Generational garbage collection is predicated on the weak generational hypothesis: most objects die young. In other words, most objects lose all of their active references, and so become available for collection, soon after they are created. The hypothesis also says that objects which don’t die young tend to live for a long time: if they aren’t available for collection almost immediately, they likely won’t be for a while.

We can look at a Rails example to justify this hypothesis to ourselves. To generate a webpage for a client request, the Rails application will create many new Ruby objects. Once the page has been returned to the client, all of these objects are no longer needed and their space in memory can be reclaimed. However, there are some objects which need to live between all requests, like controllers, configuration data, user session data, and so on. These objects will live for a long, long time.
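We can mimic that request pattern in plain Ruby. This is only a rough sketch: handle_request and the config hash are made-up stand-ins for a Rails app, and the exact numbers will differ from run to run. It compares how many objects were allocated while serving fake requests with how many had been freed once a GC finished.

```ruby
# Hypothetical stand-in for long-lived application state.
config = { app_name: "demo", locale: "en" }

# Hypothetical stand-in for a request handler: everything it builds
# becomes garbage as soon as the response string is discarded.
def handle_request(config, id)
  rows = Array.new(50) { |i| "row #{id}-#{i}" } # short-lived strings
  "<html>#{config[:app_name]}: #{rows.join(', ')}</html>"
end

allocated_before = GC.stat(:total_allocated_objects)
freed_before     = GC.stat(:total_freed_objects)

2_000.times { |id| handle_request(config, id) }
GC.start

allocated = GC.stat(:total_allocated_objects) - allocated_before
freed     = GC.stat(:total_freed_objects) - freed_before

puts "allocated while serving requests: #{allocated}"
puts "freed by the time GC finished:    #{freed}"
```

On a typical run the two numbers should be close to each other, while config is still alive and well afterwards.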

Major and Minor GCs

Back to the point: Ruby 2.1 introduced Generational GC, which takes advantage of the weak generational hypothesis. It concentrates more frequent garbage collection efforts on young, newer objects.

Ruby’s garbage collector actually has two different types of garbage collection: major GCs and minor GCs. Minor GCs happen more frequently, and mostly look at young objects. (There’s an edge case here which we’ll cover in a bit.) Major GCs happen less frequently and look at all objects. Minor GCs are faster than major GCs because they’re looking through fewer objects.
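Ruby keeps separate counters for the two kinds of GC, and we can read them with GC.stat. Here’s a minimal sketch; gc_counts is a helper name made up for this example. One caveat: as far as I can tell, Ruby may upgrade a requested minor GC to a major one if a major GC is already due, so the minor counter isn’t guaranteed to be the one that moves.

```ruby
# Hypothetical helper: snapshot the minor and major GC counters.
def gc_counts
  { minor: GC.stat(:minor_gc_count), major: GC.stat(:major_gc_count) }
end

before = gc_counts

GC.start(full_mark: false) # ask for a minor GC
after_minor = gc_counts

GC.start(full_mark: true)  # ask for a major GC
after_major = gc_counts

puts "before:                #{before}"
puts "after minor GC request: #{after_minor}"
puts "after major GC request: #{after_major}"
```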

Old and Young Objects

How does Ruby determine whether an object is old or young? Well, any object which has survived three garbage collections (major or minor) becomes old.

We can prove this to ourselves with a little code snippet. In a future post in this series we’ll look more in depth at ways Ruby allows us to see what’s happening in GC. For now, we just need to know that we can inspect some of Ruby’s trash. I’m going to handwave the nuance behind #old? (I’ll put the method definition and relevant doc links at the bottom of this post):

# We disable any automatic GC runs to ensure that GC is
# only happening when we call it manually
GC.disable

# full_mark: false is a minor gc
# full_mark: true is a major gc
def count_gc_until_old(full_mark)
  gc_count = 0
  obj = Object.new

  while !old?(obj)
    GC.start(full_mark: full_mark)
    gc_count += 1
  end

  gc_count
end

# Major GCs
puts count_gc_until_old(true)
# prints 3

# Minor GCs
puts count_gc_until_old(false)
# prints 3

We’ve confirmed that running 3 garbage collections will age a young object into an old one!

What triggers minor vs major GCs?

Minor GCs are triggered when the Ruby Heap does not have any free slots left. In this case, it runs a minor garbage collection in an attempt to find new free slots into which it can allocate objects.
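We can see allocation pressure triggering GC for ourselves. This is a rough sketch that assumes automatic GC is enabled (so no GC.disable beforehand); the number of runs it reports depends on your Ruby version and heap settings.

```ruby
count_before = GC.stat(:count)
free_before  = GC.stat(:heap_free_slots)

# Allocate far more objects than there are free slots, without keeping any.
500_000.times { Object.new }

count_after = GC.stat(:count)

puts "free slots before allocating:    #{free_before}"
puts "GC runs triggered by allocating: #{count_after - count_before}"
puts "most recent GC was triggered by: #{GC.latest_gc_info(:gc_by).inspect}"
```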

Major GCs can be triggered in a few ways. If there are still no free slots after a minor GC, a major GC will happen. Major GCs are also triggered if we cross the internal limit of old objects. This limit increases as the size of the Ruby Heap increases, and we can see it by looking at GC.stat(:old_objects_limit). For a quick example:

GC.stat(:old_objects_limit)
# => 67450

objs = 10_000.times.map { Object.new }
GC.stat(:old_objects_limit)
# => 92770

We can see that if we allocate 10_000 objects, our old_objects_limit will increase, in this case from 67450 objects to 92770 objects.

There are a couple more ways a major GC will be triggered. Notably for now, we can manually trigger a major GC by running GC.start. Compaction can also trigger major GC as we’ll learn in a future post.
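Ruby will also tell us why the most recent GC ran. Here’s a small sketch using GC.latest_gc_info. My understanding is that :major_by reports values such as :nofree, :oldgen or :force, which appear to line up with the triggers described above, but treat that mapping as an assumption rather than documented behaviour.

```ruby
GC.start # manually trigger a major GC

info = GC.latest_gc_info

puts "gc_by:    #{info[:gc_by].inspect}"    # what kicked off the GC
puts "major_by: #{info[:major_by].inspect}" # why it was a major GC (nil for a minor GC)
```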

Write barriers

You might have noticed a problem in the algorithm above with old and new objects. This works fine if new objects reference old objects, but what if it happens the other way around? What if we have some RVALUE in an old generation, say, and we create a new RVALUE referenced from that old RVALUE?

[Diagram: minor GC without write barriers]

As this diagram illustrates, we run into a problem: during a minor GC the garbage collector only marks new objects, so it never sees the reference to this new object (because the reference is held by an old object). It therefore collects the new object even though it is referenced by an old RVALUE which is reachable from the root. Uh-oh.

Fear not, the folks who wrote Ruby’s garbage collector implemented a very neat solution to this problem using write barriers. Write barriers are pieces of code that are executed whenever an object is written to. Relevant to GC, we can put write barriers on objects and then know when an old object has been written to. The garbage collector then puts the old objects which have references to new objects in a remembered set. This leaves us with the final step: minor GCs look at the remembered set as well as new objects.

[Diagram: minor GC with write barriers]

We are therefore guaranteed that any object which is neither young nor in the remembered set holds no references to young objects. This means the problem we noticed earlier in this section has disappeared completely, and we can proceed knowing our generational garbage collector is indeed looking at all the objects it needs to.
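To make the idea concrete, here’s a toy model of a minor GC with and without a write barrier. To be clear, this is not how Ruby implements it internally: ToyObject, ToyHeap, write_ref and orphaned_by_minor_gc are all names invented for this sketch, and the real collector works on RVALUEs in C. It only illustrates why the remembered set is needed.

```ruby
require "set"

# Toy model only: these classes are hypothetical, not Ruby internals.
class ToyObject
  attr_reader :name, :refs
  attr_accessor :old

  def initialize(name, old: false)
    @name = name
    @old = old
    @refs = []
  end
end

class ToyHeap
  def initialize(use_write_barrier:)
    @use_write_barrier = use_write_barrier
    @objects = []
    @roots = []
    @remembered_set = Set.new
  end

  def allocate(name, old: false, root: false)
    obj = ToyObject.new(name, old: old)
    @objects << obj
    @roots << obj if root
    obj
  end

  # Every reference write goes through here; the barrier remembers
  # old objects that start pointing at young objects.
  def write_ref(from, to)
    from.refs << to
    @remembered_set << from if @use_write_barrier && from.old && !to.old
  end

  # Minor GC: old objects are assumed alive and are not traversed,
  # except for the ones in the remembered set.
  def minor_gc
    worklist = @roots.reject(&:old) + @remembered_set.flat_map(&:refs)
    marked = Set.new
    until worklist.empty?
      obj = worklist.pop
      next if obj.old || marked.include?(obj)
      marked << obj
      worklist.concat(obj.refs)
    end
    collected, @objects = @objects.partition { |o| !o.old && !marked.include?(o) }
    collected.map(&:name)
  end
end

# Returns the names of the objects a minor GC collects when an old,
# reachable parent holds the only reference to a young child.
def orphaned_by_minor_gc(use_write_barrier:)
  heap = ToyHeap.new(use_write_barrier: use_write_barrier)
  parent = heap.allocate("old_parent", old: true, root: true)
  child = heap.allocate("young_child")
  heap.write_ref(parent, child)
  heap.minor_gc
end

without_barrier = orphaned_by_minor_gc(use_write_barrier: false)
with_barrier    = orphaned_by_minor_gc(use_write_barrier: true)

p without_barrier # => ["young_child"], a live object was wrongly collected
p with_barrier    # => [], the remembered set kept the child alive
```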

TL;DR

Generational garbage collection is a strategy to speed up Ruby’s GC. It does this by having two different passes of garbage collection: more frequent, minor GCs, where it looks at new objects and less frequent, major GCs where it looks at all objects.

Here are all the new terms we learned:

  • Weak Generational Hypothesis: Most objects die young
  • Young (new) Objects: Objects which have survived fewer than three GCs and so are marked and swept in every GC, minor or major
  • Old Objects: Objects which have survived three GCs and are therefore only marked and swept in major GCs
  • Major GC: GC where we mark and sweep all objects in Ruby’s Object Space
  • Minor GC: GC where we mark and sweep only young objects and objects in the remembered set
  • Remembered Set: Old objects which have references to new objects
  • Write barriers: Callbacks which Ruby uses to determine when an old object references a new object

As I said, we’ll learn how to use Ruby’s GC, ObjectSpace and GC::Profiler functionality in future posts. (Small plug to drop your email address here or follow me on twitter to learn when new posts in this series and eventually a whole book are written!)

But, because I teased it, here’s how I defined #old? above:

require "json"
require "objspace"

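def old?(obj)
  # ObjectSpace.dump returns a JSON string describing the object;
  # dig returns nil instead of raising if the "flags" key is missing
  !!JSON.parse(ObjectSpace.dump(obj)).dig("flags", "old")
end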
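If you’re curious what #old? is actually reading, here’s a quick sketch that prints part of the parsed dump for a brand new object. The exact set of keys depends on your Ruby version, so take the output as illustrative.

```ruby
require "json"
require "objspace"

obj = Object.new
dump = JSON.parse(ObjectSpace.dump(obj))

puts "type:  #{dump["type"]}"
puts "flags: #{dump["flags"].inspect}" # "old" should only show up once the object has aged
```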

And if you’d like to play around more with the code snippets, GC.start(full_mark: false) will run a minor GC whereas GC.start(full_mark: true) or just GC.start will run a major GC. Enjoy!