SI 413 Fall 2021 / Labs


This is the archived website of SI 413 from the Fall 2021 semester. Feel free to browse around; you may also find more recent offerings at my teaching page.

Lab 10: Garbage Collection

This lab is due at 2359 on Wednesday, 24 November. It should be submitted to the course SI413 as Lab10, with the files named properly according to the lab instructions. See the submit page for details. You are highly encouraged to submit whatever you have done by the end of this lab time. Remember that you must thoroughly test your code; the tests that are visible before the deadline are only there to make sure you are using the right names and input/output formats.

1 Starter Code

You should already have a shared git repository spl. To get the starter code for today's lab, cd to your si413/spl directory and run these commands:

git pull starter main
git pull

You should now have a folder like si413/spl/lab10 with the starter code for this lab and a handy script to submit your code.

You can also browse the starter code from the Gitlab page (USNA intranet only).

2 Background

via GIPHY

Today's lab is all about garbage collection. In it, we will be using a few C++ features that we haven't seen yet: static methods, static fields, constructors, and deconstructors. Except for destructors, they all have counterparts in Java which you already know well. Here's a brief overview on each.

Static methods

Also sometimes called class methods, these are methods in a class that are not called on any particular object of that class. They have access to any static fields in the class (see below), but not any of the other fields, and they also have no this pointer.

You declare a static method by using the word static:

class SomeClass {
  private:
    ...

  public:
    static int foo() { /* body of the method goes here. */ }
};

Static fields

Static fields are data members that exist only once for the entire class, not for each individual object. Really, they are just dressed-up global variables.

The tricky thing with static fields is that they are declared in the class declaration in the header file, as usual, but they must also be declared, and possibly initialized, in some cpp file. So for example we might have

class SomeClass {
  private:
    static bool x;

  public:
    ...
};

This would go in the header file. Then, in the cpp file, we have to initialize the static member as follows:

bool SomeClass::x = true;

Constructors

Okay, we've been using constructors for a while now, so you should know what they look like. They're just like any other (non-static) class method, except that (1) there is no return type, and (2) the name is the same as the name of the class. For instance:

class SomeClass {
  public:
    SomeClass (int arg) { /* This is a constructor. */ }
};

Destructors

We haven't yet seen destructors. These are called whenever an object is de-allocated at run time. Remember this can happen either when a local variable goes out of scope (i.e., the function or block it was declared in comes to an end), or when an object allocated using new is explicitly deallocated with a call to delete.

A destructor is again like any other class method, except that it has no return type, no arguments, and the name is a tilde (~) followed by the class name. For example:

class SomeClass {
  public:
    ~SomeClass() { /* This is a destructor. */ }
};

(You won't necessarily have to write any destructors for this lab, I just wanted to make sure you know what they are.)

3 Counting Frames

To help see why we really need automatic garbage collection in our SPL interpreter, let's see just how many frames are being created in our program. You'll do this by storing a big list of all the Frames that get created, as well as a way to access the size of this list from within an SPL program.

First thing to do is modify the Frame class to keep track of all Frames that get created in a list. This requires a few additions to the class:

  • A static field of type list<Frame*> that will contain a pointer to every Frame object that is currently allocated by the interpreter. You will have to
    #include <list>
    at the beginning of frame.hpp to be able to use this built-in data structure. Then you'll need a declaration like
    static list<Frame*> allFrames;
    inside the Frame class, to declare the static member. Finally, you have to initialize this member somewhere outside the class. That means adding a line like
    list<Frame*> Frame::allFrames;
    at the beginning of the spl.ypp parser file, in the same place the other global variables are initialized.
  • A (public) static method to get the number of Frames in memory. Of course this will just return the size of the list you just set up.
  • A new line in the Frame constructor that adds a pointer to the newly-created frame to the list. (Remember the special keyword that is used to get a pointer to the current object?)
    You might want use the push_back method of the list class; see here for more complete reference.

After these modifications, your SPL interpreter should work just fine, but you have no way of telling that it's doing anything differently. We want a function that can be called from within a SPL program to determine the number of Frames currently allocated in memory. Sounds like a perfect candidate for a built-in! Go back and review Lab 9 if you need help making another built-in function, or look at the starter code for this week's lab.

Exercises

  1. Modify the Frame class as described above to keep track of allocated frames in a list. There are no tests for this, but to help yourself debug, you might want to add, say, a debugging statement to the Frame constructor that prints out the current size of the list to cerr.
  2. Create a built-in function frames that ignores its argument and returns the number of Frames in the list at that moment. (To emphasize, your built-in must take one argument, since that's how SPL works. It just shouldn't do anything with the argument.)
    After this, you should be able to do something like:
    spl> write frames@0;
    2
    spl> new sum := lambda n {
      ifelse n=1
        {ret := 1;}
        {ret := n + sum@(n-1);}
    };
    spl> sum@5;
    spl> write frames@0;
    18
    spl> sum@100;
    spl> write frames@0;
    319
    (Your actual numbers may be slightly different if you used your own solution from last week instead of mine.)
Submit now and check the autotesting so far! Just run ./submitme.sh from your lab directory.

After you get this working, you should try some examples to see how many frames are created. You can of course run whatever examples you like, but here's a fun one that I made: a rather inefficient function to compute the prime factorization of any positive integer:

# Takes an integer and prints out all its prime factors
new factors := lambda n {

  # Produces the remainder when the first argument is divided by the second
  # (curried function)
  new remainder := lambda a {
    ret := lambda b {
      ret := a - a/b*b;
    };
  };

  # Returns the smallest (prime) factor of n, where n >= 2.
  new smallest_factor := lambda n {
    new helper := lambda i {
      ifelse remainder@n@i = 0
        { ret := i; }
        { ret := helper@(i+1); }
    };
    ret := helper@2;
  };

  if n > 1 {
    new p := smallest_factor@n;
    write p;
    factors@(n/p);
  }
};

4 Mark and Sweep

Now we want to actually implement automatic garbage collection for our SPL interpreter. The basic idea is to add the following to the Frame class:

  • A regular, non-static boolean field to indicate whether this Frame has been "marked".
  • A non-static method mark to mark this Frame and anything this Frame points to.
  • A static method sweep to delete and remove from the list any un-marked Frames.
  • The ultimate goal (the exercise is listed below) is to have automatic mark-and-sweep garbage collection happen after every statement is executed by our interpreter. Once this is working, the number of Frames in memory after every step should shrink down to some very small number every time. You can proceed however you like; however, I recommend the following steps:

    1. Add a field (NOT static) of type bool to the Frame class to indicate whether this frame has been marked. Add a line to the constructor to initialize this field to false.
    2. Add a class method (also NOT static) called mark that performs the marking part of mark-and-sweep, starting with this Frame. If the this frame is already marked, then just return without doing anymore work. Otherwise, you want to set the boolean field to true, and make a recursive call on the parent frame (if it isn't null) and on the env Frame of any Closure that is in the bindings of this Frame — again, if that environment Frame is not null. To do this, you will need to loop through all the key-value pairs in bindings. Here's one way to do that:
      for (auto& item : bindings) {
        /* item is a reference to a pair<string,Value>.
         * So item.first is the name of the binding (a string)
         * and item.second is the Value itself.
         *
         * (By the way, did you know about the "auto" keyword in C++?
         * It asks the compiler to figure out the type for you. Nifty!)
         */
      }
    3. Finally, add a static class method sweep. This is going to go through all the Frame pointers in the list from step (i) and do one of two things:
      • If the Frame is marked, then just unmark it (set the bool from step (i) to false).
      • If the Frame is not marked, then call delete on it to de-allocate its memory, and remove its entry from the list.
      For this part you will have to iterate through the list. This will be a loop similar to the one for iterating though bindings, except that now *iter will be of type Frame* rather than some kind of pair. Then you want to use the erase method in the list class (again, see reference here) to remove list items as necessary. Careful: Your iterator will point to nonsense after you erase the corresponding entry. Luckily, the erase method returns an iterator to the next element in the list. But then you don't want to increment the iterator in this case!

    Exercises

    1. Add automatic mark-and-sweep garbage collection in the while loop of your main in spl.ypp, to occur after every statement is executed. This means calling mark on the global frame, and then sweep.
    Submit now and check the autotesting so far! Just run ./submitme.sh from your lab directory.

    5 Automatically-triggered GC

    Great! Now that you have garbage collection, you can run a series of fairly computationally-intensive operations like (using my example SPL function above) factors@65432 over and over again without ever running out of memory.

    But if you try to do it a few times in a block, or in a function, or loop, etc., the garbage collector never gets called and your program will run out of memory. Why? The problem is that garbage collection is currently only triggered from the top-level in spl.ypp. So now we want to improve on this by triggering garbage collection from within function calls, blocks, loops, and the like, whenever it is needed.

    The tricky part is what we called the "root set" in class. See, it's no longer sufficient just to mark all the Frames that are reachable from the global frame, because the frame that we're currently executing in also needs to be considered. So we have to keep track of all the currently active frames in some sort of stack, and mark everything reachable from any of these frames before sweeping.

    This is an optional exercise. Make sure you have completed AND SUBMITTED a working version of everything else before continuing. This will break your solution to exercise 3, so no skipping allowed!

    Here are a few hints on how you might proceed:

    • You need to store the "root set", the Frames that are currently in the function call stack and should definitely not be deleted yet. You should of course add the global frame to this set, but you also need to add Frames as they are created in Block::exec and Funcall::eval. These frames also NEED TO BE REMOVED when those methods return! (I used an STL deque to hold my root set, but you may do as you wish.)
    • You will need to make a new static method (or something like it) that calls mark for every Frame in the root set, before sweeping. But where should this method be called from? You need to be really careful here that everything important gets marked. I might recommend calling it at the same place where you remove a frame from the root set, since that's the essentially the time when new garbage is created.
    • Performing garbage collection too often is very wasteful, and creates a phenomenon known as "thrashing". Avoid this problem. (You will probably want some kind of counter to keep track of what has happened since the last time garbage was collected. You will have to set the threshold experimentally.)

    Exercises

    1. (OPTIONAL) Implement automatically-triggered garbage collection as described above.
      If you finish, do not submit normally. Instead, email your instructor who whould very much like to see and run your code!