I just got through writing up my initial notes for a concurrent, real-time garbage collector for the parallel systems programming language I'm designing. (I've got a well-thumbed copy of Richard Jones's and Rafael Lins's book in front of me while I do this.) This is a short note to describe where I'm going.
My collector is a variant of the Leroy-Doligez-Gonthier collector for Caml Light, but I've got a sneaky trick up my sleeve that stems from the fact that my language is implicitly parallel. Here's the brief version of my notes...
The minor heap is reserved for immutable data, just like in LDG. I'm not sure yet whether I'll even allow mutable data in the major heap. (There's a part of me that wants to quarantine all the mutable data off in the foreign function interface, but I'm not sure I want to be that much of a jack-booted
fascist yet.) The other evil thing I'm planning to do, at least for the virtual machine prototype implementation, is to use the
Cheney On The M.T.A. strategy for allocating the minor heap directly on the C-language call stack.
None of that is really new.
Here's what I think might be new: a scheduling kernel in the runtime environment for the language that contains an arbiter for shifting pending work queue items from busy cores to available ones. This shifting is a special case of copying from the minor heap into the major heap. The scan can be done in parallel by one or more competing execution contexts, even while the engine for the original work queue is collecting its own minor heap. Because the C stack duals as the minor heap, this will be trivially easy to do.
My only worry is about how badly I will blow up the cache with all the calls to OSAtomicTestAndSet I'll be making. (No, I don't give a rat's ass about any platforms other than Mac OS X right now. Thanks for asking.) Hey, it's just a prototype though— the whole point is to figure out where the performance bottlenecks are located, then lean into them hard with the power tools. This will be fun.