Our PL reading group read the paper “μKanren: A Minimal Functional Core for Relational Programming” this week. It presents a minimalist logic programming language in Scheme in 39 lines of code. Since none of us are really Schemers, a bunch of us quickly set about porting the code to our personal pet languages. Chris Martens produced this SML version. I hacked up a version in Haskell.
The most interesting part about this was the mistake I made in the initial version. To deal with recursion and potentially infinite search trees, the Scheme version allows some laziness; streams of results can be functions that delay search until forced; when a Scheme μKanren program wants to create a recursive relation it needs wrap the recursive call in a dummy function (and plumb through the input state); the Scheme version wraps this in a macro called
Zzz to make doing it more palatable. I originally thought that all of this could be dispensed with in Haskell; since Haskell is lazy, no special work needs to be done to prevent self reference from causing an infinite loop. It served an important secondary purpose, though: providing a way to detect recursion so that we can switch which branch of the tree we are exploring. Without this, although the fives test below works, the fivesRev test infinite loops without producing anything.
The initial version was also more generalized. The type signatures allowed for operating over any
MonadPlus, thus allowing pluggable search strategies.
KList was just a newtype wrapper around lists. When I had to add
delay I could have defined a new
MonadPlusDelay typeclass and parametrized over that, but it didn’t’ seem worthwhile.
A mildly golfed version that drops blank lines, type annotations, comments, aliases, and test code clocks in at 33 lines.
I have to build LLVM and Clang a lot for my research. Clang/LLVM is quite large and takes a long time to build if I don’t use -j8 or so to parallelize the build; but I also quickly discovered that parallelizing the build didn’t work either! I work on a laptop with 8gb of RAM and while this can easily handle 8 parallel compiles, 8 parallel links plus Firefox and Emacs and everything else is a one way ticket to swap town.
So I set about finding a way to parallelize the compiles but not the links. Here I am focusing on building an existing project. There are probably nicer ways that someone writing the Makefile could use to make this easier for people or the default, but I haven’t really thought about that.
My first attempt was the hacky
(while ! pgrep ld.bfd.real; do sleep 1; done; killall make ld.bfd.real) & make -j8; sleep 2; make. Here we wait until a linker has run, kill make, then rerun make without parallel execution. I expanded this into a more general script:
This approach is kind of terrible. It’s really hacky, it has a concurrency bug (that I would fix if the whole thing wasn’t already so bad), and it slows things down way more than necessary; as soon as one link has started, nothing more is done in parallel.
A better approach is by using locking to make sure only one link command can run at a time. There is a handy command,
flock, that does just that: it uses a file link to serialize execution of a command. We can just replace the Makefile’s linker command with a command that calls
flock and everything will sort itself out. Unfortunately there is no totally standard way for Makefiles to represent how they do linking, so some Makefile source diving becomes necessary. (Many use $(LD); LLVM does not.) With LLVM, the following works:
make -j8 'Link=flock /tmp/llvm-build $(Compile.Wrapper) $(CXX) $(CXXFLAGS) $(LD.Flags) $(LDFLAGS) $(TargetCommonOpts) $(Strip)'
That’s kind of nasty, and we can do a bit better. Many projects use $(CC) and/or $(CXX) as their underlying linking command; if we override that with something that uses flock then we’ll wind up serializing compiles as well as links. My hacky solution was to write a wrapper script that scans its arguments for “-c”; if it finds a “-c” it assumes it is a compile, otherwise it assumes it is a link and uses locking. We can then build LLVM with:
make -j8 'CXX=lock-linking /tmp/llvm-build-lock clang++'.
Is there a better way to do this sort of thing? Continue reading
I have something of an unfortunate fondness for indefensible hacks.
Like I discussed in my last post, RCU is a synchronization mechanism that excels at protecting read mostly data. It is a particularly useful technique in operating system kernels because full control of the scheduler permits many fairly simple and very efficient implementations of RCU.
In userspace, the situation is trickier, but still manageable. Mathieu Desnoyers and Paul E. McKenney have built a Userspace RCU library that contains a number of different implementations of userspace RCU. For reasons I won’t get into, efficient read side performance in userspace seems to depend on having a way for a writer to force all of the reader threads to issue a memory barrier. The URCU library has one version that does this using standard primitives: it sends signals to all other threads; in their signal handlers the other threads issue barriers and indicate so; the caller waits until every thread has done so. This is very heavyweight and inefficient because it requires running all of the threads in the process, even those that aren’t currently executing! Any thread that isn’t scheduled now has no reason to execute a barrier: it will execute one as part of getting rescheduled. Mathieu Desnoyers attempted to address this by adding a membarrier() system call to Linux that would force barriers in all other running threads in the process; after more than a dozen posted patches to LKML and a lot of back and forth, it got silently dropped.
While pondering this dilemma I thought of another way to force other threads to issue a barrier: by modifying the page table in a way that would force an invalidation of the Translation Lookaside Buffer (TLB) that caches page table entries! This can be done pretty easily with mprotect or munmap.
Full details in the patch commit message.