Tuesday, June 30, 2009

DMTCP Checkpointing

Checkpointing is the ability to stop a process, save it, and restart it later.  As a user of HOL Light, checkpointing is nearly essential, since every time you start the system it evaluates all of mathematics from scratch.  Loading advanced libraries can take upwards of an hour.  Checkpointing is hard to get right though, and I haven't found a single program that works on the latest Linux kernel.  I started using DMTCP this weekend, and it works perfectly.  I highly recommend it.  The steps are simple.

1) Install DMTCP using a Debian package or from source.
2) dmtcp_checkpoint -n ocaml
3) #use "hol.ml";; (to start HOL Light)
4) dmtcp_command -c (from another terminal.  This checkpoints the process)
5) Kill the ocaml process.  Then do dmtcp_restart on the saved binary.

Works like a charm.