How do you synchronise two directories which are both changing? Rsync is ubiquitous and has been the staple of sysadmins for a long time. Unfortunately, rsync keeps no record of earlier runs, so it cannot differentiate between a file deletion on one node and a creation on another, which makes two-way synchronisation difficult if not impossible. Just search Google for it to see how many scripting solutions there are! Fortunately, there’s a solution: Unison.
Unison has many advantages over rsync (read the web page for details; I am not going to repeat them all here). Importantly, it can tell the difference between a deletion on one node and a creation on another because it holds information about the state of the synchronised folders (AKA replicas) in a database file in your home directory. Unison also detects conflicts when a file has been updated in both locations and provides a mechanism for resolving these automatically or manually.
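To see why remembered state matters, here is a toy sketch in shell. It is not how Unison is implemented; it only shows that a listing saved at the last sync is enough to tell a deletion from a creation. The function names (has, classify) and the three listing files are made up for illustration, and the sketch only looks at whether a path exists, not at file contents.

```shell
#!/bin/sh
# Toy illustration only: this is NOT Unison's actual algorithm.
# Each listing is a text file with one path per line:
#   archive = listing saved at the last successful sync
#   a, b    = current listings of replica A and replica B

# Succeed if path $1 appears as a whole line in listing file $2.
has() { grep -qxF -- "$1" "$2"; }

# Decide what a one-sided difference means for path $1.
classify() {
    name=$1 archive=$2 a=$3 b=$4
    if has "$name" "$a" && ! has "$name" "$b"; then
        if has "$name" "$archive"; then
            echo "deleted on B: remove from A"   # it existed at the last sync
        else
            echo "created on A: copy to B"       # it is new since the last sync
        fi
    elif has "$name" "$b" && ! has "$name" "$a"; then
        if has "$name" "$archive"; then
            echo "deleted on A: remove from B"
        else
            echo "created on B: copy to A"
        fi
    else
        echo "same presence on both sides: no creation or deletion to propagate"
    fi
}
```

Without the archive listing, the first two outcomes in each branch are indistinguishable, which is exactly the position rsync is in.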
So, how do you use this whizzy piece of software? Well, the documentation is very good so again I won’t duplicate that. Use of Unison is very easy and really does just involve installing the RPMs (available in the EPEL repository) and then running the command according to the documentation. To make the commands simpler, you can set up profiles which contain all the configuration options you want to specify.
So, Unison is easy to install, easy to run and is well documented. “What’s the catch?” I hear you cry. Well, there is one that for me was a show-stopper. I was trying to synchronise a file system that holds over a million files. As Unison needs to traverse the file system looking for changes, its performance drops as the number of files increases. I noticed that performance was starting to suffer at about 800,000 files. This might be partly due to the low-spec virtual machines I was using for testing, but other replication solutions I tested did not suffer in this way. I left Unison running over a weekend and found it still had not finished when I came in on Monday! This is no good for my purposes.
For anyone struggling with Unison and lots of files, there is a way of running Unison in batches that speeds things up. Begin by creating a Unison profile (in this case ~/.unison/synctest.prf). It should have the following contents:
# Unison preferences file
# root of synchronisation
root = /unison
root = ssh://remote-host//unison
# don't prompt
batch = true
# diff command
diff = colordiff -u CURRENT2 CURRENT1
# ignore lost+found
ignore = Path */lost+found
Once the profile is set up, you can run Unison in a shell loop for each of the top-level directories in your replicas:
for topdir in <list of directories>; do
    unison synctest -path "$topdir"
done
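If you would rather not type the list by hand, the loop above can be sketched as a pair of shell functions that read the top-level directories from the local replica. This is a minimal sketch under a few assumptions: the function names and the PROFILE and UNISON_CMD variables are my own, hidden directories are skipped by the glob, symlinks to directories are included, and files sitting directly in the root are not covered by any per-directory pass.

```shell
#!/bin/sh
# Sketch: one Unison pass per top-level directory of the local replica.
# PROFILE matches the synctest profile above. UNISON_CMD is a hypothetical
# override so the loop can be dry-run, e.g. with UNISON_CMD=echo.
PROFILE="${PROFILE:-synctest}"
UNISON_CMD="${UNISON_CMD:-unison}"

# Print the name of each top-level directory under $1, one per line.
list_topdirs() {
    for entry in "$1"/*/; do
        [ -d "$entry" ] || continue              # the glob matched nothing
        entry="${entry%/}"                       # strip the trailing slash
        name="${entry##*/}"                      # keep only the last component
        [ "$name" = "lost+found" ] && continue   # the profile ignores it anyway
        printf '%s\n' "$name"
    done
}

# Run the profile once per top-level directory under $1.
sync_all() {
    list_topdirs "$1" | while IFS= read -r topdir; do
        "$UNISON_CMD" "$PROFILE" -path "$topdir" ||
            echo "unison failed for $topdir" >&2
    done
}
```

With the profile above you would call sync_all /unison. Setting UNISON_CMD=echo first prints the commands instead of running them, which is a cheap way to check the directory list.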
This took 2 hours rather than several days to synchronise the replicas. Even so, that was too slow for me: I would like my replicas to be no more than 30 minutes out of date, so I had to rule out Unison for my use. You may find that it suits your situation perfectly though.
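If you want to check whether a run fits your own window, a small timing wrapper is enough. The sketch below is only an illustration: the timed_run function and the BUDGET_SECONDS variable are names I made up, the 1800-second default comes from my 30-minute target, and it assumes your date command supports +%s (GNU and BSD date both do).

```shell
#!/bin/sh
# Sketch: time a command and compare the result against a budget.
# BUDGET_SECONDS is an assumption taken from the 30-minute target above.
BUDGET_SECONDS="${BUDGET_SECONDS:-1800}"

# Run the given command, report the elapsed time, and pass on its exit status.
timed_run() {
    start=$(date +%s)
    "$@"
    status=$?
    elapsed=$(( $(date +%s) - start ))
    if [ "$elapsed" -le "$BUDGET_SECONDS" ]; then
        echo "finished in ${elapsed}s (within budget)"
    else
        echo "finished in ${elapsed}s (over budget by $(( elapsed - BUDGET_SECONDS ))s)"
    fi
    return "$status"
}
```

For example, timed_run unison synctest times a full pass, and adding -path with one of your top-level directories times a single batch.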
Happy syncing!