OK, here's how I do stuff like this. The attached script syncs a remote directory tree to the local host, placing it in a subdirectory named after the current date. Any files that have not changed since the last backup are simply hard-linked to the old copies with the same names, so only new or modified files are actually transferred (and take up additional disk space) on the local host.
The beauty of this approach is that it has the bandwidth and storage efficiency of an incremental backup, while also allowing you to view complete snapshots of the entire state of the backed up filesystem by date.
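If you want to convince yourself that an unchanged file costs no extra space and survives the removal of an older snapshot, here is a small sketch you can run in a scratch directory. The function name and the dated directory names are made up for the demo, and a plain `ln` stands in for what rsync does with unchanged files.

```shell
#!/bin/bash
# Hypothetical demo: two "snapshots" sharing one file through a hard link.
hardlink_demo() {
    local tmp
    tmp=$(mktemp -d) || return 1
    mkdir "$tmp/backups-20240101" "$tmp/backups-20240102"
    echo "unchanged data" > "$tmp/backups-20240101/file"
    # Stand-in for what rsync does with a file that has not changed.
    ln "$tmp/backups-20240101/file" "$tmp/backups-20240102/file"
    # -ef is true when both names refer to the same inode on the same device.
    [[ $tmp/backups-20240101/file -ef $tmp/backups-20240102/file ]] && echo "same inode"
    # Removing the older snapshot leaves the data reachable from the newer one.
    rm -rf "$tmp/backups-20240101"
    cat "$tmp/backups-20240102/file"
    rm -rf "$tmp"
}
```

Calling `hardlink_demo` prints "same inode" followed by "unchanged data", which is the behaviour the snapshots rely on.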
This script does not take care of nuking the old backups, but that's easy enough to do manually since they're stored in separate directory trees by date. Because of the way hard links work, the actual files get deleted only when the last directory tree that references them is removed. Alternatively, you can modify this script or set up a cron job on the local host to clean up old backups automatically.
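For the automatic cleanup, something along these lines might do. It is only a sketch: `prune_backups` is a hypothetical helper, and it assumes the snapshot directories are named `backups-YYYYMMDD` so that the shell's sorted glob expansion lists them oldest first.

```shell
#!/bin/bash
# Hypothetical helper: keep the newest $1 snapshots found in directory $2 (default: .).
prune_backups() {
    local keep=${1:?usage: prune_backups KEEP [DIR]}
    local dir=${2:-.}
    local snapshots=() d i
    # The glob expands in sorted order, which is chronological for YYYYMMDD names.
    for d in "$dir"/backups-????????; do
        [[ -d $d ]] && snapshots+=("$d")
    done
    # Number of snapshots beyond the ones we want to keep (may be zero or negative).
    local excess=$(( ${#snapshots[@]} - keep ))
    for (( i = 0; i < excess; i++ )); do
        echo "removing ${snapshots[i]}"
        rm -rf -- "${snapshots[i]}"
    done
}
```

Calling `prune_backups 30` from the backup directory, either at the end of the backup script or from a cron job, would keep the 30 most recent snapshots. Files still referenced by a surviving snapshot stay on disk.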
Replace path-to-remote-data-dir, remote-username, and remote-hostname in the script with the appropriate values. Any command line arguments specified when the script is launched are passed directly to rsync.
In case you're wondering, the nonsense with $PREVIOUS and $PREVIOUS2 is to ensure sane behavior if the script gets run twice on the same day.
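#!/bin/bash
CURRENT=backups-`date +%Y%m%d`   # assumption: YYYYMMDD, to match the 8-character glob below
PREVIOUS=`ls -d backups-???????? 2>/dev/null | tail -n1`
PREVIOUS2=`ls -d backups-???????? 2>/dev/null | tail -n2 | head -n1`
# With only one existing backup, both lookups return the same name.
if [[ $PREVIOUS2 == "$PREVIOUS" ]]; then PREVIOUS2=""; fi
# If today's directory already exists, link against the backup before it.
if [[ $CURRENT != "$PREVIOUS" ]]; then LINKDIR=$PREVIOUS; else LINKDIR=$PREVIOUS2; fi
if [[ $LINKDIR != "" ]]; then
    echo "syncing to $CURRENT with links to $LINKDIR"
    OPTS="--link-dest=../$LINKDIR"   # assumption: a relative --link-dest is resolved against $CURRENT
else
    echo "syncing to $CURRENT"
    OPTS=""
fi
rsync --archive --hard-links --numeric-ids --relative --rsync-path="cd /path-to-remote-data-dir && rsync" --progress $OPTS "$@" remote-username@remote-hostname:. $CURRENT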
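The $PREVIOUS and $PREVIOUS2 handling is easier to follow in isolation. Below is a sketch of that selection pulled into a hypothetical function, `pick_linkdir`, which takes today's directory name plus the newest and second-newest existing names. It reflects my reading of the logic and is not part of the script itself.

```shell
#!/bin/bash
# Hypothetical helper: print the directory to hard-link against, or an empty line if none.
# $1 = today's directory, $2 = newest existing backup, $3 = second-newest existing backup.
pick_linkdir() {
    local current=$1 previous=$2 previous2=$3
    # With a single existing backup, "tail -n2 | head -n1" yields the same name again.
    if [[ $previous2 == "$previous" ]]; then
        previous2=""
    fi
    if [[ $current != "$previous" ]]; then
        # First run today: link against the newest existing backup.
        echo "$previous"
    else
        # Re-run today: the newest backup is today's own directory, so use the one before it.
        echo "$previous2"
    fi
}
```

For example, `pick_linkdir backups-20240103 backups-20240103 backups-20240102` prints `backups-20240102`, so a second run on the same day never links a snapshot against itself.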