Sanity check! Rsync for dummies

Posted on Tue Jul 30, 2013 5:30 pm

Hi,

I'm a little rusty on this and I'm looking for a second opinion on the quality of my syntax.
Essentially, what I'm trying to accomplish is to have rsync copy files that end in .vma.gz and have changed within the last 24 hours from a remote server to a local server.

I think I'm in complete error with the way 'find' works as I need it to actually find the files on the remote server and not the local server.

How does this look:
Code: Select all
find /var/lib/vz/dump/ -mtime -1 -name \*.vma.gz -print0 | rsync -aviPhz --compress-level=9 --from0 --files-from=- / xxx.xxx.xxx.xxx:/var/lib/vz/dump/   


How can I fix this?
-Playing shooters on a console is like doing brain surgery with an ice-cream scoop-
Jon
Gerbil Elite
 
Posts: 964
Joined: Sat Feb 14, 2004 7:44 pm
Location: -Canada-

Re: Sanity check! Rsync for dummies

Posted on Tue Jul 30, 2013 6:41 pm

Why use find at all? What is the requirement for just the last 24 hours? rsync will sync any files and you can specify in the rsync syntax to overwrite existing files or not.

The way that looks right now you are only finding files on your local server.

Just one little (big) note about rsync... it gets very particular at times so beware of trailing slashes... that can make all the difference between a directory and a file. And it can copy over a directory as a file... not that I'd know or anything...... :o
Elsoze
Gerbil
 
Posts: 29
Joined: Tue Jun 29, 2010 6:57 am

Re: Sanity check! Rsync for dummies

Posted on Tue Jul 30, 2013 6:46 pm

Yeah, the find is going to run locally; that's not what you want. Have you tried playing around with rsync's filtering rules?

Also, the --dry-run and --progress options are your friends. Good way to see what rsync thinks you want it to do without actually transferring anything.
(this space intentionally left blank)
just brew it!
Administrator
Gold subscriber
 
 
Posts: 37705
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: Sanity check! Rsync for dummies

Posted on Tue Jul 30, 2013 7:05 pm

Jon wrote:Hi,

I'm a little rusty on this and I'm looking for a second opinion on the quality of my syntax.
Essentially, what I'm trying to accomplish is to have rsync copy files that end in .vma.gz and have changed within the last 24 hours from a remote server to a local server.

I think I'm in complete error with the way 'find' works as I need it to actually find the files on the remote server and not the local server.

How does this look:
Code: Select all
find /var/lib/vz/dump/ -mtime -1 -name \*.vma.gz -print0 | rsync -aviPhz --compress-level=9 --from0 --files-from=- / xxx.xxx.xxx.xxx:/var/lib/vz/dump/   


How can I fix this?



That ain't gonna work if you are trying to "find" files on the remote side. Try logging in to the remote host and then run the 'find' command on the remote host. One option: Use ssh in command mode with find. For example:
Code: Select all
ssh xxx.xxx.xxx.xxx find /var/lib/vz/dump/ -mtime -1 -name \*.vma.gz -print0 | rsync -aviPhz --compress-level=9 --from0 --files-from=- / xxx.xxx.xxx.xxx:/var/lib/vz/dump/ 

This assumes "xxx.xxx.xxx.xxx" is the same remote host both times: once for running find, and again as the target of the rsync. Note that you'll want key-based authentication set up if you don't want to type a password to log in to the remote host.

Another point: As the other posters have mentioned, rsync's built-in file filtering may be enough for your purposes. I'd only recommend the remote-find command if rsync can't pull off the job since using remote-find requires you to make more assumptions about the installed software on the remote system (find is pretty generic though, but sometimes the remote box isn't a fully-featured PC).
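
One thing to watch when running find through ssh: ssh joins its arguments into a single string and hands it to a shell on the remote host, so a pattern that is only backslash-escaped locally can still be glob-expanded there. The sketch below imitates that locally. fake_ssh is a hypothetical stand-in for ssh (not a real command), and the file names are made up for the demonstration.

```shell
#!/bin/sh
# fake_ssh is a hypothetical local stand-in for "ssh host ...": like ssh, it
# joins its arguments with spaces and hands the result to another shell.
fake_ssh() { sh -c "$*"; }

demo=$(mktemp -d)
mkdir "$demo/sub"
touch "$demo/a.vma.gz" "$demo/sub/b.vma.gz"
cd "$demo" || exit 1

# Escaped only for the local shell: the second shell expands *.vma.gz itself,
# so find ends up searching for the literal name a.vma.gz.
unquoted=$(fake_ssh find . -name \*.vma.gz | sort)

# Quoted for the second shell as well: find receives the pattern intact.
quoted=$(fake_ssh "find . -name '*.vma.gz'" | sort)

echo "unquoted: $unquoted"
echo "quoted:   $quoted"

cd / && rm -rf "$demo"
```

With a real host the equivalent is to wrap the whole remote command in double quotes and the pattern in single quotes. Whether the unquoted form actually misbehaves depends on whether anything in the remote login directory happens to match the pattern.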
4770K @ 4.7 GHz; 32GB DDR3-2133; GTX-770; 512GB 840 Pro (2x); Fractal Define XL-R2; NZXT Kraken-X60
--Many thanks to the TR Forum for advice in getting it built.
chuckula
Gerbil Elite
Gold subscriber
 
 
Posts: 562
Joined: Wed Jan 23, 2008 9:18 pm
Location: Probably where I don't belong.

Re: Sanity check! Rsync for dummies

Posted on Wed Jul 31, 2013 10:02 am

just brew it! wrote:Yeah, the find is going to run locally; that's not what you want. Have you tried playing around with rsync's filtering rules?

Also, the --dry-run and --progress options are your friends. Good way to see what rsync thinks you want it to do without actually transferring anything.


As far as I can tell, rsync's filtering rules don't have a way of filtering files based on their timestamp, so that rules that out. I have about 500GB of files in a remote directory, and of that I need to keep only about 150GB offsite. 30GB of new files are created every day and kept for 14 days, after which they are deleted by the system automatically. So once a week I need to grab the files that were created within the past 24 hours and rsync them to this offsite location.

chuckula wrote:That ain't gonna work if you are trying to "find" files on the remote side. Try logging in to the remote host and then run the 'find' command on the remote host. One option: Use ssh in command mode with find. For example:

Code: Select all
ssh xxx.xxx.xxx.xxx find /var/lib/vz/dump/ -mtime -1 -name \*.vma.gz -print0 | rsync -aviPhz --compress-level=9 --from0 --files-from=- / xxx.xxx.xxx.xxx:/var/lib/vz/dump/


I have key-based authentication set up, but it keeps prompting me to enter the passphrase, which I am 100% sure I'm entering correctly. Would this even work, ssh -> find -> rsync?
Jon

Re: Sanity check! Rsync for dummies

Posted on Wed Jul 31, 2013 10:33 am

OK, here's how I do stuff like this. The attached script will sync a remote directory tree to the local host, placing it in a subdirectory whose name is based on the current date. Any files which have not changed since the last time the backup was run are simply hard-linked to the old copies of the files with the same names, so only new/modified files are actually transferred (and occupy additional disk space) on the local host.

The beauty of this approach is that it has the bandwidth and storage efficiency of an incremental backup, while also allowing you to view complete snapshots of the entire state of the backed up filesystem by date.

This script does not take care of nuking the old backups, but that's easy enough to do manually since they're stored in separate directory trees by date. Because of the way hard links work, the actual files get deleted when the last directory tree that references them is removed. Or you can modify this script, or create a cron job on the local host to do the cleanup of old backups automatically.
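
To make the hard-link point concrete, here is a minimal local sketch (no rsync involved). The directory and file names are invented for the illustration, and the ln call stands in for what --link-dest does for an unchanged file.

```shell
#!/bin/sh
demo=$(mktemp -d)
mkdir "$demo/backups-20130730" "$demo/backups-20130731"
echo "dump data" > "$demo/backups-20130730/file.vma.gz"

# Unchanged file: the newer snapshot gets a second name for the same data,
# not a second copy.
ln "$demo/backups-20130730/file.vma.gz" "$demo/backups-20130731/file.vma.gz"

# Removing the older snapshot drops one name; the data survives via the other.
rm -rf "$demo/backups-20130730"
survivor=$(cat "$demo/backups-20130731/file.vma.gz")
echo "$survivor"

rm -rf "$demo"
```

The data only goes away once the last snapshot that still links to it is removed, which is why old snapshot trees can be deleted in any order.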

Replace path-to-remote-data-dir, remote-username, and remote-hostname in the script with the appropriate values. Any command line arguments specified when the script is launched are passed directly to rsync.

In case you're wondering, the nonsense with $PREVIOUS and $PREVIOUS2 is to ensure sane behavior if the script gets run twice on the same day.

Code: Select all
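#!/bin/bash
# Today's snapshot directory, named after the current date
CURRENT=backups-$(date +%Y%m%d)
# Newest and second-newest existing snapshots (empty if there are none)
PREVIOUS=$(ls -d backups-???????? 2>/dev/null | tail -n1)
PREVIOUS2=$(ls -d backups-???????? 2>/dev/null | tail -n2 | head -n1)
if [[ $PREVIOUS2 == "$PREVIOUS" ]]; then
    PREVIOUS2=
fi
# If today's snapshot already exists, link against the one before it
if [[ $CURRENT != "$PREVIOUS" ]]; then
    LINKDIR=$PREVIOUS
else
    LINKDIR=$PREVIOUS2
fi
OPTS=""
if [[ -n $LINKDIR ]]; then
    OPTS=--link-dest=../$LINKDIR
    echo "syncing to $CURRENT with links to $LINKDIR"
else
    echo "syncing to $CURRENT"
fi
# Extra command line arguments are passed straight through to rsync
rsync --archive --hard-links --numeric-ids --relative --rsync-path="cd /path-to-remote-data-dir && rsync" --progress $OPTS "$@" remote-username@remote-hostname:. "$CURRENT"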
just brew it!

Re: Sanity check! Rsync for dummies

Posted on Wed Jul 31, 2013 10:46 am

Looks like it's worth a try. Before I do, how does this look:

Code: Select all
#!/bin/sh
find /var/lib/vz/dump/ -daystart -mtime +1 -name \*.vma.gz >EXCL_list
rsync -avPhz -e ssh --compress-level=9 --exclude-from=EXCL_list --delete / xxx.xxx.xxx.xxx:/var/lib/vz/dump/


I *think* this will achieve the same thing... about to try it now. Ugh, it doesn't work. The file created by find is on the remote server, but rsync runs locally and looks for that file locally.
Jon

Re: Sanity check! Rsync for dummies

Posted on Wed Jul 31, 2013 11:02 am

Jon wrote:As far as I can tell, rsync's filtering rules don't have a way of filtering files based on their timestamp, so that rules that out. I have about 500GB of files in a remote directory, and of that I need to keep only about 150GB offsite. 30GB of new files are created every day and kept for 14 days, after which they are deleted by the system automatically. So once a week I need to grab the files that were created within the past 24 hours and rsync them to this offsite location.


Is rsync's delta capability going to work in this case? The delta capability only works if the files exist in both the source and destination, plus one of the two has to be a remote system. The remote system part is covered, but unless the files are on both systems with only changes being transferred, rsync is nothing more than a cp clone.

I have key-based authentication set up, but it keeps prompting me to enter the passphrase, which I am 100% sure I'm entering correctly. Would this even work, ssh -> find -> rsync?


Either there is a password on the key, or key authentication isn't working.

If there is a password on the key file, do you have ssh-agent running, and do you use ssh-add to add the keys to the ssh-agent cache? If ssh-agent doesn't have the password in its cache, ssh is going to ask you to unlock the key every time it's used.

Code: Select all
--exclude-from=EXCL_list



"--include-from=file --exclude=*" is what you want. Exclude tells rsync to ignore those files.
Flatland_Spider
Gerbil Elite
 
Posts: 852
Joined: Mon Sep 13, 2004 8:33 pm
Location: The 918/539

Re: Sanity check! Rsync for dummies

Posted on Wed Jul 31, 2013 11:05 am

A few additional comments on my remote incremental backup script...

The du command understands hard links. So if you do a du * in the directory where you're keeping the backups, the output will show you the disk space delta (amount of new data) for each day.

Another nifty strategy (probably not applicable to your specific use case, but good to know in the general case) is that you can prune old backups rather than just blowing all of them away past some date cutoff. So you can (say) keep daily snapshots going back x days, weekly ones going back y months, and so on. The hard links optimally manage the disk space for you, keeping the contents of only those files which are still referenced by at least one snapshot.

I've also thought about adding file-level de-duplication capabilities (i.e. files with different names/locations but identical contents get cross-linked instead of stored multiple times) as a post-processing step, but that's still pretty far down on my list of "things I ought to look into sometime". It should be pretty easy to do based on MD5 hashes, but will be rather I/O intensive for large backups since all files in the entire directory tree will need to be scanned to generate the hashes...
just brew it!

Re: Sanity check! Rsync for dummies

Posted on Wed Jul 31, 2013 12:24 pm

Flatland_Spider wrote:
Is rsync's delta capability going to work in this case? The delta capability only works if the files exist in both the source and destination, plus one of the two has to be a remote system. The remote system part is covered, but unless the files are on both systems with only changes being transferred, rsync is nothing more than a cp clone.


That's right, in this case rsync is acting like a glorified cp. It seems my choice of commands needs to be re-evaluated.

"--include-from=file --exclude=*" is what you want. Exclude tells rsync to ignore those files.


The files generated from that find command are actually the ones I want to exclude. This is what -mtime +1 means. Anything older than 24 hours gets added to the list, so I'm only copying files that are not on that list, which are files that have been created within the last 24 hours.
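
A quick way to check what -mtime -1 and -mtime +1 actually select. As far as I know, find counts age in whole 24-hour periods and discards the fraction, so +1 means "at least two full days old" rather than "older than 24 hours", and a file between 24 and 48 hours old is matched by neither test. The sketch below assumes GNU touch (for the relative -d dates) and does not use -daystart, which shifts the boundaries. The file names are invented.

```shell
#!/bin/sh
# Assumes GNU touch for the "-d '... ago'" syntax.
demo=$(mktemp -d)
touch -d '1 hour ago'   "$demo/new.vma.gz"
touch -d '36 hours ago' "$demo/mid.vma.gz"
touch -d '72 hours ago' "$demo/old.vma.gz"

# Age is measured in whole days: 1h -> 0, 36h -> 1, 72h -> 3.
newer=$(find "$demo" -name '*.vma.gz' -mtime -1 -exec basename {} \;)
older=$(find "$demo" -name '*.vma.gz' -mtime +1 -exec basename {} \;)

echo "-mtime -1: $newer"
echo "-mtime +1: $older"

rm -rf "$demo"
```

On a GNU system this should list new.vma.gz for -mtime -1 and old.vma.gz for -mtime +1, with mid.vma.gz in neither list. For an exclude list that complements -mtime -1, a negated test such as "! -mtime -1" may be closer to the intent than -mtime +1.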
Jon

Re: Sanity check! Rsync for dummies

Posted on Wed Jul 31, 2013 1:27 pm

Jon wrote:That's right, in this case rsync is acting like a glorified cp. It seems my choice of commands needs to be re-evaluated.


Since you're not using the delta capability, try something like the command below. I haven't tested it, so it may blow up. You may need to use the -T option for tar.

Code: Select all
Launched on Localhost:
ssh user@host.domain.tld 'tar -zcf - `find /path/to/dir -daystart -mtime -1 -type f -name "*.vma.gz"`' | tar -xzvf - -C /destination/path


Note: ` are not single quotes; they are backticks. It's on the same key as the tilde (~).
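
The -T option mentioned above lets tar read the list of files from stdin instead of the command line, which avoids argument-length limits and, combined with --null and find's -print0, copes with spaces in file names. A local sketch, assuming GNU tar and find. The two temporary directories stand in for the remote and local sides, and in real use the first half of the pipeline would run inside the ssh command.

```shell
#!/bin/sh
src=$(mktemp -d)   # stand-in for the remote dump directory
dst=$(mktemp -d)   # stand-in for the local destination
echo "new"  > "$src/new.vma.gz"
echo "skip" > "$src/notes.txt"

# -T - reads the member list from stdin; --null pairs with find's -print0.
(cd "$src" && find . -type f -name '*.vma.gz' -mtime -1 -print0 |
    tar --null -T - -czf -) | tar -xzf - -C "$dst"

copied=$(ls "$dst")
echo "$copied"

rm -rf "$src" "$dst"
```

Only the file matching the pattern should arrive in the destination directory.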

The files generated from that find command are actually the ones I want to exclude. This is what -mtime +1 means. Anything older than 24 hours gets added to the list, so I'm only copying files that are not on that list, which are files that have been created within the last 24 hours.


I didn't realize you were building a list of files you didn't want. Different thought processes. I usually do the opposite.
Flatland_Spider

Re: Sanity check! Rsync for dummies

Posted on Wed Jul 31, 2013 1:43 pm

Flatland_Spider wrote:
Code: Select all
Launched on Localhost:
ssh user@host.domain.tld 'tar -zcf - `find /path/to/dir -daystart -mtime -1 -type f -name "*.vma.gz"`' | tar -xzvf - -C /destination/path


Interesting, I'll give it a try but I think one of the big benefits of using rsync is also compression using --compress

I didn't realize you were building a list of files you didn't want. Different thought processes. I usually do the opposite.


I tried the way you mentioned too, creating a list of files to include using "-mtime -1" to indicate include files newer than 24 hours and then using --include-from=/tmp/INCL_list --exclude=* but that doesn't seem to work. No files copy.

I've tried this but nothing transfers at all as indicated above:

Code: Select all
#!/bin/bash --
ssh xxx.xxx.xxx.xxx "find /var/lib/vz/dump/ -daystart -mtime -1 -name \*.vma.gz" > /tmp/INCL_list
rsync -avPhz -e ssh --compress-level=9 --include-from=/tmp/INCL_list --exclude=* --delete xxx.xxx.xxx.xxx:/var/lib/vz/dump/ /var/lib/vz/dump/


I've also tried this but it copies everything and ignores the "--exclude-from=" parameter:

Code: Select all
#!/bin/bash --
ssh xxx.xxx.xxx.xxx "find /var/lib/vz/dump/ -daystart -mtime +1 -name \*.vma.gz" > /tmp/EXCL_list
rsync -avPhz -e ssh --compress-level=9 --exclude-from=/tmp/EXCL_list --include=* --delete xxx.xxx.xxx.xxx:/var/lib/vz/dump/ /var/lib/vz/dump/
Jon

FIXED! Sanity check! Rsync for dummies

Posted on Wed Jul 31, 2013 8:55 pm

I got it working yay. This is what I did:

Code: Select all
#!/bin/bash --
ssh xxx.xxx.xxx.xxx "find /var/lib/vz/dump/ -daystart -mtime -1 -name \*.vma.gz" > /tmp/INCL_list_paths
sed 's/\/var\/lib\/vz\/dump\///' /tmp/INCL_list_paths >/tmp/INCL_list
rsync -avPhz -e ssh --compress-level=9 --include-from=/tmp/INCL_list --exclude='*' --delete xxx.xxx.xxx.xxx:/var/lib/vz/dump/ /var/lib/vz/dump/


Works perfectly now.
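
The sed step matters because, as I understand rsync's filter rules, an include pattern that starts with / is anchored at the root of the transfer (here /var/lib/vz/dump/), not at the root of the filesystem. The absolute paths printed by find therefore never match, and --exclude='*' then drops everything. Below is a small sketch of the same conversion using a different sed delimiter, so the slashes need no escaping. The two file names are hypothetical stand-ins for the output of the remote find.

```shell
#!/bin/sh
SRC_DIR=/var/lib/vz/dump/

# Hypothetical stand-in for the list returned by the remote find.
abs_list=$(printf '%s\n' \
    "${SRC_DIR}vzdump-qemu-100.vma.gz" \
    "${SRC_DIR}vzdump-qemu-101.vma.gz")

# Strip the transfer root so each entry is relative to it.
rel_list=$(printf '%s\n' "$abs_list" | sed "s|^${SRC_DIR}||")
printf '%s\n' "$rel_list"
```

Each resulting line is a bare file name relative to the source directory, which is the form --include-from can match.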
Jon

Re: Sanity check! Rsync for dummies

Posted on Wed Jul 31, 2013 10:41 pm

Jon wrote:
Flatland_Spider wrote:
Code: Select all
Launched on Localhost:
ssh user@host.domain.tld 'tar -zcf - `find /path/to/dir -daystart -mtime -1 -type f -name "*.vma.gz"`' | tar -xzvf - -C /destination/path


Interesting, I'll give it a try but I think one of the big benefits of using rsync is also compression using --compress


The -z tar option turns on gzip compression, and setting the GZIP environment variable to -9 will enable maximum compression. This is equivalent to using the -z option and setting the compression level with rsync.
Example:
Code: Select all
GZIP=-9 tar -zcf tarball.tar.gz /folder


Then there is xz, which can be invoked with the -J option; setting XZ_OPT to -9 has the same effect as with GZIP.
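
An alternative that avoids the environment variable: have tar write an uncompressed stream and pipe it through gzip -9 yourself. I believe newer gzip releases warn that the GZIP variable is obsolescent, so this form may age better. A local sketch with a throwaway directory; the file and directory names are placeholders.

```shell
#!/bin/sh
demo=$(mktemp -d)
mkdir "$demo/folder"
echo "example" > "$demo/folder/file.txt"

# The compression level goes to gzip directly rather than via the environment.
tar -cf - -C "$demo" folder | gzip -9 > "$demo/tarball.tar.gz"

# List the archive contents to confirm it round-trips.
listing=$(gzip -dc "$demo/tarball.tar.gz" | tar -tf -)
echo "$listing"

rm -rf "$demo"
```

The same pattern should work with xz -9 in place of gzip -9 where xz is installed.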

References:
How to specify level of compression when using tar -zcvf?
http://superuser.com/questions/305128/h ... g-tar-zcvf

How to XZ a directory with TAR using maximum compression?
http://unix.stackexchange.com/questions ... ompression
Flatland_Spider

Re: FIXED! Sanity check! Rsync for dummies

Posted on Thu Aug 01, 2013 10:32 am

Jon wrote:I got it working yay. This is what I did:

Code: Select all
#!/bin/bash --
ssh xxx.xxx.xxx.xxx "find /var/lib/vz/dump/ -daystart -mtime -1 -name \*.vma.gz" > /tmp/INCL_list_paths
sed 's/\/var\/lib\/vz\/dump\///' /tmp/INCL_list_paths >/tmp/INCL_list
rsync -avPhz -e ssh --compress-level=9 --include-from=/tmp/INCL_list --exclude='*' --delete xxx.xxx.xxx.xxx:/var/lib/vz/dump/ /var/lib/vz/dump/


Works perfectly now.


Glad to see you were able to get it working! Sorry for not being able to help with the intermediate sed part though.
chuckula

