Personal computing discussed


 
Jon
Gerbil Elite
Topic Author
Posts: 980
Joined: Sat Feb 14, 2004 7:44 pm
Location: -Alberta-

Sanity check! Rsync for dummies

Tue Jul 30, 2013 5:30 pm

Hi,

I'm a little rusty on this and I'm looking for a second opinion on the quality of my syntax.
Essentially, what I'm trying to accomplish is to have rsync copy files that end in .vma.gz and that have changed within the last 24 hours from a remote server to a local server.

I think I've misunderstood how 'find' works here, since I need it to find the files on the remote server, not the local server.

How does this look:
find /var/lib/vz/dump/ -mtime -1 -name \*.vma.gz -print0 | rsync -aviPhz --compress-level=9 --from0 --files-from=- / xxx.xxx.xxx.xxx:/var/lib/vz/dump/   


How can I fix this?
-Playing shooters on a console is like doing brain surgery with an ice-cream scoop-
 
Elsoze
Gerbil
Posts: 34
Joined: Tue Jun 29, 2010 6:57 am

Re: Sanity check! Rsync for dummies

Tue Jul 30, 2013 6:41 pm

Why use find at all? What's the requirement for just the last 24 hours? rsync will sync any files, and you can specify in the rsync syntax whether or not to overwrite existing files.

The way that looks right now, you are only finding files on your local server.

Just one little (big) note about rsync... it gets very particular at times so beware of trailing slashes... that can make all the difference between a directory and a file. And it can copy over a directory as a file... not that I'd know or anything...... :o
 
just brew it!
Administrator
Posts: 54500
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: Sanity check! Rsync for dummies

Tue Jul 30, 2013 6:46 pm

Yeah, the find is going to run locally; that's not what you want. Have you tried playing around with rsync's filtering rules?

Also, the --dry-run and --progress options are your friends. Good way to see what rsync thinks you want it to do without actually transferring anything.
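For instance, a minimal local demonstration of --dry-run (paths here are made up):

```shell
# Preview a transfer without copying anything: -n is --dry-run,
# -i itemizes what *would* change. Throwaway local paths for the demo.
rm -rf /tmp/drydemo
mkdir -p /tmp/drydemo/src /tmp/drydemo/dst
touch /tmp/drydemo/src/backup.vma.gz

rsync -avin /tmp/drydemo/src/ /tmp/drydemo/dst/
# dst is still empty afterwards; drop the -n to do the real copy
```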
Nostalgia isn't what it used to be.
 
chuckula
Minister of Gerbil Affairs
Posts: 2109
Joined: Wed Jan 23, 2008 9:18 pm
Location: Probably where I don't belong.

Re: Sanity check! Rsync for dummies

Tue Jul 30, 2013 7:05 pm

Jon wrote:
Hi,

I'm a little rusty on this and I'm looking for a second opinion on the quality of my syntax.
Essentially, what I'm trying to accomplish is to have rsync copy files that end in .vma.gz and that have changed within the last 24 hours from a remote server to a local server.

I think I've misunderstood how 'find' works here, since I need it to find the files on the remote server, not the local server.

How does this look:
find /var/lib/vz/dump/ -mtime -1 -name \*.vma.gz -print0 | rsync -aviPhz --compress-level=9 --from0 --files-from=- / xxx.xxx.xxx.xxx:/var/lib/vz/dump/   


How can I fix this?



That ain't gonna work if you are trying to "find" files on the remote side. Try logging in to the remote host and then run the 'find' command on the remote host. One option: Use ssh in command mode with find. For example:
ssh xxx.xxx.xxx.xxx find /var/lib/vz/dump/ -mtime -1 -name \*.vma.gz -print0 | rsync -aviPhz --compress-level=9 --from0 --files-from=- / xxx.xxx.xxx.xxx:/var/lib/vz/dump/  

This assumes "xxx.xxx.xxx.xxx" is the same remote host both times: once for running find, and a second time as the target of the rsync. Note that you'll want key-based SSH authentication set up if typing in a password isn't how you want to log in to the remote host.

Another point: As the other posters have mentioned, rsync's built-in file filtering may be enough for your purposes. I'd only recommend the remote-find approach if rsync can't pull off the job, since remote find requires you to make more assumptions about the software installed on the remote system (find is pretty generic, but sometimes the remote box isn't a fully-featured PC).
4770K @ 4.7 GHz; 32GB DDR3-2133; Officially RX-560... that's right AMD you shills!; 512GB 840 Pro (2x); Fractal Define XL-R2; NZXT Kraken-X60
--Many thanks to the TR Forum for advice in getting it built.
 
Jon
Gerbil Elite
Topic Author
Posts: 980
Joined: Sat Feb 14, 2004 7:44 pm
Location: -Alberta-

Re: Sanity check! Rsync for dummies

Wed Jul 31, 2013 10:02 am

just brew it! wrote:
Yeah, the find is going to run locally; that's not what you want. Have you tried playing around with rsync's filtering rules?

Also, the --dry-run and --progress options are your friends. Good way to see what rsync thinks you want it to do without actually transferring anything.


As far as I can tell, rsync's filtering rules don't have a way of filtering files based on their timestamp, so that rules that out. I have about 500GB of files in a remote directory, and of that I need to keep only about 150GB offsite. 30GB of new files are created every day and kept for 14 days, after which they are deleted by the system automatically. So once a week I need to grab the files that were created within the past 24 hours and rsync them to this offsite location.

chuckula wrote:
That ain't gonna work if you are trying to "find" files on the remote side. Try logging in to the remote host and then run the 'find' command on the remote host. One option: Use ssh in command mode with find. For example:

    ssh xxx.xxx.xxx.xxx find /var/lib/vz/dump/ -mtime -1 -name \*.vma.gz -print0 | rsync -aviPhz --compress-level=9 --from0 --files-from=- / xxx.xxx.xxx.xxx:/var/lib/vz/dump/


I have key-based authentication set up, but it keeps prompting me to enter the passphrase, which I am 100% sure I'm entering correctly. Would this even work: ssh -> find -> rsync?
-Playing shooters on a console is like doing brain surgery with an ice-cream scoop-
 
just brew it!
Administrator
Posts: 54500
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: Sanity check! Rsync for dummies

Wed Jul 31, 2013 10:33 am

OK, here's how I do stuff like this. The attached script will sync a remote directory tree to the local host, placing it in a subdirectory whose name is based on the current date. Any files which have not changed since the last time the backup was run are simply hard-linked to the old copies of the files with the same names, so only new/modified files are actually transferred (and occupy additional disk space) on the local host.

The beauty of this approach is that it has the bandwidth and storage efficiency of an incremental backup, while also allowing you to view complete snapshots of the entire state of the backed up filesystem by date.

This script does not take care of nuking the old backups, but that's easy enough to do manually since they're stored in separate directory trees by date. Because of the way hard links work, the actual files get deleted when the last directory tree that references them is removed. Or you can modify this script, or create a cron job on the local host to do the cleanup of old backups automatically.

Replace path-to-remote-data-dir, remote-username, and remote-hostname in the script with the appropriate values. Any command line arguments specified when the script is launched are passed directly to rsync.

In case you're wondering, the nonsense with $PREVIOUS and $PREVIOUS2 is to ensure sane behavior if the script gets run twice on the same day.

#!/bin/bash
# Sync the remote tree into a dated local subdirectory, hard-linking
# unchanged files against the most recent previous backup.
CURRENT=backups-$(date +%Y%m%d)
PREVIOUS=$(ls -d backups-???????? 2>/dev/null | tail -n1)
PREVIOUS2=$(ls -d backups-???????? 2>/dev/null | tail -n2 | head -n1)
# With only one prior backup, PREVIOUS2 and PREVIOUS are the same entry.
if [[ $PREVIOUS2 == $PREVIOUS ]]; then
    PREVIOUS2=
fi
# If the script already ran today, link against the backup before today's.
if [[ $CURRENT != $PREVIOUS ]]; then
    LINKDIR=$PREVIOUS
else
    LINKDIR=$PREVIOUS2
fi
OPTS=""
if [[ $LINKDIR != "" ]]; then
    OPTS=--link-dest=../$LINKDIR
    echo "syncing to $CURRENT with links to $LINKDIR"
else
    echo "syncing to $CURRENT"
fi
rsync --archive --hard-links --numeric-ids --relative \
    --rsync-path="cd /path-to-remote-data-dir && rsync" \
    --progress $OPTS "$@" remote-username@remote-hostname:. $CURRENT
Nostalgia isn't what it used to be.
 
Jon
Gerbil Elite
Topic Author
Posts: 980
Joined: Sat Feb 14, 2004 7:44 pm
Location: -Alberta-

Re: Sanity check! Rsync for dummies

Wed Jul 31, 2013 10:46 am

Looks like it's worth a try. Before I do, how does this look:

#!/bin/sh
find /var/lib/vz/dump/ -daystart -mtime +1 -name \*.vma.gz >EXCL_list
rsync -avPhz -e ssh --compress-level=9 --exclude-from=EXCL_list --delete / xxx.xxx.xxx.xxx:/var/lib/vz/dump/


I *think* this will achieve the same thing... about to try it now. Ugh, it doesn't work. The file that gets created by find is on the remote server, and since rsync executes locally it looks for that file locally.
-Playing shooters on a console is like doing brain surgery with an ice-cream scoop-
 
Flatland_Spider
Graphmaster Gerbil
Posts: 1324
Joined: Mon Sep 13, 2004 8:33 pm

Re: Sanity check! Rsync for dummies

Wed Jul 31, 2013 11:02 am

Jon wrote:
As far as I can tell, rsync's filtering rules don't have a way of filtering files based on their timestamp, so that rules that out. I have about 500GB of files in a remote directory, and of that I need to keep only about 150GB offsite. 30GB of new files are created every day and kept for 14 days, after which they are deleted by the system automatically. So once a week I need to grab the files that were created within the past 24 hours and rsync them to this offsite location.


Is rsync's delta capability going to work in this case? The delta capability only works if the files exist in both the source and destination, plus one of the two has to be a remote system. The remote system part is covered, but unless the files are on both systems with only changes being transferred, rsync is nothing more than a cp clone.

I have key-based authentication set up, but it keeps prompting me to enter the passphrase, which I am 100% sure I'm entering correctly. Would this even work: ssh -> find -> rsync?


Either there is a password on the key, or key authentication isn't working.

If there is a password on the key file, do you have ssh-agent running, and do you use ssh-add to add the keys to the ssh-agent cache? If ssh-agent doesn't have the password in its cache, ssh is going to ask you to unlock the key every time it's used.
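A sketch of that workflow; a throwaway key stands in for a real passphrase-protected one (with a real key, ssh-add prompts for the passphrase exactly once per session):

```shell
# Start an agent for this shell, cache a key in it, then verify.
rm -f /tmp/agent_demo_key /tmp/agent_demo_key.pub
eval "$(ssh-agent -s)" > /dev/null
ssh-keygen -q -t ed25519 -N '' -f /tmp/agent_demo_key
ssh-add /tmp/agent_demo_key 2> /dev/null   # real key: passphrase prompt here
ssh-add -l                                 # list the cached keys
ssh-agent -k > /dev/null                   # clean up the demo agent
```

Once the key is cached, ssh and rsync invocations in that session stop prompting.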

--exclude-from=EXCL_list



"--include-from=file --exclude=*" is what you want. Exclude tells rsync to ignore those files.
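The pattern looks like this locally (untested sketch, throwaway paths); note the include list must come before the catch-all exclude, since rsync uses the first matching rule:

```shell
# Copy only the files named in a list; exclude everything else.
rm -rf /tmp/incdemo
mkdir -p /tmp/incdemo/src /tmp/incdemo/dst
touch /tmp/incdemo/src/keep.vma.gz /tmp/incdemo/src/skip.log
printf 'keep.vma.gz\n' > /tmp/incdemo/list

rsync -av --include-from=/tmp/incdemo/list --exclude='*' \
    /tmp/incdemo/src/ /tmp/incdemo/dst/
# dst ends up with keep.vma.gz only; skip.log is excluded
```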
 
just brew it!
Administrator
Posts: 54500
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: Sanity check! Rsync for dummies

Wed Jul 31, 2013 11:05 am

A few additional comments on my remote incremental backup script...

The du command understands hard links. So if you do a du * in the directory where you're keeping the backups, the output will show you the disk space delta (amount of new data) for each day.
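For example (synthetic local demo):

```shell
# Two "snapshot" dirs sharing one file via a hard link: du counts
# the shared data only once, so day2 shows just the new-data delta.
rm -rf /tmp/dudemo
mkdir -p /tmp/dudemo/day1 /tmp/dudemo/day2
dd if=/dev/zero of=/tmp/dudemo/day1/big bs=1024 count=1024 2> /dev/null
ln /tmp/dudemo/day1/big /tmp/dudemo/day2/big

cd /tmp/dudemo && du -sh day1 day2
# day1 reports ~1M; day2 only a few KB of directory overhead
```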

Another nifty strategy (probably not applicable to your specific use case, but good to know in the general case) is that you can prune old backups rather than just blowing all of them away past some date cutoff. So you can (say) keep daily snapshots going back x days, weekly ones going back y months, and so on. The hard links optimally manage the disk space for you, keeping the contents of only those files which are still referenced by at least one snapshot.

I've also thought about adding file-level de-duplication capabilities (i.e. files with different names/locations but identical contents get cross-linked instead of stored multiple times) as a post-processing step, but that's still pretty far down on my list of "things I ought to look into sometime". It should be pretty easy to do based on MD5 hashes, but will be rather I/O intensive for large backups since all files in the entire directory tree will need to be scanned to generate the hashes...
Nostalgia isn't what it used to be.
 
Jon
Gerbil Elite
Topic Author
Posts: 980
Joined: Sat Feb 14, 2004 7:44 pm
Location: -Alberta-

Re: Sanity check! Rsync for dummies

Wed Jul 31, 2013 12:24 pm

Flatland_Spider wrote:

Is rsync's delta capability going to work in this case? The delta capability only works if the files exist in both the source and destination, plus one of the two has to be a remote system. The remote system part is covered, but unless the files are on both systems with only changes being transferred, rsync is nothing more than a cp clone.


That's right, in this case rsync is acting like a glorified cp. It seems the use of my commands needs to be re-evaluated.

"--include-from=file --exclude=*" is what you want. Exclude tells rsync to ignore those files.


The files generated by that find command are actually the ones I want to exclude. That's what -mtime +1 means: anything older than 24 hours gets added to the list, so I'm only copying files that are not on that list, i.e. files created within the last 24 hours.
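One wrinkle worth knowing: -mtime counts in whole 24-hour units, so a file between 24 and 48 hours old matches neither -mtime -1 nor -mtime +1. A quick local check (GNU touch and find assumed):

```shell
# -mtime -1: modified less than 1 day ago; +1: more than 1 full day ago.
rm -rf /tmp/mtdemo && mkdir -p /tmp/mtdemo
touch /tmp/mtdemo/new.vma.gz                   # modified just now
touch -d '3 days ago' /tmp/mtdemo/old.vma.gz   # backdated

find /tmp/mtdemo -mtime -1 -name '*.vma.gz'    # prints only new.vma.gz
find /tmp/mtdemo -mtime +1 -name '*.vma.gz'    # prints only old.vma.gz
```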
-Playing shooters on a console is like doing brain surgery with an ice-cream scoop-
 
Flatland_Spider
Graphmaster Gerbil
Posts: 1324
Joined: Mon Sep 13, 2004 8:33 pm

Re: Sanity check! Rsync for dummies

Wed Jul 31, 2013 1:27 pm

Jon wrote:
That's right in this case rsync is acting like a glorified cp. The use of my commands needs to be re-evaluated it seems.


Since you're not using the delta capability, try something like the command below. I haven't tested it, so it may blow up. You may need to use the -T option for tar.

Launched on Localhost:
ssh [email protected] 'tar -zcf - `find /path/to/dir -daystart -mtime -1 -type f -name \*.vma.gz`' | tar -xzvf - -C /destination/path


Note: ` are not single quotes; they are backticks. It's on the same key as the tilde (~).

The files generated by that find command are actually the ones I want to exclude. That's what -mtime +1 means: anything older than 24 hours gets added to the list, so I'm only copying files that are not on that list, i.e. files created within the last 24 hours.


I didn't realize you were building a list of files you didn't want. Different thought processes. I usually do the opposite.
 
Jon
Gerbil Elite
Topic Author
Posts: 980
Joined: Sat Feb 14, 2004 7:44 pm
Location: -Alberta-

Re: Sanity check! Rsync for dummies

Wed Jul 31, 2013 1:43 pm

Flatland_Spider wrote:

Launched on Localhost:
ssh [email protected] 'tar -zcf - `find /path/to/dir -daystart -mtime -1 -type f -name \*.vma.gz`' | tar -xzvf - -C /destination/path


Interesting, I'll give it a try, but I think one of the big benefits of using rsync is also compression via --compress.

I didn't realize you were building a list of files you didn't want. Different thought processes. I usually do the opposite.


I tried the way you mentioned too, creating a list of files to include using "-mtime -1" (i.e. files newer than 24 hours) and then using --include-from=/tmp/INCL_list --exclude=*, but that doesn't seem to work. No files copy.

I've tried this but nothing transfers at all as indicated above:

#!/bin/bash --
ssh xxx.xxx.xxx.xxx "find /var/lib/vz/dump/ -daystart -mtime -1 -name \*.vma.gz" > /tmp/INCL_list
rsync -avPhz -e ssh --compress-level=9 --include-from=/tmp/INCL_list --exclude=* --delete xxx.xxx.xxx.xxx:/var/lib/vz/dump/ /var/lib/vz/dump/


I've also tried this but it copies everything and ignores the "--exclude-from=" parameter:

#!/bin/bash --
ssh xxx.xxx.xxx.xxx "find /var/lib/vz/dump/ -daystart -mtime +1 -name \*.vma.gz" > /tmp/EXCL_list
rsync -avPhz -e ssh --compress-level=9 --exclude-from=/tmp/EXCL_list --include=* --delete xxx.xxx.xxx.xxx:/var/lib/vz/dump/ /var/lib/vz/dump/
-Playing shooters on a console is like doing brain surgery with an ice-cream scoop-
 
Jon
Gerbil Elite
Topic Author
Posts: 980
Joined: Sat Feb 14, 2004 7:44 pm
Location: -Alberta-

FIXED! Sanity check! Rsync for dummies

Wed Jul 31, 2013 8:55 pm

I got it working yay. This is what I did:

#!/bin/bash --
ssh xxx.xxx.xxx.xxx "find /var/lib/vz/dump/ -daystart -mtime -1 -name \*.vma.gz" > /tmp/INCL_list_paths
sed 's/\/var\/lib\/vz\/dump\///' /tmp/INCL_list_paths >/tmp/INCL_list
rsync -avPhz -e ssh --compress-level=9 --include-from=/tmp/INCL_list --exclude='*' --delete xxx.xxx.xxx.xxx:/var/lib/vz/dump/ /var/lib/vz/dump/


Works perfectly now.
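Since this needs to run once a week, a hypothetical crontab entry (the script path is invented) might look like:

```shell
# Run the sync script every Sunday at 02:00; path is a placeholder.
0 2 * * 0  /usr/local/bin/sync-dumps.sh >> /var/log/sync-dumps.log 2>&1
```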
-Playing shooters on a console is like doing brain surgery with an ice-cream scoop-
 
Flatland_Spider
Graphmaster Gerbil
Posts: 1324
Joined: Mon Sep 13, 2004 8:33 pm

Re: Sanity check! Rsync for dummies

Wed Jul 31, 2013 10:41 pm

Jon wrote:
Flatland_Spider wrote:

Launched on Localhost:
ssh [email protected] 'tar -zcf - `find /path/to/dir -daystart -mtime -1 -type f -name \*.vma.gz`' | tar -xzvf - -C /destination/path


Interesting, I'll give it a try, but I think one of the big benefits of using rsync is also compression via --compress.


The -z tar option turns on gzip compression, and setting the GZIP environment variable to -9 will enable maximum compression. This is equivalent to using the -z option and setting the compression level with rsync.
Example:
GZIP=-9 tar -zcf tarball.tar.gz /folder


Then there is xz, which can be invoked with the -J option; setting XZ_OPT to -9 has the same effect as with GZIP.
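For instance (local files only; note that newer gzip versions warn that the GZIP variable is deprecated but still honor it):

```shell
# Build the same archive twice: gzip -9 via GZIP, xz -9 via XZ_OPT.
rm -rf /tmp/tardemo /tmp/tardemo.tar.gz /tmp/tardemo.tar.xz
mkdir -p /tmp/tardemo && echo 'sample data' > /tmp/tardemo/file.txt

# gzip at maximum compression via the GZIP environment variable
GZIP=-9 tar -zcf /tmp/tardemo.tar.gz -C /tmp tardemo

# xz equivalent, driven by XZ_OPT
XZ_OPT=-9 tar -Jcf /tmp/tardemo.tar.xz -C /tmp tardemo
```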

References:
How to specify level of compression when using tar -zcvf?
http://superuser.com/questions/305128/h ... g-tar-zcvf

How to XZ a directory with TAR using maximum compression?
http://unix.stackexchange.com/questions ... ompression
 
chuckula
Minister of Gerbil Affairs
Posts: 2109
Joined: Wed Jan 23, 2008 9:18 pm
Location: Probably where I don't belong.

Re: FIXED! Sanity check! Rsync for dummies

Thu Aug 01, 2013 10:32 am

Jon wrote:
I got it working yay. This is what I did:

#!/bin/bash --
ssh xxx.xxx.xxx.xxx "find /var/lib/vz/dump/ -daystart -mtime -1 -name \*.vma.gz" > /tmp/INCL_list_paths
sed 's/\/var\/lib\/vz\/dump\///' /tmp/INCL_list_paths >/tmp/INCL_list
rsync -avPhz -e ssh --compress-level=9 --include-from=/tmp/INCL_list --exclude='*' --delete xxx.xxx.xxx.xxx:/var/lib/vz/dump/ /var/lib/vz/dump/


Works perfectly now.


Glad to see you were able to get it working! Sorry for not being able to help with the intermediate sed part though.
4770K @ 4.7 GHz; 32GB DDR3-2133; Officially RX-560... that's right AMD you shills!; 512GB 840 Pro (2x); Fractal Define XL-R2; NZXT Kraken-X60
--Many thanks to the TR Forum for advice in getting it built.
