Rsync¶
Rsync is the best way to transfer files between hosts or even within the same system. It has features like:
If a connection is broken, it will pick up where it left off.
It will synchronize a directory, only copying over files that are newer. If configured to do so, it can also clean up files on the remote that have been deleted locally
Can compress data over the wire, resulting in faster transfers
Verifies if the file was correctly transferred
More performant, flexible, and configurable than alternatives like scp or cp
Rsync is installed by default on most Mac and Linux machines. It works over SSH so it’s secure, and has more features than scp and even cp. Use it any time you’re transferring between local and remote, and use it any time you’re copying large directories locally.
Typical usage is:
rsync -av --progress from/ user@hostname:/to/
See explainshell
for this command to read more on what those arguments do, but this effectively
recursively copies everything from the from/
directory, preserving group,
owner, permissions, and times, and shows progress while it’s running.
One tricky thing about rsync is that it is sensitive to trailing slashes (/) on the source directory. A trailing slash on the destination, however, has no effect.
To demonstrate, imagine we have a directory, source
, with one file in it:
source/
└── a.txt
Then we run these commands to transfer to the dest
directory:
rsync -r source dest
When we check the contents of dest
, we see that no trailing slash on
source
put the whole directory inside the destination:
dest/
└── source/
└── a.txt
If we instead add a trailing slash in the command:
rsync -r source/ dest
# ^
# |
# Trailing slash
Then the contents of source
get copied inside dest
:
dest/
└── a.txt
Other useful options¶
Sometimes you want to copy the files that symlinks point to, rather than the symlink. In such a case, use the
-L
argument to follow symlinks.If you have large uncompressed files or a slow connection, use
-z
to compress over the wire. Rsync will automatically decompress on the remote. This sends less data over the network at the cost of CPU to compress/decompress. It uses zlib compression, so if you have lots of data in gzip or tar.gz files, using-z
will try to compress already-compressed data, wasting CPU.Use
--partial
(or-P
, which implies both--partial
and--progress
) which will keep partially-transferred files on the remote. This is especially good for large files, so rsync can pick up where it left off on the same file.More recent versions of rsync support the
--info=progress2
argument, which reports the transfer speeds for the entire transfer rather than for each file. This can give you a better idea of the real transfer speed.Use
--exclude
to avoid transferring matching patterns. Useful to avoid copying conda environments, for example.
There are many more options for rsync, so as usual, read the manual (e.g.,
man rsync
)!