Discussion Forums
Thread: Rsync, S3 and Backups
Replies: 12 - Pages: 1 - Last Post: May 25, 2008 4:57 AM by: addady

Posts: 31
Registered: 4/15/08
Rsync, S3 and Backups
Posted: May 15, 2008 5:46 AM PDT
Hi there,
I am doing backups using Rsync to my S3 account. I use s3fs to mount the bucket.
However, when I use S3fox to view my bucket, it shows the bucket as empty with size 0, but I am sure the data is there because I can browse to it from my instance.
Is there any other way apart from mounting the bucket in my instance for me to access this data? Has anybody experienced this before?
Thanks for your help.
AM

Posts: 1,134
Registered: 7/7/07
Re: Rsync, S3 and Backups
Posted: May 15, 2008 9:44 AM PDT
in response to: aceacermd
Yep, I have this same problem with S3fox. It should be fixable.
In s3fs, when you create a file named, say, "bucket/directory/file.txt", it uploads one key to the bucket named "directory/file.txt" and it uploads another key to the bucket named "directory".
The second key representing the directory apparently confuses S3fox. Where it would normally show you a "directory" link representing the keys starting with "directory/" it now has to show a key with that same name.
S3fox could be enhanced to show both the key "directory" and a link for the pseudo-directory named "directory".
Note: In S3fox you can manually add "directory/" to the bucket name in the text field at the top and it will show you the keys you expect. It all works except for showing you the link.
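The key layout described above can be illustrated with a small sketch. It mimics how a delimiter-based bucket listing groups keys into pseudo-directories; `list_with_delimiter` is a hypothetical helper written for this illustration, not part of s3fs, S3fox, or any S3 library.

```python
def list_with_delimiter(keys, prefix="", delimiter="/"):
    """Mimic a delimiter-based listing: return (keys, common_prefixes) under prefix."""
    contents, common_prefixes = [], []
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything below a pseudo-directory is rolled up into one prefix.
            rolled = prefix + rest.split(delimiter, 1)[0] + delimiter
            if rolled not in common_prefixes:
                common_prefixes.append(rolled)
        else:
            contents.append(key)
    return contents, common_prefixes


# Keys as described in the post for a file written to bucket/directory/file.txt
keys = ["directory", "directory/file.txt"]

top_keys, top_prefixes = list_with_delimiter(keys)
inside_keys, _ = list_with_delimiter(keys, prefix="directory/")

print(top_keys)      # the plain key that stands for the directory
print(top_prefixes)  # the pseudo-directory a browser would show as a link
print(inside_keys)   # what you see after adding "directory/" by hand
```

At the top level the listing contains both a key named "directory" and a pseudo-directory "directory/", which is the name clash that appears to confuse S3fox.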

Posts: 1,134
Registered: 7/7/07
Re: Rsync, S3 and Backups
Posted: May 15, 2008 10:36 AM PDT
in response to: aceacermd
Tip: When using rsync to s3fs, consider using the --inplace option.
Without this option, rsync will create a temporary file and then rename it to the real file name after it has copied it. Unfortunately, without key renaming in S3 this means that s3fs uploads temporaryname, then downloads it and uploads realname, doubling or tripling your time and bandwidth charges (depending on caching?).
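As a rough back-of-the-envelope model of the cost described above (an illustration only, assuming a rename is carried out as a download followed by a fresh upload, as this post describes for the s3fs version in question), the traffic per file could be estimated like this. The function name and parameters are hypothetical.

```python
def bytes_moved(file_size, inplace, cached_locally=False):
    """Estimate traffic to and from S3 for one file written by rsync.

    Assumption: the store has no key rename, so renaming the temporary
    file means fetching it again (unless cached) and uploading it anew.
    """
    upload = file_size  # first upload, under the temporary or the real name
    if inplace:
        return upload
    redownload = 0 if cached_locally else file_size
    reupload = file_size
    return upload + redownload + reupload


size_mb = 100
print(bytes_moved(size_mb, inplace=True))                         # written once
print(bytes_moved(size_mb, inplace=False, cached_locally=True))   # doubled
print(bytes_moved(size_mb, inplace=False, cached_locally=False))  # tripled
```

On the rsync side the option is simply added to the command, for example `rsync -a --inplace source/ /mnt/s3/backup/`, where the mount point is a placeholder.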
--
Eric Hammond
http://www.anvilon.com

Posts: 31
Registered: 4/15/08
Re: Rsync, S3 and Backups
Posted: May 15, 2008 12:34 PM PDT
in response to: aceacermd
Thanks for the reply.
Is there any other easy way for me to see my data in S3 apart from S3fox or mounting it in my instances?
Thanks
AM

Posts: 1,134
Registered: 7/7/07
Re: Rsync, S3 and Backups
Posted: May 15, 2008 4:04 PM PDT
in response to: aceacermd
There are a number of command line tools and web sites that can show you the contents of S3 buckets. You might check over on the S3 forum as this question is not specific to EC2.
http://s3forum.notlong.com
The S3 browser in RightScale does a good job of browsing buckets created by s3fs.
http://www.rightscale.com
(Dashboard / Manage / Storage / S3 Browser)

Posts: 31
Registered: 4/15/08
Re: Rsync, S3 and Backups
Posted: May 16, 2008 5:26 AM PDT
in response to: aceacermd
I have tried it with --inplace too, but it doesn't help. Rsync-s3fs-S3 is painfully slow. It took me an entire day to back up 1 GB. OK, it has many, many small files, but still, that's very slow.
It might be something to do with the switch time between files. Any ideas?
Thanks
Adil

Posts: 1,134
Registered: 7/7/07
Re: Rsync, S3 and Backups
Posted: May 17, 2008 1:58 PM PDT
in response to: Eric Hammond
> Tip: When using rsync to s3fs, consider using the --inplace option.
It would appear that Randy has been very busy improving s3fs and my information is several versions out of date.
Recent versions of s3fs are now doing things much more efficiently, so --inplace and the like should not be needed in rsync to s3fs mounted file systems.
Upgrading now... :)

Posts: 29
Registered: 8/18/07
Re: Rsync, S3 and Backups
Posted: May 22, 2008 1:56 PM PDT
in response to: aceacermd
From the rsync man page:
> --inplace meaning that the rsync algorithm can't extract the full amount of network reduction it might otherwise.
> This option is useful on systems that are disk bound, not network bound.
The idea of mounting an S3 bucket as a local disk drive and then using rsync will not work well.
It will not provide any bandwidth-efficient algorithm and will upload the whole new file, not just what was changed.
You can bypass this limitation using our rsync to s3 gateway.
For more information take a look at our new service
http://www.s3rsync.com/
Regards,
Addady

Posts: 5,320
Registered: 3/19/07
Re: Rsync, S3 and Backups
Posted: May 22, 2008 2:03 PM PDT
in response to: addady
You could also launch an EC2 instance, mount an S3 based file system on the instance, use the instance as an rsync server, then terminate the instance when done.

Posts: 1,134
Registered: 7/7/07
Re: Rsync, S3 and Backups
Posted: May 22, 2008 4:57 PM PDT
in response to: addady
> [--inplace] is useful on systems that are disk bound, not network bound.
In the case of an S3 bucket mounted with s3fs over a slow network, it is "disk bound" as far as rsync is concerned. However, --inplace is no longer needed with the latest versions of s3fs anyway. Your other points are consistent with my experience for large files whose contents change.
I use multiple approaches depending on the needs for backing up to S3 from outside EC2. Here are a couple:
1. rsync to S3 bucket mounted with s3fs (optionally transparently encrypted with encfs). I use this when I have a growing collection of large files where new files are added but the contents of old files change rarely, and I only want a single copy of the latest entire backup on S3. Without encryption it's also useful for situations where it might be handy to be able to get access to the files with other S3 browsing tools like RightScale or S3fox or direct URLs.
2. rsync to (temporary) EC2 instance with S3 bucket mounted with ElasticDrive (optionally encrypted). I use this when I want to back up a file system with rapidly changing files where I need to use rsync's fantastic ability to send just the modified part of the file over the network. I use ElasticDrive because I require rsync's ability to use hard links to create multiple snapshots of a file system over time without multiplying storage needs for unchanged files (e.g., using dirvish) and a block device makes this transparent.
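The hard-link snapshot idea mentioned in (2) can be sketched as follows. This is a simplified illustration of the general technique, not how rsync or dirvish implement it; the `snapshot` function and the file names are made up for the example, and it assumes a file system that supports hard links (which is what a block device provides).

```python
import filecmp
import os
import shutil
import tempfile


def snapshot(source, previous, target):
    """Copy source into target, hard-linking files unchanged since previous."""
    for root, _dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        os.makedirs(os.path.normpath(os.path.join(target, rel)), exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.normpath(os.path.join(target, rel, name))
            old = os.path.join(previous, rel, name) if previous else None
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                os.link(old, dst)       # unchanged: share storage with old snapshot
            else:
                shutil.copy2(src, dst)  # new or modified: store a fresh copy


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "data")
    os.makedirs(src)
    for name, text in [("static.txt", "same"), ("log.txt", "v1")]:
        with open(os.path.join(src, name), "w") as fh:
            fh.write(text)

    snap1, snap2 = os.path.join(tmp, "snap1"), os.path.join(tmp, "snap2")
    snapshot(src, None, snap1)
    with open(os.path.join(src, "log.txt"), "w") as fh:
        fh.write("v2 changed")
    snapshot(src, snap1, snap2)

    # The unchanged file is one inode in both snapshots; the changed one is not.
    shared = os.path.samefile(os.path.join(snap1, "static.txt"),
                              os.path.join(snap2, "static.txt"))
    separate = not os.path.samefile(os.path.join(snap1, "log.txt"),
                                    os.path.join(snap2, "log.txt"))
    print(shared, separate)
```

Each snapshot looks like a complete copy of the tree, but unchanged files occupy storage only once.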
As cool as the technology behind PersistentFS seems, I'm hesitant to store long term backups using software which expires, but for other purposes it is worth testing.
It sounds like s3rsync.com is sort of a combination where you are providing the temporary EC2 instance in (2) above while saving the files directly in S3 buckets similar to (1) above.

Posts: 29
Registered: 8/18/07
Re: Rsync, S3 and Backups
Posted: May 23, 2008 12:14 AM PDT
in response to: Eric Hammond
> 2. rsync to (temporary) EC2 instance with S3 bucket mounted with ElasticDrive (optionally encrypted). I use this when I want to back up a file system with rapidly changing files where I need to use rsync's fantastic ability to send just the modified part of the file over the network. I use ElasticDrive because I require rsync's ability to use hard links to create multiple snapshots of a file system over time without multiplying storage needs for unchanged files (e.g., using dirvish) and a block device makes this transparent.
Using this technique limits you to syncing at the minimum granularity of the file system block size.
The smaller the block size, the finer the synchronization. But this comes at the price of slower performance and more (costly) Amazon PUT/POST requests.
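The block-size trade-off above can be made concrete with a small calculation. It is only a sketch and assumes that each block of the device is stored as its own S3 object, so rewriting a block costs one PUT of `block_size` bytes; the function names and the sample edits are hypothetical.

```python
def blocks_touched(changes, block_size):
    """Return the set of block indices covered by (offset, length) changes."""
    touched = set()
    for offset, length in changes:
        if length <= 0:
            continue
        first = offset // block_size
        last = (offset + length - 1) // block_size
        touched.update(range(first, last + 1))
    return touched


def upload_cost(changes, block_size):
    """Return (number of PUT requests, bytes uploaded) for the changes."""
    count = len(blocks_touched(changes, block_size))
    return count, count * block_size


# Three small 100-byte edits scattered through the first megabyte.
changes = [(0, 100), (10_000, 100), (1_000_000, 100)]

small = upload_cost(changes, block_size=4096)       # (3, 12288)
large = upload_cost(changes, block_size=1_048_576)  # (1, 1048576)
print(small, large)
```

With 4 KB blocks the three edits cost three requests but only 12 KB of upload; with 1 MB blocks they cost a single request but a full megabyte.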
Our service lets you really benefit from rsync's "--block-size" parameter at no additional cost.
> As cool as the technology behind PersistentFS seems, I'm hesitant to store long term backups using software which expires, but for other purposes it is worth testing.
www.s3rsync.com saves your data in your bucket in standard GNU tar format.
> It sounds like s3rsync.com is sort of a combination where you are providing the temporary EC2 instance in (2) above while saving the files directly in S3 buckets similar to (1) above.
Our solution is aimed at a wide variety of customer types: those who don't want, don't need, or are not able to deal with EC2 and file system drivers in order to back up their data.
Some of our users are running rsync on MS Windows, so they can't mount PersistentFS/ElasticDrive/s3fs.
Others are running rsync from a limited NAS appliance.
Actually, there is no special prerequisite on the client side besides rsync; even SSH is optional.
Addady

Posts: 1,134
Registered: 7/7/07
Re: Rsync, S3 and Backups
Posted: May 25, 2008 12:35 AM PDT
in response to: addady
addady:
When rsync'ing files from a local system through an EC2 instance to S3 there are two distinct outbound transfers:
A. From the local system to the EC2 instance.
B. From the EC2 instance to the S3 storage.
In the method (2) I described above, rsync uses its own optimized algorithm to transfer the minimal amount of changed data (compressed) from the local system to EC2 (A) and it only updates the blocks of the file which changed from EC2 to the S3 storage (B).
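The idea of sending only the changed part of a file in transfer (A) can be sketched with a heavily simplified delta scheme. This is not rsync's actual implementation (rsync pairs a cheap rolling checksum with a stronger hash, and compresses the stream); here a single SHA-256 per block is used for clarity, and all function names are made up for the illustration.

```python
import hashlib


def block_signatures(old, block_size):
    """Receiver side: map each block's hash to its index in the old file."""
    return {hashlib.sha256(old[i:i + block_size]).hexdigest(): i // block_size
            for i in range(0, len(old), block_size)}


def make_delta(new, signatures, block_size):
    """Sender side: emit ("copy", block_index) or ("data", literal_bytes)."""
    delta, literal, pos = [], bytearray(), 0
    while pos < len(new):
        window = new[pos:pos + block_size]
        index = signatures.get(hashlib.sha256(window).hexdigest())
        if index is not None:
            if literal:
                delta.append(("data", bytes(literal)))
                literal = bytearray()
            delta.append(("copy", index))  # receiver already has this block
            pos += len(window)
        else:
            literal.append(new[pos])       # no match: this byte must be sent
            pos += 1
    if literal:
        delta.append(("data", bytes(literal)))
    return delta


def apply_delta(old, delta, block_size):
    """Receiver side: rebuild the new file from old blocks plus literals."""
    out = bytearray()
    for kind, value in delta:
        if kind == "copy":
            out += old[value * block_size:(value + 1) * block_size]
        else:
            out += value
    return bytes(out)


old = b"aaaabbbbccccdddd"
new = b"aaaabbbbXXccccdddd"  # two bytes inserted in the middle
delta = make_delta(new, block_signatures(old, 4), 4)
rebuilt = apply_delta(old, delta, 4)
print(delta)
```

Only the two inserted bytes travel as literal data; the rest of the file is described by references to blocks the receiver already holds.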
Your statements about s3rsync seem confusing to me. It sounds like you are using rsync to go from the local system to the EC2 instance (A) but then it becomes unclear how the data goes from the EC2 instance to the S3 storage (B).
On the one hand, your site says that you can "access your files through [...] any S3 interface" which seems to imply that they are stored one file per bucket key. On the other hand, you state that the files are saved in a "tar format" which implies that I could not get access to an individual file directly from S3 but perhaps might need to download the entire backup (ouch). Then, you talk about how you don't upload the entire file when it changes which seems to imply that neither of the above are true but that you might be saving files in chunks on S3 (which would imply something resembling proprietary).
What method are you using to transfer files from the EC2 instance to S3? What is the file format in the S3 buckets?

Posts: 29
Registered: 8/18/07
Re: Rsync, S3 and Backups
Posted: May 25, 2008 4:57 AM PDT
in response to: Eric Hammond
Hi,
Sorry for the misunderstanding.
> In the method (2) I described above, rsync uses its own optimized algorithm to transfer the minimal amount of changed data (compressed) from the local system to EC2 (A) and it only updates the blocks of the file which changed from EC2 to the S3 storage (B).
Right, (A) is network efficient. The disadvantage of (B) is that you have to pay Amazon for all the PUT/POST/LIST/GET requests; in some cases it can be a lot of requests. The other thing is that you are using a proprietary format to store your data in S3 and may (I am not sure how ElasticDrive checks your software licence) depend on a third party in order to access your data.
> Your statements about s3rsync seem confusing to me. It sounds like you are using rsync to go from the local system to the EC2 instance (A) but then it becomes unclear how the data goes from the EC2 instance to the S3 storage (B).
We move the data to your S3 bucket in GNU multi tar format.
> On the one hand, your site says that you can "access your files through [...] any S3 interface" which seems to imply that they are stored one file per bucket key. On the other hand, you state that the files are saved in a "tar format" which implies that I could not get access to an individual file directly from S3 but perhaps might need to download the entire backup (ouch).
You can restore a single file or several files by using our service plus an rsync client.
You can restore all your files using any S3 interface by downloading the entire tar files.
> which would imply something resembling proprietary.
There is no use of a proprietary format whatsoever.
Thanks
Addady