Thread: Rsync, S3 and Backups


Replies: 12 - Pages: 1 - Last Post: May 25, 2008 4:57 AM by: addady
aceacermd

Posts: 31
Registered: 4/15/08
Rsync, S3 and Backups
Posted: May 15, 2008 5:46 AM PDT

Hi there,

I am doing backups using Rsync to my S3 account. I use s3fs to mount the bucket.

However, when I use S3fox to view my bucket, it shows the bucket as empty with size 0, but I am sure the data is there because I can browse to it from my instance.

Is there any other way apart from mounting the bucket in my instance for me to access this data? Has anybody experienced this before?

Thanks for your help.

AM


Eric Hammond
RealName(TM)


Posts: 1,134
Registered: 7/7/07
Re: Rsync, S3 and Backups
Posted: May 15, 2008 9:44 AM PDT   in response to: aceacermd

Yep, I have this same problem with S3fox. It should be fixable.

In s3fs, when you create a file named, say, "bucket/directory/file.txt", it uploads one key to the bucket named "directory/file.txt" and another key to the bucket named "directory".

The second key representing the directory apparently confuses S3fox.  Where it would normally show you a "directory" link representing the keys starting with "directory/" it now has to show a key with that same name.

S3fox could be enhanced to show both the key "directory" and a link for the pseudo-directory named "directory".

Note: In S3fox you can manually add "directory/" to the bucket name in the text field at the top and it will show you the keys you expect.  It all works except for showing you the link.
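
To make the key collision concrete, here is a minimal sketch of how a browsing tool might derive "folder" links from a flat key list. The function name list_level is made up for illustration, and this is only an assumption about how a delimiter-style listing behaves, not S3fox's actual code.

```python
def list_level(keys, prefix="", delimiter="/"):
    """Mimic a delimiter-style listing over a flat set of keys.

    Returns (objects, pseudo_dirs): the keys sitting directly at this
    level, and the common prefixes a browser would render as folder links.
    """
    objects, pseudo_dirs = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Deeper key: only its first path component shows up here.
            pseudo_dirs.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        elif rest:
            objects.append(key)
    return objects, sorted(pseudo_dirs)


# Keys as described above for bucket/directory/file.txt.
keys = ["directory", "directory/file.txt"]

top_objects, top_dirs = list_level(keys)
inside_objects, inside_dirs = list_level(keys, prefix="directory/")
```

At the top level the tool has both an object called "directory" and a pseudo-directory "directory/" to display, which is the ambiguity described above. Listing with the prefix "directory/" returns the file as expected, matching the workaround of typing the prefix by hand.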




Eric Hammond
RealName(TM)


Posts: 1,134
Registered: 7/7/07
Re: Rsync, S3 and Backups
Posted: May 15, 2008 10:36 AM PDT   in response to: aceacermd

Tip: When using rsync to s3fs, consider using the --inplace option.

Without this option, rsync will create a temporary file and then rename it to the real file name after it has copied it. Unfortunately, without key renaming in S3 this means that s3fs uploads temporaryname, then downloads it and uploads realname, doubling or tripling your time and bandwidth charges (depending on caching?).
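
A small sketch of the tip, for the s3fs behaviour described in this post. The paths are hypothetical placeholders, and the transfer counts simply restate the reasoning above (a rename costs one download plus one upload); they are not measurements.

```python
import shlex


def rsync_command(source, mountpoint, inplace=True):
    """Build an rsync argument list for copying into an s3fs mount point."""
    args = ["rsync", "-a"]
    if inplace:
        # Write straight into the destination file instead of temp + rename.
        args.append("--inplace")
    args += [source, mountpoint]
    return args


def transfers_per_file(inplace):
    """Rough number of S3 transfers per file under the assumption above.

    Without --inplace: upload temp file, download it, upload real name.
    Caching in s3fs could reduce this, so treat 3 as an upper estimate.
    """
    return 1 if inplace else 3


# Hypothetical source directory and s3fs mount point.
cmd = rsync_command("/var/backups/", "/mnt/s3/backups/")
print(shlex.join(cmd))
```

The list returned by rsync_command can be passed to subprocess.run as-is; it is only printed here so the sketch has no side effects.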

--
Eric Hammond
http://www.anvilon.com



aceacermd

Posts: 31
Registered: 4/15/08
Re: Rsync, S3 and Backups
Posted: May 15, 2008 12:34 PM PDT   in response to: aceacermd

Thanks for the reply.

Is there any other easy way for me to see my data in S3 apart from S3fox or mounting it in my instances?

Thanks

AM


Eric Hammond
RealName(TM)


Posts: 1,134
Registered: 7/7/07
Re: Rsync, S3 and Backups
Posted: May 15, 2008 4:04 PM PDT   in response to: aceacermd

There are a number of command line tools and web sites that can show you the contents of S3 buckets.  You might check over on the S3 forum as this question is not specific to EC2.

  http://s3forum.notlong.com

The S3 browser in RightScale does a good job of browsing buckets created by s3fs.

  http://www.rightscale.com

(Dashboard / Manage / Storage / S3 Browser)



aceacermd

Posts: 31
Registered: 4/15/08
Re: Rsync, S3 and Backups
Posted: May 16, 2008 5:26 AM PDT   in response to: aceacermd

I have tried it with --inplace too, but it doesn't help. Rsync, s3fs and S3 together are painfully slow. It took me an entire day to back up 1 GB. OK, it has many, many small files, but still, that's very slow.

It might be something to do with the switching time between files. Any ideas?
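
One way to reason about this is a back-of-envelope model in which every file pays a fixed per-request cost on top of its payload. Everything in the sketch below is an assumption: the number of requests per file, the latency and the bandwidth are made-up placeholders, not figures measured on s3fs or S3.

```python
def estimated_seconds(n_files, total_bytes, requests_per_file,
                      seconds_per_request, bytes_per_second):
    """Estimate transfer time as fixed per-request overhead plus payload time."""
    overhead = n_files * requests_per_file * seconds_per_request
    payload = total_bytes / bytes_per_second
    return overhead + payload


# All figures below are hypothetical placeholders, not measurements.
one_gb = 1024 ** 3
many_small = estimated_seconds(100_000, one_gb, 4, 0.2, 1_000_000)
one_large = estimated_seconds(1, one_gb, 4, 0.2, 1_000_000)
```

With placeholder numbers like these, the per-file overhead term is far larger than the payload term for the many-small-files case, while it is negligible for a single large file of the same total size.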

Thanks

Adil


Eric Hammond
RealName(TM)


Posts: 1,134
Registered: 7/7/07
Re: Rsync, S3 and Backups
Posted: May 17, 2008 1:58 PM PDT   in response to: Eric Hammond

Earlier I wrote:
Tip: When using rsync to s3fs, consider using the --inplace option.

It would appear that Randy has been very busy improving s3fs and my information is several versions out of date.

Recent versions of s3fs are now doing things much more efficiently, so --inplace and the like should not be needed in rsync to s3fs mounted file systems.

Upgrading now... :)


addady

Posts: 29
Registered: 8/18/07
Re: Rsync, S3 and Backups
Posted: May 22, 2008 1:56 PM PDT   in response to: aceacermd

From the rsync man page:
--inplace means that the rsync algorithm can't extract the full
amount of network reduction it might otherwise.
This option is useful on systems that are disk bound, not network bound.

The idea of mounting an S3 bucket as a local disk drive and then using rsync will not work well.
It does not give you a bandwidth-efficient transfer: the whole new file is uploaded, not just what was changed.

You can bypass this limitation using our rsync to s3 gateway.
For more information take a look at our new service http://www.s3rsync.com/

Regards,
Addady



Allen

Posts: 5,320
Registered: 3/19/07
Re: Rsync, S3 and Backups
Posted: May 22, 2008 2:03 PM PDT   in response to: addady

You could also launch an EC2 instance, mount an S3 based file system on the instance, use the instance as an rsync server, then terminate the instance when done.


Eric Hammond
RealName(TM)


Posts: 1,134
Registered: 7/7/07
Re: Rsync, S3 and Backups
Posted: May 22, 2008 4:57 PM PDT   in response to: addady

addady wrote:
[--inplace] is useful on systems that are disk bound, not network bound.

In the case of an S3 bucket mounted with s3fs over a slow network, it is "disk bound" as far as rsync is concerned.  However, --inplace is no longer needed with the latest versions of s3fs anyway.  Your other points are consistent with my experience for large files whose contents change.

I use multiple approaches depending on the needs for backing up to S3 from outside EC2.  Here are a couple:

1. rsync to S3 bucket mounted with s3fs (optionally transparently encrypted with encfs).  I use this when I have a growing collection of large files where new files are added but the contents of old files change rarely, and I only want a single copy of the latest entire backup on S3.  Without encryption it's also useful for situations where it might be handy to be able to get access to the files with other S3 browsing tools like RightScale or S3fox or direct URLs.

2. rsync to (temporary) EC2 instance with S3 bucket mounted with ElasticDrive (optionally encrypted).  I use this when I want to back up a file system with rapidly changing files where I need to use rsync's fantastic ability to send just the modified part of the file over the network.  I use ElasticDrive because I require rsync's ability to use hard links to create multiple snapshots of a file system over time without multiplying storage needs for unchanged files (e.g., using dirvish) and a block device makes this transparent.
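
The hard-link snapshot idea in (2) can be sketched in a few lines. This is a simplified illustration of the technique that rsync's --link-dest option (as used by dirvish) provides, not a replacement for it: it assumes a flat directory of regular files, and the helper name snapshot is made up for this example.

```python
import filecmp
import os
import shutil
import tempfile


def snapshot(source, previous, target):
    """Create `target` as a snapshot of `source` (flat directory of files only).

    Files identical to the copy in `previous` are hard-linked rather than
    copied, so unchanged data is stored only once across snapshots.
    """
    os.makedirs(target)
    for name in sorted(os.listdir(source)):
        src = os.path.join(source, name)
        old = os.path.join(previous, name) if previous else None
        dst = os.path.join(target, name)
        if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
            os.link(old, dst)       # unchanged: share the existing inode
        else:
            shutil.copy2(src, dst)  # new or changed: store a fresh copy


# Demonstration in a throwaway temporary directory.
root = tempfile.mkdtemp()
source = os.path.join(root, "data")
os.makedirs(source)
for name, text in (("a.txt", "stable"), ("b.txt", "version 1")):
    with open(os.path.join(source, name), "w") as fh:
        fh.write(text)

snap1 = os.path.join(root, "snap1")
snap2 = os.path.join(root, "snap2")
snapshot(source, None, snap1)

with open(os.path.join(source, "b.txt"), "w") as fh:
    fh.write("version 2")
snapshot(source, snap1, snap2)

# a.txt did not change, so both snapshots point at the same inode.
same_inode = os.path.samefile(os.path.join(snap1, "a.txt"),
                              os.path.join(snap2, "a.txt"))
```

This only works on a file system that supports hard links, which is why a block device with an ordinary file system on top makes the approach transparent.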

As cool as the technology behind PersistentFS seems, I'm hesitant to store long term backups using software which expires, but for other purposes it is worth testing.

It sounds like s3rsync.com is sort of a combination where you are providing the temporary EC2 instance in (2) above while saving the files directly in S3 buckets similar to (1) above.



addady

Posts: 29
Registered: 8/18/07
Re: Rsync, S3 and Backups
Posted: May 23, 2008 12:14 AM PDT   in response to: Eric Hammond


anvilon wrote:

2. rsync to (temporary) EC2 instance with S3 bucket mounted with ElasticDrive (optionally encrypted).  I use this when I want to back up a file system with rapidly changing files where I need to use rsync's fantastic ability to send just the modified part of the file over the network.  I use ElasticDrive because I require rsync's ability to use hard links to create multiple snapshots of a file system over time without multiplying storage needs for unchanged files (e.g., using dirvish) and a block device makes this transparent.

This technique limits you to syncing at the minimum granularity of the file system block size.
The smaller the block size, the finer the synchronization, but it comes at the price of slower performance and more costly Amazon PUT/POST requests.
Our service lets you really benefit from rsync's "--block-size" parameter at no additional cost.
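
The block-size trade-off can be illustrated with simple arithmetic. The sketch below assumes one PUT request per modified block and a hypothetical set of edits; it does not describe how ElasticDrive or any particular product actually batches its requests.

```python
def dirty_blocks(changed_ranges, block_size):
    """Return the set of block indexes touched by (offset, length) writes."""
    blocks = set()
    for offset, length in changed_ranges:
        if length <= 0:
            continue
        first = offset // block_size
        last = (offset + length - 1) // block_size
        blocks.update(range(first, last + 1))
    return blocks


def upload_cost(changed_ranges, block_size):
    """Return (put_requests, bytes_uploaded), assuming one PUT per dirty block."""
    n = len(dirty_blocks(changed_ranges, block_size))
    return n, n * block_size


# Hypothetical workload: three small edits scattered through a file.
edits = [(100, 50), (70_000, 10), (1_000_000, 4_000)]
for size in (4 * 1024, 64 * 1024, 1024 * 1024):
    puts, uploaded = upload_cost(edits, size)
    print(f"block={size:>8} puts={puts} uploaded={uploaded}")
```

For this made-up workload, small blocks need more requests but upload far fewer bytes, while large blocks need fewer requests but re-upload much more unchanged data.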




anvilon wrote:
As cool as the technology behind PersistentFS seems, I'm hesitant to store long term backups using software which expires, but for other purposes it is worth testing.

www.s3rsync.com saves your data in your bucket in standard GNU tar format.





anvilon wrote:
It sounds like s3rsync.com is sort of a combination where you are providing the temporary EC2 instance in (2) above while saving the files directly in S3 buckets similar to (1) above.

Our solution is aimed at a wide variety of customer types: those who don't want, don't need, or aren't able to deal with EC2 and file system drivers in order to back up their data.
Some of our users run rsync on MS Windows, so they can't mount PersistentFS/ElasticDrive/s3fs.
Others run rsync from a limited NAS appliance.

Actually, there is no special prerequisite on the client side besides rsync; even ssh is optional.

Addady

Eric Hammond
RealName(TM)


Posts: 1,134
Registered: 7/7/07
Re: Rsync, S3 and Backups
Posted: May 25, 2008 12:35 AM PDT   in response to: addady

addady:

When rsync'ing files from a local system through an EC2 instance to S3 there are two distinct outbound transfers:

A. From the local system to the EC2 instance.
B. From the EC2 instance to the S3 storage.

In the method (2) I described above, rsync uses its own optimized algorithm to transfer the minimal amount of changed data (compressed) from the local system to EC2 (A) and it only updates the blocks of the file which changed from EC2 to the S3 storage (B).
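
To illustrate the "only the changed blocks" part of (B), here is a deliberately simplified sketch that compares two versions of a file block by block at fixed offsets. Note that this is not rsync's algorithm: rsync uses rolling checksums and can also match data that has shifted position, which this comparison cannot. The function names and the sample data are made up for the example.

```python
import hashlib


def block_digests(data, block_size):
    """Hash each fixed-size block of `data`."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def changed_blocks(old, new, block_size):
    """Indexes of blocks in `new` that differ from, or do not exist in, `old`."""
    old_d = block_digests(old, block_size)
    new_d = block_digests(new, block_size)
    return [i for i, d in enumerate(new_d)
            if i >= len(old_d) or d != old_d[i]]


# Hypothetical file contents: the second block is rewritten, a short tail is appended.
old = b"A" * 4096 + b"B" * 4096 + b"C" * 4096
new = b"A" * 4096 + b"X" * 4096 + b"C" * 4096 + b"D" * 100
to_send = changed_blocks(old, new, 4096)
```

Only the blocks listed in to_send would need to be written to the backing store; the unchanged first and third blocks stay as they are.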

Your statements about s3rsync seem confusing to me. It sounds like you are using rsync to go from the local system to the EC2 instance (A) but then it becomes unclear how the data goes from the EC2 instance to the S3 storage (B).

On the one hand, your site says that you can "access your files through [...] any S3 interface" which seems to imply that they are stored one file per bucket key.  On the other hand, you state that the files are saved in a "tar format" which implies that I could not get access to an individual file directly from S3 but perhaps might need to download the entire backup (ouch).  Then, you talk about how you don't upload the entire file when it changes which seems to imply that neither of the above are true but that you might be saving files in chunks on S3 (which would imply something resembling proprietary).

What method are you using to transfer files from the EC2 instance to S3?  What is the file format in the S3 buckets?




addady

Posts: 29
Registered: 8/18/07
Re: Rsync, S3 and Backups
Posted: May 25, 2008 4:57 AM PDT   in response to: Eric Hammond

Hi,

Sorry for the misunderstanding.

anvilon wrote:
In the method (2) I described above, rsync uses its own optimized algorithm to transfer the minimal amount of changed data (compressed) from the local system to EC2 (A) and it only updates the blocks of the file which changed from EC2 to the S3 storage (B).

Right, (A) is network efficient. The disadvantage of (B) is that you have to pay Amazon for all the PUT/POST/LIST/GET requests; in some cases that can be a lot of requests.  The other thing is that you are using a proprietary format to store your data in S3 and may (I'm not sure how ElasticDrive checks your software licence) depend on a third party in order to access your data.


anvilon wrote:
Your statements about s3rsync seem confusing to me. It sounds like you are using rsync to go from the local system to the EC2 instance (A) but then it becomes unclear how the data goes from the EC2 instance to the S3 storage (B).

We move the data to your S3 bucket in GNU multi  tar format.


anvilon wrote:
On the one hand, your site says that you can "access your files through [...] any S3 interface" which seems to imply that they are stored one file per bucket key.  On the other hand, you state that the files are saved in a "tar format" which implies that I could not get access to an individual file directly from S3 but perhaps might need to download the entire backup (ouch).

You can restore a single file or a few files by using our service plus an rsync client.
You can restore all your files using any S3 interface by downloading the entire tar files.
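
Once a tar archive has been downloaded, an individual file can be pulled out locally with standard tools. The sketch below uses Python's tarfile module on an in-memory archive. The file names and contents are invented for the example, and it assumes a single plain GNU-format tar archive; the exact layout the service writes to the bucket is not described here.

```python
import io
import tarfile


def make_archive(files):
    """Build an in-memory GNU-format tar archive from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_member(archive_bytes, name):
    """Return the contents of one member of a tar archive."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r") as tar:
        member = tar.extractfile(name)
        return member.read()


# Stand-in for an archive that has already been downloaded from the bucket.
archive = make_archive({"etc/hosts": b"127.0.0.1 localhost\n",
                        "home/notes.txt": b"remember the backups\n"})
restored = read_member(archive, "home/notes.txt")
```

The point of the sketch is the cost model rather than the code: reading one member this way still requires having the archive that contains it, which is the download concern raised earlier in the thread.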


anvilon wrote:
which would imply something resembling proprietary.

No proprietary format is used whatsoever.


Thanks
Addady



