Thread: How do you Mount S3 on an External Linux Server?
Last Post:
Feb 16, 2010 11:02 AM
by: brianpoissant
|
How do you Mount S3 on an External Linux Server?
Posted:
Aug 10, 2007 1:21 PM PDT
|
For servers external to Amazon (e.g., Linux Fedora Core 6 or RHEL 5 servers), I'd like to mount an S3 filesystem so that existing web applications can, without change, use S3 to read and write files.
I've gone through various posts, but I don't see any clear consensus. Nor have I found any read/write performance analysis. Can one, in fact, use S3 transparently using some kind of mount and have performance that website visitors will find acceptable?
What is your best recommendation? Any details sufficient to get this up and running are greatly appreciated!
|
Re: How do you Mount S3 on an External Linux Server?
Posted:
Aug 10, 2007 1:42 PM PDT
in response to: Steve Amerige
|
Steve,
The thing is, a design that uses S3 as a file system inherently carries performance penalties in certain cases. Smart caching can minimize how often the underlying filesystem needs to hit S3, but that probably won't always help.
Plus, with a system outside of Amazon, you'll certainly pay the bandwidth charges.
Can you do a simple configuration change to provide a different path to your resource files? That way you could store a bunch of static files on S3 and simply serve them from there. I know it isn't the same, but it might work better for those files.
David
|
Re: How do you Mount S3 on an External Linux Server?
Posted:
Aug 10, 2007 2:28 PM PDT
in response to: D. Kavanagh
|
I just find it difficult to accept that a layer of software sitting on top of S3, trying to act like a filesystem, is ultimately all that useful.
The thing I like about S3 is that I can PUT a file there and if I get a 200 response I feel very confident that the file is safely stored. It's a nice, atomic operation. Transactional, really. And from that moment on I can forget about that file because I have confidence in the durability and integrity of S3.
Now, I introduce a layer of software on top of this incredibly reliable service which is anything but transactional, where you are never sure if the file you have just written to your "file system" has actually been pushed out to S3 or not. And suddenly I've reduced the overall reliability of my system to the inherent reliability of a file system. Is that an improvement? It doesn't seem like it to me. But, judging from the interest on the forum in these types of solutions I must be in the minority.
Mitch
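The "PUT, get a 200, and forget about it" pattern described above can be sketched roughly as follows. This is only an illustration in Python: `FakeObjectStore` is a hypothetical in-memory stand-in, not the real S3 API, and the sketch assumes (as is the case for a simple, single-request S3 PUT) that the returned ETag is the MD5 hex digest of the body.

```python
import hashlib


class FakeObjectStore:
    """Hypothetical in-memory stand-in for an object store such as S3."""

    def __init__(self):
        self._objects = {}

    def put(self, key, body):
        # The whole object is replaced in one step; there is no partial write.
        self._objects[key] = bytes(body)
        etag = hashlib.md5(body).hexdigest()
        return 200, etag

    def get(self, key):
        return self._objects[key]


def put_and_verify(store, key, body):
    """Return True only if the store acknowledged the exact bytes we sent."""
    status, etag = store.put(key, body)
    return status == 200 and etag == hashlib.md5(body).hexdigest()


store = FakeObjectStore()
ok = put_and_verify(store, "reports/2007-08.csv", b"a,b,c\n1,2,3\n")
```

Once `put_and_verify` returns True, the caller knows the object is stored as sent. A mounted file system hides exactly this acknowledgement from the application.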
|
Re: How do you Mount S3 on an External Linux Server?
Posted:
Aug 10, 2007 3:25 PM PDT
in response to: D. Kavanagh
|
Hi David,
> Can you do a simple configuration change to provide a different path to your resource files?
The majority use case for this is to have web services code on the local server's disk, but that code sometimes references data stored on S3. Well, "sometimes" might be quite often, actually. Anyway, if third-party code is involved for which there is no possibility of modification, and if that third-party code accesses files using filesystem paths, then I'd want a solution that allows that third-party code to read/write from that path, not knowing whether the files are local or remote... just like nfs.
And, even if the code isn't third-party software, the work to change code to be able to use REST APIs for reading/writing of data might not be an acceptable use of resources. So, for better or worse, there are business reasons to look for the ability to "nfs mount" an S3 filesystem.
But, I don't have the answer yet: what software do I need to add and configure, etc., to our Linux servers to enable S3 access via unix paths? And, what are the performance characteristics? (If I can get the former, I can figure out the latter :-)
Thanks!
|
Re: How do you Mount S3 on an External Linux Server?
Posted:
Aug 10, 2007 4:20 PM PDT
in response to: Steve Amerige
|
I haven't used any personally, but a bunch of people seem to be using
http://www.jungledisk.com/
David
|
Re: How do you Mount S3 on an External Linux Server?
Posted:
Aug 10, 2007 9:20 PM PDT
in response to: M. Garnaat
|
> I just find it difficult to accept that a layer of software sitting on top of S3, trying to act like a filesystem, is ultimately all that useful.
> The thing I like about S3 is that I can PUT a file there and if I get a 200 response I feel very confident that the file is safely stored. It's a nice, atomic operation. Transactional, really. And from that moment on I can forget about that file because I have confidence in the durability and integrity of S3.
> Now, I introduce a layer of software on top of this incredibly reliable service which is anything but transactional, where you are never sure if the file you have just written to your "file system" has actually been pushed out to S3 or not. And suddenly I've reduced the overall reliability of my system to the inherent reliability of a file system. Is that an improvement? It doesn't seem like it to me. But, judging from the interest on the forum in these types of solutions I must be in the minority.
> Mitch
However, reliability can be viewed from another perspective. The simple atomic operation of S3 is nice and comforting (and I have a little Lua script that does these GET/PUT operations; it is fairly general purpose and can be plugged into larger apps).
But when I need something that goes beyond this atomic level of access (say, some form of FS-like behaviour), I am left with two choices:
1. redo the FS stuff from scratch, or
2. put an FS/block layer on top of S3
In my experience, (1) is actually more error-prone than (2) because of its complexity, so (2) ends up being more reliable as a whole.
An example:
why should I write rsync all over again just because the storage is S3 objects rather than files?
All software needs time and/or active use to be refined and to have its bugs spotted and fixed.
If I am developing new apps from scratch (which need only key/value access), I agree with your point. But in many cases (especially on *nix), it is more appropriate to chain together existing tried-and-true tools, and most of them expect some form of FS to operate on.
The analogy is like, why don't I treat /dev/hd1 as a block device and do all my storage operation using that protocol but use ext3 instead ? Because I need the features of ext3 and re-implementing them would only reduce the reliability.
|
Re: How do you Mount S3 on an External Linux Server?
Posted:
Aug 11, 2007 7:26 PM PDT
in response to: bonobo2000
|
Try http://www.elasticdrive.com. ElasticDrive is a network block device based on Amazon S3 (Simple Storage Service). It provides a caching block device driver which pushes blocks to and from S3 as if they were being written to a local block device.
Reuven
http://www.enomalylabs.com
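For readers unfamiliar with the block-device idea, here is a rough sketch of how byte offsets can be mapped onto fixed-size objects. It is not ElasticDrive's code or design: the `BlockStore` class, the 4096-byte block size, and the `block/<index>` key naming are all assumptions made for illustration, and no caching is shown.

```python
BLOCK_SIZE = 4096  # assumed block size for illustration


class BlockStore:
    """Hypothetical object store holding one object per fixed-size block."""

    def __init__(self):
        self.objects = {}
        self.puts = 0

    def put(self, key, body):
        self.objects[key] = body
        self.puts += 1

    def get(self, key):
        # Blocks never written read back as zeros, like a sparse disk.
        return self.objects.get(key, bytes(BLOCK_SIZE))


def block_key(index):
    return f"block/{index:012d}"


def write_at(store, offset, data):
    """Write data at a byte offset, touching only the blocks it overlaps."""
    pos = 0
    while pos < len(data):
        index, start = divmod(offset + pos, BLOCK_SIZE)
        chunk = data[pos:pos + BLOCK_SIZE - start]
        block = bytearray(store.get(block_key(index)))
        block[start:start + len(chunk)] = chunk
        store.put(block_key(index), bytes(block))
        pos += len(chunk)


def read_at(store, offset, length):
    """Read length bytes starting at a byte offset."""
    out = bytearray()
    while len(out) < length:
        index, start = divmod(offset + len(out), BLOCK_SIZE)
        take = min(BLOCK_SIZE - start, length - len(out))
        out += store.get(block_key(index))[start:start + take]
    return bytes(out)


store = BlockStore()
write_at(store, 4090, b"0123456789")  # straddles blocks 0 and 1
round_trip = read_at(store, 4090, 10)
```

The point of the layout is that a small write costs a PUT per touched block rather than a re-upload of a whole file. An ordinary file system such as ext3 can then sit on top of the block layer.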
|
Re: How do you Mount S3 on an External Linux Server?
Posted:
Aug 12, 2007 5:52 AM PDT
in response to: bonobo2000
|
"The analogy is like, why don't I treat /dev/hd1 as a block device and
do all my storage operation using that protocol but use ext3 instead ?
Because I need the features of ext3 and re-implementing them would
only reduce the reliability."
not really, S3 is very far above the level of a block device
already. putting a file system on top of S3 is more like putting
another file system on top of ext3; such a thing would have to
understand all of ext3's peculiarities while pretending to present a
different set of peculiarities to the client in its new file system
costume.
I can understand the desire to conveniently view the list of keys and
drag/drop them around and so forth rather than doing these operations
using a command line tool, but having an app that understands S3
displaying and manipulating its content is quite different from
mounting S3 as a file system. a GUI interface to S3 still has to
understand the limitations and what is actually going on under the
hood, and thus can display appropriate messages if necessary, but a
mounted file system can be accessed by anything, which includes a lot
of things that for sure *don't* have any idea what is going on with
S3.
for example, suppose I "mount" S3 in windows as a share. now let's
rename a 2GiB file from S:\my_bucket\old.xls to S:\my_bucket\new.xls
using windows explorer. S3 doesn't allow renames, so the bits take 12
hours to transfer because I have DSL with a slow upload speed. when
does "new.xls" show up for use by apps? does the rename edit I'm
doing by clicking in explorer hang for 12 hours? does the new name
magically show up for other apps even though it doesn't exist in S3
until the transfer is complete? if it does show up what happens if
the transfer is cancelled, e.g. the network goes away, while another
app has the "file" (really a cached local copy?) open? what happens
if the S3 file system gets a series of 500 errors because S3 isn't in
the mood or if the etag doesn't match? and so forth.
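The rename scenario above can be made concrete with a small sketch. It assumes a naive one-object-per-file mapping in which the client has to move the bytes itself, as in the scenario described. `FlatStore` is a hypothetical in-memory stand-in, and the arithmetic at the end simply works out what uplink speed "2 GiB in 12 hours" implies.

```python
class FlatStore:
    """Hypothetical key/value store with no rename, counting bytes uploaded."""

    def __init__(self):
        self.objects = {}
        self.bytes_uploaded = 0

    def put(self, key, body):
        self.objects[key] = body
        self.bytes_uploaded += len(body)

    def get(self, key):
        return self.objects[key]

    def delete(self, key):
        del self.objects[key]


def naive_rename(store, old_key, new_key):
    # With one object per file and client-side copying, a "rename" is
    # download + re-upload under the new key + delete of the old key.
    body = store.get(old_key)
    store.put(new_key, body)
    store.delete(old_key)


def upload_seconds(size_bytes, uplink_bits_per_second):
    """Time to push size_bytes over a link of the given speed."""
    return size_bytes * 8 / uplink_bits_per_second


store = FlatStore()
store.put("my_bucket/old.xls", b"x" * 1024)
before = store.bytes_uploaded
naive_rename(store, "my_bucket/old.xls", "my_bucket/new.xls")
rename_cost = store.bytes_uploaded - before  # whole file size, not a few bytes

# 2 GiB in 12 hours corresponds to an uplink of roughly 398 kbit/s.
two_gib = 2 * 1024**3
implied_bps = two_gib * 8 / (12 * 3600)
```

Between the start of the re-upload and the final delete, the "file" exists in S3 only under the old key, which is where the questions above about what other applications should see come from.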
|
Re: How do you Mount S3 on an External Linux Server?
Posted:
Aug 12, 2007 6:56 AM PDT
in response to: lowflyinghawk
|
> "The analogy is like, why don't I treat /dev/hd1 as a block device and
> do all my storage operation using that protocol but use ext3 instead ?
> Because I need the features of ext3 and re-implementing them would
> only reduce the reliability."
> not really, S3 is very far above the level of a block device
> already. putting a file system on top of S3 is more like putting
> another file system on top of ext3; such a thing would have to
> understand all of ext3's peculiarities while pretending to present a
> different set of peculiarities to the client in its new file system
> costume.
> I can understand the desire to conveniently view the list of keys and
> drag/drop them around and so forth rather than doing these operations
> using a command line tool, but having an app that understands S3
> displaying and manipulating its content is quite different from
> mounting S3 as a file system. a GUI interface to S3 still has to
> understand the limitations and what is actually going on under the
> hood, and thus can display appropriate messages if necessary, but a
> mounted file system can be accessed by anything, which includes a lot
> of things that for sure *don't* have any idea what is going on with
> S3.
> for example, suppose I "mount" S3 in windows as a share. now let's
> rename a 2GiB file from S:\my_bucket\old.xls to S:\my_bucket\new.xls
> using windows explorer. S3 doesn't allow renames, so the bits take 12
> hours to transfer because I have DSL with a slow upload speed. when
> does "new.xls" show up for use by apps? does the rename edit I'm
> doing by clicking in explorer hang for 12 hours? does the new name
> magically show up for other apps even though it doesn't exist in S3
> until the transfer is complete? if it does show up what happens if
> the transfer is cancelled, e.g. the network goes away, while another
> app has the "file" (really a cached local copy?) open? what happens
> if the S3 file system gets a series of 500 errors because S3 isn't in
> the mood or if the etag doesn't match? and so forth.
I was not talking about the technical side of things; maybe I chose the wrong analogy.
Let me put it this way. I want to run a robust IMAP server which needs lots and lots of storage, and I have decided S3 can provide that, with its backup feature as an added bonus.
Now, should I write an IMAP server from scratch, implementing all those protocols, the various authentication subsystems (Kerberos, LDAP, SQL, PAM, etc.), the SSL layer, folder management, etc., just because all the existing IMAP servers, which have been running so well, expect the underlying store to be an FS and so I cannot use them? Or is it better to make S3 look like an FS, or make it look like a block device and then put an FS on it? My bet is that the latter option is an order of magnitude simpler (and thus has fewer bugs, since the bug count of any system is roughly proportional to its LOC).
Your example is actually the reason why there are situations where the basic atomic operations are not enough. S3 doesn't support rename, and you want to rename a 2GB object: you are asking for something the atomic GET/PUT cannot provide. In my s3-fuse, this rename touches only one directory entry (one S3 entry in my case), with the 2GB left as it is. Using the NBD model, it would still touch only one strip and can be done in one PUT, within a second or so over my cable link, not counting the use of a cache, which can be async.
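The directory-entry approach to rename can be sketched like this. It is not the s3-fuse implementation mentioned above: the layout (one JSON directory object plus data objects under opaque ids) and the names `CountingStore` and `IndirectFS` are hypothetical, chosen only to show why a rename need not move the file's bytes.

```python
import json
import uuid


class CountingStore:
    """Hypothetical object store that records the size of every PUT."""

    def __init__(self):
        self.objects = {}
        self.put_sizes = []

    def put(self, key, body):
        self.objects[key] = body
        self.put_sizes.append(len(body))

    def get(self, key):
        return self.objects[key]


class IndirectFS:
    """File names live in one directory object; data lives under opaque ids."""

    DIR_KEY = "dir/root"

    def __init__(self, store):
        self.store = store
        self.store.put(self.DIR_KEY, json.dumps({}).encode())

    def _load_dir(self):
        return json.loads(self.store.get(self.DIR_KEY).decode())

    def _save_dir(self, entries):
        self.store.put(self.DIR_KEY, json.dumps(entries).encode())

    def write(self, name, data):
        object_id = "data/" + uuid.uuid4().hex
        self.store.put(object_id, data)
        entries = self._load_dir()
        entries[name] = object_id
        self._save_dir(entries)

    def read(self, name):
        return self.store.get(self._load_dir()[name])

    def rename(self, old, new):
        # Only the small directory object is rewritten; the data object
        # keeps its id and is never re-uploaded.
        entries = self._load_dir()
        entries[new] = entries.pop(old)
        self._save_dir(entries)


store = CountingStore()
fs = IndirectFS(store)
fs.write("old.xls", b"x" * 100_000)
puts_before = len(store.put_sizes)
fs.rename("old.xls", "new.xls")
rename_puts = store.put_sizes[puts_before:]
```

Here the rename costs a single small PUT regardless of file size. The price is that the objects in the bucket are no longer meaningful to anything that does not understand the directory format.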
And I have said in other threads that with the right cache system, it can survive for quite some time while S3 (or the network) is down. Of course non-cached data cannot be retrieved (but that is no different from using S3 only as a key/value object store; I still could not get them), but updates can continue and be cached until the link (and S3) is up again. And a 500 is something that is expected (I see them all the time in my log) and designed into the system. I have tested all these cases, including pulling the network cable in the middle.
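A write-back cache of the kind described can be sketched as below. This is a minimal illustration under stated assumptions, not the cache from any real tool: `FlakyStore` is a hypothetical remote that returns 500 while "down", and `WriteBackCache` simply queues writes and retries them in order on the next flush.

```python
from collections import OrderedDict


class FlakyStore:
    """Hypothetical remote store that can be switched 'down' to mimic outages."""

    def __init__(self):
        self.objects = {}
        self.up = True

    def put(self, key, body):
        if not self.up:
            return 500  # stand-in for a 500 response or a dead link
        self.objects[key] = body
        return 200


class WriteBackCache:
    def __init__(self, remote):
        self.remote = remote
        self.local = {}               # everything written so far
        self.pending = OrderedDict()  # key -> latest body not yet pushed

    def write(self, key, body):
        # Writes always succeed locally; the upload is deferred.
        self.local[key] = body
        self.pending[key] = body
        self.pending.move_to_end(key)

    def read(self, key):
        return self.local[key]

    def flush(self):
        """Push pending writes in order; stop at the first failure."""
        pushed = 0
        while self.pending:
            key, body = next(iter(self.pending.items()))
            if self.remote.put(key, body) != 200:
                break  # leave it queued and retry on the next flush
            del self.pending[key]
            pushed += 1
        return pushed


remote = FlakyStore()
cache = WriteBackCache(remote)

remote.up = False
cache.write("mail/1", b"hello")
cache.write("mail/2", b"world")
pushed_while_down = cache.flush()

remote.up = True
pushed_after_recovery = cache.flush()
```

Updates continue while the remote is down and drain once it is back. Note that between `write` and a successful `flush` the data exists only in the local cache, which is the window the reliability argument in this thread is about.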
Note that all of this is under the assumption that the data on S3 is not shared from multiple locations. That is a completely different issue, about which I have also voiced my concern.
|
Re: How do you Mount S3 on an External Linux Server?
Posted:
Aug 12, 2007 8:18 AM PDT
in response to: bonobo2000
|
I don't think the choices are as black and white as "make S3 look like a filesystem" or "rewrite the application from scratch". Certainly, making S3 look like a filesystem is the quickest way to get that IMAP server up and running. But you have to realize that by doing it that way you haven't really created an IMAP server with the reliability and durability of S3. You have created an IMAP server with lots of storage that happens to reside on S3. There's a big difference.
If you want an IMAP server that really leverages the reliability, scalability and durability of S3 you have to modify your IMAP server to understand the semantics of S3. Unless that IMAP server is an incredibly poorly written piece of software, that shouldn't require starting from scratch but it will take some work. However, I think the end result is vastly superior to the easy approach.
I'm not saying that the easy way is necessarily wrong. I'm just pointing out that if you go that way you are compromising the most important feature of S3: reliability.
Mitch
|
Re: How do you Mount S3 on an External Linux Server?
Posted:
Aug 12, 2007 8:26 AM PDT
in response to: Steve Amerige
|
In a nutshell, the issue is about backward compatibility, staff resource availability for re-coding, and QE time and effort. If S3 can only be used in non-backward-compatible applications, then its use is significantly limited.
|
Re: How do you Mount S3 on an External Linux Server?
Posted:
Aug 12, 2007 8:35 AM PDT
in response to: Steve Amerige
|
I don't think mounting S3 as a file system solves your QE problem. You've just introduced a whole new layer of software and that would require significant QE planning and effort.
Mitch
|
Re: How do you Mount S3 on an External Linux Server?
Posted:
Aug 12, 2007 10:18 AM PDT
in response to: M. Garnaat
|
mitch is right:
1) "filesystem or rewrite from scratch" is a false choice
2) making s3 *look* like a filesystem doesn't mean that it is one. if you have a lot of bandwidth and you test with small files you may even think it is, but if your bandwidth is low or happens to drop one day, or if you start putting large files on it, or one of your apps wants to be able to seek or rewrite a few blocks often, you will shortly find out that your testing problem just appeared to disappear.
|
Re: How do you Mount S3 on an External Linux Server?
Posted:
Aug 12, 2007 2:13 PM PDT
in response to: lowflyinghawk
|
I disagree. For the majority of applications, having a native file system will make the setup and maintenance of applications backed by S3 much easier. Having to re-architect your application to take advantage of S3 is far more time-consuming than using a prebuilt adaptive S3 file system such as our ElasticDrive.
We built ElasticDrive because of our own need for a variety of uses, ranging from distributed computing systems and content delivery / asset management to remote data backup, to name a few.
In our typical initial application development phase, we were doing a lot of the same tasks in architecting our applications to take advantage of S3. We were spending weeks working on relatively small portions of the overall application development. We quickly realized we needed an adaptive system that could be easily reused and modified based on the requirements of the application, project or service agreement. After doing our research, we determined there really wasn't anything out there that fulfilled all our needs, so we started development of our own distributed network block device that could be configured in a variety of ways.
Use Case
One of the most obvious uses of S3 is as a backup / disaster recovery system similar to a tape drive. There are literally dozens of commercial tape backup software systems to choose from, none of which are focused on S3. In looking at the various offerings, we eventually chose the open source Amanda solution. Amanda provides the unique capability of writing backups to tape and disk simultaneously, as well as a variety of other innovative features, and it's free. Amanda is extremely flexible, so we could have modified it to work with S3, but this would have taken 120 - 200 hours of development; by using ElasticDrive, we were able to get the backup running in a matter of hours.
Another quick Example
We currently have development servers both onsite and at offsite hosting facilities such as rackspace, EV1 and Amazon's EC2. None of them provides us with a great recovery solution; what we needed was a quick and easy way to recover our servers in case of an outage or system failure. Our solution was to mount ElasticDrive in a RAID configuration: all data is written both locally for speed and remotely for recovery. Bandwidth and requests are kept to a minimum by our use of both block-level compression and caching. If connectivity goes down, our adaptive cache waits and writes the data at a later point. We could even mount a local loopback for instance snapshots and file-level versioning using alternative file systems such as QCOW or EXT3COW.
I should also note that another benefit is that we could use ElasticDrive in conjunction with not only Amazon's S3 service but also other remote storage services such as AOL's X-drive, or even your own SAN environment.
Reuven Cohen
Chief Technologist, Enomaly Inc
Links
http://www.elasticDrive.com - S3 Network Block Device
http://amanda.zmanda.com - Backup Application
http://www.enomalyLabs.com - Enomaly R&D
|
Re: How do you Mount S3 on an External Linux Server?
Posted:
Aug 12, 2007 3:57 PM PDT
in response to: M. Garnaat
|
> I don't think mounting S3 as a file system solves your QE problem. You've just introduced a whole new layer of software and that would require significant QE planning and effort.
> Mitch
True, but that one layer is still much simpler than redoing the 10+ layers on top of it that people have spent years testing and refining.