Thread: Persistant storage - FTP vs NBD vs ...?

Replies: 15 - Last Post: Jun 4, 2008 8:29 PM by: MarkHu
infiniteftp

Posts: 90
Registered: 6/25/07
Persistant storage - FTP vs NBD vs ...?
Posted: Aug 10, 2007 8:33 AM PDT

So here's the great question: which persistent storage solutions are available, and which are appropriate for various situations? My thoughts are posted separately.




infiniteftp

Posts: 90
Registered: 6/25/07
Re: Persistant storage - FTP vs NBD vs ...?
Posted: Aug 10, 2007 2:07 PM PDT   in response to: infiniteftp

Here are my thoughts, not intended to be trolling or an endorsement for any particular product.

Aside from mirroring utilities or distributed file systems requiring a cluster of machines, I am aware of only three solutions to persistent storage on EC2:

FUSE-based:  s3fuse
NBD-based: ElasticDrive
FTP-based: InfiniteFTP

If you don't know what FUSE is: it's a utility that makes something (like S3) look like a file system when it's really not.

If you don't know what NBD is: it's a utility that makes something (like S3) look like a raw, unformatted, physical disk when it's really not. From there, you can format the disk using whatever file system you need.

If you don't know what FTP is: it's a utility that transfers files to and from a remote source (like S3).

I haven't tried s3fuse and don't know anyone who has. I also have not researched it, sorry. I have mounted InfiniteFTP using FUSE. It works well.

I have looked at NBD vs FTP/FUSE and the big difference I see comes down to whether you need random access IO to your files. If you do, NBD will work better because it can pull individual blocks, which are probably smaller than whole files and therefore faster to fetch.
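
To make the block-versus-file point concrete, here is a minimal sketch of how a block device maps a byte range onto block numbers. The 4096-byte block size and the function name are assumptions for illustration, not something any product in this thread is known to use.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes; real devices may differ


def blocks_for_range(offset: int, length: int, block_size: int = BLOCK_SIZE) -> range:
    """Return the block indices that cover bytes [offset, offset + length)."""
    if length <= 0:
        return range(0)
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return range(first, last + 1)


# A 100-byte read in the middle of a large file only needs the blocks that
# cover it; a whole-file store would have to fetch the entire object.
print(list(blocks_for_range(10_000, 100)))
```

With a whole-file store such as FTP, the same 100-byte read costs a transfer of the full file.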

Random access sounds good, but you lose some things due to the nature of disk volumes and S3 limitations. Some of these can be mitigated. Here is my list of what you give up to get random IO:

* File size limits
* Path size restrictions, which file systems impose
* Volume size limits

Those are the easy ones to spot and maybe aren't even big problems since we live with them every day already. Here are some harder problems, especially on S3:

* Defrag. Common file systems on Linux like ext2 need it, and it would be extremely expensive in S3 charges. Moreover, some file systems need to be taken offline to perform a defrag.
* Expansion. File systems can't expand infinitely like S3. Some file systems allow you to grow the partition as needed, but this may require you to take the file system offline. Eventually, you will hit a volume size limit and then you're done. No infinite storage here. You can try volume spanning and distributed systems, but I have no expertise in that, nor do most users. It's back to a life of filling up disks and pushing data around, but now you're doing so over S3, which is relatively slow.
* Disaster. NBD seems to be at risk of a catastrophic crash. There is a great debate about this in a separate thread, but the reality is that data written to S3 may not appear immediately and you cannot afford to lose the wrong block on your hard disk. The worst that happens with something like FTP is you lose the file you tried to write. Worst case on NBD is you lose a block of your allocation table and your whole volume is ruined. The best data we have so far on that scenario is "it hasn't happened to me yet".
* Space economy. Common file systems on Linux must pre-allocate storage space. This means you're paying for unused disk space on S3. Say you allocated a 500GB volume but are only using 100GB. You're paying for all 500GB.
* Initialization. File systems must be formatted first. A user reported this takes hours because S3 data transfer speed is about 10MByte/s, not to mention the cost.
* Exclusive access. NBD requires exclusive access by one machine. No sharing a file area.
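
A rough worked example of the space economy and initialization points, using the 500GB/100GB volume and the 10MByte/s figure from the list above. The storage price is an assumed, illustrative number (not a quoted rate), and 1 GB is taken as 1000 MB.

```python
ALLOCATED_GB = 500         # volume size from the example above
USED_GB = 100              # space actually in use, from the example above
THROUGHPUT_MB_S = 10       # transfer speed reported above
PRICE_PER_GB_MONTH = 0.15  # ASSUMPTION: illustrative storage price in USD


def monthly_storage_cost(gb: float, price: float = PRICE_PER_GB_MONTH) -> float:
    """Monthly storage charge for `gb` gigabytes at a flat per-GB price."""
    return gb * price


def hours_to_write(gb: float, mb_per_s: float = THROUGHPUT_MB_S) -> float:
    """Hours needed to push `gb` gigabytes at a sustained `mb_per_s`."""
    return gb * 1000 / mb_per_s / 3600


wasted = monthly_storage_cost(ALLOCATED_GB) - monthly_storage_cost(USED_GB)
print(f"paying for unused space: ${wasted:.2f} per month")
print(f"writing every block once: {hours_to_write(ALLOCATED_GB):.1f} hours")
```

At 10MByte/s, touching every block of a 500GB volume takes roughly 13.9 hours, which is consistent with the "takes hours" report.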

So those are harder problems but not impossible problems. Ways to mitigate some of the drawbacks:

* Some less common file systems do not need to pre-allocate space.
* Some less common file systems do not store empty blocks
* Some less common file systems do not need to defrag
* Some less common file systems can do a 'quick' format which does not initialize everything with 0's
* You can do concurrent access by running SAMBA or NFS on top of your file system. Whichever machine has exclusive access to the NBD store becomes the file server.
* You can run software RAID on top of NBD to get redundancy. In theory this would make it less likely that S3 would lose a key block of data that would crash your whole file system.

In summary, if you can find The Perfect File System, you can work around the hard problems and just be left with problems that might not matter, like file size and volume size limitations.

I'm intrigued by NBD. There are a lot of use cases that don't approach the outer limits of plain old file systems. Have I missed anything, or is that a fair analysis?

Disclosure: I make InfiniteFTP, which provides FTP access to Amazon S3. Other users on this board, like Enomaly, have created NBD solutions which provide a virtual block storage device backed by S3. Still others make FUSE and mirroring tools. If you do respond and are affiliated with a product, it would be helpful to disclose.


---
Ben
http://www.infiniteftp.info
FTP access to Amazon S3 - $4.95 monthly


bonobo2000

Posts: 183
Registered: 12/20/06
Re: Persistant storage - FTP vs NBD vs ...?
Posted: Aug 10, 2007 10:04 PM PDT   in response to: infiniteftp

I have written (and use) all three forms.

A FUSE-based file system using S3 as the backend store (still treating S3 as a block device, because of the need for random access). This one does not need formatting and can be expanded infinitely without taking it offline. Nor does it need defragmenting. It does get slow and expensive when there are lots of metadata updates (e.g. tar, or use for Maildir).

An NBD device, on which any FS or other app that expects a Linux block device can run.

A FUSE-based, Git-specific FS. This is very similar to FTP in that it can only GET/PUT, not rewrite or resize, so every object is atomic and the directory is more or less a 1:1 mapping to S3 keys. I can even use S3 as a read-only Git repository (Git uses only HTTP GET to access a repo) without running my own web server.

It really boils down to the usage scenario.

However, there are some things I don't agree with, especially about NBD and FS.

A proper implementation of an NBD-like device doesn't need to pre-allocate any space on S3. A 404 status can be treated as "a block's worth of nulls for this block"; the same trick is used in many Linux file systems. The same is true for an S3-based FS.
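
A minimal sketch of the "404 means a block of nulls" idea. A plain dictionary stands in for the S3 bucket, and the class name, key layout and block size are all hypothetical; a real device would issue HTTP GET/PUT/DELETE instead.

```python
class SparseBlockStore:
    """Block store that never stores all-zero blocks (hypothetical sketch)."""

    def __init__(self, block_size: int = 4096):
        self.block_size = block_size
        self._objects = {}  # stands in for an S3 bucket: key -> bytes

    def _key(self, index: int) -> str:
        return f"block/{index:012d}"  # assumed key layout, one object per block

    def read_block(self, index: int) -> bytes:
        data = self._objects.get(self._key(index))
        if data is None:
            # The equivalent of an HTTP 404: treat it as a block of nulls.
            return bytes(self.block_size)
        return data

    def write_block(self, index: int, data: bytes) -> None:
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        key = self._key(index)
        if data.count(0) == len(data):
            # An all-zero block is represented by the absence of an object.
            self._objects.pop(key, None)
        else:
            self._objects[key] = bytes(data)

    def stored_blocks(self) -> int:
        return len(self._objects)


store = SparseBlockStore()
store.write_block(7, b"\x01" * 4096)
print(store.stored_blocks(), store.read_block(0) == bytes(4096))
```

Only blocks that hold data occupy (and are billed as) storage; an untouched 500GB volume would store nothing.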

With planning, you don't need to move things around to expand: use LVM2. Start another NBD instance, then add it to the VG.

Initialization of an FS on an NBD device also depends on how the device is implemented. Not every FS needs to touch every single block just to format. My experiment with ext3 (which is not a less common FS but the most used one) showed that it only needs to write to specific blocks (a very small percentage). Even if it did touch every block, just use zlib to compress each block (almost all null) and it would be a very short PUT, though it would cost a bit because of the new price model.
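
A quick sketch of the zlib point, assuming a 4096-byte block (the block size and helper names are illustrative). It shows that an all-null block shrinks to a few dozen bytes at most before it would be PUT.

```python
import zlib

BLOCK_SIZE = 4096  # assumed block size in bytes


def pack_block(data: bytes) -> bytes:
    """Compress a block before uploading it."""
    return zlib.compress(data)


def unpack_block(payload: bytes) -> bytes:
    """Restore the original block after downloading it."""
    return zlib.decompress(payload)


empty = bytes(BLOCK_SIZE)
packed = pack_block(empty)
print(len(empty), "bytes ->", len(packed), "bytes")
```

The request itself is still made (and charged) per block, which is the per-request cost mentioned above.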

Reliability: I have already voiced my view. It is at best a "theoretically could happen" case that can be shielded with proper caching.


addady

Posts: 29
Registered: 8/18/07
Re: Persistant storage - FTP vs NBD vs ...?
Posted: Aug 18, 2007 2:19 PM PDT   in response to: infiniteftp



infiniteftp wrote:
* Expansion. File systems can't expand infinitely like S3. Some file systems allow you to grow the partition as needed, but this may require you to take the file system offline. Eventually, you will hit a volume size limit and then you're done. No infinite storage here. You can try volume spanning and distributed systems, but I have no expertise in that, nor do most users. It's back to a life of filling up disks and pushing data around, but now you're doing so over S3, which is relatively slow.
* Space economy. Common file systems on Linux must pre-allocate storage space. This means you're paying for unused disk space on S3. Say you allocated a 500GB volume but are only using 100GB. You're paying for all 500GB.
* Initialization. File systems must be formatted first. A user reported this takes hours because S3 data transfer speed is about 10MByte/s, not to mention the cost.
* Defrag. Common file systems on Linux like ext2 need it, and it would be extremely expensive in S3 charges. Moreover, some file systems need to be taken offline to perform a defrag.

Hi,

I'm looking for a solution to connect EC2 to S3. It seems that Jungle Disk (jungledisk.com) doesn't have the above limitations.
As far as I understand, it mounts the drive with DavFS and I don't need to specify the size of the partition. I will pay for exactly the volume I consume.

What are the disadvantages of a solution like Jungle Disk compared to s3fuse and ElasticDrive?

Thanks,
Addady


infiniteftp

Posts: 90
Registered: 6/25/07
Re: Persistant storage - FTP vs NBD vs ...?
Posted: Aug 18, 2007 3:20 PM PDT   in response to: addady

Proxies like InfiniteFTP and JD can expand indefinitely. s3fuse might too, I'm not sure? ElasticDrive is an NBD device and therefore size is limited by the file system format you use.

---
Ben
http://www.infiniteftp.info - FTP access to Amazon S3
http://www.infinitebits.info - Web access to Amazon S3


infiniteftp

Posts: 90
Registered: 6/25/07
Re: Persistant storage - FTP vs NBD vs ...?
Posted: Aug 18, 2007 3:24 PM PDT   in response to: bonobo2000


bonobo2000 wrote:
A proper implementation of an NBD-like device doesn't need to pre-allocate any space on S3. A 404 status can be treated as "a block's worth of nulls for this block"; the same trick is used in many Linux file systems. The same is true for an S3-based FS.

I agree with that. I don't know that it would work all the time, but it sounds like a viable optimization strategy.






bonobo2000

Posts: 183
Registered: 12/20/06
Re: Persistant storage - FTP vs NBD vs ...?
Posted: Aug 18, 2007 5:14 PM PDT   in response to: addady


addady wrote:
What are the disadvantages of a solution like Jungle Disk compared to s3fuse and ElasticDrive?


Performance. HTTP (which WebDAV is based on) is a very expensive protocol that was not designed for the regular usage pattern of an FS, with its frequent but small reads/writes and metadata manipulation.

To the point that the following chain is faster than WebDAV:

Windows -> Samba share -> s3-fuse/NBD based FS

Besides, the WebDAV client is somehow not a fully tested thing, on either Windows or Linux, so many applications encounter problems here and there if the underlying FS is WebDAV.

Of course, if all you want is backup/restore-like usage, it doesn't matter.

enomaly

Posts: 444
Registered: 9/3/06
Re: Persistant storage - FTP vs NBD vs ...?
Posted: Aug 19, 2007 1:10 PM PDT   in response to: bonobo2000

Why did we select a network block device instead of FUSE?

1. RAID support - RAID isn't supported on FUSE- or FTP-based systems. It is with NBD.
2. Support - Just about every Linux OS since 1996 supports it.
3. Performance - NBD is far more customizable and can be configured to use multiple protocols, interfaces and file systems such as iSCSI, RAID, ext, XFS, ZFS, etc.
4. Security - FTP can be easily hacked, is slow, and has poor error reporting.
5. Caching and multi-threading - We're able to greatly increase speed via adaptive caching and multi-threaded read/write.

I would say the main reason is purely RAID support. It's also nice that we can write to several different storage systems at once, for example onsite, offsite (S3), and an alternative site (xdrive). We could even write to a local loopback device and provide block-level versioning with near-instant rollbacks and file system snapshots.

Reuven
http://www.elasticdrive.com


bonobo2000

Posts: 183
Registered: 12/20/06
Re: Persistant storage - FTP vs NBD vs ...?
Posted: Aug 19, 2007 5:32 PM PDT   in response to: enomaly

While I agree with your other points, caching and multi-threading have nothing to do with whether it is FTP, FUSE, WebDAV or NBD. All of them can employ these two techniques.

As for RAID, while it is definitely better with the NBD approach (I have said in other threads that not re-inventing the wheel is one of the ways to get better reliability), the same can be added for FTP/FUSE etc. too. Just write to a separate S3 bucket (or another service provider). I experimented with it a bit using my dreamhost account, and now that xdrive has opened up their API, I may play with that too.

In fact, after the recent price change, I am thinking that instead of using S3 it may be better for me to use multiple el cheapo hosting services like dreamhost + bluehost, which give you insane space for a very low fixed monthly fee and with no access cost. RAID over the net. Two or more less reliable services may beat S3.

And I have the added advantage that I can use other Linux tools to manipulate data on these accounts, unlike S3, where I would need to pay heavily for, say, a RAID rebuild.

enomaly

Posts: 444
Registered: 9/3/06
Re: Persistant storage - FTP vs NBD vs ...?
Posted: Aug 19, 2007 6:48 PM PDT   in response to: bonobo2000

You're right; for us NBD was the most straightforward solution. Re-inventing the wheel is not something we wanted to do.

reuven


addady

Posts: 29
Registered: 8/18/07
Re: Persistant storage - FTP vs NBD vs ...?
Posted: Aug 21, 2007 12:07 AM PDT   in response to: bonobo2000


bonobo2000 wrote:
In fact, after the recent price change, I am thinking that instead of using S3 it may be better for me to use multiple el cheapo hosting services like dreamhost + bluehost, which give you insane space for a very low fixed monthly fee and with no access cost. RAID over the net. Two or more less reliable services may beat S3.

How can you implement RAID over remote storage like dreamhost or bluehost? They give you only FTP/WebDAV access.

Addady




bonobo2000

Posts: 183
Registered: 12/20/06
Re: Persistant storage - FTP vs NBD vs ...?
Posted: Aug 21, 2007 12:36 AM PDT   in response to: addady


addady wrote:
How can you implement RAID over remote storage like dreamhost or bluehost? They give you only FTP/WebDAV access.




I mean I use bluehost/dreamhost as the poor man's S3.

Firstly, S3 is nothing but GET/PUT with authentication using HMAC-SHA1, and WebDAV also supports GET/PUT. You can use FTP too, but that is an even worse protocol than HTTP GET/PUT.

Secondly, I can run my own FastCGI process, which can even mimic the HMAC-SHA1 behaviour. Personal S3.
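
As an illustration of what mimicking the signing involves, here is a simplified sketch of an HMAC-SHA1 request signature in the style S3 uses: the verb, Content-MD5, Content-Type, date and resource are joined with newlines, signed with the secret key, and Base64-encoded. It leaves out the canonicalized `x-amz-` headers, and the function names, key and paths are hypothetical, so treat it as a sketch rather than a drop-in implementation.

```python
import base64
import hashlib
import hmac


def sign_request(secret_key: str, verb: str, resource: str, date: str,
                 content_md5: str = "", content_type: str = "") -> str:
    """Return a Base64 HMAC-SHA1 signature over a newline-joined string."""
    string_to_sign = "\n".join([verb, content_md5, content_type, date, resource])
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def verify_request(secret_key: str, signature: str, verb: str, resource: str,
                   date: str, content_md5: str = "", content_type: str = "") -> bool:
    """Server side: recompute the signature and compare in constant time."""
    expected = sign_request(secret_key, verb, resource, date,
                            content_md5, content_type)
    return hmac.compare_digest(expected, signature)


# Hypothetical key and resource, for illustration only.
sig = sign_request("example-secret", "GET", "/s3like/chunks/0001",
                   "Tue, 21 Aug 2007 07:36:00 GMT")
print(verify_request("example-secret", sig, "GET", "/s3like/chunks/0001",
                     "Tue, 21 Aug 2007 07:36:00 GMT"))
```

A server process on the hosting account would run the `verify_request` half against the secret it shares with the client.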

Performance is not an issue as the bottleneck would be my link anyway.

By RAID, I mean instead of GET/PUT only to S3, I can have multiple GET/PUT of the same data to bluehost + dreamhost + other similar el cheapo hosting.

Or in the context of NBD, I can have

/dev/nbd0 -> dreamhost/data/<chunks>
/dev/nbd1 -> bluehost/data/<chunks>

then let the md driver take care of the rest (over nbd0, nbd1, ...).
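
The application-level variant (multiple PUTs of the same data, GET from whichever copy answers) can be sketched as below. Dictionaries stand in for the remote hosts, and the class and method names are hypothetical; this is plain mirroring, not what the md driver does internally.

```python
class MirroredStore:
    """Write every object to all backends; read from the first that has it."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self.backends = list(backends)

    def put(self, key: str, data: bytes) -> None:
        # Mirror the write to every backend (a PUT per provider).
        for backend in self.backends:
            backend[key] = data

    def get(self, key: str) -> bytes:
        # Fall through to the next mirror if one has lost the object.
        for backend in self.backends:
            if key in backend:
                return backend[key]
        raise KeyError(key)


dreamhost, bluehost = {}, {}   # stand-ins for two hosting accounts
store = MirroredStore([dreamhost, bluehost])
store.put("chunks/0001", b"payload")
del dreamhost["chunks/0001"]   # simulate one provider losing the object
print(store.get("chunks/0001"))
```

The read still succeeds after one copy disappears, which is the property that lets two less reliable services stand in for one more reliable one.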

I do this to test out my ideas about S3 for free as I already have a dreamhost account.


derek anderson

Posts: 36
Registered: 4/4/07
Re: Persistant storage - FTP vs NBD vs ...?
Posted: Aug 21, 2007 9:40 AM PDT   in response to: infiniteftp


infiniteftp wrote:
Proxies like InfiniteFTP and JD can expand indefinitely. s3fuse might too, I'm not sure? ElasticDrive is an NBD device and therefore size is limited by the file system format you use.






Yeah, we (ElasticDrive) are currently limited to around 4 terabytes per file system, although formatting something that big takes a LONG time (all day on loaded systems).



derek anderson

Posts: 36
Registered: 4/4/07
Re: Persistant storage - FTP vs NBD vs ...?
Posted: Aug 21, 2007 9:44 AM PDT   in response to: bonobo2000

This is really interesting. Are you able to use the same libraries for both, and just change the target hostname? I ask because the newest version of ElasticDrive (not released yet) allows you to configure by URL. The URL is static for S3, but we could allow other URLs to be used, which would allow S3-workalike functionality to target any hosting environment.






bonobo2000

Posts: 183
Registered: 12/20/06
Re: Persistant storage - FTP vs NBD vs ...?
Posted: Aug 21, 2007 5:47 PM PDT   in response to: derek anderson


derek@chargedmultimedia.com wrote:
This is really interesting. Are you able to use the same libraries for both, and just change the target hostname? I ask because the newest version of ElasticDrive (not released yet) allows you to configure by URL. The URL is static for S3, but we could allow other URLs to be used, which would allow S3-workalike functionality to target any hosting environment.


Since I don't use any library (I do strict HTTP GET/PUT/HEAD/DELETE) and control the server process on dreamhost, yes. Just replacing the URL is enough. Something like the following:

ROOT=http://my_dreamhost_domain/s3like/

vs

ROOT=http://s3.amazonaws.com/my_bucket/

For my use case, I don't need to emulate the XML stuff for the LIST operation, which simplifies things a bit.
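
A minimal sketch of why swapping ROOT is enough when every operation is a plain HTTP request against ROOT plus a key. The helper name and the key layout are hypothetical; only the two ROOT values come from the post above.

```python
def object_url(root: str, key: str) -> str:
    """Build the URL for one object under a configurable ROOT."""
    return root.rstrip("/") + "/" + key.lstrip("/")


ROOTS = {
    "dreamhost": "http://my_dreamhost_domain/s3like/",
    "s3": "http://s3.amazonaws.com/my_bucket/",
}

# The same key resolves against either backend; nothing else changes.
for name, root in ROOTS.items():
    print(name, object_url(root, "chunks/0001"))
```

The GET/PUT/HEAD/DELETE code stays identical; only the configured root differs between the two targets.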

