Discussion Forums



Thread: Database Functionality: What Do You Need?




Replies: 148 - Pages: 10 - Last Post: Feb 2, 2008 4:52 PM by: mohanjith
Martin@AWS

Posts: 9
Registered: 2/9/06
Database Functionality: What Do You Need?
Posted: Sep 11, 2006 11:59 AM PDT
 

We've had multiple requests for "database functionality" and we'd like to get a clearer picture of what that means from a functionality perspective:

- What kind of actions and functionality do you have in mind? Would this be performing simple queries against a dataset, or do you want to do more complex operations such as mathematical calculations (multiply, divide, average), record manipulations, etc.? We have a sense that what most people use MySQL for is name-value pair lookups...

- Are you comfortable running databases in EC2 (using clusters of MySQL instances mirroring record metadata between them and S3 for record storage), or do you need a separate, more robust, reliable service?  If you are ok with running this in EC2, what characteristics should an instance optimized to host MySQL have?

- What sorts of read / write speeds would you need to see for this to be usable for your application?

Please include any other parameters I've overlooked.


Glenn I. Fleishman
RealName(TM)


Posts: 13
Registered: 3/12/06
Re: Database Functionality: What Do You Need?
Posted: Sep 11, 2006 1:02 PM PDT   in response to: Martin@AWS
 

This is a great question.

For me, I use MySQL for three somewhat distinct purposes.

Relatively complex queries from a static set of databases that changes perhaps on a weekly basis (as a complete replacement or update that could be loaded from elsewhere).

Relatively simple insertions for certain kinds of transactions that are used for state-management or simple storage. So, for instance, an ID that's stored in a cookie for a user to track several variables across their use of a Web site. Or, the retrieval of pricing information that is a single INSERT as a row in a single table.

Somewhat more complicated arrangements, in which a query can result in some changes to tables and requires joins or selects across multiple tables to produce a result, where maybe 50 database queries produce a single Web page presentation.
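The second use above, per-user state keyed by an ID held in a cookie, is essentially a name-value lookup. A minimal sketch, using Python's built-in `sqlite3` module as a stand-in for MySQL; the `session_vars` table and the helper functions are hypothetical:

```python
import sqlite3
import uuid
from typing import Optional

def open_store() -> sqlite3.Connection:
    # In-memory SQLite database standing in for the MySQL server (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE session_vars ("
        " session_id TEXT NOT NULL,"
        " name TEXT NOT NULL,"
        " value TEXT,"
        " PRIMARY KEY (session_id, name))"
    )
    return conn

def new_session() -> str:
    # The opaque ID that would be stored in the user's cookie.
    return uuid.uuid4().hex

def set_var(conn: sqlite3.Connection, session_id: str, name: str, value: str) -> None:
    # INSERT OR REPLACE acts as an upsert on the (session_id, name) primary key.
    with conn:  # commits the single-row write
        conn.execute(
            "INSERT OR REPLACE INTO session_vars (session_id, name, value) VALUES (?, ?, ?)",
            (session_id, name, value),
        )

def get_var(conn: sqlite3.Connection, session_id: str, name: str) -> Optional[str]:
    row = conn.execute(
        "SELECT value FROM session_vars WHERE session_id = ? AND name = ?",
        (session_id, name),
    ).fetchone()
    return row[0] if row else None

conn = open_store()
sid = new_session()
set_var(conn, sid, "cart_items", "3")
set_var(conn, sid, "cart_items", "4")  # overwrites the earlier value
print(get_var(conn, sid, "cart_items"))  # 4
```

Each request touches a single row by primary key, which is why this kind of workload is cheap to host generically.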

For the first and second, I suspect that the cost and complexity are enormously lower for you guys to run generic, tuned MySQL instances that are paid by CPU time/transaction, than for me and 1,000 other people to duplicate that with our own images. Right now, I have a $4,000+ server that only does MySQL, and it's well tuned, but from my perspective (even though I own and operate the hardware) it's a black box that talks MySQL. The hardware is irrelevant.

Typically, very high memory is key. I'd like an option of a MySQL server with 4 GB to 8 GB, for instance. I have been told that being able to load entire databases into memory is a key aspect of performance with MySQL. So if it were instance based, I might wind up running three 4 GB instances, one for each large table set (where tables don't need to join across servers).

I expect that you would need to price this based on tuning profile: for read-only MySQL with a certain complexity, you charge $X; for read/write with simulation of a certain kind of machine, you charge $Y; and so forth.

I complete several million transactions per day, but I haven't bothered to measure read/write speed. Since I can pull stats from my current setup, what would you like to see?


pieter334

Posts: 1
Registered: 9/11/06
Re: Database Functionality: What Do You Need?
Posted: Sep 11, 2006 2:05 PM PDT   in response to: Martin@AWS
 

I think it would be nice to have some kind of reliability. I assume that when running a DB server on an instance in the cloud, it will be very hard to use the S3 storage service as the filesystem for the database. That implies that when all instances crash, you lose your data. Or, even when you do manage to use S3 as a filesystem, it should be possible to save at least the database's transaction log directly to S3. That could be problematic performance-wise, I think.

To summarize, I think it is of utmost importance to be able to preserve the database's data across instance crashes or restarts.

I don't see how this is possible today.
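To make the transaction-log idea concrete, here is a minimal sketch of write-ahead logging to durable storage followed by replay on a fresh instance. A plain Python dict stands in for an S3 bucket, and the `txlog/<sequence>` key layout is an assumption; a real version would issue one S3 PUT per commit, which is exactly where the performance concern comes in:

```python
import json
from typing import Dict, Optional

# A plain dict stands in for an S3 bucket (assumption: no real S3 API is called here).
bucket: Dict[str, bytes] = {}

def log_key(seq: int) -> str:
    # Zero-padded so that sorting keys lexicographically matches sequence order.
    return f"txlog/{seq:010d}"

class Database:
    def __init__(self) -> None:
        self.rows: Dict[str, str] = {}
        self.seq = 0

    def commit(self, key: str, value: Optional[str]) -> None:
        # Persist the log record first, then apply it locally (write-ahead).
        # value=None means "delete this key".
        self.seq += 1
        record = {"seq": self.seq, "key": key, "value": value}
        bucket[log_key(self.seq)] = json.dumps(record).encode()
        self._apply(record)

    def _apply(self, record: dict) -> None:
        if record["value"] is None:
            self.rows.pop(record["key"], None)
        else:
            self.rows[record["key"]] = record["value"]

    @classmethod
    def recover(cls) -> "Database":
        # A replacement instance rebuilds its state by replaying the log in order.
        db = cls()
        for k in sorted(k for k in bucket if k.startswith("txlog/")):
            record = json.loads(bucket[k])
            db._apply(record)
            db.seq = record["seq"]
        return db

db = Database()
db.commit("user:1", "alice")
db.commit("user:2", "bob")
db.commit("user:1", None)
restored = Database.recover()  # as if the original instance had crashed
print(restored.rows)  # {'user:2': 'bob'}
```

Recovery time grows with the length of the log, so in practice the log would be paired with periodic full snapshots.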

kind regards,

Pieter

enomaly

Posts: 444
Registered: 9/3/06
Re: Database Functionality: What Do You Need?
Posted: Sep 11, 2006 2:35 PM PDT   in response to: Martin@AWS
 

We have three typical setups: a standalone MySQL setup for low-volume sites, a high-usage (replicated) setup, and a high-availability (load-balanced) setup.

Our MySQL replication setup keeps an exact copy of a database from a master server on another server (the slave): all updates to the database on the master server are replicated almost immediately to the database on the slave server, so that both databases stay in sync. This is not a backup policy, because an accidentally issued DELETE command will also be carried out on the slave; replication can, however, help protect against hardware failures.
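Why replication is not a backup can be shown with a toy model in which the slave replays the master's ordered change log. This is a sketch under simplifying assumptions (dicts stand in for tables, and the `Master`/`Slave` classes are hypothetical), not MySQL's actual binary-log protocol:

```python
from typing import Dict, List, Optional, Tuple

Change = Tuple[str, str, Optional[str]]  # (operation, key, value)

def apply(table: Dict[str, str], op: str, key: str, value: Optional[str] = None) -> None:
    # Apply one logged change to a table.
    if op == "INSERT":
        table[key] = value
    elif op == "DELETE":
        table.pop(key, None)
    else:
        raise ValueError(f"unknown op {op!r}")

class Master:
    def __init__(self) -> None:
        self.table: Dict[str, str] = {}
        self.binlog: List[Change] = []  # ordered change log, loosely modelled on the binary log

    def execute(self, op: str, key: str, value: Optional[str] = None) -> None:
        apply(self.table, op, key, value)
        self.binlog.append((op, key, value))

class Slave:
    def __init__(self, master: Master) -> None:
        self.master = master
        self.table: Dict[str, str] = {}
        self.position = 0  # how far into the master's log this slave has read

    def catch_up(self) -> None:
        # Replay every change not yet seen, in the master's order.
        for op, key, value in self.master.binlog[self.position:]:
            apply(self.table, op, key, value)
        self.position = len(self.master.binlog)

master = Master()
slave = Slave(master)
master.execute("INSERT", "order:1", "paid")
master.execute("INSERT", "order:2", "shipped")
slave.catch_up()
master.execute("DELETE", "order:1")  # the accidental DELETE
slave.catch_up()
# slave.table now matches master.table: order:1 is gone on both servers.
```

The slave protects against losing the master's hardware, but it faithfully copies every mistake, so a separate point-in-time backup is still needed.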

The other option is a clustered approach using a MySQL 5 cluster with three nodes: two storage nodes and one management node. This cluster is load-balanced by a high-availability load balancer that has two nodes using the Ultra Monkey package, which provides heartbeat (for checking whether the other node is still alive) and ldirectord (to split up the requests among the nodes of the MySQL cluster).
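The request-splitting part can be sketched as a balancer that only routes to nodes passing a health check. This is a toy model, not ldirectord's implementation: it assumes a plain round-robin policy (LVS offers several scheduling policies), and the health-check callback and node names are hypothetical:

```python
import itertools
from typing import Callable, List

class Balancer:
    """Round-robin over backends that currently pass a health check."""

    def __init__(self, nodes: List[str], is_alive: Callable[[str], bool]) -> None:
        self.nodes = nodes
        self.is_alive = is_alive  # hypothetical health-check callback
        self._cycle = itertools.cycle(nodes)

    def pick(self) -> str:
        # Try each node at most once per request; skip nodes failing the check.
        for _ in range(len(self.nodes)):
            node = next(self._cycle)
            if self.is_alive(node):
                return node
        raise RuntimeError("no healthy MySQL nodes")

status = {"sql-a": True, "sql-b": True}
lb = Balancer(["sql-a", "sql-b"], lambda node: status[node])
first, second = lb.pick(), lb.pick()  # alternates: sql-a, then sql-b
status["sql-a"] = False               # health check now fails for sql-a
third = lb.pick()                     # skips sql-a, returns sql-b
```

The balancer itself is then a single point of failure, which is why the setup above runs two balancer nodes watching each other via heartbeat.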

reuven cohen
www.enomaly.com



Brad Clements

Posts: 19
Registered: 3/15/06
Re: Database Functionality: What Do You Need?
Posted: Sep 11, 2006 3:12 PM PDT   in response to: Martin@AWS
 

I guess I'm probably one of the few who has not been overly impressed with MySQL over the years.

I've always needed full ACID transactions. Since MySQL couldn't supply that in the early days, I never used it.

I wonder if DB2 could be put to use. Maybe IBM would offer some extra support to set up the provisioning and management system, in exchange for a logo somewhere.

I can't afford Oracle, and I think it'd be overkill.

What about Firebird 2.0? It's simple to set up and has good standards conformance.


I guess you'll get a lot of opinions on which database to support, if you were going to support one. So instead here's my thoughts on specific features a database service should support:

1. good language support - drivers for Java, Python (and the other P languages), etc.

2. real transactions

3. Good SQL standards support (though I realize that really varies)

4. ability for EC2 users to create new databases via command line tools

5. snapshot backups would be nice; otherwise, some kind of backup functionality (log snapshot plus main db dump). Backups would be sent to S3?

6. Should be able to support multiple connections to the same database instance
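As a concrete illustration of what "real transactions" buys (item 2), here is a minimal sketch of an atomic transfer in which both updates commit together or not at all. It uses Python's built-in `sqlite3` as a stand-in for whichever engine a service would offer; the `accounts` table is made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])

def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    # The connection context manager commits on success and rolls back
    # if an exception escapes, so the two UPDATEs are atomic together.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))

transfer(conn, "alice", "bob", 30)
try:
    # The debit violates the CHECK constraint, so the credit to bob is undone too.
    transfer(conn, "alice", "bob", 500)
except sqlite3.IntegrityError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

A bare key-value store without multi-key transactions would leave bob credited after the failed second transfer.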


---

I think that a primitive key-value datastore wouldn't be that useful to me. If you're going to go through the effort of setting up an API and management tools, is it that much more difficult to provide a "real" DB?




Colin Percival
RealName(TM)


Posts: 279
Registered: 4/10/06
Re: Database Functionality: What Do You Need?
Posted: Sep 11, 2006 3:41 PM PDT   in response to: Martin@AWS
 

If Amazon started providing a MySQL database service, I wouldn't use it; I wouldn't trust it to be secure and reliable enough. Don't bite off more than you can chew, guys -- if you can't provide a service well (and MySQL doesn't count as providing database services well) then don't do it at all.

As I've mentioned before, the only persistent storage I think EC2 should provide is to allow an instance to store a "sequence number" which can be compared to a series of transaction logs stored on S3, to ensure that the variable propagation time of S3 doesn't result in transactions getting lost between one EC2 instance dying and the next one starting.
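The check a replacement instance would make can be sketched as follows: given the last committed sequence number from persistent storage, re-list the log segments until none are missing, and only then replay. The `txlog/<sequence>` key layout and the `list_keys` callback are assumptions standing in for a real S3 listing:

```python
import re
import time
from typing import Callable, Iterable, List

# Hypothetical key layout for log segments on S3: txlog/<zero-padded sequence number>.
LOG_KEY = re.compile(r"^txlog/(\d+)$")

def missing_logs(last_committed_seq: int, visible_keys: Iterable[str]) -> List[int]:
    """Sequence numbers up to last_committed_seq that the listing does not show yet."""
    seen = set()
    for key in visible_keys:
        m = LOG_KEY.match(key)
        if m:
            seen.add(int(m.group(1)))
    return [n for n in range(1, last_committed_seq + 1) if n not in seen]

def wait_for_logs(last_committed_seq: int,
                  list_keys: Callable[[], Iterable[str]],
                  attempts: int = 10,
                  delay_s: float = 1.0) -> bool:
    """Re-list until every committed log is visible; only then is replay safe."""
    for _ in range(attempts):
        if not missing_logs(last_committed_seq, list_keys()):
            return True
        time.sleep(delay_s)
    return False

listing = ["txlog/0000000001", "txlog/0000000002", "txlog/0000000004"]
print(missing_logs(4, listing))  # [3]
```

Without the stored sequence number, the new instance could not tell a not-yet-visible segment 4 from one that was never written.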

Martin@AWS

Posts: 9
Registered: 2/9/06
Re: Database Functionality: What Do You Need?
Posted: Sep 11, 2006 4:59 PM PDT   in response to: Glenn I. Fleish...
 

Any instrumentation data you have on current database usage would be helpful.


William Reese
RealName(TM)

Posts: 1
Registered: 9/11/06
Re: Database Functionality: What Do You Need?
Posted: Sep 11, 2006 8:00 PM PDT   in response to: Martin@AWS
 

My interest in Amazon EC2 and S3 is for hosting web sites.  Typically these would begin as a single server setup that would eventually scale into a number of web, app, and db servers.

It would be nice if you could create a persistent filesystem or block device of a certain size via web services, then mount it from a single instance.  Then if the instance failed, you could kill it off, start a new one, and mount your same filesystem in the new instance.  Not all instances would need a persistent file system, but it would be nice to have as an option (or additional service) especially for database servers.  You could charge for the disk space used and the amount of IO performed.  The closest solution I can come up with on the current system is to use LVM to take snapshots and to back those snapshots up in S3.  That could work for smaller applications, but it's by no means an ideal solution and you still have the 160GB limitation per instance.

The other thing I'm interested in is the ability to obtain a static IP for an instance or even better, a static IP for a group of instances with some kind of load balancing functionality configured via web services.

A persistent file system, static IPs, and basic load balancing would provide a complete solution for hosting most websites. In addition you would have the ability to scale on demand without any up front costs.  You would have every web startup knocking down your door for an EC2 account. :)

-- Will Reese



Daniel Morrison

Posts: 3
Registered: 9/6/06
Re: Database Functionality: What Do You Need?
Posted: Sep 11, 2006 10:56 PM PDT   in response to: William Reese
 

I agree with William: persistent storage is key, then I can run whatever DB I want.

Storing it on S3 would be ideal. I've looked into s3fs and other solutions and am not impressed yet.

Daniel Morrison

Posts: 3
Registered: 9/6/06
Re: Database Functionality: What Do You Need?
Posted: Sep 11, 2006 11:11 PM PDT   in response to: Daniel Morrison
 

To expand on my previous post: other than the 'alpha' status of s3fs, I don't like that I'm still stuck with a predefined volume size.

With S3, it seems like I could have a zfs-like filesystem where it can infinitely scale and I don't have to worry about how it works...kinda like what S3 was designed for, but in filesystem form.

S. Matzke
RealName(TM)


Posts: 323
Registered: 2/8/06
Re: Database Functionality: What Do You Need?
Posted: Sep 11, 2006 11:35 PM PDT   in response to: Martin@AWS
 

Hi Martin,

my take on "database functionality" is somewhat different. It would be more or less a distributed persistent index service provided by Amazon.

I store my objects mostly as coarse grained XML documents on Amazon S3.

So I would like to be able to create one or more indices as XPath expressions into those documents and then, when uploading an object to S3, specify which indices to use.

Another scenario that could work would be the ability to specify indices per S3 prefix (and bucket). The "Simple Index Service" would tap into S3 and put any document that was created/updated (with a prefix that was used in an index definition) automatically into the index.

The "Simple Index Service" would then allow running queries against the indices, maybe even combining multiple indices in one query, and return a set of S3 keys.
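A minimal in-memory sketch of such an index service, assuming indices are defined per key prefix and that the service is notified on every PUT. `SimpleIndexService` and its methods are hypothetical, and Python's `xml.etree.ElementTree` supports only a limited subset of XPath, so richer expressions would need a full XPath engine:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, Set, Tuple

class SimpleIndexService:
    """Hypothetical index service: path expressions over XML docs, queries return S3 keys."""

    def __init__(self) -> None:
        self.definitions: Dict[str, Tuple[str, str]] = {}  # index name -> (key prefix, path)
        self.indices: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))

    def define_index(self, name: str, prefix: str, path: str) -> None:
        self.definitions[name] = (prefix, path)

    def on_put(self, key: str, xml_text: str) -> None:
        # Called whenever an object is created or updated under an indexed prefix.
        root = ET.fromstring(xml_text)
        for name, (prefix, path) in self.definitions.items():
            if not key.startswith(prefix):
                continue
            # Drop stale entries for this key before re-indexing an updated document.
            for keys in self.indices[name].values():
                keys.discard(key)
            for elem in root.findall(path):
                if elem.text:
                    self.indices[name][elem.text.strip()].add(key)

    def query(self, **criteria: str) -> Set[str]:
        # AND across indices: intersect the key sets from each index lookup.
        result = None
        for name, value in criteria.items():
            keys = self.indices[name].get(value, set())
            result = set(keys) if result is None else result & keys
        return result or set()

svc = SimpleIndexService()
svc.define_index("country", "orders/", "./customer/country")
svc.define_index("status", "orders/", "./status")
svc.on_put("orders/1.xml", "<order><customer><country>DE</country></customer><status>open</status></order>")
svc.on_put("orders/2.xml", "<order><customer><country>US</country></customer><status>open</status></order>")
print(svc.query(country="DE", status="open"))  # {'orders/1.xml'}
```

A hosted version would also need to persist the indices and cope with S3's propagation delay between a PUT and the index update.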

Oh, and some basic locking strategy and bulk GETs for S3 would help a lot :-).

Sascha




againnickname

Posts: 232
Registered: 8/24/06
Re: Database Functionality: What Do You Need?
Posted: Sep 12, 2006 2:57 AM PDT   in response to: Martin@AWS
 


> - What kind of actions and functionality do you have in mind?
I use the database to store data in related tables. Therefore I need to insert, delete, update and, of course, select. I use SQL, so I expect functionality like count(*), max(), min(), limit... I do use some SQL constructs that are specific to one database (PostgreSQL in my case).
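For illustration, the kind of SQL described here (related tables, COUNT(*), MIN(), MAX(), LIMIT) looks like the following. This sketch uses Python's built-in `sqlite3` module so it runs without a server; the schema and data are made up, and the query itself is ordinary SQL that PostgreSQL should also accept:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total INTEGER NOT NULL
    );
    INSERT INTO customers VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO orders VALUES (1, 1, 30), (2, 1, 50), (3, 2, 20);
""")

# Aggregate per customer across the two related tables; keep the top customer only.
rows = conn.execute("""
    SELECT c.name, COUNT(*) AS n_orders, MIN(o.total), MAX(o.total)
    FROM customers c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY n_orders DESC
    LIMIT 1
""").fetchall()
print(rows)  # [('alice', 2, 30, 50)]
```

A plain name-value store would force the application to fetch every order and compute these aggregates itself.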


>- Are you comfortable running databases in EC2 (using clusters of MySQL
> instances mirroring record metadata between them and S3 for record storage),
> or do you need a separate, more robust, reliable service? 
My top priorities are reliability and availability of the data. The database engine (PostgreSQL, MySQL, or other) is not that important, provided that SQL support is complete enough.
If something bad happens, like a crash, I want to get up as soon as possible (not in a couple of hours, but minutes or even seconds!).
If my database weighs 160 GB, I do not want to wait too long to get access to it. With a standard 100 Mbit/s Ethernet connection, it would take about 213 minutes to download 160 GB: too long. If the logs are stored on S3, I would still need to apply all of them to reach a consistent state: many more minutes.
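The 213-minute figure corresponds to a 100 Mbit/s link running at full speed with the database size in decimal gigabytes. A quick sketch of the arithmetic; it ignores protocol overhead and any S3-side throughput limits, so real transfers would be slower:

```python
def transfer_minutes(size_gb: float, link_mbit_per_s: float) -> float:
    """Best-case time to move size_gb decimal gigabytes over a link of the given speed."""
    megabits = size_gb * 1000 * 8  # 1 GB = 1000 MB, 1 byte = 8 bits
    return megabits / link_mbit_per_s / 60

print(round(transfer_minutes(160, 100)))   # 213
print(round(transfer_minutes(160, 1000)))  # 21
```

Even a tenfold faster link still leaves roughly 21 minutes before log replay can begin, which supports the conclusion that follows.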

My conclusion is that S3 is not a suitable tool to store database information.

> If you are ok with running this in EC2,
> what characteristics should an instance optimized to host MySQL have?
This solution is not ideal, but it is doable today. I would like the instance to take care of all the backup and restore processes for me with a few simple commands. The use case is the following: I want a multi-master solution (two instances). If one instance dies, the other should be aware of it, send an order to shut down the failed instance (in case it is still up), launch a new one, and synchronize it to restore state (all with one command if possible). The objective is that under normal circumstances there are always two DB instances up and running (except when one of them has crashed).
Ideally, if both crash, something should also be able to bring them back up (this could be triggered by my app failing to access the clustered DB).
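The failover behaviour described above can be sketched as one pass of a watchdog. Everything here is hypothetical: `FakeCloud` is an in-memory stand-in, not the EC2 API, and "sync" and "restore" are just log entries where real replication setup or a restore from backup would go. A real deployment would run this periodically and would also need to guard against both masters believing the other is dead (split brain):

```python
from typing import Dict, List, Optional

class FakeCloud:
    """In-memory stand-in for an instance-management API (hypothetical)."""

    def __init__(self) -> None:
        self.running: Dict[str, bool] = {}  # instance id -> healthy flag
        self.next_id = 0
        self.log: List[str] = []

    def launch(self) -> str:
        self.next_id += 1
        iid = f"db-{self.next_id}"
        self.running[iid] = True
        self.log.append(f"launch {iid}")
        return iid

    def terminate(self, iid: str) -> None:
        self.running.pop(iid, None)
        self.log.append(f"terminate {iid}")

    def is_healthy(self, iid: str) -> bool:
        return self.running.get(iid, False)

def supervise(cloud: FakeCloud, pair: List[str]) -> List[str]:
    """One watchdog pass that keeps two multi-master DB instances running."""
    healthy = [i for i in pair if cloud.is_healthy(i)]
    source: Optional[str] = healthy[0] if healthy else None
    result = list(healthy)
    for iid in pair:
        if iid in healthy:
            continue
        cloud.terminate(iid)  # shut it down in case it is only partly up
        new_iid = cloud.launch()
        if source is None:
            # Both were down: the first replacement restores from backup...
            cloud.log.append(f"restore {new_iid} from backup")
            source = new_iid
        else:
            # ...and any other replacement synchronizes from a live peer.
            cloud.log.append(f"sync {new_iid} from {source}")
        result.append(new_iid)
    return result

cloud = FakeCloud()
a, b = cloud.launch(), cloud.launch()  # db-1, db-2
cloud.running[b] = False               # simulate db-2 dying
pair = supervise(cloud, [a, b])        # db-2 replaced by db-3, synced from db-1
```

Keeping the pass idempotent means it can be invoked both on a timer and by the application when it fails to reach the database.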

> What sorts of read / write speeds would you need to see for this to
> be usable for your application?
Network latency should be low. Network speed between DB instances and app instances should be LAN speed. Multiple datacenters should not affect speed (or the requirement to be in the same datacenter should be explicit).




againnickname

Posts: 232
Registered: 8/24/06
Re: Database Functionality: What Do You Need?
Posted: Sep 12, 2006 3:09 AM PDT   in response to: Martin@AWS
 

You mention MySQL. Are you thinking about doing something specifically with MySQL? I ask because I believe that MySQL is not free for commercial use.


againnickname

Posts: 232
Registered: 8/24/06
Re: Database Functionality: What Do You Need?
Posted: Sep 12, 2006 3:28 AM PDT   in response to: againnickname
 

I'm willing to sacrifice speed for both availability and reliability, but that's just my opinion.


Brad Clements

Posts: 19
Registered: 3/15/06
Re: Database Functionality: What Do You Need?
Posted: Sep 12, 2006 12:43 PM PDT   in response to: Martin@AWS
 

I am beginning to think that persistent disk is probably the best way out of this mess.

As others have said, restoring from S3 could take just too long on startup.

I think you'll never make everyone happy by providing a DB service.

For example, in addition to a relational DB, I want to use eXist as well.

The best solution is being able to mount a volume that doesn't disappear on instance restart. I don't know how you'll do this technically. iSCSI?

Naturally, we'll want to pick our own filesystem format for this persistent storage.





