Thread: Location of instances
Replies: 8 - Pages: 1 - Last Post: Nov 8, 2006 11:34 AM by: Douglass R. Cut...

Posts: 3
Registered: 8/30/06
Location of instances
Posted: Aug 30, 2006 4:41 PM PDT
I have a couple of questions:
1) Is there any way to guarantee that two instances are not running on the same physical server? I can set up replication/failover between two instances, but if they are running on the same box, I haven't really gained anything. This is only useful if it is a hard guarantee, and I would ideally like to be able to specify that two instances don't even run in the same datacenter.
2) Is it possible to make sure that two instances are on the same switch? If I want to set up high-bandwidth, low-latency communication between them, then I need to be sure that it's not going over a slow or congested link. (If everyone is designing large-scale distributed apps without any regard for network topology, I imagine the interconnects will bog down pretty quickly.)
Thx,
Ben

Posts: 69
Registered: 3/16/06
Re: Location of instances
Posted: Aug 30, 2006 5:03 PM PDT
in response to: Ben Strong
I haven't checked the docs, but is there even an assumption that different data centers are available? In particular, since "internal" traffic is free, would it still be free if your instances were in different data centers?
I'd actually like to add to this, though - requesting geographic locality would be of great benefit. EC2 could be useful to services like VoIP, but, with such services, latency kills. Being able to provision a server in a geographically advantaged location would be powerful - especially in response to demand increasing in a geographic location. Hopefully, this is something that can be added to the feature map in a future release.

Posts: 581
Registered: 6/22/06
Re: Location of instances
Posted: Aug 30, 2006 11:20 PM PDT
in response to: Ben Strong
Helpful
We are paying close attention to control over the locality of instances, so please let us know in detail what your applications require in this regard.
We currently have a placement algorithm that works as follows:
1. We will try to launch all of the instances of a single launch request in a single datacentre.
2. We will try to split your instances across at least two racks (for rack-level redundancy).
3. We will try not to place multiple instances on the same host machine.
As you mention, this is not ideal for all applications.
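To make the three "we will try" rules concrete, here is a minimal Python sketch of a best-effort placement. It is not Amazon's actual algorithm: the `Host` type, the `place_best_effort` function, and the choice to prefer the datacenter with the most free hosts are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    # Hypothetical description of a free host machine.
    name: str
    rack: str
    datacenter: str


def place_best_effort(hosts, count):
    """Pick `count` hosts, preferring (not guaranteeing) the three rules."""
    if not hosts or count <= 0:
        return []

    # Rule 1: keep the whole launch request in one datacenter.
    # Assumption: choose the datacenter with the most free hosts.
    dc_sizes = Counter(h.datacenter for h in hosts)
    best_dc = dc_sizes.most_common(1)[0][0]
    pool = [h for h in hosts if h.datacenter == best_dc]

    # Rule 2: interleave racks so that consecutive picks alternate racks.
    by_rack = {}
    for h in pool:
        by_rack.setdefault(h.rack, []).append(h)
    racks = list(by_rack.values())
    ordered = []
    while any(racks):
        for rack_hosts in racks:
            if rack_hosts:
                ordered.append(rack_hosts.pop(0))

    # Rule 3: use distinct hosts first, and reuse a host only when
    # the pool is exhausted. This is the "try", not a guarantee.
    return [ordered[i % len(ordered)] for i in range(count)]
```

With three free hosts in the chosen datacenter, a request for four instances necessarily puts two instances on one host, which is exactly the case the replies in this thread are worried about.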

Posts: 3
Registered: 8/30/06
Re: Location of instances
Posted: Aug 31, 2006 11:53 AM PDT
in response to: RolandPJ@AWS
Thanks for the response, Roland.
My main concern is that "We will try not to place multiple instances on the same host machine." is not a strong enough guarantee given that you don't persist local storage and the best practices guidelines state "You should use a replication strategy across multiple instances to keep your data safe". If there is any possibility that multiple instances might be running on the same machine, then replication is not going to guarantee persistence or availability. If this is in fact the case, then it seems like a pretty major problem and deserves to be called out in the docs (please correct me if I'm misinterpreting anything here).
To state that as a requirement, I would say that I need to be able to define a replication group that guarantees (no "we will try" here) that no two instances will run on the same machine. Being able to specify that certain instances are in different racks or datacenters would be nice-to-haves.
My concern about guaranteeing that instances were on the same switch or in the same datacenter was a little less specific, but there are two scenarios that I'm worried about. The first is that I can just imagine all sorts of cases where I would need to guarantee a low-latency connection between two instances (e.g., between a database server and an app or web server or between two instances of a clustered database). The second is that I've been bitten in the past by putting servers that exchange a lot of data on opposite ends of a shared cross-connect (either between switches or datacenters), and without any control over network topology, you could easily find yourself in a situation like that.
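The replication-group requirement stated above (a hard guarantee rather than "we will try") could be sketched as follows. This is only an illustration of the idea in Python: `place_replication_group` and `PlacementError` are hypothetical names, and nothing here corresponds to a real EC2 API.

```python
class PlacementError(Exception):
    """Raised when the hard one-instance-per-host guarantee cannot be met."""


def place_replication_group(free_hosts, replicas):
    """Return `replicas` distinct hosts, or fail instead of sharing a host."""
    # Deduplicate while keeping the original order of the host names.
    distinct = list(dict.fromkeys(free_hosts))
    if replicas > len(distinct):
        raise PlacementError(
            f"need {replicas} distinct hosts, only {len(distinct)} available"
        )
    return distinct[:replicas]
```

The design choice is that the request fails outright when there are not enough distinct machines, so a replica never silently lands next to the instance it is meant to protect.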

Posts: 581
Registered: 6/22/06
Re: Location of instances
Posted: Aug 31, 2006 1:04 PM PDT
in response to: Ben Strong
Ben, I hear you on all of the above, and no, you're not misinterpreting anything. Thanks for the great feedback.

Posts: 232
Registered: 8/24/06
Re: Location of instances
Posted: Sep 2, 2006 12:40 AM PDT
in response to: Ben Strong
I agree with you.
Depending on the application, you need to know the latency and bandwidth available.
An application server on one machine that connects to a database on another machine is the typical case where you need the machines to be located in the same datacenter (and not across datacenters).
Of course, there must be limits to such a guarantee. For example, it is unreasonable to expect to be able to put 10,000 machines in the same datacenter! There will not be space available for everyone.
What Amazon should do is state some minimal guarantees.
What they could do is something along these lines:
1) You define a meta-instance (or group). A meta-instance is a configuration where you define two things: the type of instance and the number of machines.
Example 1:
instance type: webserver number: 1 machine
instance type: database number: 1 machine
Example 2:
instance type: webserver number: 2 machines
instance type: database number: 1 machine
2) You launch a meta-instance: Amazon would guarantee that all the machines will be in the same datacenter.
As I said, Amazon cannot put an unlimited number of machines in the same datacenter. To control this, Amazon would limit the maximum number of machines in a meta-instance.
For example, they could say: "You're allowed to put at most N machines in a meta-instance".
So if they limit the number of machines to 2 (or 4, 6, 8), you know that you are guaranteed to have those 2 (or 4, 6, 8) in the same datacenter.
If you launch a meta-instance twice, you know that the two meta-instances could end up in 2 different datacenters.
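The meta-instance proposal above can be written down as a small Python sketch. Everything in it is hypothetical: the `Role` type, the `launch` function, and the limit of 4 standing in for "N" are assumptions chosen only to show the shape of the idea, not an existing service feature.

```python
from dataclasses import dataclass

# Hypothetical value for the "N" in "at most N machines in a meta-instance".
MAX_MACHINES_PER_META_INSTANCE = 4


@dataclass(frozen=True)
class Role:
    instance_type: str  # e.g. "webserver" or "database"
    machines: int


class MetaInstanceTooLarge(Exception):
    """Raised when a meta-instance asks for more machines than the limit."""


def validate_meta_instance(roles, limit=MAX_MACHINES_PER_META_INSTANCE):
    """Return the total machine count, or fail if it exceeds the limit."""
    total = sum(r.machines for r in roles)
    if total > limit:
        raise MetaInstanceTooLarge(f"{total} machines requested, limit is {limit}")
    return total


def launch(roles, datacenter):
    """Place every machine of the meta-instance in one datacenter."""
    validate_meta_instance(roles)
    return [(r.instance_type, datacenter) for r in roles for _ in range(r.machines)]


# Example 2 from the post: two webservers and one database.
example_2 = [Role("webserver", 2), Role("database", 1)]
```

Calling `launch(example_2, "dc-a")` yields three machines that all carry the same datacenter label, while a second launch could be given a different datacenter.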

Posts: 13
Registered: 3/14/06
Re: Location of instances
Posted: Sep 3, 2006 4:40 AM PDT
in response to: Ben Strong
Yes, the above sounds exactly right. Define a locality group that specifies that instances, whether launched together or separately, must be allocated where inter-instance linkages will be fast. Perhaps not a queryable max N, but simply a hard no-more-in-this-locality-group error on instance allocation -- first come, first served vs preallocation of groups (or sizing therein).
My envisioned system is roughly the same: head units dealing with users, talking to processing units that expand and collapse as needed. At the point of locality group allocation failure, we make a new locality group and replicate the head unit / processing unit structure in there.
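The behaviour described above (a hard error when a locality group is full, followed by a new group that repeats the head unit / processing unit structure) might look roughly like this in Python. All names are hypothetical, and the fixed per-group capacity is an assumption; the post deliberately leaves the real limit unspecified.

```python
class LocalityGroupFull(Exception):
    """The hard no-more-in-this-locality-group error."""


class LocalityGroup:
    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity  # assumed fixed limit, unknown to the caller
        self.instances = []

    def allocate(self, role):
        # First come, first served: fail as soon as the group is full.
        if len(self.instances) >= self.capacity:
            raise LocalityGroupFull(self.name)
        self.instances.append(role)
        return role


class Cluster:
    def __init__(self, group_capacity):
        if group_capacity < 2:
            raise ValueError("a group must fit a head unit and a processing unit")
        self.group_capacity = group_capacity
        self.groups = []
        self._new_group()

    def _new_group(self):
        # Every locality group starts with its own head unit.
        group = LocalityGroup(f"group-{len(self.groups)}", self.group_capacity)
        group.allocate("head")
        self.groups.append(group)
        return group

    def add_processing_unit(self):
        """Grow the newest group, or open a new one on allocation failure."""
        try:
            self.groups[-1].allocate("processing")
        except LocalityGroupFull:
            self._new_group().allocate("processing")
        return self.groups[-1].name
```

With `Cluster(3)`, the first two processing units join `group-0` next to its head unit, and the third triggers the error path and lands in a fresh `group-1`.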

Posts: 13
Registered: 3/14/06
Re: Location of instances
Posted: Sep 3, 2006 2:23 PM PDT
in response to: jeremywohl
Perhaps the locality groups can be of two types: a) those needing the same network segment, e.g. software load balancers, and b) those needing fast general network UDP/TCP, where the same datacenter is fine (assuming roughly equidistant superswitches/routers in a center?).

Posts: 4
Registered: 11/8/06
Re: Location of instances
Posted: Nov 8, 2006 11:34 AM PST
in response to: RolandPJ@AWS
Is there a supported way to tell which rack and/or datacenter an instance runs in?
Also, is S3 data stored in the same racks and/or datacenters where EC2 instances run? In particular, can one expect to access S3 data more quickly from some instances than from others, and, if so, how can one determine that? As a simple heuristic, if one records which instance writes some data, is it likely that that instance will be able to read that data faster than other (randomly selected) instances? Which other instances are also likely to be able to read it more quickly: just those in the same datacenter, or also those in the same rack?
Thanks!