Discussion Forums



Thread: Massive (500) Internal Server Error. Outage started 35 minutes ago



Replies: 116 - Pages: 8 - Last Post: Feb 21, 2008 1:43 PM by: sequoyan
beanie4242

Posts: 21
Registered: 10/4/07
Posted: Feb 15, 2008 6:19 AM PST in response to: A. Barbieri

Echoing the previous comments:
- outage started around 7:25 EST
- 404 when accessing login page for forum
- initially I could not connect to EC2 or S3, at all (100% failure)
- EC2 started working again just before 9:00 EST
- S3 is still 100% failure

An ETA would be helpful, but it's understandable that it could take a little bit to determine the problem first before providing one.  Please just keep us posted once you have new info.

Also, emergency contact information needs to be provided.  If these forums are our only way of contacting support, and they stop working due to a failure (like this morning), how can we notify you?  Errors happen, but there MUST be a fail-safe way of reporting them.

gcaetano

Posts: 5
Registered: 11/5/07
Posted: Feb 15, 2008 6:20 AM PST in response to: maxcc

Dear all,

I'm really afraid to use the S3 service now. My company chose to work with Amazon because of its reliability. We host more than 30,000 images for the number 2 TV station in Brazil, and now we are having several problems because of these S3 issues.

We should send this error to blogs and newspapers. Everyone should know what's going on with Amazon's Web Services.

Hope we have news soon!

Gustavo

Orion A. Richardson

Posts: 25
Registered: 1/9/07
Posted: Feb 15, 2008 6:21 AM PST in response to: D. Snyder

The whole point of S3 is that it is distributed, redundant, and backed by an SLA.  Yes it would be a good idea to have a backup solution, but that's like saying I need to back up my data with three different storage providers.  With uptime it's all about figuring out (a) what the expected uptime of a service is and (b) what redundancy you have to have to make your uptime as high as you need it given that.

That being said, I also have had only minimal issues with S3 until now - which is why we made that "expected uptime" so high.  We'll have to re-evaluate that.
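Point (b) can be made concrete with the usual back-of-the-envelope formula: if the service counts as up whenever at least one of n redundant providers is up, combined availability is 1 - (1 - a1)(1 - a2)...(1 - an). This is a minimal sketch that assumes failures are independent, which an outage like this one (where a shared dependency takes everything down at once) shows is an optimistic assumption. The 99.9% figure is illustrative, not a number quoted in this thread.

```python
def combined_availability(availabilities):
    """Availability of redundant providers, assuming independent failures
    and that the service is up whenever at least one provider is up."""
    all_down = 1.0
    for a in availabilities:
        if not 0.0 <= a <= 1.0:
            raise ValueError(f"availability out of range: {a}")
        # Probability that this provider AND every previous one are down.
        all_down *= 1.0 - a
    return 1.0 - all_down


def downtime_hours_per_year(availability):
    """Expected downtime implied by an availability figure (365-day year)."""
    return (1.0 - availability) * 365 * 24


if __name__ == "__main__":
    single = 0.999  # illustrative figure, not a quoted SLA
    pair = combined_availability([single, single])
    print(f"one provider:  {single:.6f} -> {downtime_hours_per_year(single):.2f} h/yr down")
    print(f"two providers: {pair:.6f} -> {downtime_hours_per_year(pair):.4f} h/yr down")
```

With two independent 99.9% providers, expected downtime drops from roughly 8.8 hours a year to well under a minute; correlated failures erode most of that gain.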





libsyn

Posts: 10
Registered: 1/15/07
Posted: Feb 15, 2008 6:21 AM PST in response to: beanie4242

The errors are alternating between:
HTTP/1.1 Service Unavailable

and a full

<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>InternalError</Code><Message>We encountered an internal error. Please try again.</Message><RequestId>4DA9AB18343E90DA</RequestId><HostId>mik6GeTv+NChd4rFfpawsaaASfTZOdXN2FlKw5kcoJk61NnIKgpG+RaJwqUBDrUh</HostId></Error>

Don't know if that helps, but yes, this issue is really filling up our support queue with emails and complaints.
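The XML body quoted above is S3's error document (Code, Message, RequestId, HostId). Here is a minimal Python sketch of pulling those fields out so a client can log the RequestId for support and decide whether to retry. Which codes count as retryable is an assumption: InternalError is the one seen in this thread, and ServiceUnavailable and SlowDown are S3's HTTP 503 codes.

```python
import xml.etree.ElementTree as ET

# Assumed set of retryable codes: InternalError (HTTP 500) is the one seen in
# this thread; ServiceUnavailable and SlowDown are S3's HTTP 503 error codes.
RETRYABLE_CODES = {"InternalError", "ServiceUnavailable", "SlowDown"}


def parse_s3_error(body):
    """Return the child elements of an S3 <Error> document as a dict."""
    if isinstance(body, str):
        # Parse bytes so the UTF-8 XML declaration is handled by the parser.
        body = body.encode("utf-8")
    root = ET.fromstring(body)
    if root.tag != "Error":
        raise ValueError(f"not an S3 error document: <{root.tag}>")
    # Strip surrounding whitespace, e.g. the line breaks seen in some posts.
    return {child.tag: (child.text or "").strip() for child in root}


def is_retryable(error):
    return error.get("Code") in RETRYABLE_CODES


if __name__ == "__main__":
    sample = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<Error><Code>InternalError</Code>"
        "<Message>We encountered an internal error. Please try again.</Message>"
        "<RequestId>4DA9AB18343E90DA</RequestId></Error>"
    )
    err = parse_s3_error(sample)
    print(err["Code"], err["RequestId"], is_retryable(err))
```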

techup

Posts: 5
Registered: 1/11/08
Posted: Feb 15, 2008 6:24 AM PST in response to: maxcc

Hi,
we are experiencing the same issue. For example, when trying to access the key 1936.jpg in the bucket buddies.koinup.com, we get this error:

<Error>
<Code>InternalError</Code>
<Message>We encountered an internal error. Please try again.</Message>
<RequestId>41786F29ED914C93</RequestId>
<HostId>vWM+6yPsvhu+7ppNGiDqVNGmm2v8mxg6yoIh7rIyJ21VjWMrka9rS8xMi76a6KDj</HostId>
</Error>

The same happens for every key in both of our buckets:
storage.koinup.com
buddies.koinup.com

A few minutes ago, the error changed to:
HTTP/1.1 Service Unavailable


P.S. For about 45 minutes I wasn't able to log in to the forum, so there was an additional delay in reporting this. We have now had the issue for more than an hour.

BTW, we have run into other issues over the last few weeks. As other users are saying, if you use Amazon, you should also have another service for backup.

This is a problem...

Hope you fix this issue quickly.

thatsmymouse

Posts: 10
Registered: 5/1/07
Posted: Feb 15, 2008 6:25 AM PST in response to: maxcc

While I'm surprised this kind of error is possible, a big thanks to Amazon for getting onto this so quickly.

I've had 100% success rate on GET requests for about 20 mins (although all other requests still seem to be failing).


taatu

Posts: 3
Registered: 12/10/07
Posted: Feb 15, 2008 6:26 AM PST in response to: maxcc


While S3 Europe is happily trotting along, indeed S3 US is completely down.
Which also means our site is completely down.

This is the second major outage in about a month, although last time it was a DNS issue just outside the S3 infrastructure.

I won't go as far as to say I'll stop using S3 as it's proven very reliable in the last 4 months and has allowed us to handle peak volume a lot better than in the past...
Nevertheless, it's clear that emergency scenarios need to be investigated.


How about AWS developing "replicating buckets" between EU and US?

A bit of load-balancing + error-correcting DNS on top and we've got a world class solution... and honestly, I wouldn't mind at all paying for the bandwidth usage between EU and US to replicate.

Heck, if AWS adds load-balancing DNS to S3, I'd be happy to do my own replication.
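The do-it-yourself replication described above boils down to two rules: write every object to each copy, and on reads try the copies in order until one answers. This is a minimal sketch under stated assumptions: `store` and `fetch` are hypothetical callables standing in for whatever S3 client you use, and the replica names are placeholders for a US and an EU bucket. It does not keep the copies consistent when a write fails; it only reports which replicas failed so they can be re-synced later.

```python
class AllReplicasFailed(Exception):
    """Raised when no replica could serve a read."""


def put_to_all(key, data, endpoints, store):
    """Write the object to every replica via the hypothetical
    store(endpoint, key, data); return the endpoints that failed."""
    failed = []
    for endpoint in endpoints:
        try:
            store(endpoint, key, data)
        except OSError:  # connection/timeout errors; map HTTP 5xx to OSError too
            failed.append(endpoint)
    return failed


def get_with_failover(key, endpoints, fetch):
    """Try each replica in order via the hypothetical fetch(endpoint, key)
    and return the first successful read."""
    errors = []
    for endpoint in endpoints:
        try:
            return fetch(endpoint, key)
        except OSError as exc:
            errors.append((endpoint, exc))
    raise AllReplicasFailed(errors)


if __name__ == "__main__":
    # Placeholder replica names; the US one simulates this morning's outage.
    replicas = ["us-bucket", "eu-bucket"]
    objects = {}

    def fake_store(endpoint, key, data):
        if endpoint == "us-bucket":
            raise OSError("503 Service Unavailable")
        objects[(endpoint, key)] = data

    def fake_fetch(endpoint, key):
        if endpoint == "us-bucket":
            raise OSError("503 Service Unavailable")
        return objects[(endpoint, key)]

    print(put_to_all("1936.jpg", b"...", replicas, fake_store))
    print(get_with_failover("1936.jpg", replicas, fake_fetch))
```

Putting the healthiest or nearest replica first in `endpoints` is where the load-balancing DNS idea would plug in.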



Ferran Gutierrez Vilaro

Posts: 1
Registered: 2/15/08
Posted: Feb 15, 2008 6:26 AM PST in response to: maxcc

Same issue here; we're now into the third hour completely without service.

Sadly, this is not the first time we've seen this kind of failure, although past downtimes were less than 10 minutes.




Ian Connor

Posts: 29
Registered: 1/10/07
Posted: Feb 15, 2008 6:28 AM PST in response to: A. Barbieri

On a positive note, 30min ago I could not even post to this thread so it looks like work is being done. Hope it does not take too much longer.


A. Barbieri

Posts: 63
Registered: 7/7/06
Posted: Feb 15, 2008 6:32 AM PST in response to: thatsmymouse

lucky you...

3 hours into the disruption and the 'ERROR 500: Internal Server Error.' has now become 'ERROR 503: Service Unavailable.'
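S3's own error message says "Please try again", and 500 and 503 responses like the ones described above are what clients usually retry. Here is a minimal sketch of capped exponential backoff with full jitter, assuming a hypothetical `send()` callable that returns `(status, body)`; the attempt count and delays are illustrative. Backoff only papers over short blips: during a multi-hour outage like this one it mostly keeps clients from hammering the service while it recovers.

```python
import random
import time

RETRYABLE_STATUSES = {500, 503}  # Internal Server Error, Service Unavailable


def request_with_backoff(send, max_attempts=5, base_delay=0.5, max_delay=30.0,
                         sleep=time.sleep):
    """Call the hypothetical send() until it returns a non-retryable status
    or attempts run out.

    Returns the last (status, body) pair, so the caller still sees the final
    500/503 if every attempt failed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE_STATUSES or attempt == max_attempts - 1:
            return status, body
        # Exponential backoff capped at max_delay, with "full jitter":
        # sleep a random time between 0 and the current ceiling.
        ceiling = min(max_delay, base_delay * 2 ** attempt)
        sleep(random.uniform(0, ceiling))


if __name__ == "__main__":
    responses = iter([(500, "InternalError"), (503, ""), (200, "ok")])
    print(request_with_backoff(lambda: next(responses), sleep=lambda s: None))
```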

JungleDave

Posts: 110
Registered: 4/24/06
Posted: Feb 15, 2008 6:32 AM PST in response to: maxcc

Although it would be nice to get an update from Amazon on the issue, it does appear that some progress is being made. As a few others have mentioned you can now log into this forum, which wasn't working before. Also, the FPS service (which was also affected) has now started accepting web service requests again.
I'm guessing that whatever the issue is, it's tied to authentication which explains why it's affecting all the AWS services. Hopefully this will spur Amazon to add additional redundancy to the authentication system which appears to be a massive single point of failure right now.


Rien Swagerman

Posts: 6
Registered: 1/16/07
Posted: Feb 15, 2008 6:36 AM PST in response to: Ian Connor

We're using S3 as our only storage for 250,000 images and growing... and we're not the only ones (37signals' Basecamp, for example).

It's very hard to decide whether a second backup is necessary. I agree that the whole point of S3 is that it is distributed, redundant, and backed by an SLA. Why, then, would we need a fail-safe?

Hope all is solved quickly.


gcaetano

Posts: 5
Registered: 11/5/07
Posted: Feb 15, 2008 6:40 AM PST in response to: maxcc

News from Amazon?? I need to say something to my clients.


artionet

Posts: 2
Registered: 2/15/08
Posted: Feb 15, 2008 6:40 AM PST in response to: A. Barbieri

Why doesn't Amazon move all the traffic over to the EU network, then?

As far as I can see, that one is working...


Adam

Posts: 31
Registered: 1/13/07
Posted: Feb 15, 2008 6:48 AM PST in response to: artionet

It would be good to have an update on progress so we can pass this on to our clients.

