Discussion Forums



Thread: Massive (500) Internal Server Error. Outage started 35 minutes ago



Replies: 116 - Last Post: Feb 21, 2008 1:43 PM by: sequoyan
Adknowledge

Posts: 25
Registered: 2/5/08
NotSignedUp error
Posted: Feb 15, 2008 8:54 AM PST   in response to: maxcc

We are seeing this error now:

<?xml version="1.0" encoding="UTF-8"?>
<Error>
<Code>NotSignedUp</Code>
<Message>Your account is not signed up for the S3 service. You must sign up before you can use S3.</Message>
<RequestId>FF2320A937DB33B9</RequestId>
<HostId>Oa6nfKEAVg93nTZHifC8qR/iV4TAI7BhyZIctQxgCYKhShd8DvCuxjFVlHyAW+ta</HostId>
</Error>

Some requests work, others fail consistently with the above.  It seems to be related to the object requested.
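When some requests succeed and others fail, it helps to log the error Code and RequestId of every failure so the pattern (per object, per time window) can be correlated and quoted when reporting the problem. A minimal sketch using Python's standard `xml.etree.ElementTree`; the helper name `parse_s3_error` is hypothetical, and the XML is the error document quoted above.

```python
import xml.etree.ElementTree as ET

# The error document quoted in the post above, verbatim.
ERROR_XML = """<?xml version="1.0" encoding="UTF-8"?>
<Error>
<Code>NotSignedUp</Code>
<Message>Your account is not signed up for the S3 service. You must sign up before you can use S3.</Message>
<RequestId>FF2320A937DB33B9</RequestId>
<HostId>Oa6nfKEAVg93nTZHifC8qR/iV4TAI7BhyZIctQxgCYKhShd8DvCuxjFVlHyAW+ta</HostId>
</Error>"""


def parse_s3_error(body: str) -> dict:
    """Return the child elements of an S3 <Error> document as a dict (hypothetical helper)."""
    # Parse bytes so the encoding declaration in the prolog is honored.
    root = ET.fromstring(body.encode("utf-8"))
    if root.tag != "Error":
        raise ValueError(f"not an S3 error document: <{root.tag}>")
    return {child.tag: (child.text or "") for child in root}


err = parse_s3_error(ERROR_XML)
print(err["Code"], err["RequestId"])  # NotSignedUp FF2320A937DB33B9
```

Logging these fields per request, together with the object key, is what would confirm or rule out the "related to the object requested" observation.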

Kathrin@AWS

Posts: 163
Registered: 2/8/06
Re: Massive (500) Internal Server Error. Outage started 35 minutes ago
Posted: Feb 15, 2008 9:09 AM PST   in response to: ronaldklip

This morning’s issue has been resolved and the system is continuing to recover. However, we are currently seeing slightly elevated error rates for some customers, and are actively working to resolve this.  More information on that to follow as we have it.

Also, we wanted to reiterate per our previous post that we will absolutely be posting technical information about what happened earlier this morning; our current priority of course is to ensure that the service recovers as quickly as possible and remains stable. We appreciate your patience while we do so.

Thanks,
Kathrin


Avid Gamer

Posts: 9
Registered: 2/1/08
Re: Ditto
Posted: Feb 15, 2008 9:13 AM PST   in response to: Adknowledge

Lost connections here in Phoenix running on Qwest.

Funny thing, I can connect from our Denver data center (running on Qwest) and the pictures load.

UPDATE: While I'm typing this it came back up.

This is going to make me sound like an idiot: I have yet to figure out exactly why, but refreshing the browser on pages that have S3 pictures to see if it's back up doesn't work. You have to close the browser and reopen the window. It's odd behavior (am I just missing something?). People in the office walk up to me and tell me it's back up, but I'm sitting there hitting refresh and getting nothing. Close the window and reopen it, and wow, everything is back.

Eugene



vrmdr123

Posts: 1
Registered: 2/15/08
Re: Massive (500) Internal Server Error. Outage started 35 minutes ago
Posted: Feb 15, 2008 9:17 AM PST   in response to: maxcc

Hi,  I am still no tabl eto access my files.  Any update?  thx

Jason Kester
RealName(TM)


Posts: 60
Registered: 6/13/06
Re: Massive (500) Internal Server Error. Outage started 35 minutes ago
Posted: Feb 15, 2008 9:25 AM PST   in response to: vrmdr123

> Hi,  I am still no tabl eto access my files.  Any update?  thx

Hi, Ar eyo usur etha tyou typed thef ilename scorrectly? thx

kalyanindia123

Posts: 7
Registered: 2/15/08
Re: Massive (500) Internal Server Error. Outage started 35 minutes ago
Posted: Feb 15, 2008 10:56 AM PST   in response to: Kathrin@AWS

Hi Kathrin

We are still not able to upload any files to our S3 account. We updated our Access Key during the downtime; could that be causing the problem?

How long is the problem going to last? It's becoming too difficult for us to respond to our users.

Just tried Twitter, and the upload works there.
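On the Access Key question: S3 authenticates a signed request by recomputing an HMAC over the request with the secret key paired with the Access Key ID named in the Authorization header, so if an application is still signing with a key pair that has since been deactivated, its requests will be rejected independently of the outage. A minimal sketch of the legacy Signature Version 2 (HMAC-SHA1) computation S3 used at the time, with canonicalization of x-amz-* headers and query-string auth left out; the key IDs, secrets, bucket, and helper names are hypothetical.

```python
import base64
import hashlib
import hmac


def sign_v2(secret_key: str, string_to_sign: str) -> str:
    """Base64(HMAC-SHA1(secret_key, string_to_sign))."""
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def authorization_header(access_key_id: str, secret_key: str, verb: str,
                         date: str, resource: str,
                         content_md5: str = "", content_type: str = "") -> str:
    # StringToSign = Verb, Content-MD5, Content-Type, Date, then the canonicalized
    # x-amz-* headers (none in this sketch) and the resource, newline-separated.
    string_to_sign = "\n".join([verb, content_md5, content_type, date, resource])
    return f"AWS {access_key_id}:{sign_v2(secret_key, string_to_sign)}"


# Hypothetical key pairs: the same request signed before and after a key rotation.
date = "Fri, 15 Feb 2008 18:56:00 GMT"
old = authorization_header("OLDKEYID", "old-secret", "PUT", date, "/my-bucket/photo.jpg",
                           content_type="image/jpeg")
new = authorization_header("NEWKEYID", "new-secret", "PUT", date, "/my-bucket/photo.jpg",
                           content_type="image/jpeg")
print(old != new)  # True: the server checks each signature against the secret for the key ID it names
```

The per-request HMAC is also the kind of cryptographic work that makes authenticated calls more expensive than unauthenticated ones.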




terryray

Posts: 1
Registered: 2/15/08
Re: Ditto
Posted: Feb 15, 2008 11:47 AM PST   in response to: Avid Gamer

There is a difference between hitting "refresh" in your browser and hitting "shift-refresh". (At least, it's shift-refresh in Firefox; not sure about other browsers.) Shift-refresh forces pages and images to be reloaded, while a normal refresh just asks the server whether things have changed.

Next time, try shift-refresh.
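The two kinds of refresh described above can be modeled roughly as follows. This is a simplified toy: real browsers send more headers and the details vary by browser and version, the function names are hypothetical, and whether this explains the behavior Eugene saw is uncertain. A normal refresh sends the cached validator (here an ETag) and accepts "304 Not Modified"; a forced refresh asks every cache along the way not to reuse a stored copy.

```python
def refresh_headers(cached_etag, force):
    """Request headers for re-fetching a cached image (toy model)."""
    if force:
        # Forced reload: tell browser and intermediate caches not to reuse a stored copy.
        return {"Cache-Control": "no-cache", "Pragma": "no-cache"}
    # Normal reload: revalidate, i.e. "send the image only if it changed".
    return {"If-None-Match": cached_etag} if cached_etag else {}


def origin_response(headers, current_etag):
    """Toy server: 304 Not Modified if the client's validator still matches, else 200."""
    if headers.get("If-None-Match") == current_etag:
        return 304
    return 200


cached = '"v1"'
print(origin_response(refresh_headers(cached, force=False), current_etag='"v1"'))  # 304
print(origin_response(refresh_headers(cached, force=True), current_etag='"v1"'))   # 200
```

On a 304 the browser keeps using whatever it already has, which is why a plain refresh can appear to change nothing.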


Roberto Lozano

Posts: 1
Registered: 2/15/08
Re: Massive (500) Internal Server Error. Outage started 35 minutes ago
Posted: Feb 15, 2008 12:11 PM PST   in response to: maxcc

Here's a post on this from an app perspective.

Bob Lozano
www.appistry.com/blogs/bob



Kathrin@AWS

Posts: 163
Registered: 2/8/06
Re: Massive (500) Internal Server Error. Outage started 35 minutes ago
Posted: Feb 15, 2008 12:35 PM PST   in response to: Roberto Lozano

Quick note to keep everyone up to date. The team remains heads-down, focused on finding the root cause of this morning’s problem. One of our three geographic locations for S3 was unreachable beginning at 4:31 a.m. PST and was back to near-normal performance at 6:48 a.m. PST (a small number of customers experienced intermittent issues for a short period thereafter). Though we're proud of our uptime track record over the past two years with this service, any amount of downtime is unacceptable, and we won't be satisfied until it's perfect. We will provide additional information on this thread as soon as we have it.

Sincerely,
The Amazon Web Services Team
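The window given in the post above works out as follows. A sketch with Python's `datetime`; expressing it as availability over a 30-day month for the one affected location is an illustrative assumption, not a figure Amazon gave, and it ignores the later period of intermittent errors.

```python
from datetime import datetime

# Timestamps from the AWS status post (PST, Feb 15, 2008).
unreachable_at = datetime(2008, 2, 15, 4, 31)
near_normal_at = datetime(2008, 2, 15, 6, 48)

outage = near_normal_at - unreachable_at
outage_minutes = outage.total_seconds() / 60
print(outage, outage_minutes)  # 2:17:00 137.0

# Assumption: measure the outage against a 30-day month for a single location.
MONTH_MINUTES = 30 * 24 * 60
availability = 1 - outage_minutes / MONTH_MINUTES
print(f"{availability:.4%}")  # 99.6829%
```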




ImpactGames

Posts: 10
Registered: 1/23/08
Re: Massive (500) Internal Server Error. Outage started 35 minutes ago
Posted: Feb 15, 2008 1:43 PM PST   in response to: maxcc

Odd, we have had no outages on our services.  We monitor our instances inside and outside of the cloud and have not experienced anything that caused an alert.


Jorge Oliveira
RealName(TM)


Posts: 420
Registered: 12/22/07
Re: Massive (500) Internal Server Error. Outage started 35 minutes ago
Posted: Feb 15, 2008 3:23 PM PST   in response to: ImpactGames

Yes, it was not a generalized failure: only 1 of the 3 locations failed (yes, that was too much).
We also did not have any outages.

JO


Tommy

Posts: 16
Registered: 1/30/08
Re: Massive (500) Internal Server Error. Outage started 35 minutes ago
Posted: Feb 15, 2008 3:37 PM PST   in response to: maxcc

Outages are expected; everyone will have at least one outage every year. Nothing new: this is common in the server and hosting business, and even the hard drive industry has outages too.

It's at least good that Amazon fixed the issue within 2 hours. That's fast compared with other companies that might take a day or two to fix it.

Keep up the good work, AWS staff.

Best regards
Tommy



"kappaknight"

Posts: 11
Registered: 10/13/07
Re: Massive (500) Internal Server Error. Outage started 35 minutes ago
Posted: Feb 15, 2008 3:51 PM PST   in response to: maxcc

So what's the plan to ensure something like this doesn't happen again? Something tells me that with 1 of 3 datacenters failing, traffic should still be able to re-route via the other 2. Are there backups at multiple locations?
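The re-routing the poster asks about can be sketched on the client side as ordered failover across replicas. This is a generic illustration, not how S3 routes traffic: the location names and the `fetch` callable are hypothetical, and failover only helps if the data and the services needed to serve it (including authentication) are actually available at the other locations.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def fetch_with_failover(locations: Sequence[str], fetch: Callable[[str], T]) -> T:
    """Try each location in order; return the first successful result.

    `fetch` should raise OSError (ConnectionError and TimeoutError are subclasses)
    for failures that ought to trigger a retry at the next location.
    """
    errors = []
    for location in locations:
        try:
            return fetch(location)
        except OSError as exc:
            errors.append(f"{location}: {exc}")
    raise RuntimeError("all locations failed: " + "; ".join(errors))


# Usage with a stand-in fetch; the first location is simulated as unreachable.
def demo_fetch(location: str) -> str:
    if location == "location-1":
        raise ConnectionError("unreachable")
    return f"object served from {location}"


print(fetch_with_failover(["location-1", "location-2", "location-3"], demo_fetch))
```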


tastyeng

Posts: 3
Registered: 5/6/06
Re: Massive (500) Internal Server Error. Outage started 35 minutes ago
Posted: Feb 15, 2008 7:48 PM PST   in response to: Kathrin@AWS

Failures are to be expected with any system, but I do think this episode points to the need for the AWS forums to be isolated from the services themselves. Although you got the forums working again quickly, well before the services were back in good order, for the future you should try to ensure that something which affects the services we rely on will not affect our ability to contact you (or fellow users) or find news about what's happening.

Kathrin@AWS

Posts: 163
Registered: 2/8/06
Re: Massive (500) Internal Server Error. Outage started 35 minutes ago
Posted: Feb 15, 2008 8:56 PM PST   in response to: tastyeng

Here’s some additional detail about the problem we experienced earlier today.

Early this morning, at 3:30am PST, we started seeing elevated levels of authenticated requests from multiple users in one of our locations.  While we carefully monitor our overall request volumes and these remained within normal ranges, we had not been monitoring the proportion of authenticated requests.  Importantly, these cryptographic requests consume more resources per call than other request types.
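The key point here is that total request volume looked normal while the mix shifted toward the more expensive request type. Tracking the share of authenticated requests over a sliding window catches that kind of shift. A minimal sketch; the class name, window size, and alert threshold are hypothetical, and Amazon did not describe its monitoring at this level of detail.

```python
from collections import deque


class AuthRatioMonitor:
    """Tracks the share of authenticated requests over the last `window` requests."""

    def __init__(self, window: int, alert_ratio: float):
        self.window = window
        self.alert_ratio = alert_ratio
        self.samples = deque(maxlen=window)  # True = authenticated request
        self.auth_count = 0

    def record(self, authenticated: bool) -> bool:
        """Record one request; return True if the authenticated share exceeds the threshold."""
        if len(self.samples) == self.window:
            # append() will evict the oldest sample; keep the running count in sync.
            self.auth_count -= self.samples[0]
        self.samples.append(authenticated)
        self.auth_count += authenticated
        return self.ratio() > self.alert_ratio

    def ratio(self) -> float:
        return self.auth_count / len(self.samples) if self.samples else 0.0


# Usage: total volume stays constant, but the authenticated share climbs past 50%.
monitor = AuthRatioMonitor(window=10, alert_ratio=0.5)
for _ in range(10):
    monitor.record(False)
alerts = [monitor.record(True) for _ in range(6)]
print(alerts, monitor.ratio())
```

A plain request counter would see ten requests per window throughout; only the ratio reveals the change.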

Shortly before 4:00am PST, we began to see several other users significantly increase their volume of authenticated calls.  The last of these pushed the authentication service over its maximum capacity before we could complete putting new capacity in place.  In addition to processing authenticated requests, the authentication service also performs account validation on every request Amazon S3 handles.  This caused Amazon S3 to be unable to process any requests in that location, beginning at 4:31am PST.  By 6:48am PST, we had moved enough capacity online to resolve the issue.
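The chain of events in the paragraph above can be put as a back-of-the-envelope model: every request pays for account validation on the authentication service, and authenticated requests additionally pay for signature verification. All numbers below (relative cost, capacity, request rate) are invented for illustration; the post does not give them, and the "everything fails" outcome is a simplification of the described behavior.

```python
# Toy capacity model; every number here is hypothetical.
def auth_service_load(total_rps, auth_fraction, auth_cost, validation_cost=1.0):
    """Load units per second placed on the authentication service.

    Every request pays `validation_cost` for account validation; the authenticated
    share additionally pays `auth_cost` for signature verification.
    """
    return total_rps * (validation_cost + auth_fraction * auth_cost)


CAPACITY = 2500.0   # hypothetical load units/second the service can absorb
AUTH_COST = 4.0     # hypothetical extra cost of one signature check, in validation units

for share in (0.25, 0.50):
    load = auth_service_load(1000, share, AUTH_COST)
    status = "ok" if load <= CAPACITY else "saturated: every request in the location fails"
    print(f"auth share {share:.0%}: load {load:.0f} -> {status}")
```

With the same 1,000 requests per second, doubling the authenticated share pushes load from 2,000 to 3,000 units and past capacity, and because validation runs on the same service, unauthenticated requests fail too.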

As we said earlier today, though we're proud of our uptime track record over the past two years with this service, any amount of downtime is unacceptable.  As part of the post mortem for this event, we have identified a set of short-term actions as well as longer term improvements.  We are taking immediate action on the following:  (a) improving our monitoring of the proportion of authenticated requests; (b) further increasing our authentication service capacity; and (c) adding additional defensive measures around the authenticated calls.  Additionally, we’ve begun work on a service health dashboard, and expect to release that shortly.

Sincerely,
The Amazon Web Services Team
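The post does not say what the "additional defensive measures around the authenticated calls" are. One common measure of this kind is per-caller rate limiting, so that a single user's surge of expensive calls is throttled before it exhausts capacity shared by everyone. A token-bucket sketch; the class names, rates, burst sizes, and per-call cost are hypothetical, and nothing in the thread confirms Amazon used this technique.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Admits on average `rate` units of work per second, with bursts up to `burst`."""

    def __init__(self, rate: float, burst: float, clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.tokens = burst          # start full
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerUserLimiter:
    """One bucket per caller, so one user's surge cannot drain everyone's capacity."""

    def __init__(self, rate: float, burst: float, clock: Callable[[], float] = time.monotonic):
        self.rate, self.burst, self.clock = rate, burst, clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, user_id: str, cost: float = 1.0) -> bool:
        if user_id not in self.buckets:
            self.buckets[user_id] = TokenBucket(self.rate, self.burst, self.clock)
        return self.buckets[user_id].allow(cost)


# Usage with a fake clock; an expensive (signed) call is charged cost=2.0.
now = [0.0]
limiter = PerUserLimiter(rate=2.0, burst=4.0, clock=lambda: now[0])
print([limiter.allow("user-a", cost=2.0) for _ in range(3)])  # [True, True, False]
print(limiter.allow("user-b", cost=2.0))                      # True
```

Charging authenticated calls a higher `cost` lets the limit track the resources they actually consume rather than raw request counts.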


