|
Discussion Forums
|
Thread: Massive (500) Internal Server Error.outage started 35 minutes ago
|
|
Posts:
25
Registered:
2/5/08
|
|
|
|
NotSignedUp error
Posted:
Feb 15, 2008 8:54 AM PST
in response to: maxcc
|
|
|
We are seeing this error now:
<?xml version="1.0" encoding="UTF-8"?>
<Error>
<Code>NotSignedUp</Code>
<Message>Your account is not signed up for the S3 service. You must sign up before you can use S3.</Message>
<RequestId>FF2320A937DB33B9</RequestId>
<HostId>Oa6nfKEAVg93nTZHifC8qR/iV4TAI7BhyZIctQxgCYKhShd8DvCuxjFVlHyAW+ta</HostId>
</Error>
Some requests work, others fail consistently with the above. It seems to be related to the object requested.
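For anyone logging these failures, here is a minimal Python sketch of how a client might pull the fields out of an error body like the one above and decide whether to retry. The function names and the `RETRYABLE_CODES` set are assumptions made for illustration; nothing in this thread says which codes are safe to retry, and `NotSignedUp` is deliberately left out of the set even though it seems to be intermittent here.

```python
import xml.etree.ElementTree as ET

# Hypothetical choice of error codes to treat as transient (an assumption,
# not something stated in this thread).
RETRYABLE_CODES = {"InternalError", "ServiceUnavailable", "SlowDown"}

# The error body quoted in the post above, as bytes.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<Error>
<Code>NotSignedUp</Code>
<Message>Your account is not signed up for the S3 service. You must sign up before you can use S3.</Message>
<RequestId>FF2320A937DB33B9</RequestId>
<HostId>Oa6nfKEAVg93nTZHifC8qR/iV4TAI7BhyZIctQxgCYKhShd8DvCuxjFVlHyAW+ta</HostId>
</Error>
"""


def parse_s3_error(body):
    """Return the Code, Message and RequestId fields of an <Error> document."""
    root = ET.fromstring(body)
    return {
        "code": root.findtext("Code"),
        "message": root.findtext("Message"),
        "request_id": root.findtext("RequestId"),
    }


def should_retry(error):
    """True only if the error code is in the (assumed) transient set."""
    return error["code"] in RETRYABLE_CODES


if __name__ == "__main__":
    err = parse_s3_error(SAMPLE)
    print(err["code"], err["request_id"], "retry:", should_retry(err))
```

Keeping the `RequestId` in your logs is probably the most useful part, since it gives support something concrete to look up.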
|
|
Posts:
163
Registered:
2/8/06
|
|
|
|
Re: Massive (500) Internal Server Error.outage started 35 minutes ago
Posted:
Feb 15, 2008 9:09 AM PST
in response to: ronaldklip
|
|
|
This morning’s issue has been resolved and the system is continuing to recover. However, we are currently seeing slightly elevated error rates for some customers, and are actively working to resolve this. More information on that to follow as we have it.
Also, we wanted to reiterate per our previous post that we will absolutely be posting technical information about what happened earlier this morning; our current priority of course is to ensure that the service recovers as quickly as possible and remains stable. We appreciate your patience while we do so.
Thanks,
Kathrin
|
|
Posts:
9
Registered:
2/1/08
|
|
|
|
Re: Ditto
Posted:
Feb 15, 2008 9:13 AM PST
in response to: Adknowledge
|
|
|
Lost connections here in Phoenix running on Qwest.
Funny thing, I can connect from the Denver data center (running on Qwest) and the pictures load.
UPDATE: While I'm typing this it came back up.
This is going to make me sound like an idiot: I have yet to figure out exactly why, but refreshing the browser on pages that have S3 pictures to see if it's back up doesn't work. You have to close the browser and reopen the window. It's odd behavior (am I just missing something?). I get people in the office walking up to me and telling me it's back up, but I'm sitting there hitting refresh and getting nothing. Close the window and reopen, and wow, everything is back.
Eugene
|
|
Posts:
1
Registered:
2/15/08
|
|
|
|
Re: Massive (500) Internal Server Error.outage started 35 minutes ago
Posted:
Feb 15, 2008 9:17 AM PST
in response to: maxcc
|
|
|
Hi, I am still no tabl eto access my files. Any update? thx
|
|
Posts:
60
Registered:
6/13/06
|
|
|
|
Re: Massive (500) Internal Server Error.outage started 35 minutes ago
Posted:
Feb 15, 2008 9:25 AM PST
in response to: vrmdr123
|
|
|
> Hi, I am still no tabl eto access my files. Any update? thx
Hi, Ar eyo usur etha tyou typed thef ilename scorrectly? thx
|
|
Posts:
7
Registered:
2/15/08
|
|
|
|
Re: Massive (500) Internal Server Error.outage started 35 minutes ago
Posted:
Feb 15, 2008 10:56 AM PST
in response to: Kathrin@AWS
|
|
|
Hi Kathrin
We are still not able to upload any files to our S3 account. We updated our Access Key during the downtime. Could that be causing the problem?
How long is the problem going to last? It's becoming too difficult for us to respond to our users.
Just tried Twitter and the upload works there..
|
|
Posts:
1
Registered:
2/15/08
|
|
|
|
Re: Ditto
Posted:
Feb 15, 2008 11:47 AM PST
in response to: Avid Gamer
|
|
|
There is a difference between hitting "refresh" in your browser and hitting "shift-refresh". (At least, it's shift-refresh in Firefox; I'm not sure about other browsers.) Shift-refresh forces pages and images to be reloaded, while a normal refresh just asks the server whether things have changed.
Next time, try shift-refresh.
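To make the difference concrete, here is a rough Python sketch of the request headers involved. It assumes the usual HTTP behavior: a normal refresh sends a conditional request (`If-Modified-Since` / `If-None-Match`), while a forced reload sends `Cache-Control: no-cache` and `Pragma: no-cache`. The function names are made up for illustration, and this is not a claim about what any particular browser did during the outage.

```python
def normal_refresh_headers(last_modified=None, etag=None):
    """Headers for a conditional GET; the server may answer 304 Not Modified."""
    headers = {}
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    if etag:
        headers["If-None-Match"] = etag
    return headers


def forced_refresh_headers():
    """Headers asking caches not to answer from a stored copy without revalidating."""
    return {"Cache-Control": "no-cache", "Pragma": "no-cache"}


if __name__ == "__main__":
    print(normal_refresh_headers(etag='"abc123"'))
    print(forced_refresh_headers())
```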
|
|
Posts:
1
Registered:
2/15/08
|
|
|
|
Re: Massive (500) Internal Server Error.outage started 35 minutes ago
Posted:
Feb 15, 2008 12:11 PM PST
in response to: maxcc
|
|
|
Here's a post on this from an app perspective.
Bob Lozano
www.appistry.com/blogs/bob
|
|
Posts:
163
Registered:
2/8/06
|
|
|
|
Re: Massive (500) Internal Server Error.outage started 35 minutes ago
Posted:
Feb 15, 2008 12:35 PM PST
in response to: Roberto Lozano
|
|
|
Quick note to keep everyone up to date.
The team continues to be heads down focused on getting to root cause on this morning’s problem.
One of our three geographic locations for S3 was unreachable beginning at 4:31 a.m. PST and was back to near normal performance at 6:48 a.m. PST (a small number of customers experienced intermittent issues for a short period thereafter).
Though we're proud of our uptime track record over the past two years with this service, any amount of downtime is unacceptable and we won't be satisfied until it's perfect.
We will be providing additional information on this thread as soon as we have it.
Sincerely,
The Amazon Web Services Team
|
|
Posts:
10
Registered:
1/23/08
|
|
|
|
Re: Massive (500) Internal Server Error.outage started 35 minutes ago
Posted:
Feb 15, 2008 1:43 PM PST
in response to: maxcc
|
|
|
Odd, we have had no outages on our services. We monitor our instances inside and outside of the cloud and have not experienced anything that caused an alert.
|
|
Posts:
420
Registered:
12/22/07
|
|
|
|
Re: Massive (500) Internal Server Error.outage started 35 minutes ago
Posted:
Feb 15, 2008 3:23 PM PST
in response to: ImpactGames
|
|
|
Yes, it was not a generalized failure; only 1 (yes, it was too much) out of 3 failed.
We also did not have any outages.
JO
|
|
Posts:
16
Registered:
1/30/08
|
|
|
|
Re: Massive (500) Internal Server Error.outage started 35 minutes ago
Posted:
Feb 15, 2008 3:37 PM PST
in response to: maxcc
|
|
|
Outages are expected; everyone will have at least one outage every year. This is nothing new and is common in the server and hosting business; even the hard drive industry has outages too.
It's at least good that Amazon fixed the issue in about 2 hours. That's fast compared with other companies that might take a day or two.
Keep up the good work, AWS staff.
Best regards
Tommy
|
|
Posts:
11
Registered:
10/13/07
|
|
|
|
Re: Massive (500) Internal Server Error.outage started 35 minutes ago
Posted:
Feb 15, 2008 3:51 PM PST
in response to: maxcc
|
|
|
So what's the plan to ensure something like this doesn't happen again? Something tells me 1 out of 3 datacenters failing should still allow the traffic to re-route via the other 2. Are there backups in multiple locations?
|
|
Posts:
3
Registered:
5/6/06
|
|
|
|
Re: Massive (500) Internal Server Error.outage started 35 minutes ago
Posted:
Feb 15, 2008 7:48 PM PST
in response to: Kathrin@AWS
|
|
|
Failures are to be expected with any system, but I do think this episode points to the need for the AWS forums to be isolated from the services themselves. Although you got the forums working again quickly, well before the services were back in good order, for the future you should try to ensure that something which affects the services we rely on will not affect our ability to contact you (or fellow users) or find news about what's happening.
|
|
Posts:
163
Registered:
2/8/06
|
|
|
|
Re: Massive (500) Internal Server Error.outage started 35 minutes ago
Posted:
Feb 15, 2008 8:56 PM PST
in response to: tastyeng
|
|
|
Here’s some additional detail about the problem we experienced earlier today.
Early this morning, at 3:30am PST, we started seeing elevated levels of authenticated requests from multiple users in one of our locations. While we carefully monitor our overall request volumes and these remained within normal ranges, we had not been monitoring the proportion of authenticated requests. Importantly, these cryptographic requests consume more resources per call than other request types.
Shortly before 4:00am PST, we began to see several other users significantly increase their volume of authenticated calls. The last of these pushed the authentication service over its maximum capacity before we could complete putting new capacity in place. In addition to processing authenticated requests, the authentication service also performs account validation on every request Amazon S3 handles. This caused Amazon S3 to be unable to process any requests in that location, beginning at 4:31am PST. By 6:48am PST, we had moved enough capacity online to resolve the issue.
As we said earlier today, though we're proud of our uptime track record over the past two years with this service, any amount of downtime is unacceptable. As part of the post mortem for this event, we have identified a set of short-term actions as well as longer term improvements. We are taking immediate action on the following: (a) improving our monitoring of the proportion of authenticated requests; (b) further increasing our authentication service capacity; and (c) adding additional defensive measures around the authenticated calls. Additionally, we’ve begun work on a service health dashboard, and expect to release that shortly.
Sincerely,
The Amazon Web Services Team
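As an illustration of action (a) above, monitoring the proportion of authenticated requests rather than only total volume, here is a minimal Python sketch. It is not the AWS implementation; the class name, the sliding-window size, and the alert threshold are all hypothetical values chosen for the example.

```python
from collections import deque


class AuthRatioMonitor:
    """Tracks the share of authenticated requests over a sliding window.

    The window size and threshold are hypothetical defaults, not AWS values.
    """

    def __init__(self, window=1000, threshold=0.5):
        self.samples = deque(maxlen=window)  # oldest samples drop off automatically
        self.threshold = threshold

    def record(self, authenticated):
        """Record one request; `authenticated` is True for an authenticated call."""
        self.samples.append(bool(authenticated))

    def ratio(self):
        """Fraction of requests in the window that were authenticated."""
        if not self.samples:
            return 0.0
        return sum(self.samples) / len(self.samples)

    def should_alert(self):
        """True when the authenticated share exceeds the threshold."""
        return self.ratio() > self.threshold


if __name__ == "__main__":
    monitor = AuthRatioMonitor(window=4, threshold=0.5)
    for is_auth in (False, False, True, True, True):
        monitor.record(is_auth)
    print(monitor.ratio(), monitor.should_alert())
```

The point of tracking a ratio is that total request volume can stay within normal ranges, as described above, while the more expensive authenticated calls grow as a share of it.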