Thread: Lots of internal errors on SQS over the last few days


Replies: 16 - Last Post: Sep 4, 2008 1:11 PM by: Paul Dowman
Paul Dowman
RealName(TM)


Posts: 32
Registered: 6/10/07
Lots of internal errors on SQS over the last few days
Posted: Sep 2, 2008 2:28 PM PDT

I've been seeing a *lot* of internal errors (and various other errors) on SQS over the last few days, but there's no mention of them on the AWS status page (http://status.aws.amazon.com/).

Is anyone else finding the same thing?


ylastic

Posts: 86
Registered: 6/2/08
Re: Lots of internal errors on SQS over the last few days
Posted: Sep 3, 2008 4:59 AM PDT   in response to: Paul Dowman

We are seeing the same thing. It seems to have happened a lot over the last two days.

thanks
the Ylastic team

Michael@AWS

Posts: 159
Registered: 5/30/08
Re: Lots of internal errors on SQS over the last few days
Posted: Sep 3, 2008 7:13 AM PDT   in response to: ylastic

Hi,

Can you provide RequestIds for the internal errors so that we can take a look into this?

Thanks,
Michael


Paul Dowman
RealName(TM)


Posts: 32
Registered: 6/10/07
Re: Lots of internal errors on SQS over the last few days
Posted: Sep 3, 2008 12:51 PM PDT   in response to: Michael@AWS

Well, I can't really take the time to dig through the last five days' worth of logs at the moment, but here are two from within the last hour or so:

fd75ad5b-2440-4467-b642-6c89ff9d6ded
e90384c8-fcc1-404d-98e8-bb5089425306

The response is 500 Internal Server Error.


Paul Dowman
RealName(TM)


Posts: 32
Registered: 6/10/07
Re: Lots of internal errors on SQS over the last few days
Posted: Sep 3, 2008 3:07 PM PDT   in response to: Michael@AWS

I looked into it further and it seems that only certain types of messages are causing it. I have now found a message that triggers it every time. I don't have time at the moment, but I'll try to narrow down what the issue might be.

What's odd is that this only started a few days ago, maybe Friday if I recall correctly.


Justin@AWS

Posts: 913
Registered: 12/13/06
Re: Lots of internal errors on SQS over the last few days
Posted: Sep 3, 2008 3:48 PM PDT   in response to: Paul Dowman

Hi Paul,

Thank you for providing request ids.  I just sent you a private message regarding your use case.  We'd like to get a little more info from you.

Regards,
Justin



tipmobilemirko

Posts: 10
Registered: 4/28/08
Re: Lots of internal errors on SQS over the last few days
Posted: Sep 3, 2008 5:24 PM PDT   in response to: Justin@AWS

I've been getting similar intermittent errors when reading from SQS queues since last Thursday. This is one of the request IDs:

e573b4b0-656a-4e84-97fb-63d188971b0c

Also see this thread: http://developer.amazonwebservices.com/connect/thread.jspa?messageID=99951&#99951

-Mirko

Joel@AWS

Posts: 50
Registered: 7/6/07
Re: Lots of internal errors on SQS over the last few days
Posted: Sep 3, 2008 5:33 PM PDT   in response to: tipmobilemirko

Hi Mirko,

Thanks for letting us know - can you provide the date/hour when this request occurred?  We can research it in the SQS logs.


Thanks,

Joel



Paul Dowman
RealName(TM)


Posts: 32
Registered: 6/10/07
Re: Lots of internal errors on SQS over the last few days
Posted: Sep 4, 2008 8:08 AM PDT   in response to: Paul Dowman

Well, since AWS support haven't solved this yet and it's causing a lot of failures on my production systems, I've been forced to investigate. The answer is 7891.

The problem seems to be that requests with a message body of 7891 bytes or larger cause SQS to fail.

Here's a little Ruby script that shows the problem:

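require 'yaml'
require 'right_aws'

# Run inside a Rails environment (e.g. script/console) so that RAILS_ROOT and
# RAILS_ENV are defined. Credentials are read from config/s3.yml.
s3_config = YAML::load_file("#{RAILS_ROOT}/config/s3.yml")[RAILS_ENV]
sqs = RightAws::SqsGen2.new(s3_config['aws_access_key'], s3_config['aws_secret_access_key'])
# The third argument asks for the queue to be created if it doesn't exist yet.
q = RightAws::SqsGen2::Queue.create(sqs, "pauldowman-test", true)

# 7890-byte body: succeeds every time.
100.times do
  if (q.send_message("x" * 7890) rescue false)
    puts "succeeded."
  else
    puts "FAIL!"
  end
end

# 7891-byte body: fails every time.
100.times do
  if (q.send_message("x" * 7891) rescue false)
    puts "succeeded."
  else
    puts "FAIL!"
  end
end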

I'm going to run this failing loop for the rest of the day in the hopes that if I cause enough 500 errors Amazon will update the status on the service health dashboard. ;-)

Seriously though, this looks like a massive SQS failure from my point of view. It's been going on for about a week now, and while I appreciate the quick response from AWS support requesting more info, they haven't solved it yet. My analysis (assuming it's correct) wasn't too difficult, and I'd expect them to have come to the same conclusion much sooner.
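
For anyone who wants to locate a size threshold like this on their own queue, one possible approach is a binary search over the body size. This is only a sketch: it assumes every size below the threshold succeeds and every size at or above it fails, and it isn't necessarily how the 7891 figure above was found. The helper name smallest_failing_size and the fake_send stand-in are made up for illustration; with a real queue the block would wrap something like (q.send_message("x" * size) rescue false).

```ruby
# Returns the smallest size for which the block returns false, given one size
# known to succeed (known_good) and a larger size known to fail (known_bad).
# Assumes the outcome flips from success to failure exactly once in between.
def smallest_failing_size(known_good, known_bad)
  lo = known_good  # largest size seen to succeed so far
  hi = known_bad   # smallest size seen to fail so far
  while hi - lo > 1
    mid = (lo + hi) / 2
    if yield(mid)
      lo = mid
    else
      hi = mid
    end
  end
  hi
end

# Hypothetical stand-in for the real send call, so the sketch runs offline.
# It mimics the behaviour reported in this thread.
fake_send = lambda { |size| size < 7891 }

threshold = smallest_failing_size(1, 8192) { |size| fake_send.call(size) }
puts "smallest failing size: #{threshold}"
```

Each probe halves the remaining range, so finding the boundary between 1 and 8192 takes around 13 sends rather than thousands.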




Paul Dowman
RealName(TM)


Posts: 32
Registered: 6/10/07
Re: Lots of internal errors on SQS over the last few days
Posted: Sep 4, 2008 8:21 AM PDT   in response to: Paul Dowman

Oh, and FYI, if I give a message body larger than 8192 bytes I do get an error as expected: "InvalidParameterValue: Value for parameter MessageBody is invalid. Reason: Message body must be shorter than 8192 characters"


Justin@AWS

Posts: 913
Registered: 12/13/06
Re: Lots of internal errors on SQS over the last few days
Posted: Sep 4, 2008 8:41 AM PDT   in response to: Paul Dowman

Hi Paul,

We can confirm the problem that you are experiencing.  The fix is in the final stage of testing and we will let you know when it has been fully rolled out.

Regards,
Justin



ylastic

Posts: 86
Registered: 6/2/08
Re: Lots of internal errors on SQS over the last few days
Posted: Sep 4, 2008 8:52 AM PDT   in response to: Paul Dowman

Thanks, Paul, for the terrific detective work. I wish the status dashboard were updated the minute AWS became aware of an issue, so that everyone would know about it ...

thanks
the Ylastic team

Paul Dowman
RealName(TM)


Posts: 32
Registered: 6/10/07
Re: Lots of internal errors on SQS over the last few days
Posted: Sep 4, 2008 8:53 AM PDT   in response to: Justin@AWS

Great, thanks for the quick response!

I've already adjusted my max message size anyway. :-)
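
A client-side workaround of this kind might look roughly like the sketch below, which rejects oversized bodies before any request is made. The 7890-byte ceiling is taken from the test results earlier in this thread and is not an official SQS limit. The names SAFE_MAX_BODY_BYTES, MessageTooLarge, send_if_small_enough and FakeQueue are all hypothetical; FakeQueue only exists so the example runs without AWS credentials, and a RightAws::SqsGen2::Queue could be passed in its place.

```ruby
# Assumed ceiling based on this thread: 7890-byte bodies succeeded,
# 7891-byte bodies failed. Not an official SQS limit.
SAFE_MAX_BODY_BYTES = 7890

class MessageTooLarge < StandardError; end

# Sends body via queue.send_message unless it exceeds max_bytes, in which
# case it raises MessageTooLarge without touching the network.
def send_if_small_enough(queue, body, max_bytes = SAFE_MAX_BODY_BYTES)
  size = body.bytesize
  if size > max_bytes
    raise MessageTooLarge, "body is #{size} bytes, limit is #{max_bytes}"
  end
  queue.send_message(body)
end

# Hypothetical stand-in for a real queue object; it just records what was sent.
class FakeQueue
  attr_reader :sent

  def initialize
    @sent = []
  end

  def send_message(body)
    @sent << body
    true
  end
end

queue = FakeQueue.new
send_if_small_enough(queue, "x" * 7890)   # accepted
begin
  send_if_small_enough(queue, "x" * 7891) # rejected locally
rescue MessageTooLarge => e
  puts "not sent: #{e.message}"
end
```

Failing fast like this turns an opaque 500 from the service into a local error that the caller can handle, for example by splitting the payload or storing it elsewhere.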


Paul Dowman
RealName(TM)


Posts: 32
Registered: 6/10/07
Re: Lots of internal errors on SQS over the last few days
Posted: Sep 4, 2008 8:58 AM PDT   in response to: Paul Dowman

However, the service health dashboard still says "Service is operating normally".

tipmobilemirko

Posts: 10
Registered: 4/28/08
Re: Lots of internal errors on SQS over the last few days
Posted: Sep 4, 2008 10:16 AM PDT   in response to: Joel@AWS

Hi Joel,

Thanks for looking into this.

This particular request (with request id e573b4b0-656a-4e84-97fb-63d188971b0c) occurred on 9/2 at around 18:11 UTC. See the log excerpt below. The error message was received at 18:12:38, but the request expiration is stated as 18:11:08, so the request must have been submitted some time before.

My first suspicion was that the clock might be off on the instances, but I have confirmed that this is not the case.

Going through my logs, I have not found any errors since 9/2, but we also haven't sent or received any messages since then, so I can't safely conclude that the problem has been fixed yet. I wonder if the problem is somehow related to specific messages. In my case, all messages are short, so I am definitely not bumping into the message length problem mentioned on this thread.

E, [2008-09-02T18:12:38.646990 #17428] ERROR -- : RequestExpired: Request has expired. Expires date is 2008-09-02T18:11:08Z. (RightAws::AwsError)
/usr/lib/ruby/gems/1.8/gems/right_aws-1.7.3/lib/awsbase/right_awsbase.rb:259:in `request_info_impl'
/usr/lib/ruby/gems/1.8/gems/right_aws-1.7.3/lib/sqs/right_sqs_gen2_interface.rb:151:in `request_info'
/usr/lib/ruby/gems/1.8/gems/right_aws-1.7.3/lib/sqs/right_sqs_gen2_interface.rb:243:in `receive_message'
/usr/lib/ruby/gems/1.8/gems/right_aws-1.7.3/lib/sqs/right_sqs_gen2.rb:163:in `receive_messages'
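
The gap between the two timestamps in the log line above can be checked with a few lines of Ruby. This is just the arithmetic, under the assumption that the logger's timestamp is in UTC (the log line itself carries no zone, so a "Z" is appended here); the variable names are made up for illustration.

```ruby
require 'time'

# Timestamp written by the logger; assumed to be UTC, so "Z" is appended.
logged_at = Time.iso8601("2008-09-02T18:12:38.646990Z")

# Expires value quoted in the RequestExpired error message.
expires_at = Time.iso8601("2008-09-02T18:11:08Z")

# Subtracting two Time objects gives the difference in seconds as a Float.
overdue = logged_at - expires_at
puts format("error logged %.1f seconds after the Expires time", overdue)
```

That works out to roughly a minute and a half between the stated expiry and the moment the error was logged.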

