Thread: Intermittent internal connectivity failures
Replies: 6 - Pages: 1 - Last Post: Apr 29, 2008 7:34 AM by: rtdev
|
|
|
Posts:
2
Registered:
3/10/08
|
|
|
|
Intermittent internal connectivity failures
Posted:
Apr 26, 2008 5:39 PM PDT
|
|
|
For the last 18 hours or so, we've been seeing intermittent connectivity failures between some of our load balancer instances and some of our web server instances. One instance (i-d7e227be) has lost connectivity several times during that period and another (i-d4e227bd) a few times around midnight.
The monitoring instances (i-0de12464, i-c1c100a8, i-cbe025a2) are doing health checks on the internal interfaces on ports 80 and 10050 and several times have reported a loss of connectivity (either connect or read failures). Each time, connectivity seems to have been restored within a couple of minutes.
The instance itself seems to be healthy and we don't see anything alarming in the logs that would indicate that a user-mode process has crashed or the instance was restarted. Every time a connectivity failure has been reported, we've been able to SSH in successfully afterward.
Unfortunately, we haven't managed to connect while an instance is still being reported as unavailable, so we don't have any further information that would help localize the failure.
We'll probably just launch a new instance to take its place but wanted to report it in case it's indicative of failing hardware on the host or network infrastructure.
Thanks,
Erik
|
|
Posts:
371
Registered:
7/17/07
|
|
|
|
Re: Intermittent internal connectivity failures
Posted:
Apr 26, 2008 5:59 PM PDT
in response to: erikols
|
|
|
We are investigating...
|
|
Posts:
913
Registered:
12/13/06
|
|
|
|
Re: Intermittent internal connectivity failures
Posted:
Apr 26, 2008 9:12 PM PDT
in response to: erikols
|
|
|
After investigation, an intermittent connectivity issue was found and has been resolved. The issue was limited to a small subset of instances.
|
|
Posts:
2
Registered:
3/10/08
|
|
|
|
Re: Intermittent internal connectivity failures
Posted:
Apr 26, 2008 10:11 PM PDT
in response to: Justin@AWS
|
|
|
Thanks much for the prompt help, that's fantastic!
|
|
Posts:
163
Registered:
2/8/06
|
|
|
|
Re: Intermittent internal connectivity failures
Posted:
Apr 28, 2008 1:52 PM PDT
in response to: erikols
|
|
|
This performance issue, affecting a small number of instances in a single availability zone, was the result of a customer applying a very large set of firewall rules while simultaneously launching a very large number of instances.
The high volume of firewall rule changes, combined with an unusual rule configuration, exposed a performance degradation bug in the distributed firewall that lives on the physical hosts. The issue has been resolved. In addition, we are increasing the density of our monitoring to detect and isolate issues in this area of our infrastructure more rapidly.
Thanks,
Kathrin
|
|
Posts:
94
Registered:
6/6/06
|
|
|
|
Re: Intermittent internal connectivity failures
Posted:
Apr 28, 2008 2:22 PM PDT
in response to: Kathrin@AWS
|
|
|
Why doesn't the Service Health Dashboard page reflect these network failures?
|
|
Posts:
147
Registered:
3/28/08
|
|
|
|
Re: Intermittent internal connectivity failures
Posted:
Apr 29, 2008 7:34 AM PDT
in response to: erikols
|
|
|
Erik - sounds like your monitoring instances are doing a great job! I was wondering if you would be willing to share your monitoring scripts or discuss how you have gone about this.
We are in the same boat - running multiple load balancers with multiple upstream web servers. Our load balancer (nginx) is doing an incredible job, but unfortunately it doesn't have any good way to notify us when it is unable to connect to an instance, so we need to do this externally. Rather than starting from scratch, it would be great to leverage what others have done, and of course I'll be happy to share any enhancements we make to it. Thanks in advance for considering this.
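For anyone looking for a starting point, here is a minimal sketch of the kind of external check described in this thread. It is not Erik's actual monitoring script, which was not posted. It assumes Python 3 and only the standard library, and the names `check_tcp` and `CheckResult` are made up for illustration. It separates connect failures from read failures, the two failure types mentioned in the original post.

```python
import socket
import time
from typing import NamedTuple, Optional


class CheckResult(NamedTuple):
    """Outcome of one health check (hypothetical structure for this sketch)."""
    host: str
    port: int
    status: str     # "ok", "connect_failure", or "read_failure"
    detail: str     # error text, empty when the check passed
    elapsed: float  # seconds spent on the check


def check_tcp(host: str, port: int, timeout: float = 3.0,
              probe: Optional[bytes] = None) -> CheckResult:
    """Connect to host:port; optionally send `probe` and wait for any reply.

    With probe=None only the TCP connect is tested. With a probe, a missing
    or failed reply is reported as a read failure rather than a connect failure.
    """
    start = time.monotonic()
    try:
        # The timeout also applies to later send/recv calls on this socket.
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        return CheckResult(host, port, "connect_failure", str(exc),
                           time.monotonic() - start)
    with sock:
        if probe is not None:
            try:
                sock.sendall(probe)
                if not sock.recv(1):
                    return CheckResult(host, port, "read_failure",
                                       "peer closed without replying",
                                       time.monotonic() - start)
            except OSError as exc:
                return CheckResult(host, port, "read_failure", str(exc),
                                   time.monotonic() - start)
    return CheckResult(host, port, "ok", "", time.monotonic() - start)
```

For a web server on port 80, something like `check_tcp("10.0.0.5", 80, probe=b"HEAD / HTTP/1.0\r\n\r\n")` would exercise both the connect and the read path (the address is a placeholder). For a port where the protocol is not known to the checker, such as 10050 here, calling it without a probe tests the connect only. Logging each result with a timestamp as it is produced would also capture the failure details at the moment they occur.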