What's Up?

If you are experiencing a problem that has not been reported here, check our web panel for more information.

(Please remember, posting in the comments here IS NOT an official way to contact DreamHost.)

Network blip for some customers

Posted (July 31st, 2008 at 5:31 am PST) by Kelly

This is directly related to the network turn-up we did late last night referenced here:

http://www.dreamhoststatus.com/2008/07/30/network-maintenance-tonight-0730/

It appears that the IP access lists on our side and on GlobalCrossing's side did not match, so a subset of our users who were routed in via GlobalCrossing to a specific subset of our IPs could not reach DreamHost. The GlobalCrossing link has been shut down until we can rectify this with their BGP team. All traffic is flowing over our other internet connection for now. We apologize for not catching this sooner; we simply did not see any support volume related until people began to wake up about 30 minutes ago!

Update (Minutes later!): I just got off the phone with GlobalCrossing and it seems that some interaction between our two Cisco routers was causing some form of CPU spike on both sides. We are both in the process of opening cases with Cisco support to figure out what was causing this problem. The plot thickens! We will be leaving these uplinks offline until this bug can be resolved.

This entry was posted on Thursday, July 31st, 2008 at 5:31 am and is filed under General Outages.

14 Responses to “Network blip for some customers”

So that’s why it’s taking my site about a week and a half to load? I feel special!

Not *that* special, Bill, my websites are experiencing the same problem too! :(

“we simply did not see any support volume related until people began to wake up about 30 minutes ago” is not really a good excuse on DH’s part. At my work, I run a simple HTTP testing script once every X minutes; it attempts to connect to a few of my business-critical sites and parse some HTML. If it cannot, it emails me a warning. If DreamHost did this for one site per server, it wouldn’t have to wait until customers notice problems before attempting a fix! It’s called being proactive.
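A check like the one described above could look roughly like the following Python sketch. It is not the commenter's actual script: the function names, the marker-text approach to "parsing some HTML", and the assumption of an SMTP server on localhost are all illustrative choices.

```python
import smtplib
import urllib.request
from email.message import EmailMessage
from typing import Callable, Dict, List, Optional


def fetch(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its body as text (raises on any failure)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def check_site(url: str, marker: str,
               fetcher: Callable[[str], str] = fetch) -> Optional[str]:
    """Return None if the page loads and contains `marker`, else a problem report."""
    try:
        body = fetcher(url)
    except Exception as exc:  # DNS failure, refused connection, timeout, HTTP error
        return f"{url}: request failed ({exc})"
    if marker not in body:
        return f"{url}: loaded, but expected text {marker!r} was missing"
    return None


def run_checks(sites: Dict[str, str],
               fetcher: Callable[[str], str] = fetch) -> List[str]:
    """Check every (url -> marker) pair and collect the problem reports."""
    reports = (check_site(url, marker, fetcher) for url, marker in sites.items())
    return [report for report in reports if report is not None]


def send_warning(problems: List[str], sender: str, recipient: str,
                 smtp_host: str = "localhost") -> None:
    """Email the collected problems; assumes an SMTP server at `smtp_host`."""
    msg = EmailMessage()
    msg["Subject"] = f"Site check: {len(problems)} problem(s)"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(problems))
    with smtplib.SMTP(smtp_host) as server:
        server.send_message(msg)
```

Run from cron every few minutes, the script would call run_checks with a mapping of URLs to a piece of text each page is expected to contain, and call send_warning only when the returned list is not empty. The addresses and hostnames passed in are placeholders to be supplied by whoever runs it.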

I’d like to add that it’s pretty hard to submit trouble tickets through the DreamHost panel when there’s about 50% packet loss to that server, so I don’t think they actually received that many reports.

But anyway, my piece of the internet is working again, and the speed seems normal.

My site stats are reporting Sept 6 2144 as the date on my sites. Support is aware of it and working on it.

Thank god I turned off my adwords. I tried to log in to my site this morning just to see how it looks and after like 2 and a half hours, it said, failed to load! Now it comes up but it’s too slow. I am not going to burn any cash on Google for people to see a blank page. :(

Dear Colleagues

I’m just plain relieved … it is much more comforting to know that it is DH, GlobalCrossing, Cisco et al that are having a glitch rather than little me with my very limited technical capacity. I have had a worrying 24 hours or so wondering why my “dates” are all over the place! Good luck DH friends!

Hmm… My site seems to be working fine. A little on the slow end, but it loads anyway. However, I can’t login to my FTP account…

Connection attempt failed with “EAI_NONAME - Neither nodename nor servname provided, or not known”.

Looks like another day of non-productivity… I’m going golfing.

My dates are fixed………

@Pete
you may be right… but as it turns out, this is only affecting users whose internet connections follow a specific route.
to add complexity, it only affected those customers if they tried to access a small subset of servers inside DH

when something like that happens, you either find out because your customers start telling you, or because the simple HTTP tester pings every server in Dreamhost (in the tens of thousands IIRC) from internet connections from every ISP out there. Not very practical…

in fact, it’s possible that your script running from inside DH would have never noticed any failure

Here we go again
Sites down every half hour now. Before was every 10 minutes…lol

From here it looks like the GBLX link is online again, this time without the packet loss. Let’s hope it keeps working this time!

Why does it say Yes next to Resolved? This isn’t resolved, is it? I still can’t get into my FTP account???

Should not be resolved.
Sites and FTP are going down every 30 minutes now…LOL. getting better. It was every 10 minutes

@Viking:

Well, where I work, when something goes wrong, the first question my manager would ask after fixing everything is “how do we prevent this in the future?” I’m sure your manager would behave similarly. In this case, the problem is that some sites went down without DH’s realization, and the down time was long enough for customers to notice. How can DH prevent this in the future? Run an HTTP tester to parse one web page on each one of the web servers. Maybe this can run once every 10 minutes or whatever interval is deemed more appropriate/feasible.

Is this a lot of servers? Yes. Is it going to be resource intensive? Not really, given it will take a fraction of a second for each test. How long will it take to write such a script or program? 30 minutes or less for those familiar with Perl, Java, or whatever scripting/programming language that’s up to the task. What’s the potential gain from this? Infinite. Imagine being able to tell the customers that “hey we found a networking issue affecting your web server 5 minutes ago, but we’ve already taken care of it” instead of waiting for the customer to file a ticket and go “oh thanks for notifying us that our server is down”.

Small amount of work for infinite gains, don’t you think? At least that’s how things are working at my workplace.

By the way, I wrote that HTTP test I was talking about to monitor our many customer-facing extranet sites. It’s been running for a couple of years now, and it has saved us from dealing with customer complaints many times.
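For a sense of what "one page per server, every 10 minutes" might involve at scale, here is a rough Python sketch that probes many URLs concurrently. The function names, the worker count, and the timeout are assumptions made for illustration, not anything DreamHost or the commenters are known to use. As noted earlier in the thread, every probe leaves from a single network location, so a failure that only affects traffic arriving over one particular upstream route may not show up in a check like this.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def probe(url: str, timeout: float = 5.0) -> bool:
    """True if the URL answers with a successful response within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except Exception:  # urlopen raises on DNS errors, timeouts and 4xx/5xx
        return False


def failing_servers(urls: Iterable[str],
                    probe_fn: Callable[[str], bool] = probe,
                    workers: int = 50) -> List[str]:
    """Probe all URLs in parallel and return those that did not respond."""
    url_list = list(urls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as url_list
        results = list(pool.map(probe_fn, url_list))
    return [url for url, ok in zip(url_list, results) if not ok]
```

A thread pool keeps the total run time close to that of the slowest few probes instead of the sum of all of them, which matters when unreachable servers each cost a full timeout. The list returned by failing_servers could then feed whatever alerting is already in place.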

 
© 1996-2008, DreamHost.com