What's Up?

If you are experiencing a problem that has not been reported here, check our web panel for more information.

(Please remember, posting in the comments here IS NOT an official way to contact DreamHost.)

Frisky Cluster Downtime

Posted (July 18th, 2008 at 9:59 am PST) by brians

We are currently experiencing an outage on one of the fileservers that your machines are mounted to. This explains the connection issues that you are encountering when attempting to access your website, email, etc. Our Admins are currently working to resolve the issue. We sincerely apologize for any inconvenience this may have caused. Please check back here for future updates.

UPDATE: Had to rebuild a few drives. The fileserver is now fully functional. You should not experience any further issues with website, email, etc. access. Again, we apologize for the inconvenience the downtime may have caused.

This entry was posted on Friday, July 18th, 2008 at 9:59 am and is filed under General Outages. You can follow any responses to this entry through the RSS 2.0 feed. Both comments and pings are currently closed.

39 Responses to “Frisky Cluster Downtime”

Strange behavior - all my services are hung. Connections don’t fail, they just try forever… Any network gurus know what that means?

I would call it a HIGH severity issue since sites are down in PACIFIC TIME PRIME AT-WORK VIEWING TIME ….we were having a great morning and then KABOOM.

@wsb - I totally agree. DreamHost Severity levels are NOT useful to end-users. “Medium” probably means that they don’t expect it’ll be too hard to fix. I’ve written previously regarding changing this to Severity (how bad to end users), Impact (how many end users affected), and Effort (estimated difficulty/time to fix).

July 18th, 2008 at 10:09 am Roger Snyder Says:

I only see frisky listed as my email server. Esprit is the file and web server, but all my sites are down. Frisky affects them too?

DEAR GOD I hope you are right about the severity level interpretation. We were having a better-than-usual morning because we were somehow fifth Google result for “Starbucks closure list” because of a news item last night - probably losing half a dozen pageviews per minute on that alone.

I like how it doesn’t matter what cluster you say goes down, my website always seems to be included in it.

@Roger Snyder - For some reason (I’m not a network admin) the “email server” is also the “cluster” server. I think they should just relabel the panel to call it “Cluster/Email” to prevent confusion.

@wsb - Half a dozen pageviews per minute! I was in the middle of writing a blog post, but I don’t get half a dozen pageviews per _day_ (yet). Here’s hoping it’s back up soon.

Well, that explains why webmail has been timing out all morning, LOL.

Is there an estimate on how long this could take to resolve, by any chance?

From our end, all Dreamhost services are being blocked at ATT, at least for uVerse. Submitted a trouble ticket, and they just suggested I contact ATT

My site seems to be back up at the moment. Pretty darn good if you ask me. By the time I had submitted my ticket this blog post was already up. Other hosts will leave you in the dark for hours.

My e-mail is on frisky and I haven’t had issues :S

I just had a small glitch a few moments ago.

Whatever it is, we were doing OK till about 40 minutes ago. Then it died. StatCounter gives us the “live” picture of who’s coming into the site and when, in real time, and it verifies nobody’s getting thru. The six pageviews/minute I’m probably losing on the g-searches for sbux closure list are in addition to the dozen per minute we usually get at this time of day. (Small potatoes for regional/national sites, we are a frequently updated neighborhood news site.) I have already written the apology post to put up … followed by the news story I was working on when it died … While the site is simply hourglassing for me, a reader just wrote in to say he finally got the 500 ISE message. My favorite (not) …

My site is part of that cluster as well.. and I also can’t understand how somebody can call downtime of an entire cluster on a Friday morning a medium severity issue. This is really serious.

But anyway.. hopefully they’ll fix this soon.

July 18th, 2008 at 10:26 am Here We Go Again Says:

When submitting my ticket to support, I get “Your website doesn’t appear in the apache config file”. Anyone else get this message?

DH - Medium is laughable. Anytime anyone loses business because of an outage it’s severe.

My site was down for the past 90 minutes or so, just seems to have come back. Hope this is the end of the issues.

How is the severity Medium? Our site is dead. That’s like saying to a mangled corpse, “Ah, at least it could have been worse.”

Maybe you should wake up Chuckles and ask him to fix your rust-bucket servers.

Thanks Nyhm.

I see other sites are back up, but not mine. :(

We’re not back yet either. Anybody have an idea, and yes I KNOW WE ARE PAYING INEXPENSIVE PRICES AND THEREFORE WE GET WHAT WE DESERVE ETC. but - technically, theoretically, should a company have backups so that if machine X dies all its sites can be operated off machine Y till it’s back? Please forgive me for being too low-tech to know the answer. I worked in newspapers for a long time and didn’t understand how to run the printing press, either, just knew that if it was down, we were down. Not this same kind of crisis, tho!!!!

Mine is still down as well. :(

Is there no redundancy?

Any estimation on a fix time?

I seem to be back up, but I’m not confident…

BACK
15 pageviews/minute right now, which means I may have lost 900ish. Good luck to all …

If this is medium …what would high be?

@wsb - that’s what a cluster (usually) is for. The idea is that you have 10 machines running at the same time, each able to serve the same thousand web sites. If one or two of those go down, it’s not a big deal, the other ones can handle the traffic..

The problem is when the entire cluster (all 10 machines, for example) goes down, like what seems to be happening…
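
To make that concrete, here is a rough toy sketch in Python. It is not how DreamHost's setup actually works; the `Node` and `Cluster` classes, the `web0`..`web9` names, and the single shared `fileserver_up` flag are all made up for illustration. It only shows why losing a couple of web machines is survivable while losing the shared fileserver they all mount takes every site down at once.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One web machine in the cluster (hypothetical model)."""
    name: str
    up: bool = True


@dataclass
class Cluster:
    """Several nodes that all read site files from one shared fileserver."""
    nodes: list
    fileserver_up: bool = True  # assumption: a single shared storage backend

    def serve(self, site: str) -> str:
        # Every node depends on the shared storage, so check it first.
        if not self.fileserver_up:
            raise RuntimeError(f"shared fileserver is down; no node can read {site}")
        # Otherwise hand the request to the first healthy node.
        for node in self.nodes:
            if node.up:
                return f"{site} served by {node.name}"
        raise RuntimeError("no healthy nodes left in the cluster")


cluster = Cluster([Node(f"web{i}") for i in range(10)])

# Two machines die: the remaining eight still answer.
cluster.nodes[0].up = False
cluster.nodes[1].up = False
print(cluster.serve("example.com"))

# The shared fileserver dies: every node fails, healthy or not.
cluster.fileserver_up = False
try:
    cluster.serve("example.com")
except RuntimeError as err:
    print("outage:", err)
```

In this model the fileserver is a single point of failure, which would match the symptoms people are describing here, assuming the real cluster is laid out in a similar way.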

Thank you, Eduardo!!!

BACK! WOOHOO!!

Still down for me.. Sigh..

I’m still getting a “Failed to Connect”. Is this related?

@warren - it must be.. Sometimes I get a connection refused, sometimes a cannot connect..

It looks like Frisky went down again around the time of this post. My stuff on hummer is offline again.

I’m still down. Getting “Your website does not appear in the Apache configuration file.” in my Status Check under support for every one of my domains…

All sites are still down…

The status on the post was changed to Resolved, but my site is still down!!!

This is what I get when I ssh into the machine:
Could not chdir to home directory /home/escoz: No such file or directory

Anybody else with the same problem?

I am still down too guys!

RESOLVED? What are you guys smoking? My clients would like their sites back up now, please …

They are now saying Esprit is dead:

“D’oh!! Esprit’s not coming back from a reboot. I’m moving it over to new hardware right now, should be back in about an hour. Sorry about that!”

The ticket I opened this morning was just moved to some new queue.. Let’s see how long it’ll take for them to restore my home folder.

What happened guys? ddos? Hardware malfunction? Nuclear bomb..? lol

 
© 1996-2008, DreamHost.com