Filer down - causing high machine loads on our spunky cluster.
Posted (May 12th, 2007 at 12:06 pm PST) by mir
We are currently working on one of our central filer machines that has gone down. Unfortunately I do not have details yet on the exact cause, but our admin team is trying to get it back up as quickly as possible, as it causes rather severe issues on any machines that have data from that machine mounted. As soon as it is back up everything will return to normal, so we’ll post again as soon as we have more detailed information (we are expecting no more than an hour, and hopefully much less, for this outage).
Update: The filer is back up and appears to be fine. We are making sure all machines mounting data from that filer are working properly again, and everything should be back to normal within the next 5-10 minutes (only two servers appear to have been down completely from this, but we’re rebooting other servers that had high loads as a result of the outage).
Update: It seems that one of the Fibre Channel cards was mis-seated or had crashed in the file server. After reseating everything and swapping out a couple of adapters, the machine booted up. All in all, none of the hardware wound up needing replacing, but the items showing the fault lights were not what was broken, as is sometimes the case! I am not certain where the “two” number came from; it was closer to 12-20 machines directly affected. We believe most of them to be back online now. Perhaps a typo along the chain of information!
37 Responses to “Filer down - causing high machine loads on our spunky cluster.”
How many days without downtime this year???
This should be preventable with a failover backup server. Is there a reason why something this important is not implemented?
How would I even know if I had stuff on spunky? I know what server my site lives on…but since things are down now (like they seem to be a few times every month….),
Well, if you have anything on the razzle ftp server, you are shot out of luck right now.
I’ve been wondering what’s been going on. I sent in a support ticket and never got a reply (it’s been almost an hour ;x)
I hope you guys fix it soon :>
I’m coming round to the idea of changing hosts. There seems to be a lot of regular downtime. I’m wondering how you can compare downtime between hosts. May look at Hostmonster - see how they do.
spunky is my email server….why would that bring my sites to a dead stop???
Your email server represents the cluster you’re in. So, if you use spunky as your email server, that means you are in the spunky cluster.
@Lunesse
Your email server is also your file server
spunky is your cluster
i.e. i am on server razzle which is in the spunky cluster so my websites are down
^ Sorry for the dupe
Well, I had to freeze my Google AdWords campaign since all my sites are down (again). I’m on hoover, so I don’t understand why this affects my sites… that said, I’m fed up with the downtime and going to start researching other webhosts.
thanks guys! GRRR! My client has a knack for seeing this when it happens, and it’s a very busy musician site. *sigh*
This happens a LOT.
Will this outage affect a database on a server within the cluster?
I’m guessing this is what’s causing that Internal Server Error stuff for all of my sites as well. I’m on Web Server Razzle too.. so yep.
How in the world does this keep happening with Dreamhost? Doesn’t a cluster provide redundancy? If it’s software having the issue, don’t you have backups to restore from?
In the past, I sort of ignored all the continuous problems with Dreamhost (even though most of my friends have left now) because my sites were never that important. Now I actually want to create a site that I need to have up and it’s currently down.
I understand that people say “If you want to host something important, go someplace else or get a dedicated server”, but I guess I was hoping I could use Dreamhost’s shared hosting with reliability. I guess I’m wrong. I literally only need my new site for a couple of weeks (I’m just selling a car) and now I’m off to look at other options. I won’t be renewing in the future.
This is currently causing my website to “Timeout”, and has been for about 1.5 hours. It would be nice if this were fixed ASAP!
Thanks for fixing things pretty quickly.
still everything down here again
already for several hours boehoe :-(((((((((((
same. everything for me is still very much down.
I’m still having problems even though it says the problem is resolved…
I had been planning on moving to DreamHost from my old host but I quit after watching this site for about 2 weeks (and having montastic pinging some sites I knew were on DreamHost). I’ve since moved to Media Temple and I must say I like it.
Their GPU limit is off for the time being (because it didn’t work right the first time they set it up) and it’s just excellent so far. (No need for calls to support - and the one time I did call - billing though - it took under 2 minutes to get in touch with an operator)
I only came to the forum now to find out why one of the blogs I usually read (which is hosted here) is down.
PS: Media Temple is a lot more expensive. But if you do a search for discount codes you should get the price lowered a little.
I will be switching here shortly:
http://www.lunarpages.com/basic-hosting/
I can’t seem to FTP (for a few hours now) to the Raiden server…. is this part of the cluster as well?
P.S. I have been with Dreamhost for 8 years and I must say that I have no complaints on downtime. I see no more downtime here than any other provider and the price is still the best out there. So although I can get frustrated at times like these, I don’t think coming in here and complaining is the way to get your point across since dreamhost staff may never read it and not too many customers will see it either. Just my two cents
Panel down. Can’t report that it’s down because the panel is required.
This is a FUNDAMENTAL problem of requiring the panel to report outages.
DH *MUST* put a method in place to report via email ASAP or go to a DRASTICALLY SIMPLIFIED web submission system.
What’s the point of asking for a callback if you can’t get in to request it?
The panel is not down.
A cluster does not imply redundancy.. it implies a group of servers which are independent of other clusters. You don’t seem to understand that to have duplication of every single machine that DH owns would at the very least DOUBLE your fees.
Machines go down. Hardware failures are not the fault of DH. Their response times to fix said broken hardware are excellent.
If you are choosing a host based on this blog then it should be a positive for DH: they are fully open and forward about failures and system statuses. It’s been less than 20 minutes from this post being created to the second update being made
it worked for about 10 seconds, 1/2 an hour ago, and then went back to timing out, and still has not been fixed! aaa i need my website! please fix!
Those of you whining about lack of redundancy: Think about this.
Suppose you wanted to set up a webserver with similar capabilities as DreamHost currently provides for its L1 customers (7.95/mo.). Well, I suppose first you’d have to plunk down a few g’s for redundant OC-3 lines. Then you’d need to hook those into redundant routers (and I’m not talking about the Linksys routers ya get for 50 bucks at Target, now). Then you’d have to build a server, give it redundant EVERYTHING. Then of course you need another one. Probably need offsite DNS and email servers. Maybe you even build yourself a redundant international high-speed internet backbone, in case your provider goes down. Of course redundant UPS and redundant generators are a must-have.
This does not come at $7.95/mo for your sites.
Matthew and Matt (if you’re not the same guy) - Dude(s) - the panel is REQUIRED. DH needs it to do their work so of all the systems hosted by DH it is the one that should be clustered and configured with failover. It’s their LIFEBLOOD. It’s how people sign up and how customers stay happy. Especially when there is NO ALTERNATIVE WAY TO CONTACT SUPPORT.
My statement that the panel was not available had not a damn thing to do with customer sites and if I can’t retrieve information because a DB’s borked (see newly posted message by DH), dood - that qualifies as DOWN. I NEVER report an issue until I’ve duplicated it multiple times (it’s the software QA background in me).
For 8$ a month, I’m happy that it’s up MOST of the time. But the panel had better be available as much as possible (and I KNOW that DH wants the panel up 2x more than I do).
No need to protect DH from me - what we need to do is make sure to protect DH from itself. It’d be a shame for the whole thing to implode on itself. We’d all be stuck on windows boxes. Yech.
You shouldn’t be complaining about lack of redundancy, unless someone is holding a gun to your head and telling you not to handle your own redundancy.
A few people complaining about it isn’t a reason for DH to buy two of everything and double the prices for everyone.
Get a second hosting account. Get an account at dnsmadeeasy.com for about $20/year. There you go… redundancy.
If you’re not willing to pay that little amount, it’s not important to you. But, as cheap as it is, I’m not willing to pay double for my hosting plan because someone else wants it.
Also, some of you are complaining about it like all other hosts offer it. They don’t. Many don’t even offer a nice backup system like DH.
For true redundancy, not only are you asking DH to buy two of everything, but you’re asking them to run two different data centers.
Like backups, there’s no reason to push redundancy off on your host.
The spammers promoting their new hosts here clearly didn’t do much research if they feel they made good choices.
At Mike: The choice doesn’t make a difference. The first time their ISP’s route to their new host breaks they’ll cry about it being down and be back at DH.
Mike: DID YOU *READ* the message I posted? I don’t give a rat’s buttock if my sites are redundant. What I have and do does not need 99.9% uptime and for the price I surely don’t expect it.
BUT - and this is what I was REALLY discussing - the DH Panel needs to be fully redundant with good failover. There are, I’m guessing, conservatively between 7500 and 10000 customers of DH and any one of them could/might need access to the panel at any time - especially when things aren’t working on their sites. It is important that the panel be up and available and not use “gee-whiz” technology to accomplish simple basic stuff.
@whiner:
You weren’t the first person to bring up redundancy–and it hasn’t all been directed at the panel.
But… I have yet to see a case where someone here said the panel was down, where it actually was. So, where’s the problem that requires panel redundancy?
Example of the panel not being down:
Person 1: The panel is down!
Person 2: No it’s not. I’m logged in right now and it’s fine.
Person 1: If I can’t get to it, then it’s down. Duh!
And why would the panel be more important than sites? I’d be much more concerned about site uptime than panel uptime. Site downtime can equal loss of money for some people, but a little bit of panel downtime is just the inconvenience of rescheduling certain tasks.
Anyway, the panel could be mirrored in 5 different data centers, and as soon as someone was having problems with their ISP, they’d be here claiming that all 5 data centers were down.
and again site is down :-((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((


Would this be what is causing the “Internal server errors” on one of my sites?