Filer problems with blingy cluster.
We are currently having a problem with a filer which has crashed and is recovering at this time. While this is happening some customers in the blingy cluster will experience problems loading their websites/email. We apologize for the outage and service is expected to return to normal as soon as the filer recovers.
UPDATE 3:01 AM PDT
The filer has finished recovering and all services are back up and running. We are working with the filer vendor to find the source of the crash to prevent any further outages.
Update 24/03/08 10am: We’re working on the file server again to alleviate the load that’s causing problems with web, mail and mysql services. Sorry about that.
Update 27/03/08: We are doing emergency data moves to stem the problems recently caused by your file server. During these moves, your data may be inaccessible. We are moving as much as we can off as fast as possible. Very sorry about the continued inconvenience!
Update 27/03/08 This series of moves has finished. We are going to keep an eye on things to see how much it helped and may have to do more moves tonight and tomorrow morning to get everything working smoothly again. This post will be updated with more information as soon as possible.
Update 29/03/08
We are continuing to move data off of the problematic file server, but it’s a bit of a catch-22 because customers on that machine are continuing to add data at a very high rate. It filled up this morning for a while, causing device full errors as well as mail problems and issues serving websites (when these fill up it causes problems across the board). To explain in more detail: when we move data it does not immediately disappear. A ‘snapshot’ of the old data remains in case there was a problem with the move. That ensures that we do not lose customer data, but until the admin team can check that the move went through properly we cannot delete the old data. We just did some of that and have some breathing room again, and of course more moves are still in progress, but we are asking customers on this cluster to help us by holding off on any non-essential uploads of data for the next couple of days. As soon as we have a significant portion of the data removed, the problematic file server will begin to function properly (and additional moves will go much more quickly and smoothly), but right now we’re having trouble moving data more quickly than it’s being added. If everyone could please limit uploads to absolutely essential data until we reach the turning point where everything is working, this will be resolved much more quickly. In other words, if for example you are setting up a repository of large files, you’ll actually be better off waiting a couple of days and getting the all clear from us on this issue, because you’ll be able to access that data reliably instead of cramming it on there now and slowing the recovery process.
In the meantime we’ll be doing everything we can to safely and quickly move data off and get things back to normal.
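As a rough illustration of why moving data does not free space right away, here is a minimal Python sketch of the accounting described above. The `Volume` class, its field names, and every number in it are hypothetical. It is only a toy model of moved data lingering as a snapshot until the move is verified, not a description of how the filer actually tracks space.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    """Toy model of a file server volume (all figures hypothetical)."""
    capacity_gb: float
    live_gb: float             # data customers are actively using
    snapshot_gb: float = 0.0   # moved data kept until the move is verified

    @property
    def used_pct(self) -> float:
        # Snapshots still occupy disk, so they count toward usage.
        return 100 * (self.live_gb + self.snapshot_gb) / self.capacity_gb

    def move_off(self, gb: float) -> None:
        # The data leaves the live set but stays on disk as a snapshot.
        self.live_gb -= gb
        self.snapshot_gb += gb

    def upload(self, gb: float) -> None:
        # New customer data lands on the same volume.
        self.live_gb += gb

    def delete_verified_snapshots(self) -> float:
        # Only at this point is the space actually reclaimed.
        freed = self.snapshot_gb
        self.snapshot_gb = 0.0
        return freed


volume = Volume(capacity_gb=10_000, live_gb=9_500)
volume.move_off(500)
print(f"after move:     {volume.used_pct:.1f}%")  # still 95.0%
volume.upload(200)
print(f"after uploads:  {volume.used_pct:.1f}%")  # 97.0%
volume.delete_verified_snapshots()
print(f"after deletion: {volume.used_pct:.1f}%")  # 92.0%
```

In this toy run the move alone changes nothing, the uploads push usage up, and only the snapshot deletion brings it down, which is why limiting uploads matters while moves are waiting to be verified.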
Added information: Some of the people recently moved to the new file server are seeing errors because the data did not get set up completely (loading the site will work but just show an empty index). The admin team has been running an rsync that will fully restore all data and should hopefully finish by 9 PM PST - once that is finished all site and email data will be available for those users.
Update 30/03/08
We’re still racing to keep ahead of new data being added so any help we can get on that front is greatly appreciated (we’re still asking for customers to limit uploads as much as possible to speed up the recovery process). Some customers who are being moved are seeing blank directories still but those are due to moves in progress and the data will be fully restored when those complete.
Update April 1, 2008
We seem to be ahead of the curve right now: we are moving data off the primary volume and onto a secondary one faster than new data is being uploaded. The volume hasn’t filled up completely in a few days. We are working closely with the technical support team to see how we can speed up the process further. Thank you for your patience.
Update, April 1, 2008
I apologize for the late update but we’ve been going over our options (while moving data of course). While we’re not seeing any real relief in terms of data uploads, we do have some very large moves that are almost complete. Once those finish we can start deleting the data (for example, one is around half a TB, or around 500 GB, which will be 4-5% of the total, but it’s going to take until around Friday to delete it all, so we’re dealing with a ton of data). Tomorrow’s update should be earlier in the day and hopefully we’ll have some progress to report from the large moves being complete.
Update, April 3, 2008
The data moves to other file servers have been running constantly, but last night and this morning some complications happened with the moves, requiring admin attention. To clear up some space there had to be a short interruption in file serving. This is now finished, space is available and the moves are continuing. The admins are fixing up the last of the web servers which were having issues after file serving was restored. Our apologies again for the continued issues.
Update, April 4, 2008
Today has been a pretty good day of progress. We were able to complete even more moves and free up more space on the file server. Moves have been going quicker and stability is dramatically improving. Monitoring of the servers and email in the blingy cluster today has shown a significant decrease in problems. Issues do still exist but the problem is noticeably getting better. We are also pleased to note that we have more storage that will be coming early next week. We believe that this will go a long way in helping us fix this major problem.
Update, April 4, 2008
Things are continuing to improve today - when I got in I was pleased to see that we had held firm and even gained a percent (the affected file server was down to 95%, which is as low as I have seen it in the last week, and since I have been working it has dropped to 94%). Performance should improve as we gain ground (this will speed up moving data off as well). This progress, along with the added storage space we are expecting early next week, should hopefully allow us to restore service for our customers to normal.
5:45 PM PST: The moves we started just a while ago seem to be causing server problems; we’re looking into it and should have it resolved shortly (they were run just like the ones that had completed, so we have to determine why these specifically caused an issue). Update: this resolved itself before we could determine the cause, but we’re monitoring the situation to ensure that it’s not a recurring issue (we have no indication that it will be).
Update, April 8, 2008
Please see our other posting for details on the work we did on the affected file server:
http://www.dreamhoststatus.com/2008/04/06/30-min-blingy-downtime-tonight/
We are also continuing to offload data and are making good progress (it’s never as fast as we would like it to be of course). There’s excellent detail here in case you missed it:
http://blog.dreamhost.com/2008/04/07/another-anatomy/
which chronicles the situation and fills you in pretty much up to today. We’re seeing the data dip to 90% so we’re hoping to have it down in the 80’s by the end of the week (every percent we gain helps and as performance improves we can speed up the rate of moving but we’re still looking to hit critical mass where you get the proper level of performance).
Update, April 9, 2008
As we had hoped progress is speeding up as we free up more space - while the file server is showing 95% usage, around 11% of that is data that has already been moved and is no longer in use. Due to a software issue we haven’t been able to remove it yet (the admin team is working on the best way to execute that), but once that is gone we should be around 85% usage which is another large step forward.
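For readers who want to check the numbers, here is a small Python sketch of the arithmetic. It assumes only the two figures quoted above (95% shown, about 11% stale); the function name and the `stale_is_share_of_used` flag are made up for illustration, and the two branches reflect two possible readings of "11% of that", neither of which is confirmed here.

```python
def usage_after_cleanup(used_pct: float, stale_pct: float,
                        stale_is_share_of_used: bool = False) -> float:
    """Estimate volume usage once already-moved (stale) data is deleted.

    stale_is_share_of_used=False: stale_pct is a share of total capacity.
    stale_is_share_of_used=True:  stale_pct is a share of the used space.
    """
    if stale_is_share_of_used:
        return used_pct * (1 - stale_pct / 100)
    return used_pct - stale_pct


# Reading 1: 11 percentage points of the whole volume are stale.
print(f"{usage_after_cleanup(95, 11):.2f}")
# Reading 2: 11% of the 95% in use is stale.
print(f"{usage_after_cleanup(95, 11, stale_is_share_of_used=True):.2f}")
```

Either reading lands at roughly 84 to 85%, in line with the estimate above.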
In terms of effect, I have already seen improvement in site function for many customers and greatly increased speed in moving chunks of data off, and I have received reports that mail is functioning quite a bit better. That said, this issue remains at the High severity rating and in unresolved status as we have not reached a normal level of service. I can’t stress enough how sorry I am that our customers have had to put up with this, but I thank those of you who have stuck with us (check the newsletter for details on what we’re doing for Blingy customers) and look forward to providing you with the level of service we strive for at DreamHost.
Update, April 9th, 2008 21:59 PDT
Unfortunately, we need to unmount the volume again to kill these snapshots before they leave us with 0 bytes of free space. In 2 hours (midnight) I will be taking the problematic volume offline to delete the phantom snapshots. Total downtime will be between 10 and 30 minutes. Sorry for the short notice and additional outage!
Update, April 10, 2008
Well, the snapshot mentioned yesterday is gone and we’re actually at 83% used today, which is below the level where we were hoping to see marked improvement (85%). Of course we’re still moving data off (which increases the load on the file server), so that won’t fully translate to improvement for customers yet, but it should be quite a bit better and keep improving until we stop moving data.
Update, April 14, 2008
Okay, we’re finally getting ready to mark this as resolved. Things have seemed pretty much okay for a while now. But, just to be sure, we’re dropping the severity to Medium for now and leaving it as unresolved.
Update, April 17, 2008
We’re still hearing some reports of site slowness - we were able to resolve an issue causing high loads today which should help but we’re not going to consider this resolved until everyone is receiving good service.
Severity: Medium | Resolved: No
March 28th, 2008 at 5:48 am
According to my account summary I’m only on “blingy” for email (“clank” for web pages and “rous” for MySQL). However, email is the only thing that seems to be working for me. Maybe the problem is more widespread, as it seems clank and rous are experiencing high loads (around the 80-100 mark!)
March 28th, 2008 at 6:01 am
How long?
March 28th, 2008 at 6:31 am
I’m pretty sure it’s not yet resolved for me, as my email is still not working. Hopefully soon.
j
March 28th, 2008 at 6:33 am
Holy shit I’m finally receiving emails from 4 fucking days ago.
March 28th, 2008 at 6:42 am
For a short time yesterday evening and this morning email was working. Not quick, but at least it worked. Now I’m getting ERROR: Connection dropped by IMAP server.
Query: FETCH 1:* (FLAGS UID RFC822.SIZE INTERNALDATE BODY.PEEK[HEADER.FIELDS (Date To Cc From Subject X-Priority Importance Priority Content-Type)])
March 28th, 2008 at 6:44 am
So has anyone tried to visit “Astonishinghost.com” to leave complaints and been highly amused at what actually pops up?
Seriously.
Goes to a YouTube video… check out the WhoIs, good for another laugh. Looks like someone over at Astonishinghost forgot to renew their domain…
March 28th, 2008 at 6:51 am
How do these DreamHost bastards have such nerve???
You will burn in hell, you sons of bitches.
March 28th, 2008 at 7:03 am
That really suxxx! I think I’ll move my domains to another hoster…damn, three days without accessibility ;-(
March 28th, 2008 at 7:11 am
All I can say is, check this site out. http://www.webhostingtalk.com/forumdisplay.php?f=1
Apparently the problems DH are having are rather common amongst low-cost hosting providers, although not all low-cost providers have such catastrophic issues like the one I am dealing with now. I think I might have finally found an alternative and will leave you to do your own research on that site.
Telling a client who has had no email for 4 1/2 days to ‘wait it out’ while I have not been given any useful info from DH is bad for everyone involved. DH will certainly not reimburse me for the week of struggling to provide workarounds and researching dozens of new hosts.
I don’t think anyone here is impressed.
March 28th, 2008 at 7:28 am
This server is toast — I ssh’d in and made a tarball of my site (to move to a new host!), and it took 5 minutes to compress a simple Wordpress folder! 5 minutes!!! Is my site hosted on Commodore 64? Is the cassette tape jamming up or something?
March 28th, 2008 at 8:00 am
Still getting insanely high server load, page load times of 1-5 minutes and no word at all from support. I’m about to have a look at other hosts.
When moving between hosts, what would be the best way to minimise downtime due to DNS propagation? I was thinking of setting www2 to the new host which should propagate immediately as that subdomain has never been used, then using htaccess to redirect www to www2 while waiting for www’s new record to propagate. Would this work? Is there a better way? Any advice would be appreciated.
March 28th, 2008 at 8:07 am
Having been a Dreamhost client for some years now, I’ve grown used to the occasional outages, having learned to be patient given the cheap pricing schedule. However it seems this time around I’ve reached my patience threshold, having yet again rolled into the office this morning only to learn I’m unable to check my email for the 5th day running. It’s like playing craps, press the Send/Receive button and hope you get a decent roll. Sometimes it works, and sometimes it doesn’t. This is simply ridiculous.
I doubt I’m the first on this comment list to state as much, but next week I’ll start looking for another hosting provider, with the goal of moving my site off of Dreamhost entirely within the next two weeks.
Jason
March 28th, 2008 at 8:16 am
Get a new filer from the vendor, or better yet get a new server to host our sites
March 28th, 2008 at 8:37 am
Well, we are having problems as well and we are sweating it as well, but I figure no one is sweating it as hard as the people with skin in the game at DH. The ones pulling double shifts and sleeping under desks, I mean. And you know that they are.
I have used a few hosting providers and I remember what their customer and tech support was like. If you come from an abusive background, you will recognize it immediately. All I have to do is look at my support history messages and know that this is a problem that has their complete and undivided attention at DH.
I used to be a roadie for some of the biggest bands in the world. I know what it is like to stand “naked” on a stage with 30,000 screaming fans, none screaming louder than Mr. Petulant Rockstar, while your gear is going up in “flames” and you are trying to keep focused and centered and fix the problem. Fixing problems takes the time it takes and not one second less.
I think we’ll stick. Weather the storm and all that. We’ll survive.
March 28th, 2008 at 8:38 am
I doubt they allow embedded images in these posts so here ya go:
http://98.130.145.104/img/sleeping_on_the_job.jpg
This is going up on every site I own. I am transitioning to IXWebHosting ( www.ixwebhosting.com) as fast as DreamHost’s Yo-Yo FTP access will allow me to.
I’m through with this company.
March 28th, 2008 at 8:47 am
http://98.130.145.104/img/sleeping_on_the_job.jpg
bump!
March 28th, 2008 at 9:41 am
this is insane,
have not been able to update sites in over three days!
You are a JOKE!!!!
March 28th, 2008 at 9:43 am
p.s.
shawn, show your face…….
I got a few ideas…..
March 28th, 2008 at 9:44 am
keep an eye on things? my site is down AGAIN for the 5th day in a row. Ridiculous
March 28th, 2008 at 9:45 am
@Paul (#469): don’t go with a hosting provider who offers more than 10 GB of space for less than $10. Haven’t you taken any lesson from this? I’ve just posted something there: http://www.webhostingtalk.com/showthread.php?t=682124
I’ve left. I hope DH comes back with more serious plans.
March 28th, 2008 at 9:51 am
i am a fairly new customer so after reading most of the posts, i am dreading the moment when (like most of you) i will have to highly depend on my hosting to work. is there a way for a domain to point to more than one host - such as in this case?
March 28th, 2008 at 10:03 am
bump!
http://98.130.145.104/img/sleeping_on_the_job.jpg
March 28th, 2008 at 10:14 am
My DreamHost PS is down:
Internal Server Error
The server encountered an internal error or misconfiguration and was unable to complete your request.
Please contact the server administrator, webmaster@adventuresinparenting.org and inform them of the time the error occurred, and anything you might have done that may have caused the error.
More information about this error may be available in the server error log.
Additionally, a 500 Internal Server Error error was encountered while trying to use an ErrorDocument to handle the request.
Rebooting fixed it for now, but this downtime is unacceptable.
March 28th, 2008 at 10:30 am
Yeah, I agree the mail service in the last few days since I’ve been on has been absolutely horrendous. At least my clients’ web sites have been up.
March 28th, 2008 at 10:34 am
I’m still having problems with my website. I hope this will be resolved soon =/
March 28th, 2008 at 10:44 am
It’s really cute when people publicly display that they are incapable of thinking.
I see you really did your research, since you don’t seem to realize that they are heading into 48 hours of downtime (at least) for a data center move, like the one Dreamhost just did in 12 hours.
Do you really think going to a crappier host makes things better?
Just write down every host on an individual piece of paper, stick all of them up your ass, then reach in and pull out one. That’s your new host. YOU have just as much chance of picking a good one that way as you seem to using whatever method made you think IX was the way to go.
March 28th, 2008 at 10:49 am
I’ve only been with this host for 3 days and it is a nightmare!
How can I ask for my money back?
March 28th, 2008 at 10:50 am
This is ridiculous.
I haven’t been able to access email all day and it’s incredibly important for my work. We are supposed to be launching a site that is the livelihood of our business this coming week but now I’m just going to move servers.
The least Dreamhost could do would be to update us every hour with at least some kind of news so we know they aren’t just taking a nap over there. Way to go Dreamhost. I can handle downtime, but not downtime AND silence. Also the ridiculously slow speeds I’ve been getting since joining you a little over a month ago are completely unacceptable.
Anyone got a good server suggestion that sounds similar to Dreamhost but actually works as expected?
March 28th, 2008 at 10:52 am
Dreamhost sucks beyond belief - nothing but constant problems. I want a refund immediately.
Can someone recommend a host that is reliable?
March 28th, 2008 at 10:55 am
Hey, how about an update????!?!?!?!
March 28th, 2008 at 11:01 am
My website seems to be functioning fine, although it has been a bit slow sometimes, but my email keeps going down intermittently, which is incredibly annoying. Even when it is up, it is really slow. That might be because I’m accessing it from Norway right now, but it still seems excessively slow. I hope you can fix this soon and do something about the speed.
March 28th, 2008 at 11:07 am
This is beyond ridiculous now and I’m also considering taking my hosting somewhere else. Communication with your customers would go a long way toward appeasing them. All week I’ve had email and webmail issues during the day, and when you go home at night and stop screwing with it it works fine.
March 28th, 2008 at 11:07 am
I am most definitely NOT amused, THIS IS AN OUTRAGE! They figure they have time and technicians to spare for going around the net hacking other companies and hijacking their DNS as their customers struggle to keep their own clients because their host is too busy playing pranks to fix their hardware, so I do not think I would be laughing if I were in your shoes either my friend. Our legal department are the only ones who should be happy about this, it’s a goldmine for them! DreamHost will pay dearly for their puerile behavior, and in the meantime our techs assure me that we will be back online shortly, and then we can finish saving the rest of you from this clearly very evil company.
March 28th, 2008 at 11:10 am
email down again today. 7+ days and counting. My site has been fine the whole time but it’s on the mario server. It’s laughable. 7 days!
March 28th, 2008 at 11:14 am
All my sites have been down for the last 36 hours… Do you know any alternatives to Dreamhost? I like the simple interface they have here, but I don’t mind paying even double to be assured of a REAL hosting service.
March 28th, 2008 at 11:15 am
Geez! This crap happens like every other month and totally messes up my business. I’m going to have to move to another provider, I can’t have all these email issues constantly!
March 28th, 2008 at 11:17 am
While you wait, Astonishing Host, we would greatly appreciate it if your company could make a donation at http://dreamhost-classaction-lawsuit.com/donate.html to help those of us trying to get compensation.
March 28th, 2008 at 11:19 am
I wasn’t kidding, I want my money back.
Also, can someone please recommend another provider - SERIOUSLY! I can’t continue with unreliable email.
March 28th, 2008 at 11:39 am
I am moving to MediaTemple. Not as cheap, but worth the reliability. I have worked with several hosts in the past. They all have issues. However, this has just been too many issues for me to put up with.
March 28th, 2008 at 12:01 pm
I see you really did your research, since you don’t seem to realize that they are heading into 48 hours of downtime (at least) for a data center move, like the one Dreamhost just did in 12 hours. This affects almost ALL of their customers–not one cluster like Dreamhost.
Do you really think going to a crappier host makes things better?
Just write down every host on an individual piece of paper, stick all of them up your ass, then reach in and pull out one. That’s your new host. YOU have just as much chance of picking a good one that way as you seem to using whatever method made you think IX was the way to go.
March 28th, 2008 at 12:04 pm
Are you really that stupid? You’re asking a bunch of spammers and people with financial interest in the companies they recommend, pretending to be actual customers.
If you’re not smart enough to do some actual research, or get recommendations from people you actually trust, then follow the advice I gave in my last post for choosing a host.
March 28th, 2008 at 12:22 pm
I’ve been with DH a few months now and I’m sick and tired of all the outages and system failures. I agree with the previous poster, I feel like I’ve wasted my money by purchasing from DH. As soon as I get my dedicated server, I’m going to relocate all my sites out of DH.
March 28th, 2008 at 12:26 pm
not to mention that my sites have serious performance issues. it’s quite ironic, that free hosts like the 110mb give faster and more responsive performance. and they have way better uptime too.
from the last 4 months experience with DH, i must say DH’s paid service is actually worse than 110mb’s free service.
March 28th, 2008 at 12:31 pm
and why does DH experience so many hardware issues? do they buy all their machines from cheap-ass chinese manufacturers? i’ve read on some blogs that the shared server machines DH deploys are outdated. and they cram as many sites as 1000 into a single machine.
March 28th, 2008 at 12:52 pm
An update is certainly required at this point. It has been over half a day since the last.
I have several customers who are pointing fingers and I really would love to have a time frame on their e-mail access. I understand downtime is inevitable, but we need some sort of idea as to when we’ll be back up. Thanks for working hard on this,
-Scott
March 28th, 2008 at 12:57 pm
Because they have so many. This is common sense. Go buy between 1,500 and 2,000 servers and let us know if none of them ever have problems.
If a server broke every single day, that would still be like 4 - 5 years per problem per server.
If they’re just parked domains, that’s nothing. And if you mean users instead of sites, that’s meaningless since 1 customer can create unlimited users… doesn’t mean they’re actually doing anything.
Most sites use hardly any resources and the ones that really hammer the server can take it down even if there aren’t any other sites on it.
Since everything you learned seems meaningless, you might want to look for better sources of info than random blogs.
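A quick sanity check of the failure-rate arithmetic in this comment, sketched in Python. The function name is made up for illustration, and the one-failure-per-day default and the 1,500 to 2,000 server range come straight from the comment's hypothetical; none of this is a measured DreamHost figure.

```python
def years_between_failures(num_servers: int,
                           failures_per_day: float = 1.0) -> float:
    """Average years between failures for any single server.

    Assumes failures are spread evenly across the fleet, so each server
    fails once every (num_servers / failures_per_day) days.
    """
    days_per_failure = num_servers / failures_per_day
    return days_per_failure / 365


for fleet_size in (1_500, 2_000):
    years = years_between_failures(fleet_size)
    print(f"{fleet_size} servers: one failure per server every {years:.1f} years")
```

Under those assumptions a single server would average one failure roughly every 4 to 5.5 years, even with something breaking somewhere in the fleet every day.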
March 28th, 2008 at 1:01 pm
having to reboot my private server every 10 minutes is getting old
March 28th, 2008 at 1:10 pm
I don’t think they’re outdated: megaman is a Dual-Core AMD Opteron(tm) Processor 1218 HE. I think the problem is a wrongly designed architecture. But that’s not the real problem. The real problem is that they keep the money instead of spending it on new hardware when trouble comes. If they have 100,000 accounts, they have around $15M. You can buy a very nice server farm for 10M, with 5M for salaries. And the next year you can pay the investors. The problem is that they don’t have hot-swap replacements, dunno why. They have the money and the opportunity (in California, there should be enough good vendors).
March 28th, 2008 at 1:12 pm
The drama continues. There’s a really whiny anti-dreamhost video on YT at youtube.com/watch?v=qS7nqwGt4-I