Filer problems with blingy cluster.
We are currently having a problem with a filer which has crashed and is recovering at this time. While this is happening some customers in the blingy cluster will experience problems loading their websites/email. We apologize for the outage and service is expected to return to normal as soon as the filer recovers.
UPDATE 3:01 AM PDT
The filer has finished recovering and all services are back up and running. We are working with the filer vendor to find the source of the crash to prevent any further outages.
Update 24/03/08 10am: We’re working on the file server again to alleviate the load that’s causing problems with web, mail and mysql services. Sorry about that.
Update 27/03/08: We are doing emergency data moves to stem the problems recently caused by your file server. During these moves, your data may be inaccessible. We are moving as much data off as we can, as fast as possible. Very sorry about the continued inconvenience!
Update 27/03/08 This series of moves has finished. We are going to keep an eye on things to see how much it helped and may have to do more moves tonight and tomorrow morning to get everything working smoothly again. This post will be updated with more information as soon as possible.
Update 29/03/08
We are continuing to move data off the problematic file server, but it’s a bit of a catch-22 because customers on that machine are continuing to add data at a very high rate. It filled up for a while this morning, causing device-full errors as well as mail problems and issues serving websites (when these fill up, it causes problems across the board).
To explain in more detail: when we move data, it does not immediately disappear. A ‘snapshot’ of the old data remains in case there was a problem with the move. That ensures we do not lose customer data, but until the admin team can check that the move went through properly, we cannot delete the old data. We just did some of that and have some breathing room again, and of course more moves are still in progress, but we are asking customers on this cluster to help us by holding off on any non-essential uploads of data for the next couple of days.
As soon as we have a significant portion of the data removed, the problematic file server will begin to function properly (and additional moves will go much more quickly and smoothly), but right now we’re having trouble moving data off more quickly than people are adding it. If everyone could please limit uploads to absolutely essential data until we reach the turning point where everything is working, this will be resolved much more quickly. In other words, if you are setting up a repository of large files, for example, you’ll actually be better off waiting a couple of days and getting the all clear from us on this issue, because you’ll be able to access that data reliably instead of cramming it on there now and slowing the recovery process.
In the meantime we’ll be doing everything we can to safely and quickly move data off and get things back to normal.
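To illustrate the catch-22 described above, here is a minimal Python sketch of the race between space being freed and new data arriving. The function name and every figure in it are hypothetical, chosen only for illustration; they are not measurements from the file server. It also assumes that space counts as freed only once the old snapshot has been deleted, as explained above.

```python
def days_to_target(used_tb, target_tb, freed_tb_per_day, uploaded_tb_per_day):
    """Days until usage falls to target_tb, or None if usage is not shrinking.

    freed_tb_per_day counts only space actually released, i.e. data that has
    been moved AND whose old snapshot has been deleted.
    """
    if used_tb <= target_tb:
        return 0.0
    net_drain = freed_tb_per_day - uploaded_tb_per_day
    if net_drain <= 0:
        # Uploads keep pace with (or outrun) the moves: no progress.
        return None
    return (used_tb - target_tb) / net_drain

# Hypothetical figures only.
stalled = days_to_target(12.0, 10.0, 0.5, 0.5)
improving = days_to_target(12.0, 10.0, 0.75, 0.25)
print(stalled)    # uploads cancel out the moves
print(improving)  # fewer uploads let the moves win
```

In this toy model, cutting uploads helps twice: it raises the net rate at which space is freed, and it turns a stalled recovery into one with a finite end date.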
Added information: Some of the people recently moved to the new file server are seeing errors because the data did not get set up completely (loading the site will work but just show an empty index). The admin team has been running an rsync that will fully restore all data and should hopefully finish by 9 PM PST - once that is finished all site and email data will be available for those users.
Update 30/03/08
We’re still racing to keep ahead of new data being added so any help we can get on that front is greatly appreciated (we’re still asking for customers to limit uploads as much as possible to speed up the recovery process). Some customers who are being moved are seeing blank directories still but those are due to moves in progress and the data will be fully restored when those complete.
Update April 1, 2008
We seem to be ahead of the curve right now: we are moving data off the primary volume and onto a secondary one faster than new data is being uploaded. The volume hasn’t filled up completely in a few days. We are working closely with the technical support team to see how we can speed up the process further. Thank you for your patience.
Update, April 1, 2008
I apologize for the late update but we’ve been going over our options (while moving data, of course). While we’re not seeing any real relief in terms of data uploads, we do have some very large moves that are almost complete. Once those finish we can start deleting the data (for example, one is around half a TB, or around 500 GB, which will be 4-5% of the total, but it’s going to take until around Friday to delete it all, so we’re dealing with a ton of data). Tomorrow’s update should be earlier in the day and hopefully we’ll have some progress to report from the large moves being complete.
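As a rough check on the scale involved, the figures in this update imply a total of roughly 10 to 12.5 TB on the volume. The Python sketch below shows that arithmetic, assuming decimal units (1 TB = 1000 GB) and taking the quoted 4-5% range at face value; the function name is purely illustrative.

```python
def implied_total_tb(chunk_gb, share_percent):
    """Total volume size in TB implied by a chunk being share_percent of it."""
    return chunk_gb * 100 / share_percent / 1000

# 500 GB quoted as 4-5% of the total.
low = implied_total_tb(500, 5)
high = implied_total_tb(500, 4)
print(f"Implied total: {low:.1f} to {high:.1f} TB")
```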
Update, April 3, 2008
The data moves to other file servers have been running constantly, but last night and this morning some complications with the moves required admin attention. To clear up some space there had to be a short interruption in file serving. This is now finished, space is available and the moves are continuing. The admins are fixing up the last of the web servers which were having issues after file serving was restored. Our apologies again for the continued issues.
Update, April 4, 2008
Today has been a pretty good day of progress. We were able to complete even more moves and free up more space on the file server. Moves have been going quicker and stability is dramatically improving. Monitoring of the servers and email in the blingy cluster today has shown a significant decrease in problems. Issues do still exist but the problem is noticeably getting better. We are also pleased to note that we have more storage coming early next week. We believe that this will go a long way in helping us fix this major problem.
Update, April 4, 2008
Things are continuing to improve today - when I got in I was pleased to see that we had held firm and even gained a percent (the affected file server was down to 95%, which is as low as I have seen it in the last week, and since I have been working it has dropped to 94%). Performance should improve as we gain ground (this will speed up moving data off as well). This progress, along with the added storage space we are expecting early next week, should hopefully allow us to restore service for our customers to normal.
5:45 PM PST: The moves we started just a while ago seem to be causing server problems. We’re looking into it and should have it resolved shortly (they were run just like the ones that had completed, so we have to determine why these specifically caused an issue). Update: this resolved itself before we could detect the cause, but we’re monitoring the situation to ensure that it’s not a recurring issue (we have no indication that it will be).
Update, April 8, 2008
Please see our other posting for details on the work we did on the affected file server:
http://www.dreamhoststatus.com/2008/04/06/30-min-blingy-downtime-tonight/
We are also continuing to offload data and are making good progress (it’s never as fast as we would like it to be of course). There’s excellent detail here in case you missed it:
http://blog.dreamhost.com/2008/04/07/another-anatomy/
which chronicles the situation and fills you in pretty much up to today. We’re seeing the data dip to 90% so we’re hoping to have it down in the 80’s by the end of the week (every percent we gain helps and as performance improves we can speed up the rate of moving but we’re still looking to hit critical mass where you get the proper level of performance).
Update, April 9, 2008
As we had hoped progress is speeding up as we free up more space - while the file server is showing 95% usage, around 11% of that is data that has already been moved and is no longer in use. Due to a software issue we haven’t been able to remove it yet (the admin team is working on the best way to execute that), but once that is gone we should be around 85% usage which is another large step forward.
In terms of effect, I have already seen improved site function for many customers and much faster moves of data off the server, and we are receiving reports that mail is functioning quite a bit better. That said, this issue remains at the High severity rating and in unresolved status, as we have not reached a normal level of service. I can’t stress enough how sorry I am that our customers have had to put up with this, but I thank those of you who have stuck with us (check the newsletter for details on what we’re doing for Blingy customers) and look forward to providing you with the level of service we strive for at DreamHost.
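For anyone checking the usage figures in this update, the estimate is a simple subtraction of percentage points. The short Python sketch below uses only the two numbers quoted above; the function name is illustrative, and the result is an estimate rather than a measured value.

```python
def usage_after_cleanup(reported_percent, already_moved_percent):
    """Percentage of the volume still in use once already-moved data is deleted."""
    return reported_percent - already_moved_percent

# 95% reported usage, of which around 11 points is data already moved off.
remaining = usage_after_cleanup(95, 11)
print(f"Expected usage after cleanup: about {remaining}%")
```

That works out to 84%, in line with the "around 85%" figure given above.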
Update, April 9th, 2008 21:59 PDT
Unfortunately, we need to unmount the volume again to kill these snapshots before they leave us with 0 bytes of free space. In 2 hours (midnight) I will be taking the problematic volume offline to delete the phantom snapshots. Total downtime will be between 10 and 30 minutes. Sorry for the short notice and additional outage!
Update, April 10, 2008
Well, the snapshot mentioned yesterday is gone and we’re actually at 83% used today, which is below the level where we were hoping to see marked improvement (85%). Of course we’re still moving data off (which increases the load on the file server), so that won’t fully translate into improvement for customers, but it should be quite a bit better and keep improving until we stop moving data.
Update, April 14, 2008
Okay, we’re finally getting ready to mark this as resolved. Things have seemed pretty much okay for a while now. But, just to be sure, we’re dropping the severity to Medium for now and leaving it as unresolved.
Update, April 17, 2008
We’re still hearing some reports of site slowness - we were able to resolve an issue causing high loads today which should help but we’re not going to consider this resolved until everyone is receiving good service.
Severity: Medium | Resolved: No
March 25th, 2008 at 2:37 pm
Woo-hoo! It’s getting better! Now, my email client reports: “Your IMAP server wishes to alert you to the following: Fatal error: No space left on device”
SWEET!
March 25th, 2008 at 2:44 pm
IMAP error during logout (command CLOSE illegal in state AUTH)
IMAP error during logout (command CLOSE illegal in state NONAUTH)
getmailrc: socket error ((104, 'Connection reset by peer'))
getmailrc: socket error ((8, 'EOF occurred in violation of protocol'))
March 25th, 2008 at 2:54 pm
An absolute disgrace!!!!
March 25th, 2008 at 2:55 pm
For anyone who’s having trouble with email I would recommend signing up for google apps, and letting google handle the email for your domain. It’s free and they have instructions on what you need to do with the dreamhost panel to perform the migration over. It basically means if DH are having issues your email will still work fine because it never touches their servers.
March 25th, 2008 at 3:00 pm
Bueller? Bueller? Bueller? Bueller?
Stuff the 80’s movie references I just want to check my mail and have my hosting company treat me as well I treat my clients.
March 25th, 2008 at 3:02 pm
Unbelievable.
March 25th, 2008 at 3:08 pm
jidanni@spyro:~$ uptime
06:03:12 up 1 day, 8:03, 3 users, load average: 217.15, 213.44, 266.88
jidanni@spyro:~$ df
Filesystem     1K-blocks     Used  Available  Use%  Mounted on
/dev/sda1        2071416  1395052     571140   71%  /
/dev/sda3        7224600  4536308    2321208   67%  /usr/local
/dev/sda8        2071416   104628    1861564    6%  /tmp
and then the output stalls. There’s your problem Dreamhost, it’s in the NFS connection.
That’s what’s causing the high load average, etc. Good thing you are reading this message so now have the answer.
March 25th, 2008 at 3:10 pm
I first requested support for a slow server on 2/12/08. I waited 42 days for this issue to be resolved and decided to get a refund before my 97 days ran out. It’s easy to get the refund. Just close your account in the panel (after you get your websites working somewhere else, of course) and they automatically refund your credit card if you are within 97 days of opening your hosting account.
I encourage all of you poor souls on Blingy to find new hosting. It’s been great for my mental health. If you read all the Blingy posts on this blog, you will find that some folks are reporting problems as far back as Dec 07. Even if they fixed it today, are you willing to wait 3-4 months for every problem to be resolved?
Plus, this blog is actually fun to read when your websites are working.
March 25th, 2008 at 3:25 pm
Does anyone know what is the correct procedure to claim the money back? (I am still in the 97 days period)
how long does it take to get the money?
March 25th, 2008 at 3:35 pm
The whole *BLINGY FILER IS FULL* now! 0 bytes of 13 TB available. 100% of space used. That causes major problems: I can’t create temporary files.
But the support staff told me that they were moving from blingy to another filer :-s
March 25th, 2008 at 3:48 pm
I’m also getting errors “No space left” when trying to upload files using FTP or SFTP. I’m on ermac…
HEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEELP
March 25th, 2008 at 3:53 pm
Hey Dreamhost, I figured out your problem;
The disk is full.
rsync: mkstemp "/home/jidanni/jidanni.org/geo/taiwan_datums/.index_en.html.m04Ei9" failed: No space left on device (28)
sitecopy: Updating site `dreamhost' (on spyro.dreamhost.com in ~/jidanni.org/)
Uploading geo/taiwan_datums/index.html: [Couldn't get handle: Failure
March 25th, 2008 at 4:14 pm
This is flippin ridiculous, I have a business to run here and I need email to communicate with my customers!
BTW, email is my specialty, do you want me to fix it for you, seeing as you can’t seem to get your heads out of your arses?
March 25th, 2008 at 4:15 pm
BLOODY HELL I STILL DON’T HAVE E-MAIL!
March 25th, 2008 at 4:16 pm
I’ve been with DH for three weeks. This kind of problem is EXACTLY why I switched from my old host. I’m done. You had your chance, DH. I just signed up with a new host, and all my clients are going with me.
March 25th, 2008 at 4:17 pm
I can’t log into my email client nor can I log into webmail. It’s been like this for the past two days. However, I have my main account set to forward to one of my personal Yahoo accounts and all of my messages are coming through. That leads me to believe that the emails aren’t going to be lost when they get things worked out but we’ll see.
March 25th, 2008 at 4:25 pm
Here’s a solution for you folks: Buy your own web server and a static IP address, and host your website yourself! Then, when your system goes down, you can fix it! WOOT!
March 25th, 2008 at 4:28 pm
Hi, I am one of the investors of dreamhost. Please have patience, the tech crew has been looking into the issue for the last 70 hours non-stop. Things are expected to be fine within half an hour from now.
March 25th, 2008 at 4:51 pm
mr dreamhost.com, it is funny that the other clusters of dreamhost are having problems. tsk not very promising. unless you can rebate the whole amount of my 1 year hosting in dreamhost then I might consider staying. Im not, im sorry i guess i better look for other more reliable hosting.
March 25th, 2008 at 4:52 pm
I’d like to know why they kept putting new accounts on this thing all the while we were logging tickets about its poor performance.
March 25th, 2008 at 4:59 pm
“Sub1 Says:
March 25th, 2008 at 2:55 pm
For anyone who’s having trouble with email I would recommend signing up for google apps, and letting google handle the email for your domain. It’s free and they have instructions on what you need to do with the dreamhost panel to perform the migration over. It basically means if DH are having issues your email will still work fine because it never touches their servers.”
Sub,
Can you provide more information on how to go about doing this?
thanks!
March 25th, 2008 at 5:10 pm
Dammit! I can’t even take my forum offline to protect data integrity. The inability to write files is generating errors that abort code midway through.
March 25th, 2008 at 5:14 pm
to be clear before i do this
if i follow these instructions: http://www.google.com/support/a/bin/answer.py?hl=en&answer=37956
my email will start pointing to WHERE?
a gmail account?
my existing gmail? or one that i should start? do i lose the emails i am already losing because of DH incompetence?
M
March 25th, 2008 at 5:39 pm
Now I get 403 Forbidden for any file I try to access…great.
March 25th, 2008 at 5:46 pm
Does anyone know a hosting service that has a feature-set comparable to Dreamhost’s but… works? Don’t mind paying more. Thanks!
March 25th, 2008 at 5:55 pm
Disappointed: me too. So I gave ssh a whirl…and suddenly, my account has nothing, including no bash profile. I can only hope that means my site is currently being moved. Which is a bit pointless, since I’ve already switched to another host, hopefully only a few more hours before DNS gets propagated, then I’ll be DH free.
I only hope I’ll be able to delete my pages *and* get my refund before DH goes bankrupt from paying everyone else’s refunds.
March 25th, 2008 at 5:56 pm
So much for the 1/2 hour……so much for “under promise and over deliver.”
March 25th, 2008 at 6:08 pm
has anyone actually talked to dreamhost (callback or the like) to find out wtf is going on for real? please inform we who are in the pitch dark…
March 25th, 2008 at 6:18 pm
Don’t get mad, get compensated!
Put your complaint on file at http://dreamhost-classaction-lawsuit.com/plaintiffs.php
Please contribute to the cause, this process is very expensive! http://dreamhost-classaction-lawsuit.com/donate.html
March 25th, 2008 at 6:25 pm
Response to a support ticket I received:
Hello,
Note: This message may seem like a message you have already seen, please read it there is added information about the status of the cluster you are in.
Sorry about the downtime. A lot of you have already been receiving blingy status messages. Blingy is your cluster of machines for web, mysql, and email. The main file server for this cluster is having serious problems.
We have been working on this file server for a while and unfortunately our efforts have not produced a permanent solution for the problems we have been dealing with. We currently are moving users and data from the file server onto a new one. This unfortunately takes time to complete but we are working towards giving the file server more breathing room. We are also in the process of adding a third file server to the cluster and we will be moving even more data there as well.
Today we have also added more disk space to the file server, and we had the vendors do the install themselves. This should help with the stability.
Please hang in there, we will get this resolved as soon as we can. If you would like an update on the status of this issue, since it is considered an ongoing issue, please send support messages to blingystatus@dreamhost.com. You will receive a message similar to this one with an update on steps we are taking. Please also keep in mind that you must write from a contact address listed on your account or the message will bounce.
If problems occur we will address them right away to get your sites back up, but we want to get this issue resolved rather than just fixing it as problems happen.
Sorry for the downtime and overall poor service caused by this file server.
For those of you who wish to cancel your service and receive a refund, please contact me directly at ralph@dreamhost.com and I can give you further information on that.
Some of you are also asking when this issue will be fixed, I wish I had concrete information for you, but if I had to estimate I would say by the end of the week. Unfortunately I can not guarantee the issue will be fixed by then, but we are focusing on this issue presently and that will go a long way to us finding a solution.
Thanks!
DreamHost
March 25th, 2008 at 6:27 pm
dreamhost.com - well its more than 2 hours since your post and your company still can’t provide me access to my ftp account - all I get is an empty directory! Guess you need new techi’s
March 25th, 2008 at 6:38 pm
In post 284… “I would say by the end of the week”
End of the week? Seriously?!?!
Between that comment and “we will get this resolved as soon as we can” I think these guys have shown their true abilities. They don’t appear to understand the nature of the service they provide, and their comments seem flippant and trivialize the impact of their failure to my clients.
I have been patient and have been eating lots of sh*t from my clients for 48 hours, but it’s time for me to move my clients to a new hosting company.
The saddest part is, since DH won’t respond to me directly I have to read this email on their ’status’ website.
If I ran my business like this I would be dead in a month. For their lack of apparent concern and preparation, I hope karma takes care of them.
March 25th, 2008 at 6:39 pm
thanks jason p, very helpful
March 25th, 2008 at 10:08 pm
I signed on with DH in Feb… and have had a snail-paced site since.
We are talking loads around the 200 mark. How the server hasn’t completely BLOWN up is beyond me!
I have a friend who also uses DH (yep, they’re on a different cluster) and they’ve never had any probs.
That’s why I joined… GRRRRRR
Why do they continue to put new customers on a server that is obviously FULL! That’s just pathetic!
Even if they fix this… obviously Blingy has some major issues and something else will happen.
I’m in the 97 day period, so I’m getting a refund and I’m out.
March 25th, 2008 at 11:25 pm
My email appears to be back up and running without issue… well, apart from downloading a whole weeks worth of email which is a major pain in the ass of course.
March 26th, 2008 at 5:40 am
I have a theory.
Things are SO screwed up with this company that it must be intentional. There must be bad blood in the company. Sabotage. Somebody had an affair, got passed on promotion, who knows, but the end result is this.
How else could the whole service be locking up? It’s an inside job, ladies and gentlemen, and it’s probably time we abandon ship.
March 26th, 2008 at 6:19 am
Popinsky is now DOWN again again.
We have to do something, we can’t stay stopped.
regards
March 26th, 2008 at 6:30 am
Hello,
Because I am a new DH customer (for 2 weeks now), I will cancel my account. This will be faster than waiting for “I wish I had concrete information for you, but if I had to estimate I would say by the end of the week.”
Can someone give me some hints to find a similar US hosting (similar prices, similar offer, but more efficient ?….)
March 26th, 2008 at 6:35 am
E-mail was up yesterday afternoon and now it’s down again this morning. WHY DO YOU GIVETH AND TAKETH AWAY?!
March 26th, 2008 at 7:00 am
I am Brazilian and I never give up!!!!
March 26th, 2008 at 7:05 am
Hi Syklop, I have canceled my account on dreamhost. I did a lot of research about different web hosts. I saw that hostgator (hostgators.co.cc) has many positive reviews, without many negative reviews, and the most important thing is that we can pay them monthly. I am trying them out now. The services, support and features are great. As of now the sites are loading much, much faster than they were loading on dreamhost. Since I don’t have to pay them for a year in advance, I feel secure. Maybe you too can try them out.
March 26th, 2008 at 7:07 am
hostgators.co.cc
March 26th, 2008 at 7:13 am
How long is this going to last?
this is a bullshit host
March 26th, 2008 at 7:16 am
Yes sam, I too am trying Hostgator
March 26th, 2008 at 7:22 am
Hostgator looks good value and blend of features.
A site I work for is using (mt) Media Temple (www.mediatemple.net) that looks like it’s more industrial strength (and a little more $)
Both sites cost more than Dreamhost but given the fact that DH costs very little and I depend on my e-mail and web as a huge component of my business it’s really a no-brainer.
Ugh, I’ve had my e-mail all perfectly configured and now I’ve been rerouting, redirecting clients to different e-mails, this is a complete mess. Unacceptable on Monday and now just kind of ridiculous on Wednesday.
NOW JUST FIX THIS SO I CAN GET THROUGH THE WEEK.
March 26th, 2008 at 7:32 am
Popinsky has been down for 1 hour 30 minutes
March 26th, 2008 at 7:56 am
How are Media Temple’s services? Has anyone had a bad experience there? I am currently trying hostgator
March 26th, 2008 at 8:07 am
We suck at hosting. Please find another host.
March 26th, 2008 at 8:09 am
Come on, this is annoying…
March 26th, 2008 at 8:19 am
*sigh*. I just moved here from 1and1. I was excited initially at the level of control I have now. Now I’m having all kinds of problems with my website loading and my email coming through. That’s super annoying, I use my email for everything. Is any email getting lost while the servers are jacked up?