Filer problems with blingy cluster.
We are currently having a problem with a filer which has crashed and is recovering at this time. While this is happening some customers in the blingy cluster will experience problems loading their websites/email. We apologize for the outage and service is expected to return to normal as soon as the filer recovers.
UPDATE 3:01 AM PDT
The filer has finished recovering and all services are back up and running. We are working with the filer vendor to find the source of the crash to prevent any further outages.
Update 24/03/08 10am: We’re working on the file server again to alleviate the load that’s causing problems with web, mail and mysql services. Sorry about that.
Update 27/03/08: We are doing emergency data moves to stem the problems recently caused by your file server. During these moves, your data may be inaccessible. We are moving as much data off as we can, as fast as possible. Very sorry about the continued inconvenience!
Update 27/03/08 This series of moves has finished. We are going to keep an eye on things to see how much it helped and may have to do more moves tonight and tomorrow morning to get everything working smoothly again. This post will be updated with more information as soon as possible.
Update 29/03/08
We are continuing to move data off of the problematic file server but it’s a bit of a catch-22 because customers on that machine are continuing to add data at a very high rate. It filled up this morning for a while, causing device full errors as well as mail problems and issues serving websites (when these fill up it causes problems across the board).
To explain in more detail, when we move data it does not immediately disappear. A ‘snapshot’ of the old data is created and remains in case there was a problem with the move. That ensures that we do not lose customer data, but until the admin team can check the move to make sure it went through properly we cannot delete the old data. We just did some of that and have some breathing room again, and of course more moves are still in progress, but we are asking customers on this cluster to help us by holding off on any non-essential uploads of data for the next couple of days.
As soon as we have a significant portion of the data removed, the problematic file server will begin to function properly (and additional moves will go much more quickly and smoothly), but right now we’re having trouble moving data more quickly than it’s being added. If everyone could please limit uploads to absolutely essential data until we reach the turning point where everything is working, this will be resolved much more quickly. In other words, if for example you are setting up a repository of large files, you’ll actually be better off waiting a couple of days and getting the all clear from us on this issue, because you’ll be able to access that data reliably instead of cramming it on there now and slowing the recovery process.
In the meantime we’ll be doing everything we can to safely and quickly move data off and get things back to normal.
Added information: Some of the people recently moved to the new file server are seeing errors because the data did not get set up completely (loading the site will work but just show an empty index). The admin team has been running an rsync that will fully restore all data and should hopefully finish by 9 PM PST - once that is finished all site and email data will be available for those users.
Update 30/03/08
We’re still racing to keep ahead of new data being added so any help we can get on that front is greatly appreciated (we’re still asking for customers to limit uploads as much as possible to speed up the recovery process). Some customers who are being moved are seeing blank directories still but those are due to moves in progress and the data will be fully restored when those complete.
Update April 1, 2008
We seem to be ahead of the curve right now: we are moving data off the primary volume and onto a secondary one faster than new data is being uploaded. The volume hasn’t filled up completely in a few days. We are working closely with the technical support team to see how we can speed up the process further. Thank you for your patience.
Update, April 1, 2008
I apologize for the late update but we’ve been going over our options (while moving data of course). While we’re not seeing any real relief in terms of data uploads, we do have some very large moves that are almost complete. Once those finish we can start deleting the data (for example one is around half a TB, or around 500 GB, which will be 4-5% of the total, but it’s going to take until around Friday to delete it all, so we’re dealing with a ton of data). Tomorrow’s update should be earlier in the day and hopefully we’ll have some progress to report from the large moves being complete.
Update, April 3, 2008
The data moves to other file servers have been running constantly, but last night and this morning some complications with the moves required admin attention. To clear up some space there had to be a short interruption in file serving. This is now finished, space is available and the moves are continuing. The admins are fixing up the last of the web servers which were having issues after file serving was restored. Our apologies again for the continued issues.
Update, April 4, 2008
Today has been a pretty good day of progress. We were able to complete even more moves and free up more space on the file server. Moves have been going quicker and stability is dramatically improving. Monitoring of the servers and email in the blingy cluster today has shown a significant decrease in problems. Issues do still exist but the problem is noticeably getting better. We are also pleased to note that we have more storage that will be coming early next week. We believe that this will go a long way in helping us fix this major problem.
Update, April 4, 2008
Things are continuing to improve today - when I got in I was pleased to see that we had held firm and even gained a percent (the affected file server was down to 95% which is as low as I have seen it in the last week and since I have been working it has dropped to 94%). Performance should improve as we gain ground (this will speed up moving data off as well). This progress, along with the added storage space we are expecting early next week, should hopefully allow us to restore service for our customers to normal.
5:45 PM PST: The moves we started just a while ago seem to be causing server problems. We’re looking into it and should have it resolved shortly (they were run just like the ones that had completed so we have to determine why these specifically caused an issue). Update: this resolved itself before we could detect the cause but we’re monitoring the situation to ensure that it’s not a recurring issue (we have no indication that it will be).
Update, April 8, 2008
Please see our other posting for details on the work we did on the affected file server:
http://www.dreamhoststatus.com/2008/04/06/30-min-blingy-downtime-tonight/
We are also continuing to offload data and are making good progress (it’s never as fast as we would like it to be of course). There’s excellent detail here in case you missed it:
http://blog.dreamhost.com/2008/04/07/another-anatomy/
which chronicles the situation and fills you in pretty much up to today. We’re seeing the data dip to 90% so we’re hoping to have it down in the 80’s by the end of the week (every percent we gain helps and as performance improves we can speed up the rate of moving but we’re still looking to hit critical mass where you get the proper level of performance).
Update, April 9, 2008
As we had hoped progress is speeding up as we free up more space - while the file server is showing 95% usage, around 11% of that is data that has already been moved and is no longer in use. Due to a software issue we haven’t been able to remove it yet (the admin team is working on the best way to execute that), but once that is gone we should be around 85% usage which is another large step forward.
In terms of effect, I have already seen improvement in site function for many customers and greatly increased speed in moving chunks of data off, and we are receiving reports that mail is functioning quite a bit better. That said, this issue remains at the High severity rating and in unresolved status as we have not reached a normal level of service. I can’t stress enough how sorry I am that our customers have had to put up with this but I thank those of you who have stuck with us (check the newsletter for details on what we’re doing for Blingy customers) and look forward to providing you with the level of service we strive for at DreamHost.
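As a rough check on the usage figures in this update, the arithmetic can be sketched as below. The update does not say whether "around 11% of that" means 11 percentage points of the whole volume or 11% of the used space, so both readings are shown as assumptions. The variable names are illustrative only and the numbers are the rounded ones quoted above, not measurements.

```python
# Rounded figures quoted in the April 9 update (not measured values).
reported_usage_pct = 95.0  # usage the file server is currently showing
stale_pct = 11.0           # "around 11%": already moved, awaiting deletion

# Assumed reading 1: the stale data is 11 percentage points of the volume.
usage_if_points = reported_usage_pct - stale_pct

# Assumed reading 2: the stale data is 11% of the space currently in use.
usage_if_fraction = reported_usage_pct * (1 - stale_pct / 100)

print(f"reading 1: about {usage_if_points:.0f}% used after deletion")
print(f"reading 2: about {usage_if_fraction:.0f}% used after deletion")
```

Either reading lands within about a point of the "around 85%" quoted in the update.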
Update, April 9th, 2008 21:59 PDT
Unfortunately, we need to unmount the volume again to kill these snapshots before they leave us with 0 bytes of free space. In 2 hours (midnight) I will be taking the problematic volume offline to delete the phantom snapshots. Total downtime will be between 10 and 30 minutes. Sorry for the short notice and additional outage!
Update, April 10, 2008
Well the snapshot mentioned yesterday is gone and we’re actually at 83% used today which is below where we were hoping to see marked improvement (85%). Of course we’re still moving data off (which increases the usage on the file server) so that won’t fully translate to customer usage improvement but it should be quite a bit better and keep improving until we stop moving data.
Update, April 14, 2008
Okay, we’re finally getting ready to mark this as resolved.. things have seemed pretty much okay for a while now. But, just to be sure, we’re dropping the severity to Medium for now and leaving it as unresolved.
Update, April 17, 2008
We’re still hearing some reports of site slowness - we were able to resolve an issue causing high loads today which should help but we’re not going to consider this resolved until everyone is receiving good service.
| Severity: | Medium | Resolved: | No |
March 20th, 2008 at 1:20 am
@wtf
“SO if blingy is my mail server, how come my website will not load?”
Your sites are probably on the blingy cluster, and your webserver voldo is in the blingy cluster. Irrespective of whichever server your sites are hosted on, all of them use the same mail server, blingy.
March 20th, 2008 at 1:21 am
ok now it is loading, are yours too?
March 20th, 2008 at 1:21 am
@skh.com: no, that site is not with 1and1. FYI I have a VPS package with 1and1 and host tools.jcisio.com there, but that’s a static site. I think you missed the point: if I want to host tools.jcisio.com there, I must point the jcisio.com NS entries to their nameservers (not just create a tools.jcisio.com A 1and1-ip-address record; they don’t accept that).
http://4u.jcisio.com is a big dynamic site (200-300 MB database) and runs fast, and is on Dreamhost (look at traceroute).
March 20th, 2008 at 1:26 am
Hmm, looks like this is a recurring problem. I also signed up in January; I guess we all copped a bad cluster.
March 20th, 2008 at 1:31 am
How can I see if my servers are in the Blingy cluster? In my account info none of the servers is named Blingy (web: holt, mail: spunky, sql: scooby) but my mail isn’t working (all mail gets bounced).
March 20th, 2008 at 1:31 am
this is realy bad guys, im trying to work on one of my sites and i cant do it locally zzz…
March 20th, 2008 at 1:33 am
It’s up now
Good work support ^^
March 20th, 2008 at 1:34 am
@Jcisio
I understood what you said.. I was just asking for some other informations..
Except the ill fate of being on blingy, I am happy about everything else on DH. I just wanted to know on which cluster/server your http://4u.jcisio.com is hosted, so that I can request DH to shift my account from blingy to there, or some other cluster they think is good. If they are ready to do that, I will sing their glories and stay with them. Else I will leave. I am already trying elsewhere. Enough is enough with blingy cluster.
March 20th, 2008 at 2:02 am
Images still aren’t showing up on my site, and one just isn’t working at all
March 20th, 2008 at 2:11 am
my site still isnt working at all…
March 20th, 2008 at 2:13 am
Yep. I concur that for those of us on ‘blingy’, dreamhost sucks - big time.
I’ve only been with them a few months, but my experience, Dreamhost is the most unreliable hosting provider in ten years.
Why can’t they just buy another filer to fill in while they fix this damn blingy???
I don’t call 2 months of continuous downtime and broken promises anything like customer service!
Rant over, for now
March 20th, 2008 at 2:29 am
Happy to see that it’s back. Thank you, DH !
@skh.com: web/mail/mysql is deblume/randy/chimchim. I don’t think they get free space on that old cluster, even PS is not available
Hope everything will change after the cluster moving this we.
March 20th, 2008 at 2:31 am
@Skillz: I’ve been a user since February, but I have known Dreamhost for years through another user. I hadn’t seen such outages and high loads before; that’s why I chose Dreamhost. But now I’m very disappointed. I hope they are really going to do something.
March 20th, 2008 at 7:05 am
since yesterday I can’t delete anything on my server. I have permissions but dreamhost just ignores it or goes crazy if I do it on the cpanal
March 20th, 2008 at 8:54 am
Im on Iris, its down. None of my sites work. My client’s sites dont work either. What is going on?
March 20th, 2008 at 8:54 am
This is NOT resolved. ALL of my websites are down!!
March 20th, 2008 at 8:58 am
problem is still there, atleast it is on Iris host, my websites are down,down,dow,down,down ….
all of my websites are down…..
March 20th, 2008 at 1:17 pm
I’m getting this error now, at 5:15pm EST. Internal Error: Unable to check domain availability at this time. Please try again later!
Is this related?
March 22nd, 2008 at 9:46 pm
FWIW the server response time is still awful at this time.
March 23rd, 2008 at 9:51 pm
Well, I guess that’s it for me…. need a refund. Anyone else still getting the incredibly slow response times? Can you say amateur? These guys have absolutely no clue what they’re doing.
Incompetent is the word of this year 2008 for Dream/Nightmare host
March 23rd, 2008 at 10:44 pm
Our site has been up and down. Sucking around for weeks!!!! We have never experienced this kind of hosting in our life except this time!!!! This is absolutely sucking nightmare hosting!!!! Need a refund and shout to everybody: never ever use dream host.
March 24th, 2008 at 7:22 am
We signed up January 24th. We did a major, global, press release today and our site has been down for the last two plus hours. Press are impatient people. Dreamhost should be called NIGHTMARE. The lack of uptime is causing irreparable harm to our business. After today’s failures, I’m afraid we may have to move to another host.
March 24th, 2008 at 7:48 am
I agree with scott. Same case here. Once my server is up (and I can access my files) I am backing up all my files and moving to a new hosting service. This has been a shit experience. At first it was supposed to last for hours and now it has been more than a month and problems keep arising. My site has been down for more than 70% of the time!
March 24th, 2008 at 9:40 am
If I lose site changes again like last week - I’m gone and want my $$ back
March 24th, 2008 at 9:45 am
Dreamhost, WTF?!?! I’m hosting all my clients here and they are ringing off the hook now. I have nothing to tell them.
March 24th, 2008 at 9:50 am
I think you guys may be jumping the gun on Dreamhost. I’ve been using them for well over a year with three different sites on different servers (cutlass, altair, and I can’t remember the first one). Until today, they’ve been excellent compared to the crappy host I had last time (iPowerWeb).
Granted, the company I work for is now hosting our site on the Altair server, but it’s up and running fine. Our problem is that our email server is not functional and we have a salesman on the road trying to access his email but cannot… so that’s definitely a big problem.
On another note, I’ve heard that there is a massive influx of spam this morning, across many different email hosts (Yahoo! primarily). Perhaps this has something to do with this cluster being down?
March 24th, 2008 at 9:53 am
It may still be early morning in LA, but this is prime time on the east coast and I was supposed to have a client review their site today. I am fuming right now.
March 24th, 2008 at 9:54 am
If people are looking for a place to vent: http://www.web-hosting-top.com/web-hosting/web-hosting-top.dreamhost.com-reviews
March 24th, 2008 at 9:56 am
Please, make the monthly $15,00 but PLEASE, Keep my site UP!!!!
I pay more to a better service and no Overloads!
Please, think about it!
March 24th, 2008 at 9:57 am
Blingy still sucks. Can receive but not send email. Four days and counting for this problem! Come on DreamHost. Four days!
March 24th, 2008 at 10:05 am
f me sideways….and yes, I have nothing constructive to say. Just get this crap taken care of….my GOODNESS….
March 24th, 2008 at 10:08 am
COMPLETELY UNACCEPTABLE.
March 24th, 2008 at 10:10 am
Is everyone’s mail still down, can’t access mine still?
March 24th, 2008 at 10:13 am
Can receive but not send.
March 24th, 2008 at 10:15 am
Maybe you should bump this notice to the top while the issue is ongoing — just to clear up any confusion (i.e., for newbs like me).
March 24th, 2008 at 10:15 am
So this is high severity now?
I thought it was high severity back when you were my host.
March 24th, 2008 at 10:17 am
For those who are bitching and moaning, please cancel your Dreamhost shared account and get a more expensive service (be it Dreamhost PS or another provider, I don’t care). If you have something potentially useful to say, please speak up. For example….
How exactly can I tell if I am on the blingy cluster? Sorry, I may have missed this info through all the chaff-comments.
March 24th, 2008 at 10:28 am
lee, go to your account and look for the account status button on the top right of the page. It should expand to show you your server, your sql server, and your email server. Whatever name is listed for your email server is also your file server.
March 24th, 2008 at 10:38 am
This may seem like a stupid question but would this outage cause all mail sent to my domain to be returned? ’cause that’s what’s happening.
March 24th, 2008 at 10:47 am
My mails are working, but my site is down.
I really need it to be working NOW ! because of SVN
Please fix it!
March 24th, 2008 at 10:47 am
I`ve heard from a friend that this host is good, i had a host for free bcause my site has more than 20k visitors everyday and i`ve moved to dreamhost searching better service and now i see how dunk i am! DREAMHOST SUCKS and i am searching for a new server as fast as i can!
March 24th, 2008 at 10:51 am
Fix it now! i need my sites working now!
March 24th, 2008 at 10:51 am
error Your website does not appear in the Apache configuration file.
warning Your website took longer than 5 seconds to respond.
helpppp
March 24th, 2008 at 10:56 am
Come on guys this is my second year with DreamHost and everytime a problem came up they fixed it.
I have major problems as well right now. And my clients are lashing out on me.
Just give them time to fix the problem.
Just so you know, I have a Private Server here so I am definitely in more trouble than you are.
March 24th, 2008 at 11:06 am
well, my problems started today morning, and I already have my gmail inbox filled with emails from people telling me: “omg! your site is down! OMG! fix it please!”
Well, i dont blame them,they need the content i am offering them badly. lol
i cannot access via FTP nor SSH nor SFTP, and my site is down and so is my mail :/
the weird thing is, my webserver is shinobi, so i think it is within the cluster or somthing :S
Been with DH since November 2007, not many problems, i think this is the third one or so, the previous ones were fixed pretty quickly
DH guys please try to fix this soon.
March 24th, 2008 at 11:20 am
What is the time frame on getting this resolved? I just got started and am extremely disappointed with your company so far.
Lisa
March 24th, 2008 at 11:23 am
Please, please, please, please, pretty please fix it.
March 24th, 2008 at 11:24 am
>>> DOWN FOR 8+ HOURS WITHOUT ANY RESPONSE FROM DREAMHOST SUPPORT!!! >> DOWN FOR 8+ HOURS WITHOUT ANY RESPONSE FROM DREAMHOST SUPPORT!!!
March 24th, 2008 at 11:25 am
tracert blingy.mail.dreamhost.com
Tracing route to blingy.mail.dreamhost.com [208.113.200.47]
over a maximum of 30 hops:
1
March 24th, 2008 at 11:31 am
Hello All,
I am brand new to hosting my site on dreamhost. I would really appreciate it if someone wouldn’t mind speaking with me offlist at dhlevit@mindspring.com
Thanks,
Donny