Filer problems with blingy cluster.
We are currently having a problem with a filer which has crashed and is recovering at this time. While this is happening some customers in the blingy cluster will experience problems loading their websites/email. We apologize for the outage and service is expected to return to normal as soon as the filer recovers.
UPDATE 3:01 AM PDT
The filer has finished recovering and all services are back up and running. We are working with the filer vendor to find the source of the crash to prevent any further outages.
Update 24/03/08 10am: We’re working on the file server again to alleviate the load that’s causing problems with web, mail and mysql services. Sorry about that.
Update 27/03/08: We are doing emergency data moves to stem the problems recently caused by your file server. During these moves, your data may be inaccessible. We are moving as much as we can off as fast as possible. Very sorry about the continued inconvenience!
Update 27/03/08 This series of moves has finished. We are going to keep an eye on things to see how much it helped and may have to do more moves tonight and tomorrow morning to get everything working smoothly again. This post will be updated with more information as soon as possible.
Update 29/03/08
We are continuing to move data off of the problematic file server but it’s a bit of a catch-22 because customers on that machine are continuing to add data at a very high rate. It filled up this morning for a while, causing device full errors as well as mail problems and issues serving websites (when these fill up it causes problems across the board). To explain in more detail, when we move data it does not immediately disappear (a ‘snapshot’ of the old data is created and remains in case there was a problem with the move - that ensures that we do not lose customer data, but until the admin team can check the move to make sure it went through properly we cannot delete the old data). We just did some of that and have some breathing room again, and of course more moves are still in progress, but we are asking customers on this cluster to help us by holding off on any non-essential uploads of data for the next couple of days. As soon as we have a significant portion of the data removed, the problematic file server will begin to function properly (and additional moves will go much more quickly and smoothly), but right now we’re having trouble moving data more quickly than it’s being added by people. If everyone could please limit uploads to absolutely essential data until we reach the turning point where everything is working, this will be resolved much more quickly (in other words, if for example you are setting up a repository of large files, you’ll actually be better off waiting a couple of days and getting the all clear from us on this issue, because you’ll be able to access that data reliably instead of cramming it on there now and slowing the recovery process).
In the meantime we’ll be doing everything we can to safely and quickly move data off and get things back to normal.
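The snapshot behaviour described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the function names and every number in it are hypothetical and are not taken from the filer. It simply shows why moving data off does not free any space until the old snapshot has been verified and deleted.

```python
def after_move(live_gb, snapshot_gb, moved_gb, uploaded_gb):
    """Return (live, snapshot, total) in GB after a round of moves and uploads.

    Moved data leaves the live set but is retained in a snapshot, so the
    total on disk still grows by whatever customers upload in the meantime.
    """
    live = live_gb - moved_gb + uploaded_gb
    snapshot = snapshot_gb + moved_gb
    return live, snapshot, live + snapshot


def after_snapshot_delete(live_gb, snapshot_gb):
    """Deleting the verified snapshot is the step that actually releases space."""
    return live_gb, 0, live_gb


# Hypothetical figures, for illustration only.
live, snap, total = after_move(live_gb=9500, snapshot_gb=0, moved_gb=500, uploaded_gb=100)
print(total)                                 # total on disk right after the move
print(after_snapshot_delete(live, snap)[2])  # total once the snapshot is gone
```

In this model the total on disk goes up after a move whenever anything is uploaded, which is why limiting uploads matters while the snapshots are still being held.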
Added information: Some of the people recently moved to the new file server are seeing errors because the data did not get set up completely (loading the site will work but just show an empty index). The admin team has been running an rsync that will fully restore all data and should hopefully finish by 9 PM PST - once that is finished all site and email data will be available for those users.
Update 30/03/08
We’re still racing to keep ahead of new data being added so any help we can get on that front is greatly appreciated (we’re still asking for customers to limit uploads as much as possible to speed up the recovery process). Some customers who are being moved are seeing blank directories still but those are due to moves in progress and the data will be fully restored when those complete.
Update April 1, 2008
We seem to be ahead of the curve right now: we are moving data off the primary volume and on to a secondary one faster than new data is being uploaded. The volume hasn’t filled up completely in a few days. We are working closely with the technical support team to see how we can speed up the process further. Thank you for your patience.
Update, April 1, 2008
I apologize for the late update but we’ve been going over our options (while moving data of course). While we’re not seeing any real relief in terms of data uploads we do have some very large moves that are almost complete. Once those finish we can start deleting the data (for example one is around half a TB, or around 500 GB, which will be 4-5% of the total, but it’s going to take until around Friday to delete it all, so we’re dealing with a ton of data). Tomorrow’s update should be earlier in the day and hopefully we’ll have some progress to report from the large moves being complete.
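As a rough sanity check on the figures above, the total amount of data can be estimated from the statement that a 500 GB move is 4-5% of it. This is a back-of-the-envelope sketch, not an official figure; the function name is made up for illustration.

```python
def implied_total_gb(chunk_gb, percent_of_total):
    """Total size implied when chunk_gb makes up percent_of_total percent of it."""
    return chunk_gb * 100 / percent_of_total


low = implied_total_gb(500, 5)   # if 500 GB is 5% of the total
high = implied_total_gb(500, 4)  # if 500 GB is 4% of the total
print(f"Implied total: roughly {low:.0f} to {high:.0f} GB")
```

Under that reading, the data in question is on the order of 10 to 12.5 TB.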
Update, April 3, 2008
The data moves to other file servers have been running constantly, but last night and this morning some complications happened with the moves, requiring admin attention. To clear up some space there had to be a short interruption in file serving. This is now finished, space is available and the moves are continuing. The admins are fixing up the last of the web servers which were having issues after file serving was restored. Our apologies again for the continued issues.
Update, April 4, 2008
Today has been a pretty good day of progress. We were able to complete even more moves and free up more space on the file server. Moves have been going quicker and stability is dramatically improving. Monitoring of the servers and email in the blingy cluster today has shown a significant decrease in problems. Issues do still exist but the problem is noticeably getting better. We are also pleased to note that we have more storage that will be coming early next week. We believe that this will go a long way in helping us fix this major problem.
Update, April 4, 2008
Things are continuing to improve today - when I got in I was pleased to see that we had held firm and even gained a percent (the affected file server was down to 95%, which is as low as I have seen it in the last week, and since I have been working it has dropped to 94%). Performance should improve as we gain ground (this will speed up moving data off as well). This progress, along with the added storage space we are expecting early next week, should hopefully allow us to restore service for our customers to normal.
5:45 PM PST : The moves we started just a while ago seem to be causing server problems. We’re looking into it and should have it resolved shortly (they were run just like the ones that had completed so we have to determine why these specifically caused an issue). Update: this resolved itself before we could detect the cause but we’re monitoring the situation to ensure that it’s not a recurring issue (we have no indication that it will be).
Update, April 8, 2008
Please see our other posting for details on the work we did on the affected file server:
http://www.dreamhoststatus.com/2008/04/06/30-min-blingy-downtime-tonight/
We are also continuing to offload data and are making good progress (it’s never as fast as we would like it to be of course). There’s excellent detail here in case you missed it:
http://blog.dreamhost.com/2008/04/07/another-anatomy/
which chronicles the situation and fills you in pretty much up to today. We’re seeing the data dip to 90% so we’re hoping to have it down in the 80’s by the end of the week (every percent we gain helps and as performance improves we can speed up the rate of moving but we’re still looking to hit critical mass where you get the proper level of performance).
Update, April 9, 2008
As we had hoped progress is speeding up as we free up more space - while the file server is showing 95% usage, around 11% of that is data that has already been moved and is no longer in use. Due to a software issue we haven’t been able to remove it yet (the admin team is working on the best way to execute that), but once that is gone we should be around 85% usage which is another large step forward.
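The "around 85%" estimate above can be checked with a little arithmetic. The update does not say whether the 11% is measured in percentage points of total capacity or as a share of the space currently in use, so this sketch (with hypothetical function names) works out both readings.

```python
def usage_after_purge_points(used_pct, stale_pct):
    """Treat the stale data as percentage points of total capacity."""
    return used_pct - stale_pct


def usage_after_purge_share(used_pct, stale_share_pct):
    """Treat the stale data as a share of the space currently in use."""
    return used_pct * (100 - stale_share_pct) / 100


print(usage_after_purge_points(95, 11))
print(usage_after_purge_share(95, 11))
```

Both readings land a little under 85%, which is consistent with the figure quoted in the update.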
In terms of effect, I have already seen improvement in site function for many customers as well as greatly increased speed in moving chunks of data off as well as receiving reports that mail is functioning quite a bit better. That said this issue remains at the High severity rating and in unresolved status as we have not reached a normal level of service. I can’t stress enough how sorry I am that our customers have had to put up with this but I thank those of you who have stuck with us (check the newsletter for details on what we’re doing for Blingy customers) and look forward to providing you with the level of service we strive for at DreamHost.
Update, April 9th, 2008 21:59 PDT
Unfortunately, we need to unmount the volume again to kill these snapshots before they leave us with 0 bytes of free space. In 2 hours (midnight) I will be taking the problematic volume offline to delete the phantom snapshots. Total downtime will be between 10 and 30 minutes. Sorry for the short notice and additional outage!
Update, April 10, 2008
Well, the snapshot mentioned yesterday is gone and we’re actually at 83% used today, which is below the point where we were hoping to see marked improvement (85%). Of course we’re still moving data off (which increases the load on the file server) so that won’t fully translate to customer usage improvement, but it should be quite a bit better and keep improving until we stop moving data.
Update, April 14, 2008
Okay, we’re finally getting ready to mark this as resolved. Things have seemed pretty much okay for a while now. But, just to be sure, we’re dropping the severity to Medium for now and leaving it as unresolved.
Update, April 17, 2008
We’re still hearing some reports of site slowness - we were able to resolve an issue causing high loads today which should help but we’re not going to consider this resolved until everyone is receiving good service.
| Severity: | Medium | Resolved: | No |
April 3rd, 2008 at 4:39 pm
So I’ve ended up moving to another host and have learned my lesson: if you rely on e-mail/hosting in any capacity you probably need 2 hosts.
I have a more expensive one now (MT) that has a fraction of the storage capacity but I believe should be more reliable that will handle my main site and e-mail (and my client’s).
I’ll sign with a second host (maybe AN Hosting?) for my online storage needs, spillover, and as a backup.
So yes, this outage is fairly outrageous but I have to commend Dreamhost - I asked for a refund from ‘Ralph’ and he took 7+ days to get back to me, but when he did he refunded the entire hosting fee I paid 4 months ago immediately. Now the problem is that he did it without confirming with me so my e-mail went from crappy to none at all. But I e-mailed him back and he’s re-activated my account to allow time for me to setup with my new host and for the DNS to propagate.
So while the technical service from Dreamhost is unreasonably poor, they refunded the money so I’d potentially consider them as a second host…
LESSONS HERE:
-If you’re still complaining here, switch hosts and ask for a refund
-BEFORE you ask for a refund make arrangements with another host and change your DNS settings ASAP
-If you don’t want this to happen again, get a ‘good’ host as a main, and a ‘cheap’ host as a backup
Hope you can learn from my mistakes!
April 3rd, 2008 at 4:42 pm
“Could not connect to database - DB Error: connect failed”
*sigh*
(and i’m not on blingy. neither is my mysql)
April 3rd, 2008 at 5:19 pm
So don’t post here idiot.
April 3rd, 2008 at 5:34 pm
bob my knob, dolt.
i’ve had problems since this “blingy” crap started. it is related, they just won’t say so.
April 3rd, 2008 at 5:39 pm
Spacey has been having similar problems as well here and there but I haven’t gotten a clear answer on what’s going on there either…
April 3rd, 2008 at 6:02 pm
ROFL
So, if your car won’t start in the morning, it’s because your neighbor’s car has a blown engine?
Here’s a tip: cut back on the stupid-pills.
When I can’t get the LOLLERCOASTER up to speed, you don’t see me blaming it on the ROFLCOPTER’s fouled spark plugs, do you?
OMGLMFAO!!!!111!!!!1ONE!!@!!!!!!
April 3rd, 2008 at 7:34 pm
Reading all of these comments about people losing thousands of dollars due to website downtime really makes me wonder whether they should be in this business at all. Any area of IT will ALWAYS have downtime, and the only thing to do about it is make sure there is a “Plan B”.
Dreamhost offers tons of storage, but let’s be realistic, for what they charge I am amazed they have the customer service level they do. As it happens, Dreamhost is my “Plan B” for all my websites and my clients’ websites. All my sites are on dedicated servers which offer much greater reliability, typically at about $200-$250 per month for a server with 100-160GB of storage. Dreamhost is my backup, so if any of the dedicated servers dies, I can be back up and running using Dreamhost very quickly. So, right now my website backups aren’t available. Not a big deal, but I had to check why there were issues with updating the backups and lo and behold, Dreamhost is having a rackfart.
I can’t believe people would invest thousands developing a website that earns income in the thousands and then only spend $5 or $10 per month on hosting. Foolish.
April 3rd, 2008 at 7:42 pm
space is slowly trickling away, it’s starting to get annoying again, at 97% full.
April 3rd, 2008 at 7:59 pm
If they weren’t flat out lying about their income, the answer would be: No.
True, but not likely the case with the people that whine here.
On this site, “multi-million dollar corporation” translates to “made for Adsense blog that generates $4/month.”
The people smart enough to make the money are smart enough to not let it instantly fall apart and whine in blog comments about it.
Their lying, crying and general lack of common sense is what fuels the ROFLCOPTER.
April 3rd, 2008 at 8:06 pm
What about the lulz?
Which is a corruption of LOL…
Which stands for “Laughing Out Loud”
* Yellow van explodes *
April 3rd, 2008 at 8:13 pm
We still get a sweet index page on loading our site. Is this the common theme among people who are on this forum? All E-mail that is on Blingy seems to be fine, but Webhosting is TKO’d.
April 3rd, 2008 at 8:20 pm
LULZ!!11!
April 3rd, 2008 at 8:24 pm
I would rather you guys just turn off the FTP so that you can move the files faster and without interruption from new data.
April 3rd, 2008 at 8:30 pm
I just opened an account on Blingy & I just keep uploading my entire hard drive as fast as I can through my home T3 line. I figured that would help get things synced up faster. I also setup a cronjob that backs Google up to my account every 3 minutes.
April 3rd, 2008 at 8:32 pm
Stop whining like a bunch of babies and grow up and learn what redundancy is, what do you expect when you clog their servers. Damn whiners are the ones to blame, all of my sites are up and still use DH
Cry me a river baby 
April 3rd, 2008 at 8:46 pm
If they lose tons of money, wtf they got a shared hosting account for? Total Newish mofos
April 3rd, 2008 at 9:05 pm
Seattle DJ Says:
April 3rd, 2008 at 9:07 pm
[backquote]#1114
Martin Says:
April 3rd, 2008 at 10:10 am
Well my site is loading now, still taking 3+ minutes to load, at least that’s a start compared to being dead for two days…[/backquote]
Well my site is still up and running but slow as hell.
April 3rd, 2008 at 10:05 pm
I stopped all work on my account, please, do what you need
Best wishes to our support, we wait! Sites benediktxvi.ru benedictxvi.tv sofit.info
(Sorry, didn’t see this information before, now I understand. We need good speed.)
April 3rd, 2008 at 11:33 pm
My site costs me about $20 a day so DH actually saved me about $15 before they fixed my site!
April 3rd, 2008 at 11:58 pm
sorry guys — these kind of issues need to be fixed faster than this. if a filer is down, it should not take more than 1 day to fix it.
April 4th, 2008 at 12:06 am
@1028 (Jerry):
> Jason P - The other hosts aren’t “stealing” these customers (I too have switched my hosting for my busiest sites) - it’s how
> free markets work - when someone sucks at what they do, you have to expect that others are going to take advantage of it.
> If dreamhost doesn’t want others stealing their customers, they should get their act together - there are so many software
> packages that can monitor for the problems they’re experiencing - the fact that it made it here and they did nothing till this
> point tells you something about their management or their tech workers.
Sorry I didn’t see your response at first. First off, when I say stealing, I am referring to the ones who find it suitable to go and spam this blog post, which SHOULD be only for comments about this issue, not for some hosting company to try and make a few extra bucks. I apologize, I guess I did not word that correctly. As well, DreamHost does not suck at what they do, if they did, how could they still be around after 10 years? So you do know, because apparently you don’t, this issue was caused by a faulty filer which did not meet the specs that were guaranteed by the manufacturer. Now that it is repaired the downtime is being caused by the transfer of files from the old system to the new one. This could take only 3-6 hours to accomplish the entire transfer, however because of the rate that people are uploading files to the old server (to my knowledge, it is not possible to upload directly to the new system without disabling everything on the old) DreamHost is unable to gain very much ground on the transfer. For every GB of data they transfer, another GB is uploaded.
April 4th, 2008 at 12:26 am
Since i’m not an expert when it comes to whatever ‘clusters” are… (sounds more like a turd, or nuts in a candy bar…) and since support is obviously not answering my questions, maybe someone here can help…
My account is only 2 weeks old. I installed wordpress 2.5 a couple days ago. Everything was great then yesterday I got an “Error establishing a database connection” Files are all still there though.
What I don’t understand is, is EVERYTHING on my account reliant on the damaged cluster? If I just build a new database and reinstall everything will it install on a different cluster or am I just SOL and my 2 week old account is dead in the water for what they are saying could be another 2 weeks?
Can somebody help? We have deadlines and NO answers…
April 4th, 2008 at 12:37 am
I propose we nominate nightmare host for a guiness book of world records award for the longest time taken to recover from whatever the shit that happened…
April 4th, 2008 at 1:05 am
Bing!
April 4th, 2008 at 1:06 am
Holy Shit! Dreamhost do webhosting?! WTF?
April 4th, 2008 at 1:36 am
Anyone knows of a host for adult content?
I m going to leave as well
Shit…I wish i knew about FTP and all that crap
April 4th, 2008 at 1:57 am
ok my website is down again :/
thanks DH
April 4th, 2008 at 2:12 am
We hosted some projects on DH and we are still waiting to show them to our clients. How the hell could we show our job if you can only guarantee this pathetic quality? I heard positive things about DH, but after being here for a month I really think that the service you’re offering is a total waste of money… and time.
April 4th, 2008 at 2:24 am
As you can see in Wikipedia, “a computer cluster is a group of loosely coupled computers that work together closely so that in many respects they can be viewed as though they are a single computer”.
Therefore, if the machine in which all your files reside is a part of the affected cluster… you have approximately 500 GB of disk space, and every byte of those GB is in the same cluster, so don’t waste your time by deleting and re-uploading contents.
April 4th, 2008 at 2:37 am
ROFL
LMAO
OMGLOLCAT
LOLLERSKATING
LMFAO
LULZ
LOL
April 4th, 2008 at 3:49 am
Did anyone notice that nightmarehost.com is actually owned by MediaTemple?
April 4th, 2008 at 4:11 am
owned? wtf?
April 4th, 2008 at 5:59 am
DreamHost still exists? I’m surprised people still put up with their bullshit. Tisk tisk.
April 4th, 2008 at 6:01 am
My website is down yet
Can somebody help?
April 4th, 2008 at 6:16 am
I don’t understand why, a big company let websites down, all my eggs are in the same place.
So when my websites are down : all are down too…
They work 2 hours by day : so why ??? i have not so much files : maybe 100 mo for all my websites of storage.
So i don’t understand why Dreamhost tell that the transfer is long.
I’d like to be cool and tells me that i understand because of numbers of websites on clusters, but now i can’t wait anymore.
During 2 weeks all my websites work 2hours by day and slow, with not so much transfer on…
It’s unbelievable !!
April 4th, 2008 at 6:35 am
So, regarding the obvious fact that this has been borked for a little over 2 weeks, is there going to be any changes to our billing to underline the fact that we’ve not been getting what we paid for?
Or is the dreamhost motto “Go fuck yourself” ?
April 4th, 2008 at 6:37 am
You guys don’t get it. There is no such a thing as a specific “blingy filer problem”.
This month is blingy, next month is spunky and next one is whothef*ck knows. Now, get this: that is how dreamhost is gonna be from now on, they have to make up for the “fat finger” losses (see: http://blog.dreamhost.com/2008/03/21/good-reminiscing-friday ) with cheap and unreliable hardware and cheap and unreliable service.
Get used to it or go elsewhere.
April 4th, 2008 at 6:46 am
@1186 MArk
> we’ve not been getting what we paid for?
You think so? What do you think less than 8$/month entitles you to? We get what we pay for man…
Again, if you want good service and are willing to pay more… go elsewhere.
April 4th, 2008 at 6:59 am
Well damn… I just lost my highest-paying client.
Oh well. They were shitwads anyways.
April 4th, 2008 at 7:06 am
@1188 Bob
> What do you think less than 8$/month entitles you to? We get what we pay for man…
Um, well, that’s the problem - $8/month is *something*, whereas sites impacted by this problem have received *negative* value for the money. DH has not even come close to living up to their side of the service contract - the combination is the actual technical issue (bad, but sh..stuff happens), the poor judgment on “how long to fix” and “we’ll try to keep everything going while we fix it” (amateurish missteps), and the terrible service sense in failing to keep us apprised of progress (deadly).
April 4th, 2008 at 7:18 am
@1190 Vex
You are absolutely right. What I’m trying to say is that overselling is DH’s primary business model. Overselling is when a business (or individual) offers more of a product or service than they currently have. I know that, you know that and DH knows that. Overselling works well as long as nothing goes wrong, it’s *cheap* and it’s good for customers, or so they say… but when shit happens, you can’t really complain, you know what the game was from the beginning, don’t you?
April 4th, 2008 at 8:08 am
I agree. I worked for a tour operator and they usually overbooked the hotels they managed… overbooking is fantastic, as long as nothing goes wrong… but when you have tourists who paid for a service which you’re not able to give them… things begin to be awful.
Anyway… it’s not important how cheap could be a given service: you paid something because of a promise of an enterprise and with some expectation about the quality you’re going to receive. If you insist that you’re very good doing something and, afterwards, all goes wrong… the fact that the service for which you paid was very cheap is not a matter.
April 4th, 2008 at 8:12 am
It took 5 days to delete 500GB of data, and it took 2 days for us to fill it up again. Nice work DH.
April 4th, 2008 at 8:37 am
[blockquote]#1167
Martin Says:
April 3rd, 2008 at 9:07 pm
[blockquote]#1114
Martin Says:
April 3rd, 2008 at 10:10 am
Well my site is loading now, still taking 3+ minutes to load, at least that’s a start compared to being dead for two days…[/blockquote]
Well my site is still up and running but slow as hell.[/blockquote]
My site is dead again, it was working for a couple hours at most. Thankyou dreamhost. My site has been dead for three days and running shit for two weeks. Not happy.
Been making enquiries at a couple other hosts.
April 4th, 2008 at 9:12 am
Well, it’s not like I have to run a business or anything. Just go ahead, Dreamhost, and get to fixing it when you feel like it.
April 4th, 2008 at 9:15 am
Does anybody know how that spring groundhog thing went this year?
April 4th, 2008 at 9:42 am
@1196 The other Bob said
> Does anybody know how that spring groundhog thing went this year?
Fuck LOL. This comment is hilarious!
April 4th, 2008 at 9:55 am
@1193
No, they only freed up 500 Meg.
Instead of running 1 massive job which frees up 500 Gig, they should first run a job that frees up
1 gig. 400 meg remaining seemed to be when the server started slowing down.
April 4th, 2008 at 10:13 am
To all blingy users:
Please e-mail support and ask them to shut down ftp uploading until the performance problem is fixed.
I suspect this will only take 1 or two days of ftp shutdown to get 1 gig free (which should be enough to
allow the rest of the jobs to finish)
If they get enough e-mail from different users, they will hopefully do it. I think they are running on
some blind faith that users will stop uploading on their own. E-mail them every day telling them
to shut down uploading until they get the point!!