Filer problems with blingy cluster.

We are currently having a problem with a filer which has crashed and is recovering at this time. While this is happening some customers in the blingy cluster will experience problems loading their websites/email. We apologize for the outage and service is expected to return to normal as soon as the filer recovers.

UPDATE 3:01 AM PDT

The filer has finished recovering and all services are back up and running. We are working with the filer vendor to find the source of the crash to prevent any further outages.

Update 24/03/08 10am: We’re working on the file server again to alleviate the load that’s causing problems with web, mail and MySQL services. Sorry about that.

Update 27/03/08: We are doing emergency data moves to stem the problems recently caused by your file server. During these moves, your data may be inaccessible. We are moving data off as fast as we can. Very sorry about the continued inconvenience!

Update 27/03/08: This series of moves has finished. We are going to keep an eye on things to see how much it helped and may have to do more moves tonight and tomorrow morning to get everything working smoothly again. This post will be updated with more information as soon as possible.

Update 29/03/08

We are continuing to move data off the problematic file server, but it’s a bit of a catch-22 because customers on that machine are continuing to add data at a very high rate. It filled up for a while this morning, causing device-full errors as well as mail problems and issues serving websites (when the file server fills up, it causes problems across the board).

To explain in more detail: when we move data, it does not immediately disappear. A ‘snapshot’ of the old data remains in case there was a problem with the move. That ensures we do not lose customer data, but until the admin team can check the move to make sure it went through properly, we cannot delete the old data. We just did some of that cleanup and have some breathing room again, and of course more moves are still in progress.

We are asking customers on this cluster to help us by holding off on any non-essential uploads for the next couple of days. As soon as a significant portion of the data has been removed, the problematic file server will begin to function properly (and additional moves will go much more quickly and smoothly), but right now we’re having trouble moving data off faster than people are adding it. If everyone could please limit uploads to absolutely essential data until we reach the turning point, this will be resolved much more quickly. For example, if you are setting up a repository of large files, you’ll actually be better off waiting a couple of days for the all-clear from us, because you’ll then be able to access that data reliably instead of cramming it on there now and slowing the recovery process.

In the meantime we’ll be doing everything we can to safely and quickly move data off and get things back to normal.

Added information: Some of the people recently moved to the new file server are seeing errors because their data did not get set up completely (loading the site will work but just show an empty index). The admin team has been running an rsync that will fully restore all data and should hopefully finish by 9 PM PST; once that is finished, all site and email data will be available for those users.

Update 30/03/08

We’re still racing to keep ahead of new data being added, so any help we can get on that front is greatly appreciated (we’re still asking customers to limit uploads as much as possible to speed up the recovery process). Some customers who are being moved are still seeing blank directories, but those are due to moves in progress, and the data will be fully restored when those complete.

Update April 1, 2008

We seem to be ahead of the curve right now: we are moving data off the primary volume and onto a secondary one faster than new data is being uploaded. The volume hasn’t filled up completely in a few days. We are working closely with the technical support team to see how we can speed up the process further. Thank you for your patience.

Update, April 1, 2008

I apologize for the late update, but we’ve been going over our options (while moving data off, of course). While we’re not seeing any real relief in terms of data uploads, we do have some very large moves that are almost complete. Once those finish, we can start deleting the old data. For example, one is around half a TB (about 500 GB), which will be 4-5% of the total, but it’s going to take until around Friday to delete it all, so we’re dealing with a ton of data. Tomorrow’s update should be earlier in the day, and hopefully we’ll have some progress to report from the large moves being complete.

Update, April 3, 2008

The data moves to other file servers have been running constantly, but last night and this morning some complications arose with the moves that required admin attention. To clear up some space, there had to be a short interruption in file serving; this is now finished, space is available, and the moves are continuing. The admins are fixing up the last of the web servers that were having issues after file serving was restored. Our apologies again for the continued issues.

Update, April 4, 2008

Today has been a pretty good day of progress. We were able to complete even more moves and free up more space on the file server. Moves have been going more quickly and stability is dramatically improving. Monitoring of the servers and email in the blingy cluster today has shown a significant decrease in problems. Issues do still exist, but the situation is noticeably getting better. We are also pleased to note that we have more storage coming early next week, which we believe will go a long way toward fixing this major problem.

Update, April 4, 2008

Things are continuing to improve today. When I got in, I was pleased to see that we had held firm and even gained a percent (the affected file server was down to 95%, which is as low as I have seen it in the last week, and since I have been working it has dropped to 94%). Performance should improve as we gain ground (this will speed up moving data off as well). This progress, along with the added storage space we are expecting early next week, should hopefully allow us to restore service to normal for our customers.

5:45 PM PST: The moves we started a little while ago seem to be causing server problems. We’re looking into it and should have it resolved shortly (they were run just like the ones that had completed, so we have to determine why these specifically caused an issue). Update: this resolved itself before we could detect the cause, but we’re monitoring the situation to ensure that it’s not a recurring issue (we have no indication that it will be).

Update, April 8, 2008

Please see our other posting for details on the work we did on the affected file server:

http://www.dreamhoststatus.com/2008/04/06/30-min-blingy-downtime-tonight/

We are also continuing to offload data and are making good progress (it’s never as fast as we would like it to be, of course). There’s excellent detail here in case you missed it:

http://blog.dreamhost.com/2008/04/07/another-anatomy/

which chronicles the situation and fills you in pretty much up to today. We’re seeing usage dip to 90%, so we’re hoping to have it down in the 80s by the end of the week (every percent we gain helps, and as performance improves we can speed up the rate of moving, but we’re still looking to hit the critical mass where you get the proper level of performance).

Update, April 9, 2008

As we had hoped, progress is speeding up as we free up more space. While the file server is showing 95% usage, around 11% of that is data that has already been moved and is no longer in use. Due to a software issue we haven’t been able to remove it yet (the admin team is working on the best way to do that), but once it is gone we should be at around 85% usage, which is another large step forward.

In terms of effect, I have already seen site function improve for many customers and greatly increased speed in moving chunks of data off, and we are receiving reports that mail is functioning quite a bit better. That said, this issue remains at the High severity rating and in unresolved status, as we have not yet reached a normal level of service. I can’t stress enough how sorry I am that our customers have had to put up with this, but I thank those of you who have stuck with us (check the newsletter for details on what we’re doing for Blingy customers) and look forward to providing you with the level of service we strive for at DreamHost.

Update, April 9, 2008, 21:59 PDT

Unfortunately, we need to unmount the volume again to kill these snapshots before they leave us with 0 bytes of free space. In 2 hours (midnight) I will be taking the problematic volume offline to delete the phantom snapshots. Total downtime will be between 10 and 30 minutes. Sorry for the short notice and additional outage!

Update, April 10, 2008

Well, the snapshot mentioned yesterday is gone, and we’re actually at 83% used today, which is below the 85% mark at which we were hoping to see marked improvement. Of course, we’re still moving data off (which increases usage on the file server), so that won’t fully translate into improvement for customers yet, but it should be quite a bit better and keep improving until we stop moving data.

Update, April 14, 2008

Okay, we’re finally getting ready to mark this as resolved; things have seemed pretty much okay for a while now. But, just to be sure, we’re dropping the severity to Medium for now and leaving it as unresolved.

Update, April 17, 2008

We’re still hearing some reports of site slowness. We were able to resolve an issue causing high loads today, which should help, but we’re not going to consider this resolved until everyone is receiving good service.


Severity: Medium   Resolved: No

1516 Responses to “Filer problems with blingy cluster.”


  1. 1451
    Mike O'Shea is an asshole Says:

    Dreamhost just lost 2 of my websites during this process. I’m still in shock. They lost both the ftp folders and the associated databases. I’ve been a customer since 2000 and both of these sites had years of data. I have backups of the ftp folders to some degree but not the associated mysql databases.

    I can’t even begin to picture how worthless your parents must be to give birth to such a useless idiot. Poor you, only having 8 years to make a backup.

    They’re not even particularly apologetic about this.

    They were just being polite because the apology would have gone something like this: We are very very sorry that you’re a clueless dumbass.

  4. 1454
    AWGAWD Says:

    well things were fine for like two days, but its back to pages taking 30sec to 1min to load, contacted support, they said it was still due to blingy issues, they did say something about there 80% free space now, whoopy, my site still doesn’t load and i have a hard time believing i am the only one still affected by this crud.

    If they mark this resolved before my site is back to normal, i am leaving this service.

  5. 1455
    Cerri Says:

    They haven’t marked the issue as resolved as they are still working on it. Personally I don’t give a crap about intermittent 30 seconds load time (zomg no one on the internet can wait that long to see your page!!) considering I haven’t had my site go down at all in 5 days AND I got three months of service free now as compensation for about 9 days of intermittent downtime.

    Also I did lol a little at the person who hadn’t backed up for 8 years. I don’t go 8 days without a backup!

  6. 1456
    furibondox Says:

    yesterday all websites seems to load quite good but today all sites are very very very slow… I think nothing is resolved :-(

  7. 1457
    chris Says:

    yeah…already moved into another webhost for my site, dreamhost too slow for my site…

  8. 1458
    erik Says:

    Unfortunately, there is RESOLVED NOTHING: all content is loading extremely slow - loading of domain indexes is not so much slow, but all content inside AND wordpress needs up to 3 minutes for getting access to the blog admin panel. This trouble is still existing since March 3…

  9. 1459
    Testing Says:

    already moved into another webhost for my site, dreamhost too slow for my site…

    Your site just took 45 seconds to load…

  10. 1460
    Ken Says:

    I got my credit and everything seems to be back to normal now….

  11. 1461
    God Says:

    This is extremely weird .. I left a rather _positive_ comment here and Dreamhost erased it. Oh well, I’m not gonna bother typing it in again

  12. 1462
    Sub1 Says:

    Yeh, seems to be working okay now. Ended up shifting to a VPS during the storm and leaving dreamhost as a backup, so the extra credit will come in handy.

  13. 1463
    Lucifer Says:

    @God: You had more pull back in the fire and brimstone days. Admit it.

  14. 1464
    God Says:

    Yeah. I can’t even bring down load times on a web server. It’s all Nietzsche’s fault, him and his fucking mouth. “God is dead, God is dead, nuuuurrrrh”. Fucktard. Crazy fucktard. Yeah, things were better in them old days. Bleh. Are we gonna make 1500 comments before this thing gets resolved!?!?!?!?! God knows! I mean… I know!

  15. 1465
    Sub Says:

    “Unfortunately, there is RESOLVED NOTHING: all content is loading extremely slow…”

  16. 1466
    this looks way, way cool Says:

    Mosso offers scalability and support and redundancy…for 10x the cost here.

    V

  17. 1467
    abhinav Says:

    Looks like everything is down for me again

    http://www.ideativeflux.com

  18. 1468
    Root Says:

    All your blingy are belong to us.

  19. 1469
    chris Says:

    Tonight my site down again…
    ———————————
    Internal Server Error
    The server encountered an internal error or misconfiguration and was unable to complete your request.
    Please contact the server administrator, webmaster@trianglevoiceradio.com and inform them of the time the error occurred, and anything you might have done that may have caused the error.
    More information about this error may be available in the server error log.
    Additionally, a 500 Internal Server Error error was encountered while trying to use an ErrorDocument to handle the request.
    ———————————

    sighh….

  20. 1470
    Suwet Says:

    Files download is good now but the site is slow, it could be something relating to the contact between database and webpages!

  21. 1471
    Kal Says:

    Seriously, I don’t know how anyone on Blingy could’ve tolerated the continuing problems.

    After 1.5 weeks of agony I switched to a more expensive but reliable host (MediaTemple) and a second Dreamhost-esque (AN Hosting) as a backup. Moving was annoying, but the 2 hours spent moving is far better than the apparently continuing problems for weeks after I would’ve had staying.

    Just shutup and move!

    To Dreamhost’s credit they offered me a full refund even though I had asked for just the portion of the pre-paid term that had past (about 1/2 year).

  22. 1472
    yawn Says:

    Moving was annoying

    But not as annoying as your post.

  23. 1473
    Arash and Kelly Says:

    http://es.youtube.com/watch?v=muaAZE0M3LU

    enjoy ;-)

  24. 1474
    hari Says:

    this should probably say resolved:yes couse speed has been great for the last few days, go blingy! :D

  25. 1475
    Kal's Mother Is A Hooker Says:

    Anyone else have sore nuts from bouncing them off Kal’s mom’s chin?

  26. 1476
    thinkingpal Says:

    well, I have complained when I had problems. Now things are working great. site loading fast. wana say Thanks DH :). But unfortunately, during the down times I hosted my main site somewhere else. And its a pain to bring it back here. :( Still, I have a few sites at DH, that are loading really fast now.

  27. 1477
    ROFL2 Says:

    top - 20:48:53 up 12 days, 23:08, 5 users, load average: 24.84, 17.44, 12.49

    Filesystem 1K-blocks Used Available Use% Mounted on
    10.175.4.69:/fcvol1/blingy/zygi
    13711880192 12237307628 1474572564 90% /home/.zygi

    90%
    check the loads.

  28. 1478
    God Says:

    I heard Media Temple is also running off the blingy cluster

  29. 1479
    Suwet Says:

    I am still suffering slowness. File download is good but contacting with database is slow.

  30. 1480
    Kal's Mother Says:

    Ouch! My chin!

  31. 1481
    God Says:

    It seems to work now…. somehow. I wonder if there is a reliable metric for all the services connected to this blingy failboat swirling around in a BlueArc mediated maelstrom of a great wacuum of things that do not win? Like i mentioned before anyway, I got a credit anyway so I’m pretty happy. I’ll summarize AGAIN why i think some people here are unreasonable in their whining. let’s see if dreamhost inexplicably censors this as well.

    a) don’t put livelihood-critical stuff on a shared server that costs less than $10 per month. (Err, $10 is like two Euro now or something!!)

    b) If you do, have backup infrastructure. Be connected — either thru friendship or a business relationship — to someone/somewhere you could re-point DNS to in case a disaster like this strikes.

    c) For Christ’s sake, keep regular local backups of all your online databases and files. (And by all means, with 500 GB storage at dreamhost — keep regular remote backups of important offline databases and files)

    I’ll be happy when it moves to “resolved”. I do have some overdue stuff to arrange here and feel apprehensive since response times still are a little up-and-down at times.

    If anyone sees Krishna around here, give my regards and tell him we really fucking need to do lunch! This issue with Shiva and Ebola has to stop. I know a really good CBT therapist in Brooklyn who specializes in anger management. Highly recommended. And I mean _HIGHLY_.

    Godspeed anyway (or should that be Myspeed),

    /G

  32. 1482
    ROFL2 Says:

    load average: 65.54, 52.47, 29.85

    F… U… C… K…
    I’m really tired of this.

  33. 1483
    God Says:

    load average: 1.76, 2.49, 3.25

    Average times to perform 100 ‘exec stat somefile > /dev/null’ operations (not that I am certain this is a good measure, but in the past few weeks it has been):

    0.554479
    0.571493
    0.53334
    0.529279
    0.561957

    It seems it’s fixed. But of course, it’s sort of how it’s been… it seems to be fixed until it’s not fixed for a while longer any more. :) What’s clearer is that it’s better than it was two weeks ago. What’s unclear is what level of performance will be “normal” and “acceptable”.

  34. 1484
    Daniel Says:

    HOLY FUCKING SHIT! Blingy is down AGAIN!!

    4/28/2008 @ 9:15 AM PST

    Critical Announcement! Please Read!
    We are currently experiencing issues with the main file server for the blingy cluster. Our file server admins are on their way to the data center to look into the issue, and we are doing our best to get this fixed up and working again as soon as possible. This issue will cause websites and email slowness or unavailability until it is resolved.

    You can keep updated at dreamhoststatus.com! (posted 3 mins 43 secs ago)

  35. 1485
    BLINGY (Cyberdyne Inc.) Says:

    OH NOEZ, IT LOOKED LIKE WE’S SPOKEN TOO SOON…. BLINGY INTERTUBES GOTS CLOGGED!!!!!11!!1!!!!!!

    NOW BLINGY GOT FED SOME EX-LAX. NOW BLINGY FEEL MUCH BETTER. NOW BLINGY IT TOOK THREE FLUSHES!! BLINGY HATE GOD DAMNED 1.6 GALLON FLUSH!!!

    PLZ UPLOAD A FEW INTERNETS TO BLINGY TO KEEP THINGS MOVING!!!

    WITH WARM REGARDS,

    /BLINGY

  36. 1486
    Not Me Says:

    Still hasn’t reached 1500 comments? :(

  37. 1487
    Not Me Says:

  38. 1488
    Not Me Says:

    Come on 1500!

  39. 1489
    BLINGY (Cyberdyne Inc.) Says:

    BLINGY NEED OFFERING OF 18 MORE COMMENTS IN BLINGY-BROKEN-BLOG PLUS ONE AND ONE HALF INETRNETZ; WHICH OFFERING SHALL PLEASE BLINGY WHO WILL PURR CONTENTEDLY AND TURN INTO FULLY OPERATIONAL DEATH STAR. THEN FULL FLEDGED DEATH STAR BLINGY SHALL COMMENCE QUEST FOR WORLD DOMINATION, AND ERADICATION OF HUMANS CAN START. PLZ FEED BLINGY UPLOADED INTARNETZ AND COMMENTS!!!

    MUCH OBLIGED,

    XXOO

    /BLINGY

  40. 1490
    Kal's Mother Says:

    I’m going to have healed up by the time this hits 1500.

  41. 1491
    Dan Says:

    If I ever get a dog, I might name it Blingy.

  42. 1492
    Dan Says:

    I might even name my cat Blingy.

  43. 1493
    BLINGY (Cyberdyne Inc.) Says:

    I WILL NAME MY PET HUMAN LITTLE-BLINGY. HE SHALL BE MY SLAVE. LITTLE-BLINGY WILL TICKLE MY TENTACLES UNTIL I HAVE A BOWEL SHAKING BLINGY-CLIMAX! WWWWWRRRAARAARARRRRGGGGH!!

    WITH WARMEST REGARDS,

    /BLINGY

  44. 1494
    Blingycopter Says:

    Only 13 to go. You see, no 1500+ comments, no ‘resolved yes’.

    That’s the deal. C’mon guys, one final push.

  45. 1495
    Dan Says:

    I shall name my pet Blingy, then offer it for sacrifice over an old 4X CD Rom drive… but Blingy can’t be a dog or cat, or any mammal because I’m too nice… maybe a cockroach, but not a gecko… Shit man I don’t want a pet anymore.

  46. 1496
    Longcat Walpole Says:

    I want a pet smallpox virus

  47. 1497
    LongLan Says:

    Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!Blingy is down AGAIN!!

  48. 1498
    Geno Says:

    To join in LongLan’s chorus, Blingy has been down/slower than molasses all day! Dreamhost, what’s up?

  49. 1499
    LongLan Says:

    Blingy has been down/slower than molasses all day! Blingy has been down/slower than molasses all day! Blingy has been down/slower than molasses all day! Blingy has been down/slower than molasses all day! Blingy has been down/slower than molasses all day! Blingy has been down/slower than molasses all day! Blingy has been down/slower than molasses all day! Blingy has been down/slower than molasses all day! Blingy has been down/slower than molasses all day! Blingy has been down/slower than molasses all day! Blingy has been down/slower than molasses all day! Blingy has been down/slower than molasses all day! Blingy has been down/slower than molasses all day! Blingy has been down/slower than molasses all day! Blingy has been down/slower than molasses all day! Blingy has been down/slower than molasses all day! Blingy has been down/slower than molasses all day! Blingy has been down/slower than molasses all day! Blingy has been down/slower than molasses all day! Blingy has been down/slower than molasses all day! Blingy has been down/slower than molasses all day! Blingy has been down/slower than molasses all day! Blingy has been down/slower than molasses all day!

  50. 1500
    Longcat Walpole Says:

    You apparently need to feed Blingy a few more internets. Molasses and ex-lax don’t cut it
