Filer problems with blingy cluster.

We are currently having a problem with a filer which has crashed and is recovering at this time. While this is happening some customers in the blingy cluster will experience problems loading their websites/email. We apologize for the outage and service is expected to return to normal as soon as the filer recovers.

UPDATE 3:01 AM PDT

The filer has finished recovering and all services are back up and running. We are working with the filer vendor to find the source of the crash to prevent any further outages.

Update 24/03/08 10am: We’re working on the file server again to alleviate the load that’s causing problems with web, mail and mysql services. Sorry about that.

Update 27/03/08: We are doing emergency data moves to stem the problems recently caused by your file server. During these moves, your data may be inaccessible. We are moving as much as we can off as fast as possible. Very sorry about the continued inconvenience!

Update 27/03/08: This series of moves has finished. We are going to keep an eye on things to see how much it helped and may have to do more moves tonight and tomorrow morning to get everything working smoothly again. This post will be updated with more information as soon as possible.

Update 29/03/08

We are continuing to move data off the problematic file server, but it’s a bit of a catch-22 because customers on that machine are continuing to add data at a very high rate. It filled up this morning for a while, causing device-full errors as well as mail problems and issues serving websites (when a file server fills up it causes problems across the board).

To explain in more detail: when we move data, it does not immediately disappear. A ‘snapshot’ of the old data remains in case there was a problem with the move. That ensures that we do not lose customer data, but until the admin team can check that the move went through properly, we cannot delete the old data. We just deleted some of it and have some breathing room again, and of course more moves are still in progress, but we are asking customers on this cluster to help us by holding off on any non-essential uploads for the next couple of days.

As soon as we have a significant portion of the data removed, the problematic file server will begin to function properly (and additional moves will go much more quickly and smoothly), but right now we’re having trouble moving data off more quickly than people are adding it. If everyone could please limit uploads to absolutely essential data until we reach that turning point, this will be resolved much more quickly. For example, if you are setting up a repository of large files, you’ll actually be better off waiting a couple of days for the all clear from us, because you’ll then be able to access that data reliably instead of cramming it on there now and slowing the recovery process.

In the meantime we’ll be doing everything we can to safely and quickly move data off and get things back to normal.
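To put rough numbers on the catch-22 described above, here is a minimal sketch in Python. Every figure and function name in it is hypothetical and chosen only for illustration; none of them are measurements from the blingy file server. It shows two things: moved data keeps occupying space until its snapshot is deleted, and the volume only drains if data leaves faster than it arrives.

```python
def usable_free_gb(capacity_gb, live_gb, snapshot_gb):
    """Free space on a volume once both live data and snapshots of
    already-moved data are counted (hypothetical model)."""
    return capacity_gb - live_gb - snapshot_gb


def days_until_target(live_gb, target_gb, moved_off_gb_per_day, uploaded_gb_per_day):
    """Days until live data shrinks to target_gb.

    Returns None when uploads keep pace with (or outrun) the moves,
    which is the catch-22: the volume never drains.
    """
    if live_gb <= target_gb:
        return 0.0
    net_drain = moved_off_gb_per_day - uploaded_gb_per_day
    if net_drain <= 0:
        return None
    return (live_gb - target_gb) / net_drain


if __name__ == "__main__":
    # Illustrative numbers only (assumed, not from the incident).
    print(usable_free_gb(capacity_gb=10000, live_gb=9000, snapshot_gb=800))
    print(days_until_target(9500, 8500, moved_off_gb_per_day=300, uploaded_gb_per_day=100))
    print(days_until_target(9500, 8500, moved_off_gb_per_day=100, uploaded_gb_per_day=100))
```

In this simplified model, cutting uploads helps exactly as much as speeding up moves, which is why we are asking for non-essential uploads to wait.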

Added information: Some of the people recently moved to the new file server are seeing errors because the data did not get set up completely (loading the site will work but just show an empty index). The admin team has been running an rsync that will fully restore all data and should hopefully finish by 9 PM PST - once that is finished all site and email data will be available for those users.

Update 30/03/08

We’re still racing to keep ahead of new data being added so any help we can get on that front is greatly appreciated (we’re still asking for customers to limit uploads as much as possible to speed up the recovery process). Some customers who are being moved are seeing blank directories still but those are due to moves in progress and the data will be fully restored when those complete.

Update April 1, 2008

We seem to be ahead of the curve right now: we are moving data off the primary volume and onto a secondary one faster than new data is being uploaded. The volume hasn’t filled up completely in a few days. We are working closely with the technical support team to see how we can speed up the process further. Thank you for your patience.

Update, April 1, 2008

I apologize for the late update but we’ve been going over our options (while moving data of course). While we’re not seeing any real relief in terms of data uploads, we do have some very large moves that are almost complete. Once those finish we can start deleting the data (for example, one is around half a TB, or around 500 GB, which will be 4-5% of the total, but it’s going to take until around Friday to delete it all, so we’re dealing with a ton of data). Tomorrow’s update should be earlier in the day and hopefully we’ll have some progress to report from the large moves being complete.

Update, April 3, 2008

The data moves to other file servers have been running constantly, but last night and this morning some complications with the moves required admin attention. To clear up some space there had to be a short interruption in file serving. This is now finished, space is available and the moves are continuing. The admins are fixing up the last of the web servers which were having issues after file serving was restored. Our apologies again for the continued issues.

Update, April 4, 2008

Today has been a pretty good day of progress. We were able to complete even more moves and free up more space on the file server. Moves have been going quicker and stability is dramatically improving. Monitoring of the servers and email in the blingy cluster today has shown a significant decrease in problems. Issues do still exist but the problem is noticeably getting better. We are also pleased to note that we have more storage that will be coming early next week. We believe that this will go a long way in helping us fix this major problem.

Update, April 4, 2008

Things are continuing to improve today - when I got in I was pleased to see that we had held firm and even gained a percent (the affected file server was down to 95%, which is as low as I have seen it in the last week, and since I have been working it has dropped to 94%). Performance should improve as we gain ground (this will speed up moving data off as well). This progress, along with the added storage space we are expecting early next week, should hopefully allow us to restore service for our customers to normal.

5:45 PM PST: The moves we started just a while ago seem to be causing server problems. We’re looking into it and should have it resolved shortly (they were run just like the ones that had completed, so we have to determine why these specifically caused an issue). Update: this resolved itself before we could detect the cause, but we’re monitoring the situation to ensure that it’s not a recurring issue (we have no indication that it will be).

Update, April 8, 2008

Please see our other posting for details on the work we did on the affected file server:

http://www.dreamhoststatus.com/2008/04/06/30-min-blingy-downtime-tonight/

We are also continuing to offload data and are making good progress (it’s never as fast as we would like it to be of course). There’s excellent detail here in case you missed it:

http://blog.dreamhost.com/2008/04/07/another-anatomy/

which chronicles the situation and fills you in pretty much up to today. We’re seeing the data dip to 90% so we’re hoping to have it down in the 80’s by the end of the week (every percent we gain helps and as performance improves we can speed up the rate of moving but we’re still looking to hit critical mass where you get the proper level of performance).

Update, April 9, 2008

As we had hoped progress is speeding up as we free up more space - while the file server is showing 95% usage, around 11% of that is data that has already been moved and is no longer in use. Due to a software issue we haven’t been able to remove it yet (the admin team is working on the best way to execute that), but once that is gone we should be around 85% usage which is another large step forward.
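As a quick check on the figures above, the arithmetic can be sketched in Python. The function name is hypothetical; the two inputs are the percentages quoted in this update.

```python
def usage_after_cleanup(reported_pct, reclaimable_pct):
    """Volume usage once already-moved, no-longer-used data is deleted."""
    return reported_pct - reclaimable_pct


if __name__ == "__main__":
    # 95% reported usage, of which around 11 points is already-moved data.
    print(usage_after_cleanup(95, 11))  # 84, i.e. "around 85%"
```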

In terms of effect, I have already seen improved site function for many customers and greatly increased speed in moving chunks of data off, and we are receiving reports that mail is functioning quite a bit better. That said, this issue remains at the High severity rating and in unresolved status as we have not reached a normal level of service. I can’t stress enough how sorry I am that our customers have had to put up with this, but I thank those of you who have stuck with us (check the newsletter for details on what we’re doing for Blingy customers) and look forward to providing you with the level of service we strive for at DreamHost.

Update, April 9th, 2008 21:59 PDT

Unfortunately, we need to unmount the volume again to kill these snapshots before they leave us with 0 bytes of free space. In 2 hours (midnight) I will be taking the problematic volume offline to delete the phantom snapshots. Total downtime will be between 10 and 30 minutes. Sorry for the short notice and additional outage!

Update, April 10, 2008

Well, the snapshot mentioned yesterday is gone and we’re actually at 83% used today, which is below the 85% where we were hoping to see marked improvement. Of course we’re still moving data off (which itself puts extra load on the file server), so that won’t fully translate into improved performance for customers yet, but it should be quite a bit better and keep improving until we stop moving data.

Update, April 14, 2008

Okay, we’re finally getting ready to mark this as resolved. Things have seemed pretty much okay for a while now. But, just to be sure, we’re dropping the severity to Medium for now and leaving it as unresolved.

Update, April 17, 2008

We’re still hearing some reports of site slowness - we were able to resolve an issue causing high loads today which should help but we’re not going to consider this resolved until everyone is receiving good service.


Severity: Medium   Resolved: No

1516 Responses to “Filer problems with blingy cluster.”


  1. 1301
    longlan Says:

    Yesterday also good? Problems today, 11 days! What time can be resolved?

  2. 1302
    ROFLCOPTER Says:

    R

    O

    F

    L

  3. 1303
    longlan lover Says:

    I love you longlan

  4. 1304
    Aaron Says:

    5:45 PST? I think it was actually 5:45 PDT. Maybe you should use UTC and nobody can every get confused. :)

  5. 1305
    The other Bob Says:

    Never gonna give you up

    Never gonna let you down

    Never gonna run around and desert you

    Never gonna make you cry

    Never gonna say goodbye

    Never gonna tell a lie and hurt you

  6. 1306
    Anonymous Says:

    OMG daxter’s load average is less than 1.

  7. 1307
    Anonymous Says:

    Daxter
    04:36:47 up 11:06, 4 users, load average: 0.21, 0.76, 1.00

    That is even better than my computer running FC6 @ 600MHz.

  8. 1308
    longlan Says:

    I love you longlan?
    I love you longlan?
    OMG!

  9. 1309
    Paal Says:

    Still ongoing :@

  10. 1310
    Kei Says:

    Let me ask you spammers. How in the hell do you get up in the morning to tie your shoes? Seriously. Immature brats.

  11. 1311
    Longcat Greystoke-Pym Says:

    Hey! We tie our shoe laces just like everyone else. One half shoe at a time.

  12. 1312
    Bill Says:

    Gabrielle … that dude you cook with needs to get a hair cut and gain a little weight. He should stop wearing dresses too, just doesn’t look right.

  13. 1313
    The other Bob Says:

    I just recived an official notification form Dreamhost. The problem is that http://www.rickastley.co.uk/ is located in the blingy cluster, thus the ongoing downtime. But they have assured me that once we reach the 1500 comments mark they will locate us in the new brand, new, shiny roflcopter cluster, and things will be much better…

  14. 1314
    DH User Says:

    I really want to see 1500 comments on this - WE CAN DO IT!

  15. 1315
    Adolf Hitler-Cornwallis Says:

    Vee vill definitely errrrreich zee 1500 comment barrier! Und mit Schnelle!

  16. 1316
    Adolf Hitler-Cornwallis Says:

    Vee vill definitely errrrreich zee 1500 comment barrier! Und mit Schnelle! Jawohl!

  17. 1317
    Rob Says:

    I just sharted.

  18. 1318
    aks Says:

    i wish good luck to DH team for solving this issue asap.. n request to make ur hosting system more robust and reliable.. more planning and smart thinking is required.. i am gonna stick with u.. but again as i said earlier.. my clients won’t stick with me if they keep getting frequent downtimes.. especially for emails..

    So be more reliable host..

  19. 1319
    Shawn Sucks Says:

    184 to go after this one, somebody needs to start trollbaiting Shawn again…

  20. 1320
    ROFLCOPTER Says:

    I just sharted.

    That’s why I only post naked from the toilet with my laptop.

  21. 1321
    Rick Says:

    My load times on toadstool are actually in SINGLE DIGITS, I’m so excited! Hasn’t been that way in a while. Running df -h I see they’ve been adding a buttload of storage - awesome. Good job and keep those load times decreasing and I will be a happy man!

  22. 1322
    Zephan Hazell Says:

    I have only had problems with slow speed since November they seriously need to not over load the servers with clients.

  23. 1323
    ROFLCOPTER Says:

    LMAO

  24. 1324
    teh_patcherer Says:

    // IN .boot FIND

    if .request == (from Blingy)
    quit

    // REPLACE WITH

    if .request == (from Blingy)
    continue

    fix’d?

  25. 1325
    Analrapist Says:

    That other post is just a decoy folks, don’t get distracted, there’s still many comments left before the goal has been reached here!

  26. 1326
    Hedge Says:

    Get the shirt! “Blingy Spring Break 2008” 100% cotton, Silkscreen.
    I simply must move these shirts! We are completely out of space for any more shirts! (I knew we were running out of space, but it kinda got away from me. Space is very tight; we are now in the position where we have to sell TWO shirts for every new shirt we print.) These must ship out quickly! Get yours today!!

    http://www.wholewheat.com/bm/T-Shirts/

    Thanks,

    Hedge

    PS. If the site is slow to load, please try again a little later. : )

  27. 1327
    Benito Goebbels-Wrottesley Says:

    Id post faster but i has been BLOCEKD!!!! FAIL

  28. 1328
    Hedge Says:

    Get the Shirt: “BLINGY IT’S A CLUSTER”

    Ever wonder why they call a group of servers a “Cluster”?

    100% Cotton. Silkscreen.

    I simply must move these shirts! We are completely out of space for any more shirts! (I knew we were running out of space, but it kinda got away from me. Space is very tight; we are now in the position where we have to sell TWO shirts for every new shirt we print.) These must ship out quickly! Get yours today!!

    http://www.wholewheat.com/bm/T-Shirts/blingy—its-a-cluster.shtml

    Thanks,

    Hedge

    PS. If the site is slow to load, please try again a little later. : )

  29. 1329
    Kei Says:

    http://www.dreamhoststatus.com/2008/04/06/30-min-blingy-downtime-tonight/

    For you trolls. Don’t feed them, for they are evil and ignorant.

    And obviously (to the person who stated that they can tie their shoes) you can’t, because you are still feeding the trolls.

  30. 1330
    Heartbroken Says:

    http://wwwfail.com/?url=www.wholewheat.com%2Fbm%2FT-Shirts%2Fblingy—its-a-cluster.shtml

    Needs moar Rick Astley

  31. 1331
    RighteousIndignaton Says:

    In honor of this event, and to foster a festive atmosphere while the world waits anxiously for Blingy’s resurrection, RightousIndignation and http://fuckdreamhost.com will begin hosting a “Raising Blingy From The Dead” marathon celebration beginning now and lasting through the night!

    Come party with other believers, skeptics, heretics,lunatics, counts, viscounts, no accounts, poor dumb bastards that can’t count, crusaders, persuaders, dissuaders, ranters, lovers, haters, sycophants, fanboys, twits, toads, tools, trolls, saints, sinners, gays, straights, lesbians, runts, cunts, fuckers, fuckees, fuckups, the fucked over, the fucked up, lions, and tigers, and bears, and goats, and other denizens of the internet too numerous to describe here .. Good Golly Miss Molly everyone will be there!

    Tonight! http://fuckdreamhost.com - “Be there or be square!” :-)

  32. 1332
    ROFLCOPTER Says:

    ARE OWE EFF ELL 3 exclamation points.

  33. 1333
    backseat kids Says:

    Are we fucking there yet?

  34. 1334
    ROFLCOPTER Says:

    EL EM AY OH !

  35. 1335
    DreamHost Sucks Says:

    I moved to another web hsot because DREAMHOST SUCKS. Give me back my money!!!!

  36. 1336
    Mikko Says:

    Changed to another host and got money back. Was still in 97-days.

  37. 1337
    ROFLCOPTER Says:

    Man, I wish I had a life, and a brain, that would really help

  38. 1338
    no Says:

    We has some interwebs yet?

  39. 1339
    teh_1337 Says:

    1337 GET

  40. 1340
    The other Bob Says:

    Severity: High

    Resolved: No

  41. 1341
    The other Bob Says:

    …just in case you forgot.

  42. 1342
    ROFLCOPTER Says:

    This is dreamhost, if you want webhosting, you have to move to another provider, duh!!

  43. 1343
    Longcat Himmler-Walpole Says:

    I got some customer designed blingy bling-bling at the link! Is blingy amulet made from extinct rhinoceros horns and elephant tusks with some patches of tiger fur. You can purchase at the link!

  44. 1344
    DreamRapeHost Says:

    RFLMAO damn Spammers :D

  45. 1345
    Steve Says:

    Still slower than ass for dynamic content. Static is decent.

  46. 1346
    furibondox Says:

    Terrible situation again… it’s from Genuary that all services are extremally slow and often down…

    Why you are not able to fix this BIG problem????

    I’m really thinking to change hosting….

    It’s possible to know WHEN this issue will be fix??

  47. 1347
    Arash and Kelly Says:

    Now for a commercial break!

    http://www.ted.com/talks/view/id/229

  48. 1348
    John Dickerson Says:

    John or to whom it may concern.

    I contracted for your service in good faith assuming
    that your company was what you represented in your
    statements.

    We can not even upload pages to the site. it takes a
    long time to load pages or not at all when
    http://www.HeatherGates.com site is visited. I have dozens of
    potential customers complaining about this slow or no
    service and losing customers because of it.

    I know you have blingy problems and gremlins or
    whatever you say. I need reliable service to this
    site. Can you provide it or not? If not please refer
    my account to support to refund my money.

  49. 1349
    Analrapist Says:

    If not please refer
    my account to support to refund my money.

    You might notice that in really big writing at the top of the page, it says:

    Please remember, posting in the comments here IS NOT an official way to contact DreamHost.

    There’s a link in your control panel to contact support, nobody else has to do it for you.

  50. 1350
    John Dickerson is a scumbag Says:

    John Dickerson is just a spamming piece of shit that knows that skank’s porn site is loading just fine.
