Fileserver switch troubles (Resolved!)
One of the switches that connects some of our fileservers is experiencing unexplained high CPU load. This is causing slower-than-normal loading of some sites. We're looking into the issue and hope to have it fixed shortly. Updates to follow.
–
We think we have this one resolved. It seems the CPU load was simply too high, which caused the switch to hit watchdog timeouts and reboot. This was affecting about 8-9 file servers. We installed another switch to handle the extra load, and after watching it for half an hour we saw no problems. We will continue to monitor it throughout the night to make sure the problem is completely resolved.
October 19th, 2006 at 4:53 pm
How come you guys never post the actual server the problem is on? It would be a lot more helpful; that way we would know whether this is the problem or something else is causing it. My site is definitely being affected, and I doubt it's just because there's more traffic today.
October 19th, 2006 at 4:56 pm
Providing the name of the fileserver won't help much: the impact would be seen on multiple webservers that use the fileserver for storage.
October 19th, 2006 at 5:00 pm
Read the words, “some of…” More than one. You having a problem? Then you’re probably “one of some.” Not having a problem, you’re okay. Want to know for sure if you’re having a problem? Can’t access your site? You’re having a problem. Can reach your site? You’re not.
See, the problem isn’t whether or not you’re having a problem. The problem is what appears to be a cascade of problems which have now become quite problematic for everyone.
October 19th, 2006 at 5:19 pm
Yup, straight up and down, y'all.
October 19th, 2006 at 6:47 pm
My site http://www.www.erotofun.com was just down with this error:
“Warning: mysql_connect() [function.mysql-connect]: Lost connection to MySQL server during query”
I checked the system-wide status for one of the databases I use and got:
“Verified mysql outage: xxx [stop tracking : 1 min 38 secs ago: Outage verified: We are actively looking into resolving it.”
October 19th, 2006 at 7:58 pm
Perhaps not resolved entirely: me ol' websites are slow, slow, slow right now.
October 19th, 2006 at 11:03 pm
I am still seeing quite a few VERY long delays…not convinced you’ve nailed the issues.
October 20th, 2006 at 2:01 am
Agreed: very long delays, connection issues, timeouts, etc. Whatever the problem is, ya still got it.
October 20th, 2006 at 7:55 am
And it’s still a problem …
October 20th, 2006 at 8:52 am
I also have very slow performance.
Damn, I have to finish this project this weekend.
October 20th, 2006 at 12:02 pm
It isn't just slow; I can't get to the web page.
October 20th, 2006 at 12:15 pm
It would be useful to know which web servers depend on the file servers that are down. That way we would know whether we are submitting a duplicate site outage report. Right now my webserver on Cletus is not serving even static pages, so I'm wondering whether the problem is related to this switch issue. Oh, well. It's been down for almost half an hour since I noticed.
Seth
October 21st, 2006 at 3:00 am
Is this still happening?
I'm having problems uploading via FTP, and even the web FTP is throwing this message:
Warning: ftp_put(): Transfer aborted. No space left on device in /usr/local/ndn/web/webftp/includes/filesystem.inc.php on line 1145
October 21st, 2006 at 6:56 am
Sounds like it's out of disk space. Report it, but somewhere other than here, because none of us can fix it for you.
October 21st, 2006 at 7:37 am
I have the same problem.
October 21st, 2006 at 9:11 am
I hear ya, Pixelman. I have two critical web projects to complete by Nov 1, and this is the SECOND weekend in a row my servers have had problems. I should send a bill to Dreamhost!
October 21st, 2006 at 10:46 am
Same here, my sites are still intermittently down…
October 21st, 2006 at 1:09 pm
Mark, last time that happened to me, I got this response:
“Your file server ran out of file handles, which causes problems about the same as running out of space. I’ve fixed it. I have no idea why this was not reported by our monitoring systems, but I’m investigating right now, and will keep an eye on your file server to make sure this doesn’t happen again.”
October 21st, 2006 at 2:03 pm
Pages have been loading slowly or failing completely for the last 12 hours. It doesn't look like the issue is actually resolved.
October 21st, 2006 at 2:05 pm
Also getting severe delays still (22:00 GMT).
October 21st, 2006 at 9:15 pm
I've found a workaround for this issue: set up a new Unix user, change the hosting of your website to that user, and re-upload the website under the new user's login. I was having the same problem with all my sites (disk full, 550 FTP errors), and they were all under the same Unix user. After changing the hosting to a new, unique username, I can write to the drive again and the sites are back up and running (well, not yet: I have over 20 gigs of files to move up and down and 15 websites to transfer).
Hope that helps.
October 21st, 2006 at 9:43 pm
You can find out what fileserver you’re on by logging in to a shell account and typing ls -ld ~
You’ll see a result along the lines of:
lrwxrwxrwx 1 root staff 23 2006-10-13 19:23 /home/username -> .fileserver/username
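If you'd rather script that check, here is a minimal Python sketch. It assumes, based only on the example output above, that your home directory is a symlink whose target's dot-prefixed directory (the ".fileserver" part) is named after your fileserver. The function names are hypothetical, and the script only reads the local symlink; it does not query DreamHost in any way.

```python
import os
import posixpath
from typing import Optional


def fileserver_from_target(target: str) -> str:
    """Extract the fileserver name from a home-directory symlink target.

    Assumption: the target looks like '.fileserver/username' (or an
    absolute path ending in '/.fileserver/username'), so the directory
    just above the username is the fileserver, prefixed with a dot.
    """
    parent = posixpath.dirname(target.rstrip("/"))  # e.g. '.fileserver'
    return posixpath.basename(parent).lstrip(".")   # drop the leading dot


def current_fileserver() -> Optional[str]:
    """Return the fileserver for the current user, or None if ~ is not a symlink."""
    home = os.path.expanduser("~")
    if not os.path.islink(home):
        return None
    return fileserver_from_target(os.readlink(home))


if __name__ == "__main__":
    print(current_fileserver() or "home directory is not a symlink")
```

Parsing the symlink target rather than the full ls output keeps the sketch independent of ls column layout, which can differ between systems.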
October 24th, 2006 at 4:41 am
You guys have had a hell of a time over the last few months, but whatever you've done recently, my web panel is now as zippppy as ever. Keep up the great work!!
October 26th, 2006 at 12:21 am
Yet to explore the new fileserver!!