Why is the site always broken and saying there's SQL errors?
Why SQL error
Crossbones+ - Reputation: 5899
Posted 29 April 2013 - 05:47 AM
There was a more recent discussion on the topic, but I can't seem to locate it now. But yes, the site is down for roughly 15-30 minutes every day; when exactly depends on your time zone (for me it's around midnight).
Senior Staff - Reputation: 24459
Posted 29 April 2013 - 07:08 AM
We are investigating, but so far we've been unable to find the cause of this one -- seems to be some regular task carried out by our provider, but so far it's been non-obvious.
- Jason Astle-Adams.
Staff Emeritus - Reputation: 5668
Posted 29 April 2013 - 07:40 AM
Here is a brief synopsis of what happens:
There is a period of about 5-10 minutes where our file system activity suddenly spikes and we see a HUGE IO load that pretty much brings most of the system to a halt. During this time the SQL server cannot answer all queries and ends up with about 250 queries backed up and waiting to be answered. For a brief time the IO fluctuates up and down enough to make it possible to answer some of the queries, but very few of them. The HTTP part of the request is answered sometimes by our reverse proxy (that's the part that says the "Site is down") and sometimes by our site software (that's the part that says SQL Error). When it's our site software, the reason for the SQL Error is "Too Many Connections". Largely the effect is caused by the reverse proxy cache being invalidated because the site has been down more than a few minutes, the queries getting backed up in the SQL server, and the sheer number of people hitting the "WTF is going on.. refresh this shit.. damn why isn't it working?!" button. That last button slams our server because so many of the queries used to build the page are wasted: the page request gets abandoned before the result is ever used.
TL;DR - It's caused by an as-of-yet unknown high disk IO problem (which we think may be our ISP backing up the server, to be honest), made worse by people rapidly refreshing their browsers in an attempt to help us fix the problem (which also overloads the server with too many database queries too quickly).
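For anyone who wants to see why a short IO stall turns into "Too Many Connections", here is a rough toy model of the backlog described above. It is only a sketch: the function name, the per-tick numbers, and the idea that the connection limit is 250 are all assumptions made for illustration, not our actual server configuration.

```python
def simulate_backlog(arrivals_per_tick, served_per_tick, max_connections):
    """Toy model of queries piling up while disk IO is stalled.

    All parameters are hypothetical. Each tick, new queries try to take a
    connection slot; those that find none are refused (the "Too Many
    Connections" case). The server then answers as many waiting queries
    as its capacity for that tick allows.
    """
    backlog = 0      # queries currently holding a connection and waiting
    rejected = 0     # queries refused because every slot was taken
    history = []     # backlog size at the end of each tick
    for arriving, capacity in zip(arrivals_per_tick, served_per_tick):
        free_slots = max_connections - backlog
        accepted = min(arriving, free_slots)
        rejected += arriving - accepted
        backlog += accepted
        backlog -= min(backlog, capacity)
        history.append(backlog)
    return history, rejected


# Steady traffic of 100 queries per tick; capacity drops to 10 for three
# ticks to stand in for the IO spike, then recovers.
history, rejected = simulate_backlog(
    arrivals_per_tick=[100] * 6,
    served_per_tick=[100, 100, 10, 10, 10, 100],
    max_connections=250,
)
print(history)   # [0, 0, 90, 180, 240, 150]
print(rejected)  # 120
```

Note that in this model the backlog is still large after capacity recovers, and every extra refresh raises the arrival numbers, which is why hammering the refresh button keeps the errors going longer.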
Moderators - Reputation: 12020
Posted 29 April 2013 - 08:10 AM