Why is the site always broken and saying there are SQL errors?
Crossbones+ - Reputation: 3886
Posted 29 April 2013 - 05:47 AM
There was a more recent discussion on this topic, but I can't seem to locate it now. But yes: depending on your time zone (for me it's midnight), the site is down for roughly 15-30 minutes every day.
Senior Staff - Reputation: 18577
Posted 29 April 2013 - 07:08 AM
We are investigating, but so far we've been unable to find the cause of this one. It seems to be some regular task carried out by our provider, but exactly which task has been non-obvious.
- Jason Astle-Adams.
Senior Staff - Reputation: 5208
Posted 29 April 2013 - 07:40 AM
Here is a brief synopsis of what happens:
There is a period of about 5-10 minutes where our file system activity suddenly spikes and we see a HUGE IO load that brings most of the system to a halt. During this time the SQL server cannot answer all queries, and about 250 queries get backed up waiting to be answered. For a brief time the IO fluctuates up and down enough to make it possible to answer some of the queries, but very few of them.
What happens is that the HTTP part of the request is answered sometimes by our reverse proxy (that's the part that says "Site is down") and sometimes by our site software (that's the part that says SQL Error). When it's our site software, the reason for the SQL Error is "Too Many Connections".
Largely the effect is caused by three things: the reverse proxy cache being invalidated once the site has been down for more than a few minutes, the queries getting backed up in the SQL server, and the sheer number of people hitting the "WTF is going on.. refresh this shit.. damn why isn't it working?!" button. That last one slams our server, because many of the queries run to build a page are wasted when the page request gets abandoned.
TL;DR - It's caused by an as-yet-unknown high disk IO problem (which, to be honest, we think may be our ISP backing up the server), made worse by a period of people rapidly refreshing their browsers in an attempt to help us fix the problem, which overloads the server with too many database queries too quickly.
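For anyone who wants to see why a short IO stall turns into "Too Many Connections", here is a rough toy model of the backlog described above. It is only a sketch: the function name `simulate`, the per-tick arrival and capacity numbers, and the connection cap of 250 (picked to match the backlog figure mentioned above) are all made-up assumptions, not our actual configuration or measurements.

```python
def simulate(arrivals_per_tick, capacity_per_tick, max_connections):
    """Toy model of a database with a hard connection limit.

    Returns one (backlog, rejected) pair per tick. All inputs are
    hypothetical illustration values, not real site measurements.
    """
    backlog = 0
    history = []
    for arrivals, capacity in zip(arrivals_per_tick, capacity_per_tick):
        # New queries take a connection slot if one is free; the rest
        # are refused, which users see as "Too Many Connections".
        free_slots = max_connections - backlog
        accepted = min(arrivals, free_slots)
        rejected = arrivals - accepted
        backlog += accepted
        # The server answers only as many queries as disk IO allows.
        served = min(backlog, capacity)
        backlog -= served
        history.append((backlog, rejected))
    return history


# Two normal ticks, three ticks of IO stall with extra refresh traffic,
# then two ticks of recovery (assumed numbers).
arrivals = [100, 100, 150, 150, 150, 100, 100]
capacity = [120, 120, 10, 10, 10, 120, 120]
history = simulate(arrivals, capacity, max_connections=250)

for tick, (backlog, rejected) in enumerate(history, start=1):
    print(f"tick {tick}: backlog={backlog} rejected={rejected}")
```

In this toy run the errors keep going for a tick after the disk recovers, because the queue still has to drain while new requests keep arriving. Extra refreshes during the stall only add to the arrivals, so they fill the connection slots sooner.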
Moderators - Reputation: 8491
Posted 29 April 2013 - 08:10 AM
Crossbones+ - Reputation: 8870
Posted 29 April 2013 - 09:17 AM
~ 4:30pm here.
The slowsort algorithm is a perfect illustration of the multiply and surrender paradigm, which is perhaps the single most important paradigm in the development of reluctant algorithms. The basic multiply and surrender strategy consists in replacing the problem at hand by two or more subproblems, each slightly simpler than the original, and continue multiplying subproblems and subsubproblems recursively in this fashion as long as possible. At some point the subproblems will all become so simple that their solution can no longer be postponed, and we will have to surrender. Experience shows that, in most cases, by the time this point is reached the total work will be substantially higher than what could have been wasted by a more direct approach.
- Pessimal Algorithms and Simplexity Analysis