[web] Load balancing

Started by
6 comments, last by kwackers 17 years, 7 months ago
I'm running a website on a LAMP setup with separate web and DB servers. It's handling the load nicely, but usage is steadily increasing, so I want to develop a plan for expanding onto more servers. I'm fine on the DB side, but I haven't had any experience with load balancing. I've done a little research but haven't found anything too useful. Has anyone here had experience with load balancing? If so, what technologies did you use, and what kind of issues did you face?
One of the first things I'd look into is how your site is being used. If you get a lot of visitors who aren't logged in or don't do much that's interactive, it might be worth looking into a Squid proxy instead of a load balancer.
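For anonymous page views, Squid can sit in front of the web server in reverse-proxy ("accelerator") mode and serve cached copies directly. A minimal squid.conf sketch using Squid 2.6-style accelerator syntax; the hostname and backend address are placeholders, not details from this thread:

```
# Accept requests on port 80 in accelerator (reverse proxy) mode
http_port 80 accel defaultsite=www.example.com

# Forward cache misses to the origin web server
cache_peer 192.168.0.10 parent 80 0 no-query originserver name=webserver

# Only proxy for our own site
acl our_site dstdomain www.example.com
http_access allow our_site

# On-disk cache: 1024 MB in /var/spool/squid
cache_dir ufs /var/spool/squid 1024 16 256
```

Dynamic pages still need correct `Cache-Control`/`Expires` headers from the application, or Squid will either cache stale content or cache nothing at all.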

Wikipedia has a nice description of their setup, which uses both Squid proxies and load balancing. Read and learn :-)

<hr />
Sander Marechal<small>[Lone Wolves][Hearts for GNOME][E-mail][Forum FAQ]</small>

If you want to use multiple servers to achieve higher performance, you MAY find that it's better to run a local database on your app servers, and cluster the database rather than running separate app and database servers.

This is because web pages typically run a large number of very simple queries, so the network round trip for each query can take longer than the query itself.
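The effect of per-query round trips is easy to see with some back-of-the-envelope arithmetic. The numbers below are illustrative assumptions, not measurements from this thread:

```python
# Toy model: a page that runs many simple queries, where each query's
# total cost is execution time plus one network round trip.
queries_per_page = 40
exec_ms = 0.2          # time to actually execute one simple query
local_rtt_ms = 0.05    # round trip over a unix socket / loopback
lan_rtt_ms = 0.5       # round trip over the LAN to a separate DB box

local_total = queries_per_page * (exec_ms + local_rtt_ms)
lan_total = queries_per_page * (exec_ms + lan_rtt_ms)
print(f"local DB: {local_total:.1f} ms, remote DB: {lan_total:.1f} ms")
```

With these numbers the remote setup spends nearly three times as long per page, even though the queries themselves are identical; the extra cost is pure network latency multiplied by the query count.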

I haven't tried the clustering myself, but I know that running even a simple application is vastly slower when the DB and app are on different boxes.

Whatever environment you choose, be sure to test it well.

If at all possible, use an environment that you can test with cheap commodity hardware, it will make things so much simpler.

If you simply want high performance with no redundancy, then the easiest way to make it very fast is to buy the fastest box you can with the most RAM, the best discs etc, and run everything off one machine (probably dual or quad processor, no more!).

However, this is not a scalable solution as once you reach the maximum size box you can get, it hits the ceiling.

Mark
Quote:Original post by Sander
One of the first things I'd look into is how your site is being used. If you get a lot of people that aren't logged in or don't do much interactive things then it might be worth looking into a Squid proxy instead of a load balancer.


Most content will be dynamically generated, so a Squid proxy setup will not be especially useful. Thanks for the Wikipedia links, I'd seen diagrams of their setup before, but not the page you linked to.

Quote:markr
If you want to use multiple servers to achieve higher performance, you MAY find that it's better to run a local database on your app servers, and cluster the database rather than running separate app and database servers.


I haven't really heard of this being used before. To the best of my knowledge, MySQL isn't especially good at clustering; most solutions I have seen just use a master-slave setup. I can imagine you would run into issues with database size if each server needed a full copy. If the database were distributed instead, you would still take a hit from network traffic.

My current setup has separate database and application servers, and the slowdown is not especially noticeable.

Quote:
If you simply want high performance with no redundancy, then the easiest way to make it very fast is to buy the fastest box you can with the most RAM, the best discs etc, and run everything off one machine (probably dual or quad processor, no more!).

However, this is not a scalable solution as once you reach the maximum size box you can get, it hits the ceiling.


That's pretty much the situation I am in now. I'm not expecting to hit the limits for a while, but I want to make sure that when I do, I am prepared.

I'll take a look at the Wikimedia page (I've read through the LiveJournal description of their network too, which was pretty interesting). I also located a few apps that seem to do what I want (Ultra Monkey, Pound, Pen, Inlab Balance and Apache's mod_proxy_balancer), so I'll try to find the one that best suits my needs.
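Of the options listed, mod_proxy_balancer is probably the quickest to try since it runs inside Apache itself. A sketch of what the config looks like (Apache 2.2+); the backend addresses are placeholders:

```apache
# Define a pool of backend web servers
<Proxy balancer://webcluster>
    BalancerMember http://192.168.0.11:80
    BalancerMember http://192.168.0.12:80
</Proxy>

# Distribute requests across the pool, round-robin by request count
ProxyPass / balancer://webcluster/ lbmethod=byrequests
ProxyPassReverse / balancer://webcluster/
```

One thing to sort out with any of these balancers is session affinity: PHP sessions stored on local disk won't follow a user who lands on a different backend, so sessions need to move into the database or memcached, or the balancer needs sticky sessions.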
For read-heavy sites, using MySQL's Master/Slave setup works nicely. It's pretty easy to get up and running and you will notice the improvement (assuming the database is the bottleneck). Note that it's not a perfect solution because at some point, the slave databases will spend all their time writing to keep up. But it is an easy first step that might buy you quite a bit of time.
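The application side of a master/slave setup usually comes down to routing writes to the master and spreading reads across the slaves. A hedged sketch of that routing logic; the connection objects here are placeholders (in a real LAMP app they would be MySQL connections), and the verb list is illustrative:

```python
import random

class ReadWriteRouter:
    """Route write statements to the master, reads to a random slave."""

    WRITE_VERBS = ("insert", "update", "delete", "replace")

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = slaves

    def pick(self, sql):
        # First word of the statement decides where it goes
        verb = sql.lstrip().split(None, 1)[0].lower()
        if verb in self.WRITE_VERBS:
            return self.master
        return random.choice(self.slaves)

router = ReadWriteRouter("master", ["slave1", "slave2"])
print(router.pick("UPDATE users SET name='x' WHERE id=1"))  # master
```

One caveat: because replication is asynchronous, a read issued immediately after a write may hit a slave that hasn't caught up yet, so read-after-write paths often need to be pinned to the master.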

Also, make sure your application is well tuned. Enable slow query logging and no-index logging. There's no reason why any real time query should take more than a split second.
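Turning both of those on is a few lines in my.cnf. A sketch using the option names from MySQL 4.x/5.0-era configs; the path and threshold are illustrative:

```ini
[mysqld]
# Log any query slower than 1 second
log-slow-queries = /var/log/mysql/slow.log
long_query_time  = 1

# Also log queries that do a full scan instead of using an index
log-queries-not-using-indexes
```

The resulting log can be summarized with the `mysqldumpslow` tool that ships with MySQL, which groups similar queries so the worst offenders stand out.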
Quote:
To the best of my knowledge MySQL isn't very especially good at clustering; most solutions I have seen just use a master-slave setup.


I take it you haven't heard of the MySQL NDB shared-nothing clustering mode then? It stores the database in memory on each data node, and they can be connected via normal commodity ethernet and provide redundancy.

I've not used it myself but I know people who have.

Of course you need a lot of RAM, but it's still exceptionally cheap compared with something that requires vastly expensive SAN hardware, such as an MSSQL cluster.

Mark
Quote:Original post by kwackers
I've read through the Live Journal description of their network too, which was pretty interesting


Got linkage? I was actually looking for that one but couldn't find it. I found it a very interesting piece as well.

<hr />
Sander Marechal<small>[Lone Wolves][Hearts for GNOME][E-mail][Forum FAQ]</small>

Quote:Original post by markr
I take it you haven't heard of the MySQL NDB shared-nothing clustering mode then? It stores the database in memory on each data node, and they can be connected via normal commodity ethernet and provide redundancy.


I recall hearing about it. I've heard it accused of not being "true" clustering because it requires enough memory to keep the entire database in RAM. If that's the case, I'd want to avoid it, as I'm currently planning for a worst-case scenario.

The database is mostly reads, and contains a lot of user-created data. I have a custom template caching system for common pages which reduces a lot of database hits. I also plan on looking into memcached to reduce it even further.
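The usual way memcached cuts database hits is the cache-aside pattern: check the cache, and only on a miss query the database and store the result. A minimal sketch; a dict stands in for the memcached client here (real clients like python-memcached expose the same get/set shape), and the key scheme is illustrative:

```python
class CacheAside:
    """Cache-aside lookup: serve from cache, fall back to the renderer."""

    def __init__(self, cache):
        self.cache = cache  # stand-in for a memcached client

    def get_page(self, key, render):
        page = self.cache.get(key)
        if page is None:
            # Miss: do the expensive DB work, then cache the result
            page = render()
            self.cache[key] = page
        return page

calls = []
def render_profile():
    calls.append(1)          # stands in for the database queries
    return "<html>profile</html>"

pages = CacheAside({})
pages.get_page("user:42:profile", render_profile)  # miss: renders
pages.get_page("user:42:profile", render_profile)  # hit: cached
```

The hard part in practice is invalidation: whenever the user-created data changes, the corresponding keys have to be deleted so the next read repopulates them, otherwise stale pages get served until the entries expire.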

This topic is closed to new replies.
