Server Performance Test

2 comments, last by hplus0603 14 years, 8 months ago
I'm working on a Battle.Net-like system, so we are talking low traffic but a lot of potential users. It of course depends on success, but I'd like the server to be able to handle 100k users if possible. Our plan was to use a single server for this, with the server and the account database on the same machine to reduce the load on that connection.

My estimate is that a user performs an action that results in a query roughly once every 10 seconds. At 10k users that comes to about 1k packets/queries per second; at 100k users, around 10k per second.

My question is whether anyone has pointers, or tips on resources/links/books, for setting up a good stress test for this. I was thinking about making an app that runs thousands of user bots that try to connect/create accounts/log in/chat. Receiving/sending 20k packets per second seems to be no problem, and I think bandwidth should be a non-issue in this case, as should processing power. However, it feels like nothing can replace a live test. Does anyone know the most common pitfalls in systems like this? Do you think processing power could be an issue?
// The only limit is the endless possibilities
The eDonkey server project (lugdunum?) discussed various gotchas about scaling into the tens, and at peak hundreds, of thousands of clients, and that was 5 or more years ago. But that project has been abandoned, and the pages seem to be gone as well.


The by far simplest thing you can do is use UDP, design your protocol as stateless (one query packet, one response packet), and, if possible, keep the database immutable and replicated in memory. If the database does change, try to keep the changes in batches. Perhaps reload it completely once an hour.

With such a design, the number of users will scale indefinitely, and you can add more servers at will.
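To make that concrete, here is a minimal sketch of such a stateless request/response loop, using POSIX sockets for brevity (the port and the echo-style handling are placeholders for illustration, not the actual protocol):

```cpp
// Minimal stateless UDP request/response loop (POSIX sockets).
// Each datagram is handled independently -- no per-client state.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdio>

int main() {
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = INADDR_ANY;
    addr.sin_port = htons(9000);          // hypothetical port
    bind(sock, (sockaddr*)&addr, sizeof(addr));

    char buf[512];
    for (;;) {
        sockaddr_in from{};
        socklen_t fromLen = sizeof(from);
        ssize_t n = recvfrom(sock, buf, sizeof(buf), 0,
                             (sockaddr*)&from, &fromLen);
        if (n <= 0) continue;
        // Look the answer up in the read-only in-memory snapshot and
        // send exactly one response datagram back. (Query parsing and
        // the snapshot lookup are omitted; echo stands in for them.)
        sendto(sock, buf, n, 0, (sockaddr*)&from, fromLen);
    }
    close(sock);
}
```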

Quote:My question is if anyone has any pointers, or tips of resources/links/books, on how to set up a good stress test for this. I was thinking about making an app that handles thousands of user-bots that tries to connect/create accounts/login/chat.


For anything large enough, nothing will replace the real world. There are stress-test tools for web-related protocols, but I don't believe there are any for custom ones, outside of various virtual Internet simulators, and those serve a different purpose.

If your protocol is stateless, then simulating it is much simpler. Design bots to send queries with semi-random pauses (vary the delays by several percent or more, to avoid certain nuances of LAN timing). Check the number of successful and correct replies, as well as the response time. The bots can then be run on multiple machines until they overload the server.
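As a sketch of what one such bot might look like, assuming the same one-query/one-reply protocol as above (the address, port, query packet, iteration count, and jitter range are all made-up placeholders):

```cpp
// One stress-test bot: sends a query, waits for the reply, records
// latency and success, then pauses for a jittered interval so that
// bots on a LAN don't fall into lockstep.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>
#include <chrono>
#include <cstdio>
#include <random>
#include <thread>

int main() {
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    timeval tv{1, 0};                     // 1-second receive timeout
    setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

    sockaddr_in server{};
    server.sin_family = AF_INET;
    server.sin_port = htons(9000);                      // hypothetical port
    inet_pton(AF_INET, "127.0.0.1", &server.sin_addr);  // hypothetical address

    std::mt19937 rng(std::random_device{}());
    // ~10 s between actions, varied +/-20% to de-synchronize the bots.
    std::uniform_real_distribution<double> pause(8.0, 12.0);

    long ok = 0, failed = 0;
    char buf[512];
    for (int i = 0; i < 100; ++i) {
        const char query[] = "PING";      // placeholder query packet
        auto start = std::chrono::steady_clock::now();
        sendto(sock, query, sizeof(query), 0,
               (const sockaddr*)&server, sizeof(server));
        ssize_t n = recv(sock, buf, sizeof(buf), 0);    // blocks up to 1 s
        double ms = std::chrono::duration<double, std::milli>(
                        std::chrono::steady_clock::now() - start).count();
        if (n > 0) { ++ok; std::printf("reply in %.1f ms\n", ms); }
        else       { ++failed; }                        // timeout or error
        std::this_thread::sleep_for(std::chrono::duration<double>(pause(rng)));
    }
    std::printf("ok=%ld failed=%ld\n", ok, failed);
    close(sock);
}
```

Run a few thousand of these per machine, spread over several machines, and watch the ok/failed counts and the latency distribution as you scale the bot count up.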

But the key, for me, would be to keep the protocol stateless and the server's state immutable. That alone eliminates many potential problems, and it falls nicely in line with IOCP and related asynchronous networking approaches.
Thanks for the reply :)

I'm using UDP, and basically each query to the server gets a simple reply, like you said, so I would assume it is stateless in that sense. I do verify whether the user is logged in or not and handle those requests accordingly. But what would a non-stateless protocol look like? :)

And about making the server state immutable, can you give an example? It seems like a good idea if it's possible. However, we have quite a few queries to the database, for creating accounts/avatars/teams/rankings/friends lists. I do think it would be wise to batch some of those writes in memory and save them periodically (like you said, once an hour), at the cost that if the server goes down, some requests will be lost.
// The only limit is the endless possibilities
The way you keep the database cache largely immutable is to keep two sets of data that you query:

1) All the data that you last loaded from the database.
2) A log of all changes since then.

When you query, you run the query first on 1), and then rip through 2) to extract whatever additional data you need.
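A sketch of that two-set lookup, using a hypothetical string-to-int table (say, player ratings) as the cached data:

```cpp
// Two-set cache: an immutable snapshot plus a log of changes made
// since the last reload. Reads consult the snapshot, then replay
// the log to pick up anything newer.
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

struct Change {                // one logged write
    uint64_t txn;              // transaction id from the database
    std::string key;
    int value;
};

struct Cache {
    std::unordered_map<std::string, int> snapshot;  // set 1: last full load
    std::vector<Change> log;                        // set 2: changes since

    std::optional<int> lookup(const std::string& key) const {
        std::optional<int> result;
        auto it = snapshot.find(key);
        if (it != snapshot.end()) result = it->second;
        // Later log entries override the snapshot (and earlier entries).
        for (const Change& c : log)
            if (c.key == key) result = c.value;
        return result;
    }
};
```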

Writes are queued to the database immediately, but not every process reads the database back directly. Instead, the processes receive copies of the write requests, and those go into their own logs. Every once in a while, each process re-loads the database from scratch and evicts its write log, or rather, evicts any write-log entry that it knows is part of the database re-load. A time-stamped or transaction-id-ed database is convenient for constructing that. Often, that versioning data is only available to the low-level database implementation, so this cache/log set-up is typically implemented inside a database replication process, rather than in the client of the database.
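Continuing the sketch above, the periodic re-load and log eviction might look like this, assuming the database reports the last transaction id included in the fresh load:

```cpp
// Periodic reload: replace the snapshot wholesale, then evict every
// log entry that the fresh snapshot already reflects.
#include <algorithm>

void reload(Cache& cache,
            std::unordered_map<std::string, int> freshSnapshot,
            uint64_t lastTxnInSnapshot) {
    cache.snapshot = std::move(freshSnapshot);
    // Keep only writes newer than the snapshot; anything older is
    // already part of the reloaded data.
    cache.log.erase(
        std::remove_if(cache.log.begin(), cache.log.end(),
                       [&](const Change& c) { return c.txn <= lastTxnInSnapshot; }),
        cache.log.end());
}
```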

You may think there is a small window of de-sync: a write has been queued to the database and sent to a query server, but the query server hasn't yet applied it to its write log. A query handled at that point won't have the updated data in its response. However, that query might just as well have arrived a millisecond sooner, before the log update was sent, so as long as updates are seen in order, the system is consistent.
enum Bool { True, False, FileNotFound };

This topic is closed to new replies.
