Data compression/optimization strategies

Started by
8 comments, last by wodinoneeye 10 years, 8 months ago

For games where data throughput and size is an issue, what do most people use for compression? Is it generally worthwhile using different strategies for say floating point data versus other stuff, or do most just compress everything with something like zlib and call it a day?

The specific case I have is an mmo client/server. I'm working on the scale of say something like Guild wars II or Eve online as far as requirements. Right now I'm able to track several thousand objects on the server side and do efficient neighbor queries on them, it's bandwidth that's becoming an issue. Sending the coordinates of 1000 players to the client is a lot of data, and that's just movement, I haven't even gotten into combat and other stuff that will bump it up even more.

So far this is what I'm doing to optimize location tracking:

- Use integers instead of floats. I don't generally need the precision for x,y movement coords.

- Send local grid coordinates not world coordinates.

- base 62 encoding of string id's

- don't send data if it hasn't changed since last update

- partial coordinates. Only send coordinate values that have changed by a certain threshold. So a client might get a vector with x being empty, which means use the last known value.

If anyone has ideas on best compression algorithms, or other tricks to keep bandwidth usage down, let me know!

Chris

Advertisement

- Use integers instead of floats. I don't generally need the precision for x,y movement coords.

Both are 32 bits, unless there is something you are not telling us.

- Send local grid coordinates not world coordinates.

How is that smaller?

- base 62 encoding of string id's

Why base 62? Your goal is to reduce the number of bits sent, that would increase the number.

- don't send data if it hasn't changed since last update

Excellent.

- partial coordinates. Only send coordinate values that have changed by a certain threshold. So a client might get a vector with x being empty, which means use the last known value.

Excellent.

If anyone has ideas on best compression algorithms, or other tricks to keep bandwidth usage down, let me know!

Only two of those items above actually reduces the number of bits. Do more things that reduce the number of bits.

When you encode and decode your larger packets you should generally have some form of bit packer. If there are only 3 options don't add a full int (32 bits) to the message, you can represent it with just 2 instead of 32. If your integer has a small numeric range, reduce the number of bits to those that contain the range. If your number can be reasonably represented by a 16-bit float, use that instead of a 32-bit float. Delta-encoding is another way to reduce the number of bits; if you know something can only have moved up to /n/ distance in the most extreme case, subtract the new value and store only the necessary log2(n) bits rather than 32. Repeat the process of stripping out unused bits for each of the messages that you need to send.

If your messages are STILL too large, use a compression algorithm. For TCP you can use a stream-based general purpose compression algorithm, for UDP you might be able to use a stream algorithm, but you sadly might need to compress each packet individually. Stream-based may require take more data to go through but can give better overall results. Message-based compression is hard because the algorithms only have a few bytes to encode, and hopefully they are high-entropy due to the encoding already mentioned so the compressor will struggle to improve it.

Ya should have said I'm using bit packing already via protocol buffers, which has variable integers.

Base 62 is the smallest encoding you can get for upper and lower case alphanumeric.

Local grid coordinates means the current grid within the spatial partitioning scheme I use. So a local grid might just be 100 x 100, where the world grid is in the thousands. You can use less data by sending the local grid coordinate plus the grid number. The calculation to get the world coordinate from that is fairly cheap.

I'm using udp. I didn't even stop to think that only being able to compress a packet at a time would hurt, but ya that makes sense. Seems like the best win is to just stick with application specific ways to keep size down.

Chris

Base 62 is the smallest encoding you can get for upper and lower case alphanumeric.

Ah. Usually encoding like that, such as base-64 encoding or uuencoding, converts binary data into values that can fit into plain text newsgroups and other textual places. It encodes six bits as an ascii value, expanding it to 8 bits in length. So it gives an instant 1/3 growth in size, plus a tiny bit of overhead for header, footer, and padding at the end of the message.

I've never heard text-to-binary conversion called 'base62', probably because it is a new thing and because Base X already has a defined meaning different from that. You apparently are going the other way; accepting only 62 ascii values and turning them into six bits --- with two extra values left over for your own nefarious purposes. :-)

But yes, if you are looking for size the best bet is to use your own knowledge of the data to remove all unnecessary bits. If your data is still large a more traditional encoder might work for you, or might not if your data is small or if it has high entropy.
Some additions to your list of things to consider:

Distance based quality reduction of positions. Positional data is rather non-intuitive in terms of how you can reduce the quality without the player noticing, which I suspect, is why most folks don't do it. Basically don't think in absolute x/y/z but instead convert everything to polar coordinates around the client in question. Up to about 10 meters around a player, 3 bytes can transmit data which is within 2-3% of the absolute 24 bits of floating point data you would normally use. (Assuming your game unit floats, represent 1=1 meter.) The non-intuitive side, the further away data costs more bytes per message out past say 50 meters as the accuracy required to maintain the angles runs the distance portion out of bits. I.e. up to 10 meters 8:8:8 is likely accurate enough, from 10-50 9:9:6 is how you would compress the data because you effectively have 7 bits of distance since the message id changes the distance calculation. Beyond that the distance bands have to get more narrow, the angles use more bits and the distance is going to have to expand into a 4th byte. Though, for any reasonable view range, the worst case even in a space game should be 64bits or less, still saving 2 bytes compared to absolute floats. (NOTE: obviously with UDP you have to send a "last known received" position item which should be either "the position I'm sending in this packet"==0 or a relative to this packet saying "that's the last one you said you saw, that I know about. Hence, 5 bytes: ID, lon, lat, distance, relative packet id. Also, this assumes ID's are <128 and you are bit slicing such as GPB formatting for the integers. There are a couple more items to keep in mind if used in UDP, still much lower byte count than 3 floats.)

Rotation based on quality reduction. In general, for most purposes, send 3 unsigned bytes and one signed byte to represent a normalized quaternion. There is little reason for rotations to be all that accurate during transmission. (As I remember, the sign on the w is required, been a long time though, could remember it wrong.)

Depending on the game, use string dictionaries. I.e. send a message such as "present this string to user: id" and the client has the string tables. That sucks for quest data though, folks will just dig it out of the client and ruin most of the quests by showing everything. That happens eventually but the other option is to send "present this string to the user: id" and the client will then look it up, if it does not have it, it replies with "don't have string: id" at which point the server sends it across. Initially this costs more bandwidth but at least you haven't shipped the entire storyline/branches etc to the players so it can't be ruined with simple data mining. Overall though, you can do all your story expansions on the servers and the players only know about it if they actually find it and the cost is not too much more than providing the patch with the strings in it. Additionally, all string data "should" run through a zlib or related library, assuming lossless chat data/string data, the dictionaries for compression will be quite full and you'll get fair compression overall. It won't be as good as separate streams and compressors, but likely at least a 50% compression rate should still apply.

Another one which depends on the game type. Relative "distant" positions. Say you have a large army style of gameplay with the possibility to come into range of a lot of objects quickly, even though they are a long way away. If you have a sweep/prune system it is easy to build islands like you would do for collision, but instead you are building it for network messaging. Given an island, you transmit the distant polar coordinate of the island center then a stream of delta's to represent the items which need to be added. Given distance, basically you just grid out the size of the island based on desired accuracy and send ID+grid location within the island. Double compression effectively. (You of course start refining data with more bits per object later...)

As always, looking at things in a slightly different way can lead to impressive results. I've implemented most of these at one time or another and know they reduce bandwidth in big chunks. Using polar coordinates in TCP is a huge win, with UDP it's semi-iffy depending on the connection though it can be made solid. The other items are just common things to do except the relative distance stuff which is something I experimented with but never put in a shipped game.

The most compact data is the data that you do not send.

Work very hard on not sending data.

Why send position every update when you can send position when starting out, and then just send player (or object) inputs, and run the simulation on the client, for example?

When it comes to "string ids," presumably those are known in advance, and can be sent as a small index into a look-up table.

If they are not, then you can build a dynamic string table (per connection.)

When sending a string, first look it up in a hash table. If there is a "previous ID" for that string, then send only that ID. Else, allocate a new ID, and send a command saying "here's a string and here's the ID I will use when referring to it again."

Strings that contain ever-changing data (like object IDs or timestamps or coordinates) are a bad match for this pattern, though.

enum Bool { True, False, FileNotFound };

I would recommend utilizing a static huffman compression to further reduce the size of your packets, probably about as much as they can be after higher-level protocol optimizations (eg: "not sending data" is definitely crucial).

Try to allow the client to make as many assumptions as possible with as little data coming from the server as possible to update it. Objects on the periphery of the player's perspective of the game world don't need updates as quickly, or with as much precision, as those closer to the player.

Thanks a lot guys some great ideas here.

I'm fairly new to this particular problem, although I think it's critical to solve well.

What I do know is that I've never seen a game handle this problem 'well' from a player perspective. Eve online literally just slows the game clock at a certain point. Guild wars II basically stops sending anything other then location updates and auto attacks with just a hundred or so players in visible range.

Chris

The most compact data is the data that you do not send.

Work very hard on not sending data.

This, this, a thousand times this.

The art of multiplayer game development is knowing what not to send, and how often to not send it. That is where the craftsmanship comes in.

Connecting machines together with well defined APIs is not a difficult task. Basic housekeeping tasks like matching up network ports and game identity are not hard things to master.

There are two arts to master. One is how to hide or design around latency (since if you have a work-around for the speed of light, you have bigger fish to fry). The other is how to maximize the efficient use of bandwidth. The former is fundamentally a design issue (although technological mistakes can make it worse). The latter is fundamentally an engineering issue (although design mistakes can make it worse).

If you aren't making sure your networking and game architecture makes it easy for the network developers to easily route traffic such that nothing unnecessary hits the wire, all of your bit packing efforts are fundamentally just optimizing a bubble sort.

Ya should have said I'm using bit packing already via protocol buffers, which has variable integers.

Base 62 is the smallest encoding you can get for upper and lower case alphanumeric.

Local grid coordinates means the current grid within the spatial partitioning scheme I use. So a local grid might just be 100 x 100, where the world grid is in the thousands. You can use less data by sending the local grid coordinate plus the grid number. The calculation to get the world coordinate from that is fairly cheap.

I'm using udp. I didn't even stop to think that only being able to compress a packet at a time would hurt, but ya that makes sense. Seems like the best win is to just stick with application specific ways to keep size down.

Chris

Thats UDP with a guaranteed delivery scheme built into it (otherwise delta compression becomes problematic)

A fundamental for the base issue (sending least possible) is also efficiently handling the overhead in your packet driver.

If you can group packets into known counts/blocks preassembling them before sending (to make it more like a file transfer) ACK processing can be optimized.

Checking ACKs in a pass for all the packets in the 'file' and resending any missing ones can simplify the number of ACKs sent (even over other windowing schemes) and overhead in the individual packets (some).

---

The change threshold update you may have to periodicly send a syncing update to correct minor changes (which still might be significant client visibility-wise)

which then leads into a round-robin scheme to space those out timewise.

---

Fixed value floats (like incremental object facings where 256 are enough and a BYTE value is decoded by a table lookup on the other end when its used or translated to populate the client local data)

---

Classifucation of objects as near vs far, significant vs trivial and having differentiating update schedules - the more important ones get updated much more frequently (basically a Level of Detail scheme possibly with many more than 2 degrees)

Part of that may include App level throttling if the network does start bottlenecking that the most critical data for the player tries to get thru if possible.

---

--------------------------------------------[size="1"]Ratings are Opinion, not Fact

This topic is closed to new replies.

Advertisement