Winsock Compression

8 comments, last by yewbie 12 years, 4 months ago
I am currently running a custom Winsock client/server implementation.

When a player connects to my server I need to send a lot of information (totaling roughly 500 KB at the maximum number of objects per zone), which seems like a ton to send out every time a player connects.

Is there a way I can use something like zlib to compress and then decompress my data (for each packet sent and received)?
Or is this method flawed, and should I just deal with the large data volume?

My zone size is 150x150.
I am looking at 14 bytes per object, 0 to 22,500 per zone.
10 bytes per tile, 22,500 per zone.
24 bytes per entity, which I am currently capping at 2,000.

So in a worst-case scenario that's about 588,000 bytes (315,000 + 225,000 + 48,000) for the initial update.
edit: plus TCP/IP overhead per packet

All of this data can change over the course of the game and needs to be up to date when the player logs in, so keeping a cache wouldn't be worth it.
I thought about maybe doing it chunk-style à la Minecraft, but I would think I would end up sending much, much more data in the long run.

Any thoughts?
Whether compression helps depends on the entropy of the data. If entropy is low, it will compress well. If each tile and object is "unique", then it won't.

In general, zlib will bring the size down; 70% is very common. For sparse data, savings can be big.

How the data is organized may also matter. Similar to how the Burrows-Wheeler transform works, organizing data by "type" or "class" may improve compression or even allow an alternate representation, such as RLE.
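To make that "organize by type" point concrete, here is a rough sketch (C++; the Tile fields simply mirror the struct posted later in this thread and are illustrative only). Writing each field as its own contiguous run puts identical values next to each other, which RLE or zlib's match finder can exploit much better than interleaved records.

#include <cstdint>
#include <vector>

struct Tile {
    uint8_t tileNumber;   // hypothetical fields, for illustration only
    uint8_t tileFlags;
};

// Interleaved layout: N,F,N,F,N,F... adjacent bytes rarely repeat.
std::vector<uint8_t> serializeInterleaved(const std::vector<Tile>& tiles) {
    std::vector<uint8_t> out;
    for (const Tile& t : tiles) { out.push_back(t.tileNumber); out.push_back(t.tileFlags); }
    return out;
}

// Field-by-field layout: N,N,N,...,F,F,F,... long runs of equal bytes,
// which compress (or run-length encode) far better.
std::vector<uint8_t> serializeByField(const std::vector<Tile>& tiles) {
    std::vector<uint8_t> out;
    for (const Tile& t : tiles) out.push_back(t.tileNumber);
    for (const Tile& t : tiles) out.push_back(t.tileFlags);
    return out;
}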

That's a really good sign =p
My data is going to be highly redundant.

But just knowing that it's possible and I might get some savings out of this is a good starting point for me, thanks =)

[quote name='Antheus' timestamp='1324057003' post='4894552']
In general, zlib will bring the size down; 70% is very common. For sparse data, savings can be big.

But just knowing that it's possible and I might get some savings out of this is a good starting point for me, thanks =)
[/quote]

If you separately zlib-encode each packet, then don't expect to get much compression for small packets -- in fact, small packets may actually be expanded by zlib encoding!
If you keep the stream open and want to send and receive whole packets, then you have to do extra work to make sure that the sender and receiver agree on where each packet is delimited (as zlib can emit "half bytes" that need to get flushed).

In general, zlib can give you some small constant factor of savings -- say, cut the size in half. For great savings, though, the best savings come from the data you don't send at all.

If all you are doing is downloading a level file at the beginning of a game, then that's OK -- zlib will help, and users will stare at a progress bar until it's done. For updates during gameplay, you want to be more careful about the data you choose to send. If a client can correctly derive some change from what it already knows, for example, you don't need to send the actual change.
enum Bool { True, False, FileNotFound };

I will not be using zlib on anything outside of this initial world update on connection.
All changes made during the update are queued up in that player's packet buffer and sent out after the original update.

[quote](as Zlib can emit "half bytes" that need to get flushed)[/quote]
With my implementation only full bytes are sent/received, so it shouldn't be an issue.
edit: unless zlib compression can create trailing half-bytes, that would be an issue.

edit2: Also, I wouldn't zlib the header of my packet, just all of the information following it, so the header would still have an accurate size.
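Since the whole initial update is available at once, zlib's one-shot helpers sidestep the streaming and flush issues entirely. A rough sketch of the "uncompressed header, compressed payload" approach described above (the PacketHeader layout is made up for illustration; error handling is trimmed):

#include <cstdint>
#include <cstring>
#include <vector>
#include <zlib.h>

struct PacketHeader {              // hypothetical wire header
    uint32_t compressedSize;       // number of payload bytes that follow
    uint32_t uncompressedSize;     // lets the receiver size its buffer for uncompress()
};

std::vector<uint8_t> buildInitialUpdate(const uint8_t* payload, uint32_t payloadLen) {
    uLongf destLen = compressBound(payloadLen);                  // worst-case compressed size
    std::vector<uint8_t> packet(sizeof(PacketHeader) + destLen);

    int rc = compress2(packet.data() + sizeof(PacketHeader), &destLen,
                       payload, payloadLen, Z_BEST_COMPRESSION);
    if (rc != Z_OK) { /* fall back to sending the payload uncompressed */ }

    PacketHeader hdr;
    hdr.compressedSize   = static_cast<uint32_t>(destLen);
    hdr.uncompressedSize = payloadLen;
    std::memcpy(packet.data(), &hdr, sizeof(hdr));

    packet.resize(sizeof(PacketHeader) + destLen);               // trim unused space
    return packet;
}

The receiver reads the header, reads exactly compressedSize bytes off the socket, and calls uncompress() into a buffer of uncompressedSize bytes. Carrying the compressed size in the uncompressed header also keeps packets cleanly delimited on the TCP stream.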


Compressors might emit only a single bit (high compression), but you need at least 8 bits to send anything over a standard IP network. Padding to a full byte is not the biggest problem; the losses there are usually negligible.

The second problem is how compressors work. Take "AAAABAAAA^Z" (^Z is the EOF symbol). A compressor could emit something like "4A1B4A", but it would produce it like this:
A
A
A
A
B   (emit "4A")
A   (emit "1B")
A
A
A
^Z  (emit "4A")
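The same buffering shows up in even a toy run-length encoder: nothing is emitted for the current run until a different symbol arrives, so the final "4A" sits inside the encoder until it is explicitly flushed. An illustrative sketch:

#include <string>

// Toy RLE encoder: emits "<count><symbol>" only once the run is broken.
// The final run stays buffered until finish() is called, which is exactly
// why a streaming compressor needs an explicit flush.
class RleEncoder {
public:
    void put(char c) {
        if (count > 0 && c == current) { ++count; return; }
        emit();                        // previous run is complete, output it
        current = c;
        count = 1;
    }
    void finish() { emit(); }          // flush the pending run at end-of-data
    const std::string& output() const { return out; }
private:
    void emit() {
        if (count > 0) out += std::to_string(count) + current;
        count = 0;
    }
    char current = 0;
    int count = 0;
    std::string out;
};

// Feeding "AAAABAAAA": "4A" appears only when the 'B' arrives, "1B" when the
// next 'A' arrives, and the trailing "4A" only appears on finish().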

In other words, the compressor determines when you can send the data, not the network. Your simulation loop might be desperate for data to send, but the compressor, despite having been fed the latest updates, might not be ready to emit anything yet.

Or, you could send data more frequently, resulting in something like:
AAA^Z - 3A
ABA^Z - 1A1B1A
AAA^Z - 3A
But now we sent 10 characters instead of 9. Something similar happens with networking. Once a simulation step is complete, the data to be sent needs to be flushed, even if the compressor could benefit from having more data from the next frames.

It makes sense: the more data you have, the more redundancy you can find. If that didn't hold, one could compress single bits independently and individually.

zlib offers a way to force a flush at any time. But doing so interrupts normal operation and resets (perhaps only partially, I forget) its internal state. The less data one compresses at a time, the less redundancy there is, and the poorer the compression.


But I also mentioned entropy. If the map is 150x150, 22,500 tiles total, a tile coordinate can be sent using 15 bits. Similar redundancies exist elsewhere. If there is a lot of it, using simple Huffman or arithmetic coding will consistently reduce the amount of data sent. Huffman coding is already part of zlib's compression.
---

But as said, compression is variable. If one needs to be able to send 500 kB of data, then that worst case still needs to be dealt with. And if compression consistently reduces it to 100 kB, then it may be possible to just send that much data in the first place, without the redundancy.

Compression can be seen as a tool that trades away some development complexity (such as requiring everyone to fine-tune structure layouts) and papers over some of the inefficiencies of languages and encodings. It cannot solve problems with the design or the overall concept. If there is too much data being sent, compression won't be a magic bullet that makes it work.
Thank you Antheus and hplus.

I have a lot of testing to do. Since I am not running simulation updates through zlib, only the initial game state, I am hoping I don't have to worry about the streaming and flushing issues described above.

Even if I get a 10% reduction it would be worth the time to implement it.
Hell, even if I get a 0% reduction due to whatever I am doing wrong, learning to use zlib would be worth it =)

edit:
Also, this may be way off base, but I think in essence what you both are saying is that I should design my data so that I don't have to actually send so much information in the first place?
For example, this is what I currently have for the tiles:

int X;
int Y;
int TileNumber;
int TileFlags;

It would be better to send my tiles in a specific, known order to eliminate the X/Y variables altogether,
since I know tiles are TileNum = y + (MapWidth * x);

I could just send my tiles in the order I have parsed them, so I only need to send TileNumber and TileFlags.
And if those don't need to go over 255, I could send them as chars
and only send about 1/4 of the data?
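A rough sketch of that idea, assuming TileNumber and TileFlags both fit in a byte: both sides walk the 150x150 zone in the same fixed order, so X and Y never go on the wire (the Tile type here is illustrative, not your actual structures).

#include <cstdint>
#include <vector>

constexpr int MapWidth  = 150;
constexpr int MapHeight = 150;

struct Tile {
    uint8_t tileNumber;   // assumes values never exceed 255
    uint8_t tileFlags;
};

std::vector<uint8_t> serializeZone(const Tile (&tiles)[MapHeight][MapWidth]) {
    std::vector<uint8_t> out;
    out.reserve(MapWidth * MapHeight * 2);       // 45,000 bytes instead of 225,000
    for (int y = 0; y < MapHeight; ++y)
        for (int x = 0; x < MapWidth; ++x) {     // client decodes in the same order
            out.push_back(tiles[y][x].tileNumber);
            out.push_back(tiles[y][x].tileFlags);
        }
    return out;
}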


[quote]
int X;
int Y;
int TileNumber;
int TileFlags;
[/quote]



So for each of those, do you actually need data in the range of -2147483648 to 2147483647? (That's the range you are likely to get for "int" on your compiler.) I doubt it. What you probably really want is bit packing, where you send each field using only the minimum number of bits necessary to fully represent it. So, what are the expected ranges of those things? Packing down to a char is a good thing, but do you even need a full char of data? Would 7 bits be enough? That would reduce your data by 12% right there. And what else are you sending?
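A minimal bit-writer sketch of the packing being described (the field widths below are examples, not a recommendation): each value is written with only as many bits as its range actually needs.

#include <cstdint>
#include <vector>

class BitWriter {
public:
    void write(uint32_t value, int bits) {       // append the low `bits` bits of value
        for (int i = 0; i < bits; ++i) {
            if (bitPos == 0) buffer.push_back(0);
            if (value & (1u << i))
                buffer.back() |= static_cast<uint8_t>(1u << bitPos);
            bitPos = (bitPos + 1) % 8;
        }
    }
    const std::vector<uint8_t>& bytes() const { return buffer; }
private:
    std::vector<uint8_t> buffer;
    int bitPos = 0;                              // next free bit within the last byte
};

// Example: 8 bits of tile number + 4 bits of flags = 12 bits per tile,
// versus 64 bits for the two ints in the struct quoted above.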

One other thing you could consider is "Quake encoding", where you only ever send deltas from previous states. The way you do this is to find a way to represent your entire world state as a single array of <data type, try char>. Then you send a complete world state update; let's call that T1. For every new world state update you want to send, you XOR the world state Tn with T1, thus zeroing out everything that has not changed. This will (likely) give you extremely long runs of zeros, which will compress easily with some sort of run-length encoding. Also, since every Tn is just a diff against T1, dropped packets will not matter, because you're always comparing against the initial frame, which you could periodically re-send if you wanted to.
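A sketch of that delta idea, assuming the whole world state can be flattened into one fixed-size byte array (all names here are illustrative): XOR against the baseline, then run-length encode the zero runs.

#include <cstdint>
#include <vector>

// XOR the current snapshot against the baseline; unchanged bytes become 0.
std::vector<uint8_t> xorDelta(const std::vector<uint8_t>& baseline,
                              const std::vector<uint8_t>& current) {
    std::vector<uint8_t> d(current.size());
    for (size_t i = 0; i < current.size(); ++i)
        d[i] = baseline[i] ^ current[i];
    return d;
}

// Very simple RLE over the delta: <u16 zero-run length><u8 literal count><literals>.
std::vector<uint8_t> encodeZeroRuns(const std::vector<uint8_t>& delta) {
    std::vector<uint8_t> out;
    size_t i = 0;
    while (i < delta.size()) {
        size_t zeros = 0;
        while (i < delta.size() && delta[i] == 0 && zeros < 0xFFFF) { ++zeros; ++i; }
        size_t litStart = i, lits = 0;
        while (i < delta.size() && delta[i] != 0 && lits < 0xFF) { ++lits; ++i; }
        out.push_back(static_cast<uint8_t>(zeros & 0xFF));
        out.push_back(static_cast<uint8_t>(zeros >> 8));
        out.push_back(static_cast<uint8_t>(lits));
        out.insert(out.end(), delta.begin() + litStart, delta.begin() + litStart + lits);
    }
    return out;
}

On the receiving side, the decoded delta is XORed back onto the stored copy of T1 to recover Tn.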

I would not bother with Huffman encoding, arithmetic encoding, or anything else that zlib will do for you. Just focus on your app packing data down as much as possible and on reducing the amount of data that you need to send at all. You mentioned this already: if you can recreate the tile number from its x,y coords, or (better) recreate the x,y coords from the tile number, then just send whichever of those is smaller, packed down to its minimum size. Implementing bit packing is not trivial, but it's certainly possible, and pretty much every game ever has done something like this.

Agreed with the bit packing comment.

We've had in-game classes of several kilobytes that the compressor could reduce to a single kilobyte, and that a programmer knowledgeable in the system could reduce to just a few hundred bytes. One specific map-and-settings class was reduced to just the bits necessary to deserialize it, shrinking something like 8 KB of live data (since the system likes to expand bool flags into bytes and 4-option enums into 32-bit integers) down to around 200 bytes of serialized values.

The processor is fast. The wire is slow. Generally it is wise to use a bit of processing time to prevent sending unnecessary bits, and spend a little processing work to reconstruct complex objects and structures rather than sending empty data or reconstructable information.
Thanks for the replies, guys.

It's interesting. Every time I think I am getting a handle on something related to network programming, I find out that I have merely scratched the surface of the techniques =p

I think what I am going to do is a combination of everyone's advice here and do the following:
1) Figure out a good way to use bit packing on my initial world update to make it as small as possible.
2) If it's still somewhat large, try to use zlib to reduce it further.
3) If after all of this everything is still too large, bite the bullet and just stream chunks of the map instead of the whole zone.

Worst comes to worst, as hplus said, I could just have someone looking at a progress bar for a minute or so =p

This topic is closed to new replies.
