Small data compression

6 comments, last by gimp 23 years, 11 months ago
I was wondering if there is an art to compressing 30 bytes of data? I've had conflicting reports: some say zlib etc. will work efficiently, others suggest custom-compressing the data. I'm not sure how to approach the custom compression. My knowledge of information theory and general compression is light, and unless it's simple it probably isn't worth pursuing... is it? Any thoughts?

chris
Chris Brodie
Compression is based on the repetition of data in the packet to be compressed. If there is little to no repetition in the bits, then there is very little compression that can be done.

Dictionary-style compression like ZLib or ZIP works at the byte level, typically, so small strings of non-repeating characters are not going to compress. In fact, it can actually *add* 4-8 bytes to the size of the packet.

The savings come when you have a lot of repeated character sequences in a single packet. For instance, if you batch together several updates to similar data structures and send them as a single packet (e.g., sending population changes for all cities in one packet).

In our case, most of the information sent out is in "batches" composed every tenth of a second. We do see some small packets go out with a few extra bytes from failed compression, but the savings on the larger packets more than make up for that.
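For instance, a batched update might look something like this rough sketch (the fields and sizes are invented for the example, not our actual format):

    // One update record; many of these go into a single packet, so the
    // repeated field layout gives the compressor something to chew on.
    struct CityUpdate
    {
        unsigned short cityId;
        unsigned short population;   // or a delta, if that's smaller
    };

    struct CityUpdatePacket
    {
        unsigned char messageType;   // identifies this as a city-update batch
        unsigned char count;         // how many CityUpdate records follow
        CityUpdate    updates[32];   // up to 32 per packet, as an example
    };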


DavidRM
Samu Games
It might seem obvious, but I'll post it for the benefit of newbies if not for you. In applications like this, where processor time matters less than the bandwidth you take up, you want to make the most of every bit you send.

The first thing to do is combine any and all bools into one integral value: there's no point sending 'bool var1' followed by 'bool var2' when you can send a single 'char theBools' with two of its bits set or cleared to represent the boolean values, and parse it back out at the other end (there's a rough sketch at the end of this post).

Sometimes you may also be sending less than a full variable's worth of data. For instance, a 7-bit ASCII character sent in a char leaves you room for an extra boolean in the 'spare bit' of that variable. Use it! Which leads to another point: if you ever need to send letters, consider replacing standard ASCII codes with your own lookup system. You can get upper and lower case letters plus a few extra symbols in a 6-bit representation (add 32 for the ASCII value, for instance), a saving of 25% (although you may have to precede it with an extra char saying how long a string to expect, so weigh up whether it's worth it before use).
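To make the bool-packing idea concrete, here's a rough sketch (the flag names are just made up for the example):

    // Pack several booleans into one byte before sending.
    unsigned char PackFlags(bool isFiring, bool isCrouching, bool isReloading)
    {
        unsigned char flags = 0;
        if (isFiring)    flags |= 0x01;  // bit 0
        if (isCrouching) flags |= 0x02;  // bit 1
        if (isReloading) flags |= 0x04;  // bit 2
        return flags;                    // bits 3-7 are still spare
    }

    // On the receiving end, parse the bits back out.
    void UnpackFlags(unsigned char flags, bool &isFiring, bool &isCrouching, bool &isReloading)
    {
        isFiring    = (flags & 0x01) != 0;
        isCrouching = (flags & 0x02) != 0;
        isReloading = (flags & 0x04) != 0;
    }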

And if you do go with any kind of standardised compression algorithm, use a single bit in your packet to indicate whether or not it is compressed. That way you can perform the compression, and if the compressed version turns out to be larger than the uncompressed version (because the data was difficult to compress effectively) you just send the original, and the receiver can easily see not to bother with decompression. For the cost of an if-check on both sides of the connection, you can save a few extra bytes every so often.
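A minimal sketch of that fallback using zlib's compress(); for simplicity it spends a whole flag byte rather than a single bit, and the buffer layout is just made up for the example:

    #include <zlib.h>
    #include <string.h>

    // Try to compress the payload; if the result isn't smaller, send the
    // original and mark the packet as uncompressed instead.
    // Assumes outBufSize >= payloadLen + 1 so the raw fallback always fits.
    int BuildPacket(const unsigned char *payload, unsigned long payloadLen,
                    unsigned char *outBuf, unsigned long outBufSize)
    {
        unsigned long compLen = outBufSize - 1;   // space left after the flag byte
        bool compressed = compress(outBuf + 1, &compLen, payload, payloadLen) == Z_OK
                          && compLen < payloadLen;

        if (compressed)
        {
            outBuf[0] = 1;                        // flag: compressed
            return (int)(compLen + 1);
        }

        outBuf[0] = 0;                            // flag: raw
        memcpy(outBuf + 1, payload, payloadLen);
        return (int)(payloadLen + 1);
    }

In a real packet you'd fold that flag into an existing bitfield rather than burn a whole byte on it.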
Thanks guys...

Actually, I was thinking of a data conversion layer where things like "my position is here", "turret rotated 30 degrees" or "you're dead" get transmitted as pre-agreed structures that use the smallest data types possible, in many cases using binary values where I can. I was also thinking that some really small numbers, say values that only range from 1 to 1000, could be represented by my own data types, kind of like really small ints.

Perhaps where floats are needed I could do some rounding to drop a digit...
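For example, something like this rough sketch for squeezing a bounded float into 16 bits (the 0-360 range is just an example):

    // Quantize a float in a known range (here 0-360 degrees) into an
    // unsigned short for sending; the receiver reverses it.
    unsigned short QuantizeHeading(float degrees)
    {
        if (degrees < 0.0f)   degrees = 0.0f;
        if (degrees > 360.0f) degrees = 360.0f;
        return (unsigned short)(degrees / 360.0f * 65535.0f + 0.5f);
    }

    float DequantizeHeading(unsigned short q)
    {
        return (float)q / 65535.0f * 360.0f;
    }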

I'm building a multiplayer FPS, so my plan is for it not to suck when it comes to net speed. The game will be heavily team based, so in general it will need more players connected for a good game, hence tighter net code.

I guess I really need to do some testing to work out how often the server should send packets with the interpolation (dead reckoning and cubic) doing its job. I'd need to know how infrequently I could get away with sending while remaining accurate and smooth.

I'm betting that if I just use the server to echo new information, like "player holds down forward and strafe", then the server could quickly fill thin connections. Maybe a certain level of collating might speed up the process while also allowing compression to come into play, since collated packets would be larger and hence have a higher chance of repeats.

Once the data is manually reduced to its smallest possible state and I have flags to indicate the message type, should I start looking for a bit-based compression algorithm, or at least one that uses a smaller unit than a byte for its repetition checking?

chris
Chris Brodie
Check out the Longbow discussion board and do a search for network programming. They grappled with many of the same problems that you will, and suggest some possible solutions.

http://www.longbowdigitalarts.com/ubb-cgi/Ultimate.cgi?action=intro&BypassCookie=true

Doing sub-byte compression is somewhat overkill, in my opinion. With the suggestions above, and from what you've told me, you should be able to get the packet stream well into a 56k modem's bandwidth range.

Also, collate only those packets generated on the same tick, otherwise you would be adding undue latency. Try to minimize the turnaround time between a packet's arrival and its processing; this will lower latency as well.

Some FPS games use predictive code on both the server and the clients. To lower the appearance of latency, the client processes and executes user inputs locally and sends the command packet plus timer info to the server. With a highly synchronized timer (a delta in the 1-5 ms range), the server predicts where the client will be, based on the difference between its own timer and the timestamp of the packet's creation, and updates the client's entities on the server side to that predicted position (the assumption is that the client is already at that position on its side).

The benefit is that the client has virtually no appearance of lag, yet the simulation is still synchronized and stable. Packet loss can adversely affect this scheme, so packets need to be verified and there must be a mechanism for handling loss (client side, like moving the player back, or resending the packet). I'm thinking of making a pong game like this to try it out.
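A very rough sketch of the server-side half of that idea (all the struct fields, names and units are made up, and it assumes the clocks are already synchronized):

    struct InputPacket
    {
        float timestamp;     // client time when the input was generated, in seconds
        float moveX, moveY;  // movement command, e.g. forward/strafe axes (-1..1)
    };

    struct Entity
    {
        float x, y;
        float speed;         // units per second
    };

    // The server advances its copy of the client's entity by however long the
    // command has been "in flight", on the assumption that the client has
    // already simulated that much movement locally.
    void ApplyPredictedInput(Entity &e, const InputPacket &in, float serverTimeNow)
    {
        float delta = serverTimeNow - in.timestamp;
        if (delta < 0.0f) delta = 0.0f;

        e.x += in.moveX * e.speed * delta;
        e.y += in.moveY * e.speed * delta;
    }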

Good luck!

-ddn
Regarding rounding so you can send 'smaller' data: if this data is being sent often (for example, a movement delta), you could store the difference between the original value and the rounded value on the sending machine as a 'cumulative error', then add it back in the next time you send data. Example: you round 1.4 down to 1 and send it. Locally, you save the 0.4. Next time you have 1.4 again, and instead of 'naively' sending a rounded value of 1, you add in the error from last time, making it 1.8, and -then- round it, up to 2 (storing that 0.2 of overshoot for next time). Since the 'perfect' total value to send would have been 2.8, by carrying over the error you managed to send 3 in total, which is 0.2 out, whereas without the cumulative error handling you'd have sent 2, which is 0.8 out. This gets you slightly better accuracy in the long term, at the cost of a few extra sender-side variables.
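A tiny sketch of that error-carrying, assuming positive values (the names are made up):

    // Keep the rounding error on the sender and fold it into the next value
    // before rounding again.
    struct ErrorAccumulator
    {
        float error;   // leftover from previous sends, starts at 0
    };

    int RoundWithCarry(ErrorAccumulator &acc, float value)
    {
        float adjusted = value + acc.error;    // e.g. 1.4 + 0.4 = 1.8
        int sent = (int)(adjusted + 0.5f);     // rounds 1.8 up to 2
        acc.error = adjusted - (float)sent;    // stores -0.2 for next time
        return sent;
    }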

I would also have to agree that you may find 'conventional' compression doesn't give you any significant benefit. After all, the best forms of compression are those that know something about the data you're putting in, which is what has already been covered. I doubt zlib or whatever will gain you more than 2 or 3 bytes out of 30 if you are really packing your bytes already. Of course, it is certainly worth trying, so give it a go on some test data close to what you will actually be sending, and compare the differences.
While sending large packets may be undesirable, sending a lot of small packets sucks even more. The overhead of sending a single large packet is roughly the same as sending any smaller packet.

Thus, if at all possible, batch together as many smaller packets as possible and send them out as a single larger packet (but not too large, of course). A thread that checks 10 times per second and sends whatever is available is likely sufficient. Or, if that's too much effort, just create some "packet batching" routines that collect all the packets "sent" between calls and send them out all at once.
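Something like this rough sketch, where SendToNetwork() is just a placeholder for whatever actually writes to the socket:

    #include <vector>

    // Collects small messages and flushes them as one larger packet.
    class PacketBatcher
    {
    public:
        void Queue(const unsigned char *data, int len)
        {
            buffer.insert(buffer.end(), data, data + len);
        }

        // Call this from a timer or a thread, e.g. 10 times per second.
        void Flush()
        {
            if (buffer.empty())
                return;
            SendToNetwork(&buffer[0], (int)buffer.size());
            buffer.clear();
        }

    private:
        void SendToNetwork(const unsigned char *data, int len);  // placeholder
        std::vector<unsigned char> buffer;
    };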

And once you have some form of batching, "conventional" compression algorithms work quite well. While a 30-40 byte packet isn't likely to compress well, a 70-80 byte packet will see about 30% compression.

So with some minor batching and "conventional" compression, you will have achieved a significant reduction in the overhead of sending the packets *and* you will have reduced the total of what needed to be sent (time required to compress and decompress is significantly less than the time required to send over the network).


DavidRM
Samu Games
Many thanks for all the info, especially the link to Longbow.

Aside from the many other 'data reduction' techniques I've come up with, a guy from comp.compression sent me the source for an arithmetic compression algorithm he has implemented.

The results on some test data were:
before ~200 bytes
after ~120 bytes

about 40%, not bad. Zlib bit it big time, with an 18% reduction that turned into a 220% size increase below 100 bytes.

From what I can understand of the method I can create my own lookup tables with up to 256 definable byte patterns.

So after a little profiling of my data and some modification of its structure (to better fit within byte boundaries), I think I can probably push its efficiency a bit higher.

Of course, grabbing a few uni books to understand what I'm doing a little better would be beneficial... it would at least let me understand the code comments...

I really think 36 hours a day would be better than 24.

gimp
Chris Brodie

