Efficiently packing a series of numbers that do not have a base 2 upper limit

Math and Physics Programming

Started by LonelyStar August 26, 2010 07:47 AM

7 comments, last by Sirisian 13 years, 7 months ago

192

Author

August 26, 2010 07:47 AM

Hi,

I want to ebulliently (as in size efficient) pack a series of numbers into a stream. The numbers all have a range they can be in (i.E. 23-45) which is known to the packer and to the unpacker. But the ranges are not power of 2 aligned. For example a number may have the range 23-45, which is 22 possibilities. I could pack this number into 4 bits, wasting a little bit of space.
I want to pack more efficiently.

I remember reading about this, having to do with arithmetic encoding. But reading the wikipedia article about arithmetic encoding, I can not figure out how.

Thanks!
nathan

alvaro

21,604

August 26, 2010 09:43 AM

The idea is relatively simple: Encode your message as a single number between 0 and 1 and write it down in binary.

For instance, if you want to send 3 numbers (a, b, c) from 0 to 21, your real number is

x = a/22 + b/22^2 + c/22^3

So to send (5,21,2) you'll send x = 5/22 + 21/22^2 + 2/22^3 = .270848985725018782870022539444027047332832456799398948159...

In binary that's .0100010101010110010110111110111111010111111...

You only need to send enough bits to be able to distinguish between multiples of 1/22^3. In this case I believe that's 14 bits ( ceil(3*log(22)/log(2)) = 14 ).

Decoding goes like this:
.01000101010101(b) = .27081298828125
.27081298828125*22 = 5.95788574218750
.95788574218750*22 = 21.07348632812500
.07348632812500*22 = 1.61669921875000 (which rounds to 2)

I would have to think carefully about the details of the rounding or truncating.

alvaro

21,604

August 26, 2010 09:53 AM

Hmmmm... Perhaps my previous post is overly complicated. You can just write a natural number in base 22. So (5, 21, 2) is

5*22^2 + 21*22 + 2 = 2884

You would send this number as a 14-bit number (again because ceil(3*log(22)/log(2)) = 14).

To decode it (the numbers come out in reverse):
2884 % 22 = 2
floor(2884 / 22) % 22 = 21
floor(2884 / 22^2) = 5

Sirisian

2,263

August 26, 2010 11:37 AM

Quote:Original post by LonelyStar
the range 23-45, which is 22 possibilities. I could pack this number into 4 bits,

4 bits is 15 combinations. You'll need 5 bits. 0 to 31 since actually inclusively 23 to 45 is 23 possibilities. My binary packet article covers a bit writer/reader with C++ code included. You'll want
void WriteUInt(uint32_t value, int32_t bits)

Just subtract the min and you'll be fine.

WriteUInt(value - min, 5);
or in your example
WriteUInt(value - 23, 5);

//Edit I might as well include the general equation in case you're writing a function.

To find the number of bits you want to round the log2 of the max-min + 1 up. So include cmath and do:
WriteUInt(value - min, static_cast<int32_t>(ceil(log2(max - min + 1) - 0.5)));

[Edited by - Sirisian on August 26, 2010 12:37:58 PM]

clb

2,152

August 26, 2010 12:43 PM

A good analogy is to think about how numbers are encoded in base-k.

Example: If you have a number 1423 in our normal base-10, it is uniquely represented in hex by 5 * 16^2 + 8 * 16 + F == (5,8,15)_16.

This unique representation goes the other way as well. If you have a list of digits in base-16, say (4,6,10)_16, there is a unique number that expresses this triplet, namely the number 1130. This decoding is exactly what you want - to convert a list of numbers (that take values in a certain range) to a single number you can serialize.

Now, to use this, if you have 4 numbers that can range between, say, 0-55, just convert the base-56 representation of these numbers to a decimal number:

Input: (14, 41, 2, 50)_56, or
a1=14 which ranges from 0-55.
a2=41 which ranges from 0-55.
a3= 2 which ranges from 0-55.
a4=50 which ranges from 0-55.

Output: 14 * 56*56*56 + 41 * 56*56 + 2 * 56 + 50 = 2587362. (I deliberately avoid pow to make this example look similar to below.)

And since you know this mapping between the two representations is bijective, you can switch back and forth whenever you want.

But the problem above is that all the numbers in the list need to have the same range 0-55. To fix this:

1) If the range of a number in the list is just an offset of 0-55, say 10-65, then it's a simple exercise to apply a -10 offset to the number when converting.

2) If different number have different range lengths, say one is 0-20, and another is 0-100, then we'll just do a mixed-radix conversion, which is really no different from what we did above. Example:

Input:
a1 which ranges from 0-100.
a2 which ranges from 0-8.
a3 which ranges from 0-60000.
a4 which ranges from 0-22.

Output: a4 * 60001 * 9 * 101 + a3 * 9 * 101 + a2 * 101 + a1

Now, how many bits will this take? Just multiply the range sizes and convert to bits: ceil(log2(101 * 9 * 60001 * 23)). When you pack numbers together, either fit the fields nicely to a u32 or u64, or do/use a bignum library that can handle bigger byte streams.

Decoding is just backwards, an exercise of applying mod and division operators.

iMalc

2,466

August 26, 2010 02:25 PM

A simpler option to code that also works well is to use Bit-phased packing. You write out either floor(log n) bits or ceil(log n) bits, depending on how far between min and max the value is, and how close max-min is to a power of two.

To pack 4 different combinations you of course use the following series of bits:
00 01 10 11

To pack 5 different you split the last one and add an extra bit to the two new strings, now using the following series of bits:
00 01 10 110 111

To pack 6 different combinations you split another of the shorter combinations:
00 01 100 101 110 111

To pack 7 different combinations you split one more:
00 010 011 100 101 110 111

Packing 8 combos is again trivial. As you can see, the extra bits are phased-in as we increase the number of combinations.
You should notice that reading in each bit stream can be done correctly by the knowledge of the range in the encoder and the decoder. The decoder first reads floor(log n) bits, and from this it can tell if one more bit must be read or not.

I'll share some code I have for doing this:

template <class T>void bitPhasedRangeEncode(T &bitWriter, int min, int max, int val) {	val -= min;	max -= min;	int numBits = 0, l = max;	while ((l>>=1) > 0)		++numBits;	int cutoff = (1<<(numBits+1)) - max - 1;	if (val >= cutoff) {		val += cutoff;		++numBits;	}	for (int i=numBits-1; i>=0; --i)		bitWriter.outputBit((val >> i) & 1);}template <class T>int bitPhasedRangeDecode(T &bitReader, int min, int max) {	max -= min;	int numBits = 0, l = max;	while ((l>>=1) > 0)		++numBits;	int cutoff = (1<<(numBits+1)) - max - 1;	int val = 0;	for (int i=numBits-1; i>=0; --i)		val = (val<<1) | bitReader.inputBit();	if (val >= cutoff) {		val = (val<<1) | bitReader.inputBit();		val -= cutoff;	}	return min + val;}

"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms

LonelyStar

192

Author

August 27, 2010 05:18 AM

Quote:Original post by alvaro
The idea is relatively simple: Encode your message as a single number between 0 and 1 and write it down in binary.

For instance, if you want to send 3 numbers (a, b, c) from 0 to 21, your real number is

x = a/22 + b/22^2 + c/22^3

So to send (5,21,2) you'll send x = 5/22 + 21/22^2 + 2/22^3 = .270848985725018782870022539444027047332832456799398948159...

In binary that's .0100010101010110010110111110111111010111111...

You only need to send enough bits to be able to distinguish between multiples of 1/22^3. In this case I believe that's 14 bits ( ceil(3*log(22)/log(2)) = 14 ).

Decoding goes like this:
.01000101010101(b) = .27081298828125
.27081298828125*22 = 5.95788574218750
.95788574218750*22 = 21.07348632812500
.07348632812500*22 = 1.61669921875000 (which rounds to 2)

I would have to think carefully about the details of the rounding or truncating.

Hey,

OK, that I understand. But now I have a LOT of numbers to put into a stream. How can I know what to write out and when? I mean, the precision of the double is exhausted at some point.

Regards,
Nathan

Edited for clarity.

alvaro

21,604

August 27, 2010 06:16 AM

Quote:Original post by LonelyStar
OK, that I understand. But now I have a LOT of numbers to put into a stream. How can I know what to write out and when? I mean, the precision of the double is exhausted at some point.

You should probably concentrate on the more practical approach I posted later.

If you have a lot of numbers to put into the stream, all with a range of size 22, I would pack as many of them as I could in a 64-bit integer and then send a sequence of 64-bit integers.

How many can you write in 64 bits? Well, look at the highest power of 22 that is less than 2^64.

floor(64*log(2)/log(22)) = 14

So write out 64-bit numbers that represent 14-digit numbers in base 22.

Sirisian

2,263

August 27, 2010 07:30 AM

I'll point out if you need to send floats between a certain range the article I linked earlier solves that. It's under the "custom resolution float" section. In the C++ code that's included the function is:
void WriteCustomResolutionSingle(float value, float min, float max, int32_t bitResolution)

You pass it a value and an inclusive range then how many bits you would like to use.

Edit, I seem to have misunderstood what you said. Could you write a better explanation.

[Edited by - Sirisian on August 27, 2010 9:30:15 AM]

Efficiently packing a series of numbers that do not have a base 2 upper limit

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Efficiently packing a series of numbers that do not have a base 2 upper limit

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Reticulating splines