The output of my encoder is in two parts. For one, there's obviously the compressed data; I've got that sorted nicely now. It's almost breathtaking how quickly you can implement the tree construction using std::priority_queue.
The other is the tree used to decompress the data. I view it as a filter through which I can stream the compressed data - that's how I'm designing it to be used, for nice asyncronous loading/decompression of resources and so on. I'm trying to get the decompression very fast, and to keep the tree size down.
So, what I thought I'd do is this. Each non-leaf node would be X bytes (where X is as small as possible); that breaks down into two equal-sized fields, for left child and right child. The MSB on each field is used to indicate whether that child is a leaf, and the rest is used to store an offset to the child from the current read pointer. Leaf nodes are just stored as their values, and so will be as large as the templated type the user is encoding with.
The problem is with laying out nodes in the stream. The distance between any parent/child pair cannot be more than the number of bits I have available to store an offset in - given that I wanted to encode non-leaf nodes in 16-bit words that gives me 7 bits to store an offset in - 0-63 bytes of offset. So how do I order nodes in my stream such that the offsets are never too large?
I'm going to have to think about it for the next couple of days, but my initial thoughts:
- It might be worth starting from the bottom and trying to collect nodes up into contiguous blocks of 62 bytes each
- Consider how it differs from a list - if it were a regular list you could set up children at 63-byte intervals each with 63-byte offsets. Perhaps find the 'dominant' chain and do that with it, and send all other children into local zones
- Another approach might be to place child nodes as far away from the parent as possible while still satisfying the offset condition - if the space is taken track back until you find space - and then 'vacuum-pack' the result
I've got my copy of Introduction to Algorithms open, I guess it's time I started trying to read it.