I'm going to write about exactly what it is I'm trying to achieve here in the hopes that it'll inspire me.
What I basically want to create here is a generator for decoder programs that run on a VM. This VM has an input stream from which I can read one bit at a time, an output stream to which I can write X bytes at a time, and a program pointer. Each cycle, it can read the byte at the program pointer and perform an action based on the value of that byte.
So, I'd initialise the VM with my encoded data as the input stream, hook up a buffer to the output stream, and point the program pointer at the beginning of this decoder thingy. I then step the decoder, and each step causes it to:
- Load the next bit from the input stream
- Increment the program pointer if the bit is nonzero
- Read the value at the program pointer
- Save the MSB into another register
- Move the program pointer forward by the value in the rest of the byte
- Read back the saved bit from the register and, if it is set, write the current byte (and the following X-1 bytes) to the output stream and reset the program pointer to the beginning
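The steps above can be sketched as a tiny interpreter. This is only a rough sketch of how I read my own list, not a finished VM: the names `step`, `decode` and `PROGRAM` are made up for illustration, the three-symbol program layout is hand-built, and I'm assuming the pointer only resets after a symbol has been written.

```python
def step(program, pc, bit, symbol_size=1):
    """Run one decoder cycle; returns (new_pc, emitted bytes or None)."""
    # A nonzero input bit selects the second half of the node.
    if bit:
        pc += 1
    value = program[pc]
    # The MSB is saved as the "this leads to a symbol" flag.
    is_symbol = bool(value & 0x80)
    # The remaining 7 bits are a forward move.
    pc += value & 0x7F
    if is_symbol:
        # Emit the symbol bytes at the new position, then start over.
        return 0, bytes(program[pc:pc + symbol_size])
    return pc, None


def decode(program, bits, symbol_size=1):
    out = bytearray()
    pc = 0
    for bit in bits:
        pc, emitted = step(program, pc, bit, symbol_size)
        if emitted is not None:
            out += emitted
    return bytes(out)


# Hypothetical hand-packed program for the codes 0 -> A, 10 -> B, 11 -> C.
PROGRAM = bytes([
    0x82,        # root, bit 0: symbol flag set, move 2 to reach 'A'
    0x02,        # root, bit 1: move 2 to reach the next node
    ord("A"),
    0x82, 0x82,  # second node: both halves are symbols, each moves 2
    ord("B"),
    ord("C"),
])

print(decode(PROGRAM, [0, 1, 0, 1, 1, 0]))  # b'ABCA'
```

Note that the symbols sit in between the nodes here, which is exactly what eats into the offset range.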
So the problem here is with step 5, the move: I can only move a distance determined by the number of bits available (which will be 7, so a max distance of 127 bytes). If I'm pointing at the left-child half of a node, one of those bytes will be required to move past the other half, leaving 126 bytes by which I can skip forward, so 125 bytes, or 62 2-byte nodes, to pack into the space in between. I guess I could add 1 to all offsets (because offset 0 will never happen) to give me space for 63 nodes instead. Of course, that assumes that the actual symbols I'm encoding are 2 bytes as well. They're stuck in the middle of everything else, so if they're 1 byte or 4 bytes they'll throw those numbers out.
I could just pack a level-0 node, followed by 2 level-1 nodes, followed by 4 level-2 nodes, and so on, but the problem comes when I've got more than 63 nodes in any subtree. If the left and right children of the root node have 63 descendants each, the right child of the root will end up needing to be 'interleaved' with the last couple of nodes in the left child's subtree. I'm not sure how to track that.
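To sanity-check the offset budget worked out above, here is a small helper. It's a sketch under my own assumptions: `nodes_in_gap` is a made-up name, and it only counts whole items of `node_size` bytes that fit strictly between a node's left half and the furthest position that half can jump to.

```python
def nodes_in_gap(offset_bits, node_size=2, bias=0):
    """Whole nodes that fit between a left half and its furthest target."""
    # Largest forward move the offset field can encode; bias=1 models
    # storing (offset - 1), since an offset of 0 is never needed.
    max_move = (1 << offset_bits) - 1 + bias
    # One byte of the move is spent passing the right half, and the last
    # byte of the move lands on the target itself, so the space strictly
    # in between is max_move - 2 bytes.
    gap_bytes = max_move - 2
    return gap_bytes // node_size
```

Passing `bias=1` models the "add 1 to all offsets" idea, and changing `node_size` shows how quickly the count shifts when the things packed in between aren't 2 bytes each.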
I'll make a start by calculating the total number of bytes to be stored under each node during the tree building stage...
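Roughly what I have in mind, as a sketch only: I'm assuming a Huffman-style build with a priority queue, 2-byte nodes (one byte per half), and a `size` field that counts the bytes a subtree occupies including the node itself. `build_tree`, `NODE_SIZE` and the dict layout are all placeholders.

```python
import heapq
from collections import Counter
from itertools import count

NODE_SIZE = 2  # assumed: one byte for each half of a node


def build_tree(data, symbol_size=1):
    """Build a Huffman-style tree, recording the packed size of each subtree."""
    tiebreak = count()  # keeps the heap from ever comparing two dicts
    heap = [
        (weight, next(tiebreak), {"symbol": sym, "size": symbol_size})
        for sym, weight in Counter(data).items()
    ]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        node = {
            "left": left,
            "right": right,
            # This node's own bytes plus everything stored beneath it.
            "size": NODE_SIZE + left["size"] + right["size"],
        }
        heapq.heappush(heap, (w1 + w2, next(tiebreak), node))
    return heap[0][2]
```

With the size known for every subtree, the packer can at least tell up front whether a child will land within reach of its parent's offset.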