• Advertisement
Sign in to follow this  
  • entries
    375
  • comments
    1136
  • views
    297900

Yay more trees!

Sign in to follow this  

90 views

So I think I might know how to solve my tree problem - or at least, how to go about developing a solution. I've not got it all figured out up-front but I guess as I move closer to a solution the remaining steps will become more obvious.

The first thing I need to do is write a 'byte scheduler.' I've got no idea if that's what they're actually called, but what I'm talking about is something that can track 'reservations' in a theoretical buffer, like someone managing bookings of a conference hall. I can file a reservation of 2 bytes (or X bytes, for leaves) along with a pointer to a callback interface; the scheduler can then shuffle things around, calling back once in a while to see is a particular slot is OK for a particular reservation, until the problem's solved. It requires no actual buffer space because the scheduler will operate on a list of entries (3-tuples, I think, {start, size, callback}). One of the things that had really been screwing with my head was how to deal with the fact that leaf nodes could be four times as big as internal nodes and yet could be scattered throughout the tree; this should make things easier, as well as allowing me to 'prototype' a tree layout without actually having to copy any data around.

As far as the actual algorithm goes. It's important to remember that (hopefully) the tree won't be very balanced - the most frequently occuring elements will have shallower leaves, and the rarer ones deeper. It should only come out balanced if each symbol occurs roughly the same number of times in the sampled data. So at a given depth N (where N is greater than, say, five) I will hopefully be dealing with fewer than 2^N nodes. I still have to cope with perfectly balanced trees, but this might allow me to make certain optimizations.

I've really got two approaches available to me. Either, my algorithm can produce the exact solution in a single pass; or, it can produce a 'best guess' (which may not be 100% valid) and then iteratively refine the layout until it is valid and complete.

If I go with the former, I may want to establish a couple of loop invariants (see, I really am reading that Algorithms book). I can't really think of any that would really help, though, short of "For each node written to the output, the remaining nodes must have a solution." Well, duh.

I've been wondering about a divide-and-conquer approach - solving for subtrees and then for the root - but I don't think it can work here because the subtrees may need to be written out in such a way as to leave room for nodes in other subtrees (i.e. "interleaving" subtrees). As such the subproblems aren't independent.

I can smell food. I'm going to go see what it is, and then come back and write my byte scheduler.
Sign in to follow this  


0 Comments


Recommended Comments

There are no comments to display.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

  • Advertisement