# KD-Tree nodes in 4 bytes


## Recommended Posts

##### Share on other sites
Very interesting. I was wondering the same thing lately, but I hadn't thought about it in this much detail.

I get around 6 bytes per triangle of average storage with my AABB tree. It's really not clever -- the main idea is to store several tris at once (2 - 8) in a leaf, which amortizes the cost of the full 24-byte AABB at each node and leaf.

I could get it much smaller if I compressed the AABB, but it's not a bottleneck right now...

##### Share on other sites
I posted once about how I store my kd nodes in 32 bits. I put the axis/leaf identifier in the lower 2 bits of the split-plane value. If the low 2 bits specify 'leaf', then the upper 30 bits are an offset into a list of primitive indexes. The first entry at that offset is a count of the primitives in the leaf, followed by a list of unsigned ints that point to the actual leaf data.

```
 _______________ ______  ______________ ______  ______________ ________
| 30-bit offset | leaf || 30-bit offset| leaf || 30-bit split | x-axis |
 --------------- ------  -------------- ------  -------------- --------
      |                 /       |                /
   ___|_____ _______ __/______ _____
  | # prims | 0 1 2 | # prims | 2 4 |
   --------- ------- --------- -----
            /  /    \    ______/   \
           /   |     \  /           \
        [leaf] [leaf] [leaf] [leaf] [leaf]
```

The children of the node stored at m_nodes[i] are implicitly stored at left = m_nodes[2i] and right = m_nodes[2i + 1].
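A minimal decode sketch of that layout (the tag value for 'leaf' and the helper names are my assumptions, not from the original post):

```cpp
#include <cassert>
#include <cstdint>

// Decoded view of one 32-bit node word: the lower 2 bits select the split
// axis (0..2) or mark a leaf (assumed tag 3 here); the upper 30 bits hold
// either the split position or the offset into the primitive-index list.
struct DecodedNode {
    bool     isLeaf;
    uint32_t axisOrTag;  // 0..2 = split axis, 3 = leaf (assumed encoding)
    uint32_t upper30;    // split bits, or offset to the leaf's index list
};

inline DecodedNode decodeNode(uint32_t word) {
    DecodedNode n;
    n.axisOrTag = word & 0x3u;      // lower 2 bits
    n.isLeaf    = (n.axisOrTag == 3u);
    n.upper30   = word >> 2;        // upper 30 bits
    return n;
}

// Implicit addressing: children of node i live at indices 2i and 2i + 1.
inline uint32_t leftChild(uint32_t i)  { return 2u * i; }
inline uint32_t rightChild(uint32_t i) { return 2u * i + 1u; }
```

With this packing, an inner node costs exactly 4 bytes and traversal needs no child pointers at all.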

##### Share on other sites
I do nearly the same thing for my leaves. I also store the number of primitives in a leaf in the node itself. Plenty of space, and it saves a dereference on empty nodes.

As for implicitly storing nodes: isn't that very memory AND cache inefficient? Note that this is just my reasoning, I haven't tested it, but deeper down the tree the node count hardly grows exponentially; if anything it should shrink at the deepest levels. You'll have seas of empty space deep in your tree, and no cache coherency at all.

However, after posting this topic I had a similar idea. To reduce the number of bits needed to specify an offset, you could, instead of specifying the distance from object to object, specify the distance from a trendline, for example j = 2*i, although I think j = i is better on average. If you store the widest level of the tree, you could use a varying trendline depending on whether the tree is expected to grow or shrink. That really should bring the number of bits needed for a decent-sized tree back down to something like 14, leaving a nice 16 bits for the split-plane location.
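A tiny sketch of that trendline idea (the names are mine, and it assumes the delta from the trendline j = 2*i fits in a signed 16-bit field):

```cpp
#include <cassert>
#include <cstdint>

// Instead of storing the absolute child index, store only the signed delta
// from a predicted position (the trendline j = 2*i discussed above). The
// delta needs far fewer bits than the absolute index.
inline int16_t encodeChildOffset(uint32_t parent, uint32_t child) {
    return int16_t(int32_t(child) - int32_t(2u * parent)); // delta from trendline
}

inline uint32_t decodeChildOffset(uint32_t parent, int16_t delta) {
    return uint32_t(int32_t(2u * parent) + delta);         // trendline + delta
}
```

The scheme only works while the tree's actual layout stays close to the trendline; a level that drifts too far would overflow the delta field.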

If you store vertices as 16-bit shorts, you only need 16-bit split-plane positions anyway.

However, I'm still not sure about the fixed-point aspect of all the maths, and its speed in particular. Maybe I should start a topic on that in the math forums.

##### Share on other sites
By tracking the AABB of a node while traversing the tree, you could use your 16 bits like this:

```cpp
unsigned int splitIndex = m_nodes[currentNodeIndex];           // upper 16 bits are the split position
float weight = float(splitIndex) * (1.0f / float(0xffffffff)); // weight in [0, 1]
float splitPos = aabbMax[axis] * weight + aabbMin[axis] * (1.0f - weight); // actual position on the split axis
```

You could probably skip clearing the lower bits, since they would only shift the position very slightly.

##### Share on other sites
Quote:
Original post by Eelco
I also store the number of primitives in a leaf in the node itself. Plenty of space, and it saves a dereference on empty nodes.
...
As for implicitly storing nodes: isn't that very memory AND cache inefficient? Note that this is just my reasoning, I haven't tested it, but deeper down the tree the node count hardly grows exponentially; if anything it should shrink at the deepest levels. You'll have seas of empty space deep in your tree, and no cache coherency at all.

Right -- if the leaf is empty I save the offset as 0, so there is no dereference. As for memory/cache efficiency, those are 2 different questions. Yes, memory usage is exponential, but even at the 20th level it's 4 megs. Beyond that it gets much less efficient. Phantomus talked about a 45-deep tree for the Buddha... I don't have 128 terabytes of RAM, so it's not an option for me :)

As for cache efficiency, that was the whole reason I originally came up with this scheme. It lets you determine the address of grandchild nodes without dereferencing, and prefetch way ahead of time. Also, grandchildren are stored consecutively in 32 bytes -- the size of a cache line -- so they can all be prefetched in one call.

As it turned out, when I profiled with prefetching turned off, it really didn't spend any time stalling on d-cache. I'm honestly not sure why this is. I was kind of mad because it eliminated what I thought was a huge performance benefit of my design. Oh well.

##### Share on other sites
Yeah, that's sort of what I was thinking. By specifying everything (memory locations and positions in space) relative to a parent, you can achieve the needed resolution in only a couple of bits. However, I think any casts to float ought to be avoided in your inner loop, so another method should be used.

For example, you could store 7 exponent bits and 9 mantissa bits to hold a truncated float in the range [0, 1].
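One way to sketch the truncated-float idea is to simply keep the top 16 bits of the IEEE-754 bit pattern. Note this gives sign + 8 exponent + 7 mantissa bits rather than the 7 + 9 split suggested above, which would need explicit repacking:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Keep only the upper 16 bits of a float's bit pattern. Values whose
// mantissa fits in the kept bits (e.g. 0.5, 1.0) round-trip exactly;
// others lose low-order mantissa precision.
inline uint16_t truncateFloat(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);   // well-defined type punning
    return uint16_t(bits >> 16);
}

inline float expandFloat(uint16_t packed) {
    uint32_t bits = uint32_t(packed) << 16; // dropped bits come back as zeros
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

The relative error of this truncation is bounded by the 7 kept mantissa bits, i.e. roughly 1 part in 128, uniformly across the range.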

Thinking about the implicit storage of nodes again, I just realized a way to do it with minimal memory waste. Say you have a tree level i and j = i+1, with two indices into them, I and J respectively. For each node in level i, count how many leaves came before it in i. Since leaves won't have children, you can then store semi-implicit pointers of the form J = j.start + 2*(I - leavesBeforeI). If you put all node pairs which are both children at the end of a level, I think you could build huge trees with only a couple of bits.
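A rough sketch of that counting scheme (names are hypothetical; a real implementation would precompute the leaf counts as a prefix sum rather than scanning each time):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// levelIsLeaf: for each node of level i (in order), whether it is a leaf.
// For an internal node at position I within level i, returns the offset of
// its left child from the start of level i+1. Children are allocated only
// for internal nodes, two per node, so the offset is 2 * (number of
// internal nodes preceding I) -- equivalent to 2*(I - leavesBeforeI).
uint32_t leftChildOffsetInNextLevel(const std::vector<bool>& levelIsLeaf,
                                    uint32_t I) {
    uint32_t internalRank = 0;            // internal nodes before position I
    for (uint32_t k = 0; k < I; ++k)
        if (!levelIsLeaf[k]) ++internalRank;
    return 2u * internalRank;
}
```

Since leaves reserve no child slots, level i+1 is packed tight, avoiding the "seas of empty space" of fully implicit 2i/2i+1 addressing.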

##### Share on other sites
Quote:
Original post by ajas95
... As it turned out, when I profiled with prefetching turned off, it really didn't spend any time stalling on d-cache. I'm honestly not sure why this is. I was kind of mad because it eliminated what I thought was a huge performance benefit of my design.

Yeah, I'd also have thought prefetching grandchildren would be a good idea.

But you're saying it basically didn't matter? Was it only the prefetching that didn't help, or was the switch to 4-byte nodes simply completely useless?

By the way, even if there is no speed benefit, I still like the idea of fixed-point math. I can't stand floats and the fact that their resolution isn't constant, whereas for such an application it obviously should be.

##### Share on other sites
Quote:
Original post by Eelco
By the way, even if there is no speed benefit, I still like the idea of fixed-point math. I can't stand floats and the fact that their resolution isn't constant, whereas for such an application it obviously should be.

I'm not sure squeezing nodes into 32 bits is a win. First, once you have accessed a node you've pulled in an entire cache line for it anyway, say 64 bytes (read: locality matters more).

Then, and that's the biggest problem in my eyes: when doing a traversal à la Wald, you'd better have the distance computed as early as possible, because you have few cycles to spare for concurrent computation before you have to decide (that is, use that distance) where you're going.
Going from fixed point to float/vector is going to take time you don't have; unless you make an all-fixed-point raytracer :)

Btw, there's a good reason to use 3+1 bits to encode the leaf flag and axis (as Wald pointed out): with the leaf bit as the MSB you can just test for signedness to know whether it's a node or a leaf.
Then if you store the axis in the lowest bits, you just need an AND with a small immediate to extract it, and you don't need to shift/scale the offset before you use it.
Can't be more efficient computationally.
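A minimal sketch of that layout (helper names are mine; the field positions follow the description above):

```cpp
#include <cassert>
#include <cstdint>

// Leaf flag in the most significant bit: reinterpreting the word as signed
// turns the leaf test into a plain sign test. Axis in the two lowest bits:
// extracted with a single AND against a small immediate, so the remaining
// offset bits need no shifting before use.
inline bool isLeaf(uint32_t word) {
    return int32_t(word) < 0;      // sign bit set => leaf
}

inline uint32_t splitAxis(uint32_t word) {
    return word & 0x3u;            // AND with a small immediate
}
```

Both operations compile to a single instruction on typical hardware, which is the point tbp is making.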

##### Share on other sites
Quote:
Original post by tbp

I'm not sure squeezing nodes to 32bits is a win. First, once you have accessed a node you've wasted an entire cache line for it, say 64 bytes (read locality matters more).

yeah that makes sense.

Quote:
Then, and that's the biggest problem in my eye, when doing a traversal à la Wald you better have the distance computed as early as possible, because you have few cycles to waste on concurrent computation before you have to decide (that is, use that distance) where you're going.
Going from fixed point to float/vector is going to take time you don't have; unless you make an all fixed point raytracer :)

Does the smiley indicate you think that's a bad idea? Note that I'm not a numerical guru, but I have the feeling floats are used far too widely, in applications where they shouldn't be. However, going fixed point is often considered a hassle, because everything is built around floats, for seemingly no other reason than that they're assumed to be the better choice.

Why would I want to store vertex positions as floats? I might want to do so for vertices in camera space, where adaptive resolution makes sense, but I see no compelling reason to let my coordinates in model space have a resolution of around 2^-126 near the origin, and a resolution roughly a hundred binary orders of magnitude coarser just one unit away from it. Floats let me cover a huge range, but my model space (in which I perform all calculations) is very much a bounded domain, so any bits able to specify positions beyond it are wasted.

Quote:
Btw, there's a good reason to use 3+1 bits to encode the leaf flag and axis (as Wald pointed out): with the leaf bit as the MSB you can just test for signedness to know whether it's a node or a leaf.
Then if you store the axis in the lowest bits, you just need an AND with a small immediate to extract it, and you don't need to shift/scale the offset before you use it.
Can't be more efficient computationally.

Yeah, that makes sense from an efficiency POV. I was assuming simple operations like bit-shifting and bit-masking could be scheduled away somehow, but I suppose that's a little optimistic.

So your conclusion is that going 32-bit is a waste of time? I'm certainly inclined to say so after the input I've been given (for which I'd like to thank you all).

However, I'd still like to investigate the possibilities of fixed-point tracing further. I have applied fixed-point math to a raycaster once, though needless to say it was a little less time critical. I can't remember the specifics, but it was a huge speed boost in texture lookup/interpolation anyway, and in general I don't see why it has to be slower, as long as you stay away from casts. And of course make sure your algorithms are stable with fixed-point math under all circumstances, i.e. no overflows.
