# Raytracing - very, *very* weird kdtree problem

This topic is 4761 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

-edit- ack, never mind... my triangle intersection code was playing tricks with me. Anybody got any precise intersection code for triangles? [Edited by - Megin on January 4, 2005 6:01:54 AM]

##### Share on other sites
i suppose the problem lies in the axis-alignedness of the triangles that form the box. the problem might be that your kd-tree fits way too tight around those tris, thereby missing the tri due to numerical error.

try a randomly oriented triangle and see if the problem disappears.if so i would suggest giving your splitplanes a slight offset.

##### Share on other sites
Here's some triangle intersection code:

const unsigned int modulo[] = { 0, 1, 2, 0, 1 };#define ku modulo[k + 1]#define kv modulo[k + 2]int TriAccel::Intersect( RayPacket* a_Packet, IData* a_IData, int r ){	int k = rayidk & 3;	int aidx = r << 2;	const real lnd = 1.0f / (a_Packet->dcell[(k << 2) + r] + nu * a_Packet->dcell[(ku << 2) + r] + nv * a_Packet->dcell[(kv << 2) + r]);	const real t = (nd - a_Packet->ocell[(k << 2) + r] - nu * a_Packet->ocell[(ku << 2) + r] - nv * a_Packet->ocell[(kv << 2) + r]) * lnd;	if (!(a_IData->dist[r] > t && t > 0)) return MISS;	const real hu = a_Packet->ocell[(ku << 2) + r] + t * a_Packet->dcell[(ku << 2) + r] - au;	const real hv = a_Packet->ocell[(kv << 2) + r] + t * a_Packet->dcell[(kv << 2) + r] - av;	real u = hv * bnu + hu * bnv;	if (u < 0) return MISS;	real v = hu * cnu + hv * cnv;	if (v < 0) return MISS;	if ((u + v) > 1) return MISS;	a_IData->dist[r] = t;	a_IData->u[r] = u;	a_IData->v[r] = v;	a_IData->tacc[r] = this;	return HIT;}

It assumes that the following data members for class TriAccel:

int rayidk;real nu, nv, nd;real bnu, bnv;real cnu, cnv;real au, av;

And it assumes that these fields are filled using the following precomputations:

vector3 A = a_Prim->GetVertex( 0 )->GetPos();vector3 B = a_Prim->GetVertex( 1 )->GetPos();vector3 C = a_Prim->GetVertex( 2 )->GetPos();vector3 c = B - A;vector3 b = C - A;vector3 N = b.Cross( c );int u, v;if (_fabs( N.x ) > _fabs( N.y)){	if (_fabs( N.x ) > _fabs( N.z )) rayidk = 0; else rayidk = 2;}else{	if (_fabs( N.y ) > _fabs( N.z )) rayidk = 1; else rayidk = 2;}u = (rayidk + 1) % 3;v = (rayidk + 2) % 3;real krec = 1.0f / N.cell[rayidk];nu = N.cell[u] * krec;nv = N.cell[v] * krec;nd = N.Dot( A ) * krec;real reci = 1.0f / (b.cell[u] * c.cell[v] - b.cell[v] * c.cell[u]);bnu = b.cell[u] * reci;bnv = -b.cell[v] * reci;cnu = c.cell[v] * reci;cnv = -c.cell[u] * reci;au = A.cell[u];av = A.cell[v];

This is heavily based upon Ingo Walds thesis on real time ray tracing.

Shameless plug: You can see this code in action in this demo:
[url]http://www.bik5.com/bunny.zip[/url] (probably P4 only, uses sse2).

##### Share on other sites
what happens when:
(a_Packet->dcell[(k << 2) + r] + nu * a_Packet->dcell[(ku << 2) + r] + nv * a_Packet->dcell[(kv << 2) + r])

equals zero? Is it possible? I assume that's the case where the ray is coplanar with the triangle. Surely in something like raytracing, which fires 50 bazillion rays, the problem is not insignificant.

There is a method that compares the signs of the determinants of the ray segment with each pair of triangle vertices that is the most stable, and very fast, but doesn't generate t,u,v. If you wanted to do the most precise (not generally fastest), you could test ray/tri with that method, and then use double-precision to calculate t,u,v for triangles found to intersect.

I don't know of anyone that actually needs to do that though... unless you're into RTing extremely small triangles that are coplanar to the viewpoint :)

Also, I really wish you wouldn't edit away your question just because you figured it out. That's the point of forums. Someone has a similar problem later, they search the forums, find your post, read the discussion, and hopefully enjoy a similar solution.
[/edit]

[Edited by - ajas95 on January 4, 2005 12:47:51 PM]

##### Share on other sites
sorry about that, different conventions on different forums I suppose. I'll keep it in mind if I ever post a question again.
Eelco was right on the money though, the boundingbox was too tight around the triangle. (ie: in one of the axis the difference between the minimum and maximum coordinates was precisely 0.0).
The way I figured there are two solutions: either make the bounding box a tad bigger (something like 0.001 in all directions), or add a small ammount to the distance traveled inside the node (assuming use of havran's (?) appendix C traversal code), so objects exactly against the back "wall" of the node don't get cut off.
Thanks for all the intersect code, now I'll have to figure out what suits my tracer best :)
phantomus, that code you posted seems to have alot of precomputed data. Have you had any problems with memory usage when rendering huge ammounts of triangles?

Oh, by the way, has anyone found the mailboxing technique worthwhile to implement?

##### Share on other sites
Megin: It's 48 bytes of precomputed data. However, you can delete your original triangle data, as it's not needed any longer to test for intersections. In practise, you'll want access to a material and three vertices, so I extended it to 64bytes. But once that's done, there's really no reason to keep the original data, so 64bytes is all you need (compared to about 32 bytes for a regular triangle). My ray tracer needs about 200Mb to render a 1.1M model. This includes triangle data, the kd-tree and various other structures.

Mailboxing: It's worthwhile as long as your kd-tree sucks. :) It always gave me some gains, but now that the dust has settled in kdtree.cpp it doesn't give me any improvements anymore (could just as well accept the extra intersections) but it doesn't hurt either. I believe this is what everyone is experiencing atm.

##### Share on other sites
Quote:
 Original post by phantomusMailboxing: It's worthwhile as long as your kd-tree sucks. :) It always gave me some gains, but now that the dust has settled in kdtree.cpp it doesn't give me any improvements anymore (could just as well accept the extra intersections) but it doesn't hurt either. I believe this is what everyone is experiencing atm.

interesting, i didnt know that.

however, i plan on tracing cubic beziers, and since their AABB is less optimal, and their intersection cost much higher compared to tree traversal, it should still be worth it.

however, im not really sure whats the best way to implement a mailbox. just keep a list (or probably better, a fixed size array), and compare entries before intersecting? seems like quite some overhead, or are there more clever techniques?

maybe give each patch a field that holds the last rayID it was intersected with? a little more memory, but youd only have to make one check. memory isnt an issue anyway when dealing with higher order surfaces instead of triangles, so i suppose ill go with that. on the other hand: how do you generate a guaranteedly unique rayID? maybe if youd use only one bit, set it to 1 when intersected, and set all 1's back to 0's when youre done with a ray. thatd still require building a list of intersected patches, but you wouldnt have to check against it every intersection.

##### Share on other sites
For more expensive primitives, it's definitely worth it, unless you have a very efficient early out test (like I have in my ray/tri code).

I implemented it as follows: Each ray has a unique ID. When a primitive is tested, this ID is stored for the primitive. A primitive is only tested if it's stored ID is not equal to the ID of the current ray. Very simple, works very well.

##### Share on other sites
so how do you generate a unique ID?

it seems kindof like an impossible task to me. theres only a limited number of ID's in an x bytes ID, and youd need as much bytes as it takes to define a ray (two 3d float vectors, 2*3*4 = 24 bytes) to store a truely unique ID. now i realize you can simple use a 32bit ID with some magic, but there is absolutely no guarantee two different rays wont end up with the same ID when they obviously shouldnt.

you probably wont notice, but it does look a tad hackish to me.
giving each primitive a 1 bit field 'haveibeenintersectedwiththecurrentrayyet?' seems like a more elegant solution to me, except for the fact youll need to reset those bits again. however, that might be accomplished efficiently by for example putting the used primitives in the first elements of the array holding them all. might give some nice cache benefits aswell :).

##### Share on other sites
Hm you're right, perhaps that's causing the small speckles I have in the demo I pointed to a few posts above this one (did anyone try it? could you post some performance figures and machine specs?). Basically I start with ID 0, and add one to this for every generated ray. It's an unsigned int, so there are quite some rays generated before it wraps. Should be pretty safe.

Using your one bit approach requires you to reset the bits. And that's not going to help performance. That list that you mention is going to be huge, on my bunny model about half of the triangles will be in it. For the 1.1M buddha model it's a nightmare. And even for that model, the incrementing ID will cycle only every x frames so why bother. ;)

##### Share on other sites
Quote:
 Original post by phantomusHm you're right, perhaps that's causing the small speckles I have in the demo I pointed to a few posts above this one (did anyone try it? could you post some performance figures and machine specs?). Basically I start with ID 0, and add one to this for every generated ray. It's an unsigned int, so there are quite some rays generated before it wraps. Should be pretty safe.

ill try it after i finish this post.
uint, 4 billion isnt it? i agree, considering you only cast a couple of million rays / frame, there would only be a change once every 1000 frames for a patch not to be hit at all 1000 frames long, then to be hit by a ray with the exact same ID. not going to happen, realisticly. not an optimal solution though, imho.

Quote:
 Using your one bit approach requires you to reset the bits. And that's not going to help performance. That list that you mention is going to be huge, on my bunny model about half of the triangles will be in it. For the 1.1M buddha model it's a nightmare. And even for that model, the incrementing ID will cycle only every x frames so why bother. ;)

i dont think its that bad. i keep my patches in an array inside my mesh class. if i just swap the references to the tested patches to the front of the array (a cheap operation youll agree), resetting the bits will be a joke. just count how many patches youve intersected (only something like 3 on average at most) and reset their bits.

##### Share on other sites
Quote:
 Original post by phantomusdid anyone try it? could you post some performance figures and machine specs?

I tried it on a celeron 600 with 128M RAM running XP (i know, i know, that's the work conditions they gaved me!) and it crashed. I don't know if the use of sse2 could have made the crash, but it's possible.

##### Share on other sites
Celeron, that's PII class right? Nah that's not gonna cut it. :)

"not an optimal solution though, imho."

Why on earth not? :)

"i dont think its that bad"

It's not bad, but it's not better. My approach takes NO postprocessing and you're never gonna beat that. Anyway those are your clockcycles, you may decide what to do with them. ;)

##### Share on other sites
mobile p4 centrino 1300, it maintained an easy 5 fps, very very awesome!

however, seeing the size of such a bunny and the accompanying kdtree, reinforces my choice to go with higher order surfaces :P. also, do you use vertex normals? i see raytracing of higher order surfaces as a good way to get rid of that hack once and for all.

##### Share on other sites
oh yeah.

i did see some speckles, but i dont think its even possible they were due to your mailboxing method, since none of the affected triangles had the chance not to be hit. it was probably just passing of rays trough edges. another problem i hope to reduce by using barycentric patches. patches rule.

as for the postprocessing you mentioned: three dereferences and three bits to reset (on average). doesnt seem significant to me, but if i can cram that bit in somewhere else, it does save me 4 bytes/primitive. (3 if i dont).

another idea: instead of a bit, i give a primitive a pointer to the primitive that was hit before it. if this pointer is null, the primitive hasnt been hit yet, otherwise it is. to reset these pointers, just unraffle the chain.

this allows me to keep the primitives not by reference, but directly in the mesh structure, which in turn allows for 2 byte pointers. (aswell as less memory fragmentation and better cache behavior)

##### Share on other sites
5fps is nice, that's in line with my own measurements.

But you're right: To get a decent amount of detail I use 69k triangles (well that's not completely true, I justed wanted to render a standard model). Higher order surfaces would probably much more efficient, and perhaps the lower memory usage would be more friendly to the cache also. But I'm not sure about this vertex normal thing: Obviously you don't have to interpolate normals based on barycentric coords anymore but how about texturing?

##### Share on other sites
O btw I have to admit I used one trick. The shadow rays start at a specified offset from the intersection point of the primary ray and the mesh. The advantage is that these rays don't have to traverse the dense kd-tree structure near the bunny mesh; for two light sources this virtually doubles the performance of the demo. The drawback is that some small features don't cause self shadowing. This is noticeable for the ears: Without the trick, the edge of the ear casts a shadow on the inner ear; with the trick it doesn't. But at 100% speed improvement, I thought it was a nice trick nevertheless.

##### Share on other sites
Quote:
 Original post by phantomus5fps is nice, that's in line with my own measurements.But you're right: To get a decent amount of detail I use 69k triangles (well that's not completely true, I justed wanted to render a standard model). Higher order surfaces would probably much more efficient, and perhaps the lower memory usage would be more friendly to the cache also. But I'm not sure about this vertex normal thing: Obviously you don't have to interpolate normals based on barycentric coords anymore but how about texturing?

well, much more efficient not im affraid. youll have much lower memory requirements, so indeed better cache, and less tree traversal. however, intersection aswell as normal calculation is considerably more expensive. texturing shouldnt be a problem. a good intersection algo returns UV coords. personally i dont like vertexnormals, because they are a hack, and one that potentially bites you in the ass with raytracing, because youll have normals that arnt always perpendicular to your surface, potentially messing up reflection and refraction calculations. higher order surfaces give you even smoother results, without the hacks, and at very much reduced memory costs.

personally i plan to go with triangular cubic bezier patches. they have some nice benefits in raytracing, such as a reduced chance of a ray passing between two of them, and i think quads are totally retarded as a primitive to model with.

im still in research phase however, im not sure yet whats the best way to combine all of my goals. wald also did a nice paper about higher order surface tracing. 'ray tracing triangular bezier patches' is also a good read imo.

##### Share on other sites
Quote:
 Original post by phantomusO btw I have to admit I used one trick. The shadow rays start at a specified offset from the intersection point of the primary ray and the mesh. The advantage is that these rays don't have to traverse the dense kd-tree structure near the bunny mesh; for two light sources this virtually doubles the performance of the demo. The drawback is that some small features don't cause self shadowing. This is noticeable for the ears: Without the trick, the edge of the ear casts a shadow on the inner ear; with the trick it doesn't. But at 100% speed improvement, I thought it was a nice trick nevertheless.

i must say i did think your shadows looked slighty 'off', although i wasnt able to specify why. in retrospect i would indeed describe it as a lack of selfshadowing. clever trick though.

##### Share on other sites
Hi again!

Tested again on a celeron 2.00 Ghz with 512Mb RAM and got a steady 6.* fps.

##### Share on other sites
That's better. Looks like I'll have to revert to distributed rendering for truely interesting frame rates though..

##### Share on other sites
I tested it on a p4 2.8Ghz (512MB ram) and I get a fabulous 8-10 FPS (though it doesnt work on my own AMD PC).

##### Share on other sites
excuse my poor benchmarking skills.

i just ran your app again, then realized i had wmp running. closing it gave me another fps! combining to a whopping 6 fps total.

btw, what resolution is your window? 400x300 would be my estimate, but id like to know for sure.

##### Share on other sites
For shadow ray, why not cast backwards from light source? Then you only test against the dense bunny tree if the point is illuminated.

Ray ID: even with 16 bits, the likelihood of your overflow problem is very small. For example, a ray doesn't get intersected in 64k ray tests, then DOES on precisely the 64kth? Even 8 bits, the chance is small, but will undoubtedly miss a tri at some point. I guess you could figure it,
P(false negative) = p * (1 - p)^n * p * i, (i.e. it gets tested, then doesn't get tested for n rays, then gets tested again, and there's an intersection (that we miss because of early exit))

Where n = Id space, p = probability a tri will be tested for an arbitrary ray, and i is the probability that given a tri is tested, an intersection is found.

Say with n = 256, p = .0005 (it depends) and i = 0.1

probability of a false negative on a triangle is 1 out of every 45M triangles would be missed. Obviously these parameters are made up, but remember that's just 8-bit ID space!

Pre-computed tri data: For 48 bits, you can pre-compute cross-products for a triangle, and then your ray/tri test is a sub and 3 dots (like 24 cycles)! (no u,v,t, but who cares about the 60 cycles to compute that once an intersection is found). And really, it's only 36 bytes extra.

Good point about deleting vertex info, though, memory access is definitely the hard part. You could prefetch all your data as soon as you hit a leaf and maybe it wouldn't be too bad.