Sign in to follow this  
Stonemonkey

Stepping through a height map to find intersection

Recommended Posts

Hi all, I have something I want to speed up. At the moment I have a height map (max size 256*256) with positive values and a vector always with positive z and a start point somewhere on the map with z=0 and want to find where the vector intersects the height map. I only need the element, not accurate intersection coords.

It all starts off as floating point (the height map, the vector, and the start point) and at the moment I am scaling the vector scale3d(vector,1.0/max(vector.x,vector.y)) then convert the x and y components to 8:8 fixed point for stepping through the map and do the height map z comparison as floats

p_loop:
paddw mm3,mm4 //add x and y for next element (fixed point)
Addss xmm5,xmm6 //add z (float)

Movd edx,mm3 //get index for height map lookup
Shr edx,8
Shl DX,cl
Shr edx,cl
And edx,ebx //mask for wrapping coords
Comiss xmm5,[esi+edx*4] //compare z with heightmap
Jb p_loop //repeat if still lower
So either the x step or the y step (whichever is largest) is equal to 1 and the other is fractional.

I've tried increasing the step to 2 and then when the comparison passes then step back 1 to do a final check, although not petfect that's adequate and does give me some speed up but I'm wondering if there's better ways of going about this.
It seems like a similar problem to voxel landscapes but I can't use the method of vertical scanning used there.

Share this post


Link to post
Share on other sites

Since the algorithm has no mathematical optimizations yet, a more advanced data structure in a high level language could easily outperform the early micro optimizations while also being more future proof and portable.

So move back a little to think about the statistical properties of heightmaps.

Usually, a height map has little difference between a point and the highest neighbor.

This can be used to skip comparisons when you know that a large field has a certain maximum height.

Then you can use a bounding box for 16x16 samples to confine all height values below a maximum plane.

Then 4x4 of those boxes are grouped together and so on in a heap structure remembering the maximum of each group.

When intersecting, you only intersect down into the actual samples after passing the test of each group's bounding box leading to it.

By storing the groups based on size in a heap, the largest groups that are used most frequently are stored together like in planar pyramid images for a much better cache efficiency.

 

To keep things easy to debug as the project grows larger, always make a reference implementation first so that automated tests can cast millions of rays with both methods and assert correctness.

You should also encapsulate the grid heap structure so that it looks like a regular image format to the outside but with line intersection.

Edited by Dawoodoz

Share this post


Link to post
Share on other sites
Ah, that what I have is in asm isn't exactly due to early attempts at optimizing, initially I was working on normal mapping using SSE, it was only when I'd got that working that I realised I had the eye vector in texture space and could use that to step through the height map for what I later found out was parallax mapping and it was pretty easy to just add that in as asm code although it was slow from having to step through it. So yes, I realise that the algorithm is what needs looking at.

I'm going to look into what you're suggesting here, could take a bit of organising to get the pyramid of maps, one thought I've just had though (may not be as cache or memory friendly) is instead of having my height map as floats, use 8 bit ints and keep the element size at 32 bits and store the group heights in the higher bytes of the element.

Another way I've just seen somewhere is to use the height of the initial element to extend the ray approximately and carry out a number of iterations

Share this post


Link to post
Share on other sites

A cache miss from storing 32 bit floats would be much worse than some cycles spent on unpacking and if you solve the puzzle well, you might get an advantage from using small types in your calculations. On ARM NEON, the math operations would be easier to distribute on more lanes but SSE has the advanced memory loads that you can take advantage of.

Share this post


Link to post
Share on other sites
OK thanks, I'm working on x86 at the moment but will be trying it out on arm neon later.

I'm trying to come up with ways of doing this (the idea of using the current height to extend the vector didn't work too well) but I'll have to come up with a different way of stepping through the group elements from what I'm currently doing, I can get away with missing the corner of a single element but not the corner of a 16x16 group.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this