• Advertisement
Sign in to follow this  

Cone Step preprocess algorithm for GPU

This topic is 2618 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts


[font="Calibri"]I wish to present fast and simple algorithm that will prepare texture for Cone Step Mapping. The algorithms that I know work slow and were implemented for CPU only. The usual times presented in different publications are minutes to hours. My implementation for GPU, if running on GF9600GT (low cost budget GPU), takes 20-30ms for 256*256 texture and 500-1500ms for 1024*1024.[/font]

[font="Calibri"]I assume that you are familliar with Cone Step Mapping algorithm. This algorithm requires 2 additional channels for texture: a height channel that stores height in the points and a tangent channel that stores internal tangents (or square roots of tangents). During the draw these 2 channels are interpolated so for every point in the texture space the correct height is available. This interpolation brings a requirement for tangent channel: every texel must contain a tangent value that is minimum from 3*3 nearest corners. This requirement applied during the post-process after all tangents are known. Another approach is to use point filtering for these 2 channels. This is possible if the height channel contains a cap for a texel: maximum height from 4 texel’s corners. The tangent channel just contains a minimum tangent that is calculated for texel surface. In this case there is no need for postprocess and final tangents are better than for classic approach. But draw loop itself is more complex due to spaces below the caps.[/font]

[font="Calibri"]Anyway, the code attached can be configured for any approach. It also can be configured for tiled/non-tiled textures. If texture is tiled, the algorithm executes 4 times for different points: this increases time by 5% only.[/font]

[font="Calibri"]The algorithm is stack-based. As you know, current pixel shaders do not have indexed register access, so I used binary search for stack Push and Pop operations. I tried to use array shift also, but unfortunately this worked even slower than binary search using flow control.[/font]

[font="Calibri"]The algorithm can process square textures only, the side size must be power of 2.[/font]

[font="Calibri"]Before the main search, you must prepare acceleration textures:[/font]

[font="Calibri"]-[/font] [font="Calibri"]You must find a pyramid of min-max height textures: if initial texture is 32*32, create texture 16*16 that will keep max and min values for every 2*2 rectangle from the initial 32*32 height texture. Also find 8*8 level, 4*4 and so on down to 1*1. This is as for QuadTree displacement mapping algorithm, also known as maximum mip-map displacement mapping. Then pack all these levels into one 2-channel 32*16 texture: level 1*1 should be placed at point {1,0}; level 2*2 at point {2,0}; … level 16*16 at point {16,0}[/font]

[font="Calibri"]-[/font] [font="Calibri"]Create 4 channel height texture of original size: every channel contains height value from the corners of the texel. This step is optional. The code attached may work with initial one channel height texure also. But 4 channel height texture saves instructions and increase speed by 10%[/font]

[font="Calibri"]Finally, before the main search you must have 2 textures: 2 channel pyramid atlas texture and 1 or 4 channel height texture.[/font]

[font="Calibri"]Now algorithm (for every texel):[/font]

StackPointer = 0
SearchRect = {0, 0} // coordinates
SearchLevel = 0 // this means that SearchRect has size == 1
Tangent = MAX
NeedPop = false

While (true) {
Get MinMax height for SearchRect
Compare {rect,MinHeight} against the current cone: update tangent if cone intersects the rect
Compare {rect,MaxHeight} against the current cone:
if rect is below the cone, set NeedPop = true
else go into the rect:
if level == MAX
calculate tangent for every of 4 points contained in the rect,
update tangent and set NeedPop = true
else subdivide rect:
subdivide rect to 4 subrects;
find order of walk: nearest first, far last;
encode order of 3 subrects into a single floating point value
push <rect_x, rect_y, level+1, code> into the stack
select nearest subrect, increase level and continue with this subrect

if NeedPop == true then pop saved rect from the stack:
If StackPointer == 0 then break
Set NeedPop = false
Get {rect, level, code} from the top of the stack
Modify code at the top of the stack
Calculate subrect using {rect, level, code}
If subrect is last then decrement StackPointer
Use this subrect

[font="Calibri"]A very important note about usage under the Microsoft Vista and Windows 7. If triangle takes too much time for rendering (more than 2 seconds), the Windows thinks that the driver hangs and restores the driver. In this case you never get results. So subdivide viewport to patches of 128*128 size and render them one after one. Using this approach it is possible to preprocess 4096*4096 textures: full preprocess takes ~40 seconds.[/font]

[font="Calibri"]Few words about encoding of walk order. For a parent rectangle, there are 4 subrectangles. I call them as {0,0} {1,0} {0,1}{1,1}. If nearest subrect p0 is {dx, dy}, the far (or p3) subrect is {1-dx, 1-dy}. The p1 and p2 are {dx, 1-dy} and {1-dx, dy}. In some cases p2 should be accessed before the p1. This depends on a position of rect against the diagonal crossing through the test point. You may use DIAGONAL_SWITCH define: sometimes this increases speed, sometimes no.[/font]

[font="Calibri"]Let the order is {dx,dy},{ax,ay},{bx, by}, {1-dx,1-dy}[/font]

[font="Calibri"]Then encode this to a single floating point value: [/font]

[font="Calibri"]ax + ay*2 + bx*4 + by*8 + (1-dx)*16 + (1-dy)*32 + END_MARKER[/font]

[font="Calibri"]then it is possible to extract {x_shift, y_shift} back: just multiply code by 0.5 and use fractional part, then clear code from the fraction. If code == END_MARKER, all subrects are processed.[/font]


[font="Calibri"]Excuse me, but I can attach assembler code only. I started with the HLSL sources, but unfortunately microsoft compiler could not compile such sources: the O1 optimization generated too poor code (or even could not compile – it wanted more than 32 registers), the O3 optimization thinked 40 minutes and then failed with unbelivable messages. So I moved to assembler code.[/font]


[font="Calibri"]Mikhail Levashov,

Share this post

Link to post
Share on other sites
This sounds very interesting! I'm familier with Cone Step Mapping but have never implemented the algorithm. It the algorithm I'm intending to use when I get around to adding sub-polygon details to my engine. Well actually the relaxed varient, RCSM. I believe the input textures ate the same in both cases, it's just a question of how you use them?

Anyway, great work and thanks for sharing!

Share this post

Link to post
Share on other sites
Sign in to follow this  

  • Advertisement