from standard code to PS -- need advice


Started by BlueChip March 31, 2005 03:11 PM

8 comments, last by Namethatnobodyelsetook 19 years ago

BlueChip

100

Author

March 31, 2005 03:11 PM

Hi folks... I need some information... In my "engine" I use this code for terrain rendering...


m_pd3dDevice->SetTexture( 0, m_pTextureLayers[root->IDTextureLayer0] );
m_pd3dDevice->SetTexture( 1, m_pAlphaMapArray[0] );
m_pd3dDevice->DrawIndexedPrimitive( D3DPT_TRIANGLELIST , 0, 0, maxnumberofvertex, 0, m_iNumOfChunkVertex/3);
m_pd3dDevice->SetTexture( 0, m_pTextureLayers[root->IDTextureLayer1] );
m_pd3dDevice->SetTexture( 1, m_pAlphaMapArray[1] );
m_pd3dDevice->DrawIndexedPrimitive( D3DPT_TRIANGLELIST , 0, 0, maxnumberofvertex, 0, m_iNumOfChunkVertex/3);
m_pd3dDevice->SetTexture( 0, m_pTextureLayers[root->IDTextureLayer2] );
m_pd3dDevice->SetTexture( 1, m_pAlphaMapArray[2] );
m_pd3dDevice->DrawIndexedPrimitive( D3DPT_TRIANGLELIST , 0, 0, maxnumberofvertex, 0, m_iNumOfChunkVertex/3);
m_pd3dDevice->SetTexture( 0, m_pTextureLayers[root->IDTextureLayer3] );
m_pd3dDevice->SetTexture( 1, m_pAlphaMapArray[3] );
m_pd3dDevice->DrawIndexedPrimitive( D3DPT_TRIANGLELIST , 0, 0, maxnumberofvertex, 0, m_iNumOfChunkVertex/3);

but this drops my frame rate from 80 to 30 FPS... Is there an alternative using pixel shader (PS) code? Would that PS code be faster? Are there other ways to implement it? Thanks, guys.

JohnBolton

1,373

March 31, 2005 08:54 PM

Definitely.

1. Convert m_pAlphaMapArray into a single 32-bit texture.
2. Set up all 5 textures to be passed into the pixel shader.
3. In the pixel shader, use the 4 channels of the alpha texture to modulate the values of the 4 textures and combine them into a single color.

The code you posted turns into this:

m_pd3dDevice->SetTexture( 0, m_pTextureLayers[root->IDTextureLayer0] );
m_pd3dDevice->SetTexture( 1, m_pTextureLayers[root->IDTextureLayer1] );
m_pd3dDevice->SetTexture( 2, m_pTextureLayers[root->IDTextureLayer2] );
m_pd3dDevice->SetTexture( 3, m_pTextureLayers[root->IDTextureLayer3] );
m_pd3dDevice->SetTexture( 4, m_pAlphaMapArray );
m_pd3dDevice->DrawIndexedPrimitive( D3DPT_TRIANGLELIST , 0, 0, maxnumberofvertex, 0, m_iNumOfChunkVertex/3);
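Steps 1 and 3 above can be sketched on the CPU as follows. This is only an illustration under assumptions: the names `PackAlphaMaps`, `BlendLayers` and `Color` are hypothetical, the channel assignment is arbitrary, and the blend is a plain weighted sum (a chain of lerps would match framebuffer alpha blending more closely). A real pixel shader would do the same arithmetic per pixel on the GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Step 1: pack four 8-bit alpha maps of identical size into one 32-bit texel
// array. Byte order follows an A8R8G8B8-style layout (A in the high byte).
// Assumption: layer 0 -> R, layer 1 -> G, layer 2 -> B, layer 3 -> A.
std::vector<std::uint32_t> PackAlphaMaps(const std::vector<std::uint8_t>& a0,
                                         const std::vector<std::uint8_t>& a1,
                                         const std::vector<std::uint8_t>& a2,
                                         const std::vector<std::uint8_t>& a3)
{
    assert(a0.size() == a1.size() && a1.size() == a2.size() && a2.size() == a3.size());
    std::vector<std::uint32_t> packed(a0.size());
    for (std::size_t i = 0; i < a0.size(); ++i) {
        packed[i] = (std::uint32_t(a3[i]) << 24) | (std::uint32_t(a0[i]) << 16) |
                    (std::uint32_t(a1[i]) << 8)  |  std::uint32_t(a2[i]);
    }
    return packed;
}

struct Color { float r, g, b; };

// Step 3: CPU reference of the per-pixel blend. Each layer color is modulated
// by its weight from the packed alpha texel and the results are summed.
Color BlendLayers(const Color layers[4], std::uint32_t packedAlpha)
{
    const float w[4] = {
        ((packedAlpha >> 16) & 0xFF) / 255.0f,  // layer 0 weight (R)
        ((packedAlpha >> 8)  & 0xFF) / 255.0f,  // layer 1 weight (G)
        ( packedAlpha        & 0xFF) / 255.0f,  // layer 2 weight (B)
        ((packedAlpha >> 24) & 0xFF) / 255.0f   // layer 3 weight (A)
    };
    Color out{0.0f, 0.0f, 0.0f};
    for (int k = 0; k < 4; ++k) {
        out.r += layers[k].r * w[k];
        out.g += layers[k].g * w[k];
        out.b += layers[k].b * w[k];
    }
    return out;
}
```

With this layout the four `SetTexture` calls for the layers plus one for the packed alpha texture feed a single draw call instead of four.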

John Bolton
Locomotive Games (THQ)
Current Project: Destroy All Humans (Wii). IN STORES NOW!

BlueChip

100

Author

April 01, 2005 07:16 AM

Hi JohnBolton... and thanks for your advice.

In your opinion, will this really help my FPS?

I don't know PS, but I can study it... but first I want to be sure :)
E.g. if I get 70 FPS with only 1 texture and 30 FPS with 4, can I get 50/60 FPS with 4 textures and PS?

bye bye

Namethatnobodyelsetook

1,260

April 01, 2005 10:54 AM

Even without pixel shaders, with just a more advanced TextureStageState setup, you could be much faster, but first let's look at what you're doing.

You're using 8 textures, over 4 passes. Drawing 4 passes takes a long time. As John pointed out you could drop down to 5 textures, and draw in a single pass, but that requires a GeForceFX (GeForce 3&4 are 4 textures, GeForce 1&2 are 2 textures), or a similarly new Radeon card.

Depending on which hardware you're targeting, you may already have "optimal" code, using 2 textures at once. You may want to look at figuring out which geometry uses which textures, and rendering each section of it using as few textures/draw calls as you can.

Our terrain renderer uses 4 textures, and the vertex diffuse color holds 3 alpha blending amounts, controlling how to mix the textures. That works on our target hardware.

Aiursrage2k

320

April 01, 2005 11:55 AM

A general solution:
A book I read says that you want to avoid changing states as much as possible. If you sorted your data by pixel shader and then by textures it could speed things up, because there could be fewer changes.

Insufficient Information: we need more information
http://staff.samods.org/aiursrage2k/

Namethatnobodyelsetook

1,260

April 01, 2005 12:05 PM

Yeah, rendering each chunk 4 times and putting the state changes between each chunk is certainly hurting.

Simply changing to loop over all the chunks between texture changes may speed things up. Dynamically building an index buffer and turning the drawing of the chunks into a single Draw call per pass would be better. If you can remove a pass or 3 on many of the chunks (you'd need a new dynamic index buffer per pass), that'd be even better. In order to do that though, you'll need a better idea of which sections of the map have an alpha of 0, and that's going to be hard if you're just using an alpha texture. Perhaps a loadtime preprocessing step could determine which chunks have alpha=0 across the entire chunk. Simply include or skip a chunk's indices in the dynamic buffer based on whether the preprocessing said the chunk was used or not.
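The load-time preprocessing step could look roughly like the sketch below. It assumes each layer's alpha map is available as a row-major array of 8-bit values and that the map dimensions are exact multiples of the chunk size; `ChunkHasAlpha` and `BuildLayerUsage` are hypothetical names, not part of anyone's engine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if any texel of the alpha map inside the square chunk
// starting at (x0, y0) is non-zero, i.e. the layer contributes there.
bool ChunkHasAlpha(const std::vector<std::uint8_t>& alpha, int mapWidth,
                   int x0, int y0, int chunkSize)
{
    for (int y = y0; y < y0 + chunkSize; ++y) {
        for (int x = x0; x < x0 + chunkSize; ++x) {
            if (alpha[static_cast<std::size_t>(y) * mapWidth + x] != 0) {
                return true;
            }
        }
    }
    return false;
}

// Builds one flag per chunk (row-major chunk order) for a single layer.
// A chunk whose flag is false can be left out of that layer's pass.
std::vector<bool> BuildLayerUsage(const std::vector<std::uint8_t>& alpha,
                                  int mapWidth, int mapHeight, int chunkSize)
{
    assert(mapWidth % chunkSize == 0 && mapHeight % chunkSize == 0);
    assert(alpha.size() == static_cast<std::size_t>(mapWidth) * mapHeight);
    std::vector<bool> used;
    for (int y0 = 0; y0 < mapHeight; y0 += chunkSize) {
        for (int x0 = 0; x0 < mapWidth; x0 += chunkSize) {
            used.push_back(ChunkHasAlpha(alpha, mapWidth, x0, y0, chunkSize));
        }
    }
    return used;
}
```

Running this once per layer at load time gives a small table that the per-pass index buffer build can consult when deciding whether to include or skip a chunk's indices.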

JohnBolton

1,373

April 01, 2005 12:33 PM

Quote:Original post by Namethatnobodyelsetook
... Our terrain renderer uses 4 textures, and the vertex diffuse color holds 3 alpha blending amounts, controlling how to mix the textures. ...

Ha ha. Somehow, it never occurred to me to put the blending values into the vertex color. I've been trying to figure out how to blend 4 textures at a time for so long now. Ha ha.

John Bolton
Locomotive Games (THQ)
Current Project: Destroy All Humans (Wii). IN STORES NOW!

BlueChip

100

Author

April 03, 2005 02:48 PM

Thanks to all... and sorry for my late answer.

Quote:
You're using 8 textures, over 4 passes. Drawing 4 passes takes a long time. As John pointed out you could drop down to 5 textures, and draw in a single pass, but that requires a GeForceFX (GeForce 3&4 are 4 textures, GeForce 1&2 are 2 textures), or a similarly new Radeon card.

Depending on which hardware you're targeting, you may already have "optimal" code, using 2 textures at once. You may want to look at figuring out which geometry uses which textures, and rendering each section of it using as few textures/draw calls as you can.

My goal was to use only 2 textures, for low-performance machines, but to do that I have a problem...
My technique is:
1) render the far-away terrain once with precalculated textures
2) render the nearby terrain 4 times with detailed textures

The problem is that I need 8*8 precalculated textures of 1024*1024 pixels,
i.e. 189MB with .bmp textures or 32MB with .dds textures.
If I want a larger terrain I must use too much memory...
My solution is to build all the textures at run time, but this kills my FPS...

So I've decided to use PS... but this changes my goal...

Quote:
Our terrain renderer uses 4 textures, and the vertex diffuse color holds 3 alpha blending amounts, controlling how to mix the textures. That works on our target hardware.

This is an excellent approach... but I don't know if a per-vertex alpha value is enough for me.

Quote:
A book I read says that you want to avoid changing states as much as possible. If you sorted your data by pixel shader and then by textures it could speed things up, because there could be fewer changes.

I don't think this is a good fit in my case.

Quote:
Simply changing to loop over all the chunks between texture changes may speed things up. Dynamically building an index buffer and turning the drawing of the chunks into a single Draw call per pass would be better. If you can remove a pass or 3 on many of the chunks (you'd need a new dynamic index buffer per pass), that'd be even better. In order to do that though, you'll need a better idea of which sections of the map have an alpha of 0, and that's going to be hard if you're just using an alpha texture. Perhaps a loadtime preprocessing step could determine which chunks have alpha=0 across the entire chunk. Simply include or skip a chunk's indices in the dynamic buffer based on whether the preprocessing said the chunk was used or not.

My terrain is in a quadtree... I render the index buffers of the tree's leaves.
These index buffers reference pieces of different vertex buffers, with different textures (always 4, but they vary).
I fear that dynamically building an index buffer is not the right solution for me... too many small index buffers.

Anonymous

April 03, 2005 09:37 PM

Traversing the quadtree should be considered a frustum-culling and information-gathering stage. During or even after deciding what's potentially visible, you may do occlusion culling.

Once you've decided what you're drawing, you should sort by materials/textures/shaders. In a case of matching states, you should sort by Z to take advantage of quick Z rejection. So for example, if you have 12 trees and 4 crates, you'd want to sort such that you draw the 12 trees together and the 4 crates together. Next, you may want to sort the trees by Z, and the crates by Z. Finally, you may determine which to draw first by the closest Z, or by expected estimated screen coverage, or some combination of the two.
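A minimal sketch of that sort, assuming each visible item carries an integer key for its texture/shader set and a view-space depth. `DrawItem`, `SortForRendering` and `CountTextureSwitches` are hypothetical names used only for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <tuple>
#include <vector>

// One entry gathered during quadtree traversal (nothing is drawn yet).
struct DrawItem {
    int   textureSet;  // key identifying the material/texture/shader combination
    float z;           // view-space depth, smaller means closer to the camera
    int   chunkId;     // which chunk or object to draw
};

// Sort by state first so identical states are drawn together, then front to
// back within each state group to benefit from early Z rejection.
void SortForRendering(std::vector<DrawItem>& items)
{
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) {
                  return std::tie(a.textureSet, a.z) < std::tie(b.textureSet, b.z);
              });
}

// Counts how many times the texture set would have to be (re)bound when the
// items are drawn in their current order, including the initial bind.
int CountTextureSwitches(const std::vector<DrawItem>& items)
{
    int switches = 0;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i == 0 || items[i].textureSet != items[i - 1].textureSet) {
            ++switches;
        }
    }
    return switches;
}
```

Comparing `CountTextureSwitches` before and after sorting gives a quick estimate of how many state changes the sort saves for a given frame.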

Changing states between each quad tree node is likely very excessive, especially when doing 4 state changes per node. You quickly hit your limit for recommended state changes per scene (nVidia recommends ~300 per frame, which would be 75 terrain tiles and NOTHING else using your scheme)

You also want to limit your draw calls. If your nodes are drawing <300 triangles, you'll want to batch things. Once your quad tree has selected which nodes to draw, it should be pretty easy to do a quick memcpy of the node's indices (not in an IB... no need to read from VRAM) into a dynamic index buffer. Really. There's a TON of overhead on a draw call. For the last 5 years nVidia and ATI have both been saying "batch, batch, batch", and they mean it.
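The memcpy batching can be sketched as below. Here `dst` stands in for the pointer you would get from locking a dynamic index buffer, and each chunk keeps a system-memory copy of its 16-bit indices; `Chunk` and `BatchIndices` are hypothetical names, and the indices are assumed to already refer to one shared vertex buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// System-memory copy of a chunk's indices (no need to read back from VRAM).
struct Chunk {
    std::vector<std::uint16_t> indices;
};

// Appends the indices of every visible chunk into dst, back to back.
// Stops early if the next chunk would not fit in `capacity` indices.
// Returns the number of indices written, which is what a single draw call
// would then cover (primitive count = returned value / 3 for triangle lists).
// Assumes every id in `visible` is a valid index into `chunks`.
std::size_t BatchIndices(const std::vector<Chunk>& chunks,
                         const std::vector<int>& visible,
                         std::uint16_t* dst, std::size_t capacity)
{
    std::size_t written = 0;
    for (int id : visible) {
        const std::vector<std::uint16_t>& src = chunks[static_cast<std::size_t>(id)].indices;
        if (src.empty()) {
            continue;
        }
        if (written + src.size() > capacity) {
            break;  // caller should flush (draw) and start a new batch
        }
        std::memcpy(dst + written, src.data(), src.size() * sizeof(std::uint16_t));
        written += src.size();
    }
    return written;
}
```

One batch per texture set and pass replaces one draw call per chunk per pass.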

Targeting older cards likely means software vertex processing, which can mean transforming large numbers of triangles without need. If your entire landscape is in one VB like this:

01234567
89ABCDEF
GHIJKLMN
OPQRSTUV

and the visible chunks are, say, 0, 1, 2, 3, 8, 9, A, B, then software vertex processing may force you to transform the unneeded 4, 5, 6, 7 chunks too if you were to batch using a dynamic index buffer as I've suggested. Thankfully modern cards don't have this issue, but if you're targeting older 2-texture cards, this is an issue. You can attempt to batch, and reduce overtransform, by sorting the landscape in 2x2 or 4x4 chunks.

For example, let's say we put the VB in the order 0,1,8,9, 2,3,A,B, 4,5,C,D, 6,7,E,F, G,H,O,P. We still cull to a single chunk resolution. Our dynamic IB will be one of 15 combinations (any 1, 2, 3, or all 4 chunks). We can then render this larger chunk. This potentially cuts draw calls by a factor of 4 (or 16 if we go with 4x4 chunk blocks), and avoids transforming too many extra vertices with software transform (in reality, I imagine your world is larger than 8 chunks, so the savings will really add up).

I suggest this: First go through changing your quadtree traversal to just gather info, sort it, then draw. If possible, eliminate the passes that contribute nothing. If the sorting doesn't give you the improvements you need, look into draw call reduction by rendering multiple chunks at once. On hardware transform this is just making a dynamic buffer out of visible chunks, and rendering in just a few calls. On software transform this includes optimizing your vertex layout, which unfortunately may mean less than ideal batching... but you should still batch somewhat. You could look into a dynamic VB and IB solution as an alternative to attempt to reduce vertex transform in software mode.
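The block ordering can be generated rather than written by hand. The sketch below assumes chunks are numbered row by row across the grid, as in the 8-wide diagram above; `BlockOrder` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <vector>

// Returns the chunk ids of a gridW x gridH landscape in the order they
// should be laid out in the vertex buffer, so that every block x block
// group of neighbouring chunks is contiguous in memory.
std::vector<int> BlockOrder(int gridW, int gridH, int block)
{
    std::vector<int> order;
    for (int by = 0; by < gridH; by += block) {
        for (int bx = 0; bx < gridW; bx += block) {
            // Walk the chunks inside this block row by row; the std::min
            // calls handle grids that are not a multiple of the block size.
            for (int y = by; y < std::min(by + block, gridH); ++y) {
                for (int x = bx; x < std::min(bx + block, gridW); ++x) {
                    order.push_back(y * gridW + x);
                }
            }
        }
    }
    return order;
}
```

For the 8x4 grid in the diagram with 2x2 blocks, the first entries come out as 0, 1, 8, 9, 2, 3, 10, 11, where 10 and 11 are the chunks labelled A and B.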

So, there are some things you can try. Yeah, it's work, but allowing an infinite variety of textures, doing 4 passes even when not required, and not sorting at all just isn't going to work. Cards with 2 texture units vary from 4MB to 32MB, so hundreds of megs of textures are also going to be an issue. The 16 and 32MB cards likely support DXT, but some of the 4, 8, and possibly some 16MB cards may not have support for DXT, compounding the problem even further. In addition to the other benefits of sorting, texture thrashing can be kept to a minimum with sorting. Uploading many large textures repeatedly per frame on an aging AGP2X, or 4X card is going to take a LONG time.

Namethatnobodyelsetook

1,260

April 03, 2005 09:37 PM

^^ That's me again.
