New to HLSL - a few questions


I've been poring over the DirectX SDK examples and online tutorials. Looks cool and doable, but scary at the same time. Here are my questions:

1) If you have more than one effect you want to apply, would you write separate .fx files, load them as separate effects, and run them one after the other, or would you combine all your effects into one .fx file and run it as one?

2) Is it faster to use shaders to handle lighting instead of the fixed-function lighting built into DirectX? The one benefit I see is the ability to define all the lights I need, but I don't know what the speed hit would be compared to fixed-function.

3) It seems like shaders are being used for more than just visual effects these days. From calculating shadow volumes to skeletal animation, shaders are doing a lot. Does it make sense to offload all these tasks onto the GPU?

1) .fx files support #include, so you can keep things organized even with multiple shaders in one file. I used to have 10-20 shaders per .fx. (There's a minimal sketch of such a file at the end of this post.)

2) I never measured the difference. In any case, shaders are the future.

3) It makes total sense. The latest GPUs have over 20x the floating-point performance of a CPU, achieved through a highly parallel pipeline. But that architecture has downsides: it uses SIMD (Single Instruction, Multiple Data) cores, which makes it very fast at stream processing but poor at branching.

http://www.coniserver.net/wiki/index.php/Simple_HLSL_Shader_Tutorial
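
For what it's worth, here's a minimal sketch of how one .fx file can be laid out with an include and a self-contained technique. All the file and identifier names (common.fxh, Textured, and so on) are made up for illustration:

// common.fxh would hold declarations shared between .fx files (hypothetical name)
#include "common.fxh"

float4x4 gWorldViewProj;    // set by the application through ID3DXEffect
texture  gDiffuseTex;

sampler DiffuseSampler = sampler_state
{
    Texture   = <gDiffuseTex>;
    MinFilter = LINEAR;
    MagFilter = LINEAR;
};

struct VS_OUT
{
    float4 pos : POSITION;
    float2 uv  : TEXCOORD0;
};

VS_OUT TexturedVS( float4 pos : POSITION, float2 uv : TEXCOORD0 )
{
    VS_OUT o;
    o.pos = mul( pos, gWorldViewProj );
    o.uv  = uv;
    return o;
}

float4 TexturedPS( VS_OUT i ) : COLOR
{
    return tex2D( DiffuseSampler, i.uv );
}

// Any number of techniques can live side by side in one file;
// the application selects one with ID3DXEffect::SetTechnique.
technique Textured
{
    pass P0
    {
        VertexShader = compile vs_2_0 TexturedVS();
        PixelShader  = compile ps_2_0 TexturedPS();
    }
}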

To expand on 2 and 3:

2: Any modern GPU (anything sold in the past 5 years) emulates the fixed-function pipeline with shaders anyway. That said, shader performance depends on the complexity of the program, and you can write shaders far more complex than anything the fixed pipeline offers; conversely, a simple shader is cheap. (See the sketch at the end of this post.)

3: It makes sense if your bottleneck is the CPU. A GPU usually has greater raw parallel performance, so it makes sense to offload highly repetitive, parallelizable tasks (such as rasterization and geometry setup) to it by default. A good counterexample might be vertex shaders, which could run faster on the CPU in isolation - however, in practice the CPU usually has other work to do that is better suited to it.
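
To illustrate point 2: a vertex shader that mimics a single fixed-function directional diffuse light might look roughly like this. It's only a sketch - the variable names are made up, and a real replacement would also need specular, fog, multiple lights, and so on:

float4x4 gWorldViewProj;
float4x4 gWorld;
float3   gLightDir;     // unit vector pointing from the surface toward the light
float4   gLightColor;
float4   gAmbient;

struct VS_OUT
{
    float4 pos   : POSITION;
    float4 color : COLOR0;
};

// One directional light with per-vertex diffuse - roughly what the
// fixed-function pipeline does with a single D3DLIGHT_DIRECTIONAL.
VS_OUT DiffuseVS( float4 pos : POSITION, float3 normal : NORMAL )
{
    VS_OUT o;
    o.pos = mul( pos, gWorldViewProj );

    float3 n = normalize( mul( normal, (float3x3)gWorld ) );
    o.color  = gAmbient + gLightColor * max( dot( n, gLightDir ), 0.0f );
    return o;
}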

Quote:
Original post by OctavianTheFirst
3) It makes total sense. The latest GPUs have over 20x the floating-point performance of a CPU, achieved through a highly parallel pipeline. But that architecture has downsides: it uses SIMD (Single Instruction, Multiple Data) cores, which makes it very fast at stream processing but poor at branching.


It's not just the floating-point performance. Ideally you'd want your game to make use of every drop of power available on both the CPU and GPU, but in practice that's not really feasible on the PC, since there's too much overhead in communicating with the GPU. Either you suffer overhead from driver transitions, or from transferring data to GPU memory, or from needing to sync the two processors. So it's usually best to just do as much as you can on the GPU... in the PC space, anyway. Consoles are a different story.

More on #3:

If you do skeletal animation on the CPU, you've got to send the new data (i.e. the transformed vertex positions) down to the GPU every time it animates. If you do it in your vertex shader, only the animation parameters are sent, and the vertex positions are calculated locally (see the sketch below).
By offloading this work to the GPU, you're cutting down on the amount of data sent over the PCI bus ;)
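
For example, the vertex shader side of matrix-palette skinning might look roughly like this - a sketch only, assuming each vertex carries four bone indices and weights (the constant names and palette size are made up):

float4x4 gWorldViewProj;
float4x3 gBones[26];    // bone palette, re-uploaded each frame; size is illustrative

struct VS_IN
{
    float4 pos     : POSITION;
    float4 weights : BLENDWEIGHT;
    int4   bones   : BLENDINDICES;
};

float4 SkinVS( VS_IN v ) : POSITION
{
    // Blend the bind-pose position by up to four bones. Only the small
    // bone array crosses the bus each frame - the vertex buffer stays put.
    float3 skinned = 0;
    for( int i = 0; i < 4; i++ )
        skinned += mul( v.pos, gBones[ v.bones[i] ] ) * v.weights[i];

    return mul( float4( skinned, 1.0f ), gWorldViewProj );
}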

Quote:
Original post by Hodgman
More on #3:

If you do skeletal animation on the CPU, you've got to send the new data (i.e. transformed vertex positions) down to the GPU every time it animates. If you do it in your vertex shader, only the animation parameters are sent, and the vertex positions are calculated locally.
By offloading this work to the GPU, you're cutting down on the amount of data sent over the PCI bus ;)


Yeah, but it only helps if the bus is the bottleneck :D

Quote:
Original post by Hodgman
By offloading this work to the GPU, you're cutting down on the amount of data sent over the PCI bus ;)

Thank you so much, guys. This has really helped. This is really an exciting time in game development.

One thing I'm confused about. Below is how I send my primitives to DirectX.

void CXMesh::Render()
{
    for( int i = 0; i < faceLstCount; i++ )
    {
        // Select this material's texture if not already selected
        CMaterialMgr::Instance().SelectTexture( pXVertBuf[i].pMaterial );

        // Set the material properties
        CXWindow::Instance().GetXDevice()->SetMaterial( &pXVertBuf[i].materialProp );

        // Bind the vertex buffer and draw one triangle list per face list
        CXWindow::Instance().GetXDevice()->SetStreamSource( 0, pXVertBuf[i].xVertBuf, 0, sizeof(CVertex) );
        CXWindow::Instance().GetXDevice()->SetFVF( D3DFVF_XYZ|D3DFVF_NORMAL|D3DFVF_TEX1 );
        CXWindow::Instance().GetXDevice()->DrawPrimitive( D3DPT_TRIANGLELIST, 0, pXVertBuf[i].fcount );
    }

} // Render


I need to do this every frame even though the geometry hasn't changed. Your comment suggests my geometry is somehow stored on the video card. How is that the case? It would be really helpful not to have to send static geometry over the bus all the time. How would you do that?

The proper way to upload a mesh to the GPU is to use vertex buffers (and index buffers) that store a lot of triangles. If you use the managed or default pool, the data resides in GPU memory and doesn't need to be uploaded again and again. Note that you still need to call SetStreamSource and SetIndices when changing the geometry source. (See the sketch at the end of this post.)

Your code seems to draw one triangle at a time (or what does faceLstCount denote?), which would be very inefficient. Draw calls themselves are expensive, so you should draw as much geometry as possible with as few device calls as possible.
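
A sketch of the load-time side, reusing the names from the Render() code above (vertCount and pSrcVerts are hypothetical, and error checking is trimmed):

// Done once at load time, not every frame. D3DPOOL_MANAGED keeps a system
// memory copy and lets the runtime place the working copy in GPU memory
// (and restore it automatically after a lost device).
IDirect3DDevice9 * pDevice = CXWindow::Instance().GetXDevice();

pDevice->CreateVertexBuffer(
    vertCount * sizeof(CVertex),                // size in bytes
    D3DUSAGE_WRITEONLY,                         // we never read it back
    D3DFVF_XYZ | D3DFVF_NORMAL | D3DFVF_TEX1,   // same FVF as in Render()
    D3DPOOL_MANAGED,
    &pXVertBuf[i].xVertBuf,
    NULL );

// Copy the vertex data in exactly once.
void * pData = NULL;
pXVertBuf[i].xVertBuf->Lock( 0, 0, &pData, 0 );
memcpy( pData, pSrcVerts, vertCount * sizeof(CVertex) );
pXVertBuf[i].xVertBuf->Unlock();

// From then on, the per-frame Render() loop only issues
// SetStreamSource/SetFVF/DrawPrimitive - no vertex data crosses the bus.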

Quote:
Original post by Nik02
Your code seems to draw one triangle at a time (or what does faceLstCount denote?)...
faceLstCount is the number of face lists, and the lists are grouped by texture. So for a mesh of 1000 faces that all share one texture, the loop iterates only once.

Okay, things aren't as bad as I thought :)

Anyway, default and managed pool buffers go to GPU memory when applicable. Unless you update or Lock() them, they stay there. Managed resources might be paged out of GPU memory if space is scarce, but this is transparent to the app (apart from the performance impact).

Quote:
Original post by Nik02
Anyway, default and managed pool buffers go to GPU memory when applicable. Unless you update or Lock() them, they stay there. Managed resources might be paged out of GPU memory if space is scarce, but this is transparent to the app (apart from the performance impact).
My curiosity is definitely piqued. I'm trying to find a simple example of keeping the geometry loaded on the video card via shaders.

Shaders have nothing to do with the location of the resources (directly).

In order for shaders to read any resources at all, those resources must be in GPU-accessible memory, which usually means GPU memory itself.

In D3D9, there is no system-wide paging scheme in case the GPU memory gets full, but there is a per-process resource manager available. Hence "managed" resources. Default pool resources always reside in GPU memory, but their creation will fail if there's not enough physical memory available.

The DXGI infrastructure in Vista and 7 does implement a virtual video memory manager, which works at the system level to move memory pages between GPU and CPU as needed, transparently to applications. This infrastructure change is a big part of why D3D10(+) cannot easily be implemented on XP.

What these resource managers have in common is a priority queue that determines which resources get evicted from GPU memory, based on when (and how often) they were last used in drawing operations. When an evicted resource is paged back into GPU memory, it happens on demand, and the resource's internal priority is updated if its usage pattern warrants it.
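
In D3D9 you can nudge that priority queue yourself; a short sketch (pImportantTexture is hypothetical):

// Hint to the resource manager that this managed texture should be among
// the last candidates for eviction when GPU memory runs short.
pImportantTexture->SetPriority( 1 );    // managed resources default to priority 0

// Or flush everything managed out of GPU memory (e.g. between levels);
// resources are paged back in on demand the next time they are used.
CXWindow::Instance().GetXDevice()->EvictManagedResources();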

