Archived

This topic is now archived and is closed to further replies.

Vendayan

D3DCREATE_HARDWARE_VERTEXPROCESSING

Recommended Posts

Vendayan    278
Can anyone shed some light on this flag for me a bit? The SDK documentation just says that specifying this flag will increase performance, but it doesn't say at what cost, how to use it effectively, or even how to detect whether the video card supports it. I suppose the biggest questions I have are...

1) In what areas should it increase the performance of my app if and when it's supported by the hardware, and how does it actually increase the performance?

2) How can I tell from the device caps whether this flag is supported by the hardware?

3) How will this affect my code? Will I have to do more coding later to actually make use of hardware vertex processing, or is it just one of those 'flip this switch on and DX handles the rest' kind of things?

Thanks in advance-

~Vendayan

IndirectX    122
This flag basically requests that your vertices are transformed and lit by the GPU as opposed to the CPU. It should increase the performance of pretty much everything related to D3D.

From what I understand from the SDK samples, the D3DDEVCAPS_HWTRANSFORMANDLIGHT cap specifies whether hardware T&L is available. Use IDirect3D8::GetDeviceCaps to retrieve the caps.

You don't need any code to use hardware T&L if it's available.

stefandxm    122
Yes you do!

You need to make proper VertexBuffer code! Of course, that's how most people do it anyway - but strictly speaking you don't HAVE to.

You also have to care about which flags you give your vertex buffers when you create them, and which flags you use when updating dynamic data.

Also, the HARDWARE_VERTEXPROCESSING flag doesn't always work even when MIXED_VERTEXPROCESSING does. On my GeForce2 GTS with the latest released nVidia Detonator drivers I only get mixed processing supported in windowed mode, and hardware only in fullscreen - the nVidia driver people must have a lot of cheap booze :D

Anyway, there's lots of stuff to think about. Since there aren't many card manufacturers around that support hardware T&L nowadays, you should really check their pages (ati.com, nvidia.com) etc.

And don't forget: never use more than 8 lights, or you will drop down to software processing on nVidia cards!

Vendayan    278
So can anyone show me a couple of short code samples, showing how to detect its capabilities and how to do the vertex buffer stuff?

~Vendayan

IndirectX    122
quote:
Original post by stefandxm
Yes you do!

You need to make proper VertexBuffer code!


What exactly do you mean by that?

To see a detection sample, open d3dapp.cpp in the SDK samples and search for D3DDEVCAPS_HWTRANSFORMANDLIGHT. The lines around that are what you need.

S1CA    1418
1)
a. As mentioned above, HARDWARE_VERTEX_PROCESSING shifts the vertex processing work (transformation & lighting) from the CPU onto the GPU **if the hardware supports it** .

b. A first generation hardware T&L card like a GeForce256/GeForce2 *ONLY* accelerates the fixed function vertex processing when HARDWARE_VERTEX_PROCESSING is enabled. Any attempt to use shaders on that type of card with that type of processing *WON'T* work.

c. Current generation hardware T&L cards/chips support *both* fixed function and shader vertex processing with HARDWARE_VERTEX_PROCESSING.

d. Non-T&L cards such as the Kyro series and older cards won't do any vertex processing in hardware. For those you *MUST* use SOFTWARE_VERTEX_PROCESSING.

e. For first generation cards where you need to use vertex shaders, you should use either only SOFTWARE_VERTEX_PROCESSING or use MIXED_VERTEX_PROCESSING and switch the processing mode when necessary using the D3DRS_SOFTWAREVERTEXPROCESSING renderstate.


2) Some code from our engine - I've stripped it down and removed a lot of stuff to simplify it:


typedef struct
{
    DWORD dwCreationFlags;
    DWORD dwFixedVPUsage;
    DWORD dwShaderVPUsage;
}
ENGINEDEVICEINFO;


ENGINEDEVICEINFO CheckDeviceCaps( UINT uiAdapter )
{
    ENGINEDEVICEINFO info;

    D3DCAPS8 caps;
    if (FAILED(m_pD3D->GetDeviceCaps( uiAdapter, D3DDEVTYPE_HAL, &caps )))
    {
        // error handling
    }

    // work out how much vertex processing can be done in hardware
    if (caps.DevCaps & D3DDEVCAPS_HWTRANSFORMANDLIGHT)
    {
        // if we get to here, the chip can do some form of T&L

        // test to see which vertex shader version is supported in hardware
        // you'd set whatever version you need below - our app needs 1.1.
        if ( caps.VertexShaderVersion < D3DVS_VERSION(1,1) )
        {
            // if we get here it's a 1st gen T&L chip which can't do
            // programmable shaders
            info.dwCreationFlags = D3DCREATE_MIXED_VERTEXPROCESSING;
            info.dwFixedVPUsage  = 0; // [0==hardware]
            info.dwShaderVPUsage = D3DUSAGE_SOFTWAREPROCESSING;
        }
        else
        {
            // if we get here, chip is shader capable AND T&L capable
            info.dwCreationFlags = D3DCREATE_HARDWARE_VERTEXPROCESSING;
            info.dwFixedVPUsage  = 0; // [0==hardware]
            info.dwShaderVPUsage = 0; // [0==hardware]

            if (caps.DevCaps & D3DDEVCAPS_PUREDEVICE)
            {
                // in the engine this snippet is from, we don't need to read any
                // state back from the device so we set the pure device flag to
                // get a *potential* performance increase on some cards/drivers
                info.dwCreationFlags |= D3DCREATE_PUREDEVICE;
            }
        }
    }
    else
    {
        // if we get here, the card is either a new one without T&L (e.g. Kyro)
        // or an older pre-T&L card. So all processing is in software.
        info.dwCreationFlags = D3DCREATE_SOFTWARE_VERTEXPROCESSING;
        info.dwFixedVPUsage  = D3DUSAGE_SOFTWAREPROCESSING;
        info.dwShaderVPUsage = D3DUSAGE_SOFTWAREPROCESSING;
    }

    ...

    return info;
}



3) The above code snippet sets up 3 DWORDs in an info structure (the real code sets many others which let us decide things like which screen mode to start in etc). Those DWORDs are used as follows:

a. dwCreationFlags is what gets passed as the BehaviorFlags parameter to IDirect3D8::CreateDevice.

b. dwFixedVPUsage is what gets passed as the Usage parameter for any IDirect3DDevice8::CreateVertexBuffer or IDirect3DDevice8::CreateIndexBuffer calls which are to be used with FIXED FUNCTION (FVF) vertex processing.

c. dwShaderVPUsage is what gets passed as the Usage parameter for IDirect3DDevice8::CreateVertexBuffer or IDirect3DDevice8::CreateIndexBuffer calls which are to be used with VERTEX SHADER/programmable vertex processing.


There are other subtle things to think about with T&L versus software (buffer sizes, buffer locking flags etc), but the above is the gist of how to use it properly.

--
Simon O'Connor
Creative Asylum Ltd
www.creative-asylum.com

reaptide    226
Hey, that's some pretty good stuff S1CA. It cleared up a bunch of issues I was having, thanks a lot.

I have a question in relation to this thread. Is it always better to use hardware T&L when it is available? I have a GeForce 2 MX 200, and I'm thinking that my 1.5GHz processor may be able to process vertices & lights at a faster rate than the GPU, even with sound and AI included.

So what's everyone's opinion on the viability of HW T&L in every situation?

------------------------------------------------------------

Stupid divide by zero error. I should be allowed to tear a hole in the fabric of time and space if I want!

Evil Bill    126
If you can do hardware TnL, do it. The CPU may be able to do it faster than the GPU, but firstly it's a hassle to determine which is faster, and secondly if it's done on the GPU, it can be done in parallel.

2p
Steve

S1CA    1418
As Steve said, if there's a T&L processor there, then you may as well use it; it'd be a waste of the user's money if they bought a fancy gfx card which never got used.

However, if the CPU is sat idle most of the time (usually it isn't!) - then that's a waste too. Depending on your app and depending on how much time you're willing to spend on scalability, it may be worth trying to achieve some sort of balance between the CPU and GPU.

According to the chip manufacturers, most apps aren't taxing modern GPUs in terms of vertex processing at all. There's always more you can be shoving to the GPU - but the shoving, and the 'more', take CPU time. Some ideas for a machine which has some spare CPU time, where you also want to maximise GPU use:

1. Procedural modeling and texturing - dynamically generate more polygons and more texture detail for super-rich scenes. Spread the work over a few frames if necessary.

2. Do skinning into world/object space on the CPU at a low frequency in such a way that its output is two morph-target frames of animation which can be transformed and lit by the GPU.

3. Mix lighting between hardware and software - using multiple vertex streams, have one static stream for what gets hardware lit and a dynamic stream of vertex colours from CPU calculations.


Doing more interesting stuff with sound and AI would be appealing too... I'd imagine some softsynth stuff and DSP effects would burn loads of CPU cycles for you.

--
Simon O'Connor
Creative Asylum Ltd
www.creative-asylum.com
