
index buffers, D3DPOOL_DEFAULT, and lost device


Norman Barrows    7179

i was just reading up on index buffers and came across this in the directx docs under the topic "Index Buffers (Direct3D 9)":

 

 

"Note Always use D3DPOOL_DEFAULT, except when you don't want to use video memory or use large amounts of page-locked RAM when the driver is putting vertex or index buffers into AGP memory." 

 

 

I recently implemented handling lost devices, and switched everything possible from default to managed memory, with the understanding that there was no real cost penalty.

 

but now i find this, with, of course, no explanation why you should always use default.

 

they do mention hardware index caching as giving performance boosts to indexed drawing.

 

anybody have any idea what's going on here?

 

 

 

mhagain    13430

From the corresponding article on vertex buffers we get the following:

 

It is possible to force vertex and index buffers into system memory by specifying D3DPOOL_SYSTEMMEM, even when the vertex processing is done in hardware. This is a way to avoid overly large amounts of page-locked memory when a driver is putting these buffers into AGP memory.

 

This indicates to me that the distinction being drawn is only between buffers created in D3DPOOL_DEFAULT and buffers created in D3DPOOL_SYSTEMMEM.

 

Using D3DPOOL_MANAGED should be seen as the equivalent of actually creating two copies of the resource - one in the default pool, the other in the system memory pool.  The D3D runtime then looks after everything else for you.  This is discussed further here: http://legalizeadulthood.wordpress.com/2009/10/12/direct3d-programming-tip-9-use-the-managed-resource-pool/
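For illustration, here's roughly what the two creation paths look like at the API level (a sketch only - it assumes a valid IDirect3DDevice9* called "device", and the index count is a placeholder):

// Managed pool: the runtime keeps a system-memory copy and re-uploads to video
// memory as needed, so the buffer survives a lost device / Reset() on its own.
UINT numIndices = 36;                       // placeholder count
IDirect3DIndexBuffer9* ib = NULL;
device->CreateIndexBuffer(numIndices * sizeof(WORD), D3DUSAGE_WRITEONLY,
                          D3DFMT_INDEX16, D3DPOOL_MANAGED, &ib, NULL);

// Default pool: lives in video/AGP memory only - you must Release() it before
// IDirect3DDevice9::Reset() and recreate + refill it afterwards.
device->CreateIndexBuffer(numIndices * sizeof(WORD), D3DUSAGE_WRITEONLY,
                          D3DFMT_INDEX16, D3DPOOL_DEFAULT, &ib, NULL);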

Norman Barrows    7179

Using D3DPOOL_MANAGED should be seen as the equivalent of actually creating two copies of the resource - one in the default pool, the other in the system memory pool.

 

yes, that was my understanding.

 

perhaps they recommend default as it eliminates the overhead of D3DPOOL_MANAGED?

 

i wish they weren't so cryptic about everything.

 

sometimes i think they don't really know themselves. 

 

they bought the code from Brender. and a million fingers must have touched it by now.

 

there may not be anyone at MS who knows everything about the system and the best way to use it.

mhagain    13430

The original code that Microsoft bought bears no relationship to current versions.  D3D3 didn't have vertex or index buffers (it didn't even have what we'd recognise as a "Draw" call) so you can't really make comparisons.

 

The problem with D3D9 was that it straddled multiple hardware generations.  At the time it was originally released we were completing the transition from software T&L to hardware T&L and beginning the transition from the fixed pipeline to shaders, and D3D9 had to support them all (it also supports 2 major shader model generations).  So it's inevitable that it suffers from compromises and complexities that a clean API fully targeted at a specific hardware generation wouldn't, and a lot of the programming advice for it must be read in the context of 2002/2003 hardware (you see something similar with OpenGL, where much advice you can find is quite firmly rooted in 1997/1998).  Fast-forward a few years and much of it is no longer relevant; hardware doesn't really work the way D3D9 was designed any more, so referring to older documentation just serves to confuse.

 

In this case the best thing is to set up your index buffer in D3DPOOL_DEFAULT, give it a good intensive benchmarking, set it up in D3DPOOL_MANAGED, run the same benchmark, then make a decision.  If the facts you discover through this process contradict advice in the documentation, then remember that the documentation is ancient and was written around a completely different generation of hardware.
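Something like the following is all the benchmark harness needs to be (a sketch; DrawScene() and the frame count stand in for whatever your real render loop does):

#include <windows.h>

extern void DrawScene();                    // placeholder for your render loop

double TimeFrames(int frameCount)
{
    LARGE_INTEGER freq, start, end;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&start);

    for (int i = 0; i < frameCount; ++i)
        DrawScene();                        // render with the pool under test

    QueryPerformanceCounter(&end);
    return (double)(end.QuadPart - start.QuadPart) / (double)freq.QuadPart;
}

// Build the buffers in D3DPOOL_DEFAULT, call TimeFrames(1000), note the result,
// rebuild them in D3DPOOL_MANAGED, run it again, and compare.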

VladR    722

Using D3DPOOL_MANAGED should be seen as the equivalent of actually creating two copies of the resource - one in the default pool, the other in the system memory pool.

 

yes, that was my understanding.

 

perhaps they recommend default as it eliminates the overhead of D3DPOOL_MANAGED?

Of course they recommend D3DPOOL_DEFAULT. It enables driver engineers to implement a few important optimizations.

 

 

Using D3DPOOL_MANAGED should be seen as the equivalent of actually creating two copies of the resource - one in the default pool, the other in the system memory pool.

 

i wish they weren't so cryptic about everything.

 

sometimes i think they don't really know themselves. 

Wrong. They know it very well, trust me on this. :)

You need to read between the lines. Get all performance-related papers/presentations from GDC and other events, mainly from nVidia's developer portal. There is a lot of info that you will not find anywhere else. Open 5-10 of them and enjoy. :)

Norman Barrows    7179

In this case the best thing is to set up your index buffer in D3DPOOL_DEFAULT, give it a good intensive benchmarking, set it up in D3DPOOL_MANAGED, run the same benchmark

 

indeed. science doesn't lie. and neither do timers.

Hodgman    51237
For real science though, you'd need all the GPUs from the past 15 years that you want to support, as they're what's going to determine a lot of the behavior...

e.g. Some GPUs might have 2 memory controllers, one for reading system RAM over the AGP bus, and one for reading local VRAM. Counter-intuitively, on such a GPU it can be faster to keep a small amount of data in system RAM in order to utilize both controllers for parallel fetching. This could mean keeping vertices in VRAM and indices in system RAM.
Of course I wouldn't recommend this any more - just an example of how much of the API's performance characteristics actually depend on the GPU/driver...
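Purely to illustrate that split (not a recommendation), the creation calls would look something like this - the vertex format, counts and variable names are all placeholders:

// Vertices in the default pool (VRAM), indices forced into system memory so the
// second memory controller can fetch them over the bus in parallel.
IDirect3DVertexBuffer9* vb = NULL;
IDirect3DIndexBuffer9*  ib = NULL;

device->CreateVertexBuffer(numVerts * sizeof(MyVertex), D3DUSAGE_WRITEONLY,
                           MY_FVF, D3DPOOL_DEFAULT, &vb, NULL);
device->CreateIndexBuffer(numIndices * sizeof(WORD), D3DUSAGE_WRITEONLY,
                          D3DFMT_INDEX16, D3DPOOL_SYSTEMMEM, &ib, NULL);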

Norman Barrows    7179

For real science though, you'd need all the GPUs from the past 15 years that you want to support, as they're what's going to determine a lot of the behavior...

e.g. Some GPUs might have 2 memory controllers, one for reading system RAM over the AGP bus, and one for reading local VRAM. Counter-intuitively, on such a GPU it can be faster to keep a small amount of data in system RAM in order to utilize both controllers for parallel fetching. This could mean keeping vertices in VRAM and indices in system RAM.
Of course I wouldn't recommend this any more - just an example of how much of the API's performance characteristics actually depend on the GPU/driver...

 

 

yes, see this is the thing. it's such a shifting target (the user's graphics capabilities). and god only knows what kind of pc they might want to run it on. in order to avoid the whole protracted mess, i've been trying to develop to the lowest common denominator, which seems to be directx 9 fixed function.

 

i remember what it was like to be an impoverished college student with a hand-me-down pc with the previous generation of graphics on it. 

 

there's no reason to leave those dollars on the table (make a game they won't buy because it won't run well/at all on their PC), as long as i can get the desired results (from dx9 fixed function).

 

perhaps the more fundamental question to ask is how far back in terms of windows versions and directx versions should i attempt to support? i.e. what's so old that i shouldn't be worrying about it?

 

i take it that supporting dx9 only PCs is something i should not be worrying about?

 

right now the system requirements for the directx and windows stuff i'm using is windows 2000 and directx 9.

 

how difficult would it be to convert a fixed function dx9 app to shaders and dx10 or dx11?

 

FVF goes away but i'm just calling mesh:getfvf and device:setfvf in one place (maybe two).

 

is there shader code available that replicates basic fixed function capabilities? all i'm doing is aniso mip-mapping, some alphatest, and a little alpha blending. no real messing with texture stages or anything like that.

when i was working on the never-released caveman v2.0 back in 2008-2010, i was unimpressed with the blend ops available and wrote my own 10-channel real-time weighted texture blender. if i had to do blend ops again, i'd probably do it myself and then send the results to directx. i almost prefer doing things that way. it's more like good old-fashioned "gimme a pointer and lemme party on the bitmap!" programming. nowadays, you throw your own private parties on your own private bitmaps, then send them off to directx for display. or use shaders and party on directx's bitmap (so to speak).
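(for reference, the alphatest / alpha blend bits i'm talking about are just render states in fixed function - a quick sketch, with 128 as an arbitrary example alpha reference, assuming a valid device pointer:)

// fixed-function alpha test: reject pixels whose alpha is below the reference
device->SetRenderState(D3DRS_ALPHATESTENABLE, TRUE);
device->SetRenderState(D3DRS_ALPHAREF, 128);
device->SetRenderState(D3DRS_ALPHAFUNC, D3DCMP_GREATEREQUAL);

// fixed-function alpha blending: standard "source over" blend
device->SetRenderState(D3DRS_ALPHABLENDENABLE, TRUE);
device->SetRenderState(D3DRS_SRCBLEND, D3DBLEND_SRCALPHA);
device->SetRenderState(D3DRS_DESTBLEND, D3DBLEND_INVSRCALPHA);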

mhagain    13430

For a reasonable baseline, consider the following.

 

The ATI R300 was introduced in 2002.

The GeForce FX was introduced in 2003.

The Intel GMA 900 was introduced in 2004.

 

All of these are D3D9 parts and all support shader model 2; therefore in order to find any common consumer hardware that doesn't support shaders at this level you need to go back over a decade.

 

Now look at the latest Steam Hardware Survey: http://store.steampowered.com/hwsurvey - in this case, a total of 99.53% of the machines surveyed have SM2 or higher capable GPUs, and almost 98% have SM3 or better.
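If you want to verify that on the user's machine rather than trust the survey, the check is a one-off caps query (a sketch; assumes an IDirect3D9* named "d3d" obtained from Direct3DCreate9):

D3DCAPS9 caps;
d3d->GetDeviceCaps(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, &caps);

bool hasSM2 = caps.VertexShaderVersion >= D3DVS_VERSION(2, 0) &&
              caps.PixelShaderVersion  >= D3DPS_VERSION(2, 0);
bool hasSM3 = caps.VertexShaderVersion >= D3DVS_VERSION(3, 0) &&
              caps.PixelShaderVersion  >= D3DPS_VERSION(3, 0);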

 

Concerning the "roll your own texture blending on the CPU" approach, this is a BAD idea for a number of reasons.  The major reason is that it's more-or-less guaranteed to introduce many CPU/GPU synchronization points per frame, which is the number 1 cause of performance loss.  A secondary reason is that the CPU will never be as fast as the GPU for this kind of operation.  A third reason is that you're introducing all manner of latency and bandwidth considerations to your program.

 

The thing is - these are exactly the kind of problems that shaders solve for you.  You get to have any kind of arbitrary blend you wish without having to deal with synchronization or latency problems, and you get massive parallelism completely for free.  The end result is simpler, faster and more robust code that runs well on the stupefyingly overwhelming majority of hardware.
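To give an idea of how little code an arbitrary blend actually costs on the GPU, here's a sketch of a weighted two-texture blend as a D3D9 pixel shader (the sampler registers, weight constant and entry point name are assumptions, not anything from your project):

// ps_2_0-style weighted blend of two textures
sampler2D texA    : register(s0);
sampler2D texB    : register(s1);
float blendWeight : register(c0);           // 0..1, set from the application

float4 BlendPS(float2 uv : TEXCOORD0) : COLOR
{
    float4 a = tex2D(texA, uv);
    float4 b = tex2D(texB, uv);
    return lerp(a, b, blendWeight);         // any blend op you like, per pixel, in parallel
}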

 

The very fact that you've introduced this, and coupled with some of your previous posts, leads me to suspect that you're the type of person who doesn't trust the hardware, that you have a preference for doing things yourself in software even if it comes at the expense of your own code quality or performance.  That's an attitude you need to lose, to be honest.

Norman Barrows    7179

The very fact that you've introduced this, and coupled with some of your previous posts, leads me to suspect that you're the type of person who doesn't trust the hardware, that you have a preference for doing things yourself in software even if it comes at the expense of your own code quality or performance.  That's an attitude you need to lose, to be honest.

 

no, i'm  just lazy.  <g>.

 

a lazy perfectionist, that's me.

 

so i want results which will probably require shaders, but i don't want to write shaders unless necessary. thus my question about the availability of "plug and play" boilerplate shader code.

 

based on the performance i'm getting now, and what i want to get, it's starting to look like shaders will be in my near future.

 

based on that, the system requirements should change from DX9 fixed function to whatever the typical requirements are for a title coming out in the near future.

 

this means i'll be able to take advantage of newer hardware.

 

the real-time texture blender i wrote was an act of desperation. when i simply cannot get the required effect from the existing libraries, i'm forced to write my own low-level stuff.

 

i've only had to do it 4 times in 32 years of writing PC games:

1. realtime zoom, scale, mirror, & rotate blitter in assembly for a blitter engine about the time of Wing Commander II.

2. perspective-correct texture-mapped poly engine when MS bought rend386 and before they re-released it as directx 1.0.

3. 50-channel real-time wave mixer, around the time when S.O.S., MILES Audio, and Diamondware SDK were the preferred audio solutions and typically only supported 8 channels with no "stepping".

4. and the texture blender (directx 8 era).

 

while i can do low level stuff, i prefer building games.  to me, low level stuff is a necessary evil.

 

[famous gamedev quote of mine coming up...]

 

but "Sometimes you gotta break a few eggs to make a REAL mayonnaise".

mhagain    13430

Roundabout now I should mention how much I despise mayonnaise. :)

 

Anyway, typical requirements for a game coming out maybe 5 or 6 years ago would have been D3D9/programmable pipeline.  I'm going to confidently predict that based on what you've just said, once you get over the learning curve you'll love it.

 

Anyway, the following seems a reasonably simple introduction: http://www.two-kings.de/tutorials/dxgraphics/dxgraphics18.html - it does a few things in a slightly weird way (specifically, the GetTransform stuff) but other than that it should be enough to get you up and running with a basic program.
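For a flavour of the host-side plumbing the tutorial walks through, it boils down to something like this (a sketch - the file name, entry point and matrices are placeholders, and error handling is omitted):

ID3DXBuffer* code = NULL;
ID3DXBuffer* errors = NULL;
IDirect3DVertexShader9* vs = NULL;

// compile HLSL to a vs_2_0 blob and create the shader object
D3DXCompileShaderFromFile("basic.vsh", NULL, NULL, "main", "vs_2_0",
                          0, &code, &errors, NULL);
device->CreateVertexShader((DWORD*)code->GetBufferPointer(), &vs);

// per frame: upload the combined matrix and bind the shader
D3DXMATRIX wvp = world * view * proj;       // placeholders for your transforms
D3DXMatrixTranspose(&wvp, &wvp);            // match HLSL's default column-major packing
device->SetVertexShaderConstantF(0, (float*)&wvp, 4);
device->SetVertexShader(vs);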

Hodgman    51237
Very roughly:
You can rank a GPU's compatibility/power level by the Shader Models that it supports - these are the 'Asm' instruction sets that your HLSL shader code will be compiled into.

FF was ok til 03.
Then Dx9 SM2 popped up - this is Unity's minimum requirement.
In around 04-06, Dx9 SM3 took off - this is PlayStation3/Xbox360 level hardware (old).
Then in 07-09, DX10 SM4 appeared.
Then in the past 2 to 4 years, Dx11 SM5 has been starting to take over, but it's still new ground.

If you want to keep support for older versions of windows, then you'll have to use Dx9, but you can choose which hardware era with SM2 or SM3 shaders (sm2 has more limitations, but buys you a few more years of compatibility).

If you're ok with ditching WinXP support, then you can go straight to Dx11. It has "feature levels", which allows it to run on earlier hardware (not just on SM5-era hardware). You can choose between SM2, SM4 or SM5 (they don't support PS3-era SM3 for some strange reason...).
Most modern games at the moment probably use Dx11 with SM4 for hardware compatibility (and a few SM5 optional code paths for the latest eye candy).
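As a sketch of how that choice shows up in code, a D3D11 device can be asked to fall back through feature levels at creation time (the list below is just an example ordering):

#include <d3d11.h>

D3D_FEATURE_LEVEL requested[] = {
    D3D_FEATURE_LEVEL_11_0,     // SM5-class hardware
    D3D_FEATURE_LEVEL_10_0,     // SM4-class
    D3D_FEATURE_LEVEL_9_3,      // the "feature level 9" / SM2-style path
    D3D_FEATURE_LEVEL_9_1
};

ID3D11Device*        dev = NULL;
ID3D11DeviceContext* ctx = NULL;
D3D_FEATURE_LEVEL    got;

D3D11CreateDevice(NULL, D3D_DRIVER_TYPE_HARDWARE, NULL, 0,
                  requested, sizeof(requested) / sizeof(requested[0]),
                  D3D11_SDK_VERSION, &dev, &got, &ctx);
// "got" now tells you which level the user's GPU actually runs at.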

mhagain    13430

It's worth adding here that GPU capabilities tend to evolve hand-in-hand (despite the impression that D3D caps or OpenGL extensions may give you), so if you've got certain other non-shader capabilities which your code is dependent on, then you've already got a requirement to have hardware that supports shaders anyway.  For example, you mentioned a 10-texture blend earlier - if you're blending 10 textures on the GPU then you're already into SM3-class hardware territory.  That's not all.  Are you, for example, using any non-power-of-two textures?  Or textures sized 2048x2048 or more?  These will all raise your minimum hardware requirements to something that's also going to support shaders anyway.
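Those particular capabilities are all visible in the same caps structure, if you want to see where your current code already stands (a sketch, reusing the GetDeviceCaps query shown earlier):

D3DCAPS9 caps;
d3d->GetDeviceCaps(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, &caps);

// unconditional non-power-of-two support means neither restriction bit is set
bool fullNonPow2 = !(caps.TextureCaps & D3DPTEXTURECAPS_POW2) &&
                   !(caps.TextureCaps & D3DPTEXTURECAPS_NONPOW2CONDITIONAL);
bool bigTextures = caps.MaxTextureWidth  >= 2048 &&
                   caps.MaxTextureHeight >= 2048;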

 

The point is that even if you're avoiding shaders in order to support prehistoric hardware, you may well have already committed your hardware requirements to something more modern elsewhere - shaders aren't the only feature of more modern hardware and it's quite easy to trip over that line and thereby invalidate your reasons for avoiding shaders.

Norman Barrows    7179

Very roughly:
You can rank a GPU's compatibility/power level by the Shader Models that it supports - these are the 'Asm' instruction sets that your HLSL shader code will be compiled into.

FF was ok til 03.
Then Dx9 SM2 popped up - this is Unity's minimum requirement.
In around 04-06, Dx9 SM3 took off - this is PlayStation3/Xbox360 level hardware (old).
Then in 07-09, DX10 SM4 appeared.
Then in the past 2 to 4 years, Dx11 SM5 has been starting to take over, but it's still new ground.

If you want to keep support for older versions of windows, then you'll have to use Dx9, but you can choose which hardware era with SM2 or SM3 shaders (sm2 has more limitations, but buys you a few more years of compatibility).

If you're ok with ditching WinXP support, then you can go straight to Dx11. It has "feature levels", which allows it to run on earlier hardware (not just on SM5-era hardware). You can choose between SM2, SM4 or SM5 (they don't support PS3-era SM3 for some strange reason...).
Most modern games at the moment probably use Dx11 with SM4 for hardware compatibility (and a few SM5 optional code paths for the latest eye candy).

 

 

this is EXACTLY the type of info i need.

 

looks like i should be at dx11 sm4-5. even if i don't need all the capabilities, it'll be easier to port to DX12 when the time comes.

Norman Barrows    7179

Anyway, the following seems a reasonably simple introduction: http://www.two-kings.de/tutorials/dxgraphics/dxgraphics18.html

 

i take it that in dx11, shader code is required for both the vertex and pixel stages? and that code similar to that on two-kings (mat mul, and texture lookup) would be the basics to get me started? and then i have to add gouraud and phong and mips to get the rest of the standard fixed function pipeline? the baseline capabilities i'm looking for are aniso, mipmaps, and T&L. the special capabilities i need beyond that are alphatest and alpha blend (for now). the shader code itself seems very straightforward. having written my own poly engine once probably helps (anyone remember the Sutherland-Hodgman clipping algo? <g>). i hope i don't get addicted to writing shader code!
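(for reference, the sort of minimal pair i'm picturing would look something like this - just a sketch with assumed register bindings, written d3d9-style; dx10/11 versions differ mainly in using cbuffers and SV_POSITION / SV_Target semantics, and since the fixed alpha test is gone there, clip() is how you replicate it:)

float4x4 worldViewProj : register(c0);
sampler2D diffuseTex   : register(s0);

struct VSOut
{
    float4 pos : POSITION;
    float2 uv  : TEXCOORD0;
};

VSOut BasicVS(float4 pos : POSITION, float2 uv : TEXCOORD0)
{
    VSOut o;
    o.pos = mul(pos, worldViewProj);    // the "T" of T&L; lighting would be added here
    o.uv  = uv;
    return o;
}

float4 AlphaTestPS(float2 uv : TEXCOORD0) : COLOR
{
    float4 c = tex2D(diffuseTex, uv);   // aniso/mipmaps still come from sampler states
    clip(c.a - 0.5f);                   // alpha test: discard low-alpha pixels
    return c;                           // alpha blend stays a render state, not shader code
}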

 

ok, here i go already....

 

with the HLSL instruction set, would it be possible to implement real-time raytracing?

 

i'm thinking probably not, unless you used it as a big mathco, like all that non-graphics GPU stuff you hear about. it's more of a specialized processor hardware stage in a poly engine.

 

too bad they don't make cards that accelerate ray tracing. then again, they wouldn't be much different to program i'd imagine. vertex stuff would get replaced with ray stuff, pixel stuff would be analogous to color calculations at each ray/surface collision.

Norman Barrows    7179

For example, you mentioned a 10-texture blend earlier - if you're blending 10 textures on the GPU then you're already into SM3-class hardware territory.  That's not all.  Are you, for example, using any non-power-of-two textures?  Or textures sized 2048x2048 or more?  These will all raise your minimum hardware requirements to something that's also going to support shaders anyway.

 

the texture blender was for the previous version of caveman. it did its work in ram with the cpu, then memcpy'd the result to a dynamic texture in, i guess it was, D3DPOOL_DEFAULT.
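(the copy itself was basically the standard dynamic-texture lock - a sketch with placeholder names; "blendedPixels" stands in for whatever buffer the cpu blender produced, and the texture is assumed to have been created with D3DUSAGE_DYNAMIC in D3DPOOL_DEFAULT:)

D3DLOCKED_RECT lr;
if (SUCCEEDED(dynamicTex->LockRect(0, &lr, NULL, D3DLOCK_DISCARD)))
{
    for (UINT y = 0; y < height; ++y)
    {
        memcpy((BYTE*)lr.pBits + y * lr.Pitch,   // respect the surface pitch
               blendedPixels + y * width * 4,    // assuming 4 bytes per pixel
               width * 4);
    }
    dynamicTex->UnlockRect(0);
}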

 

all textures in the titles i'm working on now are 256x256. i experimented with sizes up to 4096x4096, but was able to get decent results with 256x256. i spent a lot of time playing around with quad sizes, # of times the texture is repeated across the quad, seamless textures, real-world size of the image on the texture, etc., trying to get textures at the correct image scale, seamless, with little or no moire patterns, low pixelization, and 256x256 textures for speed. some of the stuff turned out really nice. no bump maps or anything fancy. it appears that a high-quality texture can make all the difference.
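(the aniso / mipmap setup itself is just sampler state, for the record - a sketch, clamping to whatever the caps report:)

device->SetSamplerState(0, D3DSAMP_MINFILTER, D3DTEXF_ANISOTROPIC);
device->SetSamplerState(0, D3DSAMP_MAGFILTER, D3DTEXF_LINEAR);
device->SetSamplerState(0, D3DSAMP_MIPFILTER, D3DTEXF_LINEAR);
device->SetSamplerState(0, D3DSAMP_MAXANISOTROPY, caps.MaxAnisotropy);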

Norman Barrows    7179

The point is that even if you're avoiding shaders in order to support prehistoric hardware, you may well have already committed your hardware requirements to something more modern elsewhere - shaders aren't the only feature of more modern hardware and it's quite easy to trip over that line and thereby invalidate your reasons for avoiding shaders.

 

no worries there. the most radical thing i did recently was implement QueryPerformanceCounter. it's nice having a real high-res timer again, like back when you used to reprogram the timer chip as part of standard operating procedure for a game. graphics-wise, it's all directx8-compatible code basically. other than wanting to draw more stuff (it's always "more stuff!" with games), i'm only using dx8 capabilities.

 

so if i add to / switch my gamedev graphics library to shaders, i can write a vertex shader for basic transforms, and 3 pixel shaders: regular, alphatest, and alphablend, and that's it? i'm done?

 

that will speed up the transform and texture stages, but i'm still sending 500 batches of 20 triangles.

 

i've done a bit of basic testing and it appears i'm cpu bound due to the large batch numbers and small batch sizes. 

 

right now my approach to drawing most scenes is to assemble the scene from basic parts like ground quad, rock meshes 1 & 2, and plant meshes 1-4, then texturing, scaling, rotating, translating, and height mapping them, one quad, rock, and plant at a time.

 

i take it that two alternate approaches used are:

 

1. chunks: bigger meshes with entire sections of a level

2. dynamic buffers, where the possibly visible meshes are assembled on the fly (see the sketch below)
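(for option 2, my understanding is the usual d3d9 idiom is a dynamic vertex buffer filled with the DISCARD / NOOVERWRITE pattern - a sketch, with the vertex type, sizes and cursor bookkeeping all placeholders; the buffer is assumed created with D3DUSAGE_DYNAMIC | D3DUSAGE_WRITEONLY in D3DPOOL_DEFAULT:)

void* dst = NULL;
DWORD flags = D3DLOCK_NOOVERWRITE;          // append behind data the GPU may still be reading

if (cursor + batchBytes > bufferBytes)      // buffer full: wrap around
{
    cursor = 0;
    flags  = D3DLOCK_DISCARD;               // driver hands back a fresh buffer, no stall
}

dynamicVB->Lock(cursor, batchBytes, &dst, flags);
memcpy(dst, batchVerts, batchBytes);        // the geometry assembled on the fly
dynamicVB->Unlock();

device->SetStreamSource(0, dynamicVB, 0, sizeof(MyVertex));
device->DrawPrimitive(D3DPT_TRIANGLELIST,
                      cursor / sizeof(MyVertex),          // first vertex of this batch
                      batchBytes / sizeof(MyVertex) / 3); // triangle count
cursor += batchBytes;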

 

is it just me, or is it weird that what games want to do (draw lots of small meshes) is just what vidcards suck at?

 

or did they evolve with a specific type of game and way of doing graphics in mind?   or was it another case of non-gamedevs doing what they thought might help, and be a way to make some $ at the game of making games?

 

overall, i'm looking for general solutions for basic graphics capabilities. stuff where i can build it, plop it into the gamelib, and forget about it, and get back to building games, not components and modules.

 

but it does look like the time has come when i need to move on to a new way of doing things, if i want to have the level of scene complexity i want and probably need to be competitive in today's market.

 

i only sell in low/no-competition markets. when you're the best or only one out there, you can get away with less-than-bleeding-edge graphics. but things like applying a normal lighting equation and some simple scaled mipmapping with CORRECT alpha test wouldn't be that big a deal. pretty much all of that i've done before, or something similar.

 

 

so i guess i'd be looking for a generalized shader based approach for drawing indoor and outdoor scenes for games like shooters, fps/rpgs, and ground, air, and water vehicle sims. 

 

at the GPU end, you want to set a texture, and draw a batch of all the triangles that use that texture and are at least partially in the frustum, then do the next texture, and so on, touching each texture exactly once. that's what the card likes the most, right?

 

the question is, what should the data look like on the game end for proper "care and feeding" of the GPU in such a manner? or is it even 100% possible or practical to do so?
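(conceptually i'm picturing something as simple as a flat list of visible draw records sorted by texture each frame - a sketch, with every name and type hypothetical, and assuming the shared vertex/index buffers are already bound:)

#include <algorithm>
#include <vector>

struct DrawRecord
{
    IDirect3DTexture9* texture;     // what to bind
    UINT firstIndex;                // where this object's triangles start in the IB
    UINT triCount;
    UINT numVerts;                  // vertices referenced, for DrawIndexedPrimitive
};

void SubmitSorted(IDirect3DDevice9* device, std::vector<DrawRecord>& visible)
{
    // group the frustum-survivors so each texture is touched exactly once
    std::sort(visible.begin(), visible.end(),
              [](const DrawRecord& a, const DrawRecord& b)
              { return a.texture < b.texture; });

    IDirect3DTexture9* current = NULL;
    for (size_t i = 0; i < visible.size(); ++i)
    {
        if (visible[i].texture != current)      // texture change = new batch
        {
            current = visible[i].texture;
            device->SetTexture(0, current);
        }
        device->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0,
                                     visible[i].numVerts,
                                     visible[i].firstIndex,
                                     visible[i].triCount);
    }
}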

 

i'm doing all this with drawing randomly generated levels and environments in mind. so pre-processed and hard coded data are sort of out of the question.
