# DrawIndexedPrimitive & ATI driver


## Recommended Posts

I have a strange problem with my DirectX9 engine and ATI drivers - and no idea how to track this further:

After I installed the latest ATI driver on my computer, DirectX DrawIndexedPrimitive calls start to omit some triangles while rendering. This behaviour starts exactly after 512 calls to DrawIndexedPrimitive. I have run under maximum debug settings with the debug DirectX runtime; nothing conspicuous is reported. This only happens with the latest ATI drivers.

I'm using my own 3D game engine, which has already been used in a few commercial releases: DirectX 9, shader model 3.0. The system is Windows 7 64-bit, an HP 8510p notebook with an ATI HD 2600 Mobility GPU.

The code works fine with the Windows 7 stock drivers and the original HP drivers from 2009, but not with the current ATI drivers. So far I said, well, forget the ATI drivers. However, yesterday I installed the indie game "Starfarer", which does not even get the device up at all using the Windows 7 or HP driver, but does so using the ATI one.

It would really suck if I had to tell my eventual customers "you might not be able to play your other games, but my game will work fine if you roll back your GPU driver."

So, has anybody had similar experiences? Are there known issues with DrawIndexedPrimitive and ATI? Any ideas what to try?

- Matt

##### Share on other sites
Are you rendering on a second thread?
Are you doing redundant-state checking on your end?
Why not some screenshots of this omitting of triangles?

What does PIX say you are passing to the call? Is it not the correct number of triangles for the primitive type (triangle list/triangle strip)?
Are the correct vertex buffers bound?

It is unlikely an ATI driver bug. More likely they have made fixes which have exposed some error on your side, and you should be glad for this chance to find and fix it.

L. Spiro

##### Share on other sites
Hi L. Spiro,

> Are you rendering on a second thread?

> Are you doing redundant-state checking?
Uhh... sounds pretty general. Anything specific I should be looking for?

> Why not some screenshots of this omitting of triangles?
Right, why not...
[attachment=9899:missing_triangles.jpg]

So, this is where the buffers are bound:
```cpp
VERIFY(Get3DDevice()->SetFVF( m_VertexFormat.GetVertexFormatID() ));
VERIFY(Get3DDevice()->SetStreamSource(0, m_VertexBuffer.m_D3D9VertexBuffer, 0, m_VertexFormat.GetVertexSize()));
VERIFY(Get3DDevice()->SetIndices(m_IndexBuffer.m_D3D9IndexBuffer));
```

... and this is the render call
```cpp
// actual rendering:
assert(pEffect);
if (pEffect)
{
    unsigned int numPasses = 0;
    VERIFY(pEffect->Begin(&numPasses, 0));
    for (unsigned int pass = 0; pass < numPasses; ++pass)
    {
        VERIFY(pEffect->BeginPass(pass));
        HRESULT hr;
        hr = Get3DDevice()->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0,
                m_VertexBuffer.m_NumVertices, index0, numIndices/3);
        if (hr != D3D_OK)
        {
            sys::Output(DXGetErrorString(hr));
            sys::Output(DXGetErrorDescription(hr));
            assert(false);
        }
        VERIFY(pEffect->EndPass());
    }
    VERIFY(pEffect->End());
}
```

Btw, the effect does not care which model I take. It happens just the same with a tessellated sphere. It happens in-game and it happens in the model viewer.

> What does PIX say you are passing to the call? Is it not the correct number of triangles for the primitive type (triangle list/triangle strip)?
```
148 <0x064A5970> IDirect3DDevice9::DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, 4688, 0, 2612) 66862329856
```

Even if I hardcode these values as the parameters of the call, the missing-triangles bug does not appear before the 512th frame.
To be honest, this was the first time I used PIX, now that you mentioned it.

> Are the correct vertex buffers bound?
Probably. When I launch the model viewer, there is only one vertex buffer and one index buffer that get set at all, so there is not much chance of mixing them up, I guess. From the pictures I would rather suspect the index buffer to be the problem, but that's no definite call, because the polygons don't share vertices.

> It is unlikely an ATI driver bug. More likely they have made fixes which have exposed some error on your side, and you should be glad for this chance to find and fix it.
That's right. But I am really curious. If it were my bug, it would have been lurking for a long time.

Thanks for helping,

Matt

Edit: the VERIFY doesn't do anything. It's only an empty macro in this configuration.

##### Share on other sites
I'd suggest trying the debug runtimes, available from the DirectX control panel in your Start menu. You may have some bad parameters going in which affect things with one driver but which the other is more forgiving of (software vs. hardware vertex processing can sometimes trigger this too), and the debug runtimes will identify this pretty quickly for you.

##### Share on other sites

> Edit: the VERIFY doesn't do anything. It's only an empty macro in this configuration.

When you say 'empty macro', what do you mean precisely? As in, 'show us'...

##### Share on other sites
Empty macro, as in
```cpp
#define VERIFY(x) (x)
```

(but you're right to ask for a clarification, it could also have been like
```cpp
#define VERIFY(x) ()
```
)
@mhagain:
Using the debug runtime, with all validation cranked up to maximum: the only things reported are warnings about redundant SetRenderState and SetTextureStageState calls.
So this is probably what L. Spiro meant by "Are you doing redundant-state checking on your end?".
The answer is no, I just set all required states every frame.

- Matt

##### Share on other sites
Wow, a -1. That is what I get for helping.

> Uhh... sounds pretty general. Anything specific I should be looking for?

If depth testing is disabled, it does not make sense to disable depth testing again.
Frankly, while this is just an example, the general idea of not setting the same state twice is fairly important for the performance of any game, in DirectX as much as in OpenGL. If the last texture-wrap mode was CLAMP, then don't call a DirectX or OpenGL function to set the same state to the same value again.

While this feature is necessary, I mention it because it can also cause problems if you are using multiple threads and do not have proper synchronization in place. If you are using iOS or Android, remember that OpenGL ES 2 allows you to share resources between contexts, not state. I am aware that you are not using OpenGL or OpenGL ES 2, but the same idea applies when managing DirectX resources. And threading issues are the only issues that I have had that cause the problems you describe.

Basically, as far as I am concerned, I have the buggiest graphics drivers in the world. My dual ATI Radeon HD 5870’s crossfired result in about 10 or 11 reboots daily, yet I am still able to develop games in DirectX 11. Recently I had faulty shader values that caused my graphics cards to crash, but that was my fault. I am now able to have very complex cbuffers working fine once I fixed my own problems.

Basically, GeForce cards will not crash when you create broken shaders, but ATI cards will. However, a valid shader of any complexity will run on any ATI card without crashing. You are doing something wrong on your end, and if your answers to my questions are all reasonable, then you need to post your shader.

L. Spiro

##### Share on other sites

> Wow, a -1. That is what I get for helping.

For the record the -1 didn't come from someone posting in this thread; I voted it up to cancel it out.

##### Share on other sites
I am aware that there is a performance impact from redundant SetRenderState calls and so on.
But nothing beats "good enough": as long as it doesn't hurt the driver, I won't change it. If I ever run into performance issues, I might look into this aspect again, but not now.

So here is a simplified shader, that shows the same behavior (tris disappearing after 512 frames):
```hlsl
float4 PS_ConstantColor(VS_OUTPUT pixel) : COLOR
{
    float4 col;
    col.rgba = Color;
    return col;
}

VS_OUTPUT VS_NoLight(
    float4 position : POSITION0,
    float3 normal   : NORMAL0,
    float3 texCoord : TEXCOORD0 )
{
    // calculate the pos/normal using the "normal" weights
    // and accumulate the weights to calculate the last weight
    float3 skinPosition = float3(0.0f, 0.0f, 0.0f);
    float3 skinNormal   = float3(0.0f, 0.0f, 0.0f);
    skinPosition = mul(float4(position.xyz, 1.0), World);
    skinNormal   = mul(float4(normal.xyz, 0.0), World);
    skinNormal   = normalize(skinNormal); // normalize normal

    float3 diffuse = Ambient;

    // transform position from world space into view and then projection space
    VS_OUTPUT pixel;
    pixel.Position = mul(float4(skinPosition.xyz, 1.0f), ViewProjection);
    pixel.Diffuse  = float4(diffuse, 1);
    pixel.TexCoord = texCoord;
    pixel.CamView  = normalize(skinPosition - CameraPosition);
    pixel.Normal   = skinNormal;
    return pixel;
}

technique cc_shader
{
    pass p0
    {
        VertexShader = compile vs_3_0 VS_NoLight();
        PixelShader  = compile ps_3_0 PS_ConstantColor();
    }
}
```

Another bit of information: the bug does not appear with the reference rasterizer.

##### Share on other sites
1) I did another test: running Dragon Age: Origins. Result: the same artifacts occur as in my engine. Not so with the old HP driver.

2) AMD/ATI says that you should use a tool provided by them to determine whether your specific notebook is compatible with the Catalyst Mobility driver. Unfortunately, that site is currently unreachable. Internet rumor says to stick to the notebook manufacturer's GPU drivers. These are from 2009, but both my game and DA:O run without problems.
Also, using the ATI driver I experienced a minor but unprecedented issue with power management, which might be another clue pointing to compatibility issues.

So, for now I am going to conclude that it is the driver after all. L. Spiro's reasoning to take the opportunity to find flaws in the engine is a good one. But all considered, I would say the chances are minimal that there is something insightful to find here. Unless somebody else has encountered the same problem and found a solution to it.

So, thanks for your help. Cu,

- Matt

##### Share on other sites
Had a similar problem with ATI cards not drawing some indexed textured primitives. Seem to remember I had to explicitly specify buffer offsets or something to assist the card whereas this could be ignored for nvidia or intel cards.
Can probably chase up the change required if it seems like a similar problem.

##### Share on other sites
Sounds promising - it would be great if you could post this change.

##### Share on other sites
I found when drawing indexed primitives the minVertexIndex parameter cannot be left at zero with ATI cards for some reason. NVIDIA and INTEL seem to work fine however. When I left this at zero, the ATI cards were missing polygons all over the place.

So your call
```cpp
->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, m_VertexBuffer.m_NumVertices, index0, numIndices/3);
```
must change to a variation like
```cpp
->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, m_MinIndexNumber, m_VertexBuffer.m_NumVertices, index0, numIndices/3);
```

see the MSDN entry

> The minVertexIndex and numVertices parameters specify the range of vertex indices used for each call to DrawIndexedPrimitives. These vertex indices are used to optimize vertex processing of indexed primitives by processing a sequential range of vertices prior to indexing into them. Indices used during this call cannot reference any vertices outside this range.

http://msdn.microsoft.com/en-us/library/microsoft.xna.framework.graphics.graphicsdevice.drawindexedprimitives.aspx

##### Share on other sites
I experimented with different start vertex and index values before. If I remember correctly, it delayed the occurrence of the bug or shifted it to other triangles, but it wasn't reliable behavior.

Nevertheless, I reinstalled the most recent ATI driver and tried what you suggested: added 100 padding vertices to the beginning, changed the index buffer accordingly, and then rendered with minVertex = 100. Exactly the same behavior as before. How exactly did you do it?

If all else fails, one thing seems to do the job: using DrawPrimitive instead of DrawIndexedPrimitive.
Which should be available from the options menu through "Enable embarrassing ATI pampering" ;)

##### Share on other sites
And what card is this?
Are you padding vertex buffers to 32 bytes?

L. Spiro

##### Share on other sites
The card is an ATI HD 2600 Mobility.

The vertex buffers are XYZ|NORMAL|UV, i.e. 32 bytes per vertex.

##### Share on other sites
Yeah - you shouldn't have to pander to these cards.

My graphics structure may be a little different from yours.

I ended up grouping the textured polygons in the index buffer array and then setting up another ordinary integer array which pointed to the first element of the index buffer array of each texture group for each primitive call. Kind of messy, but it worked 100%. Another issue was that some cards did not support 32-bit index buffers (older Intel graphics), which was a pain for larger index groups.

Later on, I decided to get rid of indexed primitives altogether; I just grouped the elements directly into a large vertex buffer and then drew the different parts with DrawPrimitive, using offsets and lengths as required.

##### Share on other sites
It's crazy: there is this awesome hardware with hundreds of features you usually do not even use, and then one has to struggle with these things.

Using a single big index & vertex buffer is actually a good move. I remember an NVIDIA talk where they suggested doing this (DX9 was hot and new at the time). But then again, risking compatibility issues with 32-bit index buffers would make me a bit nervous. Yes, I can imagine it is a pain to arrange support for 16-bit index buffers in such a scenario.

Fortunately, in my case, using DrawPrimitive as a cure is not a big deal. It will probably use up a little more GPU RAM and be a little slower because vertex caching becomes useless.

##### Share on other sites
It's easy enough to support 16-bit with the "one huge buffer for everything" scheme. You just logically partition by groups of 64k vertexes, then call SetStreamSource with the appropriate offset into the buffer as required. Yeah, it's a few more SetStreamSource calls. I don't know about D3D9 in this regard, but with 10 and 11 it's the case that just changing the offset in this manner has less overhead than doing a full change of the buffer (and even then the overhead is low enough anyway so you'd need a quite extreme example for it to register on perf graphs).

Some older Intels will report that they don't support stream offset but they're (half) lying - what's happening is that they don't support it in hardware, but then again they don't support any of the per-vertex pipeline in hardware either. In practice if you CreateDevice with software vertex processing you're going to get SM3 capabilities in the per-vertex pipeline, which in turn means that you'll have stream offset too. True, it's software emulated, but it's no worse than the rest of software vertex processing.

(As a curious aside - I wonder has anyone ever tried to see if you'll also get instancing under these conditions).

##### Share on other sites

> I found when drawing indexed primitives the minVertexIndex parameter cannot be left at zero with ATI cards for some reason. NVIDIA and INTEL seem to work fine however. When I left this at zero, the ATI cards were missing polygons all over the place.
>
> So your call `->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, m_VertexBuffer.m_NumVertices, index0, numIndices/3);`
>
> must change to a variation like
>
> `->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, m_MinIndexNumber, m_VertexBuffer.m_NumVertices, index0, numIndices/3);`
It could be that your minIndex/maxIndex values do have to be correct by strict D3D standards, but these ATI cards/drivers are the only ones to actually be so strict... E.g. it could be that other drivers are simply tolerating incorrect values when they shouldn't.
IIRC:
- `MinIndex` should be the minimum value in the specified range of your index buffer
and either:
- `MinIndex + NumVertices` should be one past the maximum value in that range of your index buffer, or...
- `NumVertices` should be the number of unique values in that range of your index buffer... It's been a while since I did DX9.

However, most drivers just treat these values as a hint, or simply ignore them, so it doesn't matter if you pass in wrong values.

##### Share on other sites
@Hodgman:
I have looked directly at the index values in the vertex & index buffers. As far as I can see, the conditions you state are met.
Anyway, I am curious: why should I expect a driver to run into trouble when I pass an overly generous vertex range?
I mean, besides "you never know"...
Passing a vertex range that is too limited will cause trouble if the driver actually uses this information, sure.

@mhagain:
Right, that would work, but it's nothing I am too eager to do. And the ATI driver problem might still remain.

##### Share on other sites
> Anyway, I am curious: why should I expect a driver to run into trouble when I pass an overly generous vertex range?
> I mean, besides "you never know"...
> Passing a vertex range that is too limited will cause trouble if the driver actually uses this information, sure.

Yeah, I agree: an overly generous range seems like it should be safe, and practice tells us that it is safe (so far).

For curiosity, the only hypothetical situation I can think of is:
Let's assume we're only using `D3DPT_TRIANGLELIST` and `DrawIndexedPrimitive`.
The driver is emulating vertex processing, but it's done by a worker thread. The `PrimitiveCount` param is summed over a whole frame, and the sum multiplied by 3 to calculate the maximum number of verts required. To allow maximum latency, a whole-frame vertex buffer of the previously calculated maximum size is constructed. At the end of a frame (after this buffer has been created) the worker thread is allowed to start.
For each `DrawIndexedPrimitive` call, the worker thread reserves a portion of the whole-frame vertex buffer equal to the draw call's `NumVertices` value (the number of unique verts to be processed). With a generous `NumVertices` value, the whole-frame vertex buffer will be depleted before all draw calls have been processed, which will result in either: GPU-readable vertex allocation to extend a ring buffer, complex CPU/GPU ring-buffer stalling routines, or, when those fail or aren't implemented, simply aborting the draw call in (silent) error...

##### Share on other sites
Interesting theory. Not entirely impossible that somebody would do something like this.

##### Share on other sites
In case someone has the same problem and is reading this: I had the same problem myself today with my DX9 engine and the Mobility Radeon 2600. Upgrading the Catalyst driver from 12.4 to the 12.6 beta for HD2000, HD3000 and HD4000 seems to fix it for me.