Untitled

posted in DruinkJournal
Published June 02, 2009
Proxy DLL update (See previous entry for what this is about). I've been doing what NVPerfHUD does to see what affects the frame rate (measured by PIX):
  • Forcing Z-culling has no effect. So I'm not pixel shader or fillrate bound.
  • Forcing a 1x1 scissor rect also has no effect.
  • Forcing 2x2 dummy textures has no effect. So I'm not texture bandwidth bound.
  • Not making any DrawPrimitive() calls causes my framerate to go up from 1.9-2.0 FPS to > 100 FPS.

    So, it looks like I'm vertex shader bound, which is bad, since it should be using the fastest vertex processing possible [sad], and I don't think there's anything I can do about it really [sad]

    I've yet to try my cached vertex buffers idea, which requires deriving from IDirect3DVertexBuffer9, and I'm all derived-out after deriving from IDirect3DDevice9 and its hundred-odd functions...
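    For the record, the tests above boil down to overrides like this in the proxy device. Just a sketch - "ProxyDevice9" and "m_pReal" are illustrative names, not what the wrapper actually uses, and every method not shown simply forwards to the real device:

    // Sketch of the scissor-rect and no-draw tests from the list above.
    HRESULT ProxyDevice9::DrawIndexedPrimitive(D3DPRIMITIVETYPE Type,
        INT BaseVertexIndex, UINT MinIndex, UINT NumVertices,
        UINT StartIndex, UINT PrimitiveCount)
    {
    #if defined(TEST_NO_DRAWS)
        // Swallow every draw call - nothing drawn, nothing transformed.
        return D3D_OK;
    #elif defined(TEST_TINY_SCISSOR)
        // Force a 1x1 scissor rect so almost nothing gets rasterized.
        RECT tiny = { 0, 0, 1, 1 };
        m_pReal->SetRenderState(D3DRS_SCISSORTESTENABLE, TRUE);
        m_pReal->SetScissorRect(&tiny);
        return m_pReal->DrawIndexedPrimitive(Type, BaseVertexIndex, MinIndex,
                                             NumVertices, StartIndex,
                                             PrimitiveCount);
    #else
        return m_pReal->DrawIndexedPrimitive(Type, BaseVertexIndex, MinIndex,
                                             NumVertices, StartIndex,
                                             PrimitiveCount);
    #endif
    }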

    Comments

    undead
    Quote:Original post by Evil Steve
  • Not making any DrawPrimitive() calls causes my framerate to go up from 1.9-2.0 FPS to > 100 FPS.

    So, it looks like I'm vertex shader bound, which is bad, since it should be using the fastest vertex processing possible [sad], and I don't think there's anything I can do about it really [sad]

    I've yet to try my cached vertex buffers idea, which requires deriving from IDirect3DVertexBuffer9, and I'm all derived-out after deriving from IDirect3DDevice9 and its hundred-odd functions...

  • It's only an idea, but maybe you could give it a try.

    A couple of months ago I tested my framework/engine on an Acer Aspire One. The application is usually vertex shader bound there, since vertex shaders run in software. I tried a reference scene and it could easily sustain 25-30 fps. As a note, the hardware is shader model 2.0 and there's no way to enable HW vertex processing. Powered by a 1.6GHz Atom, I guess my rig is far worse than your laptop.

    When I enabled an optimized scene representation, the framerate went down to 4-6 fps. That made no sense at all, and I'd never seen that behaviour before: the optimized scene was faster on every piece of hardware I'd tested.

    The problem was in my DrawIndexedPrimitive call: I wasn't passing MinIndex and NumVertices correctly; they were set to 0 and NumberOfVerticesInVB.

    Just by sending the correct values, the optimized scene now renders at an average framerate of 45 fps, with peaks at 65.

    The problem doesn't affect the reference unoptimized scene, which has a VB/IB pair for each piece of geometry, thus 0 and NumberOfVerticesInVB ARE THE CORRECT VALUES.
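    To make the difference concrete, here's a made-up example (pDevice, numVerticesInVB and the ranges are illustrative). With software vertex processing the runtime transforms every vertex in [MinIndex, MinIndex + NumVertices) before drawing, so:

    // A sub-mesh in a shared VB: it uses vertices [200, 500) and is drawn
    // as 150 triangles whose indices start at position 600 in the IB.

    // Lazy values: the runtime transforms EVERY vertex in the VB, per call.
    pDevice->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0,
                                  0, numVerticesInVB, 600, 150);

    // Tight values: only the 300 vertices this sub-mesh actually references.
    pDevice->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0,
                                  200, 300, 600, 150);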

    I guess you could check which values get passed to DrawIndexedPrimitive; if there are only 3 huge VBs and 160 SetTexture calls, then this could be a similar problem.

    I don't know if creating VBs every frame is your bottleneck, as I don't do such weird things in my pipeline.

    My 2 cents.
    June 03, 2009 03:06 AM
    Evil Steve
    Quote:Original post by undead
    The problem doesn't affect the reference unoptimized scene, which has a VB/IB pair for each piece of geometry, thus 0 and NumberOfVerticesInVB ARE THE CORRECT VALUES.

    I guess you could check which values get passed to DrawIndexedPrimitive; if there are only 3 huge VBs and 160 SetTexture calls, then this could be a similar problem.

    I don't know if creating VBs every frame is your bottleneck, as I don't do such weird things in my pipeline.
    Actually, I thought of that while I was writing my vertex buffer recycling code (which is done now; I'll test it at lunch time).
    Unfortunately, since I don't have the source code (this is for Guild Wars, not my own code), I can't really fix the DP/DIP parameters without looping over the index buffer contents to check the values for each DP/DIP call - although that may still be faster if I'm vertex shader bound.
    June 03, 2009 03:36 AM
    undead
    Quote:Original post by Evil Steve
    Actually, I thought of that while I was writing my vertex buffer recycling code (which is done now; I'll test it at lunch time).
    Unfortunately, since I don't have the source code (this is for Guild Wars, not my own code), I can't really fix the DP/DIP parameters without looping over the index buffer contents to check the values for each DP/DIP call - although that may still be faster if I'm vertex shader bound.

    DP shouldn't be a problem, as the vertices in use are implicit (the range follows directly from StartVertex and the primitive count).
    DIP is different, as you don't have a clue about the minimum and maximum vertices without reading the indices.

    Maybe there's a (remote) chance to fix it without parsing the entire VB/IB.

    You know the VBs are always created with the same size and usage.

    The reason why somebody would want to create and fill a VB every frame is to take a set of renderable elements and append them into a single VB. The nature of those elements may be of no interest to you: you probably don't care whether they are static meshes, procedurally generated content, or something else.

    Suppose a draw call is made up of 100 triangles, with the primitive type being a trilist. It is impossible for those indices to reference more than 300 distinct vertices. If some vertices are shared (and probably some will be) then the number could be smaller, but 300 is a conservative guess, surely better than 0 and NumVerticesInVB. You just have to check that 300 isn't bigger than the VB size, but that's a pretty straightforward test.
    This prediction is going to fail only if there are unreferenced vertices in the VB, but that would be a horrible way to waste space. If that's the case, I'm scared by the code hidden inside Guild Wars!

    This prediction is also going to fail if different renderable elements share the same vertices. I think that scenario is quite unlikely, as the CPU time required to unify a list of meshes so that they share vertices would be too high.
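    In code the guess might look like this (a sketch; the helper name is mine and it assumes trilists):

    #include <d3d9.h> // for UINT

    // Conservative NumVertices guess for a trilist draw call.
    UINT GuessNumVertices(UINT primCount, UINT minIndex, UINT vbVertexCount)
    {
        // N triangles can reference at most 3*N distinct vertices.
        UINT worstCase = primCount * 3;

        // Check the guess doesn't run past the end of the VB.
        if (minIndex + worstCase > vbVertexCount)
            worstCase = vbVertexCount - minIndex;

        return worstCase;
    }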

    A "tricky" solution could be the following:

    1- first pass: for each DrawIndexedPrimitive call, save its polycount and...
    2- ...parse the indices in the index buffer. Get the minimum index and the first index (the maximum index can be guessed as minimum index + 3 x polycount for a trilist)
    3- second pass: when requested to draw a primitive, check if that polycount is already saved
    3a- if it's not, repeat steps 1 and 2 and add a new polycount
    3b- if you already saved information for that polycount, you can guess the parameters you need by reading only the first index and calculating the offset, thus shifting the minimum index and guessing the maximum index accordingly (see the sketch below)
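
    A sketch of what that cache might look like (all names are mine; trilist indices assumed):

    #include <d3d9.h>
    #include <map>

    // Per-polycount cache for the two-pass scheme above.
    struct IndexInfo
    {
        UINT firstIndex; // first index seen when this polycount was parsed
        UINT minIndex;   // minimum index found by parsing that IB range
    };

    std::map<UINT, IndexInfo> g_polycountCache;

    // Steps 3/3a/3b: try to guess MinIndex/NumVertices for a new draw
    // call from the cache, without re-parsing the whole IB.
    bool GuessRange(UINT polyCount, UINT newFirstIndex,
                    UINT* pMinIndex, UINT* pNumVertices)
    {
        std::map<UINT, IndexInfo>::const_iterator it =
            g_polycountCache.find(polyCount);
        if (it == g_polycountCache.end())
            return false; // step 3a: parse the IB and add a new entry

        // Step 3b: shift the cached minimum by the first-index offset.
        int offset = (int)newFirstIndex - (int)it->second.firstIndex;
        *pMinIndex = (UINT)((int)it->second.minIndex + offset);
        *pNumVertices = polyCount * 3; // conservative trilist guess
        return true;
    }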

    My point is that if they are going to save N meshes, it is unlikely that two big meshes will have exactly the same number of polygons.
    On the other hand, if they fill the VB with procedurally generated content, there's surely a pattern behind it. Think about a particle system: I can't see why two 1000-element particle systems generated from the same code should have different index buffer data (except for the offset of the vertices, which can be calculated from the first index).

    Speaking about huge meshes, if parsing only the first index is not enough, you could parse the first 3-4 primitives to identify them exactly. Or the first, the last, and the one in the middle. The relative positions of the indices must be the same if it's the same mesh: you calculate the offset between the first indices, then you read the others, apply the offset, and check whether the indices are the same. You could apply this test only if the polycount is bigger than a threshold value (no need to extensively test a quad!).
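
    In code, the spot-check could be something like this (a sketch; assumes 16-bit trilist indices, and pA/pB and the function name are illustrative):

    // Compare a few triangles after removing the first-index offset to
    // decide whether two draw calls reference "the same" mesh.
    bool LooksLikeSameMesh(const WORD* pA, const WORD* pB, UINT indexCount)
    {
        if (indexCount < 9)
            return false; // below the threshold, just parse it fully

        int offset = (int)pB[0] - (int)pA[0];

        // Probe the first, middle and last triangle.
        UINT probes[3] = { 0, ((indexCount / 2) / 3) * 3, indexCount - 3 };
        for (int p = 0; p < 3; ++p)
            for (int i = 0; i < 3; ++i)
                if ((int)pA[probes[p] + i] + offset != (int)pB[probes[p] + i])
                    return false;

        return true; // same relative index layout at every probe
    }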

    Not a safe way to do things, I admit! :)
    June 03, 2009 05:57 AM