
mind in a box

Member Since 20 Apr 2010
Offline Last Active Apr 17 2016 05:54 PM

Posts I've Made

In Topic: Project wants to link to static libraries of a static library used by it

04 February 2016 - 08:52 PM

Thanks for the reply! This flag is exactly what I needed. However, after doing some research, it doesn't seem possible to set it using CMake without resorting to some very ugly hacks.

It turns out, though, that you can compile GLEW yourself from just a few .h/.c files, and someone on my team came up with a CMakeLists.txt doing exactly that, which solved the problem.


In Topic: Switching PixelShader gains performance, without any pixels on screen.

15 October 2015 - 05:19 AM


If you're not actually drawing any pixels, then your GPU frametime is probably pretty low... so your framerate is probably being dictated by the CPU frametime.

105fps is ~9.5ms frametime, and 150fps is ~6.7ms frametime. With 1000 draw-calls per frame, that's (even more approximately), about 9.5μs per draw with the complex shader, or 6.7μs per draw with the simple shader in CPU time.
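
For anyone who wants to redo the arithmetic in the quote above, here is a minimal C++ sketch. The function names are made up for illustration, and it assumes the whole frame time is spent issuing draw calls, so the result is a rough upper bound rather than a measurement.

```cpp
#include <cassert>

// Frame time in milliseconds for a given frame rate.
double frameTimeMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Rough CPU cost per draw call in microseconds, assuming the entire
// frame time goes into issuing draws (an upper bound, not a measurement).
double microsecondsPerDraw(double fps, int drawCalls) {
    assert(drawCalls > 0);
    return frameTimeMs(fps) * 1000.0 / drawCalls;
}
```

Calling microsecondsPerDraw(105.0, 1000) and microsecondsPerDraw(150.0, 1000) gives roughly the 9.5μs and 6.7μs figures quoted.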

 

Yeah, I should be CPU-Limited here.

 

However, I did some more tests, and I can't replicate this behavior using my own simple shaders. Intel GPA seems to be doing something else when switching to "Simple PixelShaders" besides replacing all pixel shaders with the simple version. Probably they block setting the shader-resources or something; I have to dig more into that.


In Topic: About dynamic vertex pulling

12 October 2015 - 03:40 AM


If you're talking about merge-instancing isn't it just vertexid/size=instanceid?

 

My meshes aren't using the same vertex-counts as of yet and I would like to get around this if possible.

 


To select the mesh to render via DrawPrimitive, use StartIndexLocation/StartVertexLocation.
For example if you've got mesh A of 1000 vertices (32 bytes per vertex) and right afterwards mesh B of 500 vertices (24 bytes per vertex); then you need to set StartVertexLocation to 1334.
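
One way to arrive at the 1334 in the quote is sketched below. This is only an interpretation: it assumes both meshes live back to back in one buffer and that mesh B is addressed in units of its own 24-byte stride, so its first vertex has to be rounded up to the next multiple of 24 bytes. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Index of the first vertex slot, counted in units of `stride`,
// that starts at or after `byteOffset` (rounds up).
std::uint32_t startVertexLocationFor(std::uint32_t byteOffset, std::uint32_t stride) {
    assert(stride > 0);
    return (byteOffset + stride - 1) / stride;
}

// Padding bytes needed between the previous mesh and this one so that
// this mesh begins exactly on a multiple of its own stride.
std::uint32_t paddingBytesFor(std::uint32_t byteOffset, std::uint32_t stride) {
    return startVertexLocationFor(byteOffset, stride) * stride - byteOffset;
}
```

Mesh A occupies 1000 * 32 = 32000 bytes, so startVertexLocationFor(32000, 24) returns 1334 and mesh B would need 16 bytes of padding in front of it. If both meshes used the same stride, the value would simply be 1000.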

 

Is this really true? Don't you have to specify the index-value of the first vertex? Wouldn't you just need to set that value to 0 for the first drawcall and to 1000 for the second? I'm pretty sure that's how it works, at least for the indices.

You are probably talking about the offsets you can set while binding the buffers to the IA?

 



To identify the instance:
Create a vertex buffer filled with a uint in increasing order. You only need one, then you can reuse for all the draws. In other words:

//At initialization time
uint32_t *vertexBuffer= ...;
for( int i=0; i<4096; ++i )
    vertexBuffer[i] = i;

Note: the 4096 is arbitrary.

And bind that vertex buffer as instance data. We'll call this the "DRAWID". Then when you pass StartInstanceLocation = 500, the drawID will contain 500 for the first instance, 501 for the 2nd instance, etc (SV_InstanceID is zero-based, thus we need this trick to get the actual value in the shader)

Now that you've got the instance ID, just load myWorldMatrices[drawID];
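
To make the DRAWID trick above easier to follow, here is a small CPU-side model in C++. It is not D3D code: makeDrawIdBuffer and fetchDrawId are hypothetical helpers that only mimic how a per-instance vertex buffer is read starting at StartInstanceLocation while the instance counter itself starts at zero.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Builds the buffer from the snippet above: element i holds the value i.
std::vector<std::uint32_t> makeDrawIdBuffer(std::uint32_t count) {
    std::vector<std::uint32_t> ids(count);
    for (std::uint32_t i = 0; i < count; ++i) {
        ids[i] = i;
    }
    return ids;
}

// Models the per-instance fetch: the element read for a given instance is
// startInstanceLocation + instanceIndex, where instanceIndex is zero-based.
std::uint32_t fetchDrawId(const std::vector<std::uint32_t>& drawIds,
                          std::uint32_t startInstanceLocation,
                          std::uint32_t instanceIndex) {
    const std::uint32_t element = startInstanceLocation + instanceIndex;
    assert(element < drawIds.size());
    return drawIds[element];
}
```

With a 4096-entry buffer, fetchDrawId(ids, 500, 0) yields 500 and fetchDrawId(ids, 500, 1) yields 501, which is the value the shader would then use to index myWorldMatrices.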

 

Won't SV_InstanceID be filled with the value I passed in the drawcall, regardless of a second vertexbuffer being bound? I guess I will have to test this, but it would save me the overhead of reading the same value as SV_InstanceID out of a buffer.


In Topic: About dynamic vertex pulling

11 October 2015 - 04:47 PM


The overhead of an actual draw call is extremely low. The biggest issue performance-wise is when you need to swap vertex/index buffers between the calls, which can be avoided by having one giant buffer and using the Start* variables.
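
As a rough illustration of the "one giant buffer plus Start* variables" idea, the C++ sketch below packs several meshes into shared vertex and index arrays and records the offsets each draw would need. The types and the packMeshes function are hypothetical and CPU-only; the field names just mirror the DrawIndexed parameters.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Vertex { float x, y, z; };

struct Mesh {
    std::vector<Vertex> vertices;
    std::vector<std::uint32_t> indices;  // relative to this mesh's first vertex
};

// Values a later DrawIndexed-style call would need for one mesh.
struct DrawRange {
    std::uint32_t indexCount;
    std::uint32_t startIndexLocation;
    std::int32_t  baseVertexLocation;
};

struct PackedGeometry {
    std::vector<Vertex> vertices;
    std::vector<std::uint32_t> indices;
    std::vector<DrawRange> ranges;
};

// Appends every mesh to the shared arrays; indices stay mesh-relative
// because baseVertexLocation is added back at draw time.
PackedGeometry packMeshes(const std::vector<Mesh>& meshes) {
    PackedGeometry packed;
    for (const Mesh& mesh : meshes) {
        for (std::uint32_t index : mesh.indices) {
            assert(index < mesh.vertices.size());
        }
        DrawRange range;
        range.indexCount = static_cast<std::uint32_t>(mesh.indices.size());
        range.startIndexLocation = static_cast<std::uint32_t>(packed.indices.size());
        range.baseVertexLocation = static_cast<std::int32_t>(packed.vertices.size());
        packed.ranges.push_back(range);
        packed.vertices.insert(packed.vertices.end(), mesh.vertices.begin(), mesh.vertices.end());
        packed.indices.insert(packed.indices.end(), mesh.indices.begin(), mesh.indices.end());
    }
    return packed;
}
```

The buffers are then bound once, and each mesh is selected purely through its DrawRange, so no vertex or index buffer switch is needed between draws.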

 

I am done collecting all the buffers into one single big buffer and it works pretty well, the packing at least. However, I am not quite sure how I would implement using different world-matrices without actually switching at least one constant buffer, since I can't use the instance-ID to figure out what I am currently rendering.

 

My idea would be to simply use DrawInstanced, passing only a single (maybe more) instance to render and setting the start-instance to the index at which the instance-data of my object is sitting in a big structured-buffer bound to the vertexshader. That way I can access the instance-data using SV_InstanceID.

 

Would that be an appropriate solution or do you maybe have a better idea? The engine I'm currently working on unfortunately isn't far enough for me to test this now.

 


Yes. If you can avoid it, then better. You can also optimize for specific uses (e.g. if you only need position & orientation but no scale, send a float4 for the position and a float4 with a quaternion; if you only need XZ position and Y is controlled globally or already baked, only send a float2. You can also use half formats to halve the bandwidth if precision isn't an issue)
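
To put numbers on the saving described in the quote, here is a small C++ sketch comparing per-instance sizes. The struct names and the uploadBytes helper are hypothetical, and the layouts are only one plausible way to store the data.

```cpp
#include <cassert>
#include <cstddef>

// Full 4x4 world matrix per instance.
struct MatrixInstance {
    float world[16];
};

// Position plus rotation quaternion, no scale.
struct PosQuatInstance {
    float position[4];
    float rotation[4];
};

static_assert(sizeof(MatrixInstance) == 64, "expected 16 floats of 4 bytes");
static_assert(sizeof(PosQuatInstance) == 32, "expected 8 floats of 4 bytes");

// Bytes that have to be written for a given number of instances.
std::size_t uploadBytes(std::size_t instanceCount, std::size_t bytesPerInstance) {
    assert(bytesPerInstance > 0);
    return instanceCount * bytesPerInstance;
}
```

For 15000 instances this is 960000 bytes with full matrices versus 480000 bytes with position plus quaternion, and half formats would halve that again.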

 

Never really thought about this. I actually don't need scale for most of the objects I'm working with, so that is going to be a really nice optimization!

 

Thanks for your answers!


In Topic: About dynamic vertex pulling

07 October 2015 - 11:34 AM

Thanks for the reply!

 

 

TBH the article looks like a horribly complicated version of what can be achieved with one huge vertex buffer and one huge index buffer combined with StartInstanceLocation and StartIndexLocation from DrawIndexedInstanced and StartVertexLocation from DrawInstanced.

 

Not quite, the point of the technique is to minimize drawcalls further than instancing can go. It enables you to render lots of different geometry with different textures using only one single DrawInstanced-Call.

Basically it IS DrawInstanced with these two parameters, but without any of the overhead coming from the drawcalls.

 

 

It is true that LOD loses some effectiveness. However there's more to it than just the vertex shader processing power.
One triangle covering 1024 pixels is much faster than 1024 triangles covering 1 pixel each. That's because pixels are processed in at least 2x2 blocks (aka "the small triangle problem"). Triangles
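
The 2x2 block argument can be illustrated with a very idealized C++ model. It assumes every touched 2x2 quad runs all four pixel-shader lanes, that the large triangle's quads are fully covered (edges ignored), and that each tiny triangle touches exactly one pixel and therefore one quad. The function name is made up for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Pixel-shader lanes launched if every touched 2x2 quad runs all four
// lanes, whether or not the triangle actually covers each pixel.
std::uint64_t shadedLanes(std::uint64_t triangles, std::uint64_t quadsTouchedPerTriangle) {
    assert(triangles > 0 && quadsTouchedPerTriangle > 0);
    return triangles * quadsTouchedPerTriangle * 4;
}
```

Under these assumptions one large triangle covering 1024 pixels touches 256 quads and launches 1024 lanes, while 1024 single-pixel triangles touch one quad each and launch 4096 lanes for the same number of visible pixels.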

 

Right, I totally forgot about that!

 

 

You are right.
The author is using 3 matrices per instance (world, view, and projection) which is far from ideal. I suppose he did it for simplicity of the article.

 

Sure, but even copying 15000 world-matrices to a buffer would take a lot of time. I think a better approach would be to use this only for static geometry and work with indices to a pre-filled buffer with all of the instance-information.

 

Thanks for the matrix optimization-tip as well, I didn't know that!

