
Fast way to throw away vertices in VS

This topic is 3309 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.



D3D10: I would like to use instancing to draw many objects. My data is stored in a buffer, and the InstanceID could be used to index into the first-level buffer. My only problem is that the objects have different vertex counts. I could draw every instance with the vertex count of the biggest object, but would it be worth it? I would have to throw away the vertices that hang over. What is the best way to do this with respect to performance (e.g. set them to one point, such as the last point)?

My only alternative is to draw all objects in one normal draw call (sum of all vertices) with degenerate triangles in between. The problem there is getting something like an index: I only have the VertexID and would need to derive an index from it, which would be very expensive in my case (a vertex fetch through a large buffer to determine the actual position). Unless a static variable holding my own faked index/InstanceID would work better.

edit: If I set all surplus vertices to the last point (so their triangles become degenerate), would it really cost that much performance, from the hardware architecture's point of view? The VS should have enough power anyway, and the PS should not process anything for degenerate triangles. Can anyone help, or at least give an opinion?

edit2: Does the graphics card handle degenerate triangles well (performance-wise)?

Thx,
Vertex
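To make the cost of the single-draw-call alternative concrete, here is a minimal CPU-side sketch (C++ standing in for the shader) of deriving a "faked" instance index from the vertex ID alone. It assumes an exclusive prefix sum of per-object vertex counts is stored in a buffer and binary-searched; the names `buildStartOffsets` and `findObject` and the buffer layout are hypothetical, not from the post. The point it illustrates is that every vertex pays roughly log2(number of objects) extra buffer reads, which is exactly the kind of expense the post is worried about.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU model of the "one big Draw call" alternative: all objects
// are concatenated and the vertex shader only knows a global vertex ID.
struct ObjectLookup {
    std::uint32_t object;   // the "faked" instance index
    std::uint32_t local;    // vertex index inside that object
    std::uint32_t fetches;  // buffer reads spent on the lookup
};

// starts[i] = first global vertex ID of object i (exclusive prefix sum),
// so starts[0] == 0 and the array is sorted ascending.
std::vector<std::uint32_t> buildStartOffsets(const std::vector<std::uint32_t>& counts) {
    std::vector<std::uint32_t> starts(counts.size());
    std::uint32_t running = 0;
    for (std::size_t i = 0; i < counts.size(); ++i) {
        starts[i] = running;
        running += counts[i];
    }
    return starts;
}

// Binary search for the last object whose start offset is <= vertexId.
// Assumes starts is non-empty and vertexId is below the total vertex count.
ObjectLookup findObject(const std::vector<std::uint32_t>& starts, std::uint32_t vertexId) {
    std::uint32_t lo = 0;
    std::uint32_t hi = static_cast<std::uint32_t>(starts.size());  // search range [lo, hi)
    std::uint32_t fetches = 0;
    while (hi - lo > 1) {
        std::uint32_t mid = lo + (hi - lo) / 2;
        ++fetches;  // one read of starts[mid]
        if (starts[mid] <= vertexId) lo = mid;
        else hi = mid;
    }
    ++fetches;  // read starts[lo] to compute the local index
    return {lo, vertexId - starts[lo], fetches};
}
```

With object sizes 3, 5 and 2, global vertex 4 maps to object 1, local vertex 1. With about 1,000 objects the search costs on the order of log2(1000) ≈ 10 reads per vertex, on top of the data fetch itself.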

VS and PS share processing resources, so it's not like "VS should have enough power anyway". How much this wastes would depend on how many extra vertices you have and how complex the vertex shader is. Your "degenerate triangles" have vertices whose positions are set to the same value at runtime. They are still different vertices from the hardware's POV (i.e., they have different indices) and would therefore need to be calculated more than once (on the other hand, if you have any dynamic branching in the VS, all these vertices will take the same branch, which would be good for performance).

This looks like a lot of work to reduce the number of calls, and might not end up being beneficial. How many different objects do you have (I mean models, not instances)?
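The point about indices can be illustrated with a deliberately simplified post-transform cache model: a FIFO keyed on vertex index. This is a hypothetical C++ sketch, not a description of any particular GPU (real cache sizes and replacement policies vary). It shows why vertices that the shader moves to the same position still cost one VS invocation each: the cache lookup uses the index, not the computed output.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Simplified post-transform cache: remembers the last `cacheSize` vertex
// indices. A hit reuses the shaded result; a miss runs the vertex shader.
// Outputs are never compared, so collapsed positions with distinct indices
// are all shaded separately.
std::size_t countVsInvocations(const std::vector<std::uint32_t>& indices, std::size_t cacheSize) {
    std::deque<std::uint32_t> fifo;
    std::size_t invocations = 0;
    for (std::uint32_t idx : indices) {
        if (std::find(fifo.begin(), fifo.end(), idx) != fifo.end()) continue;  // cache hit
        ++invocations;                 // cache miss: shade this vertex
        fifo.push_back(idx);
        if (fifo.size() > cacheSize) fifo.pop_front();  // evict the oldest index
    }
    return invocations;
}
```

Three triangles whose last six vertices are collapsed at runtime but keep indices 3 to 8 cost 9 invocations; the same triangles written with the repeated index 2 cost 3.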


What exactly do you mean by sharing processing resources? That VS data goes into the PS? Registers? I'm not sure I understood correctly: do you mean that having enough VS power isn't enough, since the ROPs, the PS, etc. are still there?

I will have about 3 to 10 different objects, each drawn in a similar way. Each object has about 15 techniques, each issued with one DrawInstanced call. I'm still working on it, so it's not finished yet, but I need to find a good approach to instancing.

It is certainly a drastic way of reducing draw calls, but it should (I hope) be worth it. I already use instancing for simple rectangles, which works great, but rectangles are static with 2 points. The objects I have problems with have a dynamic number of points.

I heard/read a long time ago that the GPU may detect degenerate triangles and throw them away. I don't know if this is true.

I know that the VS would have to do all the extra work for the invisible triangles, but that's all; the PS and the other stages before and after it would (I hope) not be executed.

My problem is not ALU/branching power: I am vertex-fetch limited (that's the bottleneck), since I have too many buffer accesses in the VS (which can't be reduced). I access a buffer (by InstanceID) that tells me where to find my data in another buffer. Performance for my simple rectangles is great, though.

I don't expect this to perform very well in the VS, but it should save enough elsewhere that it may be faster overall, or at least cost less CPU overhead.

edit: Maybe I still misunderstood something; you may have meant how many points, triangles or how much data a mesh/object needs. It can be anywhere from 1 to several thousand (as much as memory allows). So I know that one object with 10000 triangles forces all the other objects to be drawn with 10000 triangles too (even if they only have 1 triangle).

thx,
Vertex

Quote:
Original post by Vertex333
What do you exactly mean by sharing processing resources.

I meant that on the graphics card the same processing units are used for both VS and PS. Therefore if you're doing more work in the VS, fewer units are available for the PS. It's not like a couple of years ago, when VS and PS had completely different pipelines, and therefore it made sense to add more work to the VS if it was underutilised.

Quote:
I will have about 3 to 10 different objects, each drawn in a similar way. Each object has about 15 techniques, each called by one drawinstancedcall.

I think that considering the small number of objects, you're unlikely to get a significant increase from batching them together compared to just batching the instances of each of them. I'd suggest that you only try a complex instancing scheme if profiling shows that the drawing calls are the bottleneck.

Thx, I hadn't thought much about the unified architecture. But isn't it true that VS performance per shader unit is higher than PS performance, since a few vertices often result in many more pixels? OK... on second thought, your message is still correct, so thx, now I understand it better.

I will have about 100 DrawInstanced calls (3 to 10 different shape objects such as rectangle, circle, ..., multiplied by their techniques, of which there are up to 15) for, let's say, several thousand 3D objects (each having a few or many primitives). Maybe you misunderstood me: I don't want to batch different shape objects together; for every shape object and every one of its techniques, I want to batch all the 3D objects via instancing.

Calling Draw several thousand times drags my CPU down. With instancing at this granularity I could reduce the draw calls to a bounded number (about 100).

Once again, my problem is that the 3D objects can consist of different numbers of vertices. Imagine I want to draw lines with different numbers of edges. If I use instancing, I have to set the max vertex count for the DrawInstanced call, and in the VS (where I generate the vertex data entirely; yes, this performs very well) I have to set the overhanging vertices (those beyond the object's own count, up to the max vertex count) to the last point, or in other words throw them away (discard them).
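A minimal sketch of the clamping idea, written as C++ that mimics what the vertex shader would do per SV_VertexID (in HLSL this is roughly a fetch at `min(vertexID, count - 1)`). The names `paddedVertexPosition` and `countVisibleTriangles` are hypothetical, and a triangle list is assumed. In a triangle list, every triangle that touches a clamped vertex ends up with at least two identical corners, so its area is zero; a zero-area triangle covers no pixel centers and should generate no pixel-shader work, although the VS still runs for every padded vertex.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct Float2 { float x, y; };

// What the VS would do for one instance: vertex IDs past the object's real
// point count are clamped to the last point. Assumes points is non-empty.
Float2 paddedVertexPosition(const std::vector<Float2>& points, std::uint32_t vertexId) {
    std::uint32_t last = static_cast<std::uint32_t>(points.size()) - 1;
    return points[std::min(vertexId, last)];
}

// Twice the signed area of a triangle; zero means it is degenerate.
float doubleArea(Float2 a, Float2 b, Float2 c) {
    return (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
}

// Emulates one instance of a triangle-list DrawInstanced with maxVertexCount
// vertices and counts the triangles that could still cover pixels.
std::uint32_t countVisibleTriangles(const std::vector<Float2>& points, std::uint32_t maxVertexCount) {
    std::uint32_t visible = 0;
    for (std::uint32_t v = 0; v + 2 < maxVertexCount; v += 3) {
        Float2 a = paddedVertexPosition(points, v);
        Float2 b = paddedVertexPosition(points, v + 1);
        Float2 c = paddedVertexPosition(points, v + 2);
        if (doubleArea(a, b, c) != 0.0f) ++visible;
    }
    return visible;
}
```

An object with 6 real vertices drawn with a max count of 12 keeps its 2 real triangles; the 2 padded triangles collapse onto the last point.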

In numbers: I don't expect my lines' edge counts to differ that much; 5 to at most 50 edges is typical. So in this example every line would be drawn with a vertex count of 50.
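A back-of-the-envelope sketch of what padding costs, assuming every instance is drawn with the largest vertex count and every vertex does a fixed number of buffer reads (for example 2: the InstanceID-indexed offset lookup plus the data fetch described earlier in the thread). The struct and function names are hypothetical, and the per-line counts are treated directly as vertex counts, as in the post.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

struct PaddingCost {
    std::uint64_t usefulVertices;  // sum of the real per-object counts
    std::uint64_t paddedVertices;  // instances * max count (what the VS actually runs)
    std::uint64_t wastedVertices;  // padded - useful
    std::uint64_t bufferReads;     // paddedVertices * readsPerVertex
};

PaddingCost paddingCost(const std::vector<std::uint32_t>& vertexCounts, std::uint32_t readsPerVertex) {
    PaddingCost c{};
    if (vertexCounts.empty()) return c;
    std::uint32_t maxCount = *std::max_element(vertexCounts.begin(), vertexCounts.end());
    c.usefulVertices = std::accumulate(vertexCounts.begin(), vertexCounts.end(), std::uint64_t{0});
    c.paddedVertices = static_cast<std::uint64_t>(maxCount) * vertexCounts.size();
    c.wastedVertices = c.paddedVertices - c.usefulVertices;
    c.bufferReads = c.paddedVertices * readsPerVertex;
    return c;
}
```

For example, lines with 5, 50, 20 and 5 vertices give 80 useful vertices but 200 processed ones, so 60% of the vertex work (and of the 400 buffer reads, at 2 per vertex) is padding. Since the post is fetch-limited, that wasted fraction is the number to compare against the CPU time saved on draw calls.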

Thx,
Vertex

