What are your opinions on DX12/Vulkan/Mantle?


121 replies to this topic

#41 Seabolt   Members   


Posted 13 March 2015 - 11:57 AM

 

 

You should already be doing that on modern D3D11/GL.

 

That's true, and I am ashamed to say I stuck too closely to the DX9 port of my engine where I didn't have nearly as much register space and needed to swap things around on a per-draw basis at times.

 

Scrapping all of that now though and moving forward with DX11 and OGL 4.x and porting in DX12 and Vulkan when they are more public. 

You guys have assuaged most of my fears about the ports though :)

 


Perception is when one imagination clashes with another

#42 CC Ricers   Members   


Posted 13 March 2015 - 02:16 PM


I think I share the same sentiments as Seabolt, and need to bite the bullet and move on to newer APIs.

I don't really have anything useful to add to this thread but as of last week:

 

[Image: GFyBqY6.jpg]

 

That started changing as I started converting some projects to MonoGame Win 8/SharpDX with Shader model 4.0.


New game in progress: Project SeedWorld

My development blog: Electronic Meteor


#43 MarkS   Members   


Posted 15 March 2015 - 01:26 AM

...snip...


What is an "IHV"?

#44 Hodgman   Moderators   


Posted 15 March 2015 - 01:57 AM

...snip...

What is an "IHV"?
Independent hardware vendors - Intel, AMD, nVidia, Qualcomm, PowerVR, etc

#45 AlexPol   Members   


Posted 15 March 2015 - 05:30 AM

Edit: Said something stupid, sorry about that :)


Edited by AlexPol, 15 March 2015 - 06:45 AM.


#46 Ameise   Members   


Posted 15 March 2015 - 05:54 AM

 

- Root Signatures/Shader Constant management
Again really exciting stuff, but seems like a huge potential for issues, not to mention the engine now has to be acutely aware of how frequently the constants are changed and then map them appropriately.

You should already be doing that on modern D3D11/GL.
In Ogre 2.1 we use 4 buffer slots:

  1. One for per-pass data
  2. One to store all materials (up to 273 materials per buffer due to the 64kb per const buffer restriction)
  3. One to store per-draw data
  4. One tbuffer to store per-draw data (similar to 3. but it's a tbuffer which stores more data where not having the 64kb restriction is handy)

We don't really change any of those slots, even the per-draw parameters.

The only times we need to rebind buffers are when:

  1. We've exceeded the size of one of the per-draw buffers (so we bind a new empty buffer)
  2. We are in a different pass (we need another per-pass buffer)
  3. We have more than 273 materials overall, the previous draw referenced material #0, and the current one references material #280 (so we need to switch the material buffer)
  4. We change to a shader that doesn't use these bindings (very rare).

Point 2 happens very infrequently. Points 3 and 4 can be minimized by sorting by state in a RenderQueue. Point 1 happens very infrequently too, and if you're on GCN the 64kb limit gets upgraded to a 2GB limit, which means you wouldn't need to switch at all (and also solves point 3 entirely).

The bindings as a whole don't change often, and this property can already be exploited using DX11 and GL4. DX12/Vulkan just make the interface thinner; that's all.
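
To illustrate point 3 above, here is a minimal Python sketch (a CPU-side model only, not Ogre's actual code) of why sorting draws by material reduces material-buffer rebinds. The figure of 273 materials per buffer comes from the post; the function names and the sample draw lists are hypothetical.

```python
# Model: each material buffer holds up to 273 materials (figure from the post).
MATERIALS_PER_BUFFER = 273


def material_buffer_index(material_id):
    """Return which material buffer a given material lives in."""
    return material_id // MATERIALS_PER_BUFFER


def count_material_rebinds(draw_material_ids):
    """Count how often the bound material buffer changes across a draw list."""
    rebinds = 0
    bound = None  # nothing bound before the first draw
    for material_id in draw_material_ids:
        needed = material_buffer_index(material_id)
        if needed != bound:
            rebinds += 1
            bound = needed
    return rebinds


# Hypothetical draw order that ping-pongs between two material buffers.
unsorted_draws = [0, 280, 1, 281, 2, 282]
sorted_draws = sorted(unsorted_draws)

print(count_material_rebinds(unsorted_draws))
print(count_material_rebinds(sorted_draws))
```

In this model the sorted list only binds each material buffer once, which is the effect a state-sorting RenderQueue is meant to achieve.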

 

 

How are you implementing your constant buffers? From what you've written as your #3b, it sounds like you're packing multiple materials'/objects' constants into a single large constant buffer, and perhaps indexing out of it in your draws? IIRC, that's supported only in D3D11.1+, as there is no *SSetConstantBuffer function that takes offsets until then.

 

Otherwise, if you aren't using constant buffers with offsets, how are you avoiding having to set things like object transforms and the like? If you are, how are you handling targets below D3D11.1?


Edited by Ameise, 15 March 2015 - 05:55 AM.


#47 TheChubu   Members   


Posted 15 March 2015 - 10:37 AM

use baseInstance parameter from glDraw*BaseInstanceBaseVertex. gl_InstanceID will still be zero based, but you can use an instanced vertex element to overcome this problem (or use an extension that exposes an extra glsl variable with the value of baseInstance)

And what if you're drawing two different meshes? ie, not instancing a single mesh.

 

How are you implementing your constant buffers? From what you've written as your #3b, it sounds like you're packing multiple materials'/objects' constants into a single large constant buffer, and perhaps indexing out of it in your draws? IIRC, that's supported only in D3D11.1+, as there is no *SSetConstantBuffer function that takes offsets until then.

I have no idea about D3D11, but it probably isn't even necessary. Just update the entire buffer in one call. The buffer is defined as an array of structs; index into it to fetch the one that corresponds to the current thing being drawn.


Edited by TheChubu, 15 March 2015 - 11:01 AM.

"I AM ZE EMPRAH OPENGL 3.3 THE CORE, I DEMAND FROM THEE ZE SHADERZ AND MATRIXEZ"

 

My journals: dustArtemis ECS framework and Making a Terrain Generator


#48 mhagain   Members   


Posted 15 March 2015 - 11:43 AM

And what if you're drawing two different meshes? ie, not instancing a single mesh.

 

1 is a valid value for the instance count.


It appears that the gentleman thought C++ was extremely difficult and he was overjoyed that the machine was absorbing it; he understood that good C++ is difficult but the best C++ is well-nigh unintelligible.


#49 TheChubu   Members   


Posted 15 March 2015 - 11:59 AM

1 is a valid value for the instance count.
Of course, but the idea is to batch up data inside the constant/uniform buffers and use the instance ID for indexing. There's no sense doing it if you can only index one thing (i.e., you end up doing what I am doing: one glDraw and one glUniform1i call per mesh drawn).

"I AM ZE EMPRAH OPENGL 3.3 THE CORE, I DEMAND FROM THEE ZE SHADERZ AND MATRIXEZ"

 

My journals: dustArtemis ECS framework and Making a Terrain Generator


#50 vlj   Members   


Posted 15 March 2015 - 12:44 PM

 

Of course, but the idea is to batch up data inside the constant/uniform buffers and use the instance ID for indexing. There's no sense doing it if you can only index one thing (i.e., you end up doing what I am doing: one glDraw and one glUniform1i call per mesh drawn).

 

 

If I understand correctly, the ID comes from the instance data and not from gl_InstanceID. The ID is different for two different instances, and a different mesh is a different instance.

Think of this as 2 buffers: one is the instance buffer, which contains only IDs; the other is the vertex buffer.

A first draw call would use 10 instances from the instance buffer, starting from BaseInstance 0.
A second draw call would use 1 instance from the instance buffer, starting from BaseInstance 10.

So if you put the IDs in ascending order in your instance buffer, for example, all the IDs will be different.
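
The two draw calls described above can be modelled on the CPU with a short Python sketch. This is only an illustration of how an instanced vertex attribute is fetched at baseInstance + instance index while gl_InstanceID stays zero-based; `instance_buffer` and `instanced_draw` are hypothetical names, and no real GL call is made.

```python
# Instance buffer holding only IDs, filled in ascending order.
instance_buffer = list(range(4096))


def instanced_draw(instance_count, base_instance):
    """Return (gl_InstanceID, fetched ID) pairs for one simulated draw call."""
    results = []
    for gl_instance_id in range(instance_count):  # always starts at zero
        # The instanced attribute is fetched at base_instance + instance index.
        fetched_id = instance_buffer[base_instance + gl_instance_id]
        results.append((gl_instance_id, fetched_id))
    return results


first_draw = instanced_draw(instance_count=10, base_instance=0)
second_draw = instanced_draw(instance_count=1, base_instance=10)

print(first_draw)
print(second_draw)
```

The second draw has a single instance whose gl_InstanceID is 0, yet the ID it fetches is 10, so it can index its own slot in a shared constant buffer.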



#51 Ameise   Members   


Posted 15 March 2015 - 01:30 PM

 

use baseInstance parameter from glDraw*BaseInstanceBaseVertex. gl_InstanceID will still be zero based, but you can use an instanced vertex element to overcome this problem (or use an extension that exposes an extra glsl variable with the value of baseInstance)

And what if you're drawing two different meshes? ie, not instancing a single mesh.

 

 

 

How are you implementing your constant buffers? From what you've written as your #3b, it sounds like you're packing multiple materials'/objects' constants into a single large constant buffer, and perhaps indexing out of it in your draws? IIRC, that's supported only in D3D11.1+, as there is no *SSetConstantBuffer function that takes offsets until then.

I have no idea about D3D11, but it probably isn't even necessary. Just update the entire buffer in one call. The buffer is defined as an array of structs; index into it to fetch the one that corresponds to the current thing being drawn.

 

 

So, he's just saying 'for the next n draws, here are the constants', and then sets indices (somehow? Not sure how he'd track that without also updating a constant. Atomic integers?) to say 'access struct n in the huge constant buffer'?

Honestly, I'd rather update smaller buffers with finer granularity, as I wouldn't be stalling on one large copy.



#52 Matias Goldberg   Members   


Posted 16 March 2015 - 02:54 PM

How are you implementing your constant buffers? From what you've written as your #3b, it sounds like you're packing multiple materials'/objects' constants into a single large constant buffer, and perhaps indexing out of it in your draws?

Yes
 

IIRC, that's supported only in D3D11.1+, as there is no *SSetConstantBuffer function that takes offsets until then.

That's one way of doing it, and if you do it that way, you're correct. We don't use D3D11.1 functionality; however, since OpenGL does support setting constant buffers by offsets, we take advantage of that there to further reduce the splitting of batches of draw calls.
 

Otherwise, if you aren't using constant buffers with offsets, how are you avoiding having to set things like object transforms and the like? If you are, how are you handling targets below D3D11.1?

By treating all your draws as instanced draws (even if they're just one instance) and using StartInstanceLocation.
I attach a "drawId" R32_UINT vertex buffer (instanced buffer) which is filled with 0, 1, 2, 3, 4 ... 4095 (basically, we can't batch more than 4096 draws together in the same call; that limit is not arbitrary: 4096 * 4 floats per vector * 4 bytes per float = 64kb; aka the const buffer limit).
Hence the "drawId" vertex attribute will always contain the value I want as long as it is in the range [0; 4096), and thus I can index whatever I want correctly.

This is the most compatible way of doing it which works with both D3D11 and OpenGL. There is a GL4 extension that exposes the keywords gl_DrawIDARB & gl_BaseInstanceARB which allows me to do the same without having to use an instanced vertex buffer (thus gaining some performance bits in memory fetching; though I don't know if it's noticeable since the vertex buffer is really small and doesn't consume much bandwidth; also the 4096 draws per call limit can be lifted thanks to this extension).
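
The 4096 figure can be checked with a small Python sketch. It assumes, as the post states, a 64kb constant buffer limit and one 4-float vector of per-draw data; the helper names and the 10000-draw example are hypothetical.

```python
import struct

CONST_BUFFER_LIMIT = 64 * 1024           # 64kb limit quoted in the post
FLOAT4_SIZE = struct.calcsize("4f")      # 4 floats * 4 bytes = 16 bytes


def max_draws_per_batch(vectors_per_draw=1):
    """How many draws fit in one constant buffer for a given per-draw payload."""
    return CONST_BUFFER_LIMIT // (FLOAT4_SIZE * vectors_per_draw)


def split_into_batches(num_draws, vectors_per_draw=1):
    """Return (first_draw, draw_count) ranges that each fit in one buffer."""
    limit = max_draws_per_batch(vectors_per_draw)
    return [
        (start, min(limit, num_draws - start))
        for start in range(0, num_draws, limit)
    ]


print(max_draws_per_batch())       # one float4 per draw
print(max_draws_per_batch(4))      # e.g. a 4x4 matrix per draw
print(split_into_batches(10000))
```

With a larger per-draw payload the batch size shrinks accordingly, which is presumably why the post keeps a separate tbuffer for bulkier per-draw data.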

Edited by Matias Goldberg, 16 March 2015 - 02:59 PM.


#53 Matias Goldberg   Members   


Posted 16 March 2015 - 03:01 PM

1 is a valid value for the instance count.

Of course, but the idea is to batch up data inside the constant/uniform buffers and use the instance ID for indexing. There's no sense doing it if you can only index one thing (i.e., you end up doing what I am doing: one glDraw and one glUniform1i call per mesh drawn).

Doing this for just one instance is completely valid. If you do it the way you said, although valid, your API overhead will go through the roof, especially if you have a lot of different meshes.

Edited by Matias Goldberg, 16 March 2015 - 03:02 PM.


#54 agleed   Members   


Posted 16 March 2015 - 03:12 PM

Otherwise, if you aren't using constant buffers with offsets, how are you avoiding having to set things like object transforms and the like? If you are, how are you handling targets below D3D11.1?

By treating all your draws as instanced draws (even if they're just one instance) and use StartInstanceLocation.

 

 

And you have no noticeable problems with that? A year and a half ago or so I did some quick tests where I just rendered (in OpenGL) all my objects using normal draw calls vs rendering all my objects using instancing with instance count = 1, and it had some truly horrendous CPU overhead. The profiler showed that the GPU fell asleep, but the CPU for some reason took a lot longer for everything. If I remember right, for about 700 total draw calls (Crytek Sponza geometry + shadow pass), I saved something like 3 or 4ms by switching back to normal draw calls for everything (on an i7 3770k and GTX 770). Granted, the setup was suboptimal at best: I sorted by shaders and textures used and nothing else, every mesh was in its own VB, etc. Maybe that was the reason and there's a much smaller instancing overhead otherwise?


Edited by agleed, 16 March 2015 - 03:13 PM.


#55 mhagain   Members   


Posted 16 March 2015 - 03:49 PM

And you have no noticeable problems with that? A year and half ago or so I did some quick tests where I just rendered (in OpenGL) all my objects using normal draw calls vs rendering all my objects using instancing with instance count =1, and it had some truly horrendous CPU overhead. Profiler showed that GPU fell asleep, but CPU for some reason took a lot longer for everything. If I remember right, for about 700 total draw calls (crytek sponza geometry + shadow pass), I saved something like 3 or 4ms by switching back to normal draw calls for everything (on an i7 3770k and GTX 770). Granted the setup was suboptimal at best, I sorted by shaders and textures used and nothing else, every mesh was in its own VB, etc. Maybe that was the reason and there's a much smaller instancing overhead otherwise?

 

This would depend on how you update the per-instance buffer.

 

If you have a small buffer - with space for only one instance - and you do a separate buffer update for each instance, then OpenGL is going to perform horribly (D3D won't).  If you have a large buffer with space for all your instances, but you update them all together, then it should run well.

 

The overhead isn't instancing, it's OpenGL's buffer objects API.




#56 Matias Goldberg   Members   


Posted 16 March 2015 - 04:32 PM

And you have no noticeable problems with that?

Nope, we don't.

every mesh was in its own VB

There's your problem. Every time you had to switch to the next mesh, you had to respecify the VAO state.
You could be hitting the slow path by doing that per mesh + using instancing. The driver may have been able to detect the VAO only switched buffers with the non-instanced calls; but decided to respecify the whole vertex data when using instancing.
You should keep all your meshes in the same Buffer Object, or have very few Buffer Objects at least.

Also, you obviously compared an instanced version without indexing into one single buffer vs normal draw calls.
You should compare instanced version + indexing into one single buffer vs normal draw calls.
If there is higher overhead from using instancing, it is more than negated by using indexes into a single buffer.
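
As a rough illustration of keeping all meshes in the same Buffer Object, the Python sketch below computes where each mesh would land in one shared vertex buffer and one shared index buffer. The resulting offsets are the kind of values a draw call takes as its base vertex and first index. The mesh names and sizes are made up, and `pack_meshes` is a hypothetical helper, not an Ogre or GL function.

```python
def pack_meshes(meshes):
    """Assign each mesh a range inside shared vertex and index buffers.

    meshes: list of (name, vertex_count, index_count) tuples.
    """
    placements = []
    base_vertex = 0   # running offset into the shared vertex buffer
    first_index = 0   # running offset into the shared index buffer
    for name, vertex_count, index_count in meshes:
        placements.append({
            "name": name,
            "base_vertex": base_vertex,
            "first_index": first_index,
            "index_count": index_count,
        })
        base_vertex += vertex_count
        first_index += index_count
    return placements


# Hypothetical meshes: (name, vertex count, index count).
meshes = [("floor", 4, 6), ("column", 200, 600), ("arch", 120, 360)]
placements = pack_meshes(meshes)

for placement in placements:
    print(placement)
```

With this layout, switching meshes only changes draw parameters, so the vertex array state does not have to be respecified per mesh.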

#57 TheChubu   Members   


Posted 16 March 2015 - 05:37 PM

Doing this for just one instance is completely valid. If you do it the way you said, although valid, your API overhead will go through the roof, especially if you have a lot of different meshes.
Then with the instanced method, how would you handle drawing different meshes?

 

ie, as I see it you'd have two ways of doing it:

  1. Update the mesh transform, then issue a glDraw*Instanced call with a single instance, always fetch transform in index 0. Repeat for every single mesh.
  2. Update transform UBO with all the transforms that can fit, then issue glDraw*Instanced call, repeat this draw call increasing the base instance ID by one for every single mesh until you run out of transforms in the UBO (doing the instanced index buffer trick you mentioned since instance ID is always 0).

So you always end up with one draw call for each different mesh. The thing that differs is the UBO updating scheme (no scheme in the first one, a batching scheme in the second one).
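
The two schemes can be compared with a small Python model that only records the sequence of API-level commands, not their cost. The command names ("update_ubo", "draw") and both functions are hypothetical; the capacity of 2 transforms per UBO in the example is chosen just to keep the output short.

```python
def single_slot_commands(num_meshes):
    """Scheme 1: update the transform, then draw with index 0, per mesh."""
    commands = []
    for mesh in range(num_meshes):
        commands.append(("update_ubo", mesh, 1))  # one transform uploaded
        commands.append(("draw", mesh, 0))        # base instance always 0
    return commands


def batched_commands(num_meshes, transforms_per_ubo):
    """Scheme 2: upload as many transforms as fit, then draw with base instance."""
    commands = []
    for mesh in range(num_meshes):
        slot = mesh % transforms_per_ubo
        if slot == 0:
            # UBO is full (or this is the first mesh): upload the next chunk.
            count = min(transforms_per_ubo, num_meshes - mesh)
            commands.append(("update_ubo", mesh, count))
        commands.append(("draw", mesh, slot))     # slot is the base instance
    return commands


def count(commands, kind):
    return sum(1 for command in commands if command[0] == kind)


scheme_one = single_slot_commands(3)
scheme_two = batched_commands(3, transforms_per_ubo=2)

print(count(scheme_one, "update_ubo"), count(scheme_one, "draw"))
print(count(scheme_two, "update_ubo"), count(scheme_two, "draw"))
```

Both schemes issue the same number of draws; only the number of buffer updates between them changes.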


"I AM ZE EMPRAH OPENGL 3.3 THE CORE, I DEMAND FROM THEE ZE SHADERZ AND MATRIXEZ"

 

My journals: dustArtemis ECS framework and Making a Terrain Generator


#58 vlj   Members   


Posted 16 March 2015 - 05:44 PM

I'm starting to be worried by rumors that Google may have its own low-level API too. This would basically mean one API per OS, which defeats the purpose of Vulkan in the first place...



#59 vlj   Members   


Posted 16 March 2015 - 05:48 PM

 

Then with the instanced method, how would you handle drawing different meshes?

 

ie, as I see it you'd have two ways of doing it:

  1. Update the mesh transform, then issue a glDraw*Instanced call with a single instance, always fetch transform in index 0. Repeat for every single mesh.
  2. Update transform UBO with all the transforms that can fit, then issue glDraw*Instanced call, repeat this draw call increasing the base instance ID by one for every single mesh until you run out of transforms in the UBO (doing the instanced index buffer trick you mentioned since instance ID is always 0).

So you always end up with one draw call per each different mesh. Thing that differs is UBO updating scheme (no scheme in first one, batching scheme in the second one).

 

 

glMultiDraw*Indirect basically iterates a glDraw*Instanced call over all elements of the bound indirect draw command buffer.
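
A CPU-side Python sketch of that iteration, for illustration only. The five-field command layout (count, instanceCount, firstIndex, baseVertex, baseInstance) is the one used for indexed indirect draws in OpenGL; `pack_commands` and `emulate_multi_draw_indirect` are hypothetical helpers that merely decode the buffer and list the draws a driver would issue.

```python
import struct

# count, instanceCount, firstIndex (unsigned), baseVertex (signed), baseInstance
COMMAND_FORMAT = "<IIIiI"
COMMAND_SIZE = struct.calcsize(COMMAND_FORMAT)  # 20 bytes when tightly packed


def pack_commands(commands):
    """Pack a list of 5-tuples into one contiguous command buffer."""
    return b"".join(struct.pack(COMMAND_FORMAT, *command) for command in commands)


def emulate_multi_draw_indirect(buffer, draw_count, stride=0):
    """Decode draw_count commands; a stride of 0 means tightly packed."""
    stride = stride or COMMAND_SIZE
    issued = []
    for i in range(draw_count):
        fields = struct.unpack_from(COMMAND_FORMAT, buffer, i * stride)
        count, instance_count, first_index, base_vertex, base_instance = fields
        issued.append({
            "count": count,
            "instance_count": instance_count,
            "first_index": first_index,
            "base_vertex": base_vertex,
            "base_instance": base_instance,
        })
    return issued


# Two hypothetical meshes: 10 instances of the first, 1 instance of the second.
command_buffer = pack_commands([
    (6, 10, 0, 0, 0),
    (600, 1, 6, 4, 10),
])
issued = emulate_multi_draw_indirect(command_buffer, draw_count=2)

for draw in issued:
    print(draw)
```

Each decoded entry corresponds to one instanced draw with its own base instance, so different meshes can be submitted through a single API call.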



#60 Hodgman   Moderators   


Posted 16 March 2015 - 06:40 PM

ie, as I see it you'd have two ways of doing it:

  • Update the mesh transform, then issue a glDraw*Instanced call with a single instance, always fetch transform in index 0. Repeat for every single mesh.
  • Update transform UBO with all the transforms that can fit, then issue glDraw*Instanced call, repeat this draw call increasing the base instance ID by one for every single mesh until you run out of transforms in the UBO (doing the instanced index buffer trick you mentioned since instance ID is always 0).
So you always end up with one draw call per each different mesh. Thing that differs is UBO updating scheme (no scheme in first one, batching scheme in the second one).
The CPU cost of a draw call depends on the state changes that preceded it.
Apparently setting the base instance ID state is much cheaper than binding a new UBO, which makes sense, as there's a tonne of resource management code that has to run behind the scenes whenever you bind any resource, especially if it's an orphaned resource.

Also, yes, updating one large UBO is going to be much cheaper than updating thousands of small ones. Especially if you use persistent unsynchronized updates.

On the GPU side, draw calls are free. What costs is context/segment switches. If two draw-calls use the same "context", the GPU bundles them together, avoiding stalls.
Certain state changes "roll the context"/"begin a segment"/etc, which means the next draw can't overlap with the previous one.
It would be interesting to find out where base-instance-id state and UBO bindings stand in regards to context rolls on different GPUs...
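
As a toy model of the "segment" idea, the Python sketch below counts how many runs of consecutive draws share the same state, for different guesses about which state changes roll the context. Which states actually do so is GPU-specific and not stated in this thread, so the state names and the choice of keys here are purely assumptions.

```python
from itertools import groupby


def count_segments(draws, rolling_state_keys):
    """Count runs of consecutive draws whose 'rolling' state is identical."""
    def rolling_state(draw):
        return tuple(draw.get(key) for key in rolling_state_keys)

    return sum(1 for _ in groupby(draws, key=rolling_state))


# Hypothetical draw list; each dict is the state in effect for one draw.
draws = [
    {"shader": "opaque", "ubo": 0, "base_instance": 0},
    {"shader": "opaque", "ubo": 0, "base_instance": 1},
    {"shader": "opaque", "ubo": 1, "base_instance": 0},
    {"shader": "alpha", "ubo": 1, "base_instance": 1},
]

# If only shader changes rolled the context:
print(count_segments(draws, ["shader"]))
# If UBO bindings also did:
print(count_segments(draws, ["shader", "ubo"]))
# If even the base instance did:
print(count_segments(draws, ["shader", "ubo", "base_instance"]))
```

The fewer state kinds that start a new segment, the more consecutive draws can overlap on the GPU in this model.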



