Multi-threaded Rendering

I don't know what your scene and view look like, but you'll probably have to organize your rendering in a more sophisticated way than just looping through all your objects and drawing them. You can use spatial partitioning techniques (chunks, quadtrees, etc.) to reduce the number of objects that you need to render. You keep those structures up to date as you go, so in your update() method you will add/remove/sort your objects, and by the time you get to your render() method you'll already have a small collection of objects to render. If you have a heap of objects on screen you might want to look at instancing.
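For illustration, here is a rough sketch of that organization in C++ (all names here are made up, not from any particular engine): objects are bucketed into a coarse grid during update() so render() only has to walk a handful of cells.

```cpp
#include <vector>
#include <algorithm>

struct Object { float x, y; /* mesh, world matrix, ... */ };

class Scene {
public:
    static const int W = 16, H = 16;   // grid dimensions
    float cell;                        // world units per cell
    std::vector<Object*> cells[W][H];

    Scene() : cell(64.0f) {}

    void add(Object* o) {
        // Clamp so objects outside the world still land in a valid cell.
        int cx = std::min(std::max(int(o->x / cell), 0), W - 1);
        int cy = std::min(std::max(int(o->y / cell), 0), H - 1);
        cells[cx][cy].push_back(o);
    }
    // update() would move objects and re-bucket any that changed cell;
    // render() walks only the cells under the camera (sketched further down).
    void render(float camX, float camY, float camW, float camH);
};
```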
@Gavin Williams: Do you mean I should use std::map? Any code example out there?
I'm sorry, I don't know C++ well enough yet to even try to talk to you about code specifics, and the implementation will very much depend on your program. You'll have to do some research into spatial partitioning techniques and why they are used, particularly in regard to rendering.

Start here ...

http://www.altdevblogaday.com/2011/02/21/spatial-partitioning-part-1-survey-of-spatial-partitioning-solutions/

If your game/app is 2D, I would recommend breaking it up into a simple array of areas; then you can selectively render each area that falls under your camera. Imagine that half your objects are in area 1 and the other half are in area 2: if your camera can only see area 1, you won't have to render any of the objects in area 2, halving the number of draw calls you need to make. That is how you'll make gains in rendering... by not rendering.
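Continuing the Scene sketch from earlier (names still illustrative), the camera-to-cell mapping might look like this:

```cpp
void drawObject(Object* o); // hypothetical per-object draw routine

// Map the camera's world-space rectangle to a range of cells and
// draw only the objects in those buckets.
void Scene::render(float camX, float camY, float camW, float camH) {
    int x0 = std::max(int(camX / cell), 0);
    int y0 = std::max(int(camY / cell), 0);
    int x1 = std::min(int((camX + camW) / cell), W - 1);
    int y1 = std::min(int((camY + camH) / cell), H - 1);
    for (int y = y0; y <= y1; ++y)
        for (int x = x0; x <= x1; ++x)
            for (size_t i = 0; i < cells[x][y].size(); ++i)
                drawObject(cells[x][y][i]); // only visible areas are touched
}
```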

If all your objects are the same (same geometry) then you can use hardware instancing to draw many objects with one call.

What you have to realize is that making a draw call has a cost. And so you want to reduce that cost.

@Gavin Williams: If I have 10 meshes in the scene and 5 of them are the same, don't I have to repeat the following 10 times to draw them all:
- Set world matrix
- Draw

What do you mean by reducing draw calls? I didn't think you could draw more than one mesh in a single draw call.
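(For reference, the loop being described looks roughly like this in D3D9 with D3DX meshes; the mesh and matrix arrays are placeholders:)

```cpp
// Naive approach: one SetTransform + one draw call per mesh.
for (int i = 0; i < 10; ++i) {
    device->SetTransform(D3DTS_WORLD, &worldMatrix[i]); // per-object world matrix
    meshes[i]->DrawSubset(0);                           // one ID3DXMesh draw call each
}
```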


You should look into vertex buffers. They allow you to condense a large amount of geometry into a single batch which you can then draw together with one call.
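A rough sketch of that batching idea in D3D9 (variable names are illustrative and error checking is omitted): pre-transform each static mesh's vertices by its world matrix while filling one shared buffer, then issue a single draw.

```cpp
struct Vtx { float x, y, z; DWORD color; };
const DWORD FVF = D3DFVF_XYZ | D3DFVF_DIFFUSE;

IDirect3DVertexBuffer9* vb = NULL;
device->CreateVertexBuffer(totalVerts * sizeof(Vtx), D3DUSAGE_WRITEONLY,
                           FVF, D3DPOOL_DEFAULT, &vb, NULL);
Vtx* dst = NULL;
vb->Lock(0, 0, (void**)&dst, 0);
// ... copy each mesh's vertices here, transformed by its world matrix ...
vb->Unlock();

device->SetFVF(FVF);
device->SetStreamSource(0, vb, 0, sizeof(Vtx));
device->SetTransform(D3DTS_WORLD, &identity);  // world transform already baked in
device->DrawPrimitive(D3DPT_TRIANGLELIST, 0, totalVerts / 3); // one call for the batch
```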
"I have to repeat the following 10 times"

Aha, see, that's where you can improve the performance of your program. You could draw those 10 objects with at most 6 draw calls (one call for each of the 5 unique meshes, plus a single instanced call for the 5 identical ones), and some might argue that it can be done in 1 draw call (and it can), though that might not always be appropriate. You do have SOME draw calls up your sleeve, so you might as well use them.

I recommend you read this page: http://msdn.microsoft.com/en-us/library/windows/desktop/bb173349%28v=vs.85%29.aspx. It gives some information on drawing multiple instances of objects in DX9. You might have to search for additional resources on 'Instancing using DirectX9' for clarification or further discussion.
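The core of the technique on that page is IDirect3DDevice9::SetStreamSourceFreq. A condensed sketch (the vertex declaration and buffers here are placeholders, and you need a vertex shader that reads the instance stream, which in practice means SM3.0-class hardware):

```cpp
// Stream 0 holds the mesh once; stream 1 holds one entry per instance
// (e.g. a per-instance transform or position offset -- illustrative only).
device->SetVertexDeclaration(declWithInstanceStream); // decl spans both streams

device->SetStreamSource(0, meshVB, 0, sizeof(MeshVtx));
device->SetStreamSourceFreq(0, D3DSTREAMSOURCE_INDEXEDDATA | numInstances);

device->SetStreamSource(1, instanceVB, 0, sizeof(InstanceData));
device->SetStreamSourceFreq(1, D3DSTREAMSOURCE_INSTANCEDATA | 1u);

device->SetIndices(meshIB);
// One call draws all numInstances copies of the mesh.
device->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, numVerts, 0, numTris);

// Reset the frequencies afterwards or later draws will misbehave.
device->SetStreamSourceFreq(0, 1);
device->SetStreamSourceFreq(1, 1);
```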


Flimflam was right to tell you to look into vertex buffers a bit deeper, because they can be used in non-obvious ways to supply information other than geometry data... such as instance data! And also, as Flimflam suggested, vertex buffers can be used to hold the geometry data for more than one object (this is an alternative to the geometry instancing technique).

You'll have a bit of reading and experimentation in front of you. I would say these techniques are not trivial for somebody who is just learning about them, but they are not super difficult either; you'll just have to think outside the box to understand how they work.
Okay, that's interesting. I have a few questions:

1. Can I draw only the meshes that the camera can actually see? So if I have a vehicle behind the camera, I won't draw it, since the camera can't see it. Is there a method I can call, giving it the view, projection, and world matrices, to determine whether the camera can actually see the mesh?
Example: bool canSee(viewMatrix, projectionMatrix, worldMatrix);

2. If I'm rendering particles, I use device->SetRenderState(D3DRS_POINTSIZE, POINT_SPRITE_SIZE_HERE); to set the size of each point. For higher performance, I want to draw all the particles with one draw call by filling a vertex buffer with all the vertices needed, but then I'm not sure how to set the point size: a render state would change the size of every point at once. I'm looking for a way to set the size of each point sprite individually.
1. There are ways of doing this, each with their own pros and cons.

You can precompute visibility using some variation of a PVS (rough results, memory usage, and a precomputation stage which can be time consuming, though it's fast to evaluate at runtime). Loads of games use this.

You can use occlusion queries: after filling in the z-buffer with very large occluders, you render a simple box representation of each object. The query tells you whether any pixels of the rough bounding geometry were visible, which you can use to decide whether to render the real mesh afterwards. Downsides are potential sync points between the CPU and GPU, or inaccurate results when using latent queries. Also, you still have to issue a bunch of draw calls (DIPs) for the queries, so it might not help your CPU side at all, only the GPU work. Many games use this approach; Umbra is a commonly used middleware based on this technique.
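A minimal sketch of such a query in D3D9 (the proxy-drawing call is a placeholder; error handling omitted):

```cpp
IDirect3DQuery9* query = NULL;
device->CreateQuery(D3DQUERYTYPE_OCCLUSION, &query);

query->Issue(D3DISSUE_BEGIN);
DrawBoundingBox(device);        // hypothetical: rough proxy geometry, typically
query->Issue(D3DISSUE_END);     // with color and depth writes disabled

// Later (ideally next frame, to avoid a CPU/GPU sync point), poll without flushing:
DWORD visiblePixels = 0;
if (query->GetData(&visiblePixels, sizeof(DWORD), 0) == S_OK && visiblePixels > 0) {
    // At least one pixel of the proxy passed the z-test -> draw the real mesh.
}
```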

Alternatively you could do the same kind of work manually in a compute shader, testing against a downsampled depth buffer, to do all the checks in one pass. You would still have the sync point issue though.

Another approach is to have a very small CPU-side software-rasterized depth buffer, which you test your bounding volumes against to generate a list of visible objects (Frostbite does this).
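The per-object test itself can be tiny. A sketch, assuming the small buffer stores the nearest occluder depth per texel and the object's screen rectangle and nearest depth have already been computed (all names illustrative):

```cpp
// An object is hidden only if it is behind the rasterized occluders
// in every texel its screen-space bounds cover.
bool IsOccluded(const float* depth, int bufWidth,
                int x0, int y0, int x1, int y1, float objectNearestZ)
{
    for (int y = y0; y <= y1; ++y)
        for (int x = x0; x <= x1; ++x)
            if (objectNearestZ < depth[y * bufWidth + x])
                return false;   // may poke out in front here -> treat as visible
    return true;                // behind occluders everywhere -> cull it
}
```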

None of these approaches are trivial to set up though, so I would only look into implementing one or more of them if you are sure that your own app is still bottlenecked by draw calls after implementing a simple spatial-partitioning-based frustum culling system.
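For the canSee() asked about above, a common approach is to extract the six frustum planes from the combined world-view-projection matrix (the Gribb/Hartmann method) and test a bounding sphere against them. A sketch using D3DX types; the sphere center and radius are assumed precomputed per mesh, in object space since the planes come from the full combined matrix:

```cpp
bool canSee(const D3DXMATRIX& world, const D3DXMATRIX& view,
            const D3DXMATRIX& proj, const D3DXVECTOR3& center, float radius)
{
    D3DXMATRIX m = world * view * proj;
    D3DXPLANE planes[6] = {
        D3DXPLANE(m._14 + m._11, m._24 + m._21, m._34 + m._31, m._44 + m._41), // left
        D3DXPLANE(m._14 - m._11, m._24 - m._21, m._34 - m._31, m._44 - m._41), // right
        D3DXPLANE(m._14 + m._12, m._24 + m._22, m._34 + m._32, m._44 + m._42), // bottom
        D3DXPLANE(m._14 - m._12, m._24 - m._22, m._34 - m._32, m._44 - m._42), // top
        D3DXPLANE(m._13,         m._23,         m._33,         m._43),         // near
        D3DXPLANE(m._14 - m._13, m._24 - m._23, m._34 - m._33, m._44 - m._43), // far
    };
    for (int i = 0; i < 6; ++i) {
        D3DXPlaneNormalize(&planes[i], &planes[i]);
        // Signed distance from sphere center to the plane; fully behind
        // any one plane means the mesh cannot be visible.
        if (planes[i].a * center.x + planes[i].b * center.y +
            planes[i].c * center.z + planes[i].d < -radius)
            return false;
    }
    return true; // inside or intersecting the frustum
}
```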

2. Don't use point sprites; render your own camera-aligned quads. You then have access to all the information you need and don't need to batch by size. It will also make it easier to move your engine to DX10/11, where there is no native support for point sprites.
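A sketch of the quad expansion (names illustrative): each particle becomes two triangles built from the camera's world-space right/up axes, with its own size baked into the vertices, which sidesteps the D3DRS_POINTSIZE render state entirely.

```cpp
struct ParticleVtx { D3DXVECTOR3 pos; DWORD color; float u, v; };

// 'right' and 'up' can be taken from the camera's orientation
// (e.g. the first two rows of the inverse view matrix).
void EmitQuad(ParticleVtx* out, const D3DXVECTOR3& center, float size,
              DWORD color, const D3DXVECTOR3& right, const D3DXVECTOR3& up)
{
    D3DXVECTOR3 r = right * (size * 0.5f);
    D3DXVECTOR3 u = up * (size * 0.5f);
    ParticleVtx v[4] = {
        { center - r - u, color, 0.0f, 1.0f },
        { center - r + u, color, 0.0f, 0.0f },
        { center + r + u, color, 1.0f, 0.0f },
        { center + r - u, color, 1.0f, 1.0f },
    };
    // Two triangles per quad: 0-1-2 and 0-2-3.
    out[0] = v[0]; out[1] = v[1]; out[2] = v[2];
    out[3] = v[0]; out[4] = v[2]; out[5] = v[3];
}
```

You would typically write all the quads into one dynamic vertex buffer each frame (D3DUSAGE_DYNAMIC, locked with D3DLOCK_DISCARD) and draw every particle with a single DrawPrimitive call.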
@ATEFred: "don't use point sprites, render your own camera aligned quads"

Don't you think this will slow down rendering? A point sprite uses a single vertex, while aligned quads use 4 vertices each:
10 point sprites = 10 vertices rendered
10 aligned quads = 10 * 4 = 40 vertices rendered


It's been a while since I used DX9, so it's a bit hazy in my mind, but I didn't notice any major speed difference when I moved to textured quads in DX10. You pay the extra memory cost, but the flexibility you get is worth it (rotations, motion-blur-type stretching, etc.).
You can also get around the memory issue (in DX10/11 at least) by having one quad that you instance at will plus a separate stream with minimal particle properties, or by using a geometry shader for the expansion.

