Multi-threaded Rendering


19 replies to this topic

#1 Medo3337   Members   -  Reputation: 680

Posted 13 December 2012 - 06:28 PM

Hi everyone,

I have been thinking about how I could speed up rendering. Can multi-threading speed up rendering?

What other techniques can I use to speed up rendering?

#2 Alamar   Members   -  Reputation: 256

Posted 13 December 2012 - 10:20 PM

Your question is a bit generic...

Yes, multi-threading can speed up rendering, but perhaps not how you're expecting...

One of the most common ways of speeding up rendering is to reduce state changes and draw calls: batch up similar meshes and render them in fewer calls. This is especially useful with many small objects, which seems to be pretty popular lately.

-Alamar

#3 Gavin Williams   Members   -  Reputation: 776

Posted 13 December 2012 - 10:25 PM

I'd like to know more about your program. What does it do? How does it perform now (frame rates)? Does it suffer from performance issues in certain areas of your code? How many draw calls are you making, and how many objects are you rendering?

I've never used multithreading, never had to. But I might be able to give some advice on 'the other techniques' but you're going to have to provide more information.

#4 Hodgman   Moderators   -  Reputation: 31851

Posted 13 December 2012 - 11:38 PM

Multi-threading can be used to speed up almost anything that's bottlenecked by computation, assuming you've got the extra CPU cores to run those extra threads.

Take note though, D3D9 is a single-threaded API; you should always make all of your D3D9 calls from a single thread only.

This doesn't mean that you can't write a threaded D3D9 renderer though -- it just means that the part of your renderer that is responsible for "submission" (calling D3D draw functions, setting states, etc) has to belong to a particular thread.
You can use threads to accelerate all the other responsibilities of a renderer -- e.g. traversing a scene to collect renderable objects, culling objects that aren't visible, sorting objects into an optimal order, determining which states will need to be set for each object, generating queues of commands for the "submission" thread to process, etc...
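As a rough illustration of that split, here is a minimal sketch in which worker threads cull objects and queue commands, and a single thread then sorts and would submit them. `DrawCommand`, `CommandQueue`, `isVisible` and `buildFrame` are hypothetical names made up for this example, and the visibility test is a stand-in; none of this is D3D9 API.

```cpp
#include <algorithm>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical render command. A real one would carry the mesh, material
// and states that the submission thread needs for its D3D9 calls.
struct DrawCommand {
    int objectId;
    float sortKey;  // e.g. depth or material id, used to order submission
};

// Worker threads push into the queue; only the submission thread drains it.
class CommandQueue {
public:
    void push(const DrawCommand& cmd) {
        std::lock_guard<std::mutex> lock(mutex_);
        commands_.push_back(cmd);
    }
    std::vector<DrawCommand> takeAll() {
        std::lock_guard<std::mutex> lock(mutex_);
        std::vector<DrawCommand> out;
        out.swap(commands_);
        return out;
    }
private:
    std::mutex mutex_;
    std::vector<DrawCommand> commands_;
};

// Stand-in visibility test (assumption): even ids count as visible.
inline bool isVisible(int objectId) { return objectId % 2 == 0; }

// Culls objects [0, objectCount) on workerCount threads, then returns the
// sorted command list that the single submission thread would walk.
inline std::vector<DrawCommand> buildFrame(int objectCount, int workerCount) {
    assert(workerCount > 0);
    CommandQueue queue;
    std::vector<std::thread> workers;
    for (int w = 0; w < workerCount; ++w) {
        workers.emplace_back([&queue, w, objectCount, workerCount] {
            // Each worker handles every workerCount-th object.
            for (int id = w; id < objectCount; id += workerCount) {
                if (isVisible(id)) {
                    queue.push({id, static_cast<float>(id)});
                }
            }
        });
    }
    for (std::thread& t : workers) {
        t.join();
    }
    std::vector<DrawCommand> cmds = queue.takeAll();
    std::sort(cmds.begin(), cmds.end(),
              [](const DrawCommand& a, const DrawCommand& b) {
                  return a.sortKey < b.sortKey;
              });
    return cmds;  // all device calls would happen here, on one thread only
}
```

The thread that calls `buildFrame` is the only one that would ever touch the device; the workers never make a D3D call.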

Edited by Hodgman, 13 December 2012 - 11:44 PM.


#5 Tispe   Members   -  Reputation: 1046

Posted 14 December 2012 - 01:45 AM

The only thing I worry about is loading resources while playing a game, which can lag your fps. You don't want to take more than 10 milliseconds copying data around. How I solve this is that my main loop is my rendering thread. I have a list of things to render, and if I need to change something in that list (which requires D3D9 calls) I make sure to queue those changes up in a work package list. Then each frame I perform a work package, and if less than 5 milliseconds have passed I can do another work package.

Typical work packages include copying texture-, vertex- and index data to buffers.

Any other non-D3D9 processing can be done on another thread.
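A minimal sketch of such a time-budgeted work package loop, assuming each package is just a callable. `WorkPackage` and `runWorkPackages` are hypothetical names, and `std::chrono::steady_clock` stands in for whatever timer the engine uses.

```cpp
#include <cassert>
#include <chrono>
#include <deque>
#include <functional>
#include <utility>

// A work package is any deferred job, e.g. copying texture, vertex or
// index data into a buffer on the rendering thread.
using WorkPackage = std::function<void()>;

// Runs queued packages on the calling (rendering) thread until the queue
// is empty or the time budget is used up. At least one package runs per
// call if any is queued. Returns the number of packages executed.
inline int runWorkPackages(std::deque<WorkPackage>& queue,
                           std::chrono::microseconds budget) {
    assert(budget.count() >= 0);
    using Clock = std::chrono::steady_clock;
    const Clock::time_point start = Clock::now();
    int executed = 0;
    while (!queue.empty()) {
        WorkPackage job = std::move(queue.front());
        queue.pop_front();
        job();
        ++executed;
        if (Clock::now() - start >= budget) {
            break;  // budget spent, leave the rest for the next frame
        }
    }
    return executed;
}
```

Called once per frame with a 5 millisecond budget, this matches the behaviour described above: whatever does not fit stays queued for later frames.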

#6 PhillipHamlyn   Members   -  Reputation: 454

Posted 14 December 2012 - 02:26 AM

Hi Medo3337

I've tried this in XNA and quickly found that the following are good candidates for a non-UI thread:
  • world management (asset loading etc)
  • managing the render queue
  • game mechanics (movement, collision, etc)
However, broadly speaking, the graphics object is tied to the UI thread. You can pass the graphics object to another thread but it cannot do any useful work there. You can't generate textures or prepare VBs or IBs on another thread since they cannot then be marshalled back to the UI thread where the rendering must take place. It follows that you cannot render on any thread other than the UI thread.

I don't think this restriction matters much anyway, because calls to the graphics object are by and large requests to queue a particular operation, not the operation itself (which tripped me up a few times when profiling). I perceive, perhaps wrongly, that from the average programmer's point of view the graphics rendering pipeline is a queue onto which you push requests, which are then actioned in Present() (or a similar call). The Present call blocks your UI thread as it actions all the queued work, writing to the screen and doing heavy maths (sometimes on the GPU, sometimes on the CPU, depending on your hardware capabilities).

Hope that helps.
Phillip

#7 Radikalizm   Crossbones+   -  Reputation: 2985

Posted 14 December 2012 - 03:49 AM

Has nobody mentioned deferred rendering contexts in D3D11 yet?
They were built for exactly this purpose. They don't speed up the rendering itself; instead they let you safely build separate command lists on different threads, which can later be executed in one go on the immediate context.
This will only help you if there's actually a bottleneck in your application caused by working with the rendering pipeline of course, so make sure you profile first.

I don't know which version of D3D you're using, but in the case that you're using D3D11 this might be a viable option.

I gets all your texture budgets!


#8 Medo3337   Members   -  Reputation: 680

Posted 14 December 2012 - 03:43 PM

@Gavin Williams: I notice that rendering is slow at the beginning of the program (the first 2-3 seconds), then the FPS starts to increase until it reaches 61. I also notice that the player is slower at the beginning, even though I'm using frame-independent movement.

I'm using something similar to the following code:
void render()
{
      elapsedTime = timeGetTime() - LastTime;
      // Move
      x += speed * elapsedTime;
      LastTime = timeGetTime();
}


#9 Gavin Williams   Members   -  Reputation: 776

Posted 15 December 2012 - 10:00 AM

Don't worry about the first 2 to 3 seconds, maybe it's just your average frame time building up, I don't know, but I think that's unimportant. If your rendering reaches 60 fps and sits on that consistently, then that's all you need to worry about at this stage, I would say. Depending on your program, there could be a few things going on at start-up that can result in shuddering or unstable frames. If you are really worried about it, you might have to profile your application using a profiling tool, or simply time your method calls and start getting some specific information about how long everything is taking. Personally, timing my function calls is the approach I take when my programs start getting bigger and I run into frame-rate issues. I have a clock class (using inter-op to access the high precision timer) which I can use to mark the start and stop of a function and then spit out durations to an onscreen display or text file for later inspection.

Just in regard to the above code: does timeGetTime() retrieve fresh information from the clock, or does it just return an already retrieved value? I'm guessing that it fetches a fresh measurement, in which case you shouldn't call it twice per frame. Even though the time between your two calls to timeGetTime() might be trivial here, as your physics (x += speed * elapsedTime, etc.) gets more complicated, the gap between the two calls will become substantial and you will start losing time, which will result in incorrect physics. You should have something like this:

long timeNow = QueryClock();
elapsedTime = timeNow-timePrev;
timePrev = timeNow;

See the slight difference? The clock is read only once per frame, so no time falls between the two reads. You can even separate your physics from your timing. You can have an update clock stage in your main loop and then an update physics stage, which simply reads the elapsed time.
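A minimal sketch of that separation, with the clock read once per frame and the physics step only consuming the elapsed time. `FrameClock` and `updatePhysics` are hypothetical names, and `std::chrono::steady_clock` is used here as a portable stand-in for timeGetTime() or QueryClock().

```cpp
#include <cassert>
#include <chrono>

// Samples the clock exactly once per frame, so no time is lost between
// measuring the elapsed time and storing the new reference point.
class FrameClock {
public:
    using Clock = std::chrono::steady_clock;

    explicit FrameClock(Clock::time_point start = Clock::now())
        : prev_(start) {}

    // Call once at the top of each frame. Returns elapsed seconds.
    // The time point parameter exists so a known time can be injected.
    double tick(Clock::time_point now = Clock::now()) {
        assert(now >= prev_);
        const std::chrono::duration<double> elapsed = now - prev_;
        prev_ = now;  // reuse the same sample, do not query the clock again
        return elapsed.count();
    }

private:
    Clock::time_point prev_;
};

// The physics stage only reads the elapsed time; it never touches the clock.
inline void updatePhysics(float& x, float speed, double dt) {
    x += speed * static_cast<float>(dt);
}
```

In a main loop this would be `double dt = clock.tick();` followed by `updatePhysics(x, speed, dt);`, however long the physics step takes.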

Edit: If you start recording your frame-times and the times of your function calls you'll start seeing abnormalities such as functions performing on distinct tiers or spikes in function times. These are to be expected for a number of reasons, often to do with the operating system and your hardware. But they are not a problem. The first frame or two may often result in a time-spike. Could be a cache miss, or maybe even .NET or its GC (but I don't know much about the GC and .NET mechanics). But generally I would say that these things can be ignored, especially if your program settles into a regular 60fps after just a few seconds.

A question to ask about your program is ... Does the character move correctly given the particular frame-times. You can get those numbers (distance, time, speed) and work that out. That way at least you can make sure your timing and physics is correct.

Edited by Gavin Williams, 15 December 2012 - 10:29 AM.


#10 Medo3337   Members   -  Reputation: 680

Posted 15 December 2012 - 11:24 AM

@Gavin Williams: I figured out that the problem was with updating 'LastTime', I had to set it to LastTime = timeNow; instead of LastTime = timeGetTime();

Now it's running smoothly. I want to create two methods, one for rendering and one for updating the scene; however, I'm concerned that it will affect the FPS negatively, since I will have to go through the models array twice:

Let's say entity.size() = 4000

void update()
{
      for(UINT i = 0; i < entity.size(); i++)
      {
            entity[i]->update(dt);
      }
}

void render()
{
      for(UINT i = 0; i < entity.size(); i++)
      {
            entity[i]->render();
      }
}

void updateAndRender()
{
      for(UINT i = 0; i < entity.size(); i++)
      {
           entity[i]->update(dt);
           entity[i]->render();
      }
}

I think calling updateAndRender(); is faster than calling update(); render(); since you only go through the entities once instead of twice.

However, I want to keep the update method separate. Is there a way to do that without affecting the FPS?

#11 Gavin Williams   Members   -  Reputation: 776

Posted 16 December 2012 - 07:01 AM

I don't know what your scene and view look like, but you'll probably have to organize your rendering in a more sophisticated way than just looping through all your objects and drawing them. You can use spatial partitioning techniques (chunks, quadtrees, etc.) to reduce the number of objects that you need to render. You keep those structures up to date as you go, so in your update() method you will add/remove/sort your objects. And by the time you get to your render() method you'll already have a small collection of objects to render. If you have a heap of objects on screen you might want to look at instancing.

#12 Medo3337   Members   -  Reputation: 680

Posted 16 December 2012 - 07:27 AM

@Gavin Williams: Do you mean I should use std::map? Any code example out there?

#13 Gavin Williams   Members   -  Reputation: 776

Posted 16 December 2012 - 10:47 AM

I'm sorry, I don't know C++ well enough yet to even try to talk to you about code specifics. And the implementation will very much depend upon your program. You'll have to do some research into spatial partitioning techniques and why they are used, particularly in regard to rendering.

Start here ...

http://www.altdevblogaday.com/2011/02/21/spatial-partitioning-part-1-survey-of-spatial-partitioning-solutions/

If your game/app is 2d I would recommend breaking it up into a simple array of areas and then you can selectively render each area that falls under your camera. And so you can imagine that if half your objects are in area 1 and the other half are in area 2 and your camera can only see area 1 then you won't have to render any of the objects in area 2, that would result in halving the number of draw calls you need to make. That is how you'll make gains in rendering... by not rendering.
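The "simple array of areas" idea could look roughly like the sketch below: a uniform grid where each cell stores object ids, and a query that only walks the cells under the camera rectangle. `Rect`, `AreaGrid` and the ids are hypothetical names for illustration, and the query is deliberately coarse: it returns everything in an overlapped cell, even objects just outside the camera.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Rect { float minX, minY, maxX, maxY; };

// Uniform 2D grid: the world is split into cols x rows square areas and
// each area keeps the ids of the objects that sit inside it.
class AreaGrid {
public:
    AreaGrid(int cols, int rows, float cellSize)
        : cols_(cols), rows_(rows), cellSize_(cellSize),
          cells_(static_cast<std::size_t>(cols) * static_cast<std::size_t>(rows)) {
        assert(cols > 0 && rows > 0 && cellSize > 0.0f);
    }

    void insert(int objectId, float x, float y) {
        cellAt(toCell(x, cols_), toCell(y, rows_)).push_back(objectId);
    }

    // Returns the ids stored in every area the camera rectangle overlaps.
    std::vector<int> query(const Rect& camera) const {
        std::vector<int> visible;
        const int c0 = toCell(camera.minX, cols_);
        const int c1 = toCell(camera.maxX, cols_);
        const int r0 = toCell(camera.minY, rows_);
        const int r1 = toCell(camera.maxY, rows_);
        for (int r = r0; r <= r1; ++r) {
            for (int c = c0; c <= c1; ++c) {
                const std::vector<int>& cell =
                    cells_[static_cast<std::size_t>(r) * cols_ + c];
                visible.insert(visible.end(), cell.begin(), cell.end());
            }
        }
        return visible;
    }

private:
    // Maps a world coordinate to a cell index, clamped to the grid.
    int toCell(float v, int count) const {
        const int i = static_cast<int>(std::floor(v / cellSize_));
        return std::max(0, std::min(i, count - 1));
    }
    std::vector<int>& cellAt(int c, int r) {
        return cells_[static_cast<std::size_t>(r) * cols_ + c];
    }

    int cols_;
    int rows_;
    float cellSize_;
    std::vector<std::vector<int>> cells_;
};
```

With half the objects in area 1 and half in area 2, a camera that only covers area 1 gets back half the ids, which is the halving of draw calls described above.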

If all your objects are the same (same geometry) then you can use hardware instancing to draw many objects with one call.

What you have to realize is that making a draw call has a cost. And so you want to reduce that cost.

#14 Medo3337   Members   -  Reputation: 680

Posted 16 December 2012 - 01:01 PM

Gavin Williams wrote: "What you have to realize is that making a draw call has a cost. And so you want to reduce that cost."


If I have 10 meshes in the scene, 5 of them the same, I have to repeat the following 10 times to draw them all:
- Set world matrix
- Draw

What do you mean by reducing draw calls? I thought you can't draw more than 1 mesh in one draw call.

#15 Flimflam   Members   -  Reputation: 657

Posted 17 December 2012 - 12:20 AM

Medo3337 wrote: "If I have 10 meshes in the scene, 5 of them the same, I have to repeat the following 10 times to draw them all: set world matrix, draw. What do you mean by reducing draw calls? I thought you can't draw more than 1 mesh in one draw call."


You should look into vertex buffers. They allow you to condense a large amount of geometry into a single batch which you can then draw together with one call.

#16 Gavin Williams   Members   -  Reputation: 776

Posted 17 December 2012 - 03:44 AM

Medo3337 wrote: "I have to repeat the following 10 times"


Aha, see, that's where you can improve the performance of your program. You could draw those 10 objects with 6 draw calls (one instanced call for the 5 identical meshes plus one call for each of the other 5), and some might argue that it can be done in 1 draw call (and it can). But it might not always be appropriate. You do have SOME draw calls up your sleeve, so you might as well use them.

I recommend you read this page : http://msdn.microsoft.com/en-us/library/windows/desktop/bb173349%28v=vs.85%29.aspx . That gives some information on drawing multiple instances of objects in DX9. You might have to search for additional resources as well on 'Instancing using DirectX9' for clarification or further discussion.


Flimflam wasn't wrong about telling you to look into vertex buffers a bit deeper, because they can be used in non-obvious ways to supply information other than geometry data ... such as instance data ! And also, as Flimflam suggested, vertex buffers can be used to hold the geometry data for more than one object (this is an alternative to the geometry instancing technique).
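To make the "6 draw calls" count concrete, here is a minimal CPU-side sketch that groups scene entries by mesh, so that each group could become one instanced draw. `Matrix4`, `MeshInstance`, `Batches` and `buildBatches` are hypothetical names; the actual D3D9 instancing setup described on the linked page is not shown.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Stand-in for a 4x4 world matrix (assumption: 16 floats).
struct Matrix4 { float m[16]; };

// One object in the scene: which mesh it uses and where it is placed.
struct MeshInstance {
    int meshId;
    Matrix4 world;
};

// One batch per distinct mesh. The matrices in a batch are the per-instance
// data that would be copied into the instance vertex buffer.
using Batches = std::map<int, std::vector<Matrix4>>;

inline Batches buildBatches(const std::vector<MeshInstance>& scene) {
    Batches batches;
    for (const MeshInstance& inst : scene) {
        assert(inst.meshId >= 0);
        batches[inst.meshId].push_back(inst.world);
    }
    return batches;  // batches.size() is the number of draw calls needed
}
```

For a scene of 10 meshes where 5 share the same geometry, `buildBatches` produces 6 batches: one holding 5 world matrices and five holding one each.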

You'll have a bit of reading and experimentation in front of you. I would say these techniques are not trivial for somebody that is just learning about them, but they are not super difficult either, it's just that you'll have to think outside of the box to understand how they work.

#17 Medo3337   Members   -  Reputation: 680

Posted 17 December 2012 - 07:21 PM

Okay, that's interesting. I have a few questions:

1. Can I draw only the meshes that the camera can actually see? So if I have a vehicle behind the camera, I will not draw it since the camera can't see it. Is there a method that takes the view, projection and world matrices as arguments and determines whether the camera can actually see the mesh?
Example: bool canSee(viewMatrix, projectionMatrix, worldMatrix);

2. When rendering particles, I use device->SetRenderState(D3DRS_POINTSIZE, POINT_SPRITE_SIZE_HERE); to set the point size. For performance, I want to draw all the particles with one draw call by filling the vertex buffer with all the vertices needed, but I am not sure how I should set the point size: I can't use the render state, since that changes the size of all the points. I'm looking for a way to set the size of each point sprite individually.

#18 ATEFred   Members   -  Reputation: 1126

Posted 18 December 2012 - 01:05 PM

1. There are ways of doing this, each with their own pros and cons.

You can precompute visibility using some variation of PVS (rough results, mem usage and precomputation stage which can be time consuming, fast at runtime to evaluate though). Loads of games use this.

You can use occlusion queries to render a simple box representation of each object after filling in the z buffer with very large occluders. This will tell you if any pixels of your rough bounding geometry were visible, which you can use to decide whether to render the mesh afterwards. Downsides are potential sync points between the CPU and GPU, or inaccurate results when using latent queries. Also, you still have to issue a bunch of DIPs (DrawIndexedPrimitive calls) for the queries, so it might not help your CPU side at all, only the GPU work. Many games use this approach; Umbra is a commonly used middleware which is based on this technique.

Alternatively you could do the same kind of work manually in a compute shader, testing against a downsampled depth buffer, to do all the checks in one pass. You would still have the sync point issue though.

Another approach is to have a very small cpu side software rasterized depth buffer, which you test your bounding volumes against to generate a list of visible objects (frostbite does this).

None of these approaches are trivial to set up though, so I would only look into implementing one or more of them if you are sure that in your own app you are bottlenecked by draw calls after implementing a simple spatial partitioning based frustum culling system.

2. Don't use point sprites; render your own camera-aligned quads. You then have access to all the information you need, and don't need to batch by size. Also, it will make it easier to move your engine to dx10/11, where there is no native support for point sprites.
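For the simple frustum culling mentioned at the end of point 1, a bounding-sphere test against the six frustum planes is a common starting point. The sketch below assumes a row-major matrix with the D3D row-vector convention (v' = v * M, clip space z in [0, w]) and a bounding sphere already transformed to world space; `Matrix4`, `Plane`, `Sphere`, `extractFrustum` and `canSee` are hypothetical names, not D3DX functions.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Matrix4 { float m[4][4]; };      // m[row][column], v' = v * M
struct Plane { float a, b, c, d; };     // a*x + b*y + c*z + d >= 0 is inside
struct Sphere { float x, y, z, radius; };

inline Plane makePlane(float a, float b, float c, float d) {
    const float len = std::sqrt(a * a + b * b + c * c);
    assert(len > 0.0f);
    return {a / len, b / len, c / len, d / len};  // unit-length normal
}

// Plane coefficients = w4 * (column 3) + sign * (column c) of view * projection.
inline Plane planeFrom(const Matrix4& vp, float w4, int c, float sign) {
    return makePlane(w4 * vp.m[0][3] + sign * vp.m[0][c],
                     w4 * vp.m[1][3] + sign * vp.m[1][c],
                     w4 * vp.m[2][3] + sign * vp.m[2][c],
                     w4 * vp.m[3][3] + sign * vp.m[3][c]);
}

// Extracts left, right, bottom, top, near and far planes from the combined
// view * projection matrix.
inline std::array<Plane, 6> extractFrustum(const Matrix4& vp) {
    return {{
        planeFrom(vp, 1.0f, 0, +1.0f),  // left:   x >= -w
        planeFrom(vp, 1.0f, 0, -1.0f),  // right:  x <=  w
        planeFrom(vp, 1.0f, 1, +1.0f),  // bottom: y >= -w
        planeFrom(vp, 1.0f, 1, -1.0f),  // top:    y <=  w
        planeFrom(vp, 0.0f, 2, +1.0f),  // near:   z >=  0
        planeFrom(vp, 1.0f, 2, -1.0f),  // far:    z <=  w
    }};
}

// True if the sphere is inside or intersects the frustum. Conservative:
// a sphere near a frustum corner can pass although it is not visible.
inline bool canSee(const std::array<Plane, 6>& frustum, const Sphere& s) {
    for (const Plane& p : frustum) {
        if (p.a * s.x + p.b * s.y + p.c * s.z + p.d < -s.radius) {
            return false;  // entirely behind this plane
        }
    }
    return true;
}
```

Extracting the planes once per frame and testing each object's bounding sphere (or, better, each spatial partition cell) keeps the per-object cost to six dot products.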

#19 Medo3337   Members   -  Reputation: 680

Posted 18 December 2012 - 02:10 PM

@ATEFred wrote: "don't use point sprites, render your own camera aligned quads."


Don't you think this will slow down rendering? A point sprite uses a single vertex, while an aligned quad uses 4 vertices.
10 point sprites = rendering 10 vertices
10 aligned quads = rendering 10 * 4 = 40 vertices

#20 ATEFred   Members   -  Reputation: 1126

Posted 18 December 2012 - 02:54 PM

Medo3337 wrote: "Don't you think this will slow down rendering? A point sprite uses a single vertex, while an aligned quad uses 4 vertices. 10 point sprites = rendering 10 vertices; 10 aligned quads = rendering 10 * 4 = 40 vertices."


It's been a while since I used dx9, so it's a bit hazy in my mind, but I didn't notice any major speed difference when I switched to textured quads while moving to dx10. You pay the extra memory cost, but the flexibility you get is worth it (rotations, motion blur type stretching, etc.).
The memory issue you can also get around (in dx10/11 at least) by having one quad you instance at will and a separate stream with minimal particle properties, or using gs for expansion.
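A minimal CPU-side sketch of the camera-aligned quad approach, expanding each particle into 4 vertices with its own size. `Vec3`, `Particle`, `QuadVertex` and `buildBillboards` are hypothetical names; the camera right and up vectors are assumed to be unit-length world-space axes, which in D3D9 can typically be read from the first and second columns of the view matrix.

```cpp
#include <cassert>
#include <vector>

struct Vec3 { float x, y, z; };

inline Vec3 add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 scale(Vec3 v, float s) { return {v.x * s, v.y * s, v.z * s}; }

struct Particle {
    Vec3 center;
    float size;  // per-particle size, no render state needed
};

struct QuadVertex {
    Vec3 position;
    float u, v;  // texture coordinates
};

// Writes 4 vertices per particle in the order top-left, top-right,
// bottom-left, bottom-right. A static index buffer using the pattern
// (0,1,2, 2,1,3) per quad would turn them into two triangles each.
inline std::vector<QuadVertex> buildBillboards(
        const std::vector<Particle>& particles, Vec3 camRight, Vec3 camUp) {
    std::vector<QuadVertex> verts;
    verts.reserve(particles.size() * 4);
    for (const Particle& p : particles) {
        assert(p.size >= 0.0f);
        const Vec3 r = scale(camRight, p.size * 0.5f);  // half extent right
        const Vec3 u = scale(camUp, p.size * 0.5f);     // half extent up
        const Vec3 left = add(p.center, scale(r, -1.0f));
        const Vec3 right = add(p.center, r);
        verts.push_back({add(left, u), 0.0f, 0.0f});
        verts.push_back({add(right, u), 1.0f, 0.0f});
        verts.push_back({add(left, scale(u, -1.0f)), 0.0f, 1.0f});
        verts.push_back({add(right, scale(u, -1.0f)), 1.0f, 1.0f});
    }
    return verts;
}
```

All particles then go into one dynamic vertex buffer and are drawn with a single call, each with its own size; rotation or stretching would only change how `r` and `u` are computed.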



