
mhagain

Posted 11 October 2012 - 03:14 PM

Yup, running queries that way is likely to be slower alright.

The reason is GPU latency: because the GPU is a parallel processor, it's allowed to just store up commands and data and get round to actually drawing in its own sweet time. That may be anything up to (typically) 3 frames after the command is issued.

By fetching the result of the query immediately after it's been run, and doing it for every single model, you're breaking this parallelism. Instead of nice fast rendering you get a huge pipeline stall each time you fetch the results. The more models you have the worse it will be.

To compound the misery, you're creating new query objects at runtime each frame (and you don't seem to be destroying them, so you've got a resource leak too). This is all over the docs and recommendations: resource creation is expensive, so don't do it at runtime; do it once only, during startup.

Back to the queries.

There are two possible approaches here. The first is to create n query objects (one for each model), then go through all of your models: begin query, draw bounding geometry, end query, next model. That gives the queries some time to issue and run; the theory is that by the time the last query is issued you'll have the first one near ready (you won't, but it's nowhere near as bad as what you currently have). Then you go through all the models again, fetch the results, and conditionally draw.
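As a rough sketch of that two-pass structure (the `OcclusionQuery` type below is a simplified stand-in for a real API object such as an `IDirect3DQuery9` created with `D3DQUERYTYPE_OCCLUSION`; the depth-test result is simulated so the control flow is runnable on its own):

```cpp
#include <cstddef>
#include <vector>

// Simplified stand-in for a hardware occlusion query. On real hardware the
// result comes back later via GetData; here it is simulated.
struct OcclusionQuery {
    unsigned pixelsVisible = 0;
    void begin() {}                                   // Issue(D3DISSUE_BEGIN)
    void end(unsigned simulatedDepthTestResult) {     // Issue(D3DISSUE_END)
        pixelsVisible = simulatedDepthTestResult;
    }
    unsigned fetchResult() const { return pixelsVisible; } // stalls on real HW
};

struct Model {
    unsigned simulatedBoxPixels; // bbox pixels that would pass the depth test
    bool drawn = false;
};

// Pass 1: issue ALL queries before fetching ANY result, giving the GPU
// time to chew through them instead of stalling once per model.
void issueQueries(std::vector<Model>& models, std::vector<OcclusionQuery>& queries) {
    for (std::size_t i = 0; i < models.size(); ++i) {
        queries[i].begin();
        // drawBoundingBox(models[i]); // cheap proxy geometry, color writes off
        queries[i].end(models[i].simulatedBoxPixels);
    }
}

// Pass 2: fetch results and draw only models whose bounding geometry
// produced at least one visible pixel.
void drawVisible(std::vector<Model>& models, std::vector<OcclusionQuery>& queries) {
    for (std::size_t i = 0; i < models.size(); ++i)
        if (queries[i].fetchResult() > 0)
            models[i].drawn = true; // drawModel(models[i]);
}
```

The point of the shape, not the stand-in types, is what matters: all issues happen before any fetch.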

Because of the up-to-3-frame latency you're still going to break CPU/GPU parallelism, but it won't be as bad as your current method. At least you'll get something.

The second method is a little more sophisticated in that it takes advantage of so-called "temporal coherence", i.e. the fact that this kind of visibility probably won't change too much between individual frames. So each frame you fetch the results from the previous frame's set of queries, then issue a new set for the next frame.

A variation involves testing the query to see if the results are ready yet (check your D3D documentation) and - if not - using the last valid result. If there is no last valid result (it might be the first frame, or the model might have been frustum culled on the previous one) then you must assume that the model is visible (alternatively you could force the result fetch).
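Both variants boil down to a small per-model decision each frame. A hypothetical sketch (`QueryState` and its field names are invented for illustration; on D3D9 the "is it ready yet" test corresponds to `GetData` returning `S_FALSE` while the query is still pending):

```cpp
// Per-model, per-frame decision for the temporal-coherence scheme.
struct QueryState {
    bool resultReady;       // has this frame's query result arrived?
    unsigned pixelsVisible; // the result, if it has
    bool hasLastValid;      // do we have a result from an earlier frame?
    bool lastVisible;       // what that earlier result said
};

// Decide whether to draw the model this frame without ever stalling.
bool shouldDraw(QueryState& q) {
    if (q.resultReady) {                 // fresh result: use it and cache it
        q.lastVisible = (q.pixelsVisible > 0);
        q.hasLastValid = true;
        return q.lastVisible;
    }
    if (q.hasLastValid)                  // still pending: reuse last frame's answer
        return q.lastVisible;
    return true;                         // no history at all: assume visible
}
```

Note the fallthrough order: a ready result always wins, a stale result is better than nothing, and with no history the only safe answer is "draw it".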

Some final notes.

I mentioned n queries above, but what if you don't know what value n should have? You could just create an array (or other container) of query objects sized at some hypothetical maximum and pull from that, or you could dynamically create new query objects on-demand and store them in a list; the key though is to re-use objects that were previously created, don't create new ones if you don't have to.
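One way to sketch that re-use pattern (the `QueryPool` class and its method names are invented; `Query` stands in for the real API object whose creation is the expensive part):

```cpp
#include <cstddef>
#include <memory>
#include <vector>

struct Query { int id; }; // stand-in for the real, expensive-to-create API object

// Grow-on-demand pool: each object is created at most once, then re-used
// every frame thereafter.
class QueryPool {
    std::vector<std::unique_ptr<Query>> pool;
    std::size_t next = 0;
    int created = 0;
public:
    Query* acquire() {
        if (next == pool.size())                       // only grows, never shrinks
            pool.push_back(std::make_unique<Query>(Query{created++}));
        return pool[next++].get();
    }
    void resetFrame() { next = 0; }                    // same objects, next frame
    int totalCreated() const { return created; }
};
```

Call `resetFrame()` once per frame before issuing queries; after the first few frames the pool reaches its working size and `totalCreated()` stops growing.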

You'll also find that there is some cutoff point - depending on shader complexity/etc - below which it's going to be cheaper to not bother with a query but just always draw the model instead. You'll need to experiment to find that, but it will depend on number of vertices in the model, number of indices, and other such factors.
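As a trivial illustration, the cutoff might be nothing more than a tuned threshold (the 256-vertex figure below is entirely made up; only profiling on your target hardware can give you the real break-even point):

```cpp
// Below some experimentally tuned size, the cost of the query (state
// changes plus drawing the bounding box) exceeds just drawing the model.
bool worthQuerying(int vertexCount, int indexCount) {
    const int kCutoffVerts = 256; // placeholder; profile to find the real value
    return vertexCount > kCutoffVerts || indexCount > 3 * kCutoffVerts;
}
```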

Finally, when using bounding volumes, you're going to have cases where your viewpoint is inside the bounding volume of a model - don't run a query if that happens; the model is visible, just draw it.
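A point-in-box test is enough to detect that case (this assumes axis-aligned bounding boxes; the reason it matters is that a box surrounding the camera can have its faces clipped by the near plane, making the query report zero visible pixels for a model that is plainly on screen):

```cpp
struct Vec3 { float x, y, z; };
struct AABB { Vec3 min, max; };

// True if the point (e.g. the camera position) lies inside the box;
// in that case skip the occlusion query and draw the model directly.
bool containsPoint(const AABB& box, const Vec3& p) {
    return p.x >= box.min.x && p.x <= box.max.x &&
           p.y >= box.min.y && p.y <= box.max.y &&
           p.z >= box.min.z && p.z <= box.max.z;
}
```

In practice you'd also want to expand the box by the near-plane distance before testing, so a camera hovering just outside a face doesn't clip the box away either.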

And before I go, one perfectly valid point is this: you've already got 70fps, it's mission accomplished, you're fast enough, move on to the next problem. You may however have yet to add physics, sound, networking, etc, or you may want additional headroom for more complex scenes, so I'm assuming that's why you want to go faster.

Phew! All of this sounds a hell of a lot more complex than just drawing models, doesn't it? And yes, it is, so you may even find that alternative techniques - such as instancing - give you perfectly adequate performance with a whole lot less complexity than using occlusion queries.

