DX11 slow on integrated graphics


7 replies to this topic

#1 Lambo   Members   -  Reputation: 100

Posted 24 April 2012 - 11:37 AM

Hi, I'm writing a simple DX11 app and have found that it runs quite slowly on integrated graphics with a hardware feature level of DX10. More precisely, I'm getting only about 18 fps on my office Intel GMA X4500, while my desktop ATI HD6850 renders the same code at around 1900 fps. I expected the code to run slower, but not 100 times slower. Here's what I got from PIX:

Posted Image

I can't imagine that OMSetRenderTargets or constant buffer updates can take that much time, so it must be the buffer clears? I guess the timings are not accurate at all... What could help improve performance on this machine? Could the slowdown be caused by the 32-bit texture format?

#2 nept   Members   -  Reputation: 96

Posted 24 April 2012 - 12:47 PM

I believe that is a DX10 card so you would be in software emulation if you call DX11.

#3 mhagain   Crossbones+   -  Reputation: 8284

Posted 24 April 2012 - 01:36 PM

I believe that is a DX10 card so you would be in software emulation if you call DX11.


Nope. If you use a feature level of D3D10 you get hardware acceleration of D3D10-level features - that's kind of the whole point of feature levels.

It appears that the gentleman thought C++ was extremely difficult and he was overjoyed that the machine was absorbing it; he understood that good C++ is difficult but the best C++ is well-nigh unintelligible.


#4 kubera   Members   -  Reputation: 971

Posted 24 April 2012 - 03:17 PM

Hi!

There are several methods for tuning shaders.
Maybe you could re-examine your shaders.
For example, developers often remove if statements where possible (for better threading).
Another method suggested by Intel is to render static and dynamic shadow maps into separate textures, because they change at different frequencies.
Maybe your Intel hardware cannot optimize your code.
The solution would be to research faster algorithms.
Intel GPUs are not very fast right now, but the future should be better.

#5 Bacterius   Crossbones+   -  Reputation: 9299

Posted 24 April 2012 - 06:23 PM

Well, what do you expect? It's an integrated card; it's not supposed to be fast. That said, 18 fps does seem a bit slow. The timings do make sense, because GPUs are asynchronous devices: when you tell them to Draw(), you are actually telling them to "Draw() as soon as you can", which returns immediately. Later on, when you call a constant buffer update or a render target change, you are forced to wait for the GPU to finish rendering, since you can't update resources that are in use. This is why those calls take forever.

Try doing it with a very simple test case and see if you get the same timings/results. It could just be that your integrated graphics card isn't very optimized for DX11-reduced/DX10 (support may very well have been slapped on as an afterthought).

The slowsort algorithm is a perfect illustration of the multiply and surrender paradigm, which is perhaps the single most important paradigm in the development of reluctant algorithms. The basic multiply and surrender strategy consists in replacing the problem at hand by two or more subproblems, each slightly simpler than the original, and continue multiplying subproblems and subsubproblems recursively in this fashion as long as possible. At some point the subproblems will all become so simple that their solution can no longer be postponed, and we will have to surrender. Experience shows that, in most cases, by the time this point is reached the total work will be substantially higher than what could have been wasted by a more direct approach.

 

- Pessimal Algorithms and Simplexity Analysis


#6 mhagain   Crossbones+   -  Reputation: 8284

Posted 24 April 2012 - 07:12 PM

I see that you're doing a lot of render target setting and clearing here as well. This is not going to play well with integrated graphics, which are quite weak in terms of fillrate.

Depending on what you're doing you may be able to get away without clearing the render targets. If you're drawing over the full extents of the target, for example, you really don't need to clear as everything is going to be covered anyway - that should get you back a few frames.

For your final draw, do you even need a depth/stencil view? All that you're doing is blasting the end results of your render to the screen, so you may be able to drop the depth/stencil, and disable depth test/depth write for this part of the draw.

Also very important to consider is that if you're clearing depth you should also clear stencil at the same time - even if you're not using it. This is because depth and stencil are often interleaved with 24 bits for depth and 8 for stencil (it's not clear from your shot if you have this format) so clearing both together can get you a MUCH faster clear.

Finally, those R32G32 textures are not going to perform well at all on this kind of hardware. Consider a simpler format - do you really need all that precision?
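
To put rough numbers on the format advice, here is a back-of-the-envelope sketch of how many bytes one full pass over a render target touches. The 1024x1024 size used in the usage note is an assumption (the thread does not state the real size), `bytesPerPass` is a hypothetical helper, and the model ignores caches and any hardware compression. The bytes-per-texel values follow from the format names: two 32-bit channels, two 16-bit channels, and 24 + 8 bits.

```cpp
#include <cstdint>

// Bytes per texel for the formats mentioned in this thread.
constexpr std::uint64_t kBytesR32G32 = 8;  // 2 channels x 32 bits
constexpr std::uint64_t kBytesR16G16 = 4;  // 2 channels x 16 bits
constexpr std::uint64_t kBytesD24S8  = 4;  // 24-bit depth + 8-bit stencil, interleaved

// Bytes written (or read) when every texel of a surface is touched once.
constexpr std::uint64_t bytesPerPass(std::uint64_t width,
                                     std::uint64_t height,
                                     std::uint64_t bytesPerTexel) {
    return width * height * bytesPerTexel;
}
```

For an assumed 1024x1024 target, `bytesPerPass(1024, 1024, kBytesR32G32)` is 8 MiB per pass against 4 MiB for R16G16, and every clear or blur pass pays that cost again. On the depth side, because depth and stencil share each 32-bit texel in a 24/8 format, a depth-only clear presumably has to preserve the stencil bits, whereas passing `D3D11_CLEAR_DEPTH | D3D11_CLEAR_STENCIL` to `ClearDepthStencilView` lets the whole texel be overwritten.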


#7 Lambo   Members   -  Reputation: 100

Posted 25 April 2012 - 01:57 AM

Thanks for the tips. I will try to optimize the shaders later; yesterday I was just too tired. Here are some things I tried and the results:
changed texture format to R16G16: +0 fps
removed three useless render target clearings: +1 fps
added clear depth flag to last depth/stencil clear: +7 fps
I don't think I can remove the last depth test, because I would then need to implement some sort of geometry sorting by depth. My scene: render a variance shadow map to R16G16, perform a Gaussian blur on the X axis, then a Gaussian blur on the Y axis, then render the whole scene. I will try to get more accurate timings with the flush() command; maybe then I can track down the culprit, or just confirm that this GPU can't render squat.

#8 Adam_42   Crossbones+   -  Reputation: 2619

Posted 27 April 2012 - 04:54 PM

I suspect you're shader bound somewhere if changing the texture format didn't help at all. Try switching each pixel shader in turn with one that returns a constant colour. Note which changes have the biggest effect on FPS.

Are you using the bilinear filter to optimize the gaussian blur?
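
For readers unfamiliar with the bilinear trick asked about here: two neighbouring taps of a Gaussian kernel can be merged into one texture fetch placed between the two texels, so that the sampler's linear filter applies the relative weights for free. The sketch below computes the merged offsets and weights on the CPU. It is a minimal illustration, not code from the thread; `LinearTap`, `gaussianWeights` and `mergeTaps` are hypothetical names, and the radius and sigma are whatever you choose to pass in.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One texture fetch: offset from the centre texel (in texels) and its weight.
struct LinearTap {
    float offset;
    float weight;
};

// One-sided Gaussian weights w[0..radius], normalised so that the full
// symmetric kernel (centre once, each side tap twice) sums to 1.
std::vector<float> gaussianWeights(int radius, float sigma) {
    std::vector<float> w(radius + 1);
    float sum = 0.0f;
    for (int i = 0; i <= radius; ++i) {
        w[i] = std::exp(-(i * i) / (2.0f * sigma * sigma));
        sum += (i == 0) ? w[i] : 2.0f * w[i];
    }
    for (float& x : w) x /= sum;
    return w;
}

// Merge side taps pairwise: (1,2), (3,4), ... Each pair becomes one fetch
// whose offset is the weight-averaged position of the two texels.
std::vector<LinearTap> mergeTaps(const std::vector<float>& w) {
    std::vector<LinearTap> taps;
    taps.push_back({0.0f, w[0]});  // centre tap stays as it is
    for (std::size_t i = 1; i + 1 < w.size(); i += 2) {
        const float weight = w[i] + w[i + 1];
        const float offset = (i * w[i] + (i + 1) * w[i + 1]) / weight;
        taps.push_back({offset, weight});
    }
    if (w.size() % 2 == 0) {
        // Odd radius: the outermost tap has no partner and is kept unmerged.
        taps.push_back({static_cast<float>(w.size() - 1), w.back()});
    }
    return taps;
}
```

With a radius of 4 this turns 9 fetches per pixel (centre plus 4 on each side) into 5 (centre plus 2 merged taps on each side), which matters on a shader-bound GPU. It relies on the blur source being sampled with linear filtering enabled.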
