Direct3D9 vs OpenGL 2.0 - discrepancy in speed

14 comments, last by maxest 12 years, 5 months ago
Hey guys,

I have some discrepancies in speed in my two renderers.
The test scene: a house model (3000 faces) rendered 2500 times in the same place. Yeah, I know, massive overdraw. But that's not what I'm testing (I orient the camera so that it only sees the clear color). I wanted to test geometry throughput and the cost of draw calls. As far as I know, OGL is better when it comes to raw draw calls. I can assure you that nothing but the clear commands and draw calls is issued each frame. And yet, a bit surprisingly, OpenGL gets 21 FPS and Direct3D9 gets 23 FPS. How is that possible? I expected them to be equal, or maybe slightly in OpenGL's favour (because of the cost of a draw call). But practice shows exactly the opposite.

Has anyone experienced something similar?
It could be any number of things.

It could be how you are using the API. It could be the drivers.
Drivers are complex beasts. Some drivers even have specific code branches for specific sets of rendering calls to speed up commercial games (which can affect YOUR performance).
The two APIs use different shading languages (HLSL vs GLSL) which, depending on how they are compiled, could lead to shaders that perform differently.

In the very end, all you can do is optimize your rendering for whichever API and system you are targeting.

[quote]
In the very end, all you can do is optimize your rendering for whichever API and system you are targeting.
[/quote]
I'd gladly do that, but how, if I'm running into problems at the very beginning? I just can't see what I should do to make OGL faster at plain geometry processing... I guess I will need to run some more tests.


How are you sending your data to GL? Immediate mode functions (glVertex()), Vertex Arrays, VBOs? There are a lot of different paths you can take in GL2.0, and they can have significantly different performance profiles. VBOs are generally going to be fastest, as the other two methods require you to copy your geometry data from system ram to GPU memory every frame. (As a note, VBOs can do that as well, but only if you ask them to and/or change the buffer contents). So if you're using DrawPrimitive on D3D, and glBegin()/glEnd() on OGL, that right there is a huge difference in the way you're actually getting your data to the card.
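As a rough back-of-the-envelope sketch of why re-sending geometry on every draw hurts: the 3000 faces and 2500 draws come from the first post, while the 32-byte vertex and the unindexed layout (3 vertices per face) are assumptions made purely for illustration.

```python
FACES = 3000             # from the test scene in the first post
DRAWS_PER_FRAME = 2500   # from the test scene in the first post
BYTES_PER_VERTEX = 32    # assumption: position + normal + UV stored as floats
VERTS_PER_FACE = 3       # assumption: unindexed triangles, no vertex sharing

# Data that has to cross the bus for a single draw of the model.
bytes_per_draw = FACES * VERTS_PER_FACE * BYTES_PER_VERTEX

# Data per frame if the geometry is re-sent on every draw call
# (immediate mode or client-side vertex arrays).
bytes_per_frame = bytes_per_draw * DRAWS_PER_FRAME

print(f"re-sent on every draw: {bytes_per_frame / 1e6:.0f} MB per frame")
print(f"static VBO:            {bytes_per_draw / 1e3:.0f} kB, uploaded once")
```

Under these assumptions the non-VBO paths would move hundreds of megabytes per frame, while a static VBO pays for the upload a single time.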
Oh please... of course I don't use Immediate mode :P
I use VBOs, of course. The code of both renderers is organized to keep the two API paths as similar as possible, so I use a VBO and an IBO for data storage.

btw: I once ran Unigine's DX11 benchmark, and I recall that the D3D9 and OGL renderers performed pretty much the same. So I would say that both renderers should yield the same performance in my much simpler case.
So, your total frame times are 47.6ms vs 43.5ms (always use ms, not FPS). Is that GPU-bound or CPU-bound?

A 4.1ms difference is quite a bit... But also, 40+ms total just for draw calls is ridiculous to begin with. In my renderer, drawing 500 objects takes about 1ms of CPU time (extrapolated to 2500 objects, that's 5ms) -- so the fact that it's taking you 40+ms is a red flag in itself.
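For reference, here is a small sketch of the FPS-to-milliseconds arithmetic, using only figures quoted in this thread (21 and 23 FPS, 2500 draws, and the 1ms per 500 objects reference point). The function name is made up for illustration, and the per-draw figure assumes the frame is dominated by draw calls, which the thread does not confirm.

```python
def frame_time_ms(fps):
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps

DRAW_CALLS = 2500

gl_ms = frame_time_ms(21)    # OpenGL 2.0 renderer
d3d_ms = frame_time_ms(23)   # Direct3D9 renderer

# Average cost per draw call in microseconds, assuming the whole frame
# is spent on draw calls (an assumption, not a measurement).
gl_us_per_draw = gl_ms * 1000.0 / DRAW_CALLS
d3d_us_per_draw = d3d_ms * 1000.0 / DRAW_CALLS

# Reference point from the post above: 500 objects in about 1 ms,
# scaled linearly to 2500 objects.
reference_ms = DRAW_CALLS / 500 * 1.0

print(f"OpenGL:    {gl_ms:.1f} ms/frame, {gl_us_per_draw:.1f} us per draw")
print(f"Direct3D9: {d3d_ms:.1f} ms/frame, {d3d_us_per_draw:.1f} us per draw")
print(f"Gap:       {gl_ms - d3d_ms:.1f} ms, reference budget {reference_ms:.0f} ms")
```

Working in milliseconds makes the gap additive: the two renderers differ by a fixed slice of frame time, which is easier to reason about than a difference of 2 FPS.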

Perhaps you should post some drawing code for your D3D/GL versions?
My experience is that OpenGL has faster draw calls (but see the next para) whereas D3D makes more efficient use of vertex buffers (the OpenGL VBO API is crocked, to be honest, and has needed almost constant patching to keep it up to speed). In my own code I've never been able to get OpenGL to match D3D's performance, and in extreme cases D3D can run up to twice as fast.

The wild card here is D3D10 or 11 class hardware. If you have D3D10 or 11 class hardware, and if you're running on Windows 7 (and presumably Vista but I haven't tested) then DrawPrimitive doesn't suffer the same CPU-side overhead as otherwise; it's just as fast as OpenGL calls. This is with standard D3D9 too (not even 9ex). My current renderer has the ability to switch between various batching modes, including unbatched, and the unbatched version is very competitive; over 3,000 draw calls per frame with a loss of just over 10% performance in this scenario (compared to dropping to under a quarter of the speed on XP/D3D9-class hardware). With lighter loads (~500 draw calls) it's actually faster than the batched version.

Benching the same scene using GPU Shark shows D3D9 hitting 97% GPU usage but OpenGL hitting 89%, so OpenGL seems to be spending more CPU time in the driver too.

So - closing the gap between draw call performance, plus better handling of vertex buffers, plus more efficient GPU usage, and toss in the fact that most Windows OpenGL drivers are Not Very Good, and you see D3D going ahead. How times change.

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

Hogman: there is really no point in putting code here ;). I spent a lot of time making my renderer call as few functions as possible. For the scene I'm testing, GLIntercept logs only this:

glClearColor( ??? )
glClearDepth( ??? )
glClear( ??? )
glDrawElements( ??? ) GLSL=5 Textures[ (0,3) (1,1) (31,2) (31,3) ]
glDrawElements( ??? ) GLSL=5 Textures[ (0,3) (1,1) (31,2) (31,3) ]
glDrawElements( ??? ) GLSL=5 Textures[ (0,3) (1,1) (31,2) (31,3) ]
...
glDrawElements( ??? ) GLSL=5 Textures[ (0,3) (1,1) (31,2) (31,3) ]
wglSwapBuffers( ??? )

DX's log should be quite similar.

As mhagain pointed out, it could be the drivers. I recall testing my game on my previous renderer (which also supported D3D9 and OGL, but with clumsier code), and D3D9 was roughly 35% faster than OGL. Moreover, the very same game tested under Linux showed something like a 50% drop with respect to D3D9. So the Linux OGL version was about 30% slower than the Windows OGL version. Honestly, it's shocking to me that the driver could cause so much damage.
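A quick sanity check of those percentages, as a sketch. It assumes "35% faster" means the D3D9 frame rate was 1.35 times the Windows OpenGL frame rate, and "50% drop" means the Linux frame rate was half of D3D9's; both readings are interpretations, not measurements.

```python
d3d9 = 1.0                  # normalize the D3D9 frame rate to 1.0
gl_windows = d3d9 / 1.35    # assumption: D3D9 = 1.35 x Windows OpenGL
gl_linux = d3d9 * 0.5       # assumption: Linux OpenGL = half of D3D9

# Fraction by which Linux OpenGL trails Windows OpenGL.
linux_vs_windows_gl = 1.0 - gl_linux / gl_windows

print(f"Linux OGL is about {linux_vs_windows_gl:.0%} slower than Windows OGL")
```

Under that reading the result lands a little above 30%, which is consistent with the rough figure quoted.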

I would really like to finally be able to say: yes, it's the driver. But then I think of Unigine's Heaven demo, which runs at the same speed with both renderers. Maybe that's because it uses OpenGL 3.2 (or something like that)? But I doubt that could have such a big impact.

Since I want to have a cross-platform renderer I thought that maybe I could actually drop D3D9 and focus on OGL (it's always easier to maintain only one of them). But when I see the perf. difference I just can't afford it :P
I've also been maintaining a renderer for both Direct3D9 and OpenGL 2.0, which uses shaders and other API objects (i.e. vertex buffers vs VBOs) that are as similar as possible on both APIs, and I have consistently found Direct3D9 to perform slightly better on NVIDIA and AMD hardware. On Intel integrated chips like the HD Graphics the difference is ridiculous: although the rendering is correct, OpenGL gets single-digit framerates where Direct3D9 gets something like 40-50. Practically, if one wants to support Intel chips at all, one has to maintain a Direct3D renderer too.
I've just checked the speed of rasterization. I rendered the same object (3000 faces) 100 times in the same place (massive overdraw). So the number of draw calls is 100 (not much) and the triangle count is rather modest. In that case I get 34 FPS for D3D9 and 36 FPS for OGL. It would seem rather weird to me if D3D9 were faster in geometry processing and OGL faster in rasterization, since they are just APIs driving the same piece of silicon. Nevertheless, I finally got GL to be faster than D3D.
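To put the two tests side by side, here is a small sketch of the triangle throughput implied by the figures in this thread. It assumes each of the 3000 faces is one triangle; the helper name is made up for illustration.

```python
TRIS_PER_MODEL = 3000  # assumption: one triangle per face

def tris_per_second(draws_per_frame, fps):
    """Triangles submitted per second for a given draw count and frame rate."""
    return draws_per_frame * TRIS_PER_MODEL * fps

# Test 1: 2500 draws, camera looking away from the geometry.
geometry_d3d = tris_per_second(2500, 23)
geometry_gl = tris_per_second(2500, 21)

# Test 2: 100 draws, camera looking at the geometry (massive overdraw).
raster_d3d = tris_per_second(100, 34)
raster_gl = tris_per_second(100, 36)

print(f"Test 1: D3D9 {geometry_d3d / 1e6:.1f}M tris/s, OGL {geometry_gl / 1e6:.1f}M tris/s")
print(f"Test 2: D3D9 {raster_d3d / 1e6:.1f}M tris/s, OGL {raster_gl / 1e6:.1f}M tris/s")
print(f"Test 1 submits about {geometry_d3d / raster_d3d:.0f}x more triangles per second")
```

The second test pushes far fewer triangles per second at a similar frame rate, which suggests it is limited by pixel work rather than by geometry or draw calls, so the two results may be measuring different bottlenecks.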
For D3D9 I use HLSL
For OGL I use GLSL

It's interesting that in my game running on my old renderer, where OGL's performance was clearly worse than D3D9's, I used ARB shaders (and the whole Cg toolkit). Could it be that GLSL is more efficient than pure "old-school" ARB ASM shaders?

This topic is closed to new replies.