Large Query Count?


Using a time query, I get 8 ms/frame when no tris hit the screen. Maybe your card is just 8x better than mine :> What kind of GPU is that? Mine is a really low end card, a Radeon HD 5450... Curious to compare GPU benchmark stats from videocardbenchmark.net . . . I get a G3D Mark of 234 . . . So I'm "pretty sure" this is vertex bound. After all, every vertex sent through still has to be processed and transformed by the vertex shader before it can even be determined that it's offscreen and the fragment shader can be skipped.
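For reference, a GPU-side frame time like that 8 ms figure can be measured with a GL_TIME_ELAPSED query. A minimal sketch, assuming an OpenGL 3.3+ context and GLEW as the loader (both just assumptions, not necessarily what's used here):

```cpp
// Minimal sketch of GPU frame timing with a GL_TIME_ELAPSED query.
// Assumes a valid OpenGL 3.3+ context; error handling omitted.
#include <GL/glew.h>
#include <cstdio>

GLuint timeQuery = 0;

void initTimer() {
    glGenQueries(1, &timeQuery);
}

void renderFrameTimed() {
    glBeginQuery(GL_TIME_ELAPSED, timeQuery);
    // ... issue all draw calls for the frame here ...
    glEndQuery(GL_TIME_ELAPSED);

    // Reading the result right away stalls until the GPU finishes;
    // fine for an occasional measurement, bad for every frame.
    GLuint64 nanoseconds = 0;
    glGetQueryObjectui64v(timeQuery, GL_QUERY_RESULT, &nanoseconds);
    printf("GPU frame time: %.2f ms\n", nanoseconds / 1.0e6);
}
```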

Yeah, I'm glad that it *works* - but I'm slightly disappointed it doesn't really come close to good engines' optimizations. Like I've mentioned before as a comparison, the DarkPlaces Quake engine that Xonotic runs on gets 200+ FPS on my system with this same exact map. From what I've heard they use a combination of portals and pre-baked PVS.

I have a GeForce 660M which actually is quite a lot faster.
http://www.notebookcheck.net/NVIDIA-GeForce-GTX-660M.71859.0.html

Is that Radeon card the mobile version or the desktop version?


8 ms is still a bit slow, but already three times faster than the 25 ms, which I assume you measured on the CPU. This means that you are CPU bound. Unless this is due to some crazy physics or fluid simulation running in the background, your draw calls are too expensive. Occlusion culling does help here (as long as it reduces the draw call count), but you should double-check your rendering code first.

Interesting things to consider are: How many draw calls are there per frame (or, on average, how many triangles per draw call), and how many state changes to set them up? What are the hotspots that the profiler reports? How much time is spent in the application, and how much in the OpenGL runtime? Are there any blocking requests that wait on the GPU? These can be very subtle and easy to overlook, like a simple glGetError() or a query that wasn't ready.
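On that last point, a common way to avoid a hidden stall is to poll GL_QUERY_RESULT_AVAILABLE before asking for the result, and reuse the previous frame's answer if the GPU isn't done yet. A minimal sketch, with illustrative names (queryVisibleNonBlocking, cachedVisible) that aren't from any particular engine:

```cpp
#include <GL/glew.h>

// Sketch: read an occlusion query without stalling the CPU.
// 'cachedVisible' holds last frame's answer and is reused when the GPU
// hasn't finished the query yet. Names are illustrative only.
bool queryVisibleNonBlocking(GLuint query, bool cachedVisible)
{
    GLuint available = 0;
    glGetQueryObjectuiv(query, GL_QUERY_RESULT_AVAILABLE, &available);
    if (available) {
        GLuint samplesPassed = 0;
        glGetQueryObjectuiv(query, GL_QUERY_RESULT, &samplesPassed);
        return samplesPassed > 0;   // fresh result from the GPU
    }
    return cachedVisible;           // not ready: don't wait, reuse old answer
}
```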

I have the HD 5450 desktop card...

The difference between my 8 ms report and the 25 ms comes down to a few things. First off, I was doing a depth pre-pass, which basically doubles the vertex transformation time, so that's up to 16 ms right there. Secondly, the non-rendering part of the CPU frame takes about 4 ms, so you're up to 20. And the 25 ms report was based on an older version of my map, which was closer to 250K tris, whereas this latest one is lower at about 200K.

So the 25 ms is still "accurate" once you account for those factors, and it still shows that it's GPU bound.

I really think it's just slower than your card and that's all. . .

My octree occlusion implementation has essentially been unchanged for about two or three days now, and I really believe it's probably "as good as it'll get". Pre-baked PVS is "just that much" better. Like I said I get about 2x better performance on this same map in the xonotic engine . . . Because of PVS.

I've been thinking about how to potentially even start a PVS implementation, and it essentially amounts to something like this:

For each base octree node, or even spatial grid cube maybe . . .

For each of 6 camera view frustum directions . . .

Render each individual triangle wrapped in an occlusion query . . .

If the triangle passes, then add it to the list of "visible triangles" in that 1/6 frustum section for that octree node/spatial grid cube

Then . . . during runtime, determine which octree node/spatial grid cube the camera is in, intersect the camera frustum with the 6 view-direction frustums to get (I guess) at most 3 frustums that are potentially visible, then render all triangles in the "visible triangles" lists of the frustums which are intersected . . .
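Spelled out as code, that bake/runtime split would look roughly like the sketch below. Every type and helper here (Cell, Frustum, setupViewForDirection, renderTriangleWithQuery, frustumsIntersect, drawTriangle) is a hypothetical placeholder, since the actual engine's API isn't shown in this thread:

```cpp
// Rough sketch of the proposed PVS bake and runtime lookup.
// All types and helpers are hypothetical placeholders, not a real engine API.
#include <vector>

struct Frustum { /* six planes for one of the 6 view directions */ };

struct Cell {
    Frustum          dirFrustum[6];   // one 90-degree frustum per axis direction
    std::vector<int> visibleTris[6];  // baked "visible triangles" per direction
};

// Hypothetical helpers; a real engine would supply these.
void setupViewForDirection(const Cell& cell, int dir);
bool renderTriangleWithQuery(int tri);   // true if any samples of the query pass
bool frustumsIntersect(const Frustum& a, const Frustum& b);
void drawTriangle(int tri);

// Offline bake: for every cell, render each triangle inside an occlusion
// query from the cell's viewpoint and record the ones that pass.
void bakeCell(Cell& cell, int triangleCount)
{
    for (int dir = 0; dir < 6; ++dir) {
        setupViewForDirection(cell, dir);
        for (int tri = 0; tri < triangleCount; ++tri) {
            if (renderTriangleWithQuery(tri))
                cell.visibleTris[dir].push_back(tri);
        }
    }
}

// Runtime: given the cell containing the camera, intersect the camera
// frustum with the 6 directional frustums and draw the baked lists for
// the (at most ~3) directions that intersect.
void drawVisible(const Cell& cell, const Frustum& cameraFrustum)
{
    for (int dir = 0; dir < 6; ++dir) {
        if (!frustumsIntersect(cameraFrustum, cell.dirFrustum[dir]))
            continue;
        for (int tri : cell.visibleTris[dir])
            drawTriangle(tri);
    }
}
```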

