Here's something slightly stupid but potentially fun you can do. Using your own rendering code, create a scene with 10000 similarly-sized boxes in random positions so that they are all visible in the camera view. Have them all use the same as-simple-as-possible material, no textures, and have one unshadowed directional light shine on them. You can use instancing if you want, but make sure your engine is otherwise doing everything it usually does (i.e. frustum culling, batch grouping/sorting etc.).
Now break out the profiler and check where you are bottlenecked. If you have the time and opportunity, replicate the same scene in commercial or open-source rendering engines and check which of them you just beat.
Naturally, this does not have direct real-world applicability, as usually there are many different objects, materials, lights etc. in a scene, but it should still show the raw upper limit of your rendering code's CPU throughput. Personally, this helped me identify cache miss issues in my own code that would otherwise have gone unnoticed.
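For anyone who wants a starting point for the scene setup, here is a rough sketch of one way to scatter the boxes so that every one of them is fully inside the view frustum. Everything in it is an assumption made for illustration: the camera sits at the origin looking down +z, the frustum is symmetric, the function names and default values (`random_boxes_in_frustum`, a 60 degree vertical FOV, depths between 20 and 200 units) are made up, and the near and far planes are assumed to lie outside the sampled depth range.

```python
import math
import random


def frustum_margins(fov_y_deg, aspect, half_size):
    """Return (tan_x, tan_y, margin_x, margin_y) for a symmetric frustum."""
    tan_y = math.tan(math.radians(fov_y_deg) / 2.0)
    tan_x = tan_y * aspect
    # Bounding-sphere radius of a cube with the given half extent.
    radius = half_size * math.sqrt(3.0)
    # Distance from a point to a slanted side plane is (z*tan - |x|) * cos(a),
    # with cos(atan(t)) = 1/sqrt(1 + t*t), so the sphere needs this much slack.
    margin_x = radius * math.sqrt(1.0 + tan_x * tan_x)
    margin_y = radius * math.sqrt(1.0 + tan_y * tan_y)
    return tan_x, tan_y, margin_x, margin_y


def random_boxes_in_frustum(count, fov_y_deg=60.0, aspect=16.0 / 9.0,
                            z_min=20.0, z_max=200.0, half_size=0.5, seed=1):
    """Return `count` box centres (x, y, z), each fully inside the frustum."""
    rng = random.Random(seed)
    tan_x, tan_y, margin_x, margin_y = frustum_margins(fov_y_deg, aspect, half_size)
    centres = []
    for _ in range(count):
        z = rng.uniform(z_min, z_max)
        # Usable half-width and half-height of the frustum at this depth.
        limit_x = z * tan_x - margin_x
        limit_y = z * tan_y - margin_y
        if limit_x <= 0.0 or limit_y <= 0.0:
            raise ValueError("z_min is too small for boxes of this size")
        centres.append((rng.uniform(-limit_x, limit_x),
                        rng.uniform(-limit_y, limit_y),
                        z))
    return centres


def fully_visible(centre, fov_y_deg=60.0, aspect=16.0 / 9.0, half_size=0.5):
    """Check that a box's bounding sphere is inside the four side planes."""
    x, y, z = centre
    tan_x, tan_y, margin_x, margin_y = frustum_margins(fov_y_deg, aspect, half_size)
    eps = 1e-9  # tolerance for floating-point rounding at the boundary
    return (abs(x) + margin_x <= z * tan_x + eps and
            abs(y) + margin_y <= z * tan_y + eps)


if __name__ == "__main__":
    boxes = random_boxes_in_frustum(10000)
    print(len(boxes), all(fully_visible(b) for b in boxes))
```

Sampling the depth uniformly puts more boxes per unit volume near the camera, which is probably fine for a throughput test but is not a uniform distribution over the frustum volume.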
The 10000 box challenge
What I did lately was try to figure out the best way to draw a metric ton of cubes (or quads, actually) from a GPU perspective. Basically, put everything into one huge VBO and draw that. I was using 256000 cubes and concluded that on my GTX 560 the fastest was a plain indexed VBO (~6.3ms), followed by "instancing" via geometry shader (~7ms), unindexed VBO (~10ms) and instancing via divisors in OpenGL (24ms).
So my preferred approach in the end was the geometry shader, because it also has the lowest storage requirements in VRAM.
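To put rough numbers on the storage point, here is a back-of-the-envelope sketch. The layout is an assumption, not the poster's actual vertex format: position-only vertices (3 floats), 32-bit indices, 8 shared corners and 36 indices per cube, one point per cube for the geometry shader path, and one shared cube mesh plus a per-instance position for divisor-based instancing. The function name `cube_storage_bytes` is made up for the example.

```python
FLOAT_BYTES = 4
INDEX_BYTES = 4                   # 32-bit indices (assumed)
POSITION_BYTES = 3 * FLOAT_BYTES  # xyz only, no normals or UVs (assumed)

CORNERS_PER_CUBE = 8
INDICES_PER_CUBE = 36             # 6 faces * 2 triangles * 3 indices


def cube_storage_bytes(cube_count):
    """Estimate buffer sizes in bytes for four ways of submitting cubes."""
    one_mesh = (CORNERS_PER_CUBE * POSITION_BYTES
                + INDICES_PER_CUBE * INDEX_BYTES)
    return {
        # Every cube stores its own 8 corners and 36 indices.
        "indexed": cube_count * one_mesh,
        # Every cube stores 36 full vertices, no index buffer.
        "unindexed": cube_count * INDICES_PER_CUBE * POSITION_BYTES,
        # One point per cube, expanded to a cube in the geometry shader.
        "geometry_shader": cube_count * POSITION_BYTES,
        # One shared cube mesh plus one position per instance.
        "instanced": one_mesh + cube_count * POSITION_BYTES,
    }


if __name__ == "__main__":
    for name, size in cube_storage_bytes(256000).items():
        print(f"{name:16s} {size / (1024 * 1024):8.2f} MiB")
```

Under these assumptions the geometry shader path needs the least memory, with divisor-based instancing only one shared mesh larger. A fatter vertex format (normals, colours) would widen the gap to the two plain VBO variants.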
In the geometry shader case, did you do culling also in the GS? Or in the plain indexed case, would you modify the index buffer to select what to draw? (Disregard if you were always drawing everything)
I modified my gfxapi Geometry demo scene to render 10k cubes (instead of the default 50). Without any other changes to my render code, I get 15-20fps on my Macbook Air. According to Very Sleepy profiler, the majority of the time is spent inside the Intel HD 3000 GPU driver.
The test code shader computes two directional light contributions (one from camera, one towards the camera).
Note though that my code is not apples-to-apples comparable to rendering engines - it does not have a renderer or a scene system: it's simply a hard-coded rendering loop on top of a low-level graphics API abstraction (see gfxapi in my sig).
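As an illustration of what "two directional light contributions" along the view axis can amount to, here is a small sketch. It assumes plain clamped Lambert diffuse terms and unit-length vectors, which is a guess at the shading model rather than the actual gfxapi demo shader; the function names are made up.

```python
def lambert(normal, to_light):
    """Clamped diffuse term for unit vectors: max(0, n . l)."""
    d = sum(n * l for n, l in zip(normal, to_light))
    return max(0.0, d)


def two_opposed_lights(normal, view_dir):
    """Sum of two directional contributions along +view_dir and -view_dir."""
    toward = lambert(normal, view_dir)
    away = lambert(normal, tuple(-c for c in view_dir))
    return toward + away
```

With two exactly opposed lights of equal intensity, at most one of the two terms is non-zero, so the sum equals the absolute value of the dot product and faces are lit the same whichever way they point along the view axis.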
I wrote a sort of 3D benchmark in Flash that uses the GPU.
post here:
http://blog.bwhiting.co.uk/?p=362
demo here:
http://bwhiting.co.uk/b3d/stress2/ <-----
press "n" twice to select a cube mesh
press "+" to keep adding 500 cubes
press "m" to change material, from very simple colour to normal mapped
wasd/up down left right to fly around and get all the cubes into the viewport (scene stats on top left)
maybe someone with an EPIC graphics card and an i7 could hit the 10,000 cube mark (on my machine it really starts to chug - 25 fps with flash player 11.3 release build)
press "space" to toggle the rotations (10,000 of these will be quite intensive)
good luck and hope no machines explode
Nice demo, bwhiting. On my MacBook Air, upping the content amount until 10k cubes were visible, I got about 18fps (pressed spacebar to stop the animation, which helps a bit). The fan got quite audible, but no explosions at least.
Cool demo! On a fairly powerful notebook (GTX 670M) I got 50fps with 10000 objects, which is roughly as fast as Unity.
I hit 100k no problem with my i7 2600K and AMD 6950, even with rotations and normal mapping turned on. You puny mortals with your laptops can bow before the might of my desktop!
Pathetic desktop.
My laptop hit 100k no problem, normal mapped + anim. 45fps.
i7 quadcore, Nvidia 4200M
As an additional note: there is no difference in framerate on my machine between any of the stages. No shading, normal mapped, or other.
This topic is closed to new replies.