The 10000 box challenge


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

14 replies to this topic

#1 AgentC   Members   -  Reputation: 1262

Posted 28 April 2012 - 10:12 AM

Here's something slightly stupid but potentially fun you can do. Using your own rendering code, create a scene with 10000 similarly-sized boxes in random positions so that they are all visible in the camera view. Have them all use the same as-simple-as-possible material, with no textures, and have one unshadowed directional light shine on them. You can use instancing if you want, but make sure your engine is otherwise doing everything it usually does (i.e. frustum culling, batch grouping/sorting, etc.).

Now break out the profiler and check where you are bottlenecked, and if you have the time and opportunity, replicate the same scene in commercial or open-source rendering engines and check which ones you just beat.

Naturally, this does not have direct real-world applicability, as a scene usually contains many different objects, materials, lights, etc., but it should still show the raw upper limit of your rendering code's CPU throughput. Personally, this helped me identify cache miss issues in my own code that would otherwise have gone unnoticed.
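
The scene setup described above (10000 boxes in random positions, all inside the camera view) could be sketched roughly as follows. This is only an illustration, not code from any engine mentioned in this thread: the function name `random_boxes_in_view`, the camera convention (at the origin, looking down -Z) and every default parameter value are assumptions.

```python
import math
import random


def random_boxes_in_view(count, fov_y_deg=60.0, aspect=16.0 / 9.0,
                         near=5.0, far=200.0, box_size=1.0, seed=1):
    """Return `count` box centres (x, y, z) that lie fully inside the frustum.

    Assumed setup: camera at the origin, looking down -Z, symmetric frustum.
    """
    rng = random.Random(seed)
    # Bounding-sphere radius of an axis-aligned cube: half its space diagonal.
    radius = box_size * math.sqrt(3.0) / 2.0
    tan_y = math.tan(math.radians(fov_y_deg) / 2.0)
    tan_x = tan_y * aspect
    # How far to pull in from each side plane so the whole sphere stays inside.
    margin_x = radius * math.sqrt(1.0 + tan_x * tan_x)
    margin_y = radius * math.sqrt(1.0 + tan_y * tan_y)

    centres = []
    for _ in range(count):
        depth = rng.uniform(near + radius, far - radius)
        half_w = depth * tan_x - margin_x
        half_h = depth * tan_y - margin_y
        x = rng.uniform(-half_w, half_w)
        y = rng.uniform(-half_h, half_h)
        centres.append((x, y, -depth))
    return centres


boxes = random_boxes_in_view(10000)
print(len(boxes), "boxes, first centre:", boxes[0])
```

Because every centre is generated inside the frustum, frustum culling still runs but rejects nothing, which is the point of the test: the engine pays the full per-object CPU cost for all 10000 boxes.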

Every time you add a boolean member variable, God kills a kitten. Every time you create a Manager class, God kills a kitten. Every time you create a Singleton...

Urho3D (engine)  Hessian (C64 game project)



#2 japro   Members   -  Reputation: 887

Posted 28 April 2012 - 10:21 AM

Lately I've been trying to figure out the best way to draw a metric ton of cubes (or quads, actually) from a GPU perspective. Basically, put everything into one huge VBO and draw that. I was using 256000 cubes and concluded that on my GTX 560 the fastest was a plain indexed VBO (~6.3ms), followed by "instancing" via geometry shader (~7ms), an unindexed VBO (~10ms) and instancing via divisors in OpenGL (24ms).
So my preferred approach in the end was the geometry shader, because it also has the lowest storage requirements in VRAM.
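
To put rough numbers on the storage remark, here is a back-of-the-envelope sketch. The post does not state the vertex layout, so everything here is an assumption: position-only float3 vertices, 8 vertices and 36 indices per cube, 32-bit indices for the one-big-buffer case (2,048,000 vertices do not fit in 16 bits), one float3 point per cube for the geometry shader, and one float3 per instance plus a single shared cube mesh for divisor-based instancing.

```python
FLOAT3_BYTES = 12      # assumed position-only vertex
INDEX32_BYTES = 4      # needed once the vertex count exceeds 65536
INDEX16_BYTES = 2      # enough for a single shared cube mesh
VERTS_PER_CUBE = 8
INDICES_PER_CUBE = 36


def storage_bytes(cube_count):
    """Estimated buffer sizes in bytes for each approach (assumed layouts)."""
    shared_cube = (VERTS_PER_CUBE * FLOAT3_BYTES
                   + INDICES_PER_CUBE * INDEX16_BYTES)
    return {
        "indexed_vbo": cube_count * (VERTS_PER_CUBE * FLOAT3_BYTES
                                     + INDICES_PER_CUBE * INDEX32_BYTES),
        "unindexed_vbo": cube_count * INDICES_PER_CUBE * FLOAT3_BYTES,
        "geometry_shader_points": cube_count * FLOAT3_BYTES,
        "instancing_divisor": shared_cube + cube_count * FLOAT3_BYTES,
    }


for name, size in storage_bytes(256000).items():
    print(f"{name:24s} {size / (1024 * 1024):8.2f} MiB")
```

Under these assumptions the geometry shader variant and divisor-based instancing need almost the same amount of memory, so the timing gap reported above would have to come from somewhere other than buffer size.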

#3 AgentC   Members   -  Reputation: 1262

Posted 28 April 2012 - 06:34 PM

In the geometry shader case, did you also do the culling in the GS? Or in the plain indexed case, did you modify the index buffer to select what to draw? (Disregard if you were always drawing everything.)


#4 clb   Members   -  Reputation: 1780

Posted 14 June 2012 - 02:34 PM

I modified my gfxapi Geometry demo scene to render 10k cubes (instead of the default 50). Without any other changes to my render code, I get 15-20fps on my MacBook Air. According to the Very Sleepy profiler, the majority of the time is spent inside the Intel HD 3000 GPU driver.

The test code shader computes two directional light contributions (one from camera, one towards the camera).

Note though that my code is not apples-to-apples comparable to rendering engines - it does not have a renderer or a scene system: it's simply a hard-coded rendering loop on top of a low-level graphics API abstraction (see gfxapi in my sig).
Me+PC=clb.demon.fi | C++ Math and Geometry library: MathGeoLib, test it live! | C++ Game Networking: kNet | 2D Bin Packing: RectangleBinPack | Use gcc/clang/emcc from VS: vs-tool | Resume+Portfolio | gfxapi, test it live!

#5 bwhiting   Members   -  Reputation: 691

Posted 15 June 2012 - 10:29 AM

I wrote a sort of 3D benchmark in Flash that uses the GPU.

post here:
http://blog.bwhiting.co.uk/?p=362
demo here:
http://bwhiting.co.uk/b3d/stress2/ <-----

press "n" twice to select a cube mesh
press "+" to keep adding 500 cubes
press "m" to change material, from very simple colour to normal mapped

wasd/up down left right to fly around and get all the cubes into the viewport (scene stats on top left)

maybe someone with an EPIC graphics card and an i7 could hit the 10,000 cube mark (on my machine it really starts to chug - 25 fps with Flash Player 11.3 release build)

press "space" to toggle the rotations (10,000 of these will be quite intensive)


good luck and hope no machines explode

#6 clb   Members   -  Reputation: 1780

Posted 16 June 2012 - 04:28 AM

Nice demo bwhiting. On my macbook air, upping the content amount until 10k cubes were visible, I got about 18fps (pressed spacebar to stop the animation, which helps a bit). The fan got quite audible, but no explosions at least :)

#7 AgentC   Members   -  Reputation: 1262

Posted 16 June 2012 - 06:06 AM

Cool demo! On a fairly powerful notebook (GTX 670M) I got 50fps with 10000 objects, which is roughly as fast as Unity :)


#8 MJP   Moderators   -  Reputation: 10647

Posted 16 June 2012 - 01:18 PM

I hit 100k no problem with my i7 2600K and AMD 6950, even with rotations and normal mapping turned on. You puny mortals with your laptops can bow before the might of my desktop!

Edited by MJP, 16 June 2012 - 01:23 PM.


#9 Madhed   Crossbones+   -  Reputation: 2713

Posted 16 June 2012 - 01:45 PM

100k visible, normal maps + animation = 35fps
Core 2 Quad 2.5GHz, GTX 550 Ti

#10 Washu   Senior Moderators   -  Reputation: 4689

Posted 16 June 2012 - 02:13 PM

Quoting MJP: "I hit 100k no problem with my i7 2600K and AMD 6950, even with rotations and normal mapping turned on. You puny mortals with your laptops can bow before the might of my desktop!"

Pathetic desktop.

My laptop hit 100k no problem, normal mapped + anim. 45fps.

i7 quadcore, Nvidia 4200M

As an additional note: there is no difference in framerate on my machine between any of the stages. No shading, normal mapped, or other.

Edited by Washu, 16 June 2012 - 02:16 PM.

In time the project grows, the ignorance of its devs it shows, with many a convoluted function, it plunges into deep compunction, the price of failure is high, Washu's mirth is nigh.
ScapeCode - Blog | SlimDX


#11 Waterlimon   Crossbones+   -  Reputation: 2436

Posted 16 June 2012 - 02:14 PM

intel i5 2500

radeon hd 6870 (I think it's factory overclocked by a bit)

got something like 17k visible without fps dropping below 60. If I remember right, one or more of (material/shape/rotation) didn't really affect fps.

I remember testing this same thing earlier...

o3o


#12 Washu   Senior Moderators   -  Reputation: 4689

Posted 16 June 2012 - 02:20 PM

Quoting Waterlimon: "got something like 17k visible without fps dropping below 60. If I remember right, one or more of (material/shape/rotation) didn't really affect fps."

Yes, just went back and tested again: none of the shapes changed the framerate at all (nor the ms per frame), nor did any of the other stages. That means this demo is most likely CPU bound, probably in the culling code.


#13 bwhiting   Members   -  Reputation: 691

Posted 18 June 2012 - 02:30 AM


Quoting Washu: "Yes, just went back and tested again: none of the shapes changed the framerate at all (nor the ms per frame), nor did any of the other stages. That means this demo is most likely CPU bound, probably in the culling code."


Thanks for giving it a whirl. The timing for the culling in ms is displayed in the top left (1st line of green text), and for me it doesn't usually go over 2ms even for very large numbers of objects. As far as I know, the culling cannot be sped up any more without using a hierarchical bounding structure. I am currently writing a post about the technique I use; it's nothing new by any means, but having the code out there might help someone, and someone might be able to improve it.
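
For readers unfamiliar with non-hierarchical culling, here is a minimal sketch of the general idea: test every object's bounding sphere against the six frustum planes in a flat loop. This is not the demo's actual code (which is not shown in this thread); the function names `frustum_planes` and `cull_spheres`, the plane representation and the camera convention (origin, looking down -Z) are all assumptions.

```python
import math


def frustum_planes(fov_y_deg, aspect, near, far):
    """Six planes (nx, ny, nz, d) with unit normals pointing into the frustum.

    Assumed camera: at the origin, looking down -Z.
    """
    ty = math.tan(math.radians(fov_y_deg) / 2.0)
    tx = ty * aspect
    ix = 1.0 / math.sqrt(1.0 + tx * tx)
    iy = 1.0 / math.sqrt(1.0 + ty * ty)
    return [
        (0.0, 0.0, -1.0, -near),     # near
        (0.0, 0.0, 1.0, far),        # far
        (ix, 0.0, -tx * ix, 0.0),    # left
        (-ix, 0.0, -tx * ix, 0.0),   # right
        (0.0, iy, -ty * iy, 0.0),    # bottom
        (0.0, -iy, -ty * iy, 0.0),   # top
    ]


def cull_spheres(planes, spheres):
    """Return indices of spheres (cx, cy, cz, radius) at least partly visible."""
    visible = []
    for index, (cx, cy, cz, radius) in enumerate(spheres):
        for nx, ny, nz, d in planes:
            # Signed distance below -radius: sphere is fully outside this plane.
            if nx * cx + ny * cy + nz * cz + d < -radius:
                break
        else:
            visible.append(index)
    return visible


planes = frustum_planes(90.0, 1.0, 1.0, 100.0)
spheres = [
    (0.0, 0.0, -10.0, 1.0),    # in front of the camera
    (0.0, 0.0, 10.0, 1.0),     # behind the camera
    (50.0, 0.0, -10.0, 1.0),   # far off to the right
]
print(cull_spheres(planes, spheres))
```

The cost is linear in the object count, at most six dot products per object. A hierarchical bounding structure, as mentioned in the post, would let whole groups of objects be rejected with a single test.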

The demo is definitely CPU bound, though, for anyone with a good graphics card, and the bulk of the time is spent issuing drawTriangles() calls. If I remember rightly, it's around the 30% mark, maybe even more.

This is a shame, as it really eats into the time left for the CPU to work on anything else.

For those of you who have a good idea about bottlenecks: is this commonplace, and is it usually that high? I imagine it's to do with how Adobe wraps and implements things under the hood, or it might simply be a limitation of the speed of ActionScript.

If you were to natively issue draw calls, each rendering a single triangle without changing state, would you expect 5,000 calls to still be that expensive on a mid-range machine?
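
One way to make that question concrete is to turn the figures into a per-call cost. The sketch below is plain arithmetic: the 30% share and the 5,000 calls come from this post, while the 60 fps frame rate is purely an assumption chosen for illustration, and `per_call_cost_us` is a hypothetical helper name.

```python
def per_call_cost_us(fps, draw_fraction, draw_calls):
    """Average CPU cost of one draw call, in microseconds."""
    frame_ms = 1000.0 / fps                # total frame time
    draw_ms = frame_ms * draw_fraction     # share spent issuing draw calls
    return draw_ms * 1000.0 / draw_calls


# Assumed scenario: 60 fps, 30% of the frame in draw submission, 5,000 calls.
print(f"{per_call_cost_us(60.0, 0.30, 5000):.2f} microseconds per call")
```

Comparing that figure with a measured per-call cost in a native API would show how much of the overhead is attributable to the Flash layer rather than to the driver.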

#14 mhagain   Crossbones+   -  Reputation: 7610

Posted 19 June 2012 - 01:38 PM

Neat. I can handle them at about 530fps on a puny laptop with a Radeon 6490m. 100k boxes gives me 73fps, which comfortably sits ~10fps above my refresh rate so I've some headroom for transients.

This is just using some pretty standard D3D11 instancing; per-instance data consists of a matrix and a colour, each box shares a static vertex buffer (8 vertexes, position only) and index buffer (36 indexes), they start out as 1x1 cubes and the matrix expands them to their proper scale.

I haven't profiled but I already know that I'm bottlenecking on CPU-side matrix transforms and instance buffer uploads. I could reduce that by just using the position and the scale as per-instance data and constructing a matrix in the vertex shader, but that would be cheating by optimizing for this benchmark. I might do it anyway for fun.

Update: yeah, that was a useful boost. Trading a CPU-side matrix transform per box and a larger per-instance vertex for an extra GPU-side matrix transform per vertex (as well as constructing a matrix on the fly in my vertex shader) was well worth it: the 100k case is up to 120fps. Now I gotta look for other areas I can similarly optimize...
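
The trade described in the update can be sketched on the CPU side as follows. This is an illustration rather than the actual D3D11 code: the colour is assumed to be four floats (the post does not give its format), the matrix uses a column-vector convention, and `make_world_matrix` and `transform_point` are hypothetical names standing in for what the vertex shader would do.

```python
FLOAT_BYTES = 4
# Per-instance payload before: 4x4 matrix + colour (assumed 4 floats).
MATRIX_INSTANCE_BYTES = (16 + 4) * FLOAT_BYTES
# Per-instance payload after: position + scale + colour.
COMPACT_INSTANCE_BYTES = (3 + 3 + 4) * FLOAT_BYTES


def make_world_matrix(position, scale):
    """Scale-then-translate matrix, as a shader could rebuild it per vertex."""
    px, py, pz = position
    sx, sy, sz = scale
    return [
        [sx, 0.0, 0.0, px],
        [0.0, sy, 0.0, py],
        [0.0, 0.0, sz, pz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def transform_point(matrix, point):
    """Apply a 4x4 matrix to a 3D point (w = 1) and return (x, y, z)."""
    v = (point[0], point[1], point[2], 1.0)
    return tuple(sum(matrix[row][col] * v[col] for col in range(4))
                 for row in range(3))


world = make_world_matrix((10.0, 20.0, 30.0), (2.0, 4.0, 6.0))
print(transform_point(world, (0.5, -0.5, 0.5)))   # one corner of the unit cube
print(MATRIX_INSTANCE_BYTES * 100000, "vs", COMPACT_INSTANCE_BYTES * 100000,
      "bytes uploaded per frame for 100k boxes")
```

With no rotation involved, the matrix carries no information beyond position and scale, so under these assumptions the per-instance upload is halved and the CPU no longer builds 100k matrices per frame.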

Edited by mhagain, 20 June 2012 - 03:28 AM.

It appears that the gentleman thought C++ was extremely difficult and he was overjoyed that the machine was absorbing it; he understood that good C++ is difficult but the best C++ is well-nigh unintelligible.


#15 Narf the Mouse   Members   -  Reputation: 318

Posted 19 June 2012 - 10:13 PM

67k at 57 FPS on my wimpy old GTS 250.



