The 10000 box challenge
Intel i5 2500
Radeon HD 6870 (I think it's factory overclocked by a bit)
Got something like 17k visible without fps dropping below 60. If I remember right, one or more of (material/shape/rotation) didn't really affect fps.
I remember testing this same thing earlier...
Yes, just went back and tested again: none of the shapes changed the framerate at all (nor the ms per frame), nor did any of the other stages. Which means this demo is most likely CPU bound, probably in the culling code.
[quote name='Waterlimon' timestamp='1339877683' post='4949883']
Intel i5 2500
Radeon HD 6870 (I think it's factory overclocked by a bit)
Got something like 17k visible without fps dropping below 60. If I remember right, one or more of (material/shape/rotation) didn't really affect fps.
I remember testing this same thing earlier...
Yes, just went back and tested again: none of the shapes changed the framerate at all (nor the ms per frame), nor did any of the other stages. Which means this demo is most likely CPU bound, probably in the culling code.
[/quote]
Thanks for giving it a whirl. The timing for the culling in ms is displayed in the top left (1st line of green text), and for me it doesn't usually go over 2ms even for very large numbers of objects. As far as I know the culling cannot be sped up any more without using a hierarchical bounding structure. I'm currently writing a post about the technique I use; it's nothing new by any means, but having the code out there might help someone, and someone might be able to improve it.
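For readers who haven't seen a flat (non-hierarchical) culling pass, here is a minimal C++ sketch of the general idea: test every object's bounding volume against the six frustum planes, one object at a time. This is not the demo's ActionScript code. The bounding-sphere choice and the names `Plane`, `Sphere`, `sphereInFrustum` and `cullVisible` are assumptions made for illustration.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Plane nx*x + ny*y + nz*z + d = 0 with a unit-length normal that points
// towards the inside of the frustum (assumed convention).
struct Plane { float nx, ny, nz, d; };

// Hypothetical bounding volume: one sphere per box.
struct Sphere { float x, y, z, radius; };

// Returns false as soon as the sphere lies completely behind any plane.
// Conservative: a sphere just outside a frustum corner can still pass.
bool sphereInFrustum(const Sphere& s, const std::array<Plane, 6>& frustum) {
    for (const Plane& p : frustum) {
        const float dist = p.nx * s.x + p.ny * s.y + p.nz * s.z + p.d;
        if (dist < -s.radius) return false;
    }
    return true;
}

// Flat O(n) pass: every object is tested, nothing is skipped in bulk.
// A hierarchical structure would reject whole groups with a single test.
std::vector<std::size_t> cullVisible(const std::vector<Sphere>& spheres,
                                     const std::array<Plane, 6>& frustum) {
    std::vector<std::size_t> visible;
    visible.reserve(spheres.size());
    for (std::size_t i = 0; i < spheres.size(); ++i) {
        if (sphereInFrustum(spheres[i], frustum)) visible.push_back(i);
    }
    return visible;
}
```

The cost grows linearly with the object count, which is why the only big win left is a hierarchy that discards many objects per test.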
The demo is definitely CPU bound though for anyone with a good graphics card, and the bulk of the time is spent issuing the drawTriangles() calls. If I remember rightly it's around the 30% mark, maybe even more.
This is a shame as it really eats into the time left for the CPU to work on anything else.
For those of you who have a good idea about bottlenecks: is this commonplace, and is it usually that high? I imagine it's to do with how Adobe wraps or implements things under the hood, or it may simply be a limitation of the speed of ActionScript.
If you were to natively issue draw calls without changing state to render a single triangle, would you still expect it to be so expensive that 5,000 calls are a problem on a mid-range machine?
Neat. I can handle them at about 530fps on a puny laptop with a Radeon 6490m. 100k boxes gives me 73fps, which comfortably sits ~10fps above my refresh rate so I've some headroom for transients.
This is just using some pretty standard D3D11 instancing; per-instance data consists of a matrix and a colour, each box shares a static vertex buffer (8 vertexes, position only) and index buffer (36 indexes), they start out as 1x1 cubes and the matrix expands them to their proper scale.
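As a rough illustration of the data described above, here is a C++ sketch of the shared cube geometry and the per-instance record. It is not the poster's code: the struct and constant names are hypothetical, centring the cube on the origin is an assumption, and the triangle winding has not been matched to any particular cull mode.

```cpp
#include <array>
#include <cstdint>

struct Float3 { float x, y, z; };

// Per-instance data as described: a 4x4 matrix plus a colour (80 bytes).
struct InstanceData {
    float world[16];
    float colour[4];
};

// The 8 corners of a 1x1x1 cube, position only (assumed centred on origin).
constexpr std::array<Float3, 8> kCubeVertices = {{
    {-0.5f, -0.5f, -0.5f}, { 0.5f, -0.5f, -0.5f},
    { 0.5f,  0.5f, -0.5f}, {-0.5f,  0.5f, -0.5f},
    {-0.5f, -0.5f,  0.5f}, { 0.5f, -0.5f,  0.5f},
    { 0.5f,  0.5f,  0.5f}, {-0.5f,  0.5f,  0.5f},
}};

// 6 faces x 2 triangles x 3 indices = 36 indices into the 8 shared corners.
constexpr std::array<std::uint16_t, 36> kCubeIndices = {{
    0, 2, 1,  0, 3, 2,   // -z face
    4, 5, 6,  4, 6, 7,   // +z face
    0, 1, 5,  0, 5, 4,   // -y face
    3, 7, 6,  3, 6, 2,   // +y face
    0, 4, 7,  0, 7, 3,   // -x face
    1, 2, 6,  1, 6, 5,   // +x face
}};
```

In D3D11, data like this would typically be drawn with a single `ID3D11DeviceContext::DrawIndexedInstanced` call, with the instance records in a second vertex buffer; the post doesn't show the actual setup, so that part is left out here.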
I haven't profiled but I already know that I'm bottlenecking on CPU-side matrix transforms and instance buffer uploads. I could reduce that by just using the position and the scale as per-instance data and constructing a matrix in the vertex shader, but that would be cheating by optimizing for this benchmark. I might do it anyway for fun.
Update: yeah, that was a useful boost. Trading a CPU-side matrix transform per box and a larger per-instance vertex for an extra GPU-side matrix transform per vertex (as well as constructing a matrix on the fly in my vertex shader) was well worth it: the 100k case is up to 120fps. Now I gotta look for other areas I can similarly optimize...
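The trade can be sketched on the CPU in plain C++ (the real work would happen in an HLSL vertex shader, which the post doesn't show). The idea is that a scale-plus-translate matrix carries no more information than the position and scale themselves, so the per-instance record can shrink. All names here are hypothetical, and the row-major, column-vector matrix layout is an assumption.

```cpp
struct Vec3 { float x, y, z; };

// Compact per-instance record: position + scale + colour = 40 bytes,
// versus 80 bytes for a full 4x4 matrix plus colour.
struct CompactInstance {
    Vec3 position;
    Vec3 scale;
    float colour[4];
};

// Row-major storage, column-vector convention: translation in last column.
struct Mat4 { float m[4][4]; };

// Builds the matrix the CPU would otherwise have uploaded per instance.
Mat4 makeScaleTranslate(const Vec3& scale, const Vec3& position) {
    return Mat4{{
        {scale.x, 0.0f,    0.0f,    position.x},
        {0.0f,    scale.y, 0.0f,    position.y},
        {0.0f,    0.0f,    scale.z, position.z},
        {0.0f,    0.0f,    0.0f,    1.0f},
    }};
}

// Transforms a point (w = 1) by the matrix.
Vec3 transformPoint(const Mat4& M, const Vec3& v) {
    return {
        M.m[0][0] * v.x + M.m[0][1] * v.y + M.m[0][2] * v.z + M.m[0][3],
        M.m[1][0] * v.x + M.m[1][1] * v.y + M.m[1][2] * v.z + M.m[1][3],
        M.m[2][0] * v.x + M.m[2][1] * v.y + M.m[2][2] * v.z + M.m[2][3],
    };
}

// Same result without ever materialising the matrix.
Vec3 scaleThenTranslate(const Vec3& v, const Vec3& scale, const Vec3& position) {
    return {v.x * scale.x + position.x,
            v.y * scale.y + position.y,
            v.z * scale.z + position.z};
}
```

Both paths give the same world-space corner for an unrotated box; the compact form halves the bytes uploaded per instance in this sketch, at the cost of a little extra vertex-shader work.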