theagentd

Posted 21 June 2013 - 12:58 PM

EDIT: This seems to be caused by glDrawElementsInstanced() being extremely CPU-intensive. See my fourth post further down.

Hello.

I have a situation where I need to render a large number of instances of a few different, very simple meshes (100-300 triangles each). Rendering them one at a time turned out to be too CPU-intensive, so instancing seemed like the perfect solution, except that it requires OpenGL 3 (OGL3). I therefore came up with a "pseudo-instancing" method: I duplicated my model 128 times in a single VBO (instead of storing it just once). That let me render up to 128 tiles in a single draw call by uploading the instance positions to a uniform vec3[] array, which the vertex shader used to position each instance.
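To make the pseudo-instancing layout concrete, here is a minimal C sketch of the CPU-side setup. It assumes position-only vertices, 32-bit indices and an extra float attribute per vertex holding the index of the copy it belongs to; the names (build_pseudo_instanced, pseudo_index_count, PSEUDO_INSTANCE_VS, copyIndex, offsets, mvp) are hypothetical and not the poster's actual code. The GLSL shader is included only as a string and is not compiled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Number of copies of the mesh stored in the VBO (128, as in the post). */
#define PSEUDO_INSTANCES 128

/* Hypothetical GLSL 1.20 vertex shader (not compiled by this sketch): every
   vertex carries the index of the copy it belongs to and uses it to look up
   that copy's position in the uniform array. */
static const char PSEUDO_INSTANCE_VS[] =
    "#version 120\n"
    "uniform mat4 mvp;\n"
    "uniform vec3 offsets[128];\n"
    "attribute vec3 position;\n"
    "attribute float copyIndex;\n"
    "void main() {\n"
    "    gl_Position = mvp * vec4(position + offsets[int(copyIndex)], 1.0);\n"
    "}\n";

typedef struct {
    float    *vertices;    /* 4 floats per vertex: x, y, z, copyIndex */
    uint32_t *indices;     /* the mesh's indices, repeated once per copy */
    size_t    vertexCount; /* vertices after duplication */
    size_t    indexCount;  /* indices after duplication */
} PseudoInstancedMesh;

/* Duplicates a mesh (3 floats per vertex, 32-bit indices) PSEUDO_INSTANCES
   times. Returns 0 on success, -1 if allocation fails. */
static int build_pseudo_instanced(const float *positions, size_t nVerts,
                                  const uint32_t *indices, size_t nIndices,
                                  PseudoInstancedMesh *out)
{
    out->vertexCount = nVerts * PSEUDO_INSTANCES;
    out->indexCount  = nIndices * PSEUDO_INSTANCES;
    out->vertices = malloc(out->vertexCount * 4 * sizeof(float));
    out->indices  = malloc(out->indexCount * sizeof(uint32_t));
    if (out->vertices == NULL || out->indices == NULL) {
        free(out->vertices);
        free(out->indices);
        out->vertices = NULL;
        out->indices = NULL;
        return -1;
    }
    for (size_t c = 0; c < PSEUDO_INSTANCES; ++c) {
        for (size_t v = 0; v < nVerts; ++v) {
            float *dst = out->vertices + (c * nVerts + v) * 4;
            dst[0] = positions[v * 3 + 0];
            dst[1] = positions[v * 3 + 1];
            dst[2] = positions[v * 3 + 2];
            dst[3] = (float)c; /* which copy (and uniform slot) this vertex uses */
        }
        /* Shift each copy's indices so they address that copy's vertices. */
        for (size_t i = 0; i < nIndices; ++i)
            out->indices[c * nIndices + i] = indices[i] + (uint32_t)(c * nVerts);
    }
    return 0;
}

/* Copies are stored back to back, so drawing k tiles means drawing the
   first k copies: this is the index count for a single glDrawElements call. */
static size_t pseudo_index_count(size_t nIndicesPerMesh, size_t k)
{
    assert(k <= PSEUDO_INSTANCES);
    return nIndicesPerMesh * k;
}
```

For each batch of k tiles, the idea would be to upload k positions (for example with glUniform3fv(location, k, data)) and then call glDrawElements(GL_TRIANGLES, pseudo_index_count(n, k), GL_UNSIGNED_INT, 0), where location and n (indices per mesh) are hypothetical. Because the copies are contiguous, a partially filled last batch simply draws a shorter prefix of the index buffer. The 128-element uniform array has to fit within the driver's GL_MAX_VERTEX_UNIFORM_COMPONENTS limit.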

I have now also implemented an OGL3 version that renders the geometry with real instancing and uploads the per-instance data through very efficient, manually synchronized VBO mapping instead of glUniform3f. However, this turned out to be markedly slower on Nvidia hardware, to the point where my pseudo-instancing was 40% faster than real instancing. On AMD and Intel hardware, on the other hand, real instancing is faster (sometimes much faster).
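The post does not show the manually synchronized upload. One common way to do it, sketched here under stated assumptions (one instance VBO split into three per-frame regions, fences guarding reuse, OpenGL 3.3-level functions), is to hand out a different region of the buffer each frame. Only the offset bookkeeping is real C; the GL calls appear in comments, and InstanceRing, ring_begin_frame and instance_bytes are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Number of per-frame regions in the instance VBO (an assumption: three
   lets the CPU fill one region while the GPU may still read the others). */
#define RING_REGIONS 3

typedef struct {
    size_t   region_bytes; /* capacity of one frame's instance data */
    unsigned frame;        /* frames begun so far */
} InstanceRing;

/* Bytes needed for n tightly packed vec3 instance positions. */
static size_t instance_bytes(size_t n)
{
    return n * 3 * sizeof(float);
}

/* Returns the byte offset of the region to fill this frame.
   Sketch of the GL calls around it:
     1. glClientWaitSync on the fence saved when this region was last used
        (if any), so the GPU is no longer reading it;
     2. glMapBufferRange(GL_ARRAY_BUFFER, offset, bytes,
        GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT), write, glUnmapBuffer;
     3. glVertexAttribPointer(loc, 3, GL_FLOAT, GL_FALSE, 0, (void *)offset)
        plus glVertexAttribDivisor(loc, 1) so the attribute advances per instance;
     4. glDrawElementsInstanced(GL_TRIANGLES, count, GL_UNSIGNED_INT, 0, n);
     5. glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0), stored for this region. */
static size_t ring_begin_frame(InstanceRing *r, size_t bytes_needed)
{
    assert(bytes_needed <= r->region_bytes);
    size_t offset = (size_t)(r->frame % RING_REGIONS) * r->region_bytes;
    r->frame++;
    return offset;
}
```

The unsynchronized flag stops the driver from stalling on the map; correctness then relies on the fence wait in step 1, which is what "manually synchronized" refers to in this sketch.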

 

Here are my test results in FPS (higher is better). Test1 = real instancing, Test2 = pseudo-instancing.

GPU: Test1 / Test2
AMD HD5500: 27 / 22
AMD HD6970: 225 / 57
AMD HD7790: 195 / 35
Nvidia GTX 295: 68 / 89
Nvidia GTX 460M (laptop): 83 / 87
Intel HD5000*: 60 / 54
GT 630 (rebranded 500-series GPU): 54 / 55

*Tested at a much lower rendering resolution to reduce the fragment bottleneck, so the FPS numbers for this card are much higher than they should be relative to the other cards.

My pseudo-instancing uses 128x more bandwidth (and memory), around 10x more draw calls, and much less efficient uploading of data to the GPU than real instancing. I cannot for the love of god fathom why it would be faster on Nvidia cards.
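The draw-call and memory figures follow from simple arithmetic. A minimal sketch, assuming a single mesh type, a hypothetical instance count and hypothetical vertex strides (for example 12 bytes for a bare position, 16 with the extra copy index used by pseudo-instancing); the function names are illustrative only.

```c
#include <stddef.h>

#define PSEUDO_INSTANCES 128 /* copies per batch in the pseudo-instancing path */

/* Draw calls needed for n instances of one mesh. */
static size_t draws_real(size_t n)   { return n > 0 ? 1 : 0; }
static size_t draws_pseudo(size_t n) { return (n + PSEUDO_INSTANCES - 1) / PSEUDO_INSTANCES; }

/* Vertex data stored for one mesh with `verts` vertices of `stride` bytes. */
static size_t vbo_bytes_real(size_t verts, size_t stride)   { return verts * stride; }
static size_t vbo_bytes_pseudo(size_t verts, size_t stride) { return verts * stride * PSEUDO_INSTANCES; }
```

With a hypothetical 1,200 instances of one mesh, draws_pseudo(1200) is 10 while draws_real(1200) is 1, the kind of ratio described above; the actual ratio depends on how many instances each mesh has. The stored vertex data grows by exactly 128x for the same stride.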

Original revision posted 15 June 2013 - 04:44 PM.
