
Member Since 21 Feb 2010
Offline Last Active Oct 26 2012 11:45 AM

Posts I've Made

In Topic: VAO, VBO speed

14 October 2012 - 08:40 AM

Everything said is correct. Binding the shader program and the VAO should be done outside of the loop. Binding the VBO is meaningless, since it is part of the VAO (as beans222 said).
BUT a performance boost is not assured after all these changes, since drivers optimize most of these operations. I can firmly claim that NV drivers will not update uniforms if the values have not changed since the previous call. So dirty flags and optimized updates are nice but do not necessarily lead to higher performance. On the other hand, the number of driver calls can certainly be a bottleneck.

Uhm... I think I have misunderstood how things work. If you have 10 objects that require 10 different shaders, should I only bind the shader once outside the drawing loop?
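One common way to handle many objects with different shaders is to sort the draw list by program and rebind only when the program actually changes. The following is a minimal sketch of that idea, not the method from the thread: `DrawItem` and `countProgramBinds` are hypothetical names, the handles are plain integers standing in for `GLuint`, and the real GL calls appear only as comments so the snippet compiles without a context.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical per-object record; the fields stand in for GLuint handles.
struct DrawItem {
    unsigned program;  // shader program handle
    unsigned vao;      // vertex array object handle
};

// Sorts the items by program, walks them once, and returns how many
// program binds the loop would issue.
std::size_t countProgramBinds(std::vector<DrawItem>& items) {
    std::stable_sort(items.begin(), items.end(),
                     [](const DrawItem& a, const DrawItem& b) {
                         return a.program < b.program;
                     });

    std::size_t binds = 0;
    unsigned bound = 0;  // 0 means "no program bound", as in OpenGL
    for (const DrawItem& item : items) {
        if (item.program != bound) {
            // A real renderer would call glUseProgram(item.program) here.
            bound = item.program;
            ++binds;
        }
        // A real renderer would call glBindVertexArray(item.vao)
        // and issue the draw call here.
    }
    return binds;
}
```

With 10 objects and 10 different shaders this still issues 10 binds, but with 4000 objects sharing a handful of shaders the bind count drops to the number of distinct programs.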

In Topic: VAO, VBO speed

14 October 2012 - 08:34 AM

1. 4000 x (glBindVertexArray(0) + glUseProgram(0)) = un-needed
remove them from the loop and call them once at the end of the function

Why is this un-needed? I thought you were supposed to do this to clean things up for the next object, that might not have the same setup and shaders?

3. why are you creating 6 doubles at the start of the function?
they're kinda slow compared to regular floats, and unless you can justify their use, replace them with floats for the entire program
also, remove them from the function, and use the position values directly
either in the form of vectors, or matrices

here's the thing: the cpu and gpu like to use their caches
i've been told that the cache is 800 times faster than reading from ram; in that respect, using temp variables (the doubles) is fast once they've been created
you are however creating 2 matrices for 4000 objects, and that's SLOW
and... on top of that, you are rendering ___4000___ objects, without justification!

Yeah, I agree about the floats, and I should probably not update the MVP unless something has changed. However, I tried disabling all code up to the shader bind, and that did not increase performance. Still 20 fps, so the bottleneck is not in the matrix math. =/ 4000 objects might seem like a lot without justification... yes. What I am trying to do is convert my engine to OpenGL 3.2. Before this change the fps was 60, with drawing and updating. The drawing was not a bottleneck at all.
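For the "do not update the MVP unless something has changed" part, a dirty flag is the usual approach. Below is a rough sketch under my own assumptions: `Transform`, `setPosition`, `matrix` and the `rebuilds` counter are hypothetical names, and only a translation matrix (column-major, as OpenGL expects) is rebuilt to keep it short.

```cpp
#include <array>

using Mat4 = std::array<float, 16>;  // column-major 4x4 matrix

// Hypothetical object transform that rebuilds its matrix only when needed.
struct Transform {
    float x = 0.0f, y = 0.0f, z = 0.0f;
    bool dirty = true;   // true when the cached matrix is out of date
    Mat4 model{};        // cached model matrix
    int rebuilds = 0;    // counts rebuilds, for illustration only

    void setPosition(float nx, float ny, float nz) {
        if (nx == x && ny == y && nz == z) return;  // nothing changed
        x = nx; y = ny; z = nz;
        dirty = true;
    }

    // Returns the cached matrix, rebuilding it first if the position changed.
    const Mat4& matrix() {
        if (dirty) {
            model = Mat4{1.0f, 0.0f, 0.0f, 0.0f,
                         0.0f, 1.0f, 0.0f, 0.0f,
                         0.0f, 0.0f, 1.0f, 0.0f,
                         x,    y,    z,    1.0f};
            dirty = false;
            ++rebuilds;
        }
        return model;
    }
};
```

As noted earlier in the thread, the driver may already skip redundant uniform updates, so this mainly saves the CPU-side matrix math rather than guaranteeing a higher frame rate.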

the gpu likes to render lots of stuff in _one_ call, so see if there's any way you can remodel a little and make rendering more composite
ie. gather some objects near each other and make them into 1 draw call, so that you perhaps go from 4000 to 1000, or even better 400 calls
there's lots of options here!
somewhat regretfully, this is the state of the modern day rasterizer
it's powerful in parallel, but each core is a little slow, and it's very very bandwidth limited (talk less, render more!)
by talking i mean sending commands to the GPU, or uploading data (if you upload lots of data in one go, that's usually very fast)

One thing that might be an optimization is to draw everything in one grid cell in a single draw call. If I'm lucky, the objects are scattered over multiple cells. Thanks for that.
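The grid-cell idea could start with something like the following: bucket object indices by the cell they fall into, then issue one draw per non-empty cell. This is only a sketch under my own assumptions; `Object`, `Cell` and `bucketByCell` are hypothetical names, the grid is 2D over x/z, and merging each bucket's geometry into one buffer is left out.

```cpp
#include <cmath>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical object with a position on the ground plane.
struct Object {
    float x;
    float z;
};

using Cell = std::pair<int, int>;  // integer grid coordinates (x, z)

// Groups object indices by the grid cell containing them.
// Each non-empty cell would then be rendered with a single draw call.
std::map<Cell, std::vector<std::size_t>> bucketByCell(
        const std::vector<Object>& objects, float cellSize) {
    std::map<Cell, std::vector<std::size_t>> cells;
    for (std::size_t i = 0; i < objects.size(); ++i) {
        // floor() keeps negative coordinates in the correct cell.
        Cell key{static_cast<int>(std::floor(objects[i].x / cellSize)),
                 static_cast<int>(std::floor(objects[i].z / cellSize))};
        cells[key].push_back(i);
    }
    return cells;
}
```

The number of draw calls then equals the number of non-empty cells, so the cell size becomes the knob for trading off call count against how much off-screen geometry each call drags along.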

In Topic: Debug faster than release

21 February 2010 - 06:51 AM

Original post by KulSeran
Are you using threading? the timing differences between release code running faster than debug code can bring out some really strange bugs if you didn't write proper threading code.

Yes, I'm using pthreads to separate the engine into several concurrent components/threads. That was interesting and I will look into it (even if I haven't changed anything with the threading)...

In Topic: Debug faster than release

21 February 2010 - 06:41 AM

Original post by Buckeye
I'm not a Linux or gcc person so I don't know if it's a possibility, but are you, perhaps, using the debug library for the release exe?

Would be very happy if it was that simple. I haven't changed anything about how I build or which libraries...