zm_qiu

OpenGL Speed up animation of large polygonal model



Hi there,

I am developing an OpenGL program on Cocoa/OS X to animate a large polygonal model (triangle count around 1M). I am frustrated that I cannot achieve a decent frame rate (30 FPS or above). I have tried all the techniques I know (avoiding glFlush, double buffering, display lists, avoiding state switches...).

From OpenGL Driver Monitor and OpenGL Profiler, I have the following facts:

  • The video memory is 50% used (not fully utilized).
  • The GPU core is always nearly 100% utilized.
  • In the view's drawRect function, most of the time is spent in [[self openGLContext] flushBuffer] (or glFlush), with an average call time of 150 ms.

After checking CPU utilization, memory, disk usage etc., I believe the bottleneck is on the GPU side. Given all these facts, what can I do to speed up the animation? My machine is a MacBook Pro with a 2.4 GHz Intel Core 2 Duo, 2 GB of memory and 512 MB of video memory.

Maybe the only option is to optimize the algorithm instead of the implementation? I would really appreciate your help.

ZM

from http://developer.apple.com/documentation/GraphicsImaging/Conceptual/OpenGL-MacProgGuide/opengl_performance/chapter_13_section_2.html

Quote:
For double-buffered contexts, the current OpenGL command buffer is not sent to the graphics processor until glFlush or glFinish is called, a buffer swap is requested, or the command buffer is full.


Basically this means that, on double-buffered contexts, a glFlush is not normally needed because a buffer swap request is equivalent to a glFlush call...

Quote:
Original post by larvyde
from http://developer.apple.com/documentation/GraphicsImaging/Conceptual/OpenGL-MacProgGuide/opengl_performance/chapter_13_section_2.html

Basically this means that, on double-buffered contexts, a glFlush is not normally needed because a buffer swap request is equivalent to a glFlush call...



Thank you for your quick help, Larvyde.

Initially I used glFlush, but I changed it to [openglContext flushBuffer] after reading some documentation. I thought it would improve performance, but strangely there is not much difference: with the same settings (triangle count, frame rate), glFlush and flushBuffer yield the same performance.

My understanding now is that the GPU core is the hotspot, while resources such as the CPU, main memory and video memory go unused. I would really like to put them to full use to make things faster.

Quote:
Original post by zm_qiu
My understanding now is that the GPU core is the hotspot, while resources such as the CPU, main memory and video memory go unused. I would really like to put them to full use to make things faster.


Most likely that won't help you if the bottleneck is indeed the GPU. You mentioned display lists: is your object static, or does it change every frame? You may look into vertex buffer objects (VBOs), but for static objects don't expect any significant speedup. Also, are the polygons textured, lit and filled, or wireframe?

The thing is that the GPU hardware is the limiting factor on how many triangles you can process (with T&L) per second, and there's nothing you can do about that. If your object is 1M tris, then for 30 FPS you need to process 30M tris per second. Depending on the type of application (CAD, game, etc.), you may need either to reduce the polygon count dynamically (LOD) or to use a low-poly model with normal mapping (where the normal map is generated from the high-poly model).
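To put rough numbers on that, here is a small back-of-the-envelope sketch in Python. It assumes that the 150 ms reported for flushBuffer in the original post is close to the whole frame time, which may not hold exactly; the function names are made up for illustration.

```python
def required_throughput(triangles: int, target_fps: float) -> float:
    """Triangles per second the GPU must sustain to reach target_fps."""
    return triangles * target_fps


def measured_throughput(triangles: int, frame_time_s: float) -> float:
    """Triangles per second actually achieved at the given frame time."""
    return triangles / frame_time_s


def triangle_budget(throughput: float, target_fps: float) -> int:
    """Largest triangle count that fits in one frame at this throughput."""
    return int(throughput / target_fps)


TRIANGLES = 1_000_000   # model size from the original post
TARGET_FPS = 30         # desired frame rate from the original post
FRAME_TIME_S = 0.150    # assumption: flushBuffer time ~ total frame time

needed = required_throughput(TRIANGLES, TARGET_FPS)
actual = measured_throughput(TRIANGLES, FRAME_TIME_S)
budget = triangle_budget(actual, TARGET_FPS)

print(f"needed: {needed / 1e6:.1f} M tris/s")
print(f"actual: {actual / 1e6:.1f} M tris/s ({1 / FRAME_TIME_S:.1f} FPS)")
print(f"speed-up required: {needed / actual:.1f}x")
print(f"triangle budget at {TARGET_FPS} FPS: {budget}")
```

Under that assumption the gap is roughly a factor of 4.5, so either the per-triangle cost has to drop by that much or an LOD scheme has to bring the visible triangle count down to the budget printed on the last line.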

A static model of 1,000,000 textured and lit triangles should render at 30+ fps on that machine - they would on my first gen MacBook Pro. Make sure you are hitting the performance path in every respect though:

Render all geometry from VBO, and make sure each batch is fairly large (minimum 4k vertices per batch). Use only indexed triangles (indices stored in a separate VBO), and make sure your vertices are a multiple of 32 bytes in length (one cache line). For even higher performance, reorder your vertices for cache coherency.
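As a rough illustration of the layout advice, here is a sketch in Python that packs vertices into a 32-byte interleaved format, collapses duplicate corners into an index list, and estimates how many vertices get transformed per triangle. The attribute layout (position, normal, texcoord), the helper names and the 16-entry FIFO cache are all assumptions for the example; real cache sizes and policies vary by GPU, and the upload itself (for example via glBufferData) is not shown.

```python
import struct

# Assumed layout: position (3 floats) + normal (3 floats) + texcoord (2 floats).
VERTEX_FORMAT = "<3f3f2f"
VERTEX_SIZE = struct.calcsize(VERTEX_FORMAT)  # 8 floats * 4 bytes
assert VERTEX_SIZE % 32 == 0, "vertex size should be a multiple of 32 bytes"


def build_indexed_mesh(corners):
    """Collapse per-corner vertices into unique packed vertices plus indices."""
    unique = {}                 # vertex tuple -> index of its first occurrence
    vertex_bytes = bytearray()  # interleaved data destined for the vertex VBO
    indices = []                # data destined for the separate index VBO
    for v in corners:
        if v not in unique:
            unique[v] = len(unique)
            vertex_bytes += struct.pack(VERTEX_FORMAT, *v)
        indices.append(unique[v])
    return bytes(vertex_bytes), indices


def average_cache_miss_ratio(indices, cache_size=16):
    """Vertices transformed per triangle, simulating a FIFO vertex cache."""
    cache = []
    misses = 0
    for i in indices:
        if i not in cache:
            misses += 1
            cache.append(i)
            if len(cache) > cache_size:
                cache.pop(0)  # evict the oldest entry
    return misses / (len(indices) // 3)


def corner(x, y):
    """Hypothetical helper: a vertex on the z=0 plane facing +z."""
    return (x, y, 0.0, 0.0, 0.0, 1.0, x, y)


a, b, c, d = corner(0.0, 0.0), corner(1.0, 0.0), corner(1.0, 1.0), corner(0.0, 1.0)
quad = [a, b, c, a, c, d]  # two triangles sharing the edge a-c

vertex_data, index_data = build_indexed_mesh(quad)
print(len(vertex_data), "bytes of vertex data, indices:", index_data)
print("misses per triangle, indexed:", average_cache_miss_ratio(index_data))
print("misses per triangle, unindexed:", average_cache_miss_ratio(list(range(6))))
```

The miss ratio is one way to compare index orderings: the lower the number, the fewer vertices the GPU has to transform again, which is the kind of gain a cache-aware reordering aims for.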
