mv348

[SOLVED] Rendering models still too slow - advice?


So I'm working with the FBX model format, which is a bear. I've spent the past few weeks learning about and writing shaders to move vertex deformation to the CPU. That made a significant improvement in rendering time, but mesh rendering (all done with a VBO and shader, via a call to glDrawElements(...)) still takes about 5 milliseconds (not counting per-frame updates on bone matrices and whatnot). The model has 1368 vertices, but since attributes are stored per polygon vertex (and each polygon is a triangle), there are 1368*3 = 4104 sets of vertex attributes, rendered via indexing.

Considering that there are only 20 milliseconds per frame to run at 50 fps, that seems a bit much.

My application is a win32 application written in C++ using Visual Studio 2010.

Here is some info about my hardware:

System info:

Laptop: Dell XPS L511Z
Processor: Intel® Core™ i5-2410M CPU @ 2.30 GHz
Installed memory (RAM): 6.00 GB
System type: 64-bit operating system
Operating system: Windows 7

Graphics card:

Card type: NVIDIA GeForce GT 525M
Driver version: 285.77
DirectX support: 11
CUDA cores: 96
Graphics clock: 600 MHz
Processor clock: 1200 MHz
Memory clock: 900 MHz (1800 MHz data rate)
Memory interface: 128-bit
Total available graphics memory: 3797 MB
Dedicated video memory: 1024 MB DDR3
System video memory: 0 MB
Shared system memory: 2773 MB
Video BIOS version: 70.08.53.00.07
IRQ: 16
Bus: PCI Express x16 Gen 2

So what can help improve my rendering speed?

Ideas I have so far are:

1. Use fullscreen mode. Can I expect a significant advantage here? I realize that currently my application is only using whatever time slice Windows grants it. Would fullscreen mode eliminate this and make a significant difference?

2. For each vertex I have bone indices and weights stored as vertex attributes. My present model uses a maximum of 3 bone influences per vertex. Would I get a speed advantage out of storing these attributes in two VBOs of type ivec4 and vec4, instead of having six VBOs for these attributes? (A sketch of that packed layout is at the end of this post.)

Any more ideas or suggestions would be greatly appreciated. Edited by mv348
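For reference, a rough sketch of what that packed layout could look like, assuming a GL 3.0+ context. The struct name, attribute locations 3 and 4, and variable names are illustrative, not from this thread:

#include <GL/glew.h>
#include <cstddef>
#include <cstdint>
#include <vector>

struct BoneInfluence
{
    int32_t indices[4];  // bone indices; unused slots padded with 0
    float   weights[4];  // matching weights; unused slots padded with 0.0f
};

void UploadBoneInfluences(const std::vector<BoneInfluence>& influences, GLuint& boneVbo)
{
    glGenBuffers(1, &boneVbo);
    glBindBuffer(GL_ARRAY_BUFFER, boneVbo);
    glBufferData(GL_ARRAY_BUFFER, influences.size() * sizeof(BoneInfluence),
                 &influences[0], GL_STATIC_DRAW);

    // Attribute 3: ivec4 of bone indices. glVertexAttribIPointer (GL 3.0+) keeps
    // the values as integers instead of converting them to floats.
    glEnableVertexAttribArray(3);
    glVertexAttribIPointer(3, 4, GL_INT, sizeof(BoneInfluence),
                           reinterpret_cast<const GLvoid *>(offsetof(BoneInfluence, indices)));

    // Attribute 4: vec4 of bone weights.
    glEnableVertexAttribArray(4);
    glVertexAttribPointer(4, 4, GL_FLOAT, GL_FALSE, sizeof(BoneInfluence),
                          reinterpret_cast<const GLvoid *>(offsetof(BoneInfluence, weights)));
}

Fewer attribute bindings mainly saves CPU setup work per draw, so on its own it is unlikely to account for 5 ms.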

First up: how are you measuring these costs in ms, and do you know whether the GPU or the CPU is the bottleneck? Edited by Hodgman

I'm measuring the costs using a simple timer function in my C++ code. The 5ms is measured by starting the timer immediately before my call to glDrawElements and stopping it immediately after.

Here's my render code:

void VBOMesh::Draw(int pMaterialIndex, ShadingMode pShadingMode) const
{
    watch->startWatch("VBOMesh::draw");

    // Where to start.
    GLsizei lOffset = mSubMeshes[pMaterialIndex]->IndexOffset * sizeof(unsigned int);
    if (pShadingMode == SHADING_MODE_SHADED)
    {
        const GLsizei lElementCount = mSubMeshes[pMaterialIndex]->TriangleCount * 3;
        glDrawElements(GL_TRIANGLES, lElementCount, GL_UNSIGNED_INT, reinterpret_cast<const GLvoid *>(lOffset));
    }
    else
    {
        for (int lIndex = 0; lIndex < mSubMeshes[pMaterialIndex]->TriangleCount; ++lIndex)
        {
            // Draw a line loop for every triangle.
            glDrawElements(GL_LINE_LOOP, TRIANGLE_VERTEX_COUNT, GL_UNSIGNED_INT, reinterpret_cast<const GLvoid *>(lOffset));
            lOffset += sizeof(unsigned int) * TRIANGLE_VERTEX_COUNT;
        }
    }

    watch->stopWatch();
}



I don't know for sure, but since this is just a single OpenGL call, my best guess is that the GPU is the bottleneck.
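One way to tell the two apart, roughly: a GL timer query measures how long the GPU itself spends on the draw, independent of how long glDrawElements takes to return on the CPU. A sketch, assuming GL 3.3 / ARB_timer_query is available through GLEW; the function name is illustrative and the parameters mirror the Draw() code above:

#include <GL/glew.h>

double TimeDrawOnGpuMs(GLsizei elementCount, GLsizei byteOffset)
{
    GLuint query = 0;
    glGenQueries(1, &query);

    glBeginQuery(GL_TIME_ELAPSED, query);
    glDrawElements(GL_TRIANGLES, elementCount, GL_UNSIGNED_INT,
                   reinterpret_cast<const GLvoid *>(byteOffset));
    glEndQuery(GL_TIME_ELAPSED);

    // Reading the result forces a CPU/GPU sync; fine for a one-off measurement,
    // but in real code you would read it back a frame or two later.
    GLuint64 nanoseconds = 0;
    glGetQueryObjectui64v(query, GL_QUERY_RESULT, &nanoseconds);
    glDeleteQueries(1, &query);

    return nanoseconds / 1.0e6;
}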

What timer are you using? IIRC, Windows only guarantees 5ms accuracy on some timers. If it's based on QueryPerformanceCounter it's probably fine, though.

Try drawing the model multiple times (maybe ten or a hundred times) and see if the render time goes up linearly.
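A sketch of both suggestions combined, timing N draws with QueryPerformanceCounter instead of clock(). The function name is illustrative; VBOMesh and SHADING_MODE_SHADED are from the code above:

#include <windows.h>
#include <cstdio>

void TimeRepeatedDraws(const VBOMesh& mesh, int materialIndex, int drawCount)
{
    LARGE_INTEGER frequency, start, end;
    QueryPerformanceFrequency(&frequency);

    QueryPerformanceCounter(&start);
    for (int i = 0; i < drawCount; ++i)
        mesh.Draw(materialIndex, SHADING_MODE_SHADED);
    QueryPerformanceCounter(&end);

    double totalMs = 1000.0 * (end.QuadPart - start.QuadPart) / frequency.QuadPart;
    printf("%d draws: %.3f ms total, %.3f ms per draw\n",
           drawCount, totalMs, totalMs / drawCount);
}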

I am using <time.h> and the clock() timer function. It has worked fine for me in the past. Based on how rapidly the frames are moving, the timer seems accurate.

I tried drawing it 4 times per frame and it did indeed go up linearly. Consistently about 5 ms per draw.

Ok, this means that you're only timing the CPU-side cost of submitting commands to OpenGL. You're not timing the GPU at all.

The GPU has a large amount of latency from the CPU (usually at least an entire frame) -- when you call a GL function, such as glDrawElements, all that function does is write the arguments into a queue, which the GPU will consume much later on (perhaps several frames later).

That said, it is extremely strange for glDrawElements to be consuming 5ms of CPU time! This indicates that your OpenGL driver is internally doing a LOT of work on the CPU in order to submit this draw command, such as moving data from main-RAM buffers over to GPU-RAM buffers, or otherwise preparing GPU resources that will be required by the draw command...

You mentioned CPU vertex deformation, are you generating vertex data on the CPU every frame? Or, do you have any other dynamic data that the CPU is generating every frame?
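One rough way to check where that work happens is to bracket the draw with glFinish(), which blocks until the GPU has drained the command queue; comparing that interval with the plain glDrawElements timing separates CPU-side driver work from GPU execution. A measurement-only sketch reusing the watch object from the code above (don't leave glFinish in shipping code):

glFinish();                                   // flush out any earlier commands first
watch->startWatch("draw + glFinish");
glDrawElements(GL_TRIANGLES, lElementCount, GL_UNSIGNED_INT,
               reinterpret_cast<const GLvoid *>(lOffset));
glFinish();                                   // wait until the GPU has actually finished the draw
watch->stopWatch();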

Sorry, that was a typo. Vertex deformations are being handled by the GPU. I tried implementing fullscreen mode but it did not make a noticeable difference. And I had the same thought; the call to glDrawElements should take almost no CPU time at all. It's as if it is working in immediate mode. Are there some GL flags or options I am maybe forgetting to set to disable immediate-mode rendering?

My setup is:

at initialization of the model:
set up and fill vbo's for vertex positions, indices, texture coordinates, bone indices, and bone weights. This is only done once.

immediately before rendering:
updated bone matrices are passed as uniforms to the shader (maybe about 20 matrices). VBOs are bound and attribute pointers set. (This takes less than 1 ms of CPU time.)

rendering:
start watch
call glDrawElements()
stop watch

Consistently, it takes about 5ms for the glDrawElements command to return. Edited by mv348
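For reference, a sketch of roughly what that per-frame path looks like in code (all names here are illustrative, not taken from the actual project):

#include <GL/glew.h>

void PrepareAndDraw(GLuint program, GLint boneUniformLocation,
                    const float* boneMatrices, int boneCount,
                    GLuint positionVbo, GLuint indexVbo, GLsizei indexCount)
{
    glUseProgram(program);

    // Upload the updated bone palette (about 20 mat4s) as uniforms.
    glUniformMatrix4fv(boneUniformLocation, boneCount, GL_FALSE, boneMatrices);

    // Bind the static VBOs and point attribute 0 at the positions.
    glBindBuffer(GL_ARRAY_BUFFER, positionVbo);
    glEnableVertexAttribArray(0);
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, 0);

    // The index buffer must be bound as GL_ELEMENT_ARRAY_BUFFER; otherwise
    // glDrawElements falls back to reading indices from client memory.
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, indexVbo);

    glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, 0);
}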

Still grappling with this issue.

Is there any way I could detect if the GPU is pulling data from the CPU when I call glDrawElements? Could this be a problem with GLEW?
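One cheap thing to check (a sketch, not from the thread): query the buffer bindings right before the draw. If either comes back 0, glDrawElements has to read vertex or index data from client memory on every call, which would show up exactly like this as CPU time.

#include <cstdio>
#include <GL/glew.h>

GLint arrayBufferBinding = 0, elementBufferBinding = 0;
glGetIntegerv(GL_ARRAY_BUFFER_BINDING, &arrayBufferBinding);
glGetIntegerv(GL_ELEMENT_ARRAY_BUFFER_BINDING, &elementBufferBinding);
printf("GL_ARRAY_BUFFER binding: %d, GL_ELEMENT_ARRAY_BUFFER binding: %d\n",
       arrayBufferBinding, elementBufferBinding);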

The clock() function is only guaranteed to be accurate within about 50 milliseconds - I would suggest moving to a high accuracy timer (timeGetTime with timeBeginPeriod, or QueryPerformanceCounter).

There is no particular requirement that the OpenGL driver allocate VBOs in graphics memory, though it would be strange for it not to unless you have run out of VRAM.

What video card is this? Are your drivers up to date? How much other data (textures, renderbuffers, etc.) are you uploading to the card?
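Another quick check worth doing (a sketch): print which OpenGL implementation the context actually landed on. Something like "GDI Generic" / "Microsoft Corporation" would mean a software fallback, which could easily explain a 5 ms draw call.

#include <cstdio>
#include <GL/glew.h>

void PrintGLInfo()
{
    printf("GL_VENDOR:   %s\n", reinterpret_cast<const char *>(glGetString(GL_VENDOR)));
    printf("GL_RENDERER: %s\n", reinterpret_cast<const char *>(glGetString(GL_RENDERER)));
    printf("GL_VERSION:  %s\n", reinterpret_cast<const char *>(glGetString(GL_VERSION)));
}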

I don't believe clock() will work either. Can you at least run it with Fraps and vsync disabled, just so we can verify the framerate with a timer that is known to work? To me it sounds like your timer is just not working properly yet.
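For the vsync part, a sketch of turning it off on Windows through WGL_EXT_swap_control (assuming GLEW's wglew header and a current GL context):

#include <GL/wglew.h>

void DisableVsync()
{
    // wglSwapIntervalEXT comes from WGL_EXT_swap_control; check it is present first.
    if (WGLEW_EXT_swap_control)
        wglSwapIntervalEXT(0);   // 0 = don't wait for vblank, 1 = vsync on
}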
