
## Questions about mesh rendering performance


22 replies to this topic

### #1BlueSpud  Members

Posted 11 December 2013 - 01:48 PM

Hello,

I've been working on the mesh rendering for my game engine lately, and I have run into problems with large models. I know that the best way to improve performance in OpenGL is to avoid state changes. Rendering the model with one solid texture, I was getting ~35-40 fps. After I added multitexturing, I get ~25-28 fps. I'm using draw lists and here is my code:

    //make the list
    int materiali = 0;
    list = glGenLists(1);
    glNewList(list, GL_COMPILE);
    glBegin(GL_TRIANGLES);
    for (int i = 0; i < ModelRegistry.models[m].m.obj.size(); i++)
    {
        if (i == ModelRegistry.models[m].m.materialFaces[materiali].i)
        {
            //we have a texture change here
            for (int i = 0; i < Materials.size(); i++)
            {

                if (strcmp(ModelRegistry.models[m].m.materialFaces[materiali].name.c_str(), Materials[i].name.c_str()) == 0)
                {
                    materiali++;
                    if (strcmp(lastMaterial.c_str(), Materials[i].name.c_str()) != 0)
                    {
                        glEnd();
                        lastMaterial = Materials[i].name;
                        glActiveTexture(GL_TEXTURE0);
                        glBindTexture(GL_TEXTURE_2D, Materials[i].textureID);
                        glActiveTexture(GL_TEXTURE1);
                        glBindTexture(GL_TEXTURE_2D, specularID);
                        glActiveTexture(GL_TEXTURE2);
                        glBindTexture(GL_TEXTURE_2D, normalID);
                        glBegin(GL_TRIANGLES);
                    }
                    break;
                }
            }

        }
        glNormal3f(ModelRegistry.models[m].m.obj[i].nx1, ModelRegistry.models[m].m.obj[i].ny1, ModelRegistry.models[m].m.obj[i].nz1);
        glTexCoord2f(ModelRegistry.models[m].m.obj[i].tx1, ModelRegistry.models[m].m.obj[i].ty1);
        glVertex3f(ModelRegistry.models[m].m.obj[i].x1, ModelRegistry.models[m].m.obj[i].y1, ModelRegistry.models[m].m.obj[i].z1);
        glNormal3f(ModelRegistry.models[m].m.obj[i].nx2, ModelRegistry.models[m].m.obj[i].ny2, ModelRegistry.models[m].m.obj[i].nz2);
        glTexCoord2f(ModelRegistry.models[m].m.obj[i].tx2, ModelRegistry.models[m].m.obj[i].ty2);
        glVertex3f(ModelRegistry.models[m].m.obj[i].x2, ModelRegistry.models[m].m.obj[i].y2, ModelRegistry.models[m].m.obj[i].z2);
        glNormal3f(ModelRegistry.models[m].m.obj[i].nx3, ModelRegistry.models[m].m.obj[i].ny3, ModelRegistry.models[m].m.obj[i].nz3);
        glTexCoord2f(ModelRegistry.models[m].m.obj[i].tx3, ModelRegistry.models[m].m.obj[i].ty3);
        glVertex3f(ModelRegistry.models[m].m.obj[i].x3, ModelRegistry.models[m].m.obj[i].y3, ModelRegistry.models[m].m.obj[i].z3);

    }
    glEnd();
    glEndList();



Here are my questions. They should be fairly basic, even if you don't understand the code.

Can new textures be bound inside glBegin()?

Is there anything faster than glBindTexture()?

Is accessing the data from the std::vector slowing down the rendering?

Does the GPU have to go through the for loop every time the list is called?

There are other questions, but I think those are the big ones. Any input would be appreciated, because the visual result of the rendering is great; just the frame rate isn't. Thanks.

### #2laztrezort  Members

Posted 11 December 2013 - 04:42 PM

Sorry I'm not going to directly answer your question, but is there a specific reason you are not using "modern" OpenGL? E.g. hardware constraints, portability, or something?

It seems to me that the most effective path to optimization is to use a more modern approach - VBOs, Shaders, etc, if at all possible.

### #3BlueSpud  Members

Posted 11 December 2013 - 04:49 PM

I was under the impression that Draw Lists were the fastest. I am using shaders, just not for textures. I'm using program 0 to render just textures. As for not using other aspects of modern OpenGl, I want to have this run on lower end computers.

### #4L. Spiro  Members

Posted 11 December 2013 - 07:33 PM

I was under the impression that Draw Lists were the fastest.

Your power level of mistaken…it’s over 9,000!!!

I am using shaders, just not for textures.

Why would you do that? Why would you ever mix fixed-functionality and programmable pipelines? Are you maintaining 2 separate lighting pipelines?

As for not using other aspects of modern OpenGl, I want to have this run on lower end computers.

You are aware that any version of OpenGL that supports shaders (which you are using) also supports VBO’s and IBO’s, right?
VBO’s and IBO’s have been core since OpenGL 1.5.
Shaders have been core since OpenGL 2.0.
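
The two version facts above can be checked at runtime. The following is a minimal sketch, assuming a desktop OpenGL context whose `glGetString(GL_VERSION)` string begins with `major.minor`; the names `GLVersion`, `parseGLVersion`, `atLeast`, `hasCoreBufferObjects` and `hasCoreShaders` are hypothetical helpers, not part of OpenGL.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper type: version pair parsed from a GL_VERSION string.
struct GLVersion {
    int majorVersion = 0;
    int minorVersion = 0;
};

// Parses the leading "major.minor" of a desktop GL_VERSION string,
// e.g. "2.1.0 <vendor text>". Returns {0, 0} if the string does not match.
inline GLVersion parseGLVersion(const char* versionString) {
    GLVersion v;
    if (versionString == nullptr ||
        std::sscanf(versionString, "%d.%d", &v.majorVersion, &v.minorVersion) != 2) {
        return GLVersion{};
    }
    return v;
}

// True if v is the given version or newer.
inline bool atLeast(GLVersion v, int requiredMajor, int requiredMinor) {
    assert(requiredMajor >= 1);  // OpenGL versions start at 1.0
    return v.majorVersion > requiredMajor ||
           (v.majorVersion == requiredMajor && v.minorVersion >= requiredMinor);
}

// Buffer objects (VBOs/IBOs) entered core in 1.5; GLSL shaders in 2.0.
inline bool hasCoreBufferObjects(GLVersion v) { return atLeast(v, 1, 5); }
inline bool hasCoreShaders(GLVersion v)       { return atLeast(v, 2, 0); }
```

With a current context, `parseGLVersion(reinterpret_cast<const char*>(glGetString(GL_VERSION)))` would supply the input. Any version for which `hasCoreShaders` is true also satisfies `hasCoreBufferObjects`.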

In short, your excuse about compatibility makes no sense and it doesn’t make sense to discuss performance issues until you start using VBO’s and IBO’s.

Ask again when you have switched to VBO’s and IBO’s (and preferably shaders for everything, not just “everything but textures”).

L. Spiro

### #5laztrezort  Members

Posted 11 December 2013 - 07:54 PM

I'm not familiar with fixed mode, so I couldn't say much about improving performance if you are sticking with that. Be aware, however, that display lists were deprecated way back in 3.0 and removed from the core specification in 3.1. I'd hazard a guess that they are probably 'emulated' somehow by hardware nowadays.

### #6BlueSpud  Members

Posted 11 December 2013 - 08:14 PM

Maybe I was not specific enough. I have a deferred lighting system in place, so I am not using the fixed functionality lighting. I render it in several passes, one being the albedo. I use shaders for all the other passes except that. I simply don't use a shader for that pass because binding program 0 yields the same results as creating a simple shader to render geometry with texture. If just using a simple shader is faster, it would be easy to create a shader to do that, I would do it but I haven't been able to notice a difference.

As for the draw lists vs the vertex buffer objects, I was unaware that VBOs were in the older OpenGL versions; I thought they were added in 3.2. I've tried both vertex buffer objects and display lists. Based on my experiences, draw lists are significantly faster, and I've also seen that on the internet. It could just be my video card though. I've also read that internally, the data is stored the same way as VBOs in some cases. To me they just seem easier to implement and control, but that's just my opinion. The performance problem might be somewhere else, and I'll see if I can track that down. Thank you for your input.

### #7mark ds  Members

Posted 11 December 2013 - 08:39 PM

How many times does this section...

    glActiveTexture(GL_TEXTURE0);
    glBindTexture(GL_TEXTURE_2D, Materials[i].textureID);
    glActiveTexture(GL_TEXTURE1);
    glBindTexture(GL_TEXTURE_2D, specularID);
    glActiveTexture(GL_TEXTURE2);
    glBindTexture(GL_TEXTURE_2D, normalID);



...actually get called - how many texture binds are there? If the code is being entered hundreds of times, that is going to be a problem. Try pre-sorting (off-line) all your triangles by texture.
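
The pre-sorting suggested here could look roughly like the sketch below. It assumes each triangle already carries an integer material ID; `Triangle`, `Batch` and `sortIntoBatches` are hypothetical names, and the real triangle record would also hold the position, normal and texture coordinate fields from the posted code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one triangle, reduced to the field that matters
// here: which material (texture set) the face uses.
struct Triangle {
    int materialId;  // assumed integer ID, resolved from the material name at load time
};

// A contiguous run of triangles sharing one material: one bind, one draw.
struct Batch {
    int materialId;
    std::size_t firstTriangle;
    std::size_t triangleCount;
};

// Sorts triangles by material (stable, so the order within a material is
// kept) and returns one batch per distinct material.
inline std::vector<Batch> sortIntoBatches(std::vector<Triangle>& triangles) {
    std::stable_sort(triangles.begin(), triangles.end(),
                     [](const Triangle& a, const Triangle& b) {
                         return a.materialId < b.materialId;
                     });

    std::vector<Batch> batches;
    for (std::size_t i = 0; i < triangles.size(); ++i) {
        if (batches.empty() || batches.back().materialId != triangles[i].materialId) {
            batches.push_back(Batch{triangles[i].materialId, i, 0});
        }
        assert(!batches.empty());
        ++batches.back().triangleCount;
    }
    return batches;
}
```

After sorting, the number of texture binds per model equals the number of batches, regardless of how the faces were ordered in the source file.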

### #8L. Spiro  Members

Posted 11 December 2013 - 08:56 PM

To me they just seem easier to implement and control

This may be the problem. If you have the mindset that applies to display lists and you want per-frame control, you are probably misusing VBO’s.
If you are updating VBO’s frequently, you very well may have poorer performance with them.

Based on my experiences, draw lists are significantly faster, and I've also seen that on the internet.

Display lists can never be faster than a properly used VBO because display lists always have the additional overhead of memory copies that properly used VBO’s do not.

And while we are on performance, you have 2 strcmp()’s inside a nested loop.

Assign materials ID’s and do a simple integer compare.
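
One way to follow this advice while still storing names in the files is to resolve each name to an integer once at load time. This is only a sketch: `Material` mirrors the `name` and `textureID` fields seen in the posted code, while `buildMaterialIndex` and `resolveMaterialIds` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical material record, mirroring the name/textureID fields in the
// posted code.
struct Material {
    std::string name;
    unsigned int textureID;
};

// Builds a name -> index lookup once, at load time.
inline std::unordered_map<std::string, std::size_t>
buildMaterialIndex(const std::vector<Material>& materials) {
    std::unordered_map<std::string, std::size_t> index;
    for (std::size_t i = 0; i < materials.size(); ++i) {
        index.emplace(materials[i].name, i);  // first occurrence of a name wins
    }
    return index;
}

// Resolves each face group's material name to an integer ID.
// Unknown names map to -1 so the caller can decide how to handle them.
inline std::vector<int> resolveMaterialIds(
        const std::vector<std::string>& faceGroupNames,
        const std::unordered_map<std::string, std::size_t>& index) {
    // Guard the narrowing cast below.
    assert(index.size() <= static_cast<std::size_t>(std::numeric_limits<int>::max()));
    std::vector<int> ids;
    ids.reserve(faceGroupNames.size());
    for (const std::string& name : faceGroupNames) {
        auto it = index.find(name);
        ids.push_back(it == index.end() ? -1 : static_cast<int>(it->second));
    }
    return ids;
}
```

The string work then happens once per model and material file pairing, and everything afterwards compares plain integers.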

I simply don't use a shader for that pass because binding program 0 yields the same results as creating a simple shader to render geometry with texture. If just using a simple shader is faster, it would be easy to create a shader to do that, I would do it but I haven't been able to notice a difference.

It is faster to use a simple shader.

The fixed-function pipeline is just emulated via shaders.  They often do more work than is necessary.

I render it in several passes, one being the albedo.

A deferred renderer should make no more than 1 pass to render the necessary components for later lighting etc.

It’s bad enough that you are using the slower fixed-function pipeline to do the albedo pass, but even worse that you are making an extra pass for it, something you would not need to do if you were just using shaders for everything.

L. Spiro

Posted 12 December 2013 - 12:38 AM

A display list is simply an array of commands plus data. A VBO is an array of data only. So for each command, the GPU has to go "oh, this is a glVertex3f call, the next 12 bytes are x, y, z floats". Time is wasted determining the next command. With a VBO you just say: here is an array of vertices, draw it. It doesn't have to analyze every single piece of data and work out which command it belongs to. It knows they are verts and that you want to draw them.
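
To make the "array of data only" idea concrete, here is a rough sketch of flattening per-face data into one interleaved vertex array. `Face` is a guessed reconstruction of the element type behind `obj[i]` in the original post, and `Vertex` and `buildVertexArray` are hypothetical names; the GL calls appear only in a comment because they need a live context.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Guessed reconstruction of one element of the poster's obj array:
// three corners, each with position, normal and texture coordinate.
struct Face {
    float x1, y1, z1, nx1, ny1, nz1, tx1, ty1;
    float x2, y2, z2, nx2, ny2, nz2, tx2, ty2;
    float x3, y3, z3, nx3, ny3, nz3, tx3, ty3;
};

// One vertex as it would sit in the buffer: 8 tightly packed floats.
struct Vertex {
    float position[3];
    float normal[3];
    float texCoord[2];
};
static_assert(sizeof(Vertex) == 8 * sizeof(float), "Vertex must be tightly packed");

// Flattens the faces into a plain vertex array, three vertices per face.
inline std::vector<Vertex> buildVertexArray(const std::vector<Face>& faces) {
    std::vector<Vertex> vertices;
    vertices.reserve(faces.size() * 3);
    for (const Face& f : faces) {
        vertices.push_back(Vertex{{f.x1, f.y1, f.z1}, {f.nx1, f.ny1, f.nz1}, {f.tx1, f.ty1}});
        vertices.push_back(Vertex{{f.x2, f.y2, f.z2}, {f.nx2, f.ny2, f.nz2}, {f.tx2, f.ty2}});
        vertices.push_back(Vertex{{f.x3, f.y3, f.z3}, {f.nx3, f.ny3, f.nz3}, {f.tx3, f.ty3}});
    }
    assert(vertices.size() == faces.size() * 3);
    return vertices;
}

// With a current GL context, the array could be uploaded once and then drawn
// each frame along these lines (not executed here; vertex pointer setup omitted):
//   glBindBuffer(GL_ARRAY_BUFFER, vbo);
//   glBufferData(GL_ARRAY_BUFFER, vertices.size() * sizeof(Vertex),
//                vertices.data(), GL_STATIC_DRAW);
//   glDrawArrays(GL_TRIANGLES, 0, (GLsizei)vertices.size());
```

The conversion runs once at load time, so the per-frame cost is a bind and a draw call rather than three calls per vertex.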

### #10BlueSpud  Members

Posted 12 December 2013 - 02:18 PM

I optimized the deferred renderer a bit, making it use one pass and one shader for all three components, and that did help significantly. I'm not sure if you are familiar with the obj file format, but that is what I'm using. I also don't take the material file from the obj; I take it from another file that specifies position, collision mesh, etc. I want all the models to be reusable with different mtl files, so I use names instead of IDs. It makes more sense to compare the names once in a display list than to use some mesh-mtl specific ID numbers, because they would need to match. It would end up costing more load time than just doing the comparison. The second comparison just helps keep OpenGL state changes down. Eventually I'll sort the mesh, but I think the mesh I'm using is already sorted. Thanks for your help, but the framerate is fine after all the optimizations, so I think I'm good.

### #11Aks9  Members

Posted 12 December 2013 - 03:23 PM

Display lists were faster than VBOs on NV cards. I believe they are still faster, but cannot firmly claim it since I haven't used them for a long time.

Well, your problem is in misusing DLs. You should move your texture selection code out of the DL, and the performance would be at least an order of magnitude higher than now (at least on NV).

Also, ignore what others have said about DLs, since they don't understand how they work. I won't reply to each separately, but:

- if used properly, DLs are faster since they are using optimizations beyond regular VBOs,

- you can use whatever you want while creating DLs; glCallList would not repeat it, but a compiled and optimized drawing code.

So, in short, move the texture manipulation code out of the DL and report the performance. If you have to change textures inside a DL, break it into separate DLs or collect the textures into atlases. The latter will be faster, but the former is easier to start with.
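
For the atlas option, each mesh's texture coordinates have to be remapped into the sub-rectangle its texture occupies. The sketch below shows only that remapping, under the assumption that the original coordinates stay within [0, 1]; `AtlasRegion`, `UV` and `toAtlas` are hypothetical names, and building the atlas image itself is not covered.

```cpp
#include <cassert>

// Hypothetical description of where one source texture sits inside an atlas,
// in normalized [0, 1] atlas coordinates.
struct AtlasRegion {
    float u0, v0;         // lower-left corner of the sub-rectangle
    float width, height;  // size of the sub-rectangle
};

struct UV {
    float u, v;
};

// Maps a texture coordinate in the source texture's [0, 1] range to the
// corresponding coordinate inside the atlas. Coordinates outside [0, 1]
// (tiling/repeat) are not handled; they would sample neighbouring regions.
inline UV toAtlas(const AtlasRegion& region, UV uv) {
    assert(uv.u >= 0.0f && uv.u <= 1.0f && uv.v >= 0.0f && uv.v <= 1.0f);
    return UV{region.u0 + uv.u * region.width,
              region.v0 + uv.v * region.height};
}
```

Meshes that rely on repeating textures would need their geometry split or a different approach, which is one reason separate lists per texture are the easier first step.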

P.S. I beg posters not to attach meaningless gigantic images! They reduce readability and make the thread huge (since naive replies would contain the same images) and frivolous.

Edited by Aks9, 12 December 2013 - 03:38 PM.

Posted 12 December 2013 - 08:39 PM

I believe they are still faster, but cannot firmly claim it since I haven't used them for a long time.

They don't even exist after GL 3.x

### #13BlueSpud  Members

Posted 12 December 2013 - 08:57 PM

I moved the texture control outside of the display lists and split it into separate lists, but there wasn't much of a performance increase. It's probably good practice anyway, so I'll keep it that way. Thanks.

### #14Chris_F  Members

Posted 12 December 2013 - 09:26 PM

I believe they are still faster, but cannot firmly claim it since I haven't used them for a long time.

They don't even exist after GL 3.x

I think most people are still relying on GL_ARB_compatibility.

### #15Aks9  Members

Posted 13 December 2013 - 02:59 AM

They don't even exist after GL 3.x

Yes, they do exist in the Compatibility profile.

I moved the texture control outside of the display lists and split it into separate lists, but there wasn't much of a performance increase. It's probably good practice anyway, so I'll keep it that way. Thanks.

Don't do that! You exchanged one problem for another. Leave the texture manipulation code outside DLs. Choosing the active texture unit and binding textures should happen outside DLs. Use DLs just like VBOs. Drivers will optimize layout and access, but binding textures is something that (probably) interferes with that optimization. Try it and tell us whether there is a speed boost or not.

### #16Aks9  Members

Posted 13 December 2013 - 03:02 AM

I think most people are still relying on GL_ARB_compatibility.

GL_ARB_compatibility exists only in GL 3.1. From GL 3.2 there are profiles. GL_ARB_compatibility extension is deprecated too.

### #17Kaptein  Prime Members

Posted 13 December 2013 - 08:51 AM

I believe they are still faster, but cannot firmly claim it since I haven't used them for a long time.

They don't even exist after GL 3.x

I think most people are still relying on GL_ARB_compatibility.

While true, people who ask for help in these forums should not exit a discussion with the illusion that they are doing the right thing in the longer term.

Some features of OpenGL are too old to be considered safe to use. This includes display lists.

This doesn't mean no one should use them. It's just a reminder. Using 3.x features is always the right choice. It has major penetration right now, and will stay for a long time. Compatibility mode or not. (I personally use compatibility mode, not for any particular reason)

Edited by Kaptein, 13 December 2013 - 08:54 AM.

### #18L. Spiro  Members

Posted 13 December 2013 - 10:03 AM

Also, ignore what others have said about DLs, since they don't understand how they work.

Overruled.

I generally only post when I have reasonable cause for what I post (because the main objective on this site is to post facts).

So when you came along and said I didn’t understand how display lists work, I said, “Well, yes that’s true. While writing my book on OpenGL I have been closely working with Apple staff on how VBO’s work—for example I can tell you why glDrawElements() calls glDrawElements_ES2Exec() instead of glDrawElements_IMM_ES2Exec(), but you know I never really talked to Apple’s staff about display lists because they don’t exist today”.

It’s 12:23 AM my time because that’s how long it took for my friend in America who writes OpenGL drivers for a living to awaken.

Now, I’m writing a book related to OpenGL, and I have been working closely with Apple on how they implement their drivers because my book focuses on optimizations and best practices. I thought it prudent to get a second opinion from someone not from Apple who writes OpenGL drivers just to be sure.

One of us understands how display lists work, the other one did some testing on apparently old NVIDIA hardware, which may have been horribly flawed just because getting VBO’s right is non-trivial.
Misuse VBO’s: Bad performance.

I have neither the time nor patience to entertain the idea that the best-case display lists are faster than the best-case VBO’s. No matter how you think a display list can be optimized, a VBO can be optimized the same way once instead of every frame. And even if you don’t see a GPU limit in either case, you definitely 100% see a CPU-bound case on display lists. It’s basic human common sense. But why take my word for it when I can just provide you with quotes from my friend who makes OpenGL drivers for a living?

as they likely told you there are two primary paths for rendering vertex data in that driver stack… depending upon whether or not the underlying HW can handle that particular state vector directly
basically a "fast path" and a "slower path" …the latter ends up munging the underlying data in order to put it in a form valid for HW acceleration
DLs themselves are quite difficult to optimize sensibly
and the DL optimizer is a fairly fragile piece of code (generally speaking… and not at all unique to this platform)
typically one will be either on par or well above DLs when it comes to VBO based rendering (concerning performance)
furthermore DLs do nothing to solve the problems associated with moving large amounts of mutable data down the VA path
since DLs by definition only deal with immutable (i.e. STATIC) content
write a performance benchmark and you can see this for yourself
the DL optimizer will have to reformat the data
and of course that reformatting step will impose a copy
now whether or not that is the only copy (i.e. from client to server) will depend as well
you'll have a copy in ALL vertex submission paths… at a minimum you'll have to copy the data from a client side store into a GPU mapped buffer (i.e. via something like Buffer[sub]Data for the VBO side)
one doesn't typically write directly into a mapped pointer (via MapBuffer[Range])

in any case reformatting your data into a sensible ordering (native to the underlying HW) as a offline step is ALWAYS a better approach
versus forcing the server to do this at COMPILE (in the DL case) or (far worse) at EXECUTE time

which would be the two situations one would run into with DLs

Sadly, he gave me information which he requested I not share, but it is basically in line with the concept that “anything display lists can do, VBO’s can do better”.

I respectfully overrule Aks9 and reiterate that no discussion regarding performance is valid until we start discussing VBO’s.
This is not a site where we say that we tested something long ago on one brand of video cards and start giving sweeping advice to everyone else based on that.
This is a site where we can all be wrong, but if our goal is to be helpful we will do our best to make sure what we say is correct, even if that means contacting Apple and other OpenGL driver developers.

Ditch display lists and use VBO’s.

L. Spiro

Edited by L. Spiro, 13 December 2013 - 11:07 AM.

### #19Aks9  Members

Posted 13 December 2013 - 01:10 PM

So when you came along and said I didn’t understand how display lists work, I said, “Well, yes that’s true. While writing my book on OpenGL I have been closely working with Apple staff on how VBO’s work—for example I can tell you why glDrawElements() calls glDrawElements_ES2Exec() instead of glDrawElements_IMM_ES2Exec(), but you know I never really talked to Apple’s staff about display lists because they don’t exist today”.

Respect! May we hear what the book in question is? Oops, I didn't notice it isn't finished yet. I hope we will hear about it soon.

One of us understands how display lists work, the other one did some testing on apparently old NVIDIA hardware, which may have been horribly flawed just because getting VBO’s right is non-trivial.
Misuse VBO’s: Bad performance.

I don't understand this. What's the point?

I have neither the time nor patience to entertain the idea that the best-case display lists are faster than the best-case VBO’s. No matter how you think a display list can be optimized, a VBO can be optimized the same way once instead of every frame. And even if you don’t see a GPU limit in either case, you definitely 100% see a CPU-bound case on display lists. It’s basic human common sense.

I really don't understand what you wanted to say with these statements.

VBOs are far simpler than DLs, and I really don't understand how DLs achieve better performance, but the last time I tried, DLs were superior on NV hardware, for static geometry of course. No one would use DLs for dynamic geometry. The binding mechanism, address resolution and cache misses may be the reasons for that. Only resident buffers (introduced through the NV bindless extensions) could compete with DLs.

What did you mean by "every frame" optimization? Do you think a VBO (whatever usage hint is used) is optimized on a per-frame basis? I'm not a driver programmer, but I really don't think that is reasonable.

The other two sentences are even less comprehensible.

But why take my word for it when I can just provide you with quotes from my friend who makes OpenGL drivers for a living?

When I read this I started to rub my hands, hoping that we would hear something new. Maybe I was a little rude, but if it provoked insight into the implementation it was worth every word. But... in the citation we heard everything that we already know. The only uncertain claim is that COMPILING DLs is not an offline optimization. Actually it is, and it is done the way drivers do it (probably better than we think it should be).

In short, I do not advocate the usage of DLs. I, personally, haven't used them for many years. But if somebody asks for advice on how to use them, I think it is better to help him and spare him the code refactoring if DLs serve him well.

I'm sorry if I was rude. Some claims provoked me to react that way.

### #20Chris_F  Members

Posted 13 December 2013 - 01:26 PM

While true, people who ask for help in these forums should not exit a discussion with the illusion that they are doing the right thing in the longer term.
Some features of OpenGL are too old to be considered safe to use. This includes display lists.

This doesn't mean no one should use them. It's just a reminder. Using 3.x features is always the right choice. It has major penetration right now, and will stay for a long time. Compatibility mode or not. (I personally use compatibility mode, not for any particular reason)

The way I see it is that compatibility is about supporting legacy code. If you are writing new code from scratch, and are using legacy features, then you are misusing it. I have a feeling that core features would be better optimized and less buggy if more people were actually sticking to them.

Edited by Chris_F, 13 December 2013 - 01:28 PM.
