BlueSpud

OpenGL
Questions about mesh rendering performance

22 posts in this topic

Hello,

I've been working on the mesh rendering for my game engine lately, and I have run into problems with large models. I know that avoiding state changes is one of the best ways to improve performance in OpenGL. With the model I was rendering, using one solid texture, I was getting ~35-40 fps. After I added multitexturing, I get ~25-28 fps. I'm using display lists, and here is my code:

// make the list
int materiali = 0;
const auto& mesh = ModelRegistry.models[m].m; // shorthand for the model being compiled
list = glGenLists(1);
glNewList(list, GL_COMPILE);
glBegin(GL_TRIANGLES);
for (int i = 0; i < mesh.obj.size(); i++)
{
    if (i == mesh.materialFaces[materiali].i)
    {
        // we have a texture change here: look the material up by name
        for (int j = 0; j < Materials.size(); j++)
        {
            if (strcmp(mesh.materialFaces[materiali].name.c_str(), Materials[j].name.c_str()) == 0)
            {
                materiali++;
                // only rebind when the material actually differs from the last one
                if (strcmp(lastMaterial.c_str(), Materials[j].name.c_str()) != 0)
                {
                    glEnd();
                    lastMaterial = Materials[j].name;
                    glActiveTexture(GL_TEXTURE0);
                    glBindTexture(GL_TEXTURE_2D, Materials[j].textureID);
                    glActiveTexture(GL_TEXTURE1);
                    glBindTexture(GL_TEXTURE_2D, specularID);
                    glActiveTexture(GL_TEXTURE2);
                    glBindTexture(GL_TEXTURE_2D, normalID);
                    glBegin(GL_TRIANGLES);
                }
                break;
            }
        }
    }
    // first corner of triangle i
    glNormal3f(mesh.obj[i].nx1, mesh.obj[i].ny1, mesh.obj[i].nz1);
    glTexCoord2f(mesh.obj[i].tx1, mesh.obj[i].ty1);
    glVertex3f(mesh.obj[i].x1, mesh.obj[i].y1, mesh.obj[i].z1);
    // second corner
    glNormal3f(mesh.obj[i].nx2, mesh.obj[i].ny2, mesh.obj[i].nz2);
    glTexCoord2f(mesh.obj[i].tx2, mesh.obj[i].ty2);
    glVertex3f(mesh.obj[i].x2, mesh.obj[i].y2, mesh.obj[i].z2);
    // third corner
    glNormal3f(mesh.obj[i].nx3, mesh.obj[i].ny3, mesh.obj[i].nz3);
    glTexCoord2f(mesh.obj[i].tx3, mesh.obj[i].ty3);
    glVertex3f(mesh.obj[i].x3, mesh.obj[i].y3, mesh.obj[i].z3);
}
glEnd();
glEndList();

Here are my questions. They should be fairly basic, even if you don't understand the code.

Can new textures be bound inside glBegin()?

Is there anything faster than glBindTexture()?

Is accessing data from the std::vector slowing down the rendering?

Does the GPU have to go through the for loop every time the list is called?

There are other questions, but I think those are the big ones. Any input would be appreciated, because the visual result of the rendering is great; just the frame rate isn't. Thanks.

Sorry I'm not going to directly answer your question, but is there a specific reason you are not using "modern" OpenGL? E.g. hardware constraints, portability, or something?

It seems to me that the most effective path to optimization is to use a more modern approach - VBOs, Shaders, etc, if at all possible.

I was under the impression that Draw Lists were the fastest. I am using shaders, just not for textures. I'm using program 0 to render just textures. As for not using other aspects of modern OpenGL, I want to have this run on lower end computers.


I was under the impression that Draw Lists were the fastest.

Your power level of mistaken…it’s over 9,000!!!

 

I am using shaders, just not for textures.

Why would you do that? Why would you ever mix fixed-functionality and programmable pipelines? Are you maintaining 2 separate lighting pipelines?

 

As for not using other aspects of modern OpenGL, I want to have this run on lower end computers.

You are aware that any version of OpenGL that supports shaders (which you are using) also supports VBO’s and IBO’s, right?
VBO’s and IBO’s have been core since OpenGL 1.5.
Shaders have been core since OpenGL 2.0.


In short, your excuse about compatibility makes no sense and it doesn’t make sense to discuss performance issues until you start using VBO’s and IBO’s.


Ask again when you have switched to VBO’s and IBO’s (and preferably shaders for everything, not just “everything but textures”).


L. Spiro

I'm not familiar with fixed mode, so I couldn't say how to improve performance if you are sticking with that. Be aware, however, that display lists were deprecated back in OpenGL 3.0 and removed from the core specification in 3.1. I'd hazard a guess that they are probably 'emulated' somehow by hardware nowadays.
Maybe I was not specific enough. I have a deferred lighting system in place, so I am not using the fixed-functionality lighting. I render it in several passes, one being the albedo. I use shaders for all the other passes except that one. I simply don't use a shader for that pass because binding program 0 yields the same results as creating a simple shader to render geometry with a texture. If just using a simple shader is faster, it would be easy to create one and I would do it, but I haven't been able to notice a difference.

As for draw lists vs. vertex buffer objects, I was unaware that VBOs were in the older OpenGL versions; I thought they were added in 3.2. I've tried both vertex buffer objects and display lists. Based on my experiences, draw lists are significantly faster, and I've also seen that on the internet. It could just be my video card though. I've also read that internally, the data is stored the same way as VBOs in some cases. To me they just seem easier to implement and control, but that's just my opinion. The performance problem might be somewhere else, and I'll see if I can track that down. Thank you for your input.

How many times does this section...

glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, Materials[i].textureID);
glActiveTexture(GL_TEXTURE1);
glBindTexture(GL_TEXTURE_2D, specularID);
glActiveTexture(GL_TEXTURE2);
glBindTexture(GL_TEXTURE_2D, normalID);

...actually get called? In other words, how many texture binds are there? If the code is being entered hundreds of times, that is going to be a problem. Try pre-sorting (offline) all your triangles by texture.
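A rough sketch of the pre-sorting idea, assuming each triangle already carries an integer material index. The `Triangle` struct, its `materialId` field and both function names are made up for illustration and are not from the original code. Sorting once at load time means the bind block above runs once per material instead of once per material switch in file order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical triangle record: only the material index matters here.
struct Triangle {
    int materialId;
    // positions, normals and texture coordinates would live here too
};

// Number of times the texture bindings would have to change if the
// triangles were drawn in their current order.
std::size_t countMaterialSwitches(const std::vector<Triangle>& tris) {
    std::size_t switches = 0;
    for (std::size_t i = 0; i < tris.size(); ++i) {
        if (i == 0 || tris[i].materialId != tris[i - 1].materialId) {
            ++switches;
        }
    }
    return switches;
}

// Group triangles that share a material. stable_sort keeps the original
// relative order inside each material group.
void sortByMaterial(std::vector<Triangle>& tris) {
    auto byMaterial = [](const Triangle& a, const Triangle& b) {
        return a.materialId < b.materialId;
    };
    std::stable_sort(tris.begin(), tris.end(), byMaterial);
    assert(std::is_sorted(tris.begin(), tris.end(), byMaterial));
}
```

Calling `countMaterialSwitches` before and after `sortByMaterial` gives a quick estimate of how many bind blocks the sort saves for a given mesh.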


To me they just seem easier to implement and control

This may be the problem. If you have the mindset that applies to display lists and you want per-frame control, you are probably misusing VBO’s.
If you are updating VBO’s frequently, you very well may have poorer performance with them.

 

 


Based on my experiences, draw lists are significantly faster, and I've also seen that on the internet.

Display lists can never be faster than a properly used VBO because display lists always have the additional overhead of memory copies that properly used VBO’s do not.



And while we are on performance, you have 2 strcmp()’s inside a nested loop.


 

 

Assign materials ID’s and do a simple integer compare.
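One way this could look, as a sketch only: keep the material names in the files, but resolve each name to an integer index once at load time, so the per-triangle work is an integer compare. The `Material` struct reuses the `name` and `textureID` field names from the original post; `buildMaterialIndex` and `findMaterialId` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical material record; `name` and `textureID` mirror the fields
// used in the original post.
struct Material {
    std::string name;
    unsigned int textureID;
};

// Build a name -> index table once, when the material file is loaded.
std::unordered_map<std::string, int> buildMaterialIndex(const std::vector<Material>& materials) {
    std::unordered_map<std::string, int> index;
    for (std::size_t i = 0; i < materials.size(); ++i) {
        index.emplace(materials[i].name, static_cast<int>(i));
    }
    return index;
}

// Resolve a material name to its index, or -1 if the name is unknown.
int findMaterialId(const std::unordered_map<std::string, int>& index, const std::string& name) {
    auto it = index.find(name);
    return it == index.end() ? -1 : it->second;
}
```

With the index stored per material range, the "did the material change?" test becomes `id != lastId`, and the string lookup happens once per range at load time rather than inside the triangle loop.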

 

 

 


I simply don't use a shader for that pass because binding program 0 yields the same results as creating a simple shader to render geometry with texture. If just using a simple shader is faster, it would be easy to create a shader to do that, I would do it but I haven't been able to notice a difference.

It is faster to use a simple shader.

The fixed-function pipeline is just emulated via shaders.  They often do more work than is necessary.

 

 


I render it in several passes, one being the albedo.

A deferred renderer should make no more than 1 pass to render the necessary components for later lighting etc.

It’s bad enough that you are using the slower fixed-function pipeline to do the albedo pass, but even worse that you are making an extra pass for it, something you would not need to do if you were just using shaders for everything.

 

 

L. Spiro


A display list is simply an array of commands + data. A VBO is an array of data only. So for each command the GPU has to go, "oh, this is a glVertex3f call, the next 12 bytes are x, y, z floats". Time is wasted in determining the next command. With a VBO you just say: here is an array of vertices, draw it. It doesn't have to analyze every single piece of data and what the command is for it. It knows they are verts and that you want to draw them.
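To make the "array of data only" idea concrete, here is a minimal sketch that flattens per-triangle records into one interleaved float array of the kind a VBO would hold. The `Face` struct is an assumption loosely based on the fields in the original post (x1, nx1, tx1 and so on), regrouped into small arrays; `buildInterleaved` and `kFloatsPerVertex` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-triangle record, loosely following the original post's
// fields (x1, nx1, tx1, ...), regrouped into arrays here.
struct Face {
    float pos[3][3];     // x, y, z for each of the three corners
    float normal[3][3];  // nx, ny, nz for each corner
    float uv[3][2];      // tx, ty for each corner
};

// Floats per vertex in the interleaved layout: position, normal, texcoord.
constexpr std::size_t kFloatsPerVertex = 3 + 3 + 2;

// Flatten faces into one contiguous, interleaved float array.
std::vector<float> buildInterleaved(const std::vector<Face>& faces) {
    std::vector<float> out;
    out.reserve(faces.size() * 3 * kFloatsPerVertex);
    for (const Face& f : faces) {
        for (int corner = 0; corner < 3; ++corner) {
            out.insert(out.end(), f.pos[corner], f.pos[corner] + 3);
            out.insert(out.end(), f.normal[corner], f.normal[corner] + 3);
            out.insert(out.end(), f.uv[corner], f.uv[corner] + 2);
        }
    }
    assert(out.size() == faces.size() * 3 * kFloatsPerVertex);
    return out;
}
```

An array like this would typically be uploaded once with glBufferData using GL_STATIC_DRAW and then drawn with glDrawArrays, so nothing per-vertex is issued from the CPU each frame.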


I optimized the deferred renderer a bit, making it use one pass and one shader for all three components, and that did help significantly. I'm not sure if you are familiar with the obj file format, but that is what I'm using. I also don't take the material file from the obj; I take it from another file that specifies position, collision mesh, etc. I want all the models to be reusable with different mtl files, so I use names instead of IDs. It makes more sense to compare the names once in a display list than to use some mesh-mtl specific ID numbers, because they would need to match. It would end up costing more load time than just doing the comparison. The second comparison just helps keep OpenGL state changes down. Eventually I'll sort the mesh, but I think the mesh I'm using is already sorted. Thanks for your help, but the framerate is fine after all the optimizations, so I think I'm good.


Display lists were faster than VBOs on NV cards. I believe they are still faster, but I cannot firmly claim that since I haven't used them for a long time.

Well, your problem is that you are misusing DLs. You should move your texture selection code out of the DL, and the performance would be at least an order of magnitude higher than now (at least on NV).

 

Also, ignore what others have said about DLs, since they don't understand how they work. I won't reply to each separately, but:

- if used properly, DLs are faster since they use optimizations beyond regular VBOs,

- you can use whatever you want while creating DLs; glCallList will not repeat that code, only the compiled and optimized drawing commands.

 

So, in short, move the texture manipulation code out of the DL and report the performance. If you have to change textures inside a DL, break it into separate DLs or collect the textures into atlases. The latter will be faster, but the former is easier to start with.
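For the atlas route, the main change on the mesh side is remapping texture coordinates into the sub-rectangle each texture occupies in the atlas. A minimal sketch follows, assuming the original coordinates lie in [0, 1]; `AtlasRegion`, `UV` and `toAtlasUV` are hypothetical names. It deliberately ignores tiling textures and mipmap bleeding between neighbouring regions, both of which need extra care in practice.

```cpp
#include <cassert>

// Sub-rectangle of the atlas occupied by one texture, in normalized
// atlas coordinates (0..1 across the whole atlas).
struct AtlasRegion {
    float u0, v0;  // lower-left corner
    float u1, v1;  // upper-right corner
};

struct UV {
    float u, v;
};

// Map a texture coordinate from the original texture's 0..1 space into
// the region that texture occupies inside the atlas.
UV toAtlasUV(const AtlasRegion& region, UV local) {
    assert(local.u >= 0.0f && local.u <= 1.0f);
    assert(local.v >= 0.0f && local.v <= 1.0f);
    return { region.u0 + local.u * (region.u1 - region.u0),
             region.v0 + local.v * (region.v1 - region.v0) };
}
```

Once every material's coordinates are remapped like this at load time, the whole mesh can be drawn with a single texture bound, so no binds are needed inside or between the lists.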

 

P.S. I beg posters not to attach meaningless gigantic images! They reduce readability and make the thread huge (since naive replies would contain the same images) and frivolous.

Edited by Aks9

I moved the texture control outside of the display lists and split the geometry into separate lists, but there wasn't much of a performance increase. It's probably good practice anyway, so I'll keep it that way. Thanks.


 

 

I believe they are still faster, but I cannot firmly claim that since I haven't used them for a long time.

They don't even exist after GL 3.x

 

 

I think most people are still relying on GL_ARB_compatibility.


They don't even exist after GL 3.x

 

Yes, they do exist in the Compatibility profile.

 

I moved the texture control outside of the display lists and split the geometry into separate lists, but there wasn't much of a performance increase. It's probably good practice anyway, so I'll keep it that way. Thanks.

Don't do that! You exchanged one problem for another. Leave texture manipulation code outside DLs. Choosing the active texture unit and binding textures should happen outside DLs. Use DLs just like VBOs. Drivers will optimize layout and access, but binding textures is something that (probably) interferes with that optimization. Try it and tell us whether there is a speed boost or not.


 

I think most people are still relying on GL_ARB_compatibility.

 

GL_ARB_compatibility exists only in GL 3.1. From GL 3.2 there are profiles. GL_ARB_compatibility extension is deprecated too.


Also, ignore what others have said about DLs, since they don't understand how they work.

Overruled.

I generally only post when I have reasonable cause for what I post (because the main objective on this site is to post facts).

So when you came along and said I didn’t understand how display lists work, I said, “Well, yes that’s true. While writing my book on OpenGL I have been closely working with Apple staff on how VBO’s work—for example I can tell you why glDrawElements() calls glDrawElements_ES2Exec() instead of glDrawElements_IMM_ES2Exec(), but you know I never really talked to Apple’s staff about display lists because they don’t exist today”.

It’s 12:23 AM my time because that’s how long it took for my friend in America who writes OpenGL drivers for a living to awaken.

Now, I’m writing a book related to OpenGL, and I have been working closely with Apple on how they implement their drivers because my book focuses on optimizations and best practices. I thought it prudent to get a second opinion from someone not from Apple who writes OpenGL drivers just to be sure.

One of us understands how display lists work, the other one did some testing on apparently old NVIDIA hardware, which may have been horribly flawed just because getting VBO’s right is non-trivial.
Misuse VBO’s: Bad performance.


I have neither the time nor patience to entertain the idea that the best-case display lists are faster than the best-case VBO’s. No matter how you think a display list can be optimized, a VBO can be optimized the same way once instead of every frame. And even if you don’t see a GPU limit in either case, you definitely 100% see a CPU-bound case on display lists. It’s basic human common sense. But why take my word for it when I can just provide you with quotes from my friend who makes OpenGL drivers for a living?
 

as they likely told you there are two primary paths for rendering vertex data in that driver stack… depending upon whether or not the underlying HW can handle that particular state vector directly
basically a "fast path" and a "slower path" …the latter ends up munging the underlying data in order to put it in a form valid for HW acceleration
DLs themselves are quite difficult to optimize sensibly
and the DL optimizer is a fairly fragile piece of code (generally speaking… and not at all unique to this platform)
typically one will be either on par or well above DLs when it comes to VBO based rendering (concerning performance)
furthermore DLs do nothing to solve the problems associated with moving large amounts of mutable data down the VA path
since DLs by definition only deal with immutable (i.e. STATIC) content
write a performance benchmark and you can see this for yourself
the DL optimizer will have to reformat the data
and of course that reformatting step will impose a copy
now whether or not that is the only copy (i.e. from client to server) will depend as well
you'll have a copy in ALL vertex submission paths… at a minimum you'll have to copy the data from a client side store into a GPU mapped buffer (i.e. via something like Buffer[sub]Data for the VBO side)
one doesn't typically write directly into a mapped pointer (via MapBuffer[Range])

in any case reformatting your data into a sensible ordering (native to the underlying HW) as a offline step is ALWAYS a better approach
versus forcing the server to do this at COMPILE (in the DL case) or (far worse) at EXECUTE time

which would be the two situations one would run into with DLs
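As one illustration of reformatting data ahead of time rather than leaving it to the driver, the sketch below turns an unindexed triangle list into a vertex array plus an index array, the form a VBO and IBO pair would hold. This is only an example of the general idea, not something taken from the quoted discussion; `Vertex`, `IndexedMesh` and `buildIndexed` are hypothetical names, and vertices are merged only when every field compares exactly equal.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

struct Vertex {
    float x, y, z;  // position
    float u, v;     // texture coordinate
};

// Strict ordering so Vertex can be used as a std::map key.
inline bool operator<(const Vertex& a, const Vertex& b) {
    return std::tie(a.x, a.y, a.z, a.u, a.v) < std::tie(b.x, b.y, b.z, b.u, b.v);
}

struct IndexedMesh {
    std::vector<Vertex> vertices;        // unique vertices (VBO contents)
    std::vector<std::uint32_t> indices;  // three per triangle (IBO contents)
};

// Convert a flat list of triangle corners into unique vertices + indices.
IndexedMesh buildIndexed(const std::vector<Vertex>& triangleSoup) {
    assert(triangleSoup.size() % 3 == 0);
    IndexedMesh mesh;
    std::map<Vertex, std::uint32_t> seen;
    for (const Vertex& v : triangleSoup) {
        auto it = seen.find(v);
        if (it == seen.end()) {
            // First time this vertex appears: give it the next free index.
            const auto next = static_cast<std::uint32_t>(mesh.vertices.size());
            it = seen.emplace(v, next).first;
            mesh.vertices.push_back(v);
        }
        mesh.indices.push_back(it->second);
    }
    return mesh;
}
```

Run as a load-time or build-time step, this leaves data that can be uploaded as-is and drawn with glDrawElements.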

Sadly, he gave me information which he requested I not share, but it is basically in line with the concept that “anything display lists can do, VBO’s can do better”.


I respectfully overrule Aks9 and reiterate that no discussion regarding performance is valid until we start discussing VBO’s.
This is not a site where we say that we tested something long ago on one brand of video cards and start giving sweeping advice to everyone else based on that.
This is a site where we can all be wrong, but if our goal is to be helpful we will do our best to make sure what we say is correct, even if that means contacting Apple and other OpenGL driver developers.


Ditch display lists and use VBO’s.


L. Spiro

Edited by L. Spiro

So when you came along and said I didn’t understand how display lists work, I said, “Well, yes that’s true. While writing my book on OpenGL I have been closely working with Apple staff on how VBO’s work—for example I can tell you why glDrawElements() calls glDrawElements_ES2Exec() instead of glDrawElements_IMM_ES2Exec(), but you know I never really talked to Apple’s staff about display lists because they don’t exist today”.

Respect! May we hear what the book in question is? Oops, I didn't notice it isn't finished yet. I hope we will hear about it soon.

 

 

One of us understands how display lists work, the other one did some testing on apparently old NVIDIA hardware, which may have been horribly flawed just because getting VBO’s right is non-trivial.
Misuse VBO’s: Bad performance.

I don't understand this. What's the point?

 

 

I have neither the time nor patience to entertain the idea that the best-case display lists are faster than the best-case VBO’s. No matter how you think a display list can be optimized, a VBO can be optimized the same way once instead of every frame. And even if you don’t see a GPU limit in either case, you definitely 100% see a CPU-bound case on display lists. It’s basic human common sense. 

I really don't understand what you wanted to say with these statements.

VBOs are far simpler than DLs, and I really don't understand how DLs achieve better performance, but the last time I tried, DLs were superior on NV hardware, for static geometry of course. No one would use DLs for dynamic geometry. The binding mechanism, address resolution, and cache misses may be the reasons for that. Only resident buffers (introduced through the NV bindless extensions) could compete with DLs.

 

 

What did you mean by "every frame" optimization? Do you think a VBO (whatever usage hint is used) is optimized on a per-frame basis? I'm not a driver programmer, but I really don't think that is reasonable.

 

 

The other two sentences are even less comprehensible.

 

 But why take my word for it when I can just provide you with quotes from my friend who makes OpenGL drivers for a living?

When I read this I started to rub my hands, hoping that we would hear something new. Maybe I was a little rude, but if it provoked insight into the implementation it was worth every word. But... in the citation we heard everything that we already know. The only uncertain claim is that COMPILING DLs is not an offline optimization. Actually it is, and it is done the way the drivers choose (probably better than we think it should be).

In short, I do not advocate the use of DLs. I, personally, haven't used them for many years. But if somebody asks for advice on how to use them, I think it is better to help him and spare him a code refactoring if they serve him well.

 

I'm sorry if I was rude. Some claims provoked me to react that way.

While true, people who ask for help in these forums should not exit a discussion with the illusion that they are doing the right thing in the longer term.
Some features of OpenGL are too old to be considered safe to use. This includes display lists.

This doesn't mean no one should use them. It's just a reminder. Using 3.x features is always the right choice. It has major penetration right now, and will stay for a long time. Compatibility mode or not. (I personally use compatibility mode, not for any particular reason)

 

The way I see it is that compatibility is about supporting legacy code. If you are writing new code from scratch, and are using legacy features, then you are misusing it. I have a feeling that core features would be better optimized and less buggy if more people were actually sticking to them.

Edited by Chris_F


No matter how you think a display list can be optimized, a VBO can be optimized the same way once instead of every frame.

Why would I need to optimize static geometry every frame? I've been going under the assumption that when you create a display list with the compile option, it never plays with the data again, just reads it.


Why would I need to optimize static geometry every frame? I've been going under the assumption that when you create a display list with the compile option, it never plays with the data again, just reads it.

 

Exactly! Furthermore, drivers pack your data in the optimal way along with all relevant information for later access.

VBOs are much simpler, and you should know how to pack data, what attributes to activate etc.

But, VBOs are what we have in core profile and should use.

 

It is a long story why there are profiles and why the old deprecated functionality still exists.

If you are happy with DLs, continue to use them, but in the proper way, and they'll serve you well.

If you want to switch to non-deprecated functionality abandon them. VBOs are certainly the right way to do things, but you have to know your hw better.


Exactly! Furthermore, drivers pack your data in the optimal way along with all relevant information for later access. ... If you are happy with DLs, continue to use them, but in the proper way, and they'll serve you well. If you want to switch to non-deprecated functionality abandon them. VBOs are certainly the right way to do things, but you have to know your hw better.

Right depends on your goals. For most of us, right isn't defined by core but most efficient use of the hardware (fastest performance).

So yes, agreed. If DL works for you use it. If you want more control over how your GPU memory is utilized, use static VBOs but you'll take a performance hit if you use them alone and you have to be smart about how you encode your data within them. If you happen to be running on NVidia and want VBOs with display list performance, use NV bindless to launch your batches with those VBOs. If not, then substitute VAOs in place of bindless -- it doesn't perform as well but it's better than nothing.

Also, the more data you pack in your VBOs (the larger your batches), the less likely you are to be CPU bound launching batches (which is for the most part what bindless and VAOs strive to reduce).
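A small sketch of what "packing more data in your VBOs" can mean in practice: several meshes that share the same material and vertex layout are concatenated into one vertex array and one index array, with each mesh's indices shifted by the number of vertices already in the buffer. `Mesh` and `mergeMeshes` are hypothetical names, and the layout (a fixed number of floats per vertex) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Mesh {
    std::vector<float> vertices;         // floatsPerVertex floats per vertex
    std::vector<std::uint32_t> indices;  // indices into `vertices`
};

// Concatenate meshes into one batch, rebasing indices so they still point
// at the right vertices in the merged buffer.
Mesh mergeMeshes(const std::vector<Mesh>& meshes, std::size_t floatsPerVertex) {
    assert(floatsPerVertex > 0);
    Mesh merged;
    for (const Mesh& m : meshes) {
        assert(m.vertices.size() % floatsPerVertex == 0);
        // Index of the first vertex this mesh will occupy in the merged buffer.
        const auto base =
            static_cast<std::uint32_t>(merged.vertices.size() / floatsPerVertex);
        merged.vertices.insert(merged.vertices.end(),
                               m.vertices.begin(), m.vertices.end());
        for (std::uint32_t idx : m.indices) {
            merged.indices.push_back(base + idx);
        }
    }
    return merged;
}
```

The merged mesh can then be submitted with one draw call instead of one per source mesh, which is where the CPU-side saving comes from.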

  • Similar Content

    • By DaniDesu
      Main.cpp
      #include "MyEngine.h"

      int main() {
          MyEngine myEngine;
          myEngine.run();
          return 0;
      }

      MyEngine.h
      #pragma once
      #include "MyWindow.h"
      #include "MyShaders.h"
      #include "MyShapes.h"

      class MyEngine {
      private:
          GLFWwindow * myWindowHandle;
          MyWindow * myWindow;
      public:
          MyEngine();
          ~MyEngine();
          void run();
      };

      MyEngine.cpp
      #include "MyEngine.h"

      MyEngine::MyEngine() {
          MyWindow myWindow(800, 600, "My Game Engine");
          this->myWindow = &myWindow;
          myWindow.createWindow();
          this->myWindowHandle = myWindow.getWindowHandle();

          // Load all OpenGL function pointers for use
          gladLoadGLLoader((GLADloadproc)glfwGetProcAddress);
      }

      MyEngine::~MyEngine() {
          this->myWindow->destroyWindow();
      }

      void MyEngine::run() {
          MyShaders myShaders("VertexShader.glsl", "FragmentShader.glsl");
          MyShapes myShapes;
          GLuint vertexArrayObjectHandle;
          float coordinates[] = {
               0.5f,  0.5f, 0.0f,
               0.5f, -0.5f, 0.0f,
              -0.5f,  0.5f, 0.0f
          };
          vertexArrayObjectHandle = myShapes.drawTriangle(coordinates);

          while (!glfwWindowShouldClose(this->myWindowHandle)) {
              glClearColor(0.5f, 0.5f, 0.5f, 1.0f);
              glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

              // Draw something
              glUseProgram(myShaders.getShaderProgram());
              glBindVertexArray(vertexArrayObjectHandle);
              glDrawArrays(GL_TRIANGLES, 0, 3);

              glfwSwapBuffers(this->myWindowHandle);
              glfwPollEvents();
          }
      }

      MyShaders.h
      #pragma once
      #include <glad\glad.h>
      #include <GLFW\glfw3.h>
      #include "MyFileHandler.h"

      class MyShaders {
      private:
          const char * vertexShaderFileName;
          const char * fragmentShaderFileName;
          const char * vertexShaderCode;
          const char * fragmentShaderCode;
          GLuint vertexShaderHandle;
          GLuint fragmentShaderHandle;
          GLuint shaderProgram;
          void compileShaders();
      public:
          MyShaders(const char * vertexShaderFileName, const char * fragmentShaderFileName);
          ~MyShaders();
          GLuint getShaderProgram();
          const char * getVertexShaderCode();
          const char * getFragmentShaderCode();
      };

      MyShaders.cpp
      #include "MyShaders.h"

      MyShaders::MyShaders(const char * vertexShaderFileName, const char * fragmentShaderFileName) {
          this->vertexShaderFileName = vertexShaderFileName;
          this->fragmentShaderFileName = fragmentShaderFileName;

          // Load shaders from files
          MyFileHandler myVertexShaderFileHandler(this->vertexShaderFileName);
          this->vertexShaderCode = myVertexShaderFileHandler.readFile();
          MyFileHandler myFragmentShaderFileHandler(this->fragmentShaderFileName);
          this->fragmentShaderCode = myFragmentShaderFileHandler.readFile();

          // Compile shaders
          this->compileShaders();
      }

      MyShaders::~MyShaders() { }

      void MyShaders::compileShaders() {
          this->vertexShaderHandle = glCreateShader(GL_VERTEX_SHADER);
          this->fragmentShaderHandle = glCreateShader(GL_FRAGMENT_SHADER);
          glShaderSource(this->vertexShaderHandle, 1, &(this->vertexShaderCode), NULL);
          glShaderSource(this->fragmentShaderHandle, 1, &(this->fragmentShaderCode), NULL);
          glCompileShader(this->vertexShaderHandle);
          glCompileShader(this->fragmentShaderHandle);
          this->shaderProgram = glCreateProgram();
          glAttachShader(this->shaderProgram, this->vertexShaderHandle);
          glAttachShader(this->shaderProgram, this->fragmentShaderHandle);
          glLinkProgram(this->shaderProgram);
          return;
      }

      GLuint MyShaders::getShaderProgram() {
          return this->shaderProgram;
      }

      const char * MyShaders::getVertexShaderCode() {
          return this->vertexShaderCode;
      }

      const char * MyShaders::getFragmentShaderCode() {
          return this->fragmentShaderCode;
      }

      MyWindow.h
      #pragma once
      #include <glad\glad.h>
      #include <GLFW\glfw3.h>

      class MyWindow {
      private:
          GLFWwindow * windowHandle;
          int windowWidth;
          int windowHeight;
          const char * windowTitle;
      public:
          MyWindow(int windowWidth, int windowHeight, const char * windowTitle);
          ~MyWindow();
          GLFWwindow * getWindowHandle();
          void createWindow();
          void MyWindow::destroyWindow();
      };

      MyWindow.cpp
      #include "MyWindow.h"

      MyWindow::MyWindow(int windowWidth, int windowHeight, const char * windowTitle) {
          this->windowHandle = NULL;
          this->windowWidth = windowWidth;
          this->windowWidth = windowWidth;
          this->windowHeight = windowHeight;
          this->windowTitle = windowTitle;
          glfwInit();
      }

      MyWindow::~MyWindow() { }

      GLFWwindow * MyWindow::getWindowHandle() {
          return this->windowHandle;
      }

      void MyWindow::createWindow() {
          // Use OpenGL 3.3 and GLSL 3.3
          glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 3);
          glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);

          // Limit backwards compatibility
          glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE);
          glfwWindowHint(GLFW_OPENGL_FORWARD_COMPAT, GL_TRUE);

          // Prevent resizing window
          glfwWindowHint(GLFW_RESIZABLE, GL_FALSE);

          // Create window
          this->windowHandle = glfwCreateWindow(this->windowWidth, this->windowHeight, this->windowTitle, NULL, NULL);
          glfwMakeContextCurrent(this->windowHandle);
      }

      void MyWindow::destroyWindow() {
          glfwTerminate();
      }

      MyShapes.h
      #pragma once
      #include <glad\glad.h>
      #include <GLFW\glfw3.h>

      class MyShapes {
      public:
          MyShapes();
          ~MyShapes();
          GLuint & drawTriangle(float coordinates[]);
      };

      MyShapes.cpp
      #include "MyShapes.h"

      MyShapes::MyShapes() { }

      MyShapes::~MyShapes() { }

      GLuint & MyShapes::drawTriangle(float coordinates[]) {
          GLuint vertexBufferObject{};
          GLuint vertexArrayObject{};

          // Create a VAO
          glGenVertexArrays(1, &vertexArrayObject);
          glBindVertexArray(vertexArrayObject);

          // Send vertices to the GPU
          glGenBuffers(1, &vertexBufferObject);
          glBindBuffer(GL_ARRAY_BUFFER, vertexBufferObject);
          glBufferData(GL_ARRAY_BUFFER, sizeof(coordinates), coordinates, GL_STATIC_DRAW);

          // Determine the interpretation of the array buffer
          glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 3*sizeof(float), (void *)0);
          glEnableVertexAttribArray(0);

          // Unbind the buffers
          glBindBuffer(GL_ARRAY_BUFFER, 0);
          glBindVertexArray(0);

          return vertexArrayObject;
      }

      MyFileHandler.h
      #pragma once
      #include <cstdio>
      #include <cstdlib>

      class MyFileHandler {
      private:
          const char * fileName;
          unsigned long fileSize;
          void setFileSize();
      public:
          MyFileHandler(const char * fileName);
          ~MyFileHandler();
          unsigned long getFileSize();
          const char * readFile();
      };

      MyFileHandler.cpp
      #include "MyFileHandler.h"

      MyFileHandler::MyFileHandler(const char * fileName) {
          this->fileName = fileName;
          this->setFileSize();
      }

      MyFileHandler::~MyFileHandler() { }

      void MyFileHandler::setFileSize() {
          FILE * fileHandle = NULL;
          fopen_s(&fileHandle, this->fileName, "rb");
          fseek(fileHandle, 0L, SEEK_END);
          this->fileSize = ftell(fileHandle);
          rewind(fileHandle);
          fclose(fileHandle);
          return;
      }

      unsigned long MyFileHandler::getFileSize() {
          return (this->fileSize);
      }

      const char * MyFileHandler::readFile() {
          char * buffer = (char *)malloc((this->fileSize)+1);
          FILE * fileHandle = NULL;
          fopen_s(&fileHandle, this->fileName, "rb");
          fread(buffer, this->fileSize, sizeof(char), fileHandle);
          fclose(fileHandle);
          buffer[this->fileSize] = '\0';
          return buffer;
      }

      VertexShader.glsl
      #version 330 core
      layout (location = 0) vec3 VertexPositions;

      void main() {
          gl_Position = vec4(VertexPositions, 1.0f);
      }

      FragmentShader.glsl
      #version 330 core
      out vec4 FragmentColor;

      void main() {
          FragmentColor = vec4(1.0f, 0.0f, 0.0f, 1.0f);
      }

      I am attempting to create a simple engine/graphics utility using some object-oriented paradigms. My first goal is to get some output from my engine, namely, a simple red triangle.
      For this goal, the MyShapes class will be responsible for defining shapes such as triangles, polygons etc. Currently, there is only a drawTriangle() method implemented, because I first wanted to see whether it works or not before attempting to code other shape drawing methods.
      The constructor of the MyEngine class creates a GLFW window (GLAD is also initialized here to load all OpenGL functionality), and the myEngine.run() method in Main.cpp is responsible for firing up the engine. In this run() method, the shaders are loaded from files with the help of my MyFileHandler class. The vertices for the triangle are processed by the myShapes.drawTriangle() method, where a vertex array object, a vertex buffer object and vertex attributes are set up for this purpose.
      The while loop in the run() method should output the desired red triangle, but all I get is a grey window area. Why?
      Note: The shaders are compiling and linking without any errors.
      (Note: I am aware that this code is not using any good software engineering practices (e.g. exceptions, error handling). I am planning to implement them later, once I get the hang of OpenGL.)

       
    • By KarimIO
      EDIT: I thought this was restricted to Attribute-Created GL contexts, but it isn't, so I rewrote the post.
      Hey guys, whenever I call SwapBuffers(hDC), I get a crash, and I get a "Too many posts were made to a semaphore." from Windows as I call SwapBuffers. What could be the cause of this?
      Update: No crash occurs if I don't draw, just clear and swap.
      static PIXELFORMATDESCRIPTOR pfd = // pfd Tells Windows How We Want Things To Be
      {
          sizeof(PIXELFORMATDESCRIPTOR), // Size Of This Pixel Format Descriptor
          1,                             // Version Number
          PFD_DRAW_TO_WINDOW |           // Format Must Support Window
          PFD_SUPPORT_OPENGL |           // Format Must Support OpenGL
          PFD_DOUBLEBUFFER,              // Must Support Double Buffering
          PFD_TYPE_RGBA,                 // Request An RGBA Format
          32,                            // Select Our Color Depth
          0, 0, 0, 0, 0, 0,              // Color Bits Ignored
          0,                             // No Alpha Buffer
          0,                             // Shift Bit Ignored
          0,                             // No Accumulation Buffer
          0, 0, 0, 0,                    // Accumulation Bits Ignored
          24,                            // 24Bit Z-Buffer (Depth Buffer)
          0,                             // No Stencil Buffer
          0,                             // No Auxiliary Buffer
          PFD_MAIN_PLANE,                // Main Drawing Layer
          0,                             // Reserved
          0, 0, 0                        // Layer Masks Ignored
      };

      if (!(hDC = GetDC(windowHandle)))
          return false;

      unsigned int PixelFormat;
      if (!(PixelFormat = ChoosePixelFormat(hDC, &pfd)))
          return false;

      if (!SetPixelFormat(hDC, PixelFormat, &pfd))
          return false;

      hRC = wglCreateContext(hDC);
      if (!hRC) {
          std::cout << "wglCreateContext Failed!\n";
          return false;
      }

      if (wglMakeCurrent(hDC, hRC) == NULL) {
          std::cout << "Make Context Current Second Failed!\n";
          return false;
      }

      ... // OGL Buffer Initialization

      glClear(GL_DEPTH_BUFFER_BIT | GL_COLOR_BUFFER_BIT);
      glBindVertexArray(vao);
      glUseProgram(myprogram);
      glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_SHORT, (void *)indexStart);
      SwapBuffers(GetDC(window_handle));
    • By Tchom
      Hey devs!
       
      I've been working on an OpenGL ES 2.0 Android engine and I have begun implementing some simple (point) lighting. I had something fairly simple working, so I tried to get fancy and added color-tinting light. And it works great... with only one or two lights. With any more than that, the application drops about 15 frames per light added (my ideal is at least 4 or 5 lights). I know implementing lighting is expensive, I just didn't think it was that expensive. I'm fairly new to the world of OpenGL and GLSL, so there is a good chance I've written some crappy shader code. If anyone has any feedback or tips on how I can optimize this code, please let me know.
       
      Vertex Shader
      uniform mat4 u_MVPMatrix;
      uniform mat4 u_MVMatrix;

      attribute vec4 a_Position;
      attribute vec3 a_Normal;
      attribute vec2 a_TexCoordinate;

      varying vec3 v_Position;
      varying vec3 v_Normal;
      varying vec2 v_TexCoordinate;

      void main() {
          v_Position = vec3(u_MVMatrix * a_Position);
          v_TexCoordinate = a_TexCoordinate;
          v_Normal = vec3(u_MVMatrix * vec4(a_Normal, 0.0));
          gl_Position = u_MVPMatrix * a_Position;
      }

      Fragment Shader
      precision mediump float;

      uniform vec4 u_LightPos["+numLights+"];
      uniform vec4 u_LightColours["+numLights+"];
      uniform float u_LightPower["+numLights+"];
      uniform sampler2D u_Texture;

      varying vec3 v_Position;
      varying vec3 v_Normal;
      varying vec2 v_TexCoordinate;

      void main() {
          gl_FragColor = (texture2D(u_Texture, v_TexCoordinate));
          float diffuse = 0.0;
          vec4 colourSum = vec4(1.0);

          for (int i = 0; i < "+numLights+"; i++) {
              vec3 toPointLight = vec3(u_LightPos[i]);
              float distance = length(toPointLight - v_Position);
              vec3 lightVector = normalize(toPointLight - v_Position);

              float diffuseDiff = 0.0; // The diffuse difference contributed from current light
              diffuseDiff = max(dot(v_Normal, lightVector), 0.0);
              diffuseDiff = diffuseDiff * (1.0 / (1.0 + ((1.0-u_LightPower[i])* distance * distance))); //Determine attenuation
              diffuse += diffuseDiff;
              gl_FragColor.rgb *= vec3(1.0) / ((vec3(1.0) + ((vec3(1.0) - vec3(u_LightColours[i]))*diffuseDiff))); //The expensive part
          }

          diffuse += 0.1; //Add ambient light
          gl_FragColor.rgb *= diffuse;
      }

      Am I making any rookie mistakes? Or am I just being unrealistic about what I can do? Thanks in advance
    • By yahiko00
      Hi,
      I'm not sure this is the right place to post; if not, please forgive me...
      For a game project I am working on, I would like to implement a 2D starfield as a background.
      I do not want to deal with static tiles, since I plan to slowly animate the starfield. So, I am trying to figure out how to generate a random starfield for the entire map.
      I feel that using a uniform distribution for the stars will not do the trick. Instead I would like something similar to the screenshot below, taken from the game Star Wars: Empire At War (all credits to Lucasfilm, Disney, and so on...).

      Does anyone have an idea of a distribution that would produce such a starfield?
      Any insight would be appreciated
    • By afraidofdark
      I have just noticed that, in Quake 3 and Half-Life, dynamic models are affected by the light map. For example, in dark areas, the gun that the player holds looks darker. How did they achieve this effect? I could use image-based lighting techniques (like placing an environment probe and using it for reflections and ambient lighting), but this technique wasn't used in games back then, so there must be a simpler method.
      Here is a link that shows how modern engines do it: Indirect Lighting Cache. It would be nice if you know of a paper that explains this technique. Can I apply this to Quake 3's light map generator and BSP format?