MiguelMartin

OpenGL Not sure how to implement this.


So currently I'm writing a rendering engine with OpenGL, and possibly some other APIs in the future (very far away ;)). Anyhow, I have run into a problem: for some reason I cannot figure out how I should render my scene multiple times into different viewports. The first thing I did was create one display list and just encapsulate every drawing call made before finishing it in that display list.

For example:

// Render everything
renderer->beginDrawing(); // Creates a display list.
renderer->renderRectangle(); // Draws a rectangle (stores it in the display list), with transformations
renderer->finishDrawing(); // Ends the display list and renders everything to the screen.


Now I tried it and it seemed to work at first, but then I tried adding more objects to my scene and the transformations of the objects were stuffing up. I.e., if I rotated one object, everything else would rotate too. I tried loading the identity matrix for every object in the scene, but for some reason it just wouldn't reset the matrix for each object. I wasn't sure if a display list saved calls to glLoadIdentity, glTranslatef, etc., so I scratched that plan.

Then I decided to create a display list for every object on the fly, then loop through all the objects and render them (calling each display list). Now it works and all, but I'd imagine that with a huge scene it would cause a lot of overhead. I was thinking of using VBOs instead of display lists; from what I hear they're pretty lightweight and not deprecated? I would still imagine some overhead if I was making VBOs on the fly, every frame. Should I do this, or try to "add" objects to some sort of list and then render that list (full of objects) all at once with one function call?

For example:

// Outside of the game-loop (initialization code)
renderer->addObjectToRender(rect); // Add a rectangle to render.


// Inside the game-loop
renderer->renderScene(); // Renders the ENTIRE scene all at ONCE.


Now I would think that this wouldn't cause that much over-head as it only allocates memory for a VBO/Display List once (for every object) during the entire program (or at least scene). The only problem I think I have is, I don't think it would be as "dynamic" of some sort, perhaps? I'm not sure, that's why I'm asking.

Any help would be appreciated, I have no idea how expensive it is to create a VBO/Display List on every frame, that's why I was thinking to add objects to render and then just render the entire scene.

Many thanks for reading this.
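
For the multi-viewport part of the question, one common approach is to leave the scene's objects alone and simply draw them once per viewport, changing glViewport (and the camera) in between. Below is a minimal sketch that only computes the viewport rectangles, assuming a simple grid layout. Viewport and makeViewportGrid are hypothetical names used for illustration, not part of the engine described above; the per-frame loop is shown as a comment because it needs a live OpenGL context.

```cpp
#include <cassert>
#include <vector>

// Pixel rectangle in the same convention glViewport uses:
// (x, y) is the lower-left corner, measured from the window's bottom-left.
struct Viewport {
    int x, y, width, height;
};

// Split a window into a cols x rows grid of equally sized viewports.
// Pixels left over from the integer division stay unused at the right/top edge.
std::vector<Viewport> makeViewportGrid(int windowWidth, int windowHeight,
                                       int cols, int rows) {
    assert(cols > 0 && rows > 0);
    const int w = windowWidth / cols;
    const int h = windowHeight / rows;
    std::vector<Viewport> result;
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            result.push_back({c * w, r * h, w, h});
    return result;
}

// Per frame, the renderer would then do something along these lines:
//   for (const Viewport& vp : viewports) {
//       glViewport(vp.x, vp.y, vp.width, vp.height);
//       // set up this view's camera, then draw the same objects again
//       renderer->renderScene();
//   }
```

The point of the sketch is that nothing about the objects themselves has to be rebuilt per viewport; only the viewport rectangle and the view transform change between passes.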

VBOs are not deprecated and are unlikely ever to be. They are the equivalent of new[] for allocating memory.
Why do you need to create a VBO/display list every frame?
Are there new objects being created constantly?

I wasn't sure if a display list saved calls to glLoadIdentity or glTranslatef and etc.
Yes they do. It should be documented in the spec.

Hello,


Now I tried it and it seemed to work at first, but then I tried adding more objects to my scene and the transformations of the objects were stuffing up. I.e., if I rotated one object, everything else would rotate too. I tried loading the identity matrix for every object in the scene, but for some reason it just wouldn't reset the matrix for each object. I wasn't sure if a display list saved calls to glLoadIdentity, glTranslatef, etc., so I scratched that plan.

Display lists do record matrix calls such as glLoadIdentity and glTranslatef. But instead of loading the identity for every object, you should probably push and pop matrices on the matrix stack, because glLoadIdentity also resets the view matrix, which is probably not what you want.
So in practice it goes something like this:
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
gluLookAt(…); // set up the view

glPushMatrix(); // save the view matrix
glScalef(…); glRotatef(…); glTranslatef(…); // first object's transform
// draw first object
glPopMatrix(); // now only the view matrix is on top of the stack again

glPushMatrix();
glScalef(…); glRotatef(…); glTranslatef(…); // second object's transform
// draw second object
glPopMatrix();
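
To see why push/pop keeps one object's transform from leaking into the next, here is a tiny CPU-side model of the matrix stack. It is only an illustration under simplifying assumptions (2D affine 3x3 matrices instead of OpenGL's 4x4 ones); Mat3 and MatrixStack are made-up names, and the comments say which fixed-function call each method stands in for.

```cpp
#include <array>
#include <cassert>
#include <vector>

// 2D affine transform stored as a row-major 3x3 matrix (illustrative only).
using Mat3 = std::array<float, 9>;

Mat3 identity() {
    return {1.0f, 0.0f, 0.0f,  0.0f, 1.0f, 0.0f,  0.0f, 0.0f, 1.0f};
}

Mat3 translation(float tx, float ty) {
    return {1.0f, 0.0f, tx,  0.0f, 1.0f, ty,  0.0f, 0.0f, 1.0f};
}

Mat3 multiply(const Mat3& a, const Mat3& b) {
    Mat3 r{};  // zero-initialized accumulator
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                r[i * 3 + j] += a[i * 3 + k] * b[k * 3 + j];
    return r;
}

// Minimal stand-in for the fixed-function modelview stack.
class MatrixStack {
public:
    MatrixStack() : stack_(1, identity()) {}

    // Like glPushMatrix: duplicate the current top.
    void push() { stack_.push_back(stack_.back()); }

    // Like glPopMatrix: discard the top, restoring what was saved.
    void pop() {
        assert(stack_.size() > 1);
        stack_.pop_back();
    }

    // Like glMultMatrixf: post-multiply the current top.
    void mult(const Mat3& m) { stack_.back() = multiply(stack_.back(), m); }

    const Mat3& top() const { return stack_.back(); }

private:
    std::vector<Mat3> stack_;
};
```

If the view transform is applied first and each object is wrapped in push()/pop(), top() is back to the plain view matrix after every object, which is exactly what resetting with the identity would have destroyed.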



Then I decided to create a display list for every object on the fly, then loop through all the objects and render them (calling each display list). Now it works and all, but I'd imagine that with a huge scene it would cause a lot of overhead. I was thinking of using VBOs instead of display lists; from what I hear they're pretty lightweight and not deprecated? I would still imagine some overhead if I was making VBOs on the fly, every frame. Should I do this, or try to "add" objects to some sort of list and then render that list (full of objects) all at once with one function call?

If you plan to render many objects (which is kind of the point when building an engine, right?), then you’d better not use display lists. They are incredibly slow compared to VBOs, since a display list essentially only compiles a command buffer. If you render 10,000 vertices, you have 10,000 glVertex() calls in the command buffer, and each single command requires a GPU cycle to process. VBOs, on the other hand, store only geometry data, and you simply tell the GPU (in the command buffer) where the data starts and how many vertices to draw. This makes the command buffer tremendously smaller and the overall processing much faster. If you want to do it right, you will need Vertex Array Objects (VAOs) to tell the GPU how to read the vertex data from the VBOs. If you need some code samples, I’d recommend the OpenGL samples pack.
In my experience it is the best resource for core-profile-compliant programming.

The matrix stack, however, can help you organize your matrices, but in the end you’d have to upload them to the GPU as uniforms (or, even better, in Uniform Buffer Objects (UBOs)) and bind them to your shaders. It’s best to update your UBO only when the matrices have changed.
Dropping the old fixed-function stuff (display lists etc.) takes a little while, but is totally worth it.
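
The "only update when the matrices changed" advice can be sketched with a simple dirty flag. This is an assumption-laden illustration, not a real UBO wrapper: CachedMatrixUniform is a hypothetical class, and the upload callback stands in for whatever actually writes to the GPU (in real code, probably something like glBufferSubData on a bound GL_UNIFORM_BUFFER).

```cpp
#include <array>
#include <cassert>
#include <functional>
#include <utility>

using Mat4 = std::array<float, 16>;

// Caches a matrix and invokes the upload callback only when the value changed.
// The callback is an injected, hypothetical hook; no OpenGL calls are made here.
class CachedMatrixUniform {
public:
    explicit CachedMatrixUniform(std::function<void(const Mat4&)> upload)
        : upload_(std::move(upload)) {
        assert(upload_ != nullptr);
    }

    // Record a new value; mark it dirty only if it differs from the cached one.
    void set(const Mat4& m) {
        if (m != value_) {
            value_ = m;
            dirty_ = true;
        }
    }

    // Call once per frame before drawing; uploads at most once per change.
    void flush() {
        if (!dirty_) return;
        upload_(value_);
        dirty_ = false;
    }

private:
    std::function<void(const Mat4&)> upload_;
    Mat4 value_{};
    bool dirty_ = true;  // force the very first upload
};
```

Setting the same matrix on every frame then costs one comparison on the CPU instead of one buffer update per frame.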


Now I would think that this wouldn't cause that much over-head as it only allocates memory for a VBO/Display List once (for every object) during the entire program (or at least scene). The only problem I think I have is, I don't think it would be as "dynamic" of some sort, perhaps? I'm not sure, that's why I'm asking.

As for static and dynamic geometry: usually you have one VBO per object. I think batching multiple objects into a single VBO causes more problems than it solves (especially if dynamic objects are involved). It would be more helpful to sort your objects, first to minimize state changes and second in front-to-back order to exploit early-z culling. VBOs have the nice property that you can update them (in contrast to display lists, which can’t be changed). So dynamic objects are possible with VBOs, too.
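
The two-level sort described above might look roughly like this. It is a sketch under assumptions: DrawItem, its fields, and the helper names are invented for illustration, and a single materialId stands in for whatever state (shader, textures, blend mode) is expensive to switch in a real renderer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// One entry in a hypothetical render queue.
struct DrawItem {
    int   materialId;  // stands in for shader/texture state that is costly to switch
    float viewDepth;   // distance from the camera; smaller means closer
    int   objectId;
};

// Ordering: by state first, then front-to-back within each state group.
bool drawsBefore(const DrawItem& a, const DrawItem& b) {
    return std::tie(a.materialId, a.viewDepth) <
           std::tie(b.materialId, b.viewDepth);
}

void sortRenderQueue(std::vector<DrawItem>& queue) {
    std::sort(queue.begin(), queue.end(), drawsBefore);
    assert(std::is_sorted(queue.begin(), queue.end(), drawsBefore));
}

// Counts how often the material changes while walking the queue in order.
int countStateChanges(const std::vector<DrawItem>& queue) {
    int changes = 0;
    for (std::size_t i = 1; i < queue.size(); ++i)
        if (queue[i].materialId != queue[i - 1].materialId) ++changes;
    return changes;
}
```

Sorting by state first is one possible trade-off; a renderer that is fill-rate bound rather than limited by state changes might prefer depth as the primary key instead.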


I have no idea how expensive it is to create a VBO/Display List on every frame, that's why I was thinking to add objects to render and then just render the entire scene.

In general you don’t want to create any resources during rendering (neither OpenGL objects nor new/malloc), since it slows you down. Create everything you need beforehand and only create geometry on the fly if there is no other way (usually there is).

Good luck with your engine!
Cheers


If you plan to render many objects (which is kind of the point when building an engine, right?), then you’d better not use display lists. They are incredibly slow compared to VBOs, since a display list essentially only compiles a command buffer. If you render 10,000 vertices, you have 10,000 glVertex() calls in the command buffer, and each single command requires a GPU cycle to process. VBOs, on the other hand, store only geometry data, and you simply tell the GPU (in the command buffer) where the data starts and how many vertices to draw. This makes the command buffer tremendously smaller and the overall processing much faster.

While this might be how it would appear if you only read the standard, it's not strictly-speaking true. In reality, the driver does a significant amount of optimisation when you compile a display list, including packing the data into internal VBOs where appropriate. The optimiser is generally pretty good, and in the case of NVidia's drivers, often beats hand-optimised VBOs on performance.

@OP: because display lists take a long time to compile, and because VBOs take time to write, you should create both of these *once*, during the initialisation/loading phase of your program. After that, each frame you just iterate through all the objects you wish to render, and draw them - there is no need to try and pack the entire frame into a display list/VBO.
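
The "create once, draw every frame" structure can be sketched as follows. This is a simplified model rather than working OpenGL code: SceneRenderer and Handle are hypothetical, addObjectToRender takes a vertex count instead of the rect from the original post, and the counters merely stand in for the real resource creation and draw calls named in the comments.

```cpp
#include <cassert>
#include <vector>

// Stand-in for a GPU resource handle (a display list or VBO id in real code).
using Handle = unsigned int;

class SceneRenderer {
public:
    // Initialization time: pay the creation cost once per object.
    Handle addObjectToRender(int vertexCount) {
        assert(vertexCount > 0);
        ++resourcesCreated_;  // real code: glGenBuffers + glBufferData, or glNewList
        objects_.push_back({nextHandle_, vertexCount});
        return nextHandle_++;
    }

    // Every frame: only iterate and issue draws, never allocate.
    void renderScene() {
        for (const Object& o : objects_) {
            (void)o;          // real code: bind o.handle and draw o.vertexCount vertices
            ++drawCalls_;
        }
    }

    int resourcesCreated() const { return resourcesCreated_; }
    int drawCalls() const { return drawCalls_; }

private:
    struct Object {
        Handle handle;
        int vertexCount;
    };
    std::vector<Object> objects_;
    Handle nextHandle_ = 1;
    int resourcesCreated_ = 0;
    int drawCalls_ = 0;
};
```

With two objects and a hundred frames this performs two creations and two hundred draws, so the expensive step does not scale with the frame count.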

Hello again,


While this might be how it would appear if you only read the standard, it's not strictly-speaking true. In reality, the driver does a significant amount of optimisation when you compile a display list, including packing the data into internal VBOs where appropriate.

Oh yeah, that’s true. However, I don’t like having to rely on an optimizer that optimizes deprecated code. (In the end you never know which GPUs your customers have.) Sooner or later, when the architecture has changed enough, the old optimizations won’t be tailored to the new architectures anymore. So we either end up with non-optimal code or we give the guys over in Santa Clara a big headache while they are (again) optimizing deprecated stuff.
So, I think it is best to drop the deprecated things right away.


The optimiser is generally pretty good, and in the case of NVidia's drivers, often beats hand-optimised VBOs on performance.

Well, that impresses me. In my travels I have never seen a display list being remarkably faster than a VBO (except perhaps in small tests with a few dozen vertices, but never at large scale). So I’d rather disagree with the “often”. But I’m open to learning new things. If you have a benchmark at hand, I’d be happy to take a look.

Cheers!


Well, that impresses me. In my travels I have never seen a display list being remarkably faster than a VBO (except perhaps in small tests with like some dozens of vertices but never on large scale). So I’d rather disagree with the “often”. But, I’m open to learn new things. If you have some benchmark at hand I’d be happy to take a look.

It's not my observation, it's the official party line from NVidia. The best reference I have to hand is this comment on display list performance, but if you go through their OpenGL presentations, they repeat it over and over...



Thank you for the reference to the slides! I looked into it. They claim there that display lists are the fastest method… (This is news to me.) I’ll only believe it if I see it in a benchmark I ran on my own machine.

So… I found a benchmark from Louis Bavoil (brilliant guy from Nvidia) and gave it a try. The data gathered on the website was old, but luckily there was source code.

I didn’t do many experiments, I just tried it on a GeForce 310M.
2 million triangles (DL: 16fps, VBO: 57fps)
4 million triangles (DL: 5.5fps, VBO: 30fps)

I figured it is not a fair comparison on such an old machine. So I checked it out on a Fermi (GTX 460).
2 million triangles (DL: 19.5fps, 23.8fps*; VBO: 61.4fps)
4 million triangles (DL: 10.6fps, 11.8fps*; VBO: 61.2fps)
8 million triangles (DL: 6.4fps*; VBO: 44.5fps)
12 million triangles (DL: 4.5fps*; VBO: 29.7fps)

The VBO experiments were held back by Vsync, so I kept increasing the triangle count until something happened. At 8 million the display list compiled for a few minutes and I eventually killed the process, because I didn’t want to wait any longer (not practical). The values marked with * indicate tests where I turned on batching (divided the data up into 4 display lists).
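
To put the GTX 460 numbers in perspective, it can help to convert them into throughput and speedup. The two helpers below are just arithmetic on the figures quoted above, with made-up function names. At 8 million triangles, 44.5 fps is 356 million triangles per second for the VBO path against 51.2 million for the batched display list at 6.4 fps, roughly a 7x difference. The 12 million row works out to about 356 million per second for VBOs as well, which hints (as an interpretation, not a measurement) that the VBO path was limited by the GPU rather than by Vsync at that point.

```cpp
#include <cassert>

// Millions of triangles drawn per second for a scene of `millionTriangles`
// rendered at `fps` frames per second.
double throughputMTrisPerSec(double millionTriangles, double fps) {
    assert(millionTriangles > 0.0 && fps > 0.0);
    return millionTriangles * fps;
}

// How many times faster configuration A ran than configuration B.
double speedup(double fpsA, double fpsB) {
    assert(fpsA > 0.0 && fpsB > 0.0);
    return fpsA / fpsB;
}
```

For example, throughputMTrisPerSec(8.0, 44.5) gives the VBO figure and speedup(44.5, 6.4) the ratio quoted above.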

Mhm. I'm a little sad to see that Nvidia spreads such rumors.
Well, who knows what kind of benchmark they constructed, right? Maybe there are special cases where display lists do beat VBOs.

@miguelishawt: Sorry for going so off-topic here.



Mhm. I'm a little sad to see that Nvidia spreads such rumors.



They aren't rumours. In many real-world circumstances Nvidia's display list performance is untouchable. My own personal app recently migrated to VAO/VBO from display lists/vertex arrays and I'm still not quite back at display list performance on my Nvidia machines. Rendering speeds on AMD/ATI are way up, however.

Blasting a single 12M triangle dlist doesn't seem like a real-world test, at least in the world of video games.
