Max_Payne

OpenGL Modern Fast Rendering?


This is a thread about tips and tricks for fast rendering in modern 3D engines. A friend and I are currently designing a renderer module, and I have been pondering how to do things efficiently for a while. Our game will be set in a large city environment, with both indoors and outdoors. We plan to have dynamic lighting everywhere, because lightmaps would consume too much memory: there will be several square kilometers of surface area in the map. We don't really care if the dynamic lighting doesn't look very realistic. What we mostly care about is having some form of lighting with plausible variation in the illumination, and not just fullbright graphics everywhere.

This is what I have come up with so far:

1. The renderer will store a large vertex buffer containing all the polygons in the scene with their normals and texture coordinates. This may well be large enough to require 32 bit vertex indices.
2. The scene will live in a spatial partition and be rendered according to which parts of the scene are in the frustum. All polygons in visible scene nodes will be rendered by passing their 32 bit vertex indices to the renderer.
3. Dynamic models will be rendered directly from in-RAM vertex arrays, for dynamism reasons (no, we aren't animating them in shaders, because it's not practical for us).
4. Materials will encompass all rendering properties of a surface, and will be bound the way you would bind a texture in OpenGL. There will be only one material per surface.
5. The renderer will store the "rendering actions", along with their associated state (bound material, transformation matrix), in an std::vector<> with a large reserve, rather than rendering things immediately.
6. The renderer will sort the rendering actions by material using std::sort, making sure to put all alpha (transparency) materials last (this could also be done at insertion, with a stable sort, if that isn't much slower). For actions using the same material, the index buffers will be combined into one, to minimize the required API calls.
7. The renderer will render with no materials, as a "Z-fill" pass.
8. The renderer will then render with materials and shaders enabled (shaders are associated with materials); the previous Z pass should eliminate overdraw and save a lot of shader performance.

The above is based on a few assumptions:
- Material state and shader state changes are the most expensive actions.
- The Z-fill pass will save shader performance, which is much needed.
- It is possible to sort 5000 to 15000 render actions (by material index number) per frame in a very short time using std::sort (hopefully less than 5 ms).
- It will be much faster to render a few hundred combined index arrays, for polys sharing the same material, than to make the API calls to render each index array individually.

Feedback and suggestions of various other speedup tricks are welcome ;)
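To make the render action idea concrete, here is a rough sketch of what I have in mind (names and types are illustrative only, not actual engine code):

// Rough sketch of the render action queue (illustrative names, not final code).
#include <algorithm>
#include <cstdint>
#include <vector>

struct RenderAction {
    uint32_t materialId;     // sort key
    bool     alpha;          // transparent surfaces must be drawn last
    const uint32_t* indices; // 32 bit indices into the big static vertex buffer
    size_t   indexCount;
};

// Opaque actions sort by material so index buffers sharing a material
// can be concatenated into one draw call; alpha actions are kept last.
inline bool byMaterial(const RenderAction& a, const RenderAction& b) {
    if (a.alpha != b.alpha) return !a.alpha; // opaque first
    return a.materialId < b.materialId;
}

void flush(std::vector<RenderAction>& queue) {
    std::sort(queue.begin(), queue.end(), byMaterial);
    // ... Z-fill pass, then bind each material once and draw the
    // merged index ranges for all actions that share it ...
    queue.clear(); // keeps the reserved capacity for the next frame
}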

What range of hardware (CPU/GPU) will you be targeting?

Your base level rendering seems reasonable for a start to get something running. To render a large city environment with reasonable detail, you will have to spend a lot of time on LOD and HSR techniques. Do you have plans for that stuff?

Quote:
Original post by ganchmaster
What range of hardware (CPU/GPU) will you be targeting?

Your base level rendering seems reasonable for a start to get something running. To render a large city environment with reasonable detail, you will have to spend a lot of time on LOD and HSR techniques. Do you have plans for that stuff?


Indeed, I'm just concentrating on the optimized rendering pipeline at the moment.

Our targeted minimum requirements are:
- 1GB RAM
- 1GB HD space
- Radeon 9600 128MB/GeForce FX 128MB/PS2.0 hardware
- 2.4 GHz P4 / AMD 2400+

I just did some testing. On a 1.5 GHz machine, according to the Unix timers (1 microsecond precision), it takes 0.3 ms to sort 35000 integers using std::sort and 0.4 ms using std::stable_sort, so that shouldn't be too much of a problem. It might take 2 ms once I add the cost of dereferencing pointers to objects and such. So the per-material sort definitely seems viable as a performance win. I will probably need to insert the alpha objects into a separate vector to keep them last in the rendering.
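For anyone who wants to reproduce the measurement, the harness boils down to something like this (a sketch assuming gettimeofday as the Unix timer; the actual test code differs):

// Sketch: timing std::sort on 35000 integers with gettimeofday
// (roughly 1 microsecond precision on most Unix systems).
#include <sys/time.h>
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <vector>

int main() {
    std::vector<int> keys(35000);
    for (size_t i = 0; i < keys.size(); ++i) keys[i] = std::rand();

    timeval t0, t1;
    gettimeofday(&t0, 0);
    std::sort(keys.begin(), keys.end());
    gettimeofday(&t1, 0);

    double ms = (t1.tv_sec - t0.tv_sec) * 1000.0 +
                (t1.tv_usec - t0.tv_usec) / 1000.0;
    std::printf("sorted %zu ints in %.3f ms\n", keys.size(), ms);
    return 0;
}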

For LOD, I'm planning on automatic mesh LOD for all static models. As for the city itself, I plan to discard building interiors, and possibly to not render the static meshes that decorate the buildings at all beyond a certain range. I also plan on using displacement-mapped textures for the details on building walls, to keep that polygon count down.

Quote:
Original post by superpig
Have you looked into Precomputed Radiance Transfer and Spherical Harmonics?


Nope, I haven't. I have heard the names before, but that's mostly it.

I have some tight restrictions for any precomputing though:
1. The amount of stored data as a result of precomputations must be *very* small
2. It must precompute very fast. Maps are to be procedurally generated and there can't be any unreasonable waiting time.
3. It must render fast.

Do you have links to any PRT tutorials that aren't overly complicated?

As for spherical harmonics: how do they work, how fast are they, and how expensive is the precomputation?

For info on spherical harmonics check out this paper. It's pretty simple to understand. If you're using DX you can also check out the PRT engine in D3DX.
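To give a feel for the runtime cost: diffuse lighting from three SH bands boils down to a 9-term dot product against the normal. A minimal sketch using the standard real SH basis (cosine-lobe convolution factors omitted for brevity):

// Sketch: irradiance from 9 spherical harmonic coefficients (3 bands),
// evaluated for a unit normal n. The coefficients come from the
// (offline) projection of the environment's lighting onto the basis.
struct Vec3 { float x, y, z; };

float shIrradiance(const float c[9], const Vec3& n) {
    return c[0] * 0.282095f                            // Y(0, 0)
         + c[1] * 0.488603f * n.y                      // Y(1,-1)
         + c[2] * 0.488603f * n.z                      // Y(1, 0)
         + c[3] * 0.488603f * n.x                      // Y(1, 1)
         + c[4] * 1.092548f * n.x * n.y                // Y(2,-2)
         + c[5] * 1.092548f * n.y * n.z                // Y(2,-1)
         + c[6] * 0.315392f * (3.0f * n.z * n.z - 1.0f)// Y(2, 0)
         + c[7] * 1.092548f * n.x * n.z                // Y(2, 1)
         + c[8] * 0.546274f * (n.x * n.x - n.y * n.y); // Y(2, 2)
}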

Quote:
Dynamic models will be rendered directly from in-RAM vertex arrays, for dynamism reasons (no, we aren't animating them in shaders, because it's not practical for us).


Perhaps I'm reading this wrong, but it seems you're going to have models in RAM that you'll be altering on a per-vertex basis on the CPU every frame (or at least every update, or something similar). That's a bad idea: you don't want to touch geometry on a per-vertex basis on the CPU. That's what vertex shaders are for.

Quote:
Original post by Monder
For info on spherical harmonics check out this paper. It's pretty simple to understand. If you're using DX you can also check out the PRT engine in D3DX.


I will check out spherical harmonics. But are they fast on modern hardware?

Quote:
Perhaps I'm reading this wrong, but it seems you're going to have models in RAM that you'll be altering on a per-vertex basis on the CPU every frame (or at least every update, or something similar). That's a bad idea: you don't want to touch geometry on a per-vertex basis on the CPU. That's what vertex shaders are for.


Actually, it's the best way of doing things for us. Animating models in hardware is not possible on all hardware, and if we want per-polygon hit tracing and collision detection, it has to be done on the CPU anyway. And it's not like sending batches of vertices in a vertex array is really a problem on modern hardware.
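For reference, submitting the CPU-animated vertices each frame is just a plain vertex array draw. A sketch at the OpenGL 1.1 level (the vertex layout is illustrative):

// Sketch: drawing a CPU-animated model straight from system RAM each
// frame with plain vertex arrays (no VBO, no shader animation).
#include <GL/gl.h>

struct Vertex { float pos[3]; float normal[3]; float uv[2]; };

void drawDynamicModel(const Vertex* verts, const unsigned int* indices,
                      int indexCount) {
    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_NORMAL_ARRAY);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);

    glVertexPointer(3, GL_FLOAT, sizeof(Vertex), verts->pos);
    glNormalPointer(GL_FLOAT, sizeof(Vertex), verts->normal);
    glTexCoordPointer(2, GL_FLOAT, sizeof(Vertex), verts->uv);

    glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, indices);

    glDisableClientState(GL_TEXTURE_COORD_ARRAY);
    glDisableClientState(GL_NORMAL_ARRAY);
    glDisableClientState(GL_VERTEX_ARRAY);
}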

Quote:
Original post by Max_Payne
Actually, it's the best way of doing things for us. Animating models in hardware is not possible on all hardware, and if we want per-polygon hit tracing and collision detection, it has to be done on the CPU anyway. And it's not like sending batches of vertices in a vertex array is really a problem on modern hardware.


Animating models in hardware is possible on all of your target platforms, though.

It's a good point that you must effectively animate the polygons on the CPU if you want to do collision detection per-polygon. However, very few games really need per-polygon collision detection on skinned objects. And even if yours is one of them, you will have to animate the normals as well, and, if you're doing normal mapping, the binormals and tangents too. So that can start to add up. Many games are CPU bound, and if you are doing a large city with indoors and outdoors, 5000-15000 render batches, per-polygon collision detection, and moving all of the vertex data for characters dynamically onto the card every frame, I bet yours will be one of them ;). So you might not want to write off vertex shader animation just yet. In particular, don't avoid it just because you haven't done it before, if that's the case.
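To make the "adds up" point concrete, CPU skinning means doing roughly the following for every vertex of every visible character, every frame. This sketch handles positions and normals only; tangents and binormals would repeat the normal's matrix work two more times:

// Sketch: per-vertex CPU skinning cost. Each vertex is transformed by
// up to four weighted bone matrices; normals (and, with normal mapping,
// tangents/binormals) need the same treatment.
struct Mat34 { float m[3][4]; };        // one bone transform
struct SkinVert {
    float pos[3], normal[3];
    int   bone[4];
    float weight[4];
};

void skin(const SkinVert* src, float* outPos, float* outNrm,
          int count, const Mat34* bones) {
    for (int v = 0; v < count; ++v) {
        float p[3] = {0, 0, 0}, n[3] = {0, 0, 0};
        for (int i = 0; i < 4; ++i) {
            const float w = src[v].weight[i];
            if (w == 0.0f) continue;    // skip unused bone influences
            const Mat34& b = bones[src[v].bone[i]];
            for (int r = 0; r < 3; ++r) {
                p[r] += w * (b.m[r][0] * src[v].pos[0] +
                             b.m[r][1] * src[v].pos[1] +
                             b.m[r][2] * src[v].pos[2] + b.m[r][3]);
                n[r] += w * (b.m[r][0] * src[v].normal[0] +
                             b.m[r][1] * src[v].normal[1] +
                             b.m[r][2] * src[v].normal[2]);
            }
        }
        for (int r = 0; r < 3; ++r) {
            outPos[v * 3 + r] = p[r];
            outNrm[v * 3 + r] = n[r];
        }
    }
}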

Quote:
Original post by Max_Payne
Quote:
Perhaps I'm reading this wrong, but it seems you're going to have models in RAM that you'll be altering on a per-vertex basis on the CPU every frame (or at least every update, or something similar). That's a bad idea: you don't want to touch geometry on a per-vertex basis on the CPU. That's what vertex shaders are for.


Actually, it's the best way of doing things for us. Animating models in hardware is not possible on all hardware, and if we want per-polygon hit tracing and collision detection, it has to be done on the CPU anyway. And it's not like sending batches of vertices in a vertex array is really a problem on modern hardware.

While I largely agree, it really depends on your type of game. Even if you want full per-poly shooting (or similar), you likely don't have to do that for every model, every frame. If you start with bounding box checks first, then you can lazily evaluate the current mesh positions on the CPU as and when you need them (and possibly even cheat with results from previous frames, if they're sufficiently similar).
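A sketch of that lazy evaluation (the Model members here are hypothetical):

// Sketch: lazily skin a model only when a coarse bounding-volume test
// says its exact triangles are actually needed this frame.
struct Model {
    bool skinnedThisFrame;            // reset at the start of each frame
    // ... bounds, source vertices, skinned vertex cache ...

    void skinOnCPU();                 // the expensive per-vertex pass
    bool rayHitsBounds();             // cheap AABB / sphere test
};

bool rayHitsModel(Model& m) {
    if (!m.rayHitsBounds())
        return false;                 // most models stop at the cheap test
    if (!m.skinnedThisFrame) {        // skin at most once per frame, on demand
        m.skinOnCPU();
        m.skinnedThisFrame = true;
    }
    return true; // then do the per-triangle test against the skinned cache
}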

I am designing a rendering architecture very similar to yours, and am wondering how to sort the renderables to get as few costly state/resource changes as possible:

Ranked 1-5, with 1 being the most costly (or least often changed) attribute:

1. shader
2. lighting
3. material
4. textures
5. vertex buffer/FVF

In particular, I am unsure where in this ordering to place the vertex buffer change (SetStreamSource).

Does this sort sequence make sense?
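For concreteness, here is how I imagine packing those priorities into a single sort key, with the most expensive state in the highest bits (a sketch; the field widths are arbitrary):

// Sketch: pack the state-change priorities into one 64-bit key so a
// single sort orders by shader first, then lighting, material,
// textures, and finally vertex buffer.
#include <cstdint>

uint64_t makeSortKey(uint32_t shader, uint32_t lighting, uint32_t material,
                     uint32_t texture, uint32_t vb) {
    return (uint64_t(shader   & 0xFFF)  << 52)  // most expensive: high bits
         | (uint64_t(lighting & 0xFFF)  << 40)
         | (uint64_t(material & 0xFFF)  << 28)
         | (uint64_t(texture  & 0x3FFF) << 14)
         | (uint64_t(vb       & 0x3FFF));       // cheapest: low bits
}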

Thanks, Constantin
