L. Spiro

Member Since 29 Oct 2003

Posts I've Made

In Topic: BSP trees with modern OpenGL

Yesterday, 03:20 PM

A BSP is not the preferred method for rendering translucent objects.
Use a standard render-queue and sort full objects (sub-meshes) back-to-front.

“How would I be able to tell OpenGL in which order it should render the individual polygons, without having to buffer my vertex/index data over and over again?”

It doesn’t matter if the vertices are all in 1 single large vertex buffer. You need multiple index buffers, 1 for each sub-mesh (a single draw call of a translucent part of the overall model). I don’t know what you mean by “buffering” your data. Once the VBO and IBOs are created, buffering is done. You just use a render-queue to decide the order in which to draw everything.
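A minimal C++ sketch of that idea follows. It assumes a hypothetical SubMesh record holding an IBO handle, an index count, and a per-frame distance from the camera. None of these names come from the post. Each frame, only the order of the queue changes. The VBO and IBOs are never touched.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record for one translucent sub-mesh. All sub-meshes can share a
// single vertex buffer; each one owns its own index buffer (IBO).
struct SubMesh {
    std::uint32_t indexBuffer;  // IBO handle (illustrative)
    std::uint32_t indexCount;   // number of indices to draw
    float viewDepth;            // distance from the camera, recomputed each frame
};

// Order the translucent render-queue back-to-front (farthest first).
// Only the order of the draw calls changes; no vertex/index data is re-uploaded.
// stable_sort keeps submission order for sub-meshes at exactly the same depth.
void SortBackToFront(std::vector<SubMesh>& queue) {
    std::stable_sort(queue.begin(), queue.end(),
                     [](const SubMesh& a, const SubMesh& b) {
                         return a.viewDepth > b.viewDepth;
                     });
}
```

A real renderer would then walk the sorted queue. For each entry, it would bind that sub-mesh's IBO and issue one draw call, for example with glDrawElements.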

L. Spiro

In Topic: Optimization philosophy and what to do when performance doesn't cut it?

16 September 2014 - 04:36 PM

Two phrases I hate are, “Make games, not engines,” and, “Premature optimization is the root of all evil.”
While they may stem from sound advice, these over-simplified statements cause more harm than good, because more and more people misunderstand what they mean.
In several cases, people were writing a game and putting reusable code into its own section when they realized, “Wait a minute, am I writing an engine?  Oh no!!”.  They are literally asking, “Please help!  I keep making an engine while I make my game!!  How can I make the game without making an engine?”.
In the same way, thanks to reading an over-simplified mantra about optimization, people often go out of their way not to optimize until, in theory, the very end (in practice, never).
The fact is that many people wouldn’t be doing anything wrong at all if they had never read that.  Most of it is common sense.
#1: The “premature optimization” mantra holds truest when you aren’t sure whether an algorithm will even work.  Prototypes definitely do not need to be concerned with performance.  They often have to be rewritten entirely, but if you have a deadline, slow is better than nothing.  This is where you push the rewrite off until later, when you have more time.
#2: There is nothing wrong with taking a moment to simplify a mathematical equation, is there?  Especially if the simplification jumps out at you quickly.  Likewise, there are tons of simple things that should constantly be jumping out at you as you write code.
If the order of the loop doesn’t matter, for ( size_t i = vec.size(); i--; ) is never worse than for ( size_t i = 0; i < vec.size(); ++i ).
++i is never worse than i++.
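The reverse loop relies on a detail that is easier to see in code. With an unsigned index, the condition `i--` tests the old value and then decrements it. The body therefore sees size-1 down to 0, and the loop also exits correctly for an empty vector. The function names in this sketch are hypothetical. It only shows that both forms visit the same elements; it does not measure speed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Forward form: vec.size() is evaluated on every iteration.
int SumForward(const std::vector<int>& vec) {
    int sum = 0;
    for (std::size_t i = 0; i < vec.size(); ++i) { sum += vec[i]; }
    return sum;
}

// Reverse form: vec.size() is evaluated once. The condition sees the old
// value of i, so the body runs for size-1 down to 0, and the loop ends
// after the test where i was 0 (even if the vector is empty).
int SumReverse(const std::vector<int>& vec) {
    int sum = 0;
    for (std::size_t i = vec.size(); i--;) { sum += vec[i]; }
    return sum;
}
```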
If you are writing a matrix routine:
		CMatrix4x4 & MatrixRotationZLH( float _fA ) {
			_11 = ::cos( _fA );
			_12 = ::sin( _fA );
			_13 = 0.0f;
			_14 = 0.0f;
			_21 = -::sin( _fA );
			_22 = ::cos( _fA );
			_23 = 0.0f;
			_24 = 0.0f;
			// Rows 3 and 4 (_31 to _44) are not shown.
			return (*this);
		}
…it’s obvious that you shouldn’t be calling ::sin() and ::cos() on the same value multiple times.
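		CMatrix4x4 & MatrixRotationZLH( float _fA ) {
			// Compute each trigonometric value once and reuse it.
			float fS = ::sin( _fA );
			float fC = ::cos( _fA );
			_11 = fC;
			_12 = fS;
			_13 = 0.0f;
			_14 = 0.0f;
			_21 = -fS;
			_22 = fC;
			_23 = 0.0f;
			_24 = 0.0f;
			// Rows 3 and 4 (_31 to _44) are not shown.
			return (*this);
		}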
It’s not premature to save redundant calculations to temporaries. In this specific case, computing both values with one combined call would be better still, for example the non-standard ::sincos() where it is available.
There is no excuse for not handling obvious and simple cases on the first pass through the code you are writing.
#3: If you already understand exactly what you are supposed to be implementing, there is no reason not to spend a few minutes thinking ahead towards what obvious bottlenecks there might be and designing around them.
Just this morning I was implementing the first stages of a triangle cache optimizer, which begins by creating a vertex-triangle adjacency list.
Since each vertex can have any number of adjacent triangles, if I were an idiot, I would have just started coding right away and given each vertex a variable-length array (std::vector or similar) to hold its list of connections.
That’s not how we do things.  I took 5 minutes to think about how to avoid making so many allocations, because allocations are always something you want to avoid.  It’s obvious.
I structured my list to use a single pool that is allocated once, up front.
In that time I also realized that I didn’t need to copy each triangle’s 3 indices into the list. I could simply store a pointer to the first index of the triangle and know that the next 2 indices belong to the same triangle.  Now my pool is 3 times smaller on 32-bit machines, assuming 32-bit indices: one 4-byte pointer replaces three 4-byte indices.
A little thought saved me a lot of memory and performance issues.
Will I find little things I could improve later?  Probably, but I won’t have to rewrite the whole thing, because I didn’t use a naive implementation in the first place.
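The pool-based adjacency list described above might look something like the sketch below. This is not the author’s code. It assumes a counting pass plus a prefix sum to size each vertex’s range in the pool. The names VertexTriAdjacency and BuildAdjacency are hypothetical. The number of allocations is fixed no matter how many vertices there are. Each entry is a pointer to the first of a triangle’s 3 indices.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Vertex-to-triangle adjacency stored in one pool instead of one std::vector
// per vertex. Vertex v's triangles are pool[offsets[v]] .. pool[offsets[v + 1] - 1].
struct VertexTriAdjacency {
    std::vector<std::size_t> offsets;        // vertexCount + 1 entries
    std::vector<const std::uint32_t*> pool;  // indexCount entries, allocated once
};

// Assumes every index is < vertexCount and indexCount is a multiple of 3.
VertexTriAdjacency BuildAdjacency(const std::uint32_t* indices, std::size_t indexCount,
                                  std::size_t vertexCount) {
    assert(indexCount % 3 == 0);
    VertexTriAdjacency adj;
    adj.offsets.assign(vertexCount + 1, 0);
    // Pass 1: count how many triangle corners reference each vertex.
    for (std::size_t i = 0; i < indexCount; ++i) { ++adj.offsets[indices[i] + 1]; }
    // Prefix sum: offsets[v] becomes the start of vertex v's range in the pool.
    for (std::size_t v = 0; v < vertexCount; ++v) { adj.offsets[v + 1] += adj.offsets[v]; }
    // Pass 2: one allocation for the whole pool, then fill it.
    adj.pool.resize(indexCount);
    std::vector<std::size_t> cursor(adj.offsets.begin(), adj.offsets.end() - 1);
    for (std::size_t i = 0; i < indexCount; ++i) {
        const std::uint32_t* tri = indices + (i - i % 3);  // first index of this triangle
        adj.pool[cursor[indices[i]]++] = tri;
    }
    return adj;
}
```

A degenerate triangle that repeats a vertex appears twice in that vertex’s range. This sketch leaves any filtering of such triangles to the caller.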
#4: You don’t need to focus on non-obvious or time-consuming things on the first pass.  Making a pool was obvious.  Making my loops go from X to 0 was obvious.
If there’s anything else I can do, it’s not obvious, and I am not going to spend time looking for it until profiling later reveals where I should be looking.
This is the point of the mantra.
I get the feeling that people read that mantra and suddenly become idiots where they otherwise would not.
Just because you read it doesn’t mean you should suddenly repeat ::cos( A ) in five places in a routine, or skip taking the time to think about how to reduce the number of allocations, etc. Doing so is like the people who go out of their way to avoid writing an engine, even though an engine is the natural byproduct of writing a game.
It means, “Do everything you normally would, just don’t spend unreasonable time on non-obvious optimizations until a profiler has told you where to look.”
Do as much as you can as you go without specifically taking time off to look for deep optimizations.
Go back any time it is convenient and profile for bottlenecks.  Not at the end of the project, but during its development.
I just added our optimized Oren-Nayar shading model to my own engine this morning and then went on to other tasks.  I had a few extra minutes afterwards and tested whether an if statement in the shader could improve performance.  It turned out to be slower or exactly the same in my case, but if you have a hunch and 5 minutes to test it, there is nothing wrong with doing that sporadically throughout the project, either.

In Topic: best way to make game render most efficient

11 September 2014 - 09:59 PM

* Efficiently.


I will let others tell you about how varied hardware is and to always test and profile etc.



One reason teams may fail to optimize is that optimization is a process that starts early. Once certain systems are in place on a bad/slow foundation, there is nothing you can do.


If we are talking only about rendering, there are quite a few general tips that should improve performance in most, if not all, cases.


#1: Use index buffers and keep vertex buffers as small as possible.

#2: Order triangles for best cache usage.  Use either Tipsify or Forsyth’s algorithm.  Prefer cache-optimized triangle lists over triangle strips.

#3: Use 16-bit indices wherever possible.  When a mesh needs a 32-bit index buffer, you usually pay double the index bandwidth just to reach the few triangles that fall outside the 16-bit range.  It is usually better to make 2 draw calls with 16-bit indices and save on index-buffer bandwidth. The 2 draw calls share 1 vertex buffer, with an offset on each of the 2 index buffers.

#4: Reduce/eliminate redundant API calls (any call to Direct3D or OpenGL).  If blending is already disabled, don’t disable it again.  Don’t set the same viewport that is already set.  To do this, keep track of the render state locally and make a wrapper around all API calls.  If blending is being activated, check your local state and if it is already active, return without calling any API functions/methods.

#5: #4 is especially important for shaders, textures, vertex buffers, and index buffers.

#6: Because of #5, you should sort objects via a render-queue so that objects using the same shader are drawn together, then objects using the same textures, then the same vertex buffers, and optionally the same index buffers.  Combining a render-queue with #4 means you will set a shader once and draw a bunch of objects before you change shaders again.  This is one of the most important things you can do.

#7: Be smart with your shaders.  This could be a whole new subject on optimizations, and I only have 5 minutes, so I might edit some tips in here later.

#8: Reuse textures/render targets as much as possible.  Bloom requires 2 chains of textures, ping-ponging back and forth between them, with each level half the size of the previous one.  Instead of making 2 whole chains from full-size down to 1-by-1, just make 1 full-sized texture and 1 half-sized texture.  Then ping-pong back and forth between those, rendering into a smaller region of each on every pass.

#9: Avoid swapping render targets as much as possible.

#10: Avoid swapping textures as much as possible.  #4, #5, and #6 cover this, but also, use texture atlases when possible.  Put all the parts of a model into 1 texture.

#11: Use instancing as much as possible to reduce draw calls.

#12: Reduce over-draw.  If 2 objects have the same shader, textures, vertex buffers, etc., render them front-to-back (this means using depth as a tie-breaking sorting criterion in the render-queue).  Note that translucent objects must be rendered back-to-front.  I haven’t personally seen an improvement over this from doing a full depth pre-pass. You can get most of the benefits of a depth pre-pass without dedicating a whole pass to it.

#13: Keep blending and alpha testing disabled, and don’t use discard in shaders unless absolutely necessary.

#14: Implement multi-threaded rendering.  This is not the same as decoupling your rendering from your logic, although that decoupling is also necessary.  Multi-threaded rendering means creating your own custom display list and passing it off to a thread dedicated solely to rendering.
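Below is a minimal sketch of the local state shadowing from #4. The real API calls are replaced with a counter so the filtering can be checked. The class and method names are hypothetical. A real wrapper would call glEnable/glDisable, glUseProgram, or the Direct3D equivalents where the comments indicate.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wrapper that shadows render state locally and only forwards
// real changes to the graphics API (stubbed out here with a call counter).
class StateCache {
public:
    void SetBlendEnabled(bool enabled) {
        if (m_blendKnown && m_blendEnabled == enabled) { return; }  // redundant: skip
        m_blendEnabled = enabled;
        m_blendKnown = true;
        ++m_apiCalls;  // stand-in for glEnable(GL_BLEND) / glDisable(GL_BLEND)
    }
    void SetShader(std::uint32_t program) {
        if (m_programKnown && m_program == program) { return; }  // redundant: skip
        m_program = program;
        m_programKnown = true;
        ++m_apiCalls;  // stand-in for glUseProgram(program)
    }
    // Call after code outside the wrapper may have changed API state.
    void Invalidate() { m_blendKnown = m_programKnown = false; }
    int ApiCalls() const { return m_apiCalls; }

private:
    bool m_blendEnabled = false;
    bool m_blendKnown = false;
    std::uint32_t m_program = 0;
    bool m_programKnown = false;
    int m_apiCalls = 0;
};
```

Starting in the “unknown” state means the first call always reaches the API. This avoids trusting a default that the driver or another library may have changed.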

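For #6 and #12, one simple way to express the opaque render-queue order is a lexicographic comparison. The order is shader first, then texture, vertex buffer, and index buffer, with depth (front-to-back) used only to break ties. This sketch assumes a hypothetical RenderItem with integer handles. Engines often pack the same fields into a single integer sort key instead, but the ordering idea is the same. Translucent objects would still go in a separate back-to-front pass.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical opaque render-queue item; the IDs are whatever handles the engine uses.
struct RenderItem {
    std::uint32_t shader, texture, vertexBuffer, indexBuffer;
    float viewDepth;  // distance from the camera
};

// Group by shader, then texture, then vertex buffer, then index buffer.
// Depth (smallest first, i.e. front-to-back) only breaks ties between items
// whose states are otherwise identical.
void SortOpaqueQueue(std::vector<RenderItem>& queue) {
    std::sort(queue.begin(), queue.end(), [](const RenderItem& a, const RenderItem& b) {
        return std::tie(a.shader, a.texture, a.vertexBuffer, a.indexBuffer, a.viewDepth) <
               std::tie(b.shader, b.texture, b.vertexBuffer, b.indexBuffer, b.viewDepth);
    });
}
```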

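The hand-off in #14 can be sketched as passing one recorded command list per frame to a render thread. This is an illustration under several assumptions, not a full renderer. RenderCommand and CommandQueue are hypothetical names. The render thread would translate each command into real API calls. The game thread records a frame’s commands into its own vector and submits it. The render thread then acquires and replays it.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical command recorded by the game thread and replayed by the render thread.
struct RenderCommand {
    enum class Type { SetShader, Draw } type;
    std::uint32_t arg;  // shader handle or index count (illustrative)
};

// Hands off one finished command list at a time from the game thread to the render thread.
class CommandQueue {
public:
    // Game thread: publish a frame's list; blocks while the previous one is unconsumed.
    void Submit(std::vector<RenderCommand> list) {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [this] { return !m_hasFrame; });
        m_pending = std::move(list);
        m_hasFrame = true;
        m_cv.notify_all();
    }
    // Render thread: take the next list. Returns false once Close() has been
    // called and no list is pending.
    bool Acquire(std::vector<RenderCommand>& out) {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [this] { return m_hasFrame || m_closed; });
        if (!m_hasFrame) { return false; }
        out = std::move(m_pending);
        m_hasFrame = false;
        m_cv.notify_all();
        return true;
    }
    void Close() {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_closed = true;
        m_cv.notify_all();
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::vector<RenderCommand> m_pending;
    bool m_hasFrame = false;
    bool m_closed = false;
};
```

In this design, only the render thread would touch the graphics API. That keeps the API context on a single thread.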


In Topic: how to implement ghost shadow effect like these pictures?

11 September 2014 - 05:40 AM

You can also do this trail effect by using a render target with an alpha channel. It should give a smoother result than instancing the trailed meshes.

Just note that while this works in all 2D cases, the camera must remain stationary for this to work in all 3D cases. If the camera moves in a 3D scene, this method will cause the after-shadows to appear billboarded.
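The post doesn’t say how the trail render target is updated each frame. One common approach, assumed here, has three steps. First, fade the target’s existing alpha by a decay factor instead of clearing it. Second, draw the object into it at full alpha. Third, composite the target over the scene. On a GPU, the fade would be done as a full-screen pass. This CPU model of a row of texels shows how older after-images fade geometrically. UpdateTrail is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU model of a trail render target holding one alpha value per texel.
// Each frame, the previous contents are faded by `decay` (0 < decay < 1),
// then the object is drawn at full alpha into the texel it currently covers.
void UpdateTrail(std::vector<float>& alpha, std::size_t objectTexel, float decay) {
    assert(objectTexel < alpha.size());
    for (float& a : alpha) { a *= decay; }  // fade older after-images
    alpha[objectTexel] = 1.0f;              // draw the object this frame
}
```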


In Topic: Rendering large meshes in OpenGl

10 September 2014 - 09:18 PM

In addition to everything mentioned above, there is no reason to call glEnableClientState(GL_VERTEX_ARRAY) and glDisableClientState(GL_VERTEX_ARRAY) around every draw call.

Vertices are always required in a draw call.

Enable it once and leave it enabled.


It also wastes time enabling and disabling GL_NORMAL_ARRAY needlessly between draw calls.  That is, if you draw 2 objects in a row that both have GL_NORMAL_ARRAY enabled, don’t disable and re-enable it, just leave it enabled.

You should always reduce/eliminate redundant API calls.
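The same idea applies to the legacy client-array state: track which arrays are enabled, and only issue calls for arrays whose state actually changes. In this hypothetical sketch, the flags stand in for GL_VERTEX_ARRAY, GL_NORMAL_ARRAY, and GL_TEXTURE_COORD_ARRAY. Instead of making the calls, Apply returns how many glEnableClientState/glDisableClientState calls a real wrapper would make.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit flags standing in for legacy client-side array states.
enum ClientArray : std::uint32_t {
    kVertexArray = 1u << 0,    // GL_VERTEX_ARRAY
    kNormalArray = 1u << 1,    // GL_NORMAL_ARRAY
    kTexCoordArray = 1u << 2,  // GL_TEXTURE_COORD_ARRAY
};

// Tracks which arrays are enabled and counts only the changes needed before the next draw.
struct ClientStateTracker {
    std::uint32_t enabled = 0;

    int Apply(std::uint32_t wanted) {
        std::uint32_t toEnable = wanted & ~enabled;   // would call glEnableClientState
        std::uint32_t toDisable = enabled & ~wanted;  // would call glDisableClientState
        int calls = 0;
        for (std::uint32_t bit = 1; bit <= kTexCoordArray; bit <<= 1) {
            if ((toEnable | toDisable) & bit) { ++calls; }
        }
        enabled = wanted;
        return calls;
    }
};
```

Drawing 2 objects in a row that both want vertices and normals then costs no client-state calls for the second object.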


