
mhagain


#5298419 Blending order

Posted by mhagain on Today, 11:53 AM

The only case where you don't have to render translucent objects in back-to-front order is where they don't overlap.  The whole point of back-to-front order is so that rendering of overlapping translucent objects will work right, both in terms of the blend equation and of depth testing.

 

Checking that objects don't overlap might be more expensive than just doing the sort, however.

 

In some cases you're just not going to bother sorting every single translucent object.  If, for example, you have multiple particle emitters in a scene, each emitting hundreds of particles, you're probably going to want to sort by emitter rather than by particle, and the result can look OK.
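
For the simple case, here's a minimal sketch of the back-to-front sort itself, assuming a hypothetical Renderable struct with a world-space position and a known eye position:

#include <algorithm>
#include <vector>

struct Renderable
{
    float position[3];   // world-space centre of the object
    // ... whatever else is needed to draw it
};

// Sort translucent objects back-to-front by squared distance from the camera.
// Squared distance is enough for ordering, so the sqrt is skipped.
void SortBackToFront (std::vector<Renderable> &objects, const float eye[3])
{
    std::sort (objects.begin (), objects.end (),
        [&] (const Renderable &a, const Renderable &b)
        {
            float da = 0.0f, db = 0.0f;

            for (int i = 0; i < 3; ++i)
            {
                float ta = a.position[i] - eye[i];
                float tb = b.position[i] - eye[i];
                da += ta * ta;
                db += tb * tb;
            }

            return da > db;   // farthest first
        });
}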




#5298390 [MSVC] Why does SDL initialize member variables?

Posted by mhagain on Today, 09:20 AM

I think you're missing the point that the "S" in "SDL" stands for "Security".

 

It's nothing to do with assisting debugging, and nothing to do with spotting non-security issues in your code.

 

So in the case of a pointer it's either initialized or it's not, and if it's not initialized then it's going to be pointing at some memory address that's effectively random.  Hence security: a 3rd party could use your uninitialized pointer to gain access to some other arbitrary data in your process address space.
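
A minimal sketch of the kind of hole being described (the names are made up, and the exact value the compiler chooses for the initialization isn't important to the argument):

#include <cstdio>

struct Widget
{
    char *name = nullptr;   // explicitly initialized: the safe version
    // char *name;          // uninitialized: points at an effectively random address
};

int main ()
{
    Widget w;

    // With the initializer this check is well-defined; without it, dereferencing
    // name would read whatever other data happens to live at that random address.
    if (w.name)
        printf ("%s\n", w.name);

    return 0;
}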




#5298097 Is using one the switch statement better then using multiple if statements?

Posted by mhagain on 26 June 2016 - 05:59 AM

Unless it's in a performance-critical part of the code, just use whichever makes the code clearer and move on.  Otherwise it's actually really difficult to give a generalized answer to such a broadly-scoped question these days.

 

On modern hardware, with sophisticated branch prediction, the question of which approach is faster is probably not so clear-cut, and overly simplistic approaches (such as counting the number of asm instructions) could give misleading results.  If it's absolutely performance-critical you really should be coding up both, and doing proper profiling under a representative real-world workload.

 

If you're debating between using if or switch to avoid doing a calculation (or other work), you should also be considering whether the cost of the calculation is going to be less or more than the cost of the branch.

 

Different hardware is different.  What's faster on PC hardware may not be faster on mobile hardware, and vice-versa.

 

Are you going to blow cache?  Is it going to scale across multiple threads?  How well will it pipeline?  These can all be more important than micro-optimizing at the individual assembly instruction level.

 

Bottom line is that it's not the 1970s, 1980s, 1990s or even 2000s any more, and you absolutely do need to consider the performance characteristics of your target hardware as well as those of your code.
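
To make the comparison concrete, here's a trivial sketch of the two equivalent forms (the handler functions are hypothetical stand-ins); the honest answer to which is faster is still "profile both on your target hardware":

int HandleA () { return 0; }   // stand-ins for whatever the real cases do
int HandleB () { return 1; }
int HandleC () { return 2; }
int HandleDefault () { return -1; }

int DispatchIf (int type)
{
    if (type == 0) return HandleA ();
    else if (type == 1) return HandleB ();
    else if (type == 2) return HandleC ();
    else return HandleDefault ();
}

int DispatchSwitch (int type)
{
    switch (type)
    {
    case 0:  return HandleA ();
    case 1:  return HandleB ();
    case 2:  return HandleC ();
    default: return HandleDefault ();
    }
}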




#5298096 The latest BMP I made crashes the program, other BMPs work fine

Posted by mhagain on 26 June 2016 - 05:37 AM

First of all, learn to use your debugger.  It's often trivial to identify causes of crashes by running a debug build under a good debugger; it will halt on the line of code that crashed and you can then inspect the contents of variables and follow the execution of code to help determine what went wrong.

 

Secondly, learn the file format you're using.  Did you know that the BMP format requires each row to be padded to a multiple of 4 bytes?  If you didn't know that, there's a good probability that this is the cause of your crash.
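
As an illustration, this is the usual stride calculation (not code from your loader):

// Each BMP row is padded up to a multiple of 4 bytes.
unsigned int BMPRowSize (unsigned int width, unsigned int bitsPerPixel)
{
    return ((width * bitsPerPixel + 31) / 32) * 4;
}

// e.g. a 24-bit image 10 pixels wide: 10 * 3 = 30 bytes of pixel data,
// but BMPRowSize (10, 24) == 32.  Ignore the 2 padding bytes per row and
// your reads/offsets go out of sync with the file, and can run past the
// end of your buffer.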

 

Thirdly, learn how to ask for help properly.  Don't just say "it crashed"; give as much information as possible to help others help you.  Even worse is saying "it gave an error message" without saying what the error message was.

 

Between these you should be able to determine and fix this problem.




#5296949 Logarithmic depth buffer/Infinite far clip plane?

Posted by mhagain on 17 June 2016 - 07:01 AM

Infinite far clip is not actually intended for this use case.  What it's actually for is geometry that needs to be projected onto the far clipping plane, or extruded to infinity, with a classic example being stencil shadow volumes.  Despite sounding attractive, it actually suffers from the same precision issues as a standard perspective projection.  So while the far plane may be at infinity, that doesn't mean that you have a usable infinite range between the near and far planes; you still have the majority of your precision concentrated at the near plane and you still have the same number of bits as if you'd used a classic perspective projection.
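
To make that concrete, here's a minimal sketch of the limit you take to get the infinite far plane, assuming OpenGL conventions (column vectors, clip-space z in [-1, 1]); only two matrix entries change, and neither changes how depth precision is distributed:

#include <cmath>
#include <cstring>

// Perspective projection with the far plane at infinity: the limit of the
// standard OpenGL projection as zFar -> infinity.  Row-major storage.
void PerspectiveInfinite (float out[4][4], float fovy, float aspect, float zNear)
{
    float f = 1.0f / std::tan (fovy * 0.5f);

    float m[4][4] = {
        { f / aspect, 0.0f,  0.0f,  0.0f },
        { 0.0f,       f,     0.0f,  0.0f },
        // the finite version has (zFar + zNear) / (zNear - zFar) and
        // 2 * zFar * zNear / (zNear - zFar) here; as zFar -> infinity they become:
        { 0.0f,       0.0f, -1.0f, -2.0f * zNear },
        { 0.0f,       0.0f, -1.0f,  0.0f }
    };

    // Depth is still hyperbolic in view-space z, so precision is still
    // concentrated at the near plane - exactly as with the finite projection.
    std::memcpy (out, m, sizeof (m));
}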




#5296693 Batching draws - keeping buffers up to date

Posted by mhagain on 15 June 2016 - 11:33 AM

But if you have static vertex data that can be identified or marked as static, then you can put those into a separate buffer that isn't updated on a per-frame basis, only when needed. You then have a separate buffer that is refilled on each frame, and you draw from both of them.

 

For the OP, the trick is to realize that certain classes of data, which may not at first glance seem to be static, actually are.

 

Interpolated keyframe animation is a nice example here.  You take two frames and blend their vertices to produce a blended frame, which is what gets drawn.  A naive/brute-force approach might decide that this is dynamic data: the blending needs to be done on the CPU and then uploaded for drawing.  But it can actually be done entirely using static data - two glVertexAttribPointer calls (one for each frame) and a glUniform1f for the blend factor - then the work can be shifted to the GPU and the data can be made static.
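
A rough sketch of what that setup might look like (the attribute and uniform names are made up; the vertex shader would just do something like mix (position0, position1, blend)):

// Assumes a GL context, an extension loader (e.g. GLEW), and that
// glUseProgram (program) has already been called.
#include <GL/glew.h>

void SetupKeyframePair (GLuint vbo, GLuint program,
                        GLsizeiptr frame0Offset, GLsizeiptr frame1Offset, float blend)
{
    glBindBuffer (GL_ARRAY_BUFFER, vbo);

    GLint loc0 = glGetAttribLocation (program, "position0");
    GLint loc1 = glGetAttribLocation (program, "position1");

    // Same static buffer, two different offsets - one per keyframe.
    glEnableVertexAttribArray (loc0);
    glVertexAttribPointer (loc0, 3, GL_FLOAT, GL_FALSE, 0, (const void *) frame0Offset);

    glEnableVertexAttribArray (loc1);
    glVertexAttribPointer (loc1, 3, GL_FLOAT, GL_FALSE, 0, (const void *) frame1Offset);

    // The vertex shader does the blend between the two keyframes.
    glUniform1f (glGetUniformLocation (program, "blend"), blend);
}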

 

This is an especially big win in the OP's scenario because not only would it avoid the data upload, it would also shift a not-insignificant chunk of work from slow interpreted JavaScript to the much faster GPU.

 

So I'd encourage the OP to examine their data, examine the usage patterns, determine workloads that can actually be shifted to the GPU, and take advantage of putting heavier work on the faster processor wherever possible.




#5296618 Batching draws - keeping buffers up to date

Posted by mhagain on 15 June 2016 - 05:31 AM

For this kind of use case, buffer object streaming is the typical technique; it has been tried and trusted for over 15 years in Direct3D land (OpenGL didn't gain the ability to do a similar streaming model until much more recently).

 

The thinking is that, because whether or not a range of the buffer is still required by a pending draw call is such an important factor for performance, it's often faster to just replace everything than to replace small subranges.  This can seem counter-intuitive (writing more data is faster?) but this approach allows the CPU and GPU to continue operating asynchronously, which is a net win.

 

A brief outline of the technique:

 

Beginning at location 0 in the buffer, you append incoming data to the buffer, building up a batch as you go.  If state needs to change, you flush the current batch (i.e. issue a draw call) then keep appending.

 

Eventually you run out of space in the buffer: the next set of incoming data won't fit.  At that stage you invalidate/orphan/discard the buffer (you'll see all 3 terms used depending on which sources you read, but it's just terminology and the end result is the same) - using glBufferData (..., NULL, ...) or GL_MAP_INVALIDATE_BUFFER_BIT - and what the driver will do is keep the existing buffer storage for pending draw calls, but allocate a new block of storage for future writing.  Reset to location 0 and begin again.  Again, this can seem counter-intuitive - allocating new storage at runtime surely must be slower - but remember that this has been a common pattern for over 15 years and drivers are optimized around it.  So what will actually happen is that after a few frames things settle down: instead of making new allocations, the driver keeps its own internal pool of 3 or so buffers that it automatically cycles through.  New allocations stop, and the driver does its own multi-buffering for you.
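
A rough sketch of that pattern using glMapBufferRange (names and the buffer size are made up; error handling omitted):

#include <GL/glew.h>
#include <cstring>

const GLsizeiptr BUFFER_SIZE = 4 * 1024 * 1024;   // whatever suits your workload
GLintptr bufferOffset = 0;                        // current append position

// Append data to the streaming buffer, orphaning it when it fills up.
// Returns the offset the data was written at, for use in the draw call.
GLintptr AppendToStream (GLuint vbo, const void *data, GLsizeiptr size)
{
    glBindBuffer (GL_ARRAY_BUFFER, vbo);

    if (bufferOffset + size > BUFFER_SIZE)
    {
        // Out of space: orphan the buffer.  The driver keeps the old storage
        // alive for pending draws and hands us a fresh block to write into.
        glBufferData (GL_ARRAY_BUFFER, BUFFER_SIZE, NULL, GL_STREAM_DRAW);
        bufferOffset = 0;
    }

    // Unsynchronized write into a region we know no pending draw is using.
    void *dst = glMapBufferRange (GL_ARRAY_BUFFER, bufferOffset, size,
                                  GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT);
    memcpy (dst, data, size);
    glUnmapBuffer (GL_ARRAY_BUFFER);

    GLintptr writtenAt = bufferOffset;
    bufferOffset += size;
    return writtenAt;
}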

 

A further writeup (by one of the authors of the GL_ARB_map_buffer_range extension) is available here: https://www.opengl.org/discussion_boards/showthread.php/170118-VBOs-strangely-slow?p=1197780#post1197780




#5296508 How to avoid Singletons/global variables

Posted by mhagain on 14 June 2016 - 11:15 AM

The question is: is it global state?  This is a design decision you need to make on a case-by-case basis, and it requires having an understanding of what you are actually interfacing with in order to make the decision.

 

Take textures in OpenGL since that's the example you used.

 

Textures aren't global state; a texture manager isn't global state and the current texture binding isn't global state.  Texture objects are per-context state: they may or may not be shared by multiple contexts depending on whether the appropriate platform-specific call (e.g. wglShareLists) was made, they only last as long as the context lasts, and there may be more than one context.

 

As you can see, you're already in a position where you thought you'd only need one texture manager but may actually need more than one.  I find that's a good general principle to keep in mind every time you think "I know, I'll use a singleton": if you have one of something, there's a good chance that having more than one is, if not required, at least possible.  So begin by designing around having more than one and you save yourself a bunch of future pain.

 

So it's no longer TextureManager->GenTexture; it's Context->TextureManager->GenTexture and because you may have more than one context, Context shouldn't be a singleton either.
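
In code terms that might look something like this (all names are made up for illustration; real code would be calling glGenTextures and friends):

#include <map>
#include <string>

class TextureManager
{
public:
    // Look up (or create) a texture by name.
    unsigned int GenTexture (const std::string &name)
    {
        unsigned int &tex = textures[name];
        if (tex == 0)
            tex = nextId++;
        return tex;
    }

private:
    std::map<std::string, unsigned int> textures;
    unsigned int nextId = 1;
};

class Context
{
public:
    TextureManager *GetTextureManager () { return &textureManager; }

private:
    TextureManager textureManager;   // one per context, not one per program
};

// Usage: nothing is a singleton; whoever owns the contexts passes them around.
//   Context mainContext, toolContext;
//   unsigned int tex = mainContext.GetTextureManager ()->GenTexture ("wall.png");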

 

On the other hand, if it is legitimately global state then by all means make it a global or singleton.  This is actually a much cleaner design than having multiple copies of it all of which need to be kept in sync.  Form should follow function.




#5296073 how to solve z-fighting

Posted by mhagain on 11 June 2016 - 08:07 AM

Yeah, co-planar geometry will z-fight... unless it shares exactly the same vertex data and runs the same vertex shader.


My experience is that you also need the very same input assembler setup, as well as the same matrices. For example:
 

out.Position = mul (m1, mul (m2, in.Position));
out.Position = mul (m3, in.Position);


Assume that m3 = m1 * m2. This will probably z-fight.
 

device->DrawPrimitive (D3DPT_TRIANGLEFAN, ....);
device->DrawIndexedPrimitive (D3DPT_TRIANGLELIST, ....);

 
Assume that otherwise everything is identical and these are drawing the same geometry.  This will also z-fight on certain hardware (*cough* Intel *cough*).

 

So, to slightly contradict my initial statement: it's also important to ensure that you're not in a situation that actually causes z-fighting when otherwise there would be none.




#5296059 how to solve z-fighting

Posted by mhagain on 11 June 2016 - 05:14 AM

The best way to fix z-fighting is to move your geometry so that it doesn't z-fight.

 

Depending on which API you're using, there may be features available that can help to hide the effects of z-fighting.  In OpenGL it would be polygon offset, in Direct3D it would be depth bias.  But these are just a collection of hacks and tricks and may have unwanted side-effects.  OpenGL's polygon offset is particularly bad because the specification allows it to be implementation-dependent.  They also may not interact too well with the usual non-linear depth buffer.
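
For example, hiding a co-planar decal with polygon offset might look like this (the draw calls are placeholders, the offset values are the usual starting guesses, and - as noted - the actual effect is implementation-dependent):

#include <GL/glew.h>

void DrawDecalOverWall ()
{
    // DrawWall ();   // the base geometry, drawn normally

    glEnable (GL_POLYGON_OFFSET_FILL);
    glPolygonOffset (-1.0f, -1.0f);   // nudge the decal slightly towards the viewer

    // DrawDecal ();  // the co-planar geometry that would otherwise z-fight

    glDisable (GL_POLYGON_OFFSET_FILL);
}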

 

So the only real "fix" is to actually get it right to begin with.




#5295955 Shader performances

Posted by mhagain on 10 June 2016 - 07:33 AM

I remember an OpenGL GDC lecture about zero driver overhead; swapping shaders was very expensive.
So I'm still trying to figure out why the majority picks option 2.

The problem is: "very expensive compared to what?"

 

If you're swapping shaders to avoid a couple of uniform changes and maybe a texture change, then for sure it's going to be more expensive.  If on the other hand the amount of state change exceeds the cost of the shader swap, then it goes the other way.  That's why I remarked that "everybody's workload is different" in my earlier answer, and why a general guideline can't really be given for this kind of question.




#5295926 About different OpenGL Versions

Posted by mhagain on 10 June 2016 - 02:26 AM

Your 'don't use OpenGL' logic is because you have to 'pick a version' and then hope the user has updated drivers?
Honestly... if you can't pick a version number then you probably shouldn't be doing this job to start with... and Vulkan also requires updated and recent drivers to function... so what the hell?


It's worth mentioning that this tactic even applies to OpenGL 1.1 - first of all, by selecting 1.1 one has already picked a version; secondly, one then has to hope that the user has OpenGL drivers installed at all.  The days of OEM drivers that didn't even include OpenGL are not that far behind us.
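
As a sketch of the "pick a version" part: after creating a context you can check what you actually got and bail out gracefully (glGetString (GL_VERSION) is available on any GL context, right back to 1.0; the required-version numbers here are just examples):

#include <GL/glew.h>
#include <cstdio>
#include <cstdlib>

void CheckGLVersion (int requiredMajor, int requiredMinor)
{
    int major = 0, minor = 0;

    const char *version = (const char *) glGetString (GL_VERSION);
    sscanf (version, "%d.%d", &major, &minor);

    if (major < requiredMajor ||
        (major == requiredMajor && minor < requiredMinor))
    {
        fprintf (stderr, "OpenGL %d.%d required, got %s\n",
                 requiredMajor, requiredMinor, version);
        exit (1);
    }
}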




#5295925 Shader performances

Posted by mhagain on 10 June 2016 - 02:20 AM

Depends.

 

Everybody's workload is different, and what kind of hardware you're targeting is a huge factor.  Also don't forget that in many cases it can be beneficial to lose a certain amount of performance in exchange for cleaner, more maintainable code - when you come to add a feature in 6 months' time you'll be happy you did.

 

If it was me I'd build option 1 first.  The case you cite - sampling a 1x1 texture - should be cheap because the result will be cached.  Option 1 would also allow for better draw call batching (which will give a performance boost on its own).  You can then profile your program to determine if there are any performance-critical paths that it might make more sense to split out into their own shaders.
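
For what it's worth, the 1x1 "default" texture is cheap to set up - something along these lines (a sketch; untextured draws then just bind it and get a no-op multiply by white):

#include <GL/glew.h>

// Create a 1x1 white texture to bind when a draw doesn't really need a texture,
// so the same shader (which always samples) can be used for everything.
GLuint CreateWhiteTexture ()
{
    const unsigned char white[4] = { 255, 255, 255, 255 };

    GLuint tex = 0;
    glGenTextures (1, &tex);
    glBindTexture (GL_TEXTURE_2D, tex);
    glTexImage2D (GL_TEXTURE_2D, 0, GL_RGBA, 1, 1, 0, GL_RGBA, GL_UNSIGNED_BYTE, white);
    glTexParameteri (GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri (GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);

    return tex;
}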




#5295806 About different OpenGL Versions

Posted by mhagain on 09 June 2016 - 09:44 AM

But OpenGL (and D3D11) still have their place and allow you to get a lot done without having to worry about basically writing a graphics driver and all the bullshit that goes with it.

Exactly this.  There will always be a need for some kind of API that exists at a relatively high level of abstraction and allows people to just make API calls without having to worry about the finer details or the hardware-level ugliness that Vulkan and D3D12 expose.  Today that need is filled by OpenGL and D3D11.  In the future it might be a software wrapper around Vulkan or D3D12, and when that future happens we can cheerfully forget about today's high level APIs.  But that future hasn't happened yet.




#5295686 When does a vertex buffer live on the card vs system memory?

Posted by mhagain on 08 June 2016 - 03:16 PM

Thank you Jesse - if objects don't change their shape that often but are merely transformed via a matrix by the shader, would that be better served with system memory?  It seems like objects whose vertices change a lot would be better in GPU memory.

 

john

 

It's actually the other way around.  Jesse will explain better than I could, but the idea is that the vertices have to get to the GPU anyway in order to be transformed and pass through to the next step of the pipeline.  If the vertices don't change it's better to have them in GPU memory already.
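
In OpenGL terms the distinction is typically expressed through the usage hint when the buffer is created (a rough sketch; D3D has equivalent usage/pool flags):

#include <GL/glew.h>

GLuint CreateVertexBuffer (const void *data, GLsizeiptr size, bool changesOften)
{
    GLuint vbo = 0;
    glGenBuffers (1, &vbo);
    glBindBuffer (GL_ARRAY_BUFFER, vbo);

    // Static data: upload once, let the driver keep it in GPU memory and just
    // send a matrix each frame.  Dynamic data: hint that it will be rewritten.
    glBufferData (GL_ARRAY_BUFFER, size, data,
                  changesOften ? GL_DYNAMIC_DRAW : GL_STATIC_DRAW);

    return vbo;
}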





