
#5199839 Problem with tangents (fbx)

Posted by Hodgman on 24 December 2014 - 06:21 AM

If you're using a normal map that was generated by Maya, then you have to use Maya's tangent/binormal data (or tangent/handedness data) -- you can't generate your own tangents and still correctly decode that normal map (unless you use the exact same algorithm that Maya does internally).


To deal with the fact that handedness may differ (i.e. either binormal = cross(normal,tangent) or binormal = -cross(normal,tangent)), you either need to store Maya's binormal per vertex, or store an extra bit per vertex indicating which binormal to use.

Often I've seen this implemented by changing the tex-coord attribute from a float2 to a float3, and storing either +1 or -1 in the z value. You then generate the binormal with cross(normal,tangent) * texcoord.z.

When you're building your model file, for each vertex, you can compare Maya's binormal against those two possible values to determine if you should be storing +1 or -1.
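The comparison above can be sketched as follows — a minimal example, assuming hypothetical `Vec3` / `BinormalSign` names (not from the post): the sign is just whether the tool's exported binormal agrees with cross(normal, tangent).

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 cross(const Vec3& a, const Vec3& b)
{
	return { a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x };
}
float dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Returns +1 or -1: which of the two possible binormals matches the one
// exported by the art tool for this vertex. Store the result in texcoord.z.
float BinormalSign(const Vec3& normal, const Vec3& tangent, const Vec3& exportedBinormal)
{
	// If the exported binormal points the same way as cross(N,T), store +1;
	// otherwise the tangent basis is mirrored, so store -1.
	return dot(cross(normal, tangent), exportedBinormal) >= 0.0f ? 1.0f : -1.0f;
}
```

In the vertex shader you then reconstruct it as `binormal = cross(normal, tangent) * texcoord.z`.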

#5199828 Is this correct sRGB conversion?

Posted by Hodgman on 24 December 2014 - 04:02 AM

Yep, 0.25882 (sRGB) is one quarter as bright as 0.5 (sRGB), so you're doing everything correctly.
Have a look (squint) at these images and see approximately what the gamma response of your monitor is.
If it's sRGB calibrated, you should see the equal intensity band at about 2.2 gamma.
If you perceive the dashed and solid bars to be of equal intensity at some other value, then your monitor isn't correctly calibrated, so 'correct' sRGB outputs will look wrong to you.
To support such crummy monitors, give the user an option to replace the final "linear image -> sRGB" encoding step with a simple pow function, where the exponent is the value you got from squinting at that chart.
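For reference, a minimal sketch of both paths — the standard piecewise sRGB transfer functions, plus the plain-pow fallback described above (function names are mine, not from the post):

```cpp
#include <cmath>

// Standard sRGB transfer functions (the piecewise curve, not a plain pow).
float SrgbToLinear(float s)
{
	return s <= 0.04045f ? s / 12.92f
	                     : std::pow((s + 0.055f) / 1.055f, 2.4f);
}
float LinearToSrgb(float l)
{
	return l <= 0.0031308f ? l * 12.92f
	                       : 1.055f * std::pow(l, 1.0f / 2.4f) - 0.055f;
}

// Fallback for uncalibrated monitors: encode with a user-chosen gamma
// (the value found by squinting at the calibration chart).
float LinearToUserGamma(float l, float userGamma)
{
	return std::pow(l, 1.0f / userGamma);
}
```

Quartering 0.5 (sRGB) in linear space and re-encoding lands near 0.256 with the exact piecewise curve; small differences from 0.25882 come down to 8-bit rounding and which approximation of the curve is used.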

I tried recreating the scene in Unity and applied linear lighting calculations and I got [0.125, 0, 0, 0]

Sounds like Unity isn't implementing gamma-correct lighting then... Shame on them.



Personally, I use a ColorMunki to ensure all our development monitors are close to being calibrated correctly (along with the squint-test image above), which means that all of our input colour textures will contain sRGB data (as they were authored on an sRGB monitor).

Then, most end-users unfortunately will have perverse settings applied to their own monitors (completely destroying the point of colour standards...) so I default to sRGB output, but also give the user a slider if they want to increase/decrease a gamma value (which results in a simple pow-based final gamma encoding, instead of hardware sRGB encoding).



p.s. trying to tweak lighting in an 8-bit pipeline (whether gamma correct or not) is incredibly hard. IMHO everyone should be doing their lighting in 16bit linear these days, and then tone-mapping to 8-bit sRGB/gamma. Without a tonemapper, your ambient light is just 0.25 units(... of light..? 0.25 lights?), but with a tone-mapper you can declare what those units are, and how those units are exposed / converted to colours, as in photography.
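As a minimal illustration of that last point (my sketch, not the post's code): an exposure multiplier defines what one unit of light means, and a simple Reinhard curve then maps the exposed HDR value into displayable range.

```cpp
// Photographic-style exposure followed by a simple Reinhard tone-map:
// HDR scene units -> [0,1) display range. "exposure" is the multiplier
// that declares what one unit of scene light means, as in photography.
float Tonemap(float hdr, float exposure)
{
	float v = hdr * exposure;   // apply exposure in linear space
	return v / (1.0f + v);      // Reinhard: compresses highlights, preserves blacks
}
```

The tone-mapped result would then be sRGB/gamma-encoded to 8-bit as the final step.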

#5199697 Problem with tangents (fbx)

Posted by Hodgman on 23 December 2014 - 06:59 AM

What kind of problem are you actually having?

There is no one right way to generate tangents - any method that produces a vector that's perpendicular to the normal is valid.
What is vitally important is that the art tool that's generating your normal map and your game are both using the exact same normals/tangents/bitangents. Otherwise, if the normal-map tool and the game have different vectors, there's absolutely no way to correctly decode the normal map.

#5199616 Thoughts on Rust?

Posted by Hodgman on 22 December 2014 - 06:13 PM

I think Rust is 5-10 years out from even having a shot at becoming mainstream. A large part of this is inertia. In games, for instance, we need not only ports/wrappers of open source libraries but also for things like... This is the same problem faced by Go, D, and all the other C++ killers...

For mainstream games use, we also need compilers to exist for esoteric CPU's under closed platforms, where only other game-devs are allowed to tread - locking out typical open source contributors and requiring a decent amount of demand to exist if a commercial provider is to step in. Chicken and egg ensues, where we can't use it because there's no compiler, and there's no compiler because no one is using it.
You'd need PC devs to invest heavily and then transition to mainstream games, forcing them to develop/port the compilers themselves ;D

#5199361 What is faster cbuffer or data textures for MVP matrices?

Posted by Hodgman on 20 December 2014 - 10:09 PM

In D3D11, you don't have to use VTF; you can use a regular Buffer object (like for vertices) and bind it to a texture slot (tbuffer type in the HLSL code).

If using instancing, you can't use multiple cbuffers (as all resources have to be the same for every instance), so tbuffers are the obvious choice.
If not using instancing, the overhead of updating a cbuffer probably dwarfs the cost of copying 2 to 48 bytes, so I don't imagine that updating cbuffers with indices instead of matrices will be a useful optimization.

With instancing, you could also put the matrix array in a cbuffer rather than a tbuffer, but I would guess this will be non-optimal.
Cbuffers are optimised assuming that every pixel/vertex will require every bit of data in the cbuffer, and older cards may not support array indexing (= may implement it as a huge nested if/elseif chain...).
Tbuffers and textures are optimised assuming that different vertices/pixels will need different subsets of the data, but that there may be some spatial locality that a cache would help with. They're implemented using the fetch hardware, so you know that array indexing will work, but also that it will be performing an actual (cached) memory request (whereas perhaps cbuffer data may have been pre-fetched into registers - wasting a huge number of them).

Lastly, you can put the matrix data into a vertex buffer and bind it to an Input Assembler slot, where the vertex layout / vertex declaration is responsible for defining how it is fetched for the vertex shader.
In D3D9, this is probably the best approach, as VTF was either slow and limited, or entirely unsupported back then. In D3D11, it's probably faster to define your own tbuffer and do the fetch yourself using SV_InstanceID.
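The per-instance indexing that fetch performs can be illustrated CPU-side — a sketch with hypothetical `Float4`/`StoreMatrix`/`FetchRow` helpers (not a real D3D API): each 4x4 matrix occupies four float4 texels, and instance i reads texels [i*4, i*4+4), the same arithmetic an HLSL shader would do with SV_InstanceID.

```cpp
#include <vector>

struct Float4 { float x, y, z, w; };

// Pack a matrix's 4 rows into the buffer slot for 'instanceId'.
void StoreMatrix(std::vector<Float4>& buffer, int instanceId, const Float4 rows[4])
{
	for (int r = 0; r < 4; ++r)
		buffer[instanceId * 4 + r] = rows[r];
}

// The fetch a vertex shader would perform: row 'row' of instance 'instanceId'.
Float4 FetchRow(const std::vector<Float4>& buffer, int instanceId, int row)
{
	return buffer[instanceId * 4 + row];
}
```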

#5198870 Occlusion Culling - Combine OctTree with CHC algorithm?

Posted by Hodgman on 17 December 2014 - 07:11 PM

What would be some good alternatives, preferably with being able to keep the OctTree, for dynamic scenes?

That's a good question!  I just wanted to point out the inevitable GPU sync points involved in CPU-read-back techniques.


Obviously the best performance will be with precomputed techniques and static scenes...


For dynamic scenes, many engines are using depth occlusion queries, but entirely on the CPU-side to avoid the GPU-sync issues.

This generally requires your artists to generate very low-poly LOD's of your environment / occluders, and ensure that these LODs are entirely contained within the high-poly visual versions. You then rasterize them at low resolution on the CPU to a floating point depth buffer. To test objects for visibility, you find their 2D bounding box (or bounding ellipse) and test all the floats in that region to see if the bounding volume is covered or not.
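The final coverage test can be sketched like this — a minimal, hypothetical `IsOccluded` (smaller depth = closer, occluders already rasterized into the buffer): the object is culled only if every texel under its screen rect is closer than the object's nearest point.

```cpp
#include <vector>

// Conservative visibility test against a low-res CPU depth buffer.
// Returns true only if EVERY texel under the object's screen-space rect
// [x0,x1]x[y0,y1] is closer than the object's nearest depth.
bool IsOccluded(const std::vector<float>& depth, int width,
                int x0, int y0, int x1, int y1, float objectNearestDepth)
{
	for (int y = y0; y <= y1; ++y)
		for (int x = x0; x <= x1; ++x)
			if (depth[y * width + x] >= objectNearestDepth)
				return false; // this texel is not in front -> object may be visible
	return true;
}
```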


At the moment, I'm using a 1bpp software coverage rasterizer, as mentioned here.


Another technique that I've seen used in AAA games is simple "occlusion planes". The level designers manually place large rectangles around the levels (usually inside walls of buildings / etc), which represent occluders. The game then frustum culls these rectangles and then sorts them by screen-space size and selects the biggest N number of them. Those N rectangles are then extruded into occluder frustums, and every visible object is tested to see if it's entirely inside any of those occluder frustums.


visibleObjects   = allObjects.Where( x => !EntirelyOutsideFrustum(camera, x) );
visibleOccluders = allOccluders.Where( x => !EntirelyOutsideFrustum(camera, x) );
bestOccluders    = visibleOccluders.OrderByDescending( x => x.ScreenArea() ).Take(10);
occluderFrusta   = bestOccluders.Select( x => x.ExtrudeQuadToFrustum(cameraPos) );
reallyVisibleObjects = visibleObjects.Where( x => !occluderFrusta.Any( f => EntirelyInsideFrustum(f, x) ) );

You'd be surprised at how fast modern CPUs can burn through these kinds of repeated, simple, brute-force checks... Sometimes simple flat lists will even out-perform complex structures like trees, due to how ridiculously slow random memory access patterns are compared to predictable linear patterns.
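The EntirelyInsideFrustum test from the pseudocode can be sketched as a plain plane-set check (my illustration, with hypothetical `Plane`/`Point` types): an occluder frustum is a set of inward-facing planes, and an object is fully occluded only when every corner of its bounds is on the inner side of every plane.

```cpp
#include <vector>

struct Plane { float nx, ny, nz, d; }; // n.p + d >= 0 means "inside"
struct Point { float x, y, z; };

// True iff ALL corners are inside ALL planes of the occluder frustum --
// the brute-force check the post describes running over the biggest N occluders.
bool EntirelyInsideFrustum(const std::vector<Plane>& planes,
                           const std::vector<Point>& corners)
{
	for (const Plane& p : planes)
		for (const Point& c : corners)
			if (p.nx*c.x + p.ny*c.y + p.nz*c.z + p.d < 0.0f)
				return false; // a corner pokes outside this plane
	return true;
}
```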


Other games use a combination of portals and occlusion planes.

#5198695 How much do I pay someone for coming up with some ideas for my game?

Posted by Hodgman on 16 December 2014 - 09:48 PM

Add up how many hours both you and him have put into it and show those numbers to him.

Either agree to a royalty share based on that, or a fixed price, e.g. $20 x his hours.
In either case you need a lawyer to draft the agreement.

If negotiations fail, you're not legally obliged to pay him anything.... Game ideas aren't really subject to copyright - implementations of ideas are.

#5198691 declare custom types of ints?

Posted by Hodgman on 16 December 2014 - 09:06 PM

yes, but if both parameters are of type eiTYPEDEF_INT, the compiler won't catch it if they are accidentally reversed will it? IE if i accidentally passed ani # as model #, and model # as ani #.

The whole point of those macros is to solve that problem for you - otherwise you'd just use the regular typedef keyword.
If you use:
Then passing an AnimID as a ModelID will result in a compiler error, saying can't convert AnimID to ModelID.
The trick is that you end up with two completely different instantiations of the PrimitiveWrap template -- one using 'tag_ModelID' as an argument and one using 'tag_AnimID' as an argument. Those 'tag' types are just dummy structures with no use at all, except to trick C++ into cloning the PrimitiveWrap template into a new, unique type.
struct tag_ModelID;//useless structs, just to create a unique type
struct tag_AnimID;

typedef PrimitiveWrap<int, tag_ModelID> ModelID;//the useless structs are used as a template argument
typedef PrimitiveWrap<int, tag_AnimID>  AnimID; // so that the two resulting types are different

void PlayAnimation(ModelID some_model, AnimID some_ani);
AnimID a = AnimID(2);
ModelID m = ModelID(1);
PlayAnimation(m, a);//OK!
PlayAnimation(a, m);//error - can't convert arg#1 from PrimitiveWrap<int,tag_AnimID> to PrimitiveWrap<int,tag_ModelID>
PlayAnimation(1, 2);//error - can't implicitly convert arg#1 from int to PrimitiveWrap<int,tag_ModelID>
PlayAnimation(AnimID(1), ModelID(2));//error - can't convert arg#1 from PrimitiveWrap<int,tag_AnimID> to PrimitiveWrap<int,tag_ModelID>
PlayAnimation(ModelID(1), AnimID(2));//OK!

int id = m;//OK -- n.b. you CAN convert from IDs back into integers implicitly
ModelID m2 = id;//ERROR -- but you can't implicitly convert integers into IDs!
ModelID m3 = ModelID(id);//OK -- you have to explicitly convert them like this

#5198653 declare custom types of ints?

Posted by Hodgman on 16 December 2014 - 05:54 PM

I use this code (and helper macros) to declare custom type-safe integer and pointer types.
template<class T, class Name>
struct PrimitiveWrap
{
	PrimitiveWrap() {}
	explicit PrimitiveWrap( T v ) : value(v) {}
	operator const T&() const { return value; }
	operator       T&()       { return value; }
	T value;
};

# define eiTYPEDEF_INT( name )					\
	struct tag_##name;					\
	typedef PrimitiveWrap<int,tag_##name> name;		//

# define eiTYPEDEF_PTR( name )					\
	struct tag_##name;					\
	typedef tag_##name* name;				//
eiTYPEDEF_INT( ModelId ); // ModelId is a typedef for int
eiTYPEDEF_PTR( AnimationId ); // AnimationId is a typedef for void*

void Play( ModelId m, AnimationId a )
{
  int modelId = (int)m;
  void* animPtr = (void*)a;
}

Animation anim;
AnimationId a = AnimationId(&anim);
ModelId m = ModelId(42);
Play(m, a);
The final asm should be the same as if you were using ints/void*'s, but you get to use C++'s compile-time type-safety system.

#5198508 Current-Gen Lighting

Posted by Hodgman on 16 December 2014 - 05:46 AM

but I wasn't sure if even current consoles were capable of that yet.

Even PS3/360/mobile games do PBR these days... just with more approximations.


Use a nice BRDF (start with Cook-Torrance / normalized Blinn-Phong), use IBL for ambient (pre-convolved with an approximation of your BRDF, so that you end up with ambient-diffuse and ambient-specular), use gamma-decoding on inputs (sRGB->linear when reading colour textures), render to a high-precision target (Float16, etc) and tone-map it to gamma-encoded 8-bit (do linear->sRGB / linear->Gamma as the last step in tone-mapping).


Ideally you'll do Bloom/DOF/motion-blur before tone-mapping, but on older hardware you might do it after (to get better performance, but with worse quality).
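To make the "normalized" part concrete, here's a numeric sanity check (my sketch) using the plain Phong lobe, a close cousin of the normalized Blinn-Phong mentioned above: with the (n+2)/(2*pi) factor, the lobe integrated against the cosine over the hemisphere sums to 1, i.e. the highlight never reflects more energy than arrived, for any exponent.

```cpp
#include <cmath>

// Midpoint-rule integration of the normalized Phong lobe
// (n+2)/(2*pi) * cos^n(a), weighted by cos(a), over the hemisphere.
// Should return ~1.0 for any exponent n -- that's what "normalized" means.
double IntegrateNormalizedPhong(double n)
{
	const double pi = 3.14159265358979323846;
	const int steps = 10000;
	double sum = 0.0;
	for (int i = 0; i < steps; ++i)
	{
		double a  = (i + 0.5) * (pi / 2) / steps;  // polar angle
		double da = (pi / 2) / steps;
		// lobe * cos(a) * solid-angle element (2*pi*sin(a)*da)
		sum += (n + 2) / (2 * pi) * std::pow(std::cos(a), n)
		     * std::cos(a) * 2 * pi * std::sin(a) * da;
	}
	return sum;
}
```

A non-normalized lobe (just cos^n) has no such guarantee, which is why it breaks energy conservation as you vary the specular power.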

#5198292 GPU skinned object picking

Posted by Hodgman on 15 December 2014 - 05:57 AM

You almost never see triangle-based picking on skinned meshes (for pixel-perfect accuracy), except maybe in editor apps.
Generally you'll make a bounding box for each bone, then transform the boxes on the CPU and trace against them.
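The per-bone trace can be done with a standard slab test — a sketch assuming the ray has already been transformed into the same space as the bone's box (the `RayHitsBox` helper is mine, not from the post):

```cpp
#include <algorithm>

// Standard slab test: ray (origin + t*dir, t >= 0) vs axis-aligned box.
// Assumes dir components are non-zero or the origin isn't exactly on a
// slab boundary (IEEE infinities handle the rest).
bool RayHitsBox(const float origin[3], const float dir[3],
                const float boxMin[3], const float boxMax[3])
{
	float tMin = 0.0f, tMax = 1e30f;
	for (int axis = 0; axis < 3; ++axis)
	{
		float invD = 1.0f / dir[axis];
		float t0 = (boxMin[axis] - origin[axis]) * invD;
		float t1 = (boxMax[axis] - origin[axis]) * invD;
		if (invD < 0.0f) std::swap(t0, t1);
		tMin = std::max(tMin, t0);
		tMax = std::min(tMax, t1);
		if (tMax < tMin)
			return false; // slab intervals don't overlap -> ray misses
	}
	return true;
}
```

Run this against each bone's transformed box and take the closest hit.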

#5197762 Occlusion Culling - Combine OctTree with CHC algorithm?

Posted by Hodgman on 12 December 2014 - 07:28 AM

The CHC++ algorithm (Which builds on CHC) takes care of both of these problems.
There’s no synchronization between the CPU and GPU, unfinished queries are simply queued and used in the next frame.
There’s no “popping” either, can’t quite recall how that’s solved in the algorithm, but they have that covered.

From the paper, notice they say it reduces sync points, not eliminates sync points. In the one chart where they compare with the 'OPT' dataset, their average case is very close as they claim, but their worst case is 2x the frametime / half the framerate, probably due to a sync.

If an object has been invisible but is now visible this frame, you either have to conservatively draw it anyway, or sync (or pop artefacts). So in worst-case situations, if you don't pop and don't sync, you can't cull either...

#5197742 Occlusion Culling - Combine OctTree with CHC algorithm?

Posted by Hodgman on 12 December 2014 - 02:59 AM

Before you go too far down that path, you might want to evaluate alternatives to CHC+...
Using GPU occlusion queries for CPU occlusion culling means that either you suffer a one-frame delay in culling results, leading to horrible popping artefacts, or you halve your framerate by introducing CPU/GPU sync points. IMHO, it's simply not a feasible approach.

#5197481 Physical Based Models

Posted by Hodgman on 10 December 2014 - 06:24 PM

Yeah, I'm with Promit - even if two different engines/art tools both use "PBR", they probably store their specular/roughness/etc maps in completely different ways...

The best solution would be if the art files used only mask textures and material names - e.g. a greyscale map specifying which bits are clean copper, another map for "worn blue paint", etc, etc... You could then write/use a tool to convert those masks into textures that are correct for your engine.

It's always been an issue that games made up of "asset packs" will have no consistency unless retouched by an art team... PBR has just exaggerated this existing fact.

#5197142 is there a name for this type of data structure?

Posted by Hodgman on 09 December 2014 - 06:40 AM

The "interface with Update/Render methods" pattern is often called a "game object" or a "game entity".

These days I consider it an anti-pattern...

Sure, you could implement any bit of software using it - though non-"realtime"/interactive software would be a bad fit.
However, by choosing to put everything into a single "entity" list you're making some serious trade-offs! Sure, your main loops becomes incredibly simple - for each entity, Update; for each entity Render... But in exchange you're giving up the ability to have any high level knowledge of your application's data (and data-flows), and you're giving up all knowledge of your program's flow-of-control. The order that things happen is left up to the chance of your entity list ordering...

You end up with silly bugs, like sometimes your homing missiles are targeting a position that lags by one frame, because you're unable to ensure that all movement logic has completed before executing targeting logic... You become cursed to watch, helplessly, as your co-workers expand the interface to include PreUpdate, PostUpdate... PostPostUpdate... to work around such bugs :lol:

When considering the potential 4x boosts of a quad-core CPU, you're left scratching your head, because any of those Update calls could be touching any bit of the game-state, leading you to consider abhorrent ideas like per-object locks and even more non-deterministic update orders...

This pattern is single-handedly responsible for the oft-repeated myth that "games are not a good fit for multithreading" or "games are hard to multithread"... And on that note - last-gen consoles made triple-core CPUs and NUMA co-processing standard requirements for all games. Current-gen consoles are making hex-core CPUs standard. Games are now leading the way in making consumer software take advantage of parallel processors.

IMHO, having a larger number of smaller, more homogeneous lists of self contained objects (or 'systems' of objects), combined with a more complex main loop in which the flow of data and control is explicit and easy to follow, is a far, far superior approach.
It's easier to keep everything sensibly deterministic, easier to understand and maintain, easier to optimize, easier to spread over multiple CPU cores...
Hence all the rage about 'component systems', 'data oriented design', and multithreaded game engines in recent times.
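A tiny sketch of that approach (illustrative names only): each system owns a homogeneous list, and the main loop makes the ordering explicit -- all movement finishes before any targeting runs, so a homing missile can never read a one-frame-stale target position.

```cpp
#include <vector>

struct Ship    { float pos; float vel; };
struct Missile { int target; float aimPos; };

// Phase 1: movement for every ship, all at once.
void UpdateMovement(std::vector<Ship>& ships, float dt)
{
	for (Ship& s : ships) s.pos += s.vel * dt;
}
// Phase 2: targeting, guaranteed to see fully-updated positions.
void UpdateTargeting(std::vector<Missile>& missiles, const std::vector<Ship>& ships)
{
	for (Missile& m : missiles) m.aimPos = ships[m.target].pos;
}
```

Because each phase reads and writes a known set of lists, each loop is also trivially parallelizable across a thread pool without per-object locks.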