Member Since 14 Feb 2007

#5211144 Assimp for animations and commercial use

Posted by on 17 February 2015 - 04:42 AM

Typically you use a library like Assimp in your engine's tool-chain, to convert from interchange formats (stuff exported from your artists' tools, which are designed to be flexible enough to be read by anything and contain every feature) into your engine's own model format, which is designed to be compact, load quickly, and support only the features required by your game.


So, both. Use Assimp to read from these interchange/export formats, and then write your own library that (one-way) converts data read by Assimp into your own format.

#5211075 Exclusive maximum in random functions

Posted by on 16 February 2015 - 05:28 PM

The 3 main conventions for ranges are: Min/Max, Begin/End, Offset/Size.

Where they're all equivalent:
Size = End-Begin = Max-Min+1
Min = Begin = Offset
End = Max+1 = Offset+Size

The 3rd one hasn't been mentioned much.
For dice, offset/size might be useful - even hard-coding offset to 1, and passing NumFaces as Size, etc...

IMHO, Offset/Size is more closely related to Begin/End than to Min/Max, as the latter involves adding/subtracting one.
In my experience (outside of RNG), Offset/Size is a common convention in C code, and Begin/End is common in C++ code. Min/Max isn't that common because of the reasons posted earlier, referencing the computer scientists of legend...
In the end, any convention is OK as long as you remain consistent, but "when in Rome, do as the Romans do", so if coding in C or C++, I'd stick with the prevailing conventions above.

As for the "you can't represent the full range of an int32" argument -- someone already pointed out above that the generator should be distinct from the thing that gives you distributions.
The former component should only need to know how many bits of randomness to generate. The latter component then adds on logic for ranges, uniform distributions, bell curves, etc...
If you know that you need to select 1 from exactly 2^32 choices, you can ask the low level generator to give you 32 bits of randomness.
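As a rough sketch of how the conventions above map onto each other, here is an Offset/Size helper built on top of a separate low-level generator. The names randomInRange, rollDie and random32Bits are made up for illustration; the only library pieces used are std::mt19937 and std::uniform_int_distribution.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Offset/Size convention: returns a value in [offset, offset + size).
// Same range as Begin/End with begin = offset, end = offset + size,
// and as Min/Max with min = offset, max = offset + size - 1.
template <class Generator>
int randomInRange(Generator& gen, int offset, int size)
{
    assert(size > 0);
    // The standard distribution takes an inclusive [min, max] pair,
    // so this is the one place where the "-1" shows up.
    std::uniform_int_distribution<int> dist(offset, offset + size - 1);
    return dist(gen);
}

// A die is just Offset/Size with the offset hard-coded to 1.
template <class Generator>
int rollDie(Generator& gen, int numFaces)
{
    return randomInRange(gen, 1, numFaces);
}

// Selecting 1 of exactly 2^32 choices needs no range logic at all:
// just ask the low-level generator for 32 bits.
inline std::uint32_t random32Bits(std::mt19937& gen)
{
    return static_cast<std::uint32_t>(gen());
}
```

The generator only produces bits; everything about ranges lives in the helper that wraps the distribution.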

#5210898 OpenGL framebuffer management

Posted by on 15 February 2015 - 05:29 PM

If you read *pointer++, then it reads as "dereference this pointer and increment it". People read from left to right, and so what you read is just what happens, albeit somewhat contorted (the increment actually happens first because of precedence, but the old value is retained for the dereference).

And as I just had to fix a related bug on Friday... "dereference this pointer and increment it" is not correct, as a copy is dereferenced. When does this become relevant? Whenever you have to deal with InputIterators. Because incrementing an InputIterator may invalidate any other copies of the iterator. Including those returned by postfix operator ++. So a=*it++; will not behave the same as a=*it; ++it;

The moral of the story? Don't use postfix ++
That's an insidious bit of code!!

However, a C or Java programmer might say that the moral of the story is that you shouldn't use operator overloading :D After all, that seems like the perfect example of code that a C programmer thinks is straightforward, but is actually full of treacherous C++ bugs :lol:
I'd almost expect to see it on a "C++ is evil" site.

[edit] Wait, your example doesn't smell right. Why would the input iterator return an invalid iterator from the increment operator, when it has the power to return void instead? That's just a terrible API design choice!

And actually, C++'s InputIterator concept *does* make the guarantee that:
*i++
is equivalent to:
value_type x = *i;
++i;
return x;

i.e. If an iterator doesn't guarantee that this works, then it's not an InputIterator.
Sounds like your bug is due to a bad library, not the fault of the guy who wrote that loop!
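To illustrate that guarantee with a conforming input iterator, here is a small sketch using std::istream_iterator. The two helper functions (readPostfix and readPrefix) are made-up names; one uses *it++ and the other uses the split form, and for a well-behaved InputIterator both should read the same values.

```cpp
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Reads every int from the text using the postfix form.
inline std::vector<int> readPostfix(const std::string& text)
{
    std::istringstream in(text);
    std::istream_iterator<int> it(in), end;
    std::vector<int> out;
    while (it != end)
        out.push_back(*it++); // dereferences the returned copy of the old iterator
    return out;
}

// Reads every int from the text using the split form.
inline std::vector<int> readPrefix(const std::string& text)
{
    std::istringstream in(text);
    std::istream_iterator<int> it(in), end;
    std::vector<int> out;
    while (it != end)
    {
        int value = *it; // read first...
        ++it;            // ...then advance
        out.push_back(value);
    }
    return out;
}
```

If a library's iterator gives different results for these two loops, that points at the iterator not meeting the InputIterator requirements.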

#5210887 Can't fix shadow acne with bias

Posted by on 15 February 2015 - 04:06 PM

What's happening with the bottom left shadow? It appears that the light green block is hovering a tiny bit above the ground - is that the case? Otherwise I'm not sure how you've got that strange shadow effect where the shadow doesn't line up on the vertical castee against the flat castee

That's a known flaw with the normal-offset trick. Discontinuities in the surface normal lead to discontinuities in the shadow results. Smooth geometry suffers less.
In the original presentation, he "solves" it (covers it up) by making the PCF blur radius larger than the artefact's size.

A question regarding normal offset shadows: Do you offset along geometry normals or the final normals of the pixel (as a result of geometry normals + normal map + decals etc.)? If it's the former, what do you do with deferred shading, since typically geometry normals aren't available at that stage?

If you're just doing it for one light (e.g. The sun), you could use 'deferred shadows' and render the scene geometry itself a second time.

For the general case, you can add geometry normals to your Gbuffer as well. On a recent game, we already had normal-mapped normals AND plain vertex normals in our Gbuffer for use by other tricks (SSS), so when we decided to add normal-offset to our shadow code we already had that data :D

Lastly, you can use ddx/ddy of depth to generate a per-triangle normal within your lighting pixel shaders; however, this produces errors at the silhouettes of objects. Maybe you could compare this generated normal against the normal-mapped one to detect these errors, and if the difference/error is large, just use the normal-mapped normal instead (just guessing!)

#5210764 Weird behavior when getting data from structured buffer

Posted by on 14 February 2015 - 07:20 PM

ZeroMemory( &sd, sizeof(D3D11_SOME_DESC) );

Sorry for bikeshedding :D :lol: ...
You can also just use standard C++ struct initialization syntax to default-initialize a descriptor -
D3D11_SOME_DESC sd = {};

IIRC, the ZeroMemory macro (which is just a MS wrapper for memset) was recommended back when MS compilers weren't C++03 compliant. Modern MS style guides seem to just use "= {}" these days.
The advantage of the standard method is the compiler can optimize out the initialization to zero where it's not needed (i.e. where you later assign a non-zero value yourself), whereas a ZeroMemory/memset call is typically not optimised away.

...Not that micro-optimizations are a big concern in resource creation code... :lol:
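A minimal side-by-side sketch of the two styles. SomeDesc is a hypothetical stand-in struct, not a real D3D11 type, and the two helper functions and their precondition are made up for illustration.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for a D3D11-style descriptor (not a real D3D type).
struct SomeDesc
{
    unsigned width;
    unsigned height;
    unsigned format;
    unsigned flags;
};

// Old style: zero the bytes explicitly, then fill in the fields you care about.
inline SomeDesc makeDescWithMemset(unsigned w, unsigned h)
{
    assert(w > 0 && h > 0); // illustrative precondition only
    SomeDesc sd;
    std::memset(&sd, 0, sizeof(sd)); // the kind of call ZeroMemory wraps
    sd.width = w;
    sd.height = h;
    return sd;
}

// Standard style: "= {}" zero-initializes every member.
inline SomeDesc makeDescWithBraces(unsigned w, unsigned h)
{
    assert(w > 0 && h > 0); // illustrative precondition only
    SomeDesc sd = {};
    sd.width = w;
    sd.height = h;
    return sd;
}
```

Both produce the same descriptor; the second form also avoids having to repeat the type name inside a sizeof.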

#5210757 A more data-oriented tree structure

Posted by on 14 February 2015 - 06:15 PM

Doesn't each node having a std::vector of indices trash the cache? Since each node has to lookup a new memory location and load it and adjacent (unused) memory into the cache? But is there any way around it?

My intention was that you wouldn't often use this vector.
Most of the time you'd be doing for each node in scene - operating linearly on every node, instead of hierarchically.

BTW, depending on what kind of processes you need to perform on your data set, it might be simpler to have each node have a single handle to its parent, rather than a collection of handles to its children.

On that note, you can derive a topological sorting key by counting how many times you have to recursively follow the parent handle before reaching the root node (counting how deep in the hierarchy the node is). If you sort the linear list of nodes by this sorting key, it can make certain operations easier by guaranteeing an update order where parent nodes are always processed before their children.

For doing things like transform hierarchies (where you concatenate together each node's local matrix to get a global/world matrix) you can then just linearly iterate over the scene's vector of all nodes, using the parent handle once per iteration (not recursively).
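Here is a rough sketch of the depth-sorted flat list described above. The Node layout, the function names, and the use of a single float offset in place of a real matrix are all simplifications made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node: a 1D "transform" (just an offset) keeps the sketch short;
// a real engine would concatenate matrices instead of adding floats.
struct Node
{
    int   parent;      // index into the node array, or -1 for a root
    float localOffset; // transform relative to the parent
    float worldOffset; // computed result
};

// Depth = number of parent links followed before reaching a root.
inline int depthOf(const std::vector<Node>& nodes, int index)
{
    int depth = 0;
    for (int p = nodes[index].parent; p != -1; p = nodes[p].parent)
        ++depth;
    return depth;
}

// Reorders nodes so that parents always precede their children, then
// remaps the parent indices to match the new order.
inline void sortByDepth(std::vector<Node>& nodes)
{
    const int count = static_cast<int>(nodes.size());
    std::vector<int> order(count), depth(count);
    for (int i = 0; i < count; ++i)
    {
        order[i] = i;
        depth[i] = depthOf(nodes, i);
    }
    std::stable_sort(order.begin(), order.end(),
                     [&](int a, int b) { return depth[a] < depth[b]; });

    std::vector<int> newIndexOf(count);
    for (int i = 0; i < count; ++i)
        newIndexOf[order[i]] = i;

    std::vector<Node> sorted(count);
    for (int i = 0; i < count; ++i)
    {
        sorted[i] = nodes[order[i]];
        if (sorted[i].parent != -1)
            sorted[i].parent = newIndexOf[sorted[i].parent];
    }
    nodes.swap(sorted);
}

// One linear pass; each node reads its parent's already-final result once.
inline void updateWorldOffsets(std::vector<Node>& nodes)
{
    for (std::size_t i = 0; i < nodes.size(); ++i)
    {
        Node& n = nodes[i];
        assert(n.parent < static_cast<int>(i)); // relies on the depth sort
        n.worldOffset = (n.parent == -1)
            ? n.localOffset
            : nodes[n.parent].worldOffset + n.localOffset;
    }
}
```

The sort only needs to be redone when the hierarchy changes; the per-frame update is just the final loop.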

If trying to exploit multithreading, it can make sense to have one of these flat/linear lists per model (or some other granularity) rather than for the whole scene, so that each worker thread can be given an isolated block of data to update.

#5210678 A more data-oriented tree structure

Posted by on 14 February 2015 - 08:47 AM

So at the moment you iterate through models, then root nodes, then recursively through child nodes?

Instead of having the Mesh own the Node as a child, and the Node own a vector<Node> recursively, switch to pointers or indices.

The scene can have a vector<Node> containing all nodes then the model can have "int rootNode", and the node can have "vector<int>" children.
Or replace int with Node* if you'd rather use pointers.

Then if you want to perform a task on all nodes, iterate the scene's linear list rather than jumping around down a recursive hierarchy.

Ideally for culling, you would just be able to iterate through all AABBs and nothing else.

You can either iterate through the nodes (which have an AABB value inside), or alternatively you can store a node index in the AABB -- e.g.

struct NodeBounds
{
  Vec3 mAABBCenter;
  Vec3 mAABBExtent;
  int nodeIndex;
};

So then, you could iterate over all NodeBounds objects in one go, producing a new temporary vector containing the indices of visible nodes.
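A minimal sketch of that culling pass might look like the following. The Vec3 definition, the CullBox visibility volume (an AABB standing in for a real frustum), the reading of mAABBExtent as a half-size, and the function names are all assumptions made for the example.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

struct NodeBounds
{
    Vec3 mAABBCenter;
    Vec3 mAABBExtent; // assumed here to be the half-size along each axis
    int  nodeIndex;
};

// Hypothetical visibility volume: a box instead of a real view frustum.
struct CullBox
{
    Vec3 center;
    Vec3 extent; // half-size
};

// Two AABBs overlap when their centers are no further apart than the
// sum of their half-sizes on every axis.
inline bool overlaps(const NodeBounds& b, const CullBox& c)
{
    return std::fabs(b.mAABBCenter.x - c.center.x) <= b.mAABBExtent.x + c.extent.x
        && std::fabs(b.mAABBCenter.y - c.center.y) <= b.mAABBExtent.y + c.extent.y
        && std::fabs(b.mAABBCenter.z - c.center.z) <= b.mAABBExtent.z + c.extent.z;
}

// Walks the bounds array linearly and returns the indices of visible nodes.
inline std::vector<int> collectVisible(const std::vector<NodeBounds>& bounds,
                                       const CullBox& view)
{
    std::vector<int> visible;
    visible.reserve(bounds.size());
    for (const NodeBounds& b : bounds)
    {
        assert(b.nodeIndex >= 0);
        if (overlaps(b, view))
            visible.push_back(b.nodeIndex);
    }
    return visible;
}
```

The loop touches only the tightly packed NodeBounds array; the nodes themselves are looked up later, and only for the indices that survived.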

#5210665 noncopyable parent class

Posted by on 14 February 2015 - 05:33 AM

You should only get errors like that if you're actually attempting to copy something that inherits from noncopyable :/

#5210632 Weird struct padding issue - what am I doing wrong?

Posted by on 14 February 2015 - 02:54 AM

Every time you make an assumption in your code, you need an assertion there to prove that the assumption is valid (and also document the assumption).
For the assumptions that your C++ struct matches your GLSL struct, you can use static_assert, offsetof and sizeof.
static_assert( offsetof( CubeInstance, material ) == 48, "bad alignment assumption" );
static_assert( offsetof( CubeInstance, _pad0 ) == 52, "bad alignment assumption" );
static_assert( sizeof( CubeInstance ) == 64, "bad alignment assumption" );
static_assert( sizeof( glm::vec3 ) == 12, "bad GLM assumption" );
static_assert( alignof( glm::vec3 ) == 4, "bad GLM assumption" );

#5210442 Aliens as citizens of the Empire?

Posted by on 13 February 2015 - 03:59 AM

If your empire as a whole had different ideologies as a choice (kinda like a character class system) then this would be cool.

Different ideologies/classes would be offered different options in situations like this.
More prejudiced ideologies would have the option of genocide, slavery, apartheid. More compassionate classes would have other options such as giving up land to the natives (loss of production, but less rebellion), more equal assimilation (which may lead to racial tension if the population itself doesn't share your ideologies), etc...

#5210376 Entity-Component Confusion

Posted by on 12 February 2015 - 05:31 PM

Does Artemis actually say that it is a solution for optimising your game?
The "ECS" phrase originally popped up as a solution for flexibility and empowering game-designers. Only recently have people started making ECS frameworks with an eye to performance optimisation.

Some of the older ECS frameworks I've used had horrible performance, but you put up with it because you wanted to use the other features (which at the time was basically writing OOP from an XML file instead of C++ code :()

It's common to not compact arrays when items are removed, and instead have two extra arrays/lists of indices. One contains the indices of valid array elements so that you can iterate through all the items in the array, the other contains the free elements so that you can allocate new items.
At this point you're basically dealing with a pool with a free-list, not a basic array.
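A rough sketch of such a pool, assuming a fixed capacity. The Pool class, its member names, and the extra slotInLive_ lookup table (used so that removal from the live list is O(1)) are illustrative choices, not taken from any particular ECS framework.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Marker for "this item is not in the live list".
constexpr std::size_t kNoSlot = static_cast<std::size_t>(-1);

// Fixed-capacity pool: items never move, two index lists track live/free slots.
template <class T>
class Pool
{
public:
    explicit Pool(std::size_t capacity)
        : items_(capacity), slotInLive_(capacity, kNoSlot)
    {
        // Every slot starts free; push in reverse so slot 0 is handed out first.
        for (std::size_t i = capacity; i-- > 0; )
            free_.push_back(i);
    }

    // Returns the index of a newly allocated item; asserts if the pool is full.
    std::size_t allocate()
    {
        assert(!free_.empty());
        const std::size_t index = free_.back();
        free_.pop_back();
        slotInLive_[index] = live_.size();
        live_.push_back(index);
        return index;
    }

    // Frees an item without compacting the item array.
    void release(std::size_t index)
    {
        const std::size_t slot = slotInLive_[index];
        assert(slot != kNoSlot);
        // Swap-and-pop inside the live list only; items_ is untouched.
        const std::size_t moved = live_.back();
        live_[slot] = moved;
        slotInLive_[moved] = slot;
        live_.pop_back();
        slotInLive_[index] = kNoSlot;
        free_.push_back(index);
    }

    T& operator[](std::size_t index) { return items_[index]; }

    // Iterate this to visit every valid item.
    const std::vector<std::size_t>& liveIndices() const { return live_; }

private:
    std::vector<T>           items_;
    std::vector<std::size_t> live_;       // indices of valid items
    std::vector<std::size_t> free_;       // indices available for allocation
    std::vector<std::size_t> slotInLive_; // item index -> position in live_
};
```

Because items never move, indices handed out by allocate() stay valid until that item is released.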

The memory requirements of a game are completely predictable, which means it's quite feasible to use fixed size allocations instead of growable ones.
If that's too hard, a std::vector is still probably a better choice than a std::list though!!

#5210240 Does draw calls number always matter?

Posted by on 12 February 2015 - 04:42 AM

Trying to minimize draw-calls is generally a CPU-side optimization.


Every GL/D3D function call has a cost. Draw functions have the highest cost, as they actually collect all the changed states, bound resources, validate everything, build native GPU commands, push those commands into a command-buffer, and possibly flush that buffer through to the GPU.

If you have too many draw-calls, you can end up in a situation where the CPU's milliseconds-per-frame value is actually higher than the GPU's value, which is ridiculous!


Mantle/Metal/GLNext/D3D12 exist to solve this problem, and reduce the CPU cost of draw-calls.



On the GPU side of things, the number of state-changes becomes an issue. The GPU always wants to work on large amounts of data at a time -- thousands of triangles, thousands of pixels, etc...

Ideally, the GPU will actually try to merge multiple successive draw-calls into a single "job"!

Certain state-changes cause the GPU to have to take a small break in-between draw calls to adjust to the new state. The details depend on the GPU -- on some it might be any state change, on others resource bindings might be free, etc... there's some general hints / rules of thumb about what tends to be expensive though...


If a draw-call contains a lot of data (e.g. thousands of pixels), then often these small pauses do not matter, because the GPU can perform the state adjustment in the background while it is still drawing the pixels from the previous draw-call.

However, it becomes a huge problem if your draw-calls do not contain much work. I had a project a few years ago where we had about 100 draw-calls that each drew only about 40 pixels. We had access to a vendor-specific profiling tool that showed us that each of those draw-calls was costing the same amount of time as one that would've drawn 400 pixels (10x more than they should!!), simply because we were changing states in between each draw. We developed the guideline (for that specific GPU) that every draw-call should cover at least 400 pixels in order to avoid the state-change penalty.


On newer GPUs, they can be preparing multiple draw-calls' states at the same time, so these penalties only appear when you submit, say, 8 tiny draw calls with different states, in a row.

Still, it's always best practice to try and sort/group your geometry to reduce state-changes to keep the GPU happy... and as a result, you'll probably end up with fewer D3D/GL function calls on the CPU side, and possibly even fewer draw-calls for the CPU as well!
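One common way to do that grouping is to sort draw items by a key built from their state. The sketch below is only illustrative: the DrawItem fields, the bit layout of the key, and the assumption that shader changes are the most expensive are all made up, since the real costs depend on the GPU.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical draw item: IDs stand in for real shader/texture/mesh bindings.
struct DrawItem
{
    std::uint16_t shaderId;
    std::uint16_t textureId;
    std::uint32_t meshId;
};

// Packs the state IDs into one integer, with the state assumed to be most
// expensive to change (the shader, in this sketch) in the highest bits.
inline std::uint64_t sortKey(const DrawItem& d)
{
    return (std::uint64_t(d.shaderId)  << 48)
         | (std::uint64_t(d.textureId) << 32)
         |  std::uint64_t(d.meshId);
}

// Groups draws that share state next to each other.
inline void sortByState(std::vector<DrawItem>& items)
{
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b)
              { return sortKey(a) < sortKey(b); });
}

// Counts how often the shader or texture differs between consecutive draws.
inline int countStateChanges(const std::vector<DrawItem>& items)
{
    int changes = 0;
    for (std::size_t i = 1; i < items.size(); ++i)
    {
        if (items[i].shaderId  != items[i - 1].shaderId)  ++changes;
        if (items[i].textureId != items[i - 1].textureId) ++changes;
    }
    return changes;
}
```

Comparing countStateChanges before and after sortByState on a frame's draw list gives a quick feel for how much grouping is available.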



One small detail that doesn't happen much in practice -- every command sent by the CPU (state change, draw, etc) must be processed by the GPU command processor (sometimes called a front-end). This bit of hardware decodes the commands and controls the GPU. Usually there's so much work for a GPU to do (e.g. one command might result in thousands or millions of pixels being drawn) that the speed of command processing doesn't matter. Usually if you're generating so many commands that you're bottlenecked by the CP, then you're already going to be bottlenecked by your CPU costs anyway!! However, apparently on the next-gen APIs (e.g. Mantle), the CPU side cost of draw-calls has become so cheap that it's possible for you to become bottlenecked by the GPU's CP. In that situation you'd want to follow the traditional advice of minimizing draw-calls again.


[edit #2]

The advice from about 5 years ago was that if you had 2500 draw-calls per frame, then you'd be able to run at 30Hz as long as all you did was render things.

i.e. 2500 draw-calls would take ~33ms of CPU time... which means you've got no time left over to run gameplay or physics or AI!

So back then, you'd usually aim for under 1000 draw-calls, so that you have time left over for the rest of the game, and can still hit 30Hz.

At the moment, D3D11 is much faster than D3D9/GL2 were at that time, plus CPUs are faster, so you can do well above 1000 draws per frame now... but can't go crazy.

On D3D12/Mantle/GLNext and game consoles, it's possible to go as high as 10k or 100k draws per frame.

On mobile devices with GLES though, you're often told to try and stay under 100 draw-calls per frame!

#5210181 Bad practice: GLfloat vs float?

Posted by on 11 February 2015 - 08:29 PM

even though most compilers today will use 32 bits for float

Are there any practical platforms for game development in use today where the built-in float will not be a 32-bit type approximately compatible with IEEE 754?
I may be wrong, but I feel like this is a rather academic distinction (unlike integer sizes, which vary extensively by the compiler/platform).

AFAIK, C/C++ don't require the machine to follow the IEEE float specification.

Yeah that's a very academic / theoretical discussion though, because every CPU we care about does support IEEE floats, so the C/C++ float type == IEEE float.

The one exception might be GPUs, where proper IEEE float support is still quite a new feature (not that long ago, GPUs supported 32-bit float, but not to the strict letter of the spec, with things like NaNs/etc). This isn't really relevant though, because it's not common to write shader code using C/C++.

But as you worded it -- these processors are still approximately compatible though!

#5210140 Freelance Game Programmer

Posted by on 11 February 2015 - 04:48 PM

Organizations hire freelance / contract developers essentially to do work they don't want to pay for in house. Sometimes that is because they are fully staffed and have a quick side project. Sometimes that is because an existing project needs some extra hands.

In all cases they want experienced, well-rounded, low-risk contracts.

More simply: Why should I hire you, a beginner with no experience and no track record, when for a relatively small amount more money I can hire someone with a decade of experience who has focused directly on the task I need done and a long track record of success?

There are other cases - there's the people who either through incompetence or greed think that they can get away with paying someone $5/hr - that if a freelancer is not half the cost of an employee, then "what's the point in outsourcing anyway?"...

More than once we've been contacted by a potential new client who needs a lot of work done quickly, but is offering a wage that would insult a Chinese factory worker... And who responds with anger and disbelief when informed of what a quality result will actually cost them

If you're willing to work for cheap, there's a lot of people out there who seem willing to take the risk on cheap workers.

#5210135 Deferred optimization

Posted by on 11 February 2015 - 04:28 PM

@The chubu
Well, storing depth is an option, but would still require part of a buffer. DX9 does not allow using depth as an input without vendor specific hacks from what I've read.

The magic INTZ format works on every vendor for DX10+ hardware. For earlier hardware there's a bunch of other vendor specific formats that are too much hassle...
However, in your gbuffer pass, you can always just write depth to a colour target yourself, like you're doing for position!

@The chubu, @Hodgman I attempted the stenciling with spheres, The stencil setup was done in 2 passes inc/decr, then a third pass where the sphere was rendered while reading the stencil.

In the first two passes, were you using a NULL pixel shader, with no colour targets bound?
And in the third pass where you say you read the stencil value - do you mean you sampled that value yourself in the shader, or just that you enabled the stencil test? To use the stencil buffer as an optimization, you really need hi-stencil and Hi-Z to be active, which is tricky because D3D/GL don't expose an API for it, requiring you to just use the perfect order of operations such that drivers will keep it enabled :(