
Matias Goldberg

Member Since 02 Jul 2006
Offline Last Active Today, 12:31 PM

#5274589 Any alternatives to automatic class instantiation via macro?

Posted by Matias Goldberg on Yesterday, 09:04 PM

I agree with everyone... on desktop.


Unfortunately, Android and iOS came to crash the party: there is no main. The former enters native code via a Java loader that loads an .so library with a set of arbitrarily named JNI function calls, and the latter enters the system by overriding AppDelegate.


Considering these two bastards, if they need to be supported, the macro idea suddenly looks more appealing; although I personally still prefer letting the user write these JNI loaders or iOS AppDelegates himself, instead of trying to do it for him (especially when the user needs to release resources or be notified of low memory conditions).

If a macro tries to do it for the user, then when something goes wrong there's always that weird feeling that it's the fault of the macro's overridden method (i.e. "I bet the main system isn't informing me of low memory conditions even though the app is receiving them").

#5274581 Why are there no AAA games targeted towards the young adult audience?

Posted by Matias Goldberg on Yesterday, 08:01 PM

This theory also seems to apply towards why there aren't many games that cover political or sociological themes.

The first Assassin's Creed game was strongly loaded with political and sociological themes.
I still remember fondly the long discussions about politics, religion, morality and ethics between Altair and Al Mualim (even though I met a lot of people who disliked those moments... "boring" they said).

The second game is about a teenager seeking revenge for the unjust death sentence handed to half of his family (quite common in that era), involving real-world events like the attempted murder of Lorenzo de' Medici, the Pazzi conspiracy, the speculated poisoning of Giovanni Mocenigo, Doge of Venice, the Borgia family drama, and well... someone summarized it for me. It also covers topics like thievery, extreme poverty, and prostitution.

Some people may have played AC II as just a dude who kills people with cutscenes in between; but it's actually strongly charged with a lot of content if you pay attention to the story.

#5274389 Why are there no AAA games targeted towards the young adult audience?

Posted by Matias Goldberg on 04 February 2016 - 09:36 PM

According to Wikipedia, a young adult is someone between 14 and 20 years old.


I was under the impression most games target that audience already.

Also according to Wikipedia, YA literature often treats topics such as depression, drug & alcohol abuse, identity, sexuality, familial struggles and bullying.


Perhaps you meant to ask why there aren't more games covering these topics, which is a very different question. If that's the case, beware that the target market is mostly the same as for current games, so such titles would be up against a lot of strong, established competition.

#5274064 [Debate] Using namespace should be avoided ?

Posted by Matias Goldberg on 03 February 2016 - 10:19 AM

A using namespace directive has little to no way of being disabled once it is declared, which is why it can be a PITA at header level.

At .cpp file level it sounds more sane. But if you use "unity builds" to speed up compilation, using namespace at .cpp file level comes back to bite you. This makes using namespace friendlier at an enclosed scope, e.g.:

void myFunc( int a )
{
    using namespace std; //Only available at myFunc level.
}


Typing std:: is not a big deal, so I try to avoid using namespace as much as possible. Furthermore, "using" pollutes autocomplete.


There are legitimate cases where it's appropriate, but use it with discretion and care.

#5273997 Multi-threaded deferred setup

Posted by Matias Goldberg on 02 February 2016 - 09:54 PM

For "read once" dynamic data (i.e. data not read again on the next frame) such as constants, it's not worth copying it over to the GPU. Just leave the data in the UPLOAD heap and read it from there.

Actually, on GCN, performing a copy via a Copy queue lets the GPU start transferring the data across the bus using its DMA engines while it does other work (like rendering the current frame), which might result in higher overall performance (particularly if you're bound by the bus, or latency is an issue).


However, it hurts all other GPUs which don't have a DMA engine (particularly Intel integrated GPUs and AMD APUs, which don't need this transfer at all; for them it just takes away precious bandwidth).

#5273783 Shader Permutations

Posted by Matias Goldberg on 01 February 2016 - 09:37 PM

You may be interested in how we tackled it in Ogre 2.1 with the Hlms (see section 8 HLMS).

Basically, 64 bits will soon stop looking like enough flags to handle all the permutations. But like Hodgman said, many of these options are mutually exclusive, or most of the combinations aren't used.


The solution we went for was, at creation time, to compute a 32-bit hash of the shader based on all the options (which are stored in an array), and store this hash in the Renderable.

Then at render time we pull the right shader from the cache using the "final hash". The final hash is produced by merging the Renderable's one with the Pass hash. A pass hash contains all settings that are common to all Renderables and may change per pass (i.e. during the shadow map pass vs another receiver pass vs extra pass that doesn't use shadow mapping for performance reasons).

You only need to access the cache when the hash changes between the previous and next Renderable, which is why it is a good idea to sort your Renderables first.


The Source 2 slides suggest a similar approach to map their PSOs (see slides 13-23; the PPT has the animated version).


While a 64-bit permutation mask works well for simple to moderately complex scenes, it will eventually fall short; especially if you need to adapt to very dynamic scenarios or have lots of content. However, implementing a 64-bit permutation mask is a good exercise to get a feel for the pros and cons of managing shaders.

#5273776 Cost of Switching Shaders

Posted by Matias Goldberg on 01 February 2016 - 09:19 PM

On the CPU side, the "root signature" is changed, which means that (on pre-D3D12 APIs), all the resource bindings must be re-sent to the GPU. The driver/runtime also might have to resubmit a bunch of pipeline state, and even validate that the PS / VS are compatible, etc (and possibly patch them if they mis-match, or patch the VS if it mis-matches with the IA config).... The driver might also have to do things like patch the PS if it doesn't match the current render-target format...

Since you're describing pre-DX12 problems, I shall add that most state changes (particularly shader changes) meant the driver would delay all validation and updates (basically any actual work) until the next DrawPrimitive call, since it's only then that the driver has all the information it needs: it needs the IA layout & vertex buffer bindings to patch vertex shaders, the RTT format and multisample settings to patch the pixel shader, etc.

Then it would have to internally maintain a cache of all the IA layout / RTT / shader combinations and pull the ISA assembly code from that cache the next time it was needed.

Mantle said screw it, and came up with Pipeline State Objects to condense all the information any GPU could possibly need to generate the ISA from shaders into one huge blob, moving the overhead from DrawPrimitive time (which happens every frame) to PSO creation time (which happens once).

#5273561 request HLSL support for sqrt() for integers

Posted by Matias Goldberg on 31 January 2016 - 07:21 PM

If we're using the usual definition of "determinism" to mean a system that doesn't produce random results (ie, same input + same set of operations = same output. Every time.) then I fail to see how any of the normal operations on a GPU can be classified as non-deterministic.
Now, if you're talking about things that are sensitive to timing (like Atomic operations, UAV writes) then you can get some non-determinism, but only by virtue of having started operating on a shared resource with many threads. This is the same non-determinism you'd get on any architecture, CPUs included.

For two different machines to produce the same output (GPU speaking), they must follow these rules:

  1. Exact same GPU chip (not even different revisions).
  2. Same drivers (to generate the same ISA).
  3. Same version of HLSL compiler (if compiling from source).

Otherwise the result will not be deterministic across machines. This is very different from x86/x64 and ARM CPUs, where the same assembly with the same input will produce the same output even across different Intel & AMD chips, as long as you stay away from some transcendental FPU functions (like acos), some non-deterministic instructions (RCPPS & RSQRTPS), and certain models with HW bugs (e.g. the FDIV bug).

#5273541 request HLSL support for sqrt() for integers

Posted by Matias Goldberg on 31 January 2016 - 05:48 PM

Floating point operations surely are not deterministic on GPUs, but I'm pretty sure that casting an int to a float, then sqrt(), then casting back to int (truncate, floor, ceil) will produce deterministic results.

#5273533 Omnidirectional shadow mapping

Posted by Matias Goldberg on 31 January 2016 - 04:32 PM

It's good because it doesn't use much memory; the disadvantage is it eats fillrate for breakfast, even if you do some kind of clever stencil + light bounding volume optimization.

Another disadvantage is that it involves a lot of SetRenderTarget calls, which are relatively expensive CPU-side. (Normally for N lights you would need N+1 SetRenderTarget calls, but with this method you need N*2 calls, e.g. 8 instead of 5 for four lights. Though you can amortize this if you work on 2 cubemaps at once.)

#5273015 D3d12 : d24_x8 format to rgba8?

Posted by Matias Goldberg on 28 January 2016 - 10:57 AM

Yes, they mentioned it on some Twitter account. But then does GCN store a 24-bit depth value as 32 bits if a 24-bit depth texture is requested?
Since there is no bandwidth advantage (24 bits need to be stored in a 32-bit location and 8 bits are wasted), the driver might as well promote D24X8 to D32 + R8?

No, they store it as 24-bit fixed point with 8 bits unused. It only uses 32 bits if you request a floating point depth buffer, and they can't promote from fixed point -> floating point since the distribution of precision is different.

Pretty much this. They cannot promote it for you since the behavior is very different. They must honour 24-bit integer precision.

As for the bandwidth: this is why AMD recommends that if you never use the stencil, you don't ask for a depth buffer with stencil capabilities.

#5272800 D3d12 : d24_x8 format to rgba8?

Posted by Matias Goldberg on 26 January 2016 - 11:53 PM

IIRC AMD GCN always stores the stencil and depth separately, so this hack won't work there.

#5272515 transition barrier strictness

Posted by Matias Goldberg on 24 January 2016 - 04:25 PM

D3D12 is explicitly targeted at "expert" graphics programmers who already have a background in GPU hardware and modern APIs.
This is why D3D11 is not going away and is still being updated (i.e. D3D11.3).
Note I'm not calling you a rookie; I'm just saying this is what to expect from D3D12.

If the debug runtime doesn't complain, does it just not matter?

No, it just means the debug layer didn't catch it. Hopefully it will improve with time.

#5271406 Understanding cross product without delving too much on Linear algebra

Posted by Matias Goldberg on 16 January 2016 - 09:13 AM

If you have Firefox (it won't work in Chrome because there's no NPAPI) and the Java plugin (and most security stuff disabled), you can run the interactive demo:



IMHO that's the best tutorial I ever found on cross products.

#5271338 VSSM

Posted by Matias Goldberg on 15 January 2016 - 03:05 PM

I have no idea either. VSM is the popular technique; VSSM is not. You can try guessing from the rest of the steps.

The link you gave is just an abstract preview. It appears the actual paper is here which may provide better insight.