

Member Since 14 Feb 2007

#5285580 Input Polling - Low Latency API

Posted by Hodgman on 07 April 2016 - 07:43 AM

But of course, there's also XInput... which, ironically requires you to poll...
Well, for standard windows input messages you also have to poll the message pump :wink: :lol:


As for GetAsyncKeyState being unreliable -- yes, it will miss fast double-(or more)-taps -- but IIRC it actually is stateful such that it won't ever miss a momentary keypress. When a key is pressed, a flag is set internally to true, which isn't cleared until after the next time GetAsyncKeyState is called, meaning quick taps can't be lost. 

It's still a terrible plan though for being partially unreliable - the message pump doesn't miss double-taps :)


I'm pretty sure xinput does the same thing, so that very short/momentary presses can't get lost in between two polls.
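To make the "sticky flag" idea concrete, here's a minimal sketch of that latching behaviour. This is a toy model, not the actual Win32 or XInput implementation; `LatchedKey` and `CountPressesSeenByPolling` are made-up names for illustration only.

```cpp
#include <cassert>

// Hypothetical model of one key with a "pressed since the last poll" latch.
struct LatchedKey {
    bool down = false;              // current physical state
    bool pressedSincePoll = false;  // set on every press, cleared by Poll()

    void Press()   { down = true; pressedSincePoll = true; }
    void Release() { down = false; }

    struct State { bool isDown; bool wasPressed; };

    // Reading the state also clears the latch, so a short tap survives
    // until the next poll, but only as a single yes/no flag.
    State Poll() {
        State s{ down, pressedSincePoll };
        pressedSincePoll = false;
        return s;
    }
};

// Taps the key N times between two polls and reports how many presses
// a polling consumer can observe: at most one.
inline int CountPressesSeenByPolling(LatchedKey& key, int tapsBetweenPolls) {
    for (int i = 0; i < tapsBetweenPolls; ++i) { key.Press(); key.Release(); }
    return key.Poll().wasPressed ? 1 : 0;
}
```

With this model, one quick tap between polls is still reported, but a double-tap between polls collapses into a single press, which is the partial unreliability described above.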

#5285358 What is the difference between an Critique and a Review?

Posted by Hodgman on 05 April 2016 - 05:19 PM

A film critic analyses the film with regards to the artistic practices of the medium, deconstructing the work that's gone into creating it and attempting to decode the intentions of the creators.


A film reviewer tells you whether you should go and watch the film or not.

#5285160 D3D12 - Root Signature difference between PSO and CmdList ?

Posted by Hodgman on 04 April 2016 - 07:46 PM

No, a PSO can't change which root sig that it's compatible with after it's been created.


What I meant is that the PSO doesn't necessarily retain that pointer. When creating a PSO, it will use some information that's contained within the root signature, but it doesn't retain a pointer to the root signature itself.


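struct RootSig { int c; };
struct PsoDesc { int a, b; RootSig* root; };
struct Pso { int a, b, c; }; // n.b. no RootSig pointer in here
Pso CreatePso( const PsoDesc& desc )
{
  // Copy what we need out of the root signature; the pointer itself isn't kept.
  Pso newPso = { desc.a, desc.b, desc.root->c };
  return newPso;
}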

#5285154 D3D12 - Root Signature difference between PSO and CmdList ?

Posted by Hodgman on 04 April 2016 - 07:19 PM

When creating a PSO, you have to tell it which root signature it will be used with. The PSO doesn't have to remember this information -- it uses this information to compile/optimize the new PSO, and then is able to forget that pointer, with the assumption that you will only use this PSO with a compatible root signature later on.


When you bind a PSO to a command list, it doesn't also bind a root signature, because the PSO doesn't necessarily have that information (it's allowed to forget).

However, you have to remember. When you bind a PSO, you have to make sure that you also bind a compatible root signature.


As an example, you could have multiple PSOs that share a root-sig:
PSO1 - compatible with Root A
PSO2 - compatible with Root A
PSO3 - compatible with Root B
You could then have a draw command stream that looks like:
Bind(PSO2) -- n.b. no rebinding of the root sig here :)
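As a rough sketch of the bookkeeping this implies on the application side: the app remembers which root sig each PSO was built against and only rebinds the root sig when it actually changes. None of these types are the real D3D12 ones; `CommandRecorder`, `Draw` and the `compatibleRoot` field are hypothetical stand-ins.

```cpp
#include <cassert>

// Hypothetical stand-ins, not the D3D12 types.
struct RootSig { int id; };

// The application keeps the PSO -> root sig association itself,
// because the PSO is allowed to forget it.
struct Pso { int id; const RootSig* compatibleRoot; };

struct Draw { const Pso* pso; };

class CommandRecorder {
public:
    void Submit(const Draw& d) {
        // Rebind the root sig only when the next PSO needs a different one.
        if (d.pso->compatibleRoot != boundRoot_) {
            boundRoot_ = d.pso->compatibleRoot;
            ++rootBinds_;
        }
        if (d.pso != boundPso_) {
            boundPso_ = d.pso;
            ++psoBinds_;
        }
        ++draws_;
    }
    int rootBinds() const { return rootBinds_; }
    int psoBinds() const { return psoBinds_; }
    int draws() const { return draws_; }

private:
    const RootSig* boundRoot_ = nullptr;
    const Pso* boundPso_ = nullptr;
    int rootBinds_ = 0, psoBinds_ = 0, draws_ = 0;
};
```

Drawing with PSO1, PSO2 and then PSO3 from the list above would bind three PSOs but only two root sigs, since PSO1 and PSO2 share Root A.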

#5285150 Questions about D3D12_GPU_VIRTUAL_ADDRESS, D3D12_CPU_DESCRIPTOR_HANDLE, and D...

Posted by Hodgman on 04 April 2016 - 06:33 PM


It's because the GPU and CPU may not use the same virtual addresses.


To build on what vlj is saying: under the hood, the GPU most likely runs in GPUMMU mode, where the GPU has its own page tables. This means that the GPU virtual address and the CPU VA will be different. A newer GPU may also run in IOMMU mode, where the GPU and CPU share the same page tables, so in this case the GPU and CPU VA will be the same. It all depends on how the driver is implemented under the hood.


Edit: So to answer OP's question, anything on the command list will use the GPU descriptor handle.

An API like SetGraphicsRootXXXView requires a D3D12_GPU_VIRTUAL_ADDRESS instead of a D3D12_GPU_DESCRIPTOR_HANDLE, so what is the difference from the GPU's point of view? Sorry for being annoying, but I just want to know what's happening under the hood.




These "root views" are different to a normal view object. 

Normally a view descriptor contains a pointer (virtual address), and information about the view such as the format and size.

A "root view" is thinner and more dangerous -- it's just the pointer with no information about the format or size of the data. This means that it can't be used for all situations, but when it can be used, it's more efficient.

To begin with, you should probably avoid these "root views" due to this complexity / danger.
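Here's a small sketch of why the thinner version is more dangerous. These are made-up structs for illustration, not the D3D12 descriptor layout: a view that carries its size can be range-checked, while a bare address can't.

```cpp
#include <cassert>
#include <cstdint>

using GpuVirtualAddress = std::uint64_t;

// Hypothetical "normal" view: a pointer plus information about the data.
struct BufferViewDesc {
    GpuVirtualAddress address;
    std::uint32_t sizeInBytes;
    std::uint32_t strideInBytes;
};

// With a full view, an out-of-range element index can be detected.
inline bool ElementInBounds(const BufferViewDesc& view, std::uint32_t index) {
    return (static_cast<std::uint64_t>(index) + 1) * view.strideInBytes
           <= view.sizeInBytes;
}

// A "root view" is only the address, so any index produces some address
// and nothing is available to validate it against.
inline GpuVirtualAddress ElementAddress(GpuVirtualAddress root,
                                        std::uint32_t index,
                                        std::uint32_t strideInBytes) {
    return root + static_cast<std::uint64_t>(index) * strideInBytes;
}
```

For a 64-byte buffer of 16-byte elements, index 4 fails the bounds check on the full view, but the root-view path just returns an address past the end of the buffer.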

#5285146 (sweet!) New Vulkan features just released!

Posted by Hodgman on 04 April 2016 - 06:12 PM

Could've waited the extra two weeks for April 1st :P :lol:

#5285141 In terms of engine technology, what ground is left to break?

Posted by Hodgman on 04 April 2016 - 05:39 PM

Unity and UE4 are very flexible, very generic, friendly, very powerful & very portable. But this has come at the price of performance, where they pale against a custom-tailored engine (I'm talking between a 4x & 10x difference)

That's really interesting. Could you give an example or two where another engine is that much faster than Unity or UE4?

Almost any proprietary engine used by the big studios :P
Big console games are, more often than not, built on custom tech that's owned by that studio. You hear about some of them, such as Snowdrop, Frostbite, Nitrous, or Dunia, but most just fly under the radar because they're only ever seen or used by the few dozen developers inside the walls of that studio. There are previous-generation engines that could do 50k draw-calls per frame without breaking a sweat, but Unity would shit its pants given the same workload.

That doesn't mean that Unity/UE4 are bad -- they're amazing engines. They really just aren't the pinnacle of engine tech that exists in the world.


Even my small-time engine completely hands-down destroys UE4 and Unity when it comes to rendering efficiency, but they win on tools and ease of use. We had a situation recently where an artist accidentally exported a model as 2000 sub-meshes, in a way where the tools weren't able to re-merge them into a single object. That would cause a serious performance issue in unity, but in our engine we didn't notice because it was chewing through 2000 sub-meshes per millisecond...

#5284960 What Shadow mapping techniques do engines use?

Posted by Hodgman on 03 April 2016 - 10:42 PM

Check out MJP's blog on shadows :)


#5284942 In terms of engine technology, what ground is left to break?

Posted by Hodgman on 03 April 2016 - 07:24 PM

I'm with the other posters who said "workflow" :D

If you're trying to make an MMORPGFPS with a team of one man, then the ideal engine would be based around making all the tasks involved in that game as simple as possible. That might mean that it's got great UIs for quickly tweaking procedural art, or that it's got instant code-reloading for sub-second iteration times... Whatever it is that your team wastes time doing, that's what the engine should aim to fix.

Another answer would be "that killer feature for your innovative new game". If you want to make a game about civil engineering with detailed structure failure mechanics, or a game about fire propagation, or a game about designing aerodynamic vehicles in a wind tunnel, or a game about time loops, etc, etc... there's plenty of ideas like this that are hard to make with existing tech. Once someone builds the tech though, they often become standard.


e.g. I don't think that Achron would've been easy to make in any existing engine. The entire game-state management system needs to keep track of a huge number of parallel versions of the simulation that are slowly erasing each other as the "time waves" progress and rewrite history.

Or in Half Life 2, using a physics engine to have movable game objects was novel, so they showed it off a lot and made puzzles around it. These days it's just expected that games will be using a physics engine though :lol:

Or Gears of War was built around a "snap to cover" system, popularizing that technology to the point where it's now almost expected in many games.

Or Assassins Creed showed off an amazing character controller that used extensive IK and AI to allow simple controls for a very complex movement scheme. Now, that kind of movement is taken for granted by many games.


If you're trying to make a game that's never been done, not only is the technology lacking, but also the terminology/lexicon. It's only in hindsight that we'll see how obvious that bit of tech should be, and everyone starts copying the tech and taking the new game mechanic for granted :)

Which to get true physical simulation would require the application of chaos theory... which means that you'd need to build some sort of monstrosity and hook it up to your computer to assist with these types of simulations. I'm not talking about another computer. I'm talking about a dice roller.

Modern CPUs do actually have RNGs (not PRNGs) built into them now, which source random numbers from true entropy sources.
I used to work on gambling machines, which were often legally required to use true random numbers too. A radioactive or thermal entropy source isn't as impressive to look at as those dice machines, but they are very small and easy to embed in a PC :wink:
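If you want a software dice roller along those lines, standard C++ exposes a non-deterministic source as `std::random_device`. A minimal sketch follows; note that whether it's actually backed by a hardware entropy source is implementation-defined, so treat that part as an assumption about your platform. `RollDie` is just an illustrative name.

```cpp
#include <cassert>
#include <random>

// Rolls a six-sided die using the platform's non-deterministic source.
// Assumption: std::random_device is backed by real entropy on this
// platform; the standard allows it to fall back to a PRNG.
inline int RollDie() {
    static std::random_device entropy;
    std::uniform_int_distribution<int> d6(1, 6);
    return d6(entropy);
}
```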

My work/research focuses on using real-time ray tracing on the CPU, here is a recent paper of mine from i3D2016 showing what is possible. ...
The big problem in transitioning this technology to games at the moment is that you still need most of a 4-core HT CPU to do this interactively. GPU implementation is a possibility but I doubt many game developers want to trade most of their graphics compute time for sound, plus there are issues with the amount of data that needs to be transferred back to the CPU and the latency inherent in this.

Surprisingly, VR is what's finally pushing tech companies to actually take this stuff seriously now, so there's finally some momentum building up here on the game-dev/engine-dev side of things  :)
Have you considered targeting DSP chips like AMD's TrueAudio? Not for the ray-tracing part, but the actual filtering would benefit from that.

To be fair... game processing is kinda hard to do out of single threads.

Not necessarily. Existing single-threaded code is hard to magically make multi-threaded, but there's nothing special about game rules that makes it any harder to multi-thread than any other problem.
The free lunch is over was published in 2004.
Xbox 360 launched in 2005 and forced game developers to target SMP.
PS3 launched in 2006 and forced game developers to target NUMA.
We've been writing multi-threaded games and engines for over 10 years now. Most studios are pretty good at it by now, and have code that automatically works on 1 to 8+ threads.

#5284933 Post your book collection!

Posted by Hodgman on 03 April 2016 - 05:19 PM

BTW Bjarne Stroustrup is the creator of C++, so his books on C++ will of course be very technical and focus on small details. Not the best beginner's material.



Whenever it gets to the end of a financial year and I'm not poor at the time, I usually get inspired to go spend a hundred or two on technical books to make the most of my tax return :lol:  My office bookshelf has:
Introduction to 3D Game Programming with DirectX 12 (this one's in the mail...)
GPU Pro 1/2/3/4/5/6 -- Great if you're an intermediate/advanced graphics programmer
ShaderX 7 -- Actually the same series/editor as above, but different publisher
Mathematics for 3D Game Programming and Computer Graphics*
Real-Time Collision Detection
Real-Time Rendering -- An amazing overview of so many different computer graphics topics.
Real-Time Shadows
Advanced Global Illumination
Physically Based Rendering, Second Edition: From Theory To Implementation -- Goes into extreme depth on building a ray-tracer and explains the concepts very well, if you like reading code... The book contains the full code for one!
Practical Rendering and Computation with Direct3D 11* -- Great D3D11 reference manual
OpenGL Programming Guide: The Official Guide to Learning OpenGL, Version 4.3
OpenGL SuperBible: Comprehensive Tutorial and Reference
OpenGL 4 Shading Language Cookbook
OpenGL ES 3.0 Programming Guide
OpenGL ES 2.0 Programming Guide
Game Programming Patterns
Game Engine Architecture -- good if you're an intermediate programmer and wonder how proprietary engines get built
Code Complete: A Practical Handbook of Software Construction -- a bible that every professional programmer reads (or claims they've read) :D
Clean Code: A Handbook of Agile Software Craftsmanship
Large-Scale C++ Software Design -- very dry, but it's the bible for staying sane in large C++ code-bases.
Effective C++ -- Scott Meyers is very readable and just wants to correct people's bad habits - a must for all C++ programmers.
More Effective C++ -- as above
*"Mathematics for 3D..." and "Practical Rendering..." have authors who are members of this forum, so of course those books are good :wink:

#5284821 Texture sample as uniform array index.

Posted by Hodgman on 02 April 2016 - 10:29 PM

1) Yes.
2) On modern hardware, no.
Get the HLSL compiler to spit out the assembly code of your shaders and have a quick look. Even if you can't really read the assembly, "constant waterfalling" will stand out like a sore thumb... On some older shader models, the asm will look like (pseudocode):

if sample == 0 then value = array[0] else
if sample == 1 then value = array[1] else
if sample == 2 then value = array[2] else
if sample == 3 then value = array[3] else

This shouldn't happen in shader model 4/5... but it did used to be a thing that happened... and obviously was a huge performance hit :wink:
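For comparison, here's the same thing written out on the CPU side as a sketch (plain C++, not shader code; the function names are made up). Both functions return the same value, but the first one costs a compare per possible index while the second is a single indexed load.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// "Waterfalled" lookup: one compare-and-select per possible index.
inline float WaterfallLookup(const std::array<float, 4>& table, int sample) {
    if (sample == 0) return table[0];
    if (sample == 1) return table[1];
    if (sample == 2) return table[2];
    if (sample == 3) return table[3];
    return 0.0f; // out of range
}

// Direct lookup: what you'd expect from hardware that can index constants.
inline float IndexedLookup(const std::array<float, 4>& table, int sample) {
    if (sample < 0 || sample >= 4) return 0.0f; // out of range
    return table[static_cast<std::size_t>(sample)];
}
```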


However, consider using a structured buffer (GL: texture buffer) instead of a constant buffer (GL: uniform array) -- these are a hint to the driver that you'll be reading random subsets within the buffer, whereas a constant/uniform buffer is a hint to the driver that every pixel/vert/etc will require all data contained within the buffer.

#5284691 Dealing with unstable contractors?

Posted by Hodgman on 01 April 2016 - 07:42 PM

you are just forcing artist to make free tests and steal their time

This is actually something that predatory studios do, and artists have to keep their guard up to protect themselves from it.

This guy has probably dealt with that situation before, and for whatever reason, apparently your correspondence triggered his pattern detector and got his guard up.


Yeah as above it could be lost-in-translation issues, but in future, I'd be very upfront about what the outcomes of an art-test will be, and what the time-frames are for those outcomes. If you ever move these goal posts, or fail to define them in the first place, you could easily scare people like this guy has been scared.

#5284595 When would you want to use Forward+ or Differed Rendering?

Posted by Hodgman on 01 April 2016 - 05:47 AM

Those all represent different lighting shader permutations / code-paths in the deferred renderer

Do you handle those with single shader and branches or multiple passes?
I'd assume branching is fine when processing small blocks of pixels with a good chance of few materials.

At the moment I'm just doing a lot of branching. In a previous implementation I broke the screen up into tiles and did conservative branching per tile to run a differently optimized shader per tile.


Forward is nice though in that every shader can be properly optimized. Even if a branch isn't taken for any of your pixels, and even if your GPU can execute branch instructions for less than a clock cycle's cost... that extra unused code still creates GPR pressure, eating up register space and restricting the number of pixels that the GPU can have in flight at a time.
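The arithmetic behind that GPR pressure can be sketched like this. The register file size and wave limit here are illustrative assumptions only, not figures for any particular GPU; real hardware also allocates registers in coarser granules.

```cpp
#include <algorithm>
#include <cassert>

// Illustrative numbers only (assumptions, not taken from any GPU's docs).
constexpr int kRegistersPerLane = 256; // register budget shared by all waves
constexpr int kMaxWavesInFlight = 10;  // scheduler limit

// The shader is allocated registers for its worst-case code path,
// even if that branch is never taken at runtime.
inline int WavesInFlight(int registersPerThread) {
    assert(registersPerThread > 0);
    return std::min(kMaxWavesInFlight, kRegistersPerLane / registersPerThread);
}
```

Under these assumptions, a lean 24-register shader can keep 10 waves in flight, while an uber-shader whose rarely-taken branch pushes it to 84 registers is down to 3, leaving far less work available to hide memory latency.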

#5284581 When would you want to use Forward+ or Differed Rendering?

Posted by Hodgman on 01 April 2016 - 02:22 AM

It may be debatable, but I think using a general PBR shader for everything should do it, and thus the argument of limited material types becomes almost negligible.

My PBR lighting model handles most objects with the default code path... But optionally handles anisotropic reflection (e.g. brushed metal), sub-surface scattering (e.g. skin), back-face transmission (e.g. leaves), retroreflection (e.g. road paint), translucency and diffusion (e.g. frosted glass) all at extra/optional cost.
i.e. Those all represent different lighting shader permutations / code-paths in the deferred renderer, and different material+lighting shaders in the forward renderer.

4 targets? Hoooooooooooooooooooooooow exactly?

4 targets / 128 bits is pretty standard for a gbuffer.

Light accumulation doesn't go in the gbuffer and is a separate FP16/64bit target.

A simple system can get away with two targets:
1: color(rgb), metalness / specularMask
2: normal(rgb), roughness.

A more advanced system can cram a lot into 4 targets:
1: color(rgb), translucency
2: roughness (rg), metalness / specularMask, retroreflection (4bit), clearcoat(4bit)
3: normal/tangent (quaternion in 10_10_10_2)
4: ao, cavity, curvature, id
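As a sketch of how that cramming works at the bit level: two 4-bit values share one 8-bit channel, and a 10_10_10_2 target fits three 10-bit values plus a 2-bit value into 32 bits. The helper names are made up, and the bit order (first component in the low bits) is an assumption for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Two 4-bit values (0..15) sharing one 8-bit channel,
// e.g. retroreflection in the high nibble and clearcoat in the low one.
inline std::uint8_t PackNibbles(unsigned hi, unsigned lo) {
    return static_cast<std::uint8_t>(((hi & 0xFu) << 4) | (lo & 0xFu));
}
inline unsigned UnpackHi(std::uint8_t b) { return b >> 4; }
inline unsigned UnpackLo(std::uint8_t b) { return b & 0xFu; }

// Three 10-bit values and one 2-bit value in a single 32-bit texel.
inline std::uint32_t Pack1010102(std::uint32_t x, std::uint32_t y,
                                 std::uint32_t z, std::uint32_t w) {
    return (x & 0x3FFu) | ((y & 0x3FFu) << 10) |
           ((z & 0x3FFu) << 20) | ((w & 0x3u) << 30);
}

// Four 32-bit render targets per pixel.
struct GBufferTexel { std::uint32_t target[4]; };
static_assert(sizeof(GBufferTexel) * 8 == 128, "4 targets should be 128 bits");
```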

#5284461 Best way to simulate weight distribution between car wheels

Posted by Hodgman on 31 March 2016 - 05:32 AM

What are you going to do once you've calculated the load on each wheel?

The posters above are saying that even if you never write a single line of code that calculates load-per-wheel or weight-transfer, a car simulation can still exhibit these behaviours.
Weight transfer is an emergent property of the simulation.

Let's say the car's centre of mass is towards the front, so at rest, the load will "shift to the front wheels" and the nose will angle down.
We don't have to write any code for these behaviours.
All we need is:
1) a rigid body system with adjustable centre-of-mass and total-mass properties.
2) a gravity force, which accelerates the body downwards.
3) the ability to ray-cast against the ground.
4) a spring model, which uses ray-casting to measure the distance to the ground, computes spring compression from that value, and applies a force to the body from that value.

To begin with, you spawn the car in the air, perfectly flat, and gravity accelerates it downwards.
After some time, all 4 wheels touch, their springs compress, and 4 upward forces are applied.
If the wheels were all equidistant from the centre of mass, no torque would be generated as they'd be cancelled out -- however, the back wheels are further away from the centre of mass than the front ones, so when you call AddForceAtLocation/etc, the rigid body system generates four torque values which sum together to create a slight "pitch-forwards" torque value.
The result is that when the rigid body system applies all the forces/torques for this frame, the car rotates forwards as well as falling down, meaning the front wheels fall further than the back wheels.
In the short term, if you don't also have dampers, the front springs, being more compressed, will push harder than the back ones, causing the load to shift back again -- so the car will rock back and forth a few times... But in the long term, the springs will find equilibrium and come to rest such that the front of the car is lower than the back, due to this torque produced simply from the springs' locations relative to the centre of mass.

The same thing happens when cornering. Say the front wheels' contact patches are exerting a force that goes forward and slightly left (this force is in addition to the spring forces).
Because the contact patches are below the centre of mass, this slightly left force will automatically generate a torque that causes the car to roll clockwise/right (and a yaw-torque that spins the car left, as these left-forces are being generated in front of the centre of mass). Thus when making a hard left turn, weight will transfer to the right wheels.

So you don't have to code this. It magically arises out of rigid body physics by simply applying linear forces at the correct locations.
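Here's a minimal 2D (side-view) sketch of the four ingredients listed above, to show the front/back load split falling out on its own. Everything in it is an assumption for illustration: the parameter values, the flat ground at height 0 standing in for the ray-cast, one spring per axle, and the added damper so that it settles. There's no line that computes weight transfer.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct CarParams {
    double mass = 1200.0;       // kg
    double inertia = 1500.0;    // kg*m^2, pitch axis
    double frontDist = 1.0;     // m, centre of mass to front axle
    double rearDist = 1.5;      // m, centre of mass to rear axle (further away)
    double restLength = 0.5;    // m, uncompressed spring length
    double stiffness = 60000.0; // N/m per axle
    double damping = 5000.0;    // N*s/m per axle
    double gravity = 9.81;      // m/s^2
};

struct CarState { double y, vy, pitch, pitchRate, frontLoad, rearLoad; };

// "Ray-cast" straight down to flat ground at height 0, then turn the
// measured distance into a spring force. Springs can push but never pull.
inline double SpringForce(const CarParams& p, double height, double velocity) {
    const double compression = p.restLength - height;
    if (compression <= 0.0) return 0.0; // wheel is in the air
    return std::max(0.0, p.stiffness * compression - p.damping * velocity);
}

// Spawns the car flat, in the air, and steps it with semi-implicit Euler.
// Positive pitch is nose-up; x points towards the front of the car.
inline CarState Simulate(const CarParams& p, double seconds, double dt = 0.001) {
    CarState s{ 0.7, 0.0, 0.0, 0.0, 0.0, 0.0 };
    const int steps = static_cast<int>(seconds / dt);
    for (int i = 0; i < steps; ++i) {
        // Horizontal lever arms of the two spring mounts.
        const double armF = p.frontDist * std::cos(s.pitch);
        const double armR = -p.rearDist * std::cos(s.pitch);
        // Heights of the mounts above the ground.
        const double hF = s.y + p.frontDist * std::sin(s.pitch);
        const double hR = s.y - p.rearDist * std::sin(s.pitch);
        s.frontLoad = SpringForce(p, hF, s.vy + armF * s.pitchRate);
        s.rearLoad = SpringForce(p, hR, s.vy + armR * s.pitchRate);
        // Linear forces applied at locations: the torque comes for free.
        const double force = s.frontLoad + s.rearLoad - p.mass * p.gravity;
        const double torque = armF * s.frontLoad + armR * s.rearLoad;
        s.vy += force / p.mass * dt;
        s.pitchRate += torque / p.inertia * dt;
        s.y += s.vy * dt;
        s.pitch += s.pitchRate * dt;
    }
    return s;
}
```

After simulating ten seconds or so with these values, the car should come to rest nose-down with the front spring carrying more of the weight than the rear, and the two loads summing to mass times gravity.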