
#5302871 [D3D12] Swapchain::present() Glitches

Posted by Hodgman on Yesterday, 10:55 PM

Are you using fences to make sure that the GPU has finished each frame before you reuse that frame's resources?
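A minimal sketch of the frames-in-flight fence bookkeeping this implies. The type and function names here are illustrative, not from any real codebase; the actual D3D12 calls (ID3D12CommandQueue::Signal, ID3D12Fence::GetCompletedValue / SetEventOnCompletion) are indicated in comments so the ring logic stays self-contained:

```cpp
#include <cstdint>

// Hypothetical frames-in-flight tracker: one fence value per back buffer.
constexpr int kFrameCount = 3; // e.g. triple buffering

struct FrameRing {
    uint64_t fenceValueForFrame[kFrameCount] = {}; // value signalled when frame N's GPU work ends
    uint64_t nextFenceValue = 1;
    int frameIndex = 0;

    // Call right after submitting frame `frameIndex`'s command lists and presenting.
    uint64_t signalEndOfFrame() {
        uint64_t v = nextFenceValue++;
        fenceValueForFrame[frameIndex] = v;        // queue->Signal(fence, v);
        frameIndex = (frameIndex + 1) % kFrameCount;
        return v;
    }

    // Call before reusing frame `frameIndex`'s command allocator / dynamic buffers.
    // Returns the fence value the CPU must wait on first (0 == nothing to wait for yet).
    uint64_t valueToWaitFor() const {
        return fenceValueForFrame[frameIndex];     // if (fence->GetCompletedValue() < v) { SetEventOnCompletion + wait }
    }
};
```

The point is that a frame's allocators and upload buffers are only safe to reset once the fence has passed the value signalled at the end of that frame.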

#5302847 Game-Hobby

Posted by Hodgman on Yesterday, 09:28 PM

I'd recommend making a mod for an existing game rather than a game from scratch here. That way you can start with fully functioning gameplay systems and millions of dollars worth of assets as a starting point, and just modify them to suit your new ideas.


Given the 100km2 requirement, I'd suggest checking out the modding tools for Arma 3.

#5302813 Multi Vertex Buffer And Inputlayout

Posted by Hodgman on Yesterday, 03:06 PM

Yep, that will set them to slots 0 and 1.


The stride is the number of bytes to advance in the data stream to reach the data for the next vertex.

Your first set of attributes would use sizeof(Vertex1) and the second set would use sizeof(Vertex2). You'd also make sure to specify each attrib as coming from stream 0 or 1 (via the InputSlot field in D3D11_INPUT_ELEMENT_DESC).


As for the second question, yes, you can specify just 3 attributes, but, no, if you lie about the stride then the GPU can't iterate through the buffer correctly.
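A small sketch of what those two streams might look like. The vertex structs are hypothetical examples; the matching D3D11 input-element descriptors (which need d3d11.h) are shown in comments, with InputSlot selecting stream 0 or 1:

```cpp
#include <cstddef>

// Hypothetical two-stream layout: stream 0 holds position, stream 1 holds normal + UV.
struct Vertex1 { float position[3]; };
struct Vertex2 { float normal[3]; float uv[2]; };

// The matching D3D11_INPUT_ELEMENT_DESC entries would set InputSlot per stream:
//   { "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, /*InputSlot*/0, 0,                     D3D11_INPUT_PER_VERTEX_DATA, 0 },
//   { "NORMAL",   0, DXGI_FORMAT_R32G32B32_FLOAT, /*InputSlot*/1, 0,                     D3D11_INPUT_PER_VERTEX_DATA, 0 },
//   { "TEXCOORD", 0, DXGI_FORMAT_R32G32_FLOAT,    /*InputSlot*/1, offsetof(Vertex2, uv), D3D11_INPUT_PER_VERTEX_DATA, 0 },
// and IASetVertexBuffers(0, 2, buffers, strides, offsets) would pass:
constexpr unsigned strides[2] = { sizeof(Vertex1), sizeof(Vertex2) };
```

Each stride is the true size of that stream's vertex struct, so the GPU can step both buffers correctly.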

#5302706 you all say oculus rift but why not google glass?

Posted by Hodgman on 26 July 2016 - 10:17 PM

By the way, the surprisingly amazing thing about the HoloLens isn't the holograms - it's the world tracking.

It really does make the Oculus and Vive's IR tracking systems seem like "old technology"!

Hopefully Oculus Touch will address the issue later this year.

The tracked area will still be smaller than the Vive's, due to there only being two cameras in front of you, rather than one projector on either side of you... but the prototype Touch controllers IMHO are waaaay better than the Vive controllers.
Thumb:
* Vive has the thumb touch pad, which is more flexible -- virtual buttons, virtual mouse, virtual thumb-stick, swipe gestures, etc...
* Touch has a traditional thumb-stick and a few buttons, xbox style... which I prefer. Touch also has capacitive touch so it knows whether you've got your thumb resting on the controller or not, allowing you to give a "thumbs up" gesture as an input.
Index finger:
* Vive and touch are both pretty much the same - an analogue trigger. Vive's trigger feels a little firmer. Touch has the capacitive sensor, allowing a "finger guns"/"pointing" gesture as input.
Middle finger:
* Vive has a stupid button that takes an awkward amount of force to squeeze.
* Touch has another analogue trigger, just like the index finger (again with a capacitive sensor for "finger off trigger" detection). Picking up items by squeezing feels far better than with Vive's button. It sounds minor but this made a massive difference in immersion to me.

#5302588 Basic Game Object Communication

Posted by Hodgman on 26 July 2016 - 01:45 AM

3 - KISS

#5302570 D3D11_Create_Device_Debug Question

Posted by Hodgman on 25 July 2016 - 09:23 PM

Where are you looking for the debug output? Just in case you're looking in your console window / etc, it only appears in Visual Studio's 'Output' window.


If you're running Windows 7/8, make sure you've installed the Windows 8.0 SDK.

If you're running Windows 10, you need to separately install the debug layer dlls: http://stackoverflow.com/questions/32809169/use-d3d11-debug-layer-with-vs2013-on-windows-10

#5302562 Create A Gaming Operating System

Posted by Hodgman on 25 July 2016 - 07:40 PM

You could always implement all of this as an application like steam, rather than an entire OS :)

#5302468 Rpg Stats - Temporary Changes, Harder Than I Realised!?

Posted by Hodgman on 25 July 2016 - 06:30 AM

Yep, this can lead to troubling bugs (or exploits) when attributes can change as well -- e.g. if you happen to level up while a buff is enabled.

3. Represent the stats as a stack of operations.
e.g. a stack for your speed stat might contain:
*Slow: Subtract 10
*Base: Set 50

When you evaluate that from the bottom up, you get (50-10) = 40.


When you cast the slow spell, the new instance of that spell adds an operator to speed's stack and retains a handle to it. When the spell ends, it uses that handle to remove the operator that it added earlier.

If the player levels up, you can modify the 'base' item in the stack above, and it will automatically allow you to recompute the new resulting speed value, including buffs.


You can keep a cache of the speed attribute as well as this stack, and update the cache whenever the stack is modified... But never change that cached value -- always change the stack of operators, and let the stack update the cache with the new value.
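The stack-of-operations idea above can be sketched like this. All names here are illustrative, and the handle scheme is just one possible way to let a spell remove its own operator later:

```cpp
#include <algorithm>
#include <vector>

// Each entry on the stack either sets the stat or adds to it.
enum class Op { Set, Add };
struct StatOperator { Op op; int value; int handle; };

class Stat {
    std::vector<StatOperator> stack; // evaluated bottom-up (index 0 = 'Base')
    int nextHandle = 1;
    int cached = 0;

    // Update the cached value whenever the stack changes -- never mutate 'cached' directly.
    void recompute() {
        int v = 0;
        for (const StatOperator& s : stack)
            v = (s.op == Op::Set) ? s.value : v + s.value;
        cached = v;
    }
public:
    int value() const { return cached; }

    // A spell adds an operator and retains the returned handle.
    int push(Op op, int value) {
        stack.push_back({op, value, nextHandle});
        recompute();
        return nextHandle++;
    }

    // When the spell ends, it removes the operator it added earlier.
    void remove(int handle) {
        stack.erase(std::remove_if(stack.begin(), stack.end(),
            [=](const StatOperator& s){ return s.handle == handle; }), stack.end());
        recompute();
    }

    // Level-up: modify the 'base' item at the bottom of the stack.
    void setBase(int value) {
        stack[0].value = value;
        recompute();
    }
};
```

With a base of Set 50 and a Slow of Subtract 10 this evaluates to 40, and levelling up to a base of 60 while the slow is active correctly yields 50.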

#5302393 Fast Way To Determine If All Pixels In Opengl Depth Buffer Were Drawn At Leas...

Posted by Hodgman on 24 July 2016 - 06:16 PM

There's a hardware feature called 'occlusion queries', which do exactly what you're looking for -- determine a yes/no answer to whether something was drawn or not. To find out if there's "holes" in the depth buffer, you can draw a quad that's very far away using an occlusion query, and check if the result is "yes - the quad was visible".

Now I get to my idea - display a few partitions, then CHECK IF ALL PIXELS WERE DISPLAYED, if not, display some more partitions, and so on. That could make it really fast, because it would cut off everything except the first room. The problem is that extracting the depth buffer and checking all 1920x1080 of those little guys is so slow that it would be counterproductive (proven by trying).

 A bigger problem is that the CPU and GPU have a very large latency between them. When you call any glDraw function, the driver is actually writing a command packet into a queue (like networking!), and the GPU might not execute that command until, say, 30ms later. This is perfectly fine in most situations, as the CPU and GPU form a pipeline with huge throughput, but long latency.
e.g. a healthy timeline looks like:

CPU: | Frame 1 | Frame 2 | Frame 3 | ...
GPU: | wait    | Frame 1 | Frame 2 | ...

If you ever try to read GPU data back to the CPU during a frame -- e.g. you split your frame into two parts (A/B) with a read-back operation in between them, you end up with a timeline like this:

CPU: | Frame1.A | wait      |Copy| Frame1.B | Frame2.A | wait      |Copy| Frame2.B | Frame3.A | ...
GPU: | wait     | Frame 1.A |Copy| wait     | Frame1.B | Frame 2.A |Copy| wait     | Frame2.B | ...

Now, both the CPU and GPU spend roughly half of the time idle, waiting on the other processor.
If you're going to read back GPU data, you need to wait at least one frame before requesting the results, to avoid causing a pipeline bubble :(
That means that reading back GPU data to use in CPU-driven occlusion culling is a dead-end for performance.
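A common pattern is to double-buffer the query objects and only ever read the previous frame's result, so the CPU never stalls waiting for the GPU. This sketch models that ring; the names are illustrative and the actual GL calls (glBeginQuery/glEndQuery with GL_ANY_SAMPLES_PASSED, glGetQueryObjectuiv with GL_QUERY_RESULT) are indicated in comments:

```cpp
// Hypothetical ring of occlusion-query slots, read one frame late.
constexpr int kQuerySlots = 2; // double-buffer the query objects

struct QueryRing {
    bool issued[kQuerySlots] = {}; // has a query been issued in this slot yet?
    int frame = 0;

    // Issue this frame's query into its slot:
    //   glBeginQuery(GL_ANY_SAMPLES_PASSED, queries[slot]); ... glEndQuery(...);
    int beginFrame() {
        int slot = frame % kQuerySlots;
        issued[slot] = true;
        return slot;
    }

    // Slot whose result is safe to read without stalling -- the *previous*
    // frame's, via glGetQueryObjectuiv(queries[slot], GL_QUERY_RESULT, &r).
    // Returns -1 if no result is available yet.
    int slotToRead() const {
        int slot = (frame + 1) % kQuerySlots; // the oldest slot
        return issued[slot] ? slot : -1;
    }

    void endFrame() { ++frame; }
};
```

The trade-off is that your visibility information is a frame stale, which is usually acceptable for occlusion culling.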

#5302283 Game Actors Or Input Components?

Posted by Hodgman on 24 July 2016 - 04:15 AM

1) You might have a split-screen / local multiplayer game (pong has two paddles!), or the ability to remote control other entities (bombs, drones, mind control, a 3rd person camera), etc...

2) Yeah, Actor/Entity/Prop/Object are often used interchangeably to mean a "thing" :)

3) 'The Actor Model' is a specific way of writing parallel programs (multithreading/distributed) - it's unrelated to Unreal's actors.

#5302255 Is This A 'thread Deadlock'?

Posted by Hodgman on 23 July 2016 - 08:40 PM

It's simply not valid code  :wink:  :P
If you're sharing a mutable variable between threads, then you need to use some form of synchronization, such as wrapping it in a mutex.
Assuming C/C++: The old-school advice would be: this will work fine if 'done' is volatile, but don't do that (it's not what volatile is for, and it will still be buggy). You can, however, make done a std::atomic and it will work in this particular scenario. An atomic variable is basically one that's wrapped in a super-lightweight, hardware-accelerated mutex, and by default it provides "sequentially consistent ordering" of memory operations.
Without some form of synchronization present, there's no guarantee that changes to memory made by one thread will be visible to another thread at all, or in the correct order.
Ordering matters a lot in most cases, e.g. let's say we have:
result = 0
done = false

Worker:
  result = 42; // do some work
  done = true; // publish our results to the main thread

Main:
  Launch Worker
  while(!done) {;} // busy wait
  print( result );
If implemented correctly, this code should print '42'.
But, if memory ordering isn't enforced by the programmer (by using a synchronization primitive), then it's possible that Main sees a version of memory where done==true, but result==0 -- i.e. the memory writes from Worker have arrived out of order.
Synchronization primitives solve this. e.g. the common solution would be:
result = 0
done = false

Worker:
  lock(mutex)
  result = 42;
  done = true;
  unlock(mutex)

Main:
  Launch Worker
  loop:
    lock(mutex)
    isDone = done
    unlock(mutex)
    if isDone then break
  print( result );
Or the simplest atomic version... which honestly you should only try to use after doing a lot of study on the C++11 memory model and the x86 memory model (and the memory models of other processors...) because it's easy to misunderstand and have incorrect code :(
result = 0
done = false

Worker:
  result.AtomicWrite(42, sequential_consistency)
  done.AtomicWrite(true, sequential_consistency)

Main:
  Launch Worker
  while(!done.AtomicRead(sequential_consistency)) {;} // busy wait
  print( result.AtomicRead(sequential_consistency) )
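For reference, the atomic pseudocode above maps onto C++11's std::atomic like this (a minimal sketch; the default memory order is sequentially consistent, which guarantees that if the main thread sees done == true, it also sees result == 42):

```cpp
#include <atomic>
#include <thread>

std::atomic<int>  result{0};
std::atomic<bool> done{false};

void worker() {
    result.store(42);  // do some work
    done.store(true);  // publish our results to the main thread
}

int waitForResult() {
    std::thread t(worker);
    while (!done.load()) {} // busy wait -- fine for a demo, wasteful in real code
    t.join();
    return result.load();
}
```

In real code you'd usually block on a condition variable or futex rather than spin, but the ordering guarantees are the same.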

#5302158 Deferred context rendering

Posted by Hodgman on 23 July 2016 - 08:23 AM

Hodgman: would I be able to run a parallel thread on the GPU? If I did the procedural generation on the GPU, I would have to write a very complex shader to do that.

No... well, this is what "async compute" does in Dx12/Vulkan, but you still shouldn't use it for extremely long-running shaders.

Any task that can be completed by preemptive multi-threading can always be completed by co-operative multi-threading, it's often just harder to write certain problems with one model or the other... In other words, you can break up a very expensive task into a large number of very cheap tasks and run one per frame (it just might make your code uglier).

e.g. I did a dynamic GPU lightmap baker on a PS3 game, which took about 10 seconds of GPU time to run... so instead, I broke it up into 10000 chunks of work that were about 1ms each, and executed one per frame, producing a new lightmap every ~2 minutes :wink:
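The chunking approach above can be sketched as a tiny co-operative job. This is an illustrative stand-in -- here the "expensive task" is just summing N numbers, where a real lightmap baker would process one batch of texels per tick:

```cpp
// Hypothetical long task amortized over many frames.
struct ChunkedJob {
    int total, chunkSize, cursor = 0;
    long long sum = 0;

    ChunkedJob(int total, int chunkSize) : total(total), chunkSize(chunkSize) {}

    // Call once per frame; does one cheap slice of work.
    // Returns true once the whole job has finished.
    bool tick() {
        int end = cursor + chunkSize < total ? cursor + chunkSize : total;
        for (; cursor < end; ++cursor)
            sum += cursor; // the ~1ms slice of real work
        return cursor == total;
    }
};
```

The code gets uglier than doing the work in one go -- you have to carry the cursor and partial results across frames -- but the frame rate never hitches.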

#5302147 My First Videogame Failed Conquering The Market

Posted by Hodgman on 23 July 2016 - 05:56 AM

Sorry this isn't more constructive, because I don't have the art education to be specific, but -- the visual aspect of your game and website is unappealing.

Most people will make an instant judgement based on the first visual that's presented to them, and the colours, the composition, the style here just don't come together to make a beautiful piece of art.


It would be a very good investment to hire an experienced concept artist to visualize what the game could/should look like early on in production, and use that to produce a style guide / art bible for the rest of the production phase. You can also get a concept artist to "paint over" screenshots at any point in time to show you what they should look like, and then use that to focus on improving the art.

Same goes for the website -- an experienced graphic designer and UI/UX person would be invaluable.

#5301959 Matrix Calculation Efficiency

Posted by Hodgman on 22 July 2016 - 08:22 AM

Right now I can measure time in NSight's "Events" window with nanosecond precision and can't see a performance gain between the shaders.
Is there a way to measure the difference in a finer way?

Well, there are two explanations -
1) NSight can't measure the difference.
2) There is no performance difference...

It could be that when the driver translates from D3D bytecode to native asm, it's unrolling the loops, meaning you get the same shader in both cases.
It could be that branching in a GPU these days is free as long as (a) the branch isn't divergent and (b) is surrounded by enough other operations that it can be scheduled into free space.

e.g. on that latter point, this branch won't be divergent because the path taken is a compile-time constant. I'm not up to date with NV's HW specifics (and they're secretive...), but on AMD HW, branch set-up is done using scalar (aka per-wavefront) instructions, which are dual-issued with vector (aka per-thread/pixel/vertex/etc) instructions, which means they're often free as the scalar instruction stream is usually not saturated.

#5301878 Matrix Calculation Efficiency

Posted by Hodgman on 21 July 2016 - 11:48 PM

Simple answer: yes - doing multiplication once ahead of time, in order to avoid doing it hundreds of thousands of times (once per vertex) is obviously a good idea.


However, there may be cases where uploading a single WVP matrix introduces its own problems too!

For example, let's say we have a scene with 1000 static objects in it and a moving camera.

Each frame, we have to calculate VP = V*P, and then perform 1000 WVP = W * VP calculations, and upload the 1000 resulting WVP matrices to the GPU.

If instead we sent W and VP to the GPU separately, then we could pre-upload 1000 W matrices one time in advance, and then upload a single VP matrix per frame... which means the CPU would do 1000x less matrix/upload work in the second situation, but the GPU would do Nx more matrix multiplications, where N is the number of vertices drawn.


The right choice there would depend on the exact size of the CPU/GPU costs incurred/saved, and how close to your GPU/CPU processing budgets you are.
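The CPU side of the first option can be sketched like this, assuming a toy row-major 4x4 multiply (a real engine would use SIMD or a math library; the helper at the end just counts the multiplies described above):

```cpp
#include <array>

using Mat4 = std::array<float, 16>; // row-major 4x4

// Naive matrix multiply, for illustration only.
Mat4 mul(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r[i*4 + j] += a[i*4 + k] * b[k*4 + j];
    return r;
}

// Option A from the post: compute VP = V*P once per frame,
// then one WVP = W * VP multiply (and one upload) per object.
int wvpMultipliesPerFrameOnCpu(int objectCount) {
    return 1 /* V*P */ + objectCount /* W * VP each */;
}
```

Option B keeps the CPU at one VP upload per frame but pushes the W*VP work onto the GPU, once per vertex -- which is exactly the trade-off to measure.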