LeGreg

  1. It is asynchronous (meaning it is submitted on the CPU timeline, but it will not actively wait for the GPU timeline to complete the task before returning), and the commands get sent right away (it doesn't wait for anything). What happens is that your program lives and executes in user land. The Windows DDI does not (currently) allow submitting commands directly from user land. Because of that, submitting commands to the GPU triggers a user/kernel transition, which is a bit expensive to do at every draw call. Once upon a time, because of this user-land limitation, commands would be batched by the runtime and driver and submitted all at once at unpredictable times (not immediately); some commands would force this submission to happen, but that was not how commands were typically submitted. Now with dx12 YOU control the rate of submission, and Execute calls are submitted immediately. So you are the one making the judgement call to build batches of commands (through command lists) big enough to not trigger the user/kernel transition too often.

     But will the GPU see the command immediately after the submission? Well, it depends. If there's nothing in the pipe being rendered, then that command list could be seen immediately by the GPU. If there is still work to be executed, then it will be put into a queue for execution. There's actually a higher-priority queue that will take up any new work posted there before the other work posted by normal apps is looked at (as an application writer you should not worry about that detail).

     Present() will stall, but only if you hit the render limit you set (or the one set by the API). Typically, by default, three frames of GPU work can be submitted before a Present() call will stall. It is there to prevent the CPU from getting further ahead than is practical. You can control that rate, and sometimes it is encouraged to do so to limit latency (the time it takes for an input to be taken into consideration and have a visible effect on the end user's monitor); a sketch of one way to do that follows at the end of this post. That stall does not need to consume CPU power (the thread can be paused and then resumed at the next vblank), but your app will be stuck in that thread during that time (which can be okay or not; that's up to you). Because it doesn't consume CPU power, your OS/CPU can either run another thread that still has work to do, or go into an idle mode that does not consume as much electricity.

     There's the notion of API order, and of multiple timelines. In a given timeline, things are ordered in the order they are submitted to that timeline. If in your timeline you submitted the Signal() AFTER the Execute(), then you are guaranteed that the Execute() is all done when you receive the message that the Signal() has completed. This is really important, as you're using fences before you recycle, reset, or destroy resources, and those can't still be in use by the GPU when you do.

     It's not a race condition, because if the condition becomes true after the if() is taken, then WaitForSingleObject() will simply return immediately. This code is functionally equivalent to the one you posted:

     // Signal and increment the fence value.
     const UINT64 fenceToWaitFor = m_fenceValue;
     ThrowIfFailed(m_commandQueue->Signal(m_fence.Get(), fenceToWaitFor));
     m_fenceValue++;

     // Wait until the fence is completed.
     ThrowIfFailed(m_fence->SetEventOnCompletion(fenceToWaitFor, m_fenceEvent));
     WaitForSingleObject(m_fenceEvent, INFINITE);

     But it doesn't do a quick early check of the completed value against fenceToWaitFor; as a consequence it will set the event every time (you can see that as a missing mini-optimization, it does not change the meaning of the code).
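     About controlling how far the CPU runs ahead: here is a minimal sketch, assuming a swap chain created with the DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT flag (dxgi1_3.h, IDXGISwapChain2) and reusing the ComPtr/ThrowIfFailed helpers from the code above; m_swapChain is a placeholder for your own swap chain.

     Microsoft::WRL::ComPtr<IDXGISwapChain2> swapChain2;
     ThrowIfFailed(m_swapChain.As(&swapChain2));
     ThrowIfFailed(swapChain2->SetMaximumFrameLatency(2));   // allow at most 2 queued frames
     HANDLE frameLatencyWaitable = swapChain2->GetFrameLatencyWaitableObject();

     // Per frame: block here (without burning CPU) until the swap chain can accept
     // another frame, instead of stalling at some unpredictable point inside Present().
     WaitForSingleObjectEx(frameLatencyWaitable, INFINITE, FALSE);
     // ... record and execute command lists, then Present() ...

     That way the wait happens at a point you choose (right before you sample input, for example), which is what keeps the perceived latency down.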
  2. LeGreg

    Off Center Projection ?

      It's actually *really* easy. Your vertex shader simply outputs normalized screen coordinates (from -1 to 1) multiplied by W. If you want to simulate the same shift in screen coordinates that you would get by shifting in Photoshop, you take the computed screen coordinates (X/W, Y/W) and add the necessary constant amount to each one.

      So to sum up, if you want to shift by (Xo, Yo) in screen coordinates, you do this (after you've done all the transforms!):

      output.position.x += Xo * output.position.w;
      output.position.y += Yo * output.position.w;

      (Note: z and w won't change, obviously.)
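      If you would rather leave the vertex shader untouched, the same shift can be baked into the projection matrix on the CPU. A minimal sketch, assuming DirectXMath and the usual row-vector convention where w_clip equals view-space z (as produced by XMMatrixPerspectiveFovLH); MakeShiftedProjection is just an illustrative helper name:

      #include <DirectXMath.h>
      using namespace DirectX;

      XMMATRIX MakeShiftedProjection(float fovY, float aspect, float nearZ, float farZ,
                                     float Xo, float Yo)   // shift in NDC units (-1..1)
      {
          XMMATRIX proj = XMMatrixPerspectiveFovLH(fovY, aspect, nearZ, farZ);
          XMFLOAT4X4 m;
          XMStoreFloat4x4(&m, proj);
          // w_clip equals view-space z here, so adding Xo to the element that maps
          // view z into clip x adds Xo * w to x_clip -- the same shift as in the shader.
          m._31 += Xo;
          m._32 += Yo;
          return XMLoadFloat4x4(&m);
      }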
  3. I'd recommend resetting the command allocator as soon as you're able to do it (but not sooner!). As mentioned, command allocators hold resources, so if your game has a high resource usage that puts more pressure on them. That typically means an allocator used at frame N will be reset at the earliest at frame N + M (M being how many frames ahead your CPU is compared to what your GPU has done).

     On the other hand, an allocator used for a bundle will not be reset until that bundle stops being reused. If it's the same bundle that is used for the whole existence of your application, then the allocator will not be reset before the end of your application (so you also have to be careful not to mix bundles and command lists that have very different lifecycles in the same allocator).

     Also, about the cost of doing a reset: doing a reset per frame is NOT costly. Command allocators have been designed to have a lower CPU overhead when recycling resources, which means you can reset one and recycle allocations more quickly than without an allocator (which is what dx11 was doing). Allocator reuse is an integral part of the reduction of allocation cost over time.

     Silly example: you have 8 threads building command lists. You could have, for example, one allocator per thread per in-flight frame. If there are two frames between a command being added to a command list and that command resulting in something on the screen (using a fence to guarantee that), then you start with 16 command allocators. On frame N you use the first 8 allocators, on frame N+1 you use the next 8 allocators, then before frame N+2 starts you wait for the fence that signals the end of frame N, reset the first 8 allocators, and so on (a sketch of this follows below). Then on top of that you have bundles that you built for reuse. If you have bundles that will last for the whole level (for example!), you have a dedicated allocator for them (if you built them in parallel, then you will have one dedicated allocator per thread). Then you keep those allocators around until the level ends and those bundles are no longer used by any command list that is still in flight.

     (We're using the term "frame" loosely here; it's any unit of GPU work whose completion you're keeping track of.)
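     A minimal sketch of that allocator recycling, assuming 8 worker threads and 2 frames in flight; frameFence[] and WaitForFence() are hypothetical placeholders for however you track per-frame fence values:

     constexpr int kNumThreads = 8;
     constexpr int kFramesInFlight = 2;

     // One allocator per thread per in-flight frame: 16 in total.
     Microsoft::WRL::ComPtr<ID3D12CommandAllocator> allocators[kFramesInFlight][kNumThreads];

     void BeginFrame(UINT64 frameIndex)
     {
         const int slot = static_cast<int>(frameIndex % kFramesInFlight);

         // Before reusing this slot, make sure the GPU has finished the frame
         // that last used it (fence value recorded when that frame was submitted).
         WaitForFence(frameFence[slot]);

         // Now nothing the GPU still reads lives in these allocators, so Reset() is safe.
         for (int t = 0; t < kNumThreads; ++t)
             ThrowIfFailed(allocators[slot][t]->Reset());
     }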
  4. LeGreg

    Bloom Flickering

    Are you downsampling in a high dynamic range format? Don't forget that half of an overbright value can still be overbright, so if your process introduces non-linearities, those overbright values do not get filtered the way you wish they would.
  5. LeGreg

    PBR Metalness equation

    "Fresnel is a scalar.." Not really. Fresnel is a "complex" function (as in complex number : two scalar components), one scalar function for orthogonal component and one scalar function for the parallel component of light (light is a wave in a 3D space). It is a function, not a single number, because materials have very different reaction to light based on its wavelength. Some materials will reflect some wavelength more than an other. So when you describe F0 as a color with three scalar channels, you already simplified the problem a lot . The reason why metals who should be entirely reflective on all wavelengths are not is because of the underlying physics (it takes different amounts of energy to move electrons in their lattice of atoms). This is where Gold or Copper get their color. Also some other metals get their colors because of destructive interference (some thin oxidation layer for example at the surface, or metal plating, and so on). Some will have a color coating (paint, and so on).   Not really.. Also the idea behind PBR is NOT to compare your output to another PBR renderer, but go back to first principles of light. Of course (almost) nobody does that and everybody is using the same biased equations as a (or several) seminal paper . Anyway : cubemaps and (apparent) point lights are two approximations of the underlying physics on the opposite end of a spectrum. One work well for some light sources and the other one work better with other light sources. It's the trade offs you have to do to achieve your real time performance (even offline renderers have limits !). Basically you want to avoid a solution that is too generic as to prevent you to do useful approximations. Which is why it is useful to segregate light sources into different types. In PBR, the idea is that you make informed approximations instead of ad-hoc ones. But you still make them. (And ideally you'd have also figured where those approximations break down).   Lighting equations have the form of integrals (A big sum over a continuous range). The summed terms are usually much simpler and yes we still use the notion of a normal and incident light (in the microfacet model or the BRDF model). The (pre-)blurred environment map is used as a quicker implementation that may or may not be accurate depending on the situation.   There is a light direction inside the integrals. (we sum over all possible incoming light directions). But since it's impractical to sum that integral in real time you try to pre-sum as much as possible and yes sometimes you get an estimated pseudo-"light direction" that may or may not accurately reflect how the light really interacts with the material.   You have to be careful to not have your perceptions tinted by your limited experience. Anyway cubemaps are a useful tool in real time rendering so yes they're used often.   The point is : if your material is already perfectly fine with a simpler model, then there is no need to make it more complex than that. BUT 1- very few materials act as perfect mirrors, so that insight will be useful for a very small fraction of your scene. 2- In theory you could go back to the first principles and re-derive the effect of electrons of the glass and metal on the photons to find out why your mirror material is fine as it is. This is probably not needed in your case but you could if you were so inclined (and I'm sure there are physicists out there who HAVE to do that in order to create better mirrors for specific applications).     
    Addenda: if the idea of separate approximations is strange to you, here's an illustration (for an offline renderer). You can see that, based on the dimensions of the light and the type of surface (roughness), the sweet spot in terms of artifacts will be at a very different place. So the whole problem you have to solve is how to be generic (to accommodate different environments), but not so generic that you miss the sweet spot for two common categories of lighting + material combinations. In real time it often means you'll have to keep some approximations separate (for example low-frequency ambient vs bright quasi-punctual lights).
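    For reference, the integral mentioned above is the reflectance part of the rendering equation; the (n . omega_i) factor and the integration over all incoming directions omega_i are the "normal" and the "light direction" discussed here:

    L_o(x, \omega_o) = \int_{\Omega} f_r(x, \omega_i, \omega_o) \, L_i(x, \omega_i) \, (n \cdot \omega_i) \, d\omega_i

    Real-time approximations (pre-blurred cubemaps, analytic point lights) are different ways of pre-summing or simplifying parts of that integral.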
  6. LeGreg

    PBR Metalness equation

      There are two elements to this question.

      First, a non-metal material will tend to have a dimmer reflection of the surroundings (because some of the light will be absorbed and/or diffused). You can usually still see bright spots because the intensity of point lights is so much higher than the surroundings.

      Second, that color F0 is the minimum total reflection. If you are seeing things from a grazing angle, the reflection is closer to one (if you take the Fresnel effect into account).

      Here's an illustration: http://media.codermind.com/lighting/dielectric-ggx-rough-0-2.mp4 There's a bright spot from the sun; it is dim when it is facing the camera but becomes much whiter as it is seen from increasingly grazing angles at the edges of the sphere.
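      A minimal sketch of the Schlick approximation that this F0-plus-grazing-angle behaviour is usually implemented with (F0 is the per-channel reflectance at normal incidence; the result rises towards white at grazing angles whatever F0 is):

      #include <cmath>

      struct Float3 { float r, g, b; };

      // cosTheta = saturate(dot(N, V))
      Float3 FresnelSchlick(Float3 F0, float cosTheta)
      {
          float f = powf(1.0f - cosTheta, 5.0f);
          return { F0.r + (1.0f - F0.r) * f,
                   F0.g + (1.0f - F0.g) * f,
                   F0.b + (1.0f - F0.b) * f };
      }

      Typical values: a dielectric like plastic has F0 around (0.04, 0.04, 0.04), gold something like (1.0, 0.71, 0.29); both tend towards (1, 1, 1) as cosTheta goes to 0.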
  7. It's not just the ability to separate the color of two adjacent pixels (we may be close to that limit already); there are effects that are still present:

     - The human eye is very good at seeing contrasted edges and narrow variations in brightness, and we are very far from the situation where they are not present (narrow features will still appear ropey: http://www.massal.net/article/hdr/roping.gif).

     - The ordered grid of the screen's pixels is not good at hiding moiré. Ideally each pixel would have a slightly random position relative to its neighbours instead of them all being equidistant. Until we have that, you will always find content that shows moiré (moiré can also happen in real life, but the choice of how a scene is rendered and displayed should not add more).

     Don't forget that texture pre-filtering (then anisotropic filtering on top) is also used to hide aliasing: the antialiasing most people think about, applied on geometric edges, is a small part of what is done to reduce aliasing on the screen.

     So yes, in theory antialiasing is still going to be useful (would you get rid of mipmaps, for example?). In my mind, antialiasing has a better payoff than purely increasing the resolution: it can be done in a pre-computed pass (mipmaps), as a post-processing pass (intra- and inter-frame), analytically (LEAN/LEADR), and so on. Keep in mind that brute-force increases in resolution also have diminishing returns.
  8. LeGreg

    HDR gamma correction

    Additional note: while people have pointed out that if your data is in a non-linear sRGB format, filtering it directly is going to be incorrect (resulting in more aliasing or weird differences in brightness), doing an sRGB-to-linear conversion before filtering does NOT give you a correct result either, especially when a higher dynamic range is involved. Arguably you can't get a correct (perceptual-space) result with pre-filtering at all.

    Pre-filtering is done in two places: first when computing the mipmaps (which can be done offline), second when doing the bilinear taps/anisotropic filtering. Those two steps assume (wrongly in the case of HDR content) that the filtering steps and the shading steps can be done in any order. The only time they are (strictly) commutative is when the shading step is a linear operation. The tonemapping alone is not going to be linear, except approximately on small ranges. For example, on a sigmoid curve (if your tone mapping function looks like a sigmoid), there's a narrow middle band where things look almost linear/affine, but near zero or at higher intensities it flattens out, and we lose the commutativity on any range of values that gets near those regions (see the small numeric example below).

    What can you do? Very little, actually (you can live with it). Unless you forgo hardware filtering completely, you have to rely on this pre-filtering no matter what. Doing it in "linear" (lighting) space is slightly less wrong than doing it in sRGB space, but not by much. You could, in theory, do supersampling (doing all the operations at a higher resolution, then downsampling) so that the textures are filtered in perceptual space a little bit, but it's usually considered too expensive to do by default. You can also ask your artists to make textures that are very flat and don't have a very high contrast (at all levels of minification); that way, no matter what calculations you do in your shaders and tonemapping phase, the final values are going to be close together, and the function that transforms one into the other can be approximated with a linear function. That's of course very limiting and may not fit your content at all. You could also, in theory, use a larger support for your filtering function (at the cost of extra blurriness).

    (This problem also affects other types of calculations that are not linear or affine, like lighting calculations from normal maps, or anything else we do in shaders these days.)

    The end result is not necessarily going to be super wrong, of course. But you may end up with more aliasing and artifacts than you would have liked.
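    A small numeric illustration of that non-commutativity, using the simple Reinhard curve x / (1 + x) as a stand-in tonemap (any sigmoid-like curve behaves similarly):

    #include <cstdio>

    float Tonemap(float x) { return x / (1.0f + x); }

    int main()
    {
        float a = 0.1f, b = 8.0f;  // one dark texel, one very bright (HDR) texel

        float filteredThenToned = Tonemap((a + b) * 0.5f);           // what pre-filtering gives you
        float tonedThenFiltered = (Tonemap(a) + Tonemap(b)) * 0.5f;  // the perceptual-space average

        std::printf("%f vs %f\n", filteredThenToned, tonedThenFiltered);  // ~0.80 vs ~0.49
    }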
  9. LeGreg

    Banding Problem

    When people say "texture", it's more that you add high-frequency details to fool the eye.

    You need:

    1- To take a hard look at your whole pipeline and find any place that could reduce precision. If you have multiple passes of blending, be careful when you take 8-bit-per-channel data and increase its brightness in a later pass (by multiplying by a value bigger than 1). If you have textures, make sure they don't already contain banding (that they are stored in sRGB format and have enough high-frequency detail, especially if they are compressed textures, which could have reduced their precision).

    2- If you have perf to burn, do all your calculations in a higher precision (use a 16-bit float, 11/10-bit float, or 10-bit integer render target), then as a last pass convert to the 8-bit-per-channel format, either with a straight conversion (if the banding was introduced by the aforementioned blending or rescaling) or with dithering (see the sketch below).

    In what sense? By default your desktop is displayed (implicitly) in sRGB format (based on the default assumption about your monitor). That format increases the precision in the dark areas, which is where most banding is visible. So the recommendation would be to NOT touch the default color correction/color space. Changing the monitor/scan-out color space to a "linear" format (with SetGammaRamp() for example) will definitely increase banding (linear is still recommended for lighting computations, which is where the hardware sRGB capability, or a shader-based conversion, comes in). If you have to make brightness adjustments, try to do them in the shader as much as possible rather than playing with the output ramp.
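    A minimal sketch of the dithered conversion to 8 bits, assuming the value arrives in [0, 1] after all the higher-precision passes. The noise source here is just rand() to keep it short; in a real renderer you would typically use a blue-noise texture or a per-pixel hash instead.

    #include <cstdlib>
    #include <cstdint>

    uint8_t QuantizeWithDither(float value01)
    {
        // Triangular dither of about one 8-bit step: it decorrelates the
        // quantization error and turns visible bands into fine grain.
        float r1 = rand() / (float)RAND_MAX;
        float r2 = rand() / (float)RAND_MAX;
        float dither = r1 - r2;                  // triangular distribution in (-1, 1)

        int q = (int)(value01 * 255.0f + dither + 0.5f);
        if (q < 0)   q = 0;
        if (q > 255) q = 255;
        return (uint8_t)q;
    }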
  10. CBVs match the old "shader constants" paradigm. Shader constants are, as their name indicates, constants. They take advantage of that fact: for example, since they do not vary per shader "thread", you do not need dedicated storage for each thread in hardware; instead all threads can refer to the same value in memory (GPUs can have a thousand "threads" executing in lock step, all referring to the same "constant").

      SRVs match the old "texture" paradigm. Texture coordinates are typically interpolated along the primitive (triangle...), which means the fetched values vary per pixel thread (if you're writing a pixel shader). Each thread then needs to have its specific value loaded at execution, which is a bit more involved. In addition, textures require filtering, format conversion, and other extra steps that constants do not have. So the pipeline involving textures is going to be a bit deeper (and may require more latency-hiding effort).

      So if your data falls clearly into one of those usage patterns, that tells you whether you should prefer "constants" or "textures". Nowadays usage patterns can fall in between the two, in which case you may have to profile if it's a close call on paper between the two choices.
  11. Shadowmaps are useful for shadow-casting objects and lights. Modern games still do not have a lot of shadow-casting lights, as that would tank performance (or you can have shadow casting for "free" in something like voxel GI with many light sources... but that in itself is expensive, so not really "free").

      What usually happens in games is that lights that cast shadows are either hand-optimized (one per scene or per subsection) or automatically selected (the engine picks the closest light source, or the brightest one relative to distance <- e.g. the sun would always be far but is typically the brightest; see the sketch below).

      Ideally you profile based on your minimum spec (or provide a slider so that you can scale perf for a wider range of hardware), and that lets you determine how many shadow maps you want (at most) per scene. If you subdivide your scene into subsections, it can also be useful to have hard fall-offs for lights rather than physically accurate ones (which would only decrease to zero at infinity).
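      A minimal sketch of that "brightest relative to distance" selection; the Light struct and field names are hypothetical placeholders, and a directional sun would normally be handled as a special case rather than scored this way:

      #include <vector>
      #include <utility>
      #include <algorithm>

      struct Light { float x, y, z; float intensity; };

      // Return the indices of up to maxShadowCasters lights with the highest
      // apparent brightness at the camera (intensity over distance squared).
      std::vector<size_t> SelectShadowCasters(const std::vector<Light>& lights,
                                              float camX, float camY, float camZ,
                                              size_t maxShadowCasters)
      {
          std::vector<std::pair<float, size_t>> scored;
          for (size_t i = 0; i < lights.size(); ++i)
          {
              float dx = lights[i].x - camX, dy = lights[i].y - camY, dz = lights[i].z - camZ;
              float distSq = std::max(dx * dx + dy * dy + dz * dz, 1e-4f);
              scored.push_back({ lights[i].intensity / distSq, i });
          }
          std::sort(scored.begin(), scored.end(),
                    [](const auto& a, const auto& b) { return a.first > b.first; });

          std::vector<size_t> result;
          for (size_t i = 0; i < scored.size() && i < maxShadowCasters; ++i)
              result.push_back(scored[i].second);
          return result;
      }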
  12. The closer you get to "reality", the blurrier the boundary between those lighting techniques becomes. "Ambient" historically described one approximation of lighting (non-spatially-varying lighting), and "direct" lighting described another (highly spatially varying lighting). They are useful because they allow very fast lighting calculations on hardware that is never fast enough, but as scenes become more complex and/or closer to what reality looks like, these distinctions may stop being useful.

      Image-based lighting does not in itself contain any indication of an approximation; it indicates a capture method (rather than describing the scene lighting through functions and "light objects", you evaluate it through a captured array of light intensities <- an "image"). It's not exclusive: you can combine it with any other method, recompute it dynamically, and then apply to that captured data approximations similar to the ambient/direct ones we talked about earlier (you could extract direct lights from an image, compute an average ambient term from an image, or do something in between).

      So these terms can describe different things: what the predominant approximation is, how things are stored, how things are captured, and so on.
  13. LeGreg

    DX12 SkyBox

    If your problem is how, in practice, to draw the skybox as the last thing (without it intersecting other geometry), you have many solutions to choose from. You can use one depth range to draw the main scene and another range to draw the skybox. You can modify your vertex shader to make sure your skybox always ends up near the max z value (not exactly the max z value, as that could result in z-fighting and clipping on some hardware). You can also render your skybox as a quad near the max z value (a cube has varying depth unless you use the previous tip; the quad can be drawn at constant depth) and use texture coordinates to point to the right texel in the cubemap, and so on. A sketch of the matching depth state follows below.

    If you're unsure how to modify your vertex shader, remember that the W value you write determines the proper position on the screen (perspective correction and X/W, Y/W screen coordinates after a conic projection), and that the Z/W value is what ends up being written to the depth buffer and/or tested against it.
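    A minimal sketch of the depth state that pairs with drawing the sky last near the far plane, assuming the vertex shader pushes the sky's depth to (almost) z/w = 1; field names are from the D3D12 pipeline state desc and the rest of the PSO setup is omitted:

    D3D12_GRAPHICS_PIPELINE_STATE_DESC skyPso = {};
    // ... shaders, root signature, render target formats, etc. ...
    skyPso.DepthStencilState.DepthEnable    = TRUE;
    skyPso.DepthStencilState.DepthWriteMask = D3D12_DEPTH_WRITE_MASK_ZERO;      // the sky never occludes anything
    skyPso.DepthStencilState.DepthFunc      = D3D12_COMPARISON_FUNC_LESS_EQUAL; // still passes right at the far plane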
  14. LeGreg

    DX12 SkyBox

    1 - Expensive is a relative term: it's more expensive than not switching them, so you should avoid doing it if you can without adding too much extra cost in the process. On the other hand, setting states and drawing things is much more efficient with d3d12 compared to d3d11, so you have more margin there before hitting hard limits. If objects share the same pipeline state, try drawing them together if you can. But you do not have to go out of your way to merge different pipeline states by artificial means (not unless you find that it is becoming a problem).

    2 - I'd suggest turning on the debug runtime (see the sketch below).

    3 - Try to render the sky as the last opaque thing (you could also use a Z-only pass). The culling of pixels (not rendering unnecessary pixels) is more efficient if opaque objects are rendered front to back (in a rough order, but you know the sky is always the background, so it's not complicated to figure out when to draw it). The Z-only pass helps if you can't, for some reason, guarantee a very good front-to-back order. But it can be more expensive if you don't gain any extra pixel culling from it (it's two passes instead of one).

    We realize that some advice can appear contradictory. That's because you always have to make trade-offs: the preferred order for X is different from the preferred order for Y. It's why you always profile (if possible with your final load), and before you can profile you build things that are flexible and/or based on good guesstimates. See for example this benchmark: http://www.anandtech.com/show/9112/exploring-dx12-3dmark-api-overhead-feature-test/3 It shows that the engine can render 300k draw calls per frame. Grouping one more draw call the wrong way is not necessarily going to affect performance that much (especially if you are far from the theoretical limit). On the other hand, if your sky is expensive to draw per pixel, then drawing it after everything else makes sense.
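    A minimal sketch of enabling the debug runtime mentioned in point 2 (to be done before creating the device; debug builds only, since it adds CPU overhead):

    #include <d3d12.h>
    #include <wrl/client.h>

    void EnableD3D12DebugLayer()
    {
        Microsoft::WRL::ComPtr<ID3D12Debug> debugController;
        if (SUCCEEDED(D3D12GetDebugInterface(IID_PPV_ARGS(&debugController))))
        {
            debugController->EnableDebugLayer();
        }
    }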