
#5271015 HLSL Bilinear Filtering ?

Posted by MJP on 14 January 2016 - 12:29 AM

Assuming that you've created and bound the sampler state correctly, that should be sufficient to have bilinear filtering enabled.

The way that rasterization and pixel shading works is that by default, all attributes that are output by your vertex shader will be interpolated to the exact center of each pixel. Here's a quick and dirty diagram that shows what I mean by that:

[Image: Pixel Coordinates.png]

This diagram shows the case of a 2x2 render target that's fully covered by a quad with 4 vertices. As far as SV_Position goes, the edges of the pixels are located at integer coordinates while the center (which is where attributes are interpolated to) is located at .5. So the X coordinate of the first pixel is 0.5, the next one is 1.5, then 2.5, and so on. The UVs are interpolated in the same way, except that they're typically 0 or 1 at each of the vertices. This means that they end up being fractional at every pixel, and their value ends up being lerp(float2(0, 0), float2(1, 1), svPosition / renderTargetSize). So if you wanted to sample the texel neighboring to your right with a UV coordinate, you would do UV + float2(1.0f / renderTargetSize.x, 0.0f). Alternatively, Sample takes an integer offset that you can use to specify that you want to sample exactly 1 texel over to the right. Or if you prefer, you can use the bracket operators to directly load a texel using SV_Position like this: return Texture0[uint2(svPosition + float2(1, 0))].
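To make the pixel-center arithmetic concrete, here is a small CPU-side sketch in C++ (not shader code). It assumes a full-screen quad whose UVs run from (0, 0) to (1, 1); the Float2 type and both function names are made up for illustration.

```cpp
#include <cassert>

struct Float2 { float x, y; };  // hypothetical stand-in for HLSL's float2

// UV at the center of pixel (px, py) in a width x height render target,
// assuming a full-screen quad with UVs running from (0, 0) to (1, 1).
Float2 PixelCenterUV(unsigned px, unsigned py, unsigned width, unsigned height)
{
    assert(width > 0 && height > 0 && px < width && py < height);
    // SV_Position at the pixel center is (px + 0.5, py + 0.5).
    return { (px + 0.5f) / width, (py + 0.5f) / height };
}

// UV of the texel immediately to the right: step one texel along X only.
Float2 RightNeighborUV(Float2 uv, unsigned width)
{
    assert(width > 0);
    return { uv.x + 1.0f / width, uv.y };
}
```

For the 2x2 target in the diagram, the first pixel gets a UV of (0.25, 0.25), and stepping right lands on (0.75, 0.25), which is the center of the next texel.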

Now let's say that we're downsampling that 2x2 render target to 1x1. In that case, SV_Position will be (0.5, 0.5) for that one pixel, and the UV will also be (0.5, 0.5), since the center of this pixel lies exactly in the center of your 4 vertices. In other words, the 1x1 target's single pixel center would be in the exact center of the above diagram. This is perfect for bilinear filtering, since it means that the filter kernel will sample all 4 texels and blend them all equally. However if you wanted to sample a single texel with UVs, you would need to offset by 1/4 of a pixel of the 1x1 target. So for example to sample the bottom right texel, you would want to do UV + (float2(0.25f, 0.25f) / renderTargetSize), where renderTargetSize is the size of the target you're rendering to. Or again you could load using explicit coordinates by doing Texture0[uint2(svPosition) * 2 + 1], which converts to the integer pixel index before scaling up to the source texture.
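If you want to convince yourself of the equal-weight blend, here is a rough CPU reference for a bilinear fetch, written in C++. It's only a sketch: it assumes a single-channel, row-major texture with clamp addressing, and SampleBilinear is a made-up name rather than anything from D3D.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// CPU reference for a bilinear fetch from a single-channel, row-major texture
// with clamp addressing. Texel centers sit at (i + 0.5) / size in UV space.
float SampleBilinear(const std::vector<float>& texels, int width, int height,
                     float u, float v)
{
    assert(width > 0 && height > 0);
    assert(texels.size() == static_cast<std::size_t>(width * height));

    // Move into texel space, shifted so integer coordinates land on texel centers.
    const float x = u * width - 0.5f;
    const float y = v * height - 0.5f;
    const int x0 = static_cast<int>(std::floor(x));
    const int y0 = static_cast<int>(std::floor(y));
    const float fx = x - x0;  // horizontal blend weight
    const float fy = y - y0;  // vertical blend weight

    auto fetch = [&](int tx, int ty) {
        tx = std::max(0, std::min(tx, width - 1));   // clamp addressing
        ty = std::max(0, std::min(ty, height - 1));
        return texels[static_cast<std::size_t>(ty * width + tx)];
    };
    auto lerp = [](float a, float b, float t) { return a + (b - a) * t; };

    const float top = lerp(fetch(x0, y0), fetch(x0 + 1, y0), fx);
    const float bottom = lerp(fetch(x0, y0 + 1), fetch(x0 + 1, y0 + 1), fx);
    return lerp(top, bottom, fy);
}
```

With a 2x2 texture holding 0, 1, 2, 3, sampling at UV (0.5, 0.5) gives the average of all four texels, while sampling at (0.75, 0.75) gives exactly the bottom right texel.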

#5269998 Explicit suffix “.f” for floats in HLSL.

Posted by MJP on 08 January 2016 - 12:31 AM

It's the same as C++: the "f" suffix specifies that it's a floating point literal. If you leave it off you will get an integer, which means you will get an implicit conversion to float in cases like the second line of your code snippet. It's essentially the same as doing this:

OUT.position = mul( ModelViewProjection, float4( IN.position, (float)1 ) );

Using the "f" suffix is more explicit: it tells the compiler that you want your value to be a float, and therefore no implicit conversion is necessary. In most cases there will probably be no difference in the generated code; it's more a matter of style and personal preference. However you should be careful when mixing floats with doubles, since in some cases that will give you different results, and can potentially result in more expensive double-precision ops being used by the GPU.

#5269344 Does anyone use counter/Append/Consume buffers very much?

Posted by MJP on 05 January 2016 - 12:22 AM

The only catch with an append buffer is that you can only append one element at a time. This can be wasteful if a single thread decides to append multiple elements, since a lot of hardware implements an append buffer by performing atomic increments on a "hidden" counter variable. For such cases, you can potentially get better performance (as well as better data coherency) by performing a single atomic add in order to "reserve" multiple elements in the output buffer.
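Here's a CPU-side sketch of the "reserve with one atomic add" idea, using std::atomic in C++. In a shader the analogous operation would be something like an InterlockedAdd on a counter that you manage yourself; the OutputBuffer type and the function names below are made up for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical output buffer with an explicit element counter.
struct OutputBuffer {
    std::vector<int> data;
    std::atomic<std::uint32_t> count{0};
    explicit OutputBuffer(std::size_t capacity) : data(capacity) {}
};

// Reserve n consecutive slots with a single atomic add.
// Returns the index of the first reserved slot.
std::uint32_t Reserve(OutputBuffer& buf, std::uint32_t n)
{
    const std::uint32_t first = buf.count.fetch_add(n);
    assert(static_cast<std::size_t>(first) + n <= buf.data.size());
    return first;
}

// Write n values into a contiguous range instead of appending one at a time.
void AppendMany(OutputBuffer& buf, const int* values, std::uint32_t n)
{
    const std::uint32_t first = Reserve(buf, n);
    for (std::uint32_t i = 0; i < n; ++i)
        buf.data[first + i] = values[i];
}
```

Since each caller gets a contiguous range, the elements written by one thread also end up next to each other in memory, which is where the better data coherency comes from.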

#5269091 Questions on Baked GI Spherical Harmonics

Posted by MJP on 03 January 2016 - 06:40 PM

I agree with Dave: you should remove the ambient term. To start out, I would try just rendering your cubemaps with nothing but direct lighting applied to your meshes. When you bake this and combine it with direct lighting at runtime, you'll effectively have 1 bounce GI. Since it's only 1 bounce you'll end up with shadowed areas that are too dark, but this is probably better than just introducing a flat ambient term. If you want to approximate having more bounces, a really easy way to do that is to just repeat your baking step, but this time feeding in the results from your first bake pass. If you do this, then every pass gives you another bounce. As long as you have physically-plausible materials where they always absorb some energy (in other words, each component of the diffuse albedo is < 1), then every added bounce will have a smaller effect on the final result. This means that you can often get a decent approximation of GI with as few as 3 or 4 bounces.
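To get a feel for how quickly the bounces fall off, here is a toy model in C++. It assumes a single uniform albedo and that bounce n carries roughly albedo^n of the original energy, which is a big simplification of a real scene; CapturedFraction is a made-up helper.

```cpp
#include <cassert>

// Fraction of the total (infinite-bounce) indirect energy captured by the
// first `bounces` passes, under the toy model where bounce n carries albedo^n.
double CapturedFraction(double albedo, int bounces)
{
    assert(albedo > 0.0 && albedo < 1.0 && bounces >= 0);
    double captured = 0.0;
    double bounceEnergy = 1.0;
    for (int i = 0; i < bounces; ++i) {
        bounceEnergy *= albedo;   // each pass keeps only `albedo` of the energy
        captured += bounceEnergy;
    }
    const double total = albedo / (1.0 - albedo);  // sum of albedo^n for n >= 1
    return captured / total;
}
```

Under this model an albedo of 0.5 captures about 94% of the indirect energy after 4 bounces. Brighter materials converge more slowly, so treat the number as a rough guide only.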

#5269089 Particle Systems Full GPU

Posted by MJP on 03 January 2016 - 06:33 PM

The last game I shipped simulated all particles on the GPU. We still had our legacy codepath for CPU-simulated particles, but I don't think that we actually had any assets in the shipping game that used it. At least for us all of our use cases for particles were things that didn't require feedback from the simulation back to the CPU. Personally, if any such case came up I would be fine writing an optimized component/actor system to handle that one particular case instead of making our particle systems flexible enough to handle it. For the other 99% of cases, the GPU path is extremely fast, and simulation was essentially free as far as we were concerned (especially on PS4 where we could use async compute).

I would also disagree with the notion that "uploading" is a high cost of GPU-based particle systems. In our case, the amount of CPU->GPU data transferred every frame was very small, and generally amounted to a few small constant buffers. Everything else was pre-packed into static buffers or textures kept in GPU memory.

#5268651 IASetIndexBuffer Error

Posted by MJP on 31 December 2015 - 05:48 PM

So it's saying the virtual address supplied by your index buffer view is outside the range of the resource to which that address belongs. However it lists the address of the resource as 'nullptr', which is peculiar. Is there any chance that perhaps you've already destroyed the resource containing your index buffer?

#5268650 VertexBuffers and InputAssmbler unnecessary?

Posted by MJP on 31 December 2015 - 05:45 PM

I personally shipped a project that 100% used manual vertex fetch in the shader, although this was for a particular console that has an AMD GCN variant as its GPU. AMD GPUs have no fixed-function vertex fetch, and so they implement "input assembler" functionality by generating a small preamble for the VS that they call a "fetch shader". They can basically generate this fetch shader for any given input layout, and the vertex shader uses it through something analogous to a function call for their particular ISA. When the fetch shader runs, it pulls the data out of your vertex buffer(s) using SV_VertexID and SV_InstanceID, and deposits it all in registers. The vertex shader then knows which registers contain the vertex data according to convention, and it can proceed accordingly. Because of this setup, the fetch shader can sometimes have suboptimal code generation compared to a vertex shader that performs manual vertex fetch. The fetch shader must ensure that all vertex data is deposited into registers up-front, and must ensure that the loads are completed before passing control back to the VS. However if the VS is fetching vertex data, then the vertex fetches can potentially be interleaved with other VS operations, and can potentially re-use registers whose contents are no longer needed.

Unfortunately I'm not sure if it's the same when going through DX11, since there are additional API layers in the way that might prevent optimal code-gen. I'm also not sure which hardware still has fixed-function vertex fetch, and what kind of performance delta you can expect.

#5268647 GS Output not being rasterized (Billboards)

Posted by MJP on 31 December 2015 - 04:57 PM

If I understand your code correctly, it looks like you're setting the output vertex position to have z = 0.0 and w = 0.0, which is invalid. Try setting w to 1.0 instead.

#5268406 [D3D12] Driver level check to avoid duplicate function call?

Posted by MJP on 29 December 2015 - 05:23 PM

As far as I know there's no API-level guarantee that the implementation will filter out redundant calls for you. It's possible that the drivers will do it, but there's no way of knowing without asking them or profiling. Filtering yourself should be pretty easy and cheap: you can just cache the pointer to the PSO that's currently set for that command list and compare with it before setting a new one.
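A minimal sketch of that kind of filter in C++ might look like this. PipelineState and CommandList here are hypothetical stand-ins for ID3D12PipelineState and ID3D12GraphicsCommandList so that the example compiles without the D3D12 headers, and PsoFilter is a made-up wrapper name.

```cpp
#include <cassert>

// Hypothetical stand-ins for the real D3D12 interfaces.
struct PipelineState {};
struct CommandList {
    int setCalls = 0;  // counts calls that actually reach the "API"
    void SetPipelineState(PipelineState*) { ++setCalls; }
};

// Remembers the PSO last set on one command list and skips redundant sets.
class PsoFilter {
public:
    explicit PsoFilter(CommandList& cmdList) : cmdList_(cmdList) {}

    void Set(PipelineState* pso)
    {
        assert(pso != nullptr);
        if (pso == current_)
            return;                     // already bound, skip the call
        cmdList_.SetPipelineState(pso);
        current_ = pso;
    }

    // Forget the cached PSO, e.g. after the command list has been reset.
    void Invalidate() { current_ = nullptr; }

private:
    CommandList& cmdList_;
    PipelineState* current_ = nullptr;
};
```

The cache is per command list, so you would want to invalidate it whenever the command list is reset or handed to code that sets state behind the wrapper's back.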

#5267077 OpenGL Projection Matrix Clarifications

Posted by MJP on 19 December 2015 - 05:40 PM

This image is from the presentation Projection Matrix Tricks by Eric Lengyel, and shows how normalized device coordinates work using OpenGL conventions:


As you can see, in OpenGL the entire visible depth range between the near clip plane and the far clip plane is mapped to [-1, 1] in normalized device coordinates. So if a position has a Z value of 0 then it's not actually located at the camera position, it's actually located somewhere between the near clip plane and the far clip plane (but not exactly halfway between, since the mapping is non-linear).
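To put numbers on that, here is a small C++ sketch of the depth mapping. It assumes the standard OpenGL perspective matrix (the glFrustum-style one) with near plane n and far plane f, and takes depth as the positive distance in front of the camera; both function names are made up.

```cpp
#include <cassert>

// NDC z for a point at positive distance `depth` in front of the camera,
// assuming the standard OpenGL perspective projection with planes n and f.
double NdcDepthGL(double depth, double n, double f)
{
    assert(n > 0.0 && f > n && depth > 0.0);
    return (f + n) / (f - n) - (2.0 * f * n) / ((f - n) * depth);
}

// Distance in front of the camera at which NDC z is exactly 0.
double DepthAtNdcZero(double n, double f)
{
    assert(n > 0.0 && f > n);
    return 2.0 * f * n / (f + n);
}
```

With n = 1 and f = 3 the near plane maps to -1, the far plane maps to 1, and NDC z = 0 lands at a distance of 1.5, which is closer to the camera than the halfway point of 2.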

#5266801 Questions on Baked GI Spherical Harmonics

Posted by MJP on 17 December 2015 - 01:27 PM

Yes, you'll either need to use multiple textures or atlas them inside of 1 large 3D texture (separate textures is easier). It would be a lot easier if GPUs supported 3D texture arrays, but unfortunately they don't.

#5266102 D3D11 texture image data from memory

Posted by MJP on 13 December 2015 - 02:22 AM

If I read the PNG images with the winapi and not with stbi_load and then use D3DX11CreateShaderResourceViewFromMemory it should work?

Yes. You can use CreateFile and ReadFile to load the contents of a file into memory, and then pass that to D3DX11CreateShaderResourceViewFromMemory.

I should point out that many games do not store their textures using image file formats such as JPEG and PNG. While these formats are good for reducing the size of the image on disk, they can be somewhat expensive to decode. They also don't let you pre-generate mipmaps or compress to GPU-readable block compression formats, which many games do in order to save performance and memory. As a result games will often use their own custom file format, or will use the DDS format. DDS can store compressed data with mipmaps, and it can also store texture arrays, cubemaps, and 3D textures.

#5266065 D3D11 texture image data from memory

Posted by MJP on 12 December 2015 - 03:23 PM

D3DX11CreateShaderResourceViewFromMemory expects that the data you give it is from an image file, such as a JPEG, DDS, or PNG file. stbi_load parses an image file, and gives you back the raw pixel data that was decoded from the image file. To use that raw data to initialize a texture, you should call ID3D11Device::CreateTexture2D and pass the raw image data through a D3D11_SUBRESOURCE_DATA structure that's passed as the "pInitialData" parameter. For a 2D texture, you should set pSysMem to the image data pointer that you get back from stbi_load, and you should set SysMemPitch to the size of a pixel times the width of your texture. So in your case it looks like you're loading 8-bit RGBA data which is 4 bytes per pixel, so you should set it to "object.width * 4".
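As a rough C++ sketch of the pitch calculation: SubresourceData below is a hypothetical mirror of the three D3D11_SUBRESOURCE_DATA fields so that the example compiles without d3d11.h, and MakeRgba8InitData is a made-up helper. In real code you would fill in the actual D3D11 struct and pass it to CreateTexture2D.

```cpp
#include <cassert>

// Hypothetical mirror of D3D11_SUBRESOURCE_DATA.
struct SubresourceData {
    const void* pSysMem;        // pointer to the raw pixel data
    unsigned SysMemPitch;       // bytes from the start of one row to the next
    unsigned SysMemSlicePitch;  // only meaningful for 3D textures
};

// Initial data for a tightly packed 8-bit RGBA image, such as the buffer
// stbi_load returns when 4 channels are requested.
SubresourceData MakeRgba8InitData(const unsigned char* pixels, unsigned width)
{
    assert(pixels != nullptr && width > 0);
    const unsigned bytesPerPixel = 4;  // R, G, B, A at 8 bits each
    return { pixels, width * bytesPerPixel, 0 };
}
```

For a 16 pixel wide image this gives a pitch of 64 bytes.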

#5265393 MSAA and CheckFeatureSupport

Posted by MJP on 08 December 2015 - 12:23 AM

Perhaps back buffers don't support MSAA with D3D12? I wouldn't be surprised if this were the case, since D3D12 is much more explicit in dealing with swap chains. MSAA swap chains have to have a "hidden resolve" performed on them, where the driver resolves the subsamples of your MSAA back buffer to create a non-MSAA back buffer that can be displayed on the screen. If I were you, I would just do this yourself by creating an MSAA render target and then resolving that to your non-MSAA back buffer using ResolveSubresource.

#5265391 [D3D12] Command Queue Fence Synchronization

Posted by MJP on 08 December 2015 - 12:16 AM

Conceptually, SetEventOnCompletion works like this:

HRESULT SetEventOnCompletion(UINT64 Value, HANDLE hEvent)
{
    // Start a background thread to check the fence value and trigger the event
    CreateThread(FenceThread, Value, hEvent);
}

void FenceThread(UINT64 Value, HANDLE hEvent)
{
    while(fenceValue < Value);  // Wait for the fence to be signaled
    SetEvent(hEvent);           // Trigger the event
}

So there's no danger of "missing" the event, as you're fearing. The check compares against the current fence value, so if the fence has already reached Value by the time you call SetEventOnCompletion, the event just gets triggered right away.

EDIT: changed the code to be more clear about how the checking is done in the background
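If you want to play with the idea outside of D3D12, here is a runnable analog in standard C++. It's only a sketch of the conceptual model above and not how the runtime actually implements it: Fence is a made-up struct wrapping an atomic counter, and a std::future stands in for the Win32 event.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <future>
#include <thread>

// Hypothetical stand-in for a fence: a counter that only ever increases.
struct Fence {
    std::atomic<std::uint64_t> value{0};
};

// Returns a future that becomes ready once fence.value >= target.
// The future plays the role of the event handle.
std::future<void> SetEventOnCompletionSketch(Fence& fence, std::uint64_t target)
{
    return std::async(std::launch::async, [&fence, target] {
        // Poll in the background until the fence reaches the target value.
        while (fence.value.load(std::memory_order_acquire) < target)
            std::this_thread::yield();
        assert(fence.value.load() >= target);  // the "event" fires here
    });
}
```

If the fence has already passed the target when you call it, the loop exits immediately and the future becomes ready right away, so a wait on it can't hang because of a "missed" signal.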