Matias Goldberg

  1. Yes. Everyone has cleared up that this is a HW limitation. But I don't think anybody has pointed out the obvious: you can create more than one view. The most common scenario is manually managing memory: creating a large pool, and then having different meshes / const buffers / structured buffers living as views into subregions of it. You just can't access all of it at once. For example, if you have a 6GB buffer, you could create 3 views of 2GB each and bind all 3 to the same shader.
  2. You need to use R16G16B16A16_SNORM. SINT is when you use the raw signed integer values, and you must declare your variable as int4. The values will be in range [-32768;32767] since they're integers. SNORM is when the integers are mapped from range [-32768;32767] to the range [-1.0;1.0] and your variable must be declared as float4.
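    To make the SINT vs SNORM difference concrete, here's a plain C++ sketch of the SNORM decode rule (the function name is made up; the formula is the standard D3D one: divide by 32767 and clamp):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// SINT view: the shader sees the raw integer, e.g. -32768..32767 (declare int4).
// SNORM view: the integer is remapped to [-1.0; 1.0] (declare float4).
float snorm16ToFloat( int16_t value )
{
    // D3D defines SNORM decode as value / 32767, with -32768 clamping to -1.0
    return std::max( static_cast<float>( value ) / 32767.0f, -1.0f );
}
```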
  3. ACES and Uncharted Inverse Tone Mapping

    Eye adaptation happens after step 3. Steps 1-3 are not about tonemapping correctness; they're about correct AA. After step 3, you have antialiased colour data you can tonemap and eye-adapt as you like.
  4. ACES and Uncharted Inverse Tone Mapping

    You cheat:
    1. Apply a trivial reversible tonemap operator
    2. Resolve AA
    3. Apply the reverse of the tonemap operator from step 1
    4. Now apply the tonemap you wanted (Uncharted, ACES, whatever).
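    The trick can be sketched in a few lines of C++; here Reinhard's x/(1+x) stands in for the "trivial reversible tonemap operator" (any cheap invertible curve works; these function names are made up):

```cpp
#include <cassert>

// Step 1: cheap invertible tonemap, applied per sample before the AA resolve.
float tonemapForResolve( float x ) { return x / ( 1.0f + x ); }

// Step 3: its exact inverse, applied to the resolved (averaged) colour,
// after which you apply the tonemap you actually wanted.
float inverseTonemap( float y ) { return y / ( 1.0f - y ); }
```

    Averaging in the tonemapped range keeps a single very bright HDR sample from dominating the resolve, which is what causes aliasing fireflies.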
  5. DX11 Binding buffers and then updating them

    The example you posted is fine. What I meant is that you cannot do the following:

```cpp
//Draw a Cube
graphicsDevice->deviceContext->Draw(cube.vertexCount, 0);
UpdateBufferWithCubeData(); //Update the cube that will be used in the draw above^
```

    This is not valid in D3D11, but it is possible (with certain care taken) in D3D12 and Vulkan. No, I meant what is explained here and here. Basically the following is preferred:

```cpp
//Draw a Cube
void *data = constBuffer->Map( DISCARD );
memcpy( data, ... );
bindVertexBuffer( constBuffer );
graphicsDevice->deviceContext->Draw(cube.vertexCount, 0);

//Draw a Sphere
data = constBuffer->Map( DISCARD );
memcpy( data, ... );
graphicsDevice->deviceContext->Draw(sphere.vertexCount, 0);
```

    over the following:

```cpp
//Draw a Cube
void *data = constBuffer0->Map( DISCARD );
memcpy( data, ... );
bindVertexBuffer( constBuffer0 );
graphicsDevice->deviceContext->Draw(cube.vertexCount, 0);

//Draw a Sphere
data = constBuffer1->Map( DISCARD ); //Notice it's constBuffer1, not constBuffer0
memcpy( data, ... );
bindVertexBuffer( constBuffer1 );
graphicsDevice->deviceContext->Draw(sphere.vertexCount, 0);
```

    This difference makes sense if we're talking about lots of const buffer DISCARDs per frame (e.g. 20k const buffer discards per frame). It doesn't make a difference if you have like 20 const buffer discards per frame. Btw I personally never have 20k const buffer discards, as I prefer to keep large data (such as world matrices) in texture buffers.

    This pattern is used with D3D11_USAGE_DYNAMIC buffers. These buffers are visible to both CPU and GPU. This means the actual memory is either stored in GPU RAM and your writes from the CPU go directly through the PCIe bus, or the buffer is stored in CPU RAM and GPU reads fetch directly via the PCIe bus.
    Whether it's one or the other is controlled by the driver, though D3D11_CPU_ACCESS_READ and D3D11_CPU_ACCESS_WRITE probably provide good hints (a buffer that needs read access will likely end up CPU side; a buffer that has no read access will likely end up GPU side; but this is not a guarantee!). What you're describing, an intermediate place, must be done by hand via staging buffers. Create the buffer with D3D11_USAGE_STAGING instead of DYNAMIC. Staging buffers are visible to both CPU and GPU, but the GPU can only use them in copy operations. The idea is that you copy to the staging area from the CPU, and then you copy from the staging area to the final GPU RAM that is only visible to the GPU (i.e. the final buffer was created with D3D11_USAGE_DEFAULT). Or vice versa (copy from GPU to staging area, then read from CPU).

    There's a gotcha: with staging buffers you can't use D3D11_MAP_WRITE_NO_OVERWRITE nor D3D11_MAP_WRITE_DISCARD. But you have the D3D11_MAP_FLAG_DO_NOT_WAIT flag. If you get a DXGI_ERROR_WAS_STILL_DRAWING when you try to map the staging buffer with this flag, then the GPU is not done copying from/to the staging buffer and you must use another one (i.e. create a new one, or reuse an old one from a pool).

    What's the difference between the STAGING and DYNAMIC approaches? The PCIe has lower bandwidth than the GPU's dedicated memory (and probably higher latency). If the CPU writes the data once and the GPU reads it once, use DYNAMIC. But if the data will be read by the GPU over and over again, you may end up fetching it multiple times from CPU RAM through the PCIe; in that case use the STAGING approach to perform the transfer through the PCIe once, after which the data is kept in the fastest RAM available. This advice holds for dedicated GPUs. On integrated GPUs, using staging aggressively may hurt since there is no PCIe; you'll just be burning CPU RAM bandwidth doing useless copies. And for reading GPU -> CPU, you have no choice but to use staging. So it's a good idea to write a system that can switch between strategies based on what's faster on each system.
  6. DX11 Binding buffers and then updating them

    You can do that. What you cannot do is issue Draw commands (or compute dispatches) and update the buffers afterwards; which is something you could do in D3D12 as long as the command buffer hasn't been submitted. As for performance, if you use D3D11_MAP_WRITE_NO_OVERWRITE and then issue one D3D11_MAP_WRITE_DISCARD when bigBufferIsNotFull is false (do not forget to reset this bool! the pseudocode you posted doesn't reset it!) you'll be fine. Allocating everything dynamic in one big pool is also fine. Just a few caveats to be aware of:
    - Texture buffers cannot use D3D11_MAP_WRITE_NO_OVERWRITE unless you're on D3D11.1 on Windows 8 or higher. You always have to issue D3D11_MAP_WRITE_DISCARD.
    - Discarding more than 4MB per frame overall will cause stalls on AMD drivers. And while NVIDIA drivers can handle more than 4MB, it will likely break in really bad ways (I've seen HW bugs pop up).
    In Ogre3D 2.1 we do the following on D3D11 systems (i.e. not D3D11.1 and Win 8):
    - Dynamic vertex & index buffers in one big dynamic pool with the no_overwrite / then discard pattern.
    - Dynamic const buffers separately; one API const buffer per "buffer" as in our representations. Though the ideal with D3D11 is to reuse the same const buffer over and over again using MAP DISCARD. We do not use many const buffers though.
    - Dynamic texture buffers also separately, one API tex buffer per "buffer" as in our representations.
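    The no_overwrite / then discard pattern is essentially a ring allocator over one big dynamic buffer. A stripped-down C++ sketch of just the bookkeeping (no real D3D calls; all names here are made up):

```cpp
#include <cassert>
#include <cstddef>

enum class MapMode { NoOverwrite, Discard };

struct DynamicRingAllocator
{
    size_t capacity;
    size_t offset = 0;

    // Decides which map mode a request of 'size' bytes should use.
    // NO_OVERWRITE appends past data the GPU may still be reading;
    // DISCARD orphans the whole buffer and restarts from offset 0.
    MapMode allocate( size_t size, size_t &outOffset )
    {
        if( offset + size > capacity )
        {
            outOffset = 0;
            offset = size; // buffer was full: discard and start over
            return MapMode::Discard;
        }
        outOffset = offset;
        offset += size;
        return MapMode::NoOverwrite;
    }
};
```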
  7. DX11 24bit depthbuffer is a sub-optimal format?

    No. Depth buffers are in range [0; 1]. D16_UNORM supports 65536 different values within that range (with uniform distribution). R16_HALF has to distribute its 65536 bit patterns across the range [-65504; 65504] (see Wikipedia on how 16-bit float precision is distributed). You lose A LOT of precision by moving to R16_HALF.

    D32_FLOAT vs D24_UNORM is different because the calculations are natively done in 32-bit floats, and it's 24 bits i.e. ~16 million different values in range [0; 1] vs 32 bits i.e. ~4 billion different values in range [-3.402823 × 10^38; 3.402823 × 10^38] approximately, with most of the precision concentrated in [-1; 0] and [0; 1]. And no, it's not better to use [-1; 1] like OpenGL does instead of [1; 0] (reversed far/near as in Direct3D): although there are more representable numbers in the range [-1; 1], the reversed far-near trick as already explained exploits how floating point behaves from 0 through 1 to counter what the projection math does. When using the range [-1; 1] this trick no longer works. There is no 32-bit unorm format.
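    The "most of the precision is near zero" claim is easy to check: the gap between adjacent 32-bit floats grows with magnitude, which is exactly what reversed-Z exploits. A tiny C++ probe (the helper name is made up):

```cpp
#include <cassert>
#include <cmath>

// Distance from x to the next representable float above it.
float ulpAt( float x )
{
    return std::nextafter( x, 1e9f ) - x;
}
// Floats are much denser near 0.0 than near 1.0 or beyond, so mapping
// the far plane to 0 puts the abundant precision where the projection
// math loses it.
```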
  8. Android The main game loop FPS independent

    As masskonfuzion says, at frame N there will be three states:
    - The Physics state at frame N-1; that is, the state of the previous frame (state here means position, orientation and scale of all objects; and maybe something else if needed, like velocity).
    - The Physics state at frame N; that is, the state of the current frame.
    - The graphics state, which is an interpolation somewhere between N-1 and N.
    Really the main motivation for interpolating is to decouple the physics framerate from the graphics framerate. So your physics may be updated at 30hz and your graphics rendered at 60hz; or physics at 120hz and graphics at 60hz. Another big reason is to combine variable frame rate for graphics (which is good for smoothness and performance) with fixed frame rate for physics (which is good for simulation stability and required for achieving determinism). Without interpolation, you either update everything at a fixed framerate, or everything at a variable framerate. Interpolation gives you both. You can even go beyond what gafferongames teaches and put graphics in a separate thread, allowing graphics to run on a different CPU core; but this is a more advanced topic.
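    The whole scheme reduces to gafferongames' accumulator loop; a minimal C++ sketch with a single float standing in for the entire physics state (all names made up):

```cpp
#include <cassert>

struct PhysicsState { float pos = 0.0f; };

// One fixed-rate physics step: current state N becomes the saved N-1 copy,
// then N advances. Constant velocity of 10 units/s keeps the sketch simple.
void step( PhysicsState &prev, PhysicsState &curr, float dt )
{
    prev = curr;
    curr.pos += 10.0f * dt;
}

// Graphics samples somewhere between N-1 and N using the leftover time.
float interpolate( const PhysicsState &prev, const PhysicsState &curr, float alpha )
{
    return prev.pos * ( 1.0f - alpha ) + curr.pos * alpha;
}

// Per frame:
//   accumulator += frameTime;
//   while( accumulator >= fixedDt ) { step( prev, curr, fixedDt ); accumulator -= fixedDt; }
//   render( interpolate( prev, curr, accumulator / fixedDt ) );
```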
  9. Material parameters in PBR pipeline

    Yes. This was discussed on Filmic Worlds' website (but for some reason the blogpost was removed, likely in a server migration). Fortunately the Web Archive remembers. There was also a twitter discussion:
  10. Material parameters in PBR pipeline

    Just wanted to say, they're not. While they produce similar results, coloured fresnel / IOR tends to lack colour at the borders, unlike specular colour. It's a subtle difference.
  11. I was hoping I wouldn't have to ride the Linux horse, but...

    Pardon me for being rude, but you don't need to be a Linux expert to understand that a file with that name does not exist. From the link you posted: Did you download the psp toolchain? Did you place it in C:\cygwin\home\Brian? And more importantly, is the file named exactly psptoolchain-20050625.tgz? (Beware of case sensitivity: psptoolchain-20050625.tgz is not the same as PsPtoolchain-20050625.tgz.) You got the name wrong or didn't download the file into that folder. Cheers
  12. Vulkan NonUniformResourceIndex in Vulkan glsl

    Vulkan doesn't have this intrinsic. You'll have to do by hand what the compiler does for you when shaderSampledImageArrayDynamicIndexing is set to False:

```glsl
uint NonUniformResourceIndex( uint textureIdx )
{
    while( true )
    {
        uint currentIdx = readFirstInvocationARB( textureIdx );
        if( currentIdx == textureIdx )
            return currentIdx;
    }
    return 0; //Execution should never land here
}
```

    Cheers
    Matias
  13. DX11 Constant buffer and names?

    Since I think that sounds confusing to a beginner, I'll translate it to plain English: modern GPUs no longer work like that (they don't need such crazy alignments... for the most part; there are a few exceptions not worth mentioning right now) but we're stuck with these overly conservative alignments.
  14. DX11 Constant buffer and names?

    No. That declaration is just fine. What the alignment means is that if you've got:

```hlsl
float3 a;
float2 b;
```

    then the address of b when you write the data from C++ starts at 0x00000010 instead of starting at 0x0000000C, because there are 4 bytes of padding between a & b. Please read the msdn article BrentMorris left you. It has plenty of examples on how the padding works.
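    On the C++ side you have to mirror that padding explicitly. A sketch of the mirrored struct for the float3 a / float2 b example (struct and member names are made up):

```cpp
#include <cassert>
#include <cstddef>

// HLSL packing forbids a variable from straddling a 16-byte boundary,
// so b (8 bytes) cannot start at offset 12 and gets pushed to offset 16.
struct MyConstBufferCpuSide
{
    float a[3];    // bytes 0..11, the HLSL float3
    float padding; // explicit 4-byte pad so both layouts match
    float b[2];    // bytes 16..23, the HLSL float2
};
```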
  15. DX11 Constant buffer and names?

    Normally we explicitly define the register slots. So for const buffers you would do:

```hlsl
cbuffer MyBuffer0 : register(b0)
{
    // Declarations...
};

cbuffer MyBuffer1 : register(b1)
{
    // Declarations...
};

cbuffer MyBuffer2 : register(b2)
{
    // Declarations...
};
```

    If you do not explicitly tell it the register slots, the compiler will assign them for you and you have to retrieve them via HLSL reflection (which is cumbersome and error prone). When you call VSSetConstantBuffers( 0, ... ) the 0 will correspond to MyBuffer0, VSSetConstantBuffers( 1, ... ) will correspond to MyBuffer1, etc.

    In the case of your buffer:

```hlsl
cbuffer MatrixBuffer
{
    matrix worldMatrix;
    matrix viewMatrix;
    matrix projectionMatrix;
};
```

    If the buffer you bind via VSSetConstantBuffers is smaller than the 192 bytes required for this structure (4x4 floats x 4 bytes per float x 3 matrices), the debug layer will complain, but you are guaranteed that reading const buffers out of bounds will return 0.