# DX12 Subpixel precision and integer coordinates

This topic is 1623 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

When implementing subpixel precision in a software rasterizer I've found the following code:

It's just too bad the author doesn't explain anything to it. Can someone explain to me how shifting these integer variables gives us 4 bit subpixel precision a.k.a 16 more values of precision ? Not to mention this not working in my code :/

// 28.4 fixed-point coordinates
const int Y1 = iround(16.0f * v1.y);
const int Y2 = iround(16.0f * v2.y);
const int Y3 = iround(16.0f * v3.y);

const int X1 = iround(16.0f * v1.x);
const int X2 = iround(16.0f * v2.x);
const int X3 = iround(16.0f * v3.x);

// Fixed-point deltas
const int FDX12 = DX12 << 4;
const int FDX23 = DX23 << 4;
const int FDX31 = DX31 << 4;

const int FDY12 = DY12 << 4;
const int FDY23 = DY23 << 4;
const int FDY31 = DY31 << 4;

// Bounding rectangle
int minx = (min(X1, X2, X3) + 0xF) >> 4;
int maxx = (max(X1, X2, X3) + 0xF) >> 4;
int miny = (min(Y1, Y2, Y3) + 0xF) >> 4;
int maxy = (max(Y1, Y2, Y3) + 0xF) >> 4;

int CY1 = C1 + DX12 * (miny << 4) - DY12 * (minx << 4);
int CY2 = C2 + DX23 * (miny << 4) - DY23 * (minx << 4);
int CY3 = C3 + DX31 * (miny << 4) - DY31 * (minx << 4);

for(int y = miny; y < maxy; y++)
{
int CX1 = CY1;
int CX2 = CY2;
int CX3 = CY3;

for(int x = minx; x < maxx; x++)
{
if(CX1 > 0 && CX2 > 0 && CX3 > 0)
{
colorBuffer[x] = 0x00FFFFFF;
}

CX1 -= FDY12;
CX2 -= FDY23;
CX3 -= FDY31;
}

CY1 += FDX12;
CY2 += FDX23;
CY3 += FDX31;

}

Edited by lipsryme

##### Share on other sites

That code basically compacts to:

((x + 0xF) >> 4) << 4

...which is strange -- doing a shift down followed by a shift up immediately afterwards.

If I'm not mistaken, that's the same as:

((x + 0xF) & ~0xFU)

i.e. clear the lower 4 bits, while rounding upwards to the nearest multiple of 16.

Given this, and not much context, I would guess that the author is using a 28.4 fixed point format, and this is an implementation of the ceil function?

##### Share on other sites

That code basically compacts to:

((x + 0xF) >> 4) << 4

...which is strange -- doing a shift down followed by a shift up immediately afterwards.

If I'm not mistaken, that's the same as:

((x + 0xF) & ~0xFU)

i.e. clear the lower 4 bits, while rounding upwards to the nearest multiple of 16.

Given this, and not much context, I would guess that the author is using a 28.4 fixed point format, and this is an implementation of the ceil function?

Shifts preserve the sign bit if I'm not mistaken

EDIT: hmm, on second though, I didn't read your code correctly, but then again, shifts would be "safe" standard wise, as opposed to the bit masking on an integer.

Edited by Necrolis

##### Share on other sites

So I need to round upwards after multiplying by 16.0f ? Because I thought his iround would be the same as a cast to int.

update1: still using ceil does not help me on my results: http://d.pr/i/qZBx (screen). I basically copy+pasted his code with the same result.

update2: Let's see if I get this right...

Looking at the bitfield and consider as an example an 8bit (less zeros to write  ) integer with the value 3

3 = 0000 0011

now we multiply this by 16

3 * 16 (48) = 0011 0000

and shift it 4 to the right ?

48 >> 4 =  0000 0000

and then we do some math/comparison with it ?

and after that we shift it back to get our original value

48 << 4 (3) = 0000 0011

Is that correct ?

Edited by lipsryme

##### Share on other sites

I think the original author is Nick from devmaster.net, and here's the corresponding article: http://devmaster.net/posts/6145/advanced-rasterization

Maybe that helps in understanding it better!

##### Share on other sites

Just a word of caution about shift operators and signed integers: The result of shifting negative values is either implementation defined or undefined, depending on the direction of the shift and the exact standard [of C or C++] being followed. The way this code is written is problematic.

If the code can be rewritten using only unsigned integer types, that would be a good thing to do. Otherwise it's probably better to express it using division and multiplication (which the compiler will often turn into shifts for you). I guess if everything else fails, it can be implemented in assembly language.

Edited by Álvaro

##### Share on other sites

My problem is rather how does the above code (or similar) for getting subpixel precision work...?

Using the exact code from his topic for rasterization (the one above) results in artifacts like these: http://d.pr/i/TerH

Without the subpixel precision the result is flawless.

Edited by lipsryme

• 10
• 17
• 9
• 14
• 41
• ### Similar Content

• Hi.
I wanted to experiment D3D12 development and decided to run some tutorials: Microsoft DirectX-Graphics-Samples, Braynzar Soft, 3dgep...Whatever sample I run, I've got the same crash.
All the initialization process is going well, no error, return codes ok, but as soon as the Present method is invoked on the swap chain, I'm encountering a crash with the following call stack:
The crash is an access violation to a null pointer ( with an offset of 0x80 )
I'm working on a notebook, a toshiba Qosmio x870 with two gpu's: an integrated Intel HD 4000 and a dedicated NVIDIA GTX 670M ( Fermi based ). The HD 4000 is DX11 only and as far as I understand the GTX 670M is DX12 with a feature level 11_0.
I checked that the good adapter was chosen by the sample, and when the D3D12 device is asked in the sample with a 11_0 FL, it is created with no problem. Same for all the required interfaces ( swap chain, command queue...).
I tried a lot of things to solve the problem or get some info, like forcing the notebook to always use the NVIDIA gpu, disabling the debug layer, asking for a different feature level ( by the way 11_0 is the only one that allows me to create the device, any other FL will fail at device creation )...
I have the latest NVIDIA drivers ( 391.35 ), the latest Windows 10 sdk ( 10.0.17134.0 ) and I'm working under
Visual Studio 2017 Community.
Thanks to anybody who can help me find the problem...
• By _void_
Hi guys!
In a lot of samples found in the internet, people when initialize D3D12_SHADER_RESOURCE_VIEW_DESC with resource array size 1 would normallay set its dimension as Texture2D. If the array size is greater than 1, then they would use dimension as Texture2DArray, for an example.
If I declare in the shader SRV as Texture2DArray but create SRV as Texture2D (array has only 1 texture) following the same principle as above, would this be OK? I guess, this should work as long as I am using array index 0 to access my texture?
Thanks!
• By _void_
Hey!

What is the recommended upper count for commands to record in the command list bundle?
According to MSDN it is supposed to be a small number but do not elaborate on the actual number.
I am thinking if I should pre-record commands in the command buffer and use ExecuteIndirect or maybe bundles instead.
The number of commands to record in my case could vary greatly.

Thanks!

• While working on a project using D3D12 I was getting an exception being thrown while trying to get a D3D12_CPU_DESCRIPTOR_HANDLE. The project is using plain C so it uses the COBJMACROS. The following application replicates the problem happening in the project.
#define COBJMACROS #pragma warning(push, 3) #include <Windows.h> #include <d3d12.h> #include <dxgi1_4.h> #pragma warning(pop) IDXGIFactory4 *factory; ID3D12Device *device; ID3D12DescriptorHeap *rtv_heap; int WINAPI wWinMain(HINSTANCE hinst, HINSTANCE pinst, PWSTR cline, int cshow) { (hinst), (pinst), (cline), (cshow); HRESULT hr = CreateDXGIFactory1(&IID_IDXGIFactory4, (void **)&factory); hr = D3D12CreateDevice(0, D3D_FEATURE_LEVEL_11_0, &IID_ID3D12Device, &device); D3D12_DESCRIPTOR_HEAP_DESC desc; desc.NumDescriptors = 1; desc.Type = D3D12_DESCRIPTOR_HEAP_TYPE_RTV; desc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_NONE; desc.NodeMask = 0; hr = ID3D12Device_CreateDescriptorHeap(device, &desc, &IID_ID3D12DescriptorHeap, (void **)&rtv_heap); D3D12_CPU_DESCRIPTOR_HANDLE rtv = ID3D12DescriptorHeap_GetCPUDescriptorHandleForHeapStart(rtv_heap); (rtv); } The call to ID3D12DescriptorHeap_GetCPUDescriptorHandleForHeapStart throws an exception. Stepping into the disassembly for ID3D12DescriptorHeap_GetCPUDescriptorHandleForHeapStart show that the error occurs on the instruction
mov  qword ptr [rdx],rax
which seems odd since rdx doesn't appear to be used. Any help would be greatly appreciated. Thank you.

• By lubbe75
As far as I understand there is no real random or noise function in HLSL.
I have a big water polygon, and I'd like to fake water wave normals in my pixel shader. I know it's not efficient and the standard way is really to use a pre-calculated noise texture, but anyway...
Does anyone have any quick and dirty HLSL shader code that fakes water normals, and that doesn't look too repetitious?

• Hi,
I finally managed to get the DX11 emulating Vulkan device working but everything is flipped vertically now because Vulkan has a different clipping space. What are the best practices out there to keep these implementation consistent? I tried using a vertically flipped viewport, and while it works on Nvidia 1050, the Vulkan debug layer is throwing error messages that this is not supported in the spec so it might not work on others. There is also the possibility to flip the clip scpace position Y coordinate before writing out with vertex shader, but that requires changing and recompiling every shader. I could also bake it into the camera projection matrices, though I want to avoid that because then I need to track down for the whole engine where I upload matrices... Any chance of an easy extension or something? If not, I will probably go with changing the vertex shaders.
• By NikiTo
Some people say "discard" has not a positive effect on optimization. Other people say it will at least spare the fetches of textures.

if (color.A < 0.1f) { //discard; clip(-1); } // tons of reads of textures following here // and loops too
Some people say that "discard" will only mask out the output of the pixel shader, while still evaluates all the statements after the "discard" instruction.

MSN>
discard: Do not output the result of the current pixel.
<MSN

As usual it is unclear, but it suggests that "clip" could discard the whole pixel(maybe stopping execution too)

I think, that at least, because of termal and energy consuming reasons, GPU should not evaluate the statements after "discard", but some people on internet say that GPU computes the statements anyways. What I am more worried about, are the texture fetches after discard/clip.

(what if after discard, I have an expensive branch decision that makes the approved cheap branch neighbor pixels stall for nothing? this is crazy)
• By NikiTo
I have a problem. My shaders are huge, in the meaning that they have lot of code inside. Many of my pixels should be completely discarded. I could use in the very beginning of the shader a comparison and discard, But as far as I understand, discard statement does not save workload at all, as it has to stale until the long huge neighbor shaders complete.
Initially I wanted to use stencil to discard pixels before the execution flow enters the shader. Even before the GPU distributes/allocates resources for this shader, avoiding stale of pixel shaders execution flow, because initially I assumed that Depth/Stencil discards pixels before the pixel shader, but I see now that it happens inside the very last Output Merger state. It seems extremely inefficient to render that way a little mirror in a scene with big viewport. Why they've put the stencil test in the output merger anyway? Handling of Stencil is so limited compared to other resources. Does people use Stencil functionality at all for games, or they prefer discard/clip?

Will GPU stale the pixel if I issue a discard in the very beginning of the pixel shader, or GPU will already start using the freed up resources to render another pixel?!?!

• By Axiverse
I'm wondering when upload buffers are copied into the GPU. Basically I want to pool buffers and want to know when I can reuse and write new data into the buffers.
• By NikiTo
AMD forces me to use MipLevels in order to can read from a heap previously used as RTV. Intel's integrated GPU works fine with MipLevels = 1 inside the D3D12_RESOURCE_DESC. For AMD I have to set it to 0(or 2). MSDN says 0 means max levels. With MipLevels = 1, AMD is rendering fine to the RTV, but reading from the RTV it shows the image reordered.

Is setting MipLevels to something other than 1 going to cost me too much memory or execution time during rendering to RTVs, because I really don't need mipmaps at all(not for the 99% of my app)?

(I use the same 2D D3D12_RESOURCE_DESC for both the SRV and RTV sharing the same heap. Using 1 for MipLevels in that D3D12_RESOURCE_DESC gives me results like in the photos attached below. Using 0 or 2 makes AMD read fine from the RTV. I wish I could sort this somehow, but in the last two days I've tried almost anything to sort this problem, and this is the only way it works on my machine.)

• ### Forum Statistics

• Total Topics
631073
• Total Posts
2997754
×