# DX12 Subpixel precision and integer coordinates

While implementing subpixel precision in a software rasterizer, I found the code below.

It's too bad the author doesn't explain any of it. Can someone explain how shifting these integer variables gives 4 bits of subpixel precision, i.e. 16 subpixel steps per pixel? Not to mention that it isn't working in my code :/

    // 28.4 fixed-point coordinates
    const int Y1 = iround(16.0f * v1.y);
    const int Y2 = iround(16.0f * v2.y);
    const int Y3 = iround(16.0f * v3.y);

    const int X1 = iround(16.0f * v1.x);
    const int X2 = iround(16.0f * v2.x);
    const int X3 = iround(16.0f * v3.x);

    // Fixed-point deltas
    const int FDX12 = DX12 << 4;
    const int FDX23 = DX23 << 4;
    const int FDX31 = DX31 << 4;

    const int FDY12 = DY12 << 4;
    const int FDY23 = DY23 << 4;
    const int FDY31 = DY31 << 4;

    // Bounding rectangle
    int minx = (min(X1, X2, X3) + 0xF) >> 4;
    int maxx = (max(X1, X2, X3) + 0xF) >> 4;
    int miny = (min(Y1, Y2, Y3) + 0xF) >> 4;
    int maxy = (max(Y1, Y2, Y3) + 0xF) >> 4;

    int CY1 = C1 + DX12 * (miny << 4) - DY12 * (minx << 4);
    int CY2 = C2 + DX23 * (miny << 4) - DY23 * (minx << 4);
    int CY3 = C3 + DX31 * (miny << 4) - DY31 * (minx << 4);

    for(int y = miny; y < maxy; y++)
    {
        int CX1 = CY1;
        int CX2 = CY2;
        int CX3 = CY3;

        for(int x = minx; x < maxx; x++)
        {
            if(CX1 > 0 && CX2 > 0 && CX3 > 0)
            {
                colorBuffer[x] = 0x00FFFFFF;
            }

            CX1 -= FDY12;
            CX2 -= FDY23;
            CX3 -= FDY31;
        }

        CY1 += FDX12;
        CY2 += FDX23;
        CY3 += FDX31;
    }

Edited by lipsryme

##### Share on other sites

That code basically compacts to:

((x + 0xF) >> 4) << 4

...which is strange -- doing a shift down followed by a shift up immediately afterwards.

If I'm not mistaken, that's the same as:

((x + 0xF) & ~0xFU)

i.e. clear the lower 4 bits, while rounding upwards to the nearest multiple of 16.

Given this, and not much context, I would guess that the author is using a 28.4 fixed point format, and this is an implementation of the ceil function?
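
As a rough illustration of that guess, here is a minimal C++ sketch of ceil on 28.4 values. The helper names (`to_fixed`, `ceil_to_pixel`, `ceil_fixed`) are made up for this example and are not from the original code, and inputs are assumed to be non-negative so the right shift behaves the same on every compiler.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical constants for a 28.4 format: 4 fractional bits.
constexpr int kSubBits = 4;
constexpr int kSubScale = 1 << kSubBits;  // 16 subpixel steps per pixel
constexpr int kSubMask = kSubScale - 1;   // 0xF

// Convert a float coordinate to 28.4 (rounds to the nearest 1/16).
inline int to_fixed(float v) {
    return static_cast<int>(std::lround(v * kSubScale));
}

// Ceil to a whole pixel index: the (x + 0xF) >> 4 pattern.
inline int ceil_to_pixel(int x) {
    assert(x >= 0);  // assumption: non-negative input
    return (x + kSubMask) >> kSubBits;
}

// Ceil but stay in 28.4: equivalent to ((x + 0xF) >> 4) << 4.
inline int ceil_fixed(int x) {
    assert(x >= 0);  // assumption: non-negative input
    return (x + kSubMask) & ~kSubMask;
}
```

For example, 2.25 becomes 36 in 28.4, `ceil_to_pixel(36)` gives pixel 3, and `ceil_fixed(36)` gives 48, which is 3.0 in 28.4.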

##### Share on other sites

Shifts preserve the sign bit, if I'm not mistaken.

EDIT: Hmm, on second thought, I didn't read your code correctly. Then again, shifts would be "safe" standard-wise, as opposed to bit masking on an integer.

Edited by Necrolis

##### Share on other sites

So I need to round upwards after multiplying by 16.0f? I thought his iround would be the same as a cast to int.

update1: Using ceil still doesn't fix my results: http://d.pr/i/qZBx (screenshot). I basically copied and pasted his code and got the same result.

update2: Let's see if I get this right...

Looking at the bit pattern, consider as an example an 8-bit integer (fewer zeros to write) with the value 3:

3 = 0000 0011

Now we multiply this by 16:

3 * 16 (48) = 0011 0000

and shift it 4 to the right?

48 >> 4 (3) = 0000 0011

and then we do some math/comparison with it?

And after that we shift it back to get the fixed-point value again:

3 << 4 (48) = 0011 0000

Is that correct?

Edited by lipsryme
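
A small sketch may make the bit layout easier to check. It assumes non-negative 28.4 values; `Fixed4`, `split_28_4`, and `to_float` are hypothetical names, not part of the quoted code. The upper bits hold the whole pixel and the low 4 bits hold the fraction in sixteenths, which is where the 16 subpixel positions per pixel come from.

```cpp
#include <cassert>

// Hypothetical view of a 28.4 value: whole pixels plus sixteenths.
struct Fixed4 {
    int whole;  // upper 28 bits: integer pixel coordinate
    int frac;   // lower 4 bits: 0..15, in units of 1/16 pixel
};

// Split a non-negative 28.4 value into its two parts.
inline Fixed4 split_28_4(int fixed) {
    assert(fixed >= 0);  // assumption: non-negative input
    return Fixed4{fixed >> 4, fixed & 0xF};
}

// Convert a 28.4 value back to a float coordinate.
inline float to_float(int fixed) {
    return static_cast<float>(fixed) / 16.0f;
}
```

With this, 48 (0011 0000) splits into whole 3 and fraction 0, while 53 (0011 0101) splits into whole 3 and fraction 5, i.e. 3 + 5/16 = 3.3125. A plain integer 3 cannot tell those two positions apart.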

##### Share on other sites

I think the original author is Nick from devmaster.net, and here's the corresponding article: http://devmaster.net/posts/6145/advanced-rasterization

Maybe that helps in understanding it better!

##### Share on other sites

Just a word of caution about shift operators and signed integers: The result of shifting negative values is either implementation defined or undefined, depending on the direction of the shift and the exact standard [of C or C++] being followed. The way this code is written is problematic.

If the code can be rewritten using only unsigned integer types, that would be a good thing to do. Otherwise it's probably better to express it using division and multiplication (which the compiler will often turn into shifts for you). I guess if everything else fails, it can be implemented in assembly language.

Edited by Álvaro
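
One possible way to follow that advice is sketched below: the same scale-by-16 operations written with division and multiplication so that negative values are handled without shifting. The names `ceil_div16` and `to_subpixels` are invented for this example. It relies on integer division truncating toward zero, which C++11 and later guarantee.

```cpp
#include <cassert>
#include <climits>

// Ceil of x / 16 for any sign, without shifting a signed value.
// Division truncates toward zero, which is already a ceil for x <= 0.
inline int ceil_div16(int x) {
    return x > 0 ? (x + 15) / 16 : x / 16;
}

// Whole pixels to 28.4 units, multiplying instead of left-shifting.
inline int to_subpixels(int pixels) {
    // Guard against signed overflow, which would be undefined behavior.
    assert(pixels >= INT_MIN / 16 && pixels <= INT_MAX / 16);
    return pixels * 16;
}
```

For instance, `ceil_div16(-17)` is -1 and `ceil_div16(17)` is 2, matching the mathematical ceil of -1.0625 and 1.0625.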

##### Share on other sites

My problem is rather: how does the above code (or similar code) for subpixel precision actually work?

Using the exact rasterization code from his topic (the one above) results in artifacts like these: http://d.pr/i/TerH

Without the subpixel precision the result is flawless.

Edited by lipsryme
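
For reference, here is a self-contained sketch of the half-space test in 28.4 fixed point, with the pieces the snippet at the top of the thread leaves out written out explicitly. The delta and half-edge definitions follow from the edge equations. The `Vertex` struct, the `iround` stand-in, the `fill_triangle` name, the buffer layout, and the top-left fill rule shown are assumptions for illustration, not a confirmed copy of the linked article, and this is not claimed to fix the artifacts in the screenshots.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vertex { float x, y; };  // hypothetical minimal vertex

// Stand-in for the original iround: round to the nearest integer.
inline int iround(float v) { return static_cast<int>(std::lround(v)); }

// Fills the pixels of a width*height buffer covered by the triangle.
// Assumes non-negative coordinates and the winding order for which all
// three edge functions are positive inside the triangle.
void fill_triangle(std::vector<std::uint32_t>& color, int width, int height,
                   const Vertex& v1, const Vertex& v2, const Vertex& v3) {
    assert(static_cast<int>(color.size()) == width * height);

    // 28.4 fixed-point coordinates: 1 unit = 1/16 pixel
    const int X1 = iround(16.0f * v1.x), Y1 = iround(16.0f * v1.y);
    const int X2 = iround(16.0f * v2.x), Y2 = iround(16.0f * v2.y);
    const int X3 = iround(16.0f * v3.x), Y3 = iround(16.0f * v3.y);

    // Edge deltas, still in 28.4
    const int DX12 = X1 - X2, DX23 = X2 - X3, DX31 = X3 - X1;
    const int DY12 = Y1 - Y2, DY23 = Y2 - Y3, DY31 = Y3 - Y1;

    // Change of each edge function per whole-pixel step (24.8).
    // Multiplied rather than shifted because deltas can be negative.
    const int FDX12 = DX12 * 16, FDX23 = DX23 * 16, FDX31 = DX31 * 16;
    const int FDY12 = DY12 * 16, FDY23 = DY23 * 16, FDY31 = DY31 * 16;

    // Bounding rectangle in whole pixels (ceil), clamped to the buffer
    const int minx = std::max((std::min({X1, X2, X3}) + 0xF) >> 4, 0);
    const int maxx = std::min((std::max({X1, X2, X3}) + 0xF) >> 4, width);
    const int miny = std::max((std::min({Y1, Y2, Y3}) + 0xF) >> 4, 0);
    const int maxy = std::min((std::max({Y1, Y2, Y3}) + 0xF) >> 4, height);

    // Half-edge constants
    int C1 = DY12 * X1 - DX12 * Y1;
    int C2 = DY23 * X2 - DX23 * Y2;
    int C3 = DY31 * X3 - DX31 * Y3;

    // Assumed top-left fill rule: nudge edges that own their boundary
    if (DY12 < 0 || (DY12 == 0 && DX12 > 0)) C1++;
    if (DY23 < 0 || (DY23 == 0 && DX23 > 0)) C2++;
    if (DY31 < 0 || (DY31 == 0 && DX31 > 0)) C3++;

    // Edge functions evaluated at the first pixel of the rectangle
    int CY1 = C1 + DX12 * (miny * 16) - DY12 * (minx * 16);
    int CY2 = C2 + DX23 * (miny * 16) - DY23 * (minx * 16);
    int CY3 = C3 + DX31 * (miny * 16) - DY31 * (minx * 16);

    for (int y = miny; y < maxy; y++) {
        int CX1 = CY1, CX2 = CY2, CX3 = CY3;
        for (int x = minx; x < maxx; x++) {
            if (CX1 > 0 && CX2 > 0 && CX3 > 0) {
                color[static_cast<std::size_t>(y) * width + x] = 0x00FFFFFF;
            }
            CX1 -= FDY12; CX2 -= FDY23; CX3 -= FDY31;
        }
        CY1 += FDX12; CY2 += FDX23; CY3 += FDX31;
    }
}
```

Note that the row offset `y * width` is applied on every write; the snippet at the top writes `colorBuffer[x]` and never advances to the next row, so whatever surrounds it has to move the pointer once per scanline. The subpixel effect shows up when a triangle is moved by half a pixel: with vertices (0,0), (0,8), (8,0) pixel (0,0) is covered, while with (0.5,0.5), (0.5,8.5), (8.5,0.5) it is not.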
