# Obliique

Member

26

## Posts posted by Obliique

1. ### Why is the benefit of using XMVectorGetX(XMVector3Dot(Origin, Look)) than just calling XMVectorGetX(Origin)

20 hours ago, ritzmax72 said:

What is the benefit of calling GetX every time on appropriate DotProduct-ed vector instead of getting x,y,z coordinates of Origin like below?

Your camera's matrix in world space is the result of a series of transformations, typically a rotation R followed by a translation T, so the camera's world matrix is W = RT. What we want is the inverse of this transform: multiplying every object by it makes the camera the reference coordinate system, so all object coordinates become relative to camera space. We could compute the inverse the general way, but a general 4x4 inverse is costly, so a cheaper computation is preferable. The easier way is to decompose the world matrix into R and T and invert each one individually, since $(RT)^{-1} = T^{-1} R^{-1}$. For the rotation R, we know the camera basis vectors are orthonormal, so the inverse is simply the transpose, $R^{-1} = R^{T}$, which gives us:

$$
R^{T} = \begin{bmatrix} U_x & V_x & W_x & 0 \\ U_y & V_y & W_y & 0 \\ U_z & V_z & W_z & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
$$

where U, V and W are the camera basis vectors, which form the rows of the rotation part of the original world matrix:

$$
R = \begin{bmatrix} U_x & U_y & U_z & 0 \\ V_x & V_y & V_z & 0 \\ W_x & W_y & W_z & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
$$

To get the inverse of T, which is a translation, we negate the translation portion, which gives $T^{-1}$:

$$
T^{-1} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ -T_x & -T_y & -T_z & 1 \end{bmatrix}
$$

derived from T:

$$
T = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ T_x & T_y & T_z & 1 \end{bmatrix}
$$

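Since we have computed both inverses the easy way, we can multiply $T^{-1} R^{T}$ to get the view matrix. Note that when you carry out this multiplication, you end up with exactly the scenario you described for the fourth row: multiplying the fourth row of $T^{-1}$ by each column of $R^{T}$ is simply a dot product of the negated translation with one of the camera basis vectors. That is why the code takes XMVectorGetX of a dot product such as XMVector3Dot(Origin, Look) instead of reading the coordinates of Origin directly. The view matrix needs the camera position measured along the camera's own axes, not its world-space x, y and z. XMVector3Dot replicates the scalar result into every component of the returned vector, so XMVectorGetX just extracts it.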

The resulting view matrix should be:

$$
T^{-1} R^{T} = \begin{bmatrix} U_x & V_x & W_x & 0 \\ U_y & V_y & W_y & 0 \\ U_z & V_z & W_z & 0 \\ -T \cdot U & -T \cdot V & -T \cdot W & 1 \end{bmatrix}
$$
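The construction above can be sketched in plain C++ without DirectXMath, assuming the same row-vector convention used in this post. `Vec3`, `Mat4`, `MakeViewMatrix` and `TransformPoint` are hypothetical names introduced only for illustration; they are not part of any library.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };
struct Mat4 { float m[4][4]; };

float Dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Builds T^-1 * R^T directly: the basis vectors U, V, W go into the columns,
// and the fourth row holds the negated camera position T projected onto each axis.
Mat4 MakeViewMatrix(const Vec3& U, const Vec3& V, const Vec3& W, const Vec3& T) {
    return Mat4{{
        {U.x, V.x, W.x, 0.0f},
        {U.y, V.y, W.y, 0.0f},
        {U.z, V.z, W.z, 0.0f},
        {-Dot(T, U), -Dot(T, V), -Dot(T, W), 1.0f},
    }};
}

// Transforms a point (w = 1) as a row vector: p' = p * M.
Vec3 TransformPoint(const Vec3& p, const Mat4& view) {
    const float (&m)[4][4] = view.m;
    return Vec3{
        p.x * m[0][0] + p.y * m[1][0] + p.z * m[2][0] + m[3][0],
        p.x * m[0][1] + p.y * m[1][1] + p.z * m[2][1] + m[3][1],
        p.x * m[0][2] + p.y * m[1][2] + p.z * m[2][2] + m[3][2],
    };
}
```

With a camera at (5, 0, 0) looking down +X, a world point at (7, 0, 0) should land 2 units along the camera's W axis, and the camera position itself should map to the origin. No general matrix inverse is needed, only three dot products.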

2. ### SwapChain in DX12

Thank you so much for taking the time to respond. This has cleared things up for me. I read it carefully and will consider these options when I refactor my code.

3. ### SwapChain in DX12

Hi. I have been programming DX12 for nearly 6 months now and I think I still have a misunderstanding of swap chain flags and how they affect presentation. Please correct me if I am wrong. My understanding is the following:

- DX12 only supports the two flip-model swap effects, i.e. DXGI_SWAP_EFFECT_FLIP_DISCARD and DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL. My understanding is that neither needs a redirection surface, so the contents of the back buffers are displayed on screen directly from the app. DXGI_SWAP_EFFECT_FLIP_DISCARD allows for the case where, if the presentation queue is full and IDXGISwapChain::Present() is called, whatever is at the end of the queue is discarded without ever making it to the screen. Is this correct? DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL inserts the frame to be presented at the end of the queue. Does this mean that the queue can only contain one buffer at a time?

- Neither DXGI_SWAP_EFFECT_FLIP_DISCARD nor DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL supports multisampling, so I have had to set the sample count to 1 and the sample quality to 0 in my swap chain desc structure. My question is: how would we add support for multisampling such as 4x MSAA if these are the only swap effects supported in DX12? I have seen some code where the sample count was set greater than 1 and the quality level was queried, which leaves me confused. I still haven't tested multisampling as I don't use it in my experimental engine.

- The waitable swap chain option blocks the presenting thread of the calling application until the specified wait time elapses. But why would we explicitly specify a wait time on the swap chain?

- Tearing support is provided by the GPU vendor, which allows options like FreeSync and G-Sync to be used. I am using an Intel GPU and I don't really know how to test this.

- It isn't a requirement for a windowed app to turn vsync on. This is also confusing. Won't screen tearing happen anyway if my app is not synchronized with the screen's next vertical blank?

4. ### Memory Leak Detection and DX12

I am not sure if that carries over to DirectX since it's built around COM. The best I can think of, as mentioned already, is enabling the debug layer and calling ReportLiveObjects() after you've released all DX objects to see whether any are still alive.

5. ### DXR software impl?

Hi all,

I was wondering if DXR has a software implementation like WARP. I ask this because I am not using NVIDIA hardware. Is it possible to get a DXR app running on non-NVIDIA hardware using Microsoft's software drivers?

6. ### Fewer vertices mapped to more vertex normals?

7 minutes ago, pcmaster said:

You'd issue a Draw(6 * 2 * 3) /* 6 sides, 2 triangles each, 3 vertices each */ and connect a general buffer with a Wavefront-Obj-like independent indices to vertices, indices to normals and/or indices to UVs. You'd index it by SV_VertexID running from 0 to 35 (you've got nothing else in your vertex shader), so in the end you could have just 8 vertices, 6 normals and 8 uv:s, for example. You wouldn't have any VB or IB, just general GPU-readable buffers.

I think I will read this properly when I settle down 😄 as it's slightly overwhelming.

8 minutes ago, pcmaster said:

Do you want to process your meshes yourself?

I am actually trying to read OBJ files into my application. One file I exported has more vertex normals than there are vertices, which was surprising to me because I don't yet know how these are grouped together. I don't have knowledge of smoothing groups, but I have used something similar when I used to use 3ds Max 🙂

6 minutes ago, pcmaster said:

There's been a discussion about something similar in this very forum one or two weeks ago, read through it.

I will look through this, thanks!
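On the OBJ point: each face corner in an OBJ file carries its own position index and normal index, so one position can be paired with several normals. A common way to feed that to a single vertex/index buffer pair is to create one GPU vertex per distinct (position, normal) pair. The sketch below assumes the face corners have already been parsed into zero-based index pairs; `ObjCorner`, `IndexedMesh` and `BuildIndexedMesh` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// One corner of an OBJ face: separate indices into the position (v) and
// normal (vn) lists. Zero-based here, unlike the one-based indices in the file.
struct ObjCorner {
    std::uint32_t position;
    std::uint32_t normal;
};

struct IndexedMesh {
    std::vector<ObjCorner> uniqueVertices;  // one entry per distinct (position, normal) pair
    std::vector<std::uint32_t> indices;     // one index per face corner
};

IndexedMesh BuildIndexedMesh(const std::vector<ObjCorner>& corners) {
    IndexedMesh mesh;
    std::map<std::pair<std::uint32_t, std::uint32_t>, std::uint32_t> seen;
    for (const ObjCorner& corner : corners) {
        const auto key = std::make_pair(corner.position, corner.normal);
        auto it = seen.find(key);
        if (it == seen.end()) {
            // First time this pair appears: it becomes a new GPU vertex.
            const auto newIndex = static_cast<std::uint32_t>(mesh.uniqueVertices.size());
            it = seen.emplace(key, newIndex).first;
            mesh.uniqueVertices.push_back(corner);
        }
        mesh.indices.push_back(it->second);
    }
    return mesh;
}
```

Corners that share both position and normal (smooth edges) collapse to one vertex, while corners that share only the position (sharp edges) stay separate.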

7. ### Fewer vertices mapped to more vertex normals?

Thanks for the helpful response again, pcmaster. Would you know a better way to handle complicated meshes that have both smooth and sharp edges? Would I need to do away with indices to be safe, or should I detect the edge angles somehow and still use an index buffer?

8. ### Fewer vertices mapped to more vertex normals?

Hi,

I am wondering: if I have something like a cube with 8 vertices that are referenced through an index buffer, is there a way to assign a unique normal to each face corner, which I figure makes 24 normals? From my current knowledge, I would need 24 normals, with 4 identical normals per face, for the lighting to work correctly, and for that I would need 24 vertices, which eliminates the need for an index buffer. I figured vertex normal averaging was giving wrong results here because of the very sharp edges. Is it possible to still use normals on cube geometry with an index buffer such that my vertex count remains 8, or is the only way to use non-indexed geometry with a regular DrawInstanced (DX12)?
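A minimal sketch of the 24-vertex layout discussed here, in plain C++ with no D3D12 calls. One thing it illustrates is that duplicating corners per face does not rule out an index buffer: each face's 4 vertices can still be shared by its 2 triangles, giving 24 vertices and 36 indices. `Vertex`, `CubeMesh` and `BuildFlatShadedCube` are hypothetical names, and the winding order is illustrative only (flip it if your culling setup needs the opposite).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Vertex {
    float px, py, pz;  // position
    float nx, ny, nz;  // face normal
};

struct CubeMesh {
    std::vector<Vertex> vertices;
    std::vector<std::uint16_t> indices;
};

CubeMesh BuildFlatShadedCube() {
    // The 8 shared corner positions of a unit cube centred on the origin.
    static const float kCorners[8][3] = {
        {-0.5f, -0.5f, -0.5f}, {0.5f, -0.5f, -0.5f}, {0.5f, 0.5f, -0.5f}, {-0.5f, 0.5f, -0.5f},
        {-0.5f, -0.5f,  0.5f}, {0.5f, -0.5f,  0.5f}, {0.5f, 0.5f,  0.5f}, {-0.5f, 0.5f,  0.5f},
    };
    struct Face { int corner[4]; float normal[3]; };
    static const Face kFaces[6] = {
        {{0, 3, 2, 1}, { 0.0f,  0.0f, -1.0f}},  // -Z
        {{4, 5, 6, 7}, { 0.0f,  0.0f,  1.0f}},  // +Z
        {{0, 4, 7, 3}, {-1.0f,  0.0f,  0.0f}},  // -X
        {{1, 2, 6, 5}, { 1.0f,  0.0f,  0.0f}},  // +X
        {{0, 1, 5, 4}, { 0.0f, -1.0f,  0.0f}},  // -Y
        {{3, 7, 6, 2}, { 0.0f,  1.0f,  0.0f}},  // +Y
    };

    CubeMesh mesh;
    for (int f = 0; f < 6; ++f) {
        const auto base = static_cast<std::uint16_t>(mesh.vertices.size());
        // Each face gets its own copy of its 4 corners, all with the face normal.
        for (int c = 0; c < 4; ++c) {
            const float* p = kCorners[kFaces[f].corner[c]];
            const float* n = kFaces[f].normal;
            mesh.vertices.push_back({p[0], p[1], p[2], n[0], n[1], n[2]});
        }
        // Two triangles per face, reusing the face's 4 vertices.
        const std::uint16_t quad[6] = {0, 1, 2, 0, 2, 3};
        for (std::uint16_t offset : quad) {
            mesh.indices.push_back(static_cast<std::uint16_t>(base + offset));
        }
    }
    return mesh;
}
```

Keeping the vertex count at 8 would require fetching positions and normals through separate indices in the shader, as in the SV_VertexID approach pcmaster describes in this thread.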

9. ### opinion on size of constant ring buffer

Hi, I am curious about how much memory other people normally allocate for a constant buffer managed by a ring buffer. I assumed a worst case for my maximum allowable constants and found myself needing up to 1 GB for a constant buffer with triple buffering, which by my estimate lets me draw close to a million objects. What are some of your maximum sizes, or do you generally use a single upload heap that also supports things like VBs and other uploadable buffers for all scenarios? 🙂
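For reference, the sizing arithmetic behind an estimate like this can be sketched as follows. The 256-byte figure is the D3D12 constant buffer placement alignment (D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT); the object count, per-object size and the `ConstantRing` class are assumptions for illustration, and the allocator deliberately ignores GPU fence tracking.

```cpp
#include <cassert>
#include <cstdint>

// D3D12 requires constant buffer views to start on 256-byte boundaries.
constexpr std::uint64_t kCbAlignment = 256;

constexpr std::uint64_t AlignUp(std::uint64_t size, std::uint64_t alignment) {
    return (size + alignment - 1) / alignment * alignment;
}

// Worst-case bytes needed when every object gets its own aligned slot in every in-flight frame.
constexpr std::uint64_t RingBufferBytes(std::uint64_t objectCount,
                                        std::uint64_t bytesPerObject,
                                        std::uint64_t framesInFlight) {
    return objectCount * AlignUp(bytesPerObject, kCbAlignment) * framesInFlight;
}

// Minimal linear ring: hands out aligned offsets and wraps to 0 at the end.
// A real allocator must also wait on a fence before reusing memory the GPU may still read.
class ConstantRing {
public:
    explicit ConstantRing(std::uint64_t capacity) : capacity_(capacity) {}

    std::uint64_t Allocate(std::uint64_t size) {
        size = AlignUp(size, kCbAlignment);
        if (head_ + size > capacity_) {
            head_ = 0;  // wrap around
        }
        const std::uint64_t offset = head_;
        head_ += size;
        return offset;
    }

private:
    std::uint64_t capacity_;
    std::uint64_t head_ = 0;
};
```

With one million objects, 256 bytes each and three frames in flight, this gives 768,000,000 bytes, which is in the same range as the 1 GB figure above. Even a small per-object struct costs a full 256 bytes because of the alignment.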

10. ### ref count SwapChain->GetBuffer()

Does IDXGISwapChain::GetBuffer() in DX12 increase the COM reference count of the buffer it returns? I am getting weird live object reports in my code. I call ReportLiveObjects() at app shutdown, just after destroying the virtual adapter, and I'm getting a ref count of 3 on ID3D12Device. I tried an extra Reset() on the buffers obtained (I use ComPtr for the frame buffers) to be sure, but the debugger complains about a ref count underflow of -1. Thanks 🙂

11. ### ImGui not rendering to back buffer

Solved. I was using 2 separate descriptor heaps to pass CPU and GPU descriptor handles.

12. ### ImGui not rendering to back buffer

Hi, I need help setting up ImGui. I am trying to render UI using the ImGui framework on DX12. I followed the ImGui example project for DX12, but so far I've had no luck even after carefully looking through my code. The debug output shows no errors either. I'm calling the ImGui functions in a separate class with static methods as shown below:

    void GUI::Initialize(HWND hwnd, ID3D12Device* device, D3D12_CPU_DESCRIPTOR_HANDLE srvCpuHandle, int num_frames_in_flight,
                         DXGI_FORMAT rendertargetformart) {

        D3D12_DESCRIPTOR_HEAP_DESC fontHeapDesc{};
        fontHeapDesc.NumDescriptors = 1;
        fontHeapDesc.Type           = D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV;

        D3D12_GPU_DESCRIPTOR_HANDLE fonthandle = sm_FontHeap->GetGPUDescriptorHandleForHeapStart();

        IMGUI_CHECKVERSION();
        ImGui::CreateContext();
        ImGuiIO& io = ImGui::GetIO();
        (void)io;

        ImGui_ImplWin32_Init(hwnd);
        ImGui_ImplDX12_Init(device, num_frames_in_flight, rendertargetformart, srvCpuHandle, fonthandle);
        ImGui::StyleColorsDark();
    }

    void GUI::Update() {

        ImGui_ImplDX12_NewFrame();
        ImGui_ImplWin32_NewFrame();
        ImGui::NewFrame();

        {
            ImGui::Begin("Some Window");
            ImGui::Text("Random text here");
            ImGui::Button("Button");
            ImGui::End();
        }

    }

    void GUI::RenderOverlay(ID3D12GraphicsCommandList* cmdlist) {

        ImGui::Render();
        ImGui_ImplDX12_RenderDrawData(ImGui::GetDrawData(), cmdlist);
    }

    void GUI::Shutdown() {
        ImGui_ImplDX12_Shutdown();
        ImGui_ImplWin32_Shutdown();
        ImGui::DestroyContext();
    }

    Microsoft::WRL::ComPtr<ID3D12DescriptorHeap> GUI::sm_FontHeap = nullptr;

And then I call these methods in the graphics class:

    gpuContext->TransitionResource(currbackbuffer, D3D12_RESOURCE_STATE_RENDER_TARGET);
    //gpuContext->SetViewport();
    //gpuContext->SetScissorRect();
    gpuContext->ClearRenderTarget(currbackbuffer);
    gpuContext->ClearDepthStencil(dephbuffer);
    gpuContext->SetRenderTargets(currbackbuffer, dephbuffer);
    //Render GUI
    GUI::RenderOverlay(gpuContext->GetCommandList());

    gpuContext->TransitionResource(currbackbuffer, D3D12_RESOURCE_STATE_PRESENT);
    gpuContext->ExecuteCommands();

    GraphicsRoot::Present();

    uint64_t frameFenceVal = gpuContext->Finish();

Any help on this will be appreciated

13. ### D3D12 Unnamed Command Queue being final-released while still in use by the GPU

I think if you use DXGI_SWAP_EFFECT_FLIP_DISCARD in the swap chain, then you need to signal after present because it doesn't block the CPU thread. I'm guessing the other options allow blocking, so a signal after present wouldn't be a requirement?

14. ### D3D12 Unnamed Command Queue being final-released while still in use by the GPU

1 hour ago, SoldierOfLight said:

Possibly. There is work submitted to the queue (which you pass to creation of a swapchain) by the Present API. Are you signaling after that or before?

I've taken note of this. I have always signaled before present. It seems safer to signal after present in case I want to prepare for the next frame? 🙂

15. ### Texture repeat question

Hi,

I think you'd have to find a way to modify the domain range used by ADDRESS_MODE_WRAP, which wraps at integer boundaries, and AFAIK that is not doable.

I think the best approach is to introduce 10 quads, where each quad's vertex texture coordinates address the same portion of the texture in UV space. 🙂
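A minimal sketch of that idea in plain C++, assuming the quads sit side by side along the X axis and the sub-rectangle of the texture is given as (u0, v0) to (u1, v1). `QuadVertex` and `BuildRepeatedQuads` are hypothetical names, and the corner order is illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct QuadVertex {
    float x, y;  // position
    float u, v;  // texture coordinates
};

// Emits 4 vertices per quad. Every quad gets the same UV rectangle, so the chosen
// portion of the texture repeats once per quad without relying on wrap addressing.
std::vector<QuadVertex> BuildRepeatedQuads(int quadCount, float quadWidth, float quadHeight,
                                           float u0, float v0, float u1, float v1) {
    std::vector<QuadVertex> vertices;
    vertices.reserve(static_cast<std::size_t>(quadCount) * 4);
    for (int i = 0; i < quadCount; ++i) {
        const float left = static_cast<float>(i) * quadWidth;
        const float right = left + quadWidth;
        vertices.push_back({left,  0.0f,       u0, v1});  // bottom-left
        vertices.push_back({left,  quadHeight, u0, v0});  // top-left
        vertices.push_back({right, quadHeight, u1, v0});  // top-right
        vertices.push_back({right, 0.0f,       u1, v1});  // bottom-right
    }
    return vertices;
}
```

Calling `BuildRepeatedQuads(10, 1.0f, 1.0f, 0.25f, 0.0f, 0.5f, 1.0f)` yields 40 vertices whose UVs all stay inside the same quarter-wide strip of the texture.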

16. ### DX11-HLSL Directional light calculation ?

Hi,

There are many ways to do directional light calculation based on your requirements.

One way to do this is to define a light struct in your HLSL code containing a light direction vector, a light intensity vector and an ambient light vector, which roughly approximates your total ambient light since you are not calculating light bounces:

    struct Directional_light
    {
        float3 Direction;
        float3 Intensity;
        float3 Ambient;
    };

Constant buffer data in DX11 is packed along 16-byte boundaries (and the buffer size must be a multiple of 16 bytes). The constant buffer is where you access the light object that you map from your application, so you will have something like:

    cbuffer MainCB
    {
        Directional_light light;
        // other CB data goes below
    }
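On the application side, the struct you copy into the mapped buffer has to follow the same packing. A float3 cannot straddle a 16-byte boundary in a cbuffer, so each of the three members starts a new 16-byte slot. Below is a hedged sketch of a matching C++ layout; `Float3` and `DirectionalLightCB` are hypothetical names, and the explicit padding members are one way of doing it.

```cpp
#include <cassert>
#include <cstddef>

struct Float3 { float x, y, z; };

// Mirrors Directional_light as HLSL packs it inside a cbuffer:
// each float3 occupies the first 12 bytes of its own 16-byte slot.
struct alignas(16) DirectionalLightCB {
    Float3 Direction;
    float  pad0;       // unused, keeps Intensity on a 16-byte boundary
    Float3 Intensity;
    float  pad1;       // unused, keeps Ambient on a 16-byte boundary
    Float3 Ambient;
    float  pad2;       // unused, rounds the struct up to a multiple of 16 bytes
};

static_assert(offsetof(DirectionalLightCB, Intensity) == 16, "Intensity must start at byte 16");
static_assert(offsetof(DirectionalLightCB, Ambient) == 32, "Ambient must start at byte 32");
static_assert(sizeof(DirectionalLightCB) == 48, "struct must be a multiple of 16 bytes");
```

Without the padding, the C++ struct would be 36 bytes and Intensity and Ambient would land at the wrong offsets when the shader reads them.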

The second part is defining your per-object material data. This goes in a separate constant buffer because its update frequency is higher than that of the main scene CB. Normally you'll include data like the roughness of the object and the diffuse albedo.

You'll typically want to do lighting in your pixel shader if performance is not your concern. The lighting can be split into the ambient contribution, the diffuse term, which is normally done with Lambert's cosine law (easily computed as the dot product between the light direction vector and the surface normal), and the specular term, which models the object's shininess. You calculate these separately and add them to get the final color:

    cbuffer MaterialCB
    {
        float4 diffuseAlbedo;
    }

    // Entry point named PSMain because PixelShader is a reserved word in HLSL.
    float4 PSMain(float3 normal : NORMAL) : SV_TARGET
    {
        // Interpolated normals are not guaranteed to be unit length, so renormalize.
        normal = normalize(normal);

        // Calculate ambient light separately. This is just a component-wise vector multiplication.
        float3 ambient = light.Ambient * diffuseAlbedo.rgb;

        // Normalize the light direction and negate it so that it points from the surface
        // towards the light, as Lambert's cosine law expects.
        float3 lightVec = -normalize(light.Direction);

        // Lambert's cosine law: the smaller the angle between the light vector and the normal,
        // the larger the dot product, so the reflected intensity increases as the light hits
        // the surface more directly. This term is view-point independent.
        float diffuseStrength = max(dot(lightVec, normal), 0.0f);
        float3 diffuse = diffuseStrength * light.Intensity * diffuseAlbedo.rgb;

        // Add the terms to get the final color. The specular term is omitted here.
        return float4(ambient + diffuse, diffuseAlbedo.a);
    }

I think this is the general idea. I've skipped most of the details.
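To check the numbers by hand, the same ambient plus Lambert diffuse calculation can be run on the CPU. This is only a sketch mirroring the shader logic described above; `Vec3`, `Normalize`, `Dot`, `Mul` and `ShadeLambert` are hypothetical helpers, not part of any DirectX header.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

float Dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

Vec3 Normalize(const Vec3& v) {
    const float length = std::sqrt(Dot(v, v));
    return Vec3{v.x / length, v.y / length, v.z / length};
}

// Component-wise multiplication.
Vec3 Mul(const Vec3& a, const Vec3& b) { return Vec3{a.x * b.x, a.y * b.y, a.z * b.z}; }

// lightDirection is the direction the light travels (from the light towards the scene).
Vec3 ShadeLambert(const Vec3& normal, const Vec3& lightDirection,
                  const Vec3& intensity, const Vec3& ambient, const Vec3& albedo) {
    const Vec3 n = Normalize(normal);
    const Vec3 d = Normalize(lightDirection);
    const Vec3 toLight{-d.x, -d.y, -d.z};
    // Lambert's cosine law, clamped so surfaces facing away receive no diffuse light.
    const float diffuseStrength = std::max(Dot(toLight, n), 0.0f);
    const Vec3 ambientTerm = Mul(ambient, albedo);
    const Vec3 diffuseTerm = Mul(intensity, albedo);
    return Vec3{
        ambientTerm.x + diffuseStrength * diffuseTerm.x,
        ambientTerm.y + diffuseStrength * diffuseTerm.y,
        ambientTerm.z + diffuseStrength * diffuseTerm.z,
    };
}
```

A light shining straight down onto an upward-facing surface gets the full diffuse term, while a downward-facing surface is left with the ambient term only.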

17. ### CB register slots (sm 5.0)

Thanks once again SoldierOfLight. I have now tested this with SM 5.1 and I'm able to bind as many CBs as I want (I suppose my hardware supports resource binding tier 3). This forum has really proven helpful in my DirectX 12 journey 🙂

You are right. I think I mixed up the 10 with something else. It's actually 14: I get up to 14 with SM 5.0 in a C++ application.

18. ### CB register slots (sm 5.0)

Hi once again

My understanding is that I get 10 CB register slots in my shader functions, which are listed in HLSL as b0 to b10. Following this rule, I've deduced that I can bind at most 10 CBs per shader stage. So if I had a descriptor table pointing to 20 contiguous CBVs, is there a way to get to the remaining half, since I'm only able to bind 10 at a time?

I’ve also seen that the Root parameter structure exposes register space (when filling out range structure for descriptors). I suspect this could have me access more CBs than the 10 that are bound to registers which I normally see in HLSL.

Is my current understanding correct, or is there actually more to this?

Thanks 🙂

19. ### DSV AND RTV ON CPU VISIBLE DESCRIPTOR HEAP

pcmaster, thanks for patiently explaining it so clearly to me. I finally get the concept after days of wondering :-)

20. ### DSV AND RTV ON CPU VISIBLE DESCRIPTOR HEAP

So the driver reads the view description and emits commands into the command list. Could you enlighten me on the kind of commands the driver would generate, if it's not too much to ask? How exactly does the GPU reference and write to the depth buffer and back buffer without a GPU descriptor handle to the resource? Or did I miss your point when you mentioned that the driver emits commands that the GPU later uses at execution time, and these commands have some way of making the GPU reference the render targets?

Edit: Sorry I think I missed the point when you mentioned GPU Descriptors are meant for shaders to have access to the resources. So is it safe to assume GPU writes directly to the Render targets without any view description?

21. ### DSV AND RTV ON CPU VISIBLE DESCRIPTOR HEAP

Hi all. I have a question on RTV and DSV descriptors. Is there any good reason why these are stored in a CPU-visible descriptor heap and not in a GPU-visible heap?

I ask this because I am required to provide a descriptor handle for both RTV and DSV on a graphics command list to bind my RT and Depth stencil buffer at OM stage:

CommandList->OMSetRenderTargets(1, &CurrentBackBufferView(), true, &DepthStencilView());

I read in some article that only GPU descriptors are used on a graphics command list because they live in the GPU context. But then why do we have CPU descriptors? I also believe that CPU descriptors are mostly used for immediate tasks. I've searched the internet for the underlying reason but to no avail :-(

22. ### DirectX 12 command queues

Thanks for taking the time to put up such a great article, MJP. I've already read a good chunk of it. :-)

23. ### DirectX 12 command queues

Thanks JoeJ, I will definitely look into timestamps.

24. ### DirectX 12 command queues

Thanks for the reply. This is very helpful. I've only been working with a single command queue and I'm using Intel integrated graphics. I'm interested in how profiling can be done on multiple queues, if it's not too much to ask. Should I measure the time based on when a fence point is reached on the command queue, or are there better ways to profile when a GPU finishes processing a set of commands?
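For the timestamp approach mentioned in this thread, the GPU writes tick counts (for example via EndQuery with D3D12_QUERY_TYPE_TIMESTAMP, later read back after ResolveQueryData), and the ticks are converted to time using the frequency reported by ID3D12CommandQueue::GetTimestampFrequency for that queue. The conversion itself is simple arithmetic; the helper name below is hypothetical and the D3D12 calls are left out so the sketch stays self-contained.

```cpp
#include <cassert>
#include <cstdint>

// Converts a pair of GPU timestamps to milliseconds.
// ticksPerSecond is assumed to come from GetTimestampFrequency on the same
// command queue that recorded both timestamps.
double TimestampDeltaToMilliseconds(std::uint64_t beginTicks,
                                    std::uint64_t endTicks,
                                    std::uint64_t ticksPerSecond) {
    const std::uint64_t deltaTicks = endTicks - beginTicks;
    return static_cast<double>(deltaTicks) * 1000.0 / static_cast<double>(ticksPerSecond);
}
```

With a 12 MHz timestamp clock, a difference of 12,000 ticks corresponds to 1 ms. Timestamps from different queues should each be converted with their own queue's frequency.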

25. ### DirectX 12 command queues

Hi, I'm currently going through Microsoft's online documentation and I came across information that I'm not sure I have a grasp on, particularly concerning command queues. The documentation at some point says that command queues can write to the same resource simultaneously if the appropriate flag is set on the resource.

My question is about work submission to the command queues. Is it a requirement that these command queues belong to the same GPU adapter, in cases where I define two? If yes, does the GPU process both queues in parallel? My other question is: does a GPU have to finish processing commands from a compute queue before processing commands from a graphics queue? I understand that the queue stores commands submitted from an application and that the GPU executes them in first-in, first-out order.