# Obliique

Member

26

1. ## What is the benefit of using XMVectorGetX(XMVector3Dot(Origin, Look)) rather than just calling XMVectorGetX(Origin)?

Your camera's matrix in world space is the result of a series of transformations, typically a rotation $R$ followed by a translation $T$, so the world matrix is the product $RT$. What we want is the inverse of this transform: multiplying every object by it makes the camera the reference coordinate system, so all object coordinates become relative to camera space.

We could compute the inverse the usual way, but a general matrix inverse is costly. A cheaper approach is to decompose the world matrix into $R$ and $T$ and invert each one individually.

For the rotation $R$, we know the camera basis vectors are orthonormal, so the inverse is simply the transpose $R^T$:

$$
R^T = \begin{bmatrix} U_x & V_x & W_x & 0 \\ U_y & V_y & W_y & 0 \\ U_z & V_z & W_z & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
$$

where $\mathbf{U}$, $\mathbf{V}$ and $\mathbf{W}$ are the camera basis vectors taken from the rows of the original rotation:

$$
R = \begin{bmatrix} U_x & U_y & U_z & 0 \\ V_x & V_y & V_z & 0 \\ W_x & W_y & W_z & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
$$

To invert the translation $T$, we negate the translation portion:

$$
T^{-1} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ -T_x & -T_y & -T_z & 1 \end{bmatrix}
\quad\text{derived from}\quad
T = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ T_x & T_y & T_z & 1 \end{bmatrix}
$$

Having computed both inverses the cheap way, we multiply $T^{-1}R^T$ to get the view matrix. Note that this multiplication produces exactly the scenario you described for the fourth row: multiplying the fourth row of $T^{-1}$ by $R^T$ is a dot product of that row with each basis vector (each column of the transposed rotation matrix). The resulting view matrix is:

$$
T^{-1}R^T = \begin{bmatrix} U_x & V_x & W_x & 0 \\ U_y & V_y & W_y & 0 \\ U_z & V_z & W_z & 0 \\ -\mathbf{T}\cdot\mathbf{U} & -\mathbf{T}\cdot\mathbf{V} & -\mathbf{T}\cdot\mathbf{W} & 1 \end{bmatrix}
$$

So, assuming Origin is the camera position and Look is the $\mathbf{W}$ axis, XMVector3Dot(Origin, Look) yields the $\mathbf{T}\cdot\mathbf{W}$ term of that fourth row, whereas XMVectorGetX(Origin) alone would only return the x component of the position.
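The fourth-row dot products described above can be sketched in plain C++ without DirectXMath. This is only an illustration under assumptions: `Vec3`, `Mat4`, `MakeViewMatrix` and `TransformPoint` are hypothetical names, the matrix is row-major, and points are treated as row vectors, matching the layout of the matrices in the post.

```cpp
#include <cassert>

// Hypothetical minimal types for illustration (not DirectXMath).
struct Vec3 { float x, y, z; };
struct Mat4 { float m[4][4]; };  // row-major, row-vector convention

float Dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Builds the view matrix T^-1 * R^T directly from the camera's orthonormal
// basis (u, v, w) and its world-space position t, with no general inverse.
Mat4 MakeViewMatrix(const Vec3& u, const Vec3& v, const Vec3& w, const Vec3& t) {
    return Mat4{{
        {u.x, v.x, w.x, 0.0f},
        {u.y, v.y, w.y, 0.0f},
        {u.z, v.z, w.z, 0.0f},
        {-Dot(t, u), -Dot(t, v), -Dot(t, w), 1.0f},
    }};
}

// Transforms the point p, treated as the row vector [p 1], by m.
Vec3 TransformPoint(const Vec3& p, const Mat4& m) {
    return Vec3{
        p.x * m.m[0][0] + p.y * m.m[1][0] + p.z * m.m[2][0] + m.m[3][0],
        p.x * m.m[0][1] + p.y * m.m[1][1] + p.z * m.m[2][1] + m.m[3][1],
        p.x * m.m[0][2] + p.y * m.m[1][2] + p.z * m.m[2][2] + m.m[3][2],
    };
}
```

For example, with a camera at (5, 0, 0) whose look axis `w` is (1, 0, 0), a world point at (7, 0, 0) ends up two units along the view-space z axis, which is what the `-Dot(t, w)` term accounts for.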
2. ## 3D SwapChain in DX12

Hi. I have been programming DX12 for nearly 6 months now and I think I still have a misunderstanding of swap chain flags and how they affect presentation. Please correct me if I am wrong. My understanding is the following:

- DX12 only supports two swap effect flags, both using the flip model, i.e. DXGI_SWAP_EFFECT_FLIP_DISCARD and DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL. My understanding is that neither flag needs a redirection surface, hence the contents of the back buffers are displayed on screen directly from the app. The DXGI_SWAP_EFFECT_FLIP_DISCARD flag allows for an option where, if the presentation queue is full and a call to IDXGISwapChain::Present() is made, whatever is at the end of this queue is discarded without ever making it to the screen. Is this correct? DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL inserts the frame to be presented at the end of the queue. Does this mean that the queue can only contain one buffer at a time?
- Neither DXGI_SWAP_EFFECT_FLIP_DISCARD nor DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL supports multisampling, so I have had to set the sample count to 1 and the sample quality to 0 in my swap chain desc structure. My question is: how would we add support for multisampling such as 4xMSAA if these are the only flags supported in DX12? I have seen some usages where the sample count was set to > 1 and the quality level was queried, which leaves me confused. I still haven't tested multisampling as I don't use it in my experimental engine.
- The waitable swap chain option blocks the presenting thread of the calling application until the specified wait time elapses. But why would we explicitly specify a wait time on the swap chain?
- Tearing support is provided by the GPU vendor, which allows options like FreeSync and G-Sync to be utilized. I am using an Intel GPU and I don't really know how to test this.
- It isn't a requirement for apps to turn vsync on in a windowed app. This is also confusing. Won't screen tearing happen anyway if my app is not synchronized with the screen's next vertical blank?
3. ## SwapChain in DX12

Thank you so much for taking the time to respond. This has cleared things up for me. I read it carefully and will consider these options when I refactor my code.
4. ## Memory Leak Detection and DX12

I am not sure if that carries over to DirectX, since it's built around COM. The best I can think of, as already mentioned, is enabling the debug layer and calling ReportLiveObjects() after you've released all DX objects to see whether any objects are still alive.
5. ## DXR software impl?

Hi all, I was wondering if DXR has a software implementation like WARP. I ask this because I am not using NVIDIA hardware. Is it possible to get an app running DXR on non-NVIDIA hardware using Microsoft's software drivers?
6. ## 3D Fewer vertices mapped to more vertex normals?

Hi, I am wondering: if I have something like a cube with 8 vertices that are referenced through an index buffer, is there a way to assign unique vertex normals to each vertex, of which I figure there are 24? From my current knowledge, I think I would need 24 normals, assigning 4 identical normals to each face, for the lighting to work correctly. For this to work I would need 24 vertices, which eliminates the need for an index buffer. I figured vertex averaging was giving wrong results here because of the very sharp edges. Is it possible to still use normals on cube geometry with an index buffer such that my vertex count remains 8, or is the only way to use non-indexed geometry with just a regular DrawInstanced (DX12)?
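As a rough sketch of the 24-vertex layout described in the question: each face gets its own 4 vertices sharing one face normal, and the two triangles of a face can still share those 4 vertices through indices, giving 24 vertices and 36 indices. `Vertex`, `Mesh` and `MakeFlatShadedCube` are hypothetical names for illustration, not part of any D3D12 API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Vertex { float px, py, pz; float nx, ny, nz; };

struct Mesh {
    std::vector<Vertex> vertices;
    std::vector<std::uint16_t> indices;
};

// Builds a unit cube centered at the origin with 4 vertices per face,
// so every vertex carries the flat normal of the face it belongs to.
Mesh MakeFlatShadedCube() {
    // Per face: normal n and two in-plane axes a, b chosen so that a x b = n.
    struct Face { float n[3], a[3], b[3]; };
    static const Face faces[6] = {
        {{ 1, 0, 0}, {0, 1, 0}, {0, 0, 1}},
        {{-1, 0, 0}, {0, 0, 1}, {0, 1, 0}},
        {{ 0, 1, 0}, {0, 0, 1}, {1, 0, 0}},
        {{ 0,-1, 0}, {1, 0, 0}, {0, 0, 1}},
        {{ 0, 0, 1}, {1, 0, 0}, {0, 1, 0}},
        {{ 0, 0,-1}, {0, 1, 0}, {1, 0, 0}},
    };
    // Corner signs along (a, b), walking around the quad.
    const float signs[4][2] = {{-1, -1}, {1, -1}, {1, 1}, {-1, 1}};
    const std::uint16_t quad[6] = {0, 1, 2, 0, 2, 3};

    Mesh mesh;
    for (const Face& f : faces) {
        const std::uint16_t base = static_cast<std::uint16_t>(mesh.vertices.size());
        for (const auto& s : signs) {
            Vertex v;
            v.px = 0.5f * (f.n[0] + s[0] * f.a[0] + s[1] * f.b[0]);
            v.py = 0.5f * (f.n[1] + s[0] * f.a[1] + s[1] * f.b[1]);
            v.pz = 0.5f * (f.n[2] + s[0] * f.a[2] + s[1] * f.b[2]);
            v.nx = f.n[0]; v.ny = f.n[1]; v.nz = f.n[2];
            mesh.vertices.push_back(v);
        }
        // Ordered so that (p1 - p0) x (p2 - p0) points along the face normal;
        // flip the order if your rasterizer state expects the opposite winding.
        for (std::uint16_t i : quad) {
            mesh.indices.push_back(static_cast<std::uint16_t>(base + i));
        }
    }
    return mesh;
}
```

The positions still take only 8 distinct values; what differs between the duplicated vertices is the normal, which is why the vertex count grows while the index buffer remains useful.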
7. ## Fewer vertices mapped to more vertex normals?

I think I will read this properly when I settle down 😄 as it's slightly overwhelming. I am actually trying to read OBJ files into my application. One file I exported has more vertex normals than vertices, which surprised me because I don't yet know how these are grouped together. I don't know much about smoothing groups, but I have used something similar back when I used 3ds Max 🙂 I will look through this, thanks!
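One way to reconcile an OBJ file's separate position and normal lists with a single index buffer, sketched here under assumptions: OBJ face corners reference a position index and a normal index independently, so each unique (position, normal) pair can become one GPU vertex. The names `ObjCorner`, `IndexedMesh` and `BuildIndexedMesh` are hypothetical, indices are assumed to be already converted to zero-based, and texture coordinates are left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

struct Float3 { float x, y, z; };
struct Vertex { Float3 position; Float3 normal; };

// One corner of an OBJ face: zero-based indices into the "v" and "vn" lists.
struct ObjCorner { int v; int vn; };

struct IndexedMesh {
    std::vector<Vertex> vertices;
    std::vector<std::uint32_t> indices;
};

// Emits one vertex per unique (position, normal) pair and an index per corner.
IndexedMesh BuildIndexedMesh(const std::vector<Float3>& positions,
                             const std::vector<Float3>& normals,
                             const std::vector<ObjCorner>& corners) {
    IndexedMesh mesh;
    std::map<std::pair<int, int>, std::uint32_t> seen;  // pair -> vertex index
    for (const ObjCorner& c : corners) {
        assert(c.v >= 0 && static_cast<std::size_t>(c.v) < positions.size());
        assert(c.vn >= 0 && static_cast<std::size_t>(c.vn) < normals.size());
        const std::pair<int, int> key(c.v, c.vn);
        auto it = seen.find(key);
        if (it == seen.end()) {
            // First time this combination appears: create a new vertex.
            const std::uint32_t index = static_cast<std::uint32_t>(mesh.vertices.size());
            mesh.vertices.push_back(Vertex{positions[c.v], normals[c.vn]});
            it = seen.emplace(key, index).first;
        }
        mesh.indices.push_back(it->second);
    }
    return mesh;
}
```

Corners on a smooth edge reuse the same pair and therefore the same vertex, while corners on a sharp edge pair one position with different normals and get split into separate vertices.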
8. ## Fewer vertices mapped to more vertex normals?

Thanks for the helpful response again, pcmaster. Would you know a better way to handle complicated meshes that have both smooth and sharp edges? Would I need to do away with indices to be safe, or should I detect the edge angles somehow and still use an index buffer?
9. ## opinion on size of constant ring buffer

Hi, I am curious what other people would normally allocate for a constant buffer managed by a ring buffer. I assumed a worst case for my maximum allowable constants and found myself needing up to 1 GB for a constant buffer with triple buffering; under this assumption I can draw up to roughly a million objects. What are some of your maximum sizes, or do you generally use a single upload heap that also supports things like VBs and other uploadable buffers for all scenarios? 🙂
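The sizing estimate above can be checked with a few lines of arithmetic. This is a sketch under assumptions: constant buffer data in D3D12 is placed at 256-byte granularity (D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT), the buffer is split evenly between frames in flight, and `ConstantRing` is a hypothetical CPU-side offset allocator that does not track GPU fences.

```cpp
#include <cassert>
#include <cstdint>

// Matches the value of D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT.
constexpr std::uint64_t kCbAlignment = 256;

// Rounds size up to the next multiple of a power-of-two alignment.
constexpr std::uint64_t AlignUp(std::uint64_t size, std::uint64_t alignment) {
    return (size + alignment - 1) & ~(alignment - 1);
}

// How many per-object constant blocks fit in one frame's slice of the buffer.
constexpr std::uint64_t MaxObjectsPerFrame(std::uint64_t totalBytes,
                                           std::uint64_t framesInFlight,
                                           std::uint64_t bytesPerObject) {
    return (totalBytes / framesInFlight) / AlignUp(bytesPerObject, kCbAlignment);
}

// Hypothetical ring allocator: hands out aligned offsets and wraps to 0 at the end.
// The caller must make sure the GPU has finished with a region before it is reused.
class ConstantRing {
public:
    explicit ConstantRing(std::uint64_t capacityBytes) : capacity_(capacityBytes) {}

    std::uint64_t Allocate(std::uint64_t size) {
        const std::uint64_t aligned = AlignUp(size, kCbAlignment);
        assert(aligned <= capacity_);
        if (head_ + aligned > capacity_) {
            head_ = 0;  // not enough room at the tail: wrap around
        }
        const std::uint64_t offset = head_;
        head_ += aligned;
        return offset;
    }

private:
    std::uint64_t capacity_;
    std::uint64_t head_ = 0;
};
```

With 1 GiB, three frames in flight, and 64 bytes of constants per object (padded to 256), `MaxObjectsPerFrame` gives 1,398,101 objects per frame, which is in line with the "roughly a million" estimate.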
10. ## ref count SwapChain->GetBuffer()

Does IDXGISwapChain::GetBuffer() in DX12 increase the COM reference count of the buffer obtained from this method? I am getting weird live object reports in my code. I call ReportLiveObjects() at app shutdown, just after destroying the virtual adapter, and I'm getting a ref count of 3 on ID3D12Device. To be sure, I tried an extra reset on the buffers obtained (I use ComPtr for the frame buffers), but the debugger complains about a ref count underflow of -1. Thanks 🙂
11. ## ImGui not rendering to back buffer

Solved. I was using 2 separate descriptor heaps to pass CPU and GPU descriptor handles.
12. ## DX12 ImGui not rendering to back buffer

Hi, I need help setting up ImGui. I am trying to render UI using the ImGui framework on DX12. I followed the ImGui example project for DX12, but so far I've had no luck even after carefully looking through my code. The debug output shows no errors either. I'm calling the ImGui functions in a separate class with static methods, as shown below:

```cpp
void GUI::Initialize(HWND hwnd, ID3D12Device* device, D3D12_CPU_DESCRIPTOR_HANDLE srvCpuHandle,
                     int num_frames_in_flight, DXGI_FORMAT rendertargetformart)
{
    D3D12_DESCRIPTOR_HEAP_DESC fontHeapDesc{};
    fontHeapDesc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE;
    fontHeapDesc.NodeMask = 0;
    fontHeapDesc.NumDescriptors = 1;
    fontHeapDesc.Type = D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV;
    device->CreateDescriptorHeap(&fontHeapDesc, IID_PPV_ARGS(sm_FontHeap.GetAddressOf()));
    D3D12_GPU_DESCRIPTOR_HANDLE fonthandle = sm_FontHeap->GetGPUDescriptorHandleForHeapStart();

    IMGUI_CHECKVERSION();
    ImGui::CreateContext();
    ImGuiIO& io = ImGui::GetIO(); (void)io;
    ImGui_ImplWin32_Init(hwnd);
    ImGui_ImplDX12_Init(device, num_frames_in_flight, rendertargetformart, srvCpuHandle, fonthandle);
    ImGui::StyleColorsDark();
}

void GUI::Update()
{
    ImGui_ImplDX12_NewFrame();
    ImGui_ImplWin32_NewFrame();
    ImGui::NewFrame();
    {
        ImGui::Begin("Some Window");
        ImGui::Text("Random text here");
        ImGui::Button("Button");
        ImGui::End();
    }
}

void GUI::RenderOverlay(ID3D12GraphicsCommandList* cmdlist)
{
    cmdlist->SetDescriptorHeaps(1, sm_FontHeap.GetAddressOf());
    ImGui::Render();
    ImGui_ImplDX12_RenderDrawData(ImGui::GetDrawData(), cmdlist);
}

void GUI::Shutdown()
{
    ImGui_ImplDX12_Shutdown();
    ImGui_ImplWin32_Shutdown();
    ImGui::DestroyContext();
}

Microsoft::WRL::ComPtr<ID3D12DescriptorHeap> GUI::sm_FontHeap = nullptr;
```

And then I call these methods in the graphics class:

```cpp
gpuContext->TransitionResource(currbackbuffer, D3D12_RESOURCE_STATE_RENDER_TARGET);
//gpuContext->SetViewport();
//gpuContext->SetScissorRect();
gpuContext->ClearRenderTarget(currbackbuffer);
gpuContext->ClearDepthStencil(dephbuffer);
gpuContext->SetRenderTargets(currbackbuffer, dephbuffer);

//Render GUI
GUI::RenderOverlay(gpuContext->GetCommandList());

gpuContext->TransitionResource(currbackbuffer, D3D12_RESOURCE_STATE_PRESENT);
gpuContext->ExecuteCommands();
GraphicsRoot::Present();
uint64_t frameFenceVal = gpuContext->Finish();
```

Any help on this will be appreciated.
13. ## D3D12 Unnamed Command Queue being final-released while still in use by the GPU

I think if you use DXGI_SWAP_EFFECT_FLIP_DISCARD in the swap chain, then you need to signal after present because it doesn't block the CPU thread. I'm guessing the other options allow blocking, so a signal after present wouldn't be a requirement?
14. ## D3D12 Unnamed Command Queue being final-released while still in use by the GPU

I've taken note of this. I had always signaled before present. It seems safer to signal after present in case I want to prepare for the next frame? 🙂
15. ## Texture repeat question

Hi, I think you'd have to find a way to modify the domain range used by ADDRESS_MODE_WRAP, which wraps at integer boundaries, and AFAIK that isn't doable. I think the best approach is to introduce 10 quads, where each quad's vertex texture coordinates address the same portion of the texture in UV space. 🙂

17. ## DX12 CB register slots (sm 5.0)

Hi once again. My understanding is that I get 10 CB register slots in my shader functions, which are listed in HLSL as b0 to b10. Following this rule, I've deduced that I can only bind up to 10 CBs per shader stage. So if I had a descriptor table pointing to 20 contiguous CBVs, is there a way to get to the remaining half, since I'm only able to bind 10 at a time? I've also seen that the root parameter structure exposes a register space (when filling out the range structure for descriptors). I suspect this could let me access more CBs than the 10 bound to the registers I normally see in HLSL. Is my current understanding correct, or is there actually more to this? Thanks 🙂
18. ## CB register slots (sm 5.0)

Thanks once again, SoldierOfLight. I have now tested this with sm 5.1 and I'm able to bind as many CBs as I like (I suppose my hardware supports binding tier 3). This forum has really proven helpful in my DirectX 12 journey 🙂 You are right. I think I mixed up the 10 with something else; it's actually 14. I get up to 14 with sm 5.0 in a C++ application.
19. ## DX12 DSV AND RTV ON CPU VISIBLE DESCRIPTOR HEAP

Hi all. I have a question about RTV and DSV descriptors. Is there any good reason why these are stored in a CPU-visible descriptor heap and not in a GPU-visible heap? I ask because I am required to provide a descriptor handle for both the RTV and the DSV on a graphics command list to bind my render target and depth-stencil buffer at the OM stage: CommandList->OMSetRenderTargets(1, &CurrentBackBufferView(), true, &DepthStencilView()); ... I read in some article that only GPU descriptors are used on a graphics command list because they live in the GPU context. But then why do we have CPU descriptors? I also believe that CPU descriptors are mostly used for immediate tasks. I've searched the internet for the underlying reason but to no avail :-(
20. ## DSV AND RTV ON CPU VISIBLE DESCRIPTOR HEAP

pcmaster, thanks for patiently explaining it so clearly to me. I finally get the concept after days of wondering :-)
21. ## DSV AND RTV ON CPU VISIBLE DESCRIPTOR HEAP

Thanks for the reply, SoldierOfLight. So the driver reads the view description and emits commands into the command list. Could you enlighten me on the kind of commands the driver would generate, if it's not too much to ask? How exactly does the GPU reference or write to the depth buffer and back buffer without a GPU descriptor handle to the resource? Or did I miss your point when you mentioned that the driver emits commands the GPU later uses at execution time, and these commands have some way of making the GPU reference the render targets? Edit: Sorry, I think I missed the point when you mentioned that GPU descriptors are meant to give shaders access to resources. So is it safe to assume the GPU writes directly to the render targets without any view description?
22. ## DX12 DirectX 12 command queues

Hi, I'm currently going through Microsoft's online documentation and I came across information that I'm not sure I have a grasp on, particularly concerning command queues. At some point the documentation says command queues can write to the same resource simultaneously if the appropriate flag is set on the resource. My question is: when submitting work to command queues, is it a requirement that these queues represent one GPU adapter, in cases where I define two? If yes, does the GPU process both queues in parallel? My other question is: does a GPU have to finish processing commands from a compute queue before processing commands from a graphics queue? I understand that a queue stores commands submitted by an application and that the GPU executes them in first-in, first-out order.
23. ## DirectX 12 command queues

Thanks for taking the time to put up such a great article, MJP. I've already read a good chunk of it. :-)
24. ## DirectX 12 command queues

Thanks JoeJ, I will definitely look into timestamps.
25. ## DirectX 12 command queues

Thanks for the reply. This is very helpful. I've only been working with a single command queue and I'm using Intel integrated graphics. I'm interested in how profiling can be done on multiple queues, if it's not too much to ask. Should I measure the time based on when a fence point is reached on the command queue, or are there better ways to profile when a GPU finishes processing a set of commands?
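For the timestamp route, the conversion step can be sketched as follows. It assumes two 64-bit tick values have already been read back from timestamp queries recorded around the work of interest, and that the tick rate comes from ID3D12CommandQueue::GetTimestampFrequency for the queue that executed them. `TicksToMilliseconds` is a hypothetical helper, and the query recording and readback are not shown.

```cpp
#include <cassert>
#include <cstdint>

// Converts a pair of GPU timestamp ticks into elapsed milliseconds.
// ticksPerSecond is assumed to be the frequency reported for the same queue
// that executed the two timestamp queries.
double TicksToMilliseconds(std::uint64_t beginTicks,
                           std::uint64_t endTicks,
                           std::uint64_t ticksPerSecond) {
    assert(ticksPerSecond != 0);
    assert(endTicks >= beginTicks);
    return static_cast<double>(endTicks - beginTicks) * 1000.0 /
           static_cast<double>(ticksPerSecond);
}
```

Measuring this way times the GPU work itself, whereas timing a fence wait on the CPU also includes submission and scheduling latency.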