fighting_falcon93

Member
  • Content Count: 12
  • Joined
  • Last visited

Community Reputation: 1 Neutral

About fighting_falcon93

  • Rank
    Member

Personal Information

  • Interests
    Art
    Audio
    Business
    Design
    Education
    Programming


  1. fighting_falcon93

    Management of CommandQueue/CommandAllocator/CommandList

    I've searched a bit and found this: https://docs.microsoft.com/en-us/windows/win32/direct3d12/executing-and-synchronizing-command-lists#executing-command-lists

    "Applications can submit command lists to any command queue from multiple threads. The runtime will perform the work of serializing these requests in the order of submission."

    Do I read it correctly that I can use a single command queue and have multiple threads call its ExecuteCommandLists method simultaneously?
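
    For what it's worth, a minimal sketch of what that usage would look like, assuming the documented serialization guarantee quoted above (queue, listA and listB are placeholders for already-created, already-closed objects):

        #include <d3d12.h>
        #include <thread>

        // Two worker threads submitting closed command lists to the same
        // ID3D12CommandQueue. Per the documentation quoted above, the runtime
        // serializes the submissions, so no extra locking around
        // ExecuteCommandLists itself should be required.
        void Submit(ID3D12CommandQueue* queue, ID3D12GraphicsCommandList* list)
        {
            ID3D12CommandList* lists[] = { list };
            queue->ExecuteCommandLists(1, lists);
        }

        void SubmitFromTwoThreads(ID3D12CommandQueue* queue,
                                  ID3D12GraphicsCommandList* listA,
                                  ID3D12GraphicsCommandList* listB)
        {
            std::thread t1(Submit, queue, listA);
            std::thread t2(Submit, queue, listB);
            t1.join();
            t2.join();
        }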
  2. fighting_falcon93

    Management of CommandQueue/CommandAllocator/CommandList

    I understand about the cleanup pool. I just think it's a bit unnecessary to do it that way when I only have cleanup to do directly after initialization has completed; for all remaining frames there is no cleanup to do at all 😉 I'll take into consideration what has been written here and start making some changes in my code to see if any problems show up.

    One question that already popped up. Consider this example:

        void System::ExecuteCommands(ID3D12CommandList* commandList)
        {
            commandQueue->ExecuteCommandLists(1, &commandList);
        }

        [Thread1]
        {
            [...]
            System::ExecuteCommands(commandList);
            [...]
        }

        [Thread2]
        {
            [...]
            System::ExecuteCommands(commandList);
            [...]
        }

    Can multiple threads use the same command queue simultaneously? And is it a good idea to do it like this?
  3. fighting_falcon93

    Management of CommandQueue/CommandAllocator/CommandList

    No, memory is not a problem, but if I'm going to release it 1-2 frames later, that means I need to store the pointers to the upload buffers somewhere, and on top of that, each frame I'd need to check if there's something to remove, which feels like a waste of performance when there's only cleanup to do after initialization. Basically I'd check an empty list every time I call the update function of each subsystem. Unless there's some better way of cleaning it up?
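
    One common pattern (a sketch of the general idea, not the poster's code) is a small deferred-release list keyed on fence values; when the list is empty the per-frame cost is a single size check, so it is cheap even if cleanup only ever happens after initialization. The names DeferredReleaseQueue, Defer and Collect are illustrative:

        #include <d3d12.h>
        #include <wrl/client.h>
        #include <deque>
        #include <utility>

        // Resources are released once the GPU has passed the fence value that
        // was signalled after the command list using them was submitted.
        class DeferredReleaseQueue
        {
        public:
            void Defer(Microsoft::WRL::ComPtr<ID3D12Resource> resource, UINT64 fenceValue)
            {
                m_pending.emplace_back(fenceValue, std::move(resource));
            }

            // Call once per frame with fence->GetCompletedValue().
            void Collect(UINT64 completedFenceValue)
            {
                while (!m_pending.empty() && m_pending.front().first <= completedFenceValue)
                    m_pending.pop_front();   // ComPtr releases the resource here
            }

        private:
            std::deque<std::pair<UINT64, Microsoft::WRL::ComPtr<ID3D12Resource>>> m_pending;
        };

    With something like this, the upload buffers from initialization are handed to Defer() with the signalled fence value, and they disappear on their own once Collect() sees the GPU pass that value, without any explicit wait.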
  4. fighting_falcon93

    Management of CommandQueue/CommandAllocator/CommandList

    I'm thinking mostly of upload buffers. In that case, the subsystem needs to know when it's safe to release the upload buffer.
  5. fighting_falcon93

    Management of CommandQueue/CommandAllocator/CommandList

    Thank you very much for your reply @pcmaster. I'm not sure that I understand what you mean in the first part. If I understand you correctly, do you mean that I should do something like this:

        void System::Render()
        {
            index = swapChain->GetCurrentBackBufferIndex();
            WaitForPreviousFrame();

            commandAllocators[index][0]->Reset();
            commandAllocators[index][1]->Reset();
            commandAllocators[index][2]->Reset();

            commandLists[0]->Reset(commandAllocators[index][0], ...);
            commandLists[1]->Reset(commandAllocators[index][1], ...);
            commandLists[2]->Reset(commandAllocators[index][2], ...);

            SubSystem1::Render(commandLists[0]);
            SubSystem2::Render(commandLists[1]);
            SubSystem3::Render(commandLists[2]);

            commandLists[0]->Close();
            commandLists[1]->Close();
            commandLists[2]->Close();

            commandQueue->ExecuteCommandLists(3, commandLists);
            swapChain->Present(...);
            Signal(...);
        }

        void SubSystem1::Render(ID3D12GraphicsCommandList* commandList)
        {
            commandList->[...];
            commandList->[...];
            commandList->[...];
        }

        void SubSystem2::Render(ID3D12GraphicsCommandList* commandList)
        {
            commandList->[...];
            commandList->[...];
            commandList->[...];
        }

        void SubSystem3::Render(ID3D12GraphicsCommandList* commandList)
        {
            commandList->[...];
            commandList->[...];
            commandList->[...];
        }

    What I'm thinking about this solution is that there will be problems if a subsystem wants to split the recording of its commands across multiple threads, as that would require more than one command list. Another problem I'm thinking of is if a subsystem wants to execute some commands before proceeding with the rest of its commands. I'm also thinking about initialization, because there I would need a different approach, since during initialization one subsystem might want to execute a command list, wait until it's done, and then record again, for example:

        void SubSystem1::Initialize()
        {
            CreateResource(...);
            CreateUploadBuffer(...);
            ExecuteCopyCommand(...);
            WaitForGPU(...);
            ReleaseUploadBuffer(...);
            DoSomethingWithResource(...);
        }

    I understand the part about command allocators and threads though, and I will make sure that each thread has its own command list and pair of command allocators so that they don't have to compete for the mutex.
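
    A rough sketch (my illustration, not code from the thread) of the fan-out described in that last paragraph: each worker thread records into its own command list backed by its own allocator, and the main thread submits everything in one call. The array parameters and the commented-out RecordSlice() are placeholders, and it assumes a per-frame fence wait has already made these allocators safe to reset:

        #include <d3d12.h>
        #include <thread>
        #include <vector>

        void RenderParallel(ID3D12CommandQueue* queue,
                            ID3D12CommandAllocator* allocators[/*threadCount*/],
                            ID3D12GraphicsCommandList* lists[/*threadCount*/],
                            unsigned threadCount)
        {
            std::vector<std::thread> workers;
            for (unsigned t = 0; t < threadCount; ++t)
            {
                workers.emplace_back([=]
                {
                    allocators[t]->Reset();           // assumed safe: GPU done with this frame slot
                    lists[t]->Reset(allocators[t], nullptr);
                    // RecordSlice(lists[t], t);      // this thread's share of the work
                    lists[t]->Close();
                });
            }
            for (auto& w : workers)
                w.join();

            // Submit all recorded lists in one call, in a fixed order.
            std::vector<ID3D12CommandList*> submit(lists, lists + threadCount);
            queue->ExecuteCommandLists(threadCount, submit.data());
        }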
  6. fighting_falcon93

    Management of CommandQueue/CommandAllocator/CommandList

    Thank you very much for your reply @MJP.

    I understand, will change this directly, thank you for letting me know.

    Multiple command lists can use the same command allocator. Is that a good idea, and how would it work with threading? Also, what is considered better practice: updating multiple subsystems in parallel, each on its own thread, or updating the subsystems sequentially but using multiple threads to split up the internal work of each one?

    The main problem is that I can't make up my mind when it comes to the design. I like the approach of giving each subsystem its own command list and command allocators, but it bothers me that each subsystem then needs to check whether the GPU is done with the allocator or not. If I did it like you suggested with a RenderBegin() and RenderEnd(), I wouldn't need to check this, but at the same time I would be limited to calling execute only once per subsystem per frame, right? Could this limitation lead to problems later?

    I'm creating resources (constant buffers, textures, etc.) and then uploading data through upload buffers. The main issue I'm having is that I can't release the upload buffers until the GPU is done with the copying, and in order to do that, I need to execute the command list, wait until the GPU has finished with it, and only then can I release them. From what I've understood, I cannot record commands that tell the GPU to release resources itself, right? So this issue I'm having with command lists is quite related to the management of upload buffers, which I struggle with as well.

    I'll check this out, thank you for the advice.
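
    For the initialization-time uploads specifically, a blocking wait is usually acceptable since it only happens once. A minimal sketch of such a wait, assuming a fence created with an initial value of 0 (fence, fenceValue and fenceEvent are illustrative names):

        #include <windows.h>
        #include <d3d12.h>

        // Submit the copy list, signal a fence, block the CPU until the GPU
        // reaches it; after that it is safe to release the upload buffers.
        void FlushCopy(ID3D12CommandQueue* queue,
                       ID3D12CommandList* copyList,
                       ID3D12Fence* fence,
                       UINT64& fenceValue,
                       HANDLE fenceEvent)
        {
            queue->ExecuteCommandLists(1, &copyList);

            const UINT64 valueToWaitFor = ++fenceValue;
            queue->Signal(fence, valueToWaitFor);

            if (fence->GetCompletedValue() < valueToWaitFor)
            {
                fence->SetEventOnCompletion(valueToWaitFor, fenceEvent);
                WaitForSingleObject(fenceEvent, INFINITE);
            }
            // Upload buffers referenced by copyList can now be released.
        }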
  7. Hello! I need some guidance on CommandQueue/CommandAllocator/CommandList management. In my current project I have a few "systems" that need to execute graphical commands, such as rendering terrain, rendering water, rendering particles, etc. Right now my project is very simple, so I'm not even using command lists during initialization. However, that's starting to become required.

     Currently I'm just using a single command queue with a ring buffer of 2 command allocators that get recorded by a single command list. Each time I render the scene, a command allocator and a command list are reset and then recorded. After all commands have been recorded, the list is executed and the swap chain is flipped. Here's some pseudo-code:

         void Initialize()
         {
             [...]
             device->CreateCommandQueue(...);
             device->CreateCommandAllocator(...); // commandAllocator[0]
             device->CreateCommandAllocator(...); // commandAllocator[1]
             device->CreateCommandList(...);
             commandList->Close();
             [...]
         }

         void Render()
         {
             WaitForPreviousFrame();
             commandAllocator[i]->Reset(); // i = swapChain->GetCurrentBackBufferIndex()
             commandList->Reset(...);
             RecordAllCommands();
             commandList->Close();
             commandQueue->ExecuteCommandLists(...);
             Signal(...);
             swapChain->Present(...);
         }

     The issue with this is that I cannot record commands during initialization, and with this design it's also quite cumbersome to execute command lists multiple times during one frame, since the command allocator ring buffer is tied to the swap chain buffer index.

     So I started to think about how I should redesign this, preferably also with future support for threading. I've thought about it for quite some time now and can't come up with a good solution.

     One idea is that each system should have its own command list with a ring buffer of 2 command allocators, record into it, and just use a global command queue to execute the list. This works well from a parallelism point of view, but the issue is that now each system needs to check individually whether the GPU is done with the commands before resetting the command allocator. This feels like a huge CPU waste.

     Another idea is that there is only one global command list, which is available already during the initialization of the other systems; after initialization this command list gets executed, before entering the game loop. During the game loop, the global command list gets executed once per frame as I do it now. However, there are 2 issues with this. First of all, some systems might want to execute their commands earlier than at the end of each frame. Secondly, if multiple threads record into the same command list, then we might get a situation like this:

         commandList->SetPipelineState(pipelineState1); // Thread 1 wants pipelineState1.
         commandList->SetPipelineState(pipelineState2); // Thread 2 wants pipelineState2.
         [...]
         commandList->DrawInstanced(...);               // Thread 1 expects pipelineState1 to be set...

     I'm out of ideas for how to implement this in a simple and elegant way. Or maybe I'm doing this entirely wrong. Basically what I need is:

     • Systems should be able to record commands already during initialization.
     • At least during initialization, it should be possible to execute commands in multiple steps and even wait for the GPU to complete them.
     • When rendering the scene, it would be nice if multiple threads could record commands in parallel.

     Does any of you have a good solution to this problem? What is the AAA game engine way of dealing with this?
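
     For reference, the per-allocator check described in the first idea can be sketched roughly like this (illustrative names, not code from the thread): each system keeps a small ring of allocators plus the fence value signalled for the frame that last used each one. The check is a single comparison against the fence's completed value and only blocks if the CPU runs a full ring ahead of the GPU:

         #include <windows.h>
         #include <d3d12.h>

         struct AllocatorRing
         {
             static const unsigned kSlots = 2;
             ID3D12CommandAllocator* allocators[kSlots]  = {};
             UINT64                  fenceValues[kSlots] = {};
             unsigned                cursor = 0;

             // Returns an allocator that the GPU is guaranteed to be done with.
             ID3D12CommandAllocator* Acquire(ID3D12Fence* fence, HANDLE fenceEvent)
             {
                 unsigned slot = cursor++ % kSlots;
                 if (fence->GetCompletedValue() < fenceValues[slot])
                 {
                     fence->SetEventOnCompletion(fenceValues[slot], fenceEvent);
                     WaitForSingleObject(fenceEvent, INFINITE);  // rare: only when kSlots frames ahead
                 }
                 allocators[slot]->Reset();
                 return allocators[slot];
             }

             // Call after ExecuteCommandLists + queue->Signal(fence, value) for this frame.
             void MarkSubmitted(UINT64 signalledFenceValue)
             {
                 fenceValues[(cursor - 1) % kSlots] = signalledFenceValue;
             }
         };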
  8. I'm experimenting with terrain hardware tessellation in DirectX 11. What I want to do is take a single control point as input in the hull shader and produce a patch with 4 control points as output. The patch constant function would then apply the tessellation factors to these 4 edges and send the patch to the tessellator. But in every tutorial and example I can find, the same number of control points is used for both the hull shader and the patch constant function, so I wonder if anyone could please help me a bit with how to do this?

     I'd find it a bit weird if this weren't possible, because the geometry shader can do it, and that's what I've been using so far. But the geometry shader doesn't provide any tessellation, which is why I'm experimenting with this change. If the hull shader actually doesn't support this, are there any efficient workarounds?
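
     A sketch (HLSL, untested) of the general shape this could take, assuming 1-control-point patch lists on the CPU side; the struct members, fixed tessellation factors and the corner expansion are placeholder assumptions. The output control point count is declared independently of the input patch size:

         struct HS_IN
         {
             float3 center : POSITION;     // one point per terrain patch
             float  size   : PSIZE;        // half-extent of the patch
         };

         struct HS_OUT
         {
             float3 position : POSITION;   // one of the 4 generated corners
         };

         struct HS_CONSTANTS
         {
             float edges[4]  : SV_TessFactor;
             float inside[2] : SV_InsideTessFactor;
         };

         HS_CONSTANTS PatchConstants(InputPatch<HS_IN, 1> ip, uint patchId : SV_PrimitiveID)
         {
             HS_CONSTANTS o;
             o.edges[0] = o.edges[1] = o.edges[2] = o.edges[3] = 16.0f;  // placeholder factors
             o.inside[0] = o.inside[1] = 16.0f;
             return o;
         }

         [domain("quad")]
         [partitioning("fractional_even")]
         [outputtopology("triangle_cw")]
         [outputcontrolpoints(4)]                  // 4 out, even though only 1 comes in
         [patchconstantfunc("PatchConstants")]
         HS_OUT main(InputPatch<HS_IN, 1> ip,
                     uint cpId    : SV_OutputControlPointID,
                     uint patchId : SV_PrimitiveID)
         {
             // Expand the single input point into the 4 corners of a quad.
             static const float2 corner[4] =
             {
                 float2(-1.0f, -1.0f), float2( 1.0f, -1.0f),
                 float2(-1.0f,  1.0f), float2( 1.0f,  1.0f)
             };

             HS_OUT o;
             o.position = ip[0].center +
                          float3(corner[cpId].x, 0.0f, corner[cpId].y) * ip[0].size;
             return o;
         }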
  9. fighting_falcon93

    Receiving Particle Count From GPU

    Thank you both very much for the help.

    Having some buffers in an array that "rotates" sounds like an interesting idea:

        m_deviceContext->Map(m_stagingBuffer[0], 0, D3D11_MAP_READ, 0, &m_mappedResource);
        m_particleCount = ((u32*)m_mappedResource.pData)[0];
        m_deviceContext->Unmap(m_stagingBuffer[0], 0);

        ID3D11Buffer* tmp;
        tmp                = m_stagingBuffer[0];
        m_stagingBuffer[0] = m_stagingBuffer[1];
        m_stagingBuffer[1] = m_stagingBuffer[2];
        m_stagingBuffer[2] = m_stagingBuffer[3];
        m_stagingBuffer[3] = m_stagingBuffer[4];
        m_stagingBuffer[4] = tmp;

        m_deviceContext->CopyStructureCount(m_stagingBuffer[4], 0, m_particleBufferUAV);

    I did some profiling and it seems to work:

      • ~4% CPU usage without the particle count functionality.
      • ~5% CPU usage with the particle count, using buffer rotation.
      • ~14% CPU usage with the particle count, using a single buffer.

    It seems like the Map method still takes around ~1% CPU usage even when I rotate the buffers; is this normal?

    This is what confuses me a bit. The flag is called "was still drawing", but you're describing it as "hasn't written to it yet". Does this mean that this flag will be returned if the GPU is currently using the buffer, or will it be returned if the GPU hasn't updated it yet? Sorry if I'm asking the same thing again, but the naming is a bit contradictory.

    Should I call the Map method with or without the D3D11_MAP_FLAG_DO_NOT_WAIT flag? And by "block", I assume you mean that the CPU will flush the entire GPU queue? Is there any way I can find out what the maximum number of frames in flight can be?

    Once again, thank you both very much for the help.
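
    Regarding the flag, a sketch of how D3D11_MAP_FLAG_DO_NOT_WAIT is typically used (variable names match the snippet above): if the data isn't ready yet, Map returns DXGI_ERROR_WAS_STILL_DRAWING and the read is simply skipped for this frame instead of stalling:

        // Non-blocking readback of the oldest staging copy.
        D3D11_MAPPED_SUBRESOURCE mapped;
        HRESULT hr = m_deviceContext->Map(m_stagingBuffer[0], 0, D3D11_MAP_READ,
                                          D3D11_MAP_FLAG_DO_NOT_WAIT, &mapped);
        if (SUCCEEDED(hr))
        {
            m_particleCount = ((unsigned int*)mapped.pData)[0];
            m_deviceContext->Unmap(m_stagingBuffer[0], 0);
        }
        else if (hr == DXGI_ERROR_WAS_STILL_DRAWING)
        {
            // Data not ready yet; m_particleCount keeps its previous value.
        }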
  10. I'm creating a particle system that simulates all particles on the GPU. The problem is that I want an efficient way to get the number of particles currently alive in the system.

     Creating an ID3D11Buffer in this way:

         D3D11_BUFFER_DESC bufferDesc;
         memset(&bufferDesc, 0, sizeof(D3D11_BUFFER_DESC));
         bufferDesc.ByteWidth      = sizeof(unsigned int);
         bufferDesc.Usage          = D3D11_USAGE_STAGING;
         bufferDesc.BindFlags      = 0; // staging resources cannot have bind flags
         bufferDesc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;

         m_device->CreateBuffer(&bufferDesc, NULL, &m_bufferParticleCount);

     will allow the GPU to write to it and allow the CPU to read from it. But then there's another problem:

         m_deviceContext->Map(m_bufferParticleCount, 0, D3D11_MAP_READ, 0, &m_mappedResource);
         m_particleCount = ((unsigned int*)m_mappedResource.pData)[0];
         m_deviceContext->Unmap(m_bufferParticleCount, 0);

     From what I've understood, when Map is called it will flush the entire GPU command queue and force the CPU to sit idle, doing nothing until the GPU has caught up and written the particle count into the buffer. Not a good solution.

     The GPU already updates another buffer with the current number of particles after each simulation step, so that I can use the DrawInstancedIndirect method properly. So the purpose of this value is just that I want a method in the particle system that can return the number of currently alive particles:

         unsigned int ParticleSystem::GetCurrentParticleCount()
         {
             return m_particleCount;
         }

     The point is that this value doesn't need to be exact; it's entirely fine if it's "outdated" by a few frames. So this made me wonder whether there's any way for the CPU to read from the GPU without waiting for the GPU to update the buffer first. For example, if the buffer is always at the same memory location, then the CPU could read whatever value is in there, even if that number represents the particle count from a few frames earlier. Basically, the CPU would only need to wait if the GPU were actually writing to the buffer at the exact moment the CPU wanted to read from it, but it wouldn't cause a CPU-GPU sync.

     Also, I have looked into the D3D11_MAP_FLAG_DO_NOT_WAIT flag:

         m_deviceContext->Map(m_bufferParticleCount, 0, D3D11_MAP_READ, D3D11_MAP_FLAG_DO_NOT_WAIT, &m_mappedResource);
         m_particleCount = ((unsigned int*)m_mappedResource.pData)[0];
         m_deviceContext->Unmap(m_bufferParticleCount, 0);

     But to me it's still unclear what this actually does. Does it wait until the resource is available without flushing the entire GPU command queue, or does it simply skip the Map call while the resource is in use, and then, when the resource finally becomes available, still flush the entire GPU command queue? Any suggestions on how I can solve this issue?
  11. fighting_falcon93

    Help - DirectX11 - Color Interpolation Along Quad Diagonal

    Thank you all very much for your replies.

    Yeah, I've thought about that solution as well, but since it will result in twice as many triangles, I'd prefer to avoid it if possible for performance reasons.

    I'm sorry, but I don't think I understand what you mean. Why would the interpolation be different if I read the color from a texture rather than storing the color in the vertex structure? When rendering the terrain mesh I've swapped the vertex color for a texture coordinate, but it's the edges (the triangle layout) that control the interpolation, not the way of reading the color, or am I mistaken?

    Would you like to explain why there will be issues with the normal generation? I've been following the tutorials on rastertek.com, and according to that tutorial you simply calculate the normal of each face, add that face normal to the normal of each vertex the face touches, and when all face normals have been added you just normalize each vertex normal. Isn't that the correct way of doing it? Sorry if I'm not following you, I'm still learning and haven't grasped all the details yet, so please have patience with me.

    What do you mean by the gradients being messed up? This is how I initialize my DXGI_SWAP_CHAIN_DESC in the project with the terrain rendering:

        DXGI_SWAP_CHAIN_DESC swapChainDesc;
        ZeroMemory(&swapChainDesc, sizeof(DXGI_SWAP_CHAIN_DESC));
        swapChainDesc.BufferCount                        = 1;
        swapChainDesc.BufferDesc.Width                   = GRAPHICS_SCREEN_WIDTH;
        swapChainDesc.BufferDesc.Height                  = GRAPHICS_SCREEN_HEIGHT;
        swapChainDesc.BufferDesc.RefreshRate.Numerator   = 60;
        swapChainDesc.BufferDesc.RefreshRate.Denominator = 1;
        swapChainDesc.BufferDesc.Format                  = DXGI_FORMAT_R8G8B8A8_UNORM;
        swapChainDesc.BufferUsage                        = DXGI_USAGE_RENDER_TARGET_OUTPUT;
        swapChainDesc.OutputWindow                       = hwnd;
        swapChainDesc.SampleDesc.Count                   = 1;
        swapChainDesc.Windowed                           = true;

        D3D11CreateDeviceAndSwapChain(NULL, D3D_DRIVER_TYPE_HARDWARE, NULL, NULL, NULL, NULL,
                                      D3D11_SDK_VERSION, &swapChainDesc, &swapChain, &device,
                                      NULL, &deviceContext);

    And this is how I initialize the D3D11_INPUT_ELEMENT_DESC for my terrain shader:

        D3D11_INPUT_ELEMENT_DESC polygonLayout[6];

        ZeroMemory(&polygonLayout[0], sizeof(D3D11_INPUT_ELEMENT_DESC));
        polygonLayout[0].SemanticName      = "POSITION";
        polygonLayout[0].Format            = DXGI_FORMAT_R32G32B32_FLOAT;
        polygonLayout[0].AlignedByteOffset = D3D11_APPEND_ALIGNED_ELEMENT;

        ZeroMemory(&polygonLayout[1], sizeof(D3D11_INPUT_ELEMENT_DESC));
        polygonLayout[1].SemanticName      = "TEXCOORD";
        polygonLayout[1].Format            = DXGI_FORMAT_R32G32_FLOAT;
        polygonLayout[1].AlignedByteOffset = D3D11_APPEND_ALIGNED_ELEMENT;

        ZeroMemory(&polygonLayout[2], sizeof(D3D11_INPUT_ELEMENT_DESC));
        polygonLayout[2].SemanticName      = "NORMAL";
        polygonLayout[2].Format            = DXGI_FORMAT_R32G32B32_FLOAT;
        polygonLayout[2].AlignedByteOffset = D3D11_APPEND_ALIGNED_ELEMENT;

        ZeroMemory(&polygonLayout[3], sizeof(D3D11_INPUT_ELEMENT_DESC));
        polygonLayout[3].SemanticName      = "TERRAIN";
        polygonLayout[3].Format            = DXGI_FORMAT_R32_FLOAT;
        polygonLayout[3].AlignedByteOffset = D3D11_APPEND_ALIGNED_ELEMENT;

        ZeroMemory(&polygonLayout[4], sizeof(D3D11_INPUT_ELEMENT_DESC));
        polygonLayout[4].SemanticName      = "TERRAIN";
        polygonLayout[4].SemanticIndex     = 1;
        polygonLayout[4].Format            = DXGI_FORMAT_R32_FLOAT;
        polygonLayout[4].AlignedByteOffset = D3D11_APPEND_ALIGNED_ELEMENT;

        ZeroMemory(&polygonLayout[5], sizeof(D3D11_INPUT_ELEMENT_DESC));
        polygonLayout[5].SemanticName      = "TERRAIN";
        polygonLayout[5].SemanticIndex     = 2;
        polygonLayout[5].Format            = DXGI_FORMAT_R32_FLOAT;
        polygonLayout[5].AlignedByteOffset = D3D11_APPEND_ALIGNED_ELEMENT;

    Currently the terrain is limited to a maximum of 3 different textures. The terrain shaders (vertex and pixel) look like this:

        cbuffer MatrixBuffer
        {
            matrix world;
            matrix view;
            matrix projection;
        };

        struct VS_INPUT
        {
            float4 position : POSITION;
            float2 texcoord : TEXCOORD0;
            float3 normal   : NORMAL;
            float  terrain0 : TERRAIN0;
            float  terrain1 : TERRAIN1;
            float  terrain2 : TERRAIN2;
        };

        struct VS_OUTPUT
        {
            float4 position : SV_POSITION;
            float2 texcoord : TEXCOORD0;
            float3 normal   : NORMAL;
            float  terrain0 : TERRAIN0;
            float  terrain1 : TERRAIN1;
            float  terrain2 : TERRAIN2;
        };

        VS_OUTPUT main(VS_INPUT input)
        {
            input.position.w = 1.0f;

            VS_OUTPUT output;
            output.position = mul(input.position, world);
            output.position = mul(output.position, view);
            output.position = mul(output.position, projection);
            output.texcoord = input.texcoord;
            output.normal   = normalize(mul(input.normal, (float3x3)world));
            output.terrain0 = input.terrain0;
            output.terrain1 = input.terrain1;
            output.terrain2 = input.terrain2;
            return output;
        }

        Texture2D texture0 : register(t0);
        Texture2D texture1 : register(t1);
        Texture2D texture2 : register(t2);
        SamplerState SampleType : register(s0);

        cbuffer LightBuffer
        {
            float4 ambient;
            float4 diffuse;
            float3 direction;
            float  padding;
        }

        struct PS_INPUT
        {
            float4 position : SV_POSITION;
            float2 texcoord : TEXCOORD0;
            float3 normal   : NORMAL;
            float  terrain0 : TERRAIN0;
            float  terrain1 : TERRAIN1;
            float  terrain2 : TERRAIN2;
        };

        float4 main(PS_INPUT input) : SV_TARGET
        {
            float4 color0 = texture0.Sample(SampleType, input.texcoord) * input.terrain0;
            float4 color1 = texture1.Sample(SampleType, input.texcoord) * input.terrain1;
            float4 color2 = texture2.Sample(SampleType, input.texcoord) * input.terrain2;

            float4 color = float4(0.0f, 0.0f, 0.0f, 1.0f);
            color = color + color0;
            color = color + color1;
            color = color + color2;
            color = saturate(color);
            return color;
        }

    Is there something wrong with the shader, or should I just change the DXGI_FORMAT?

    Thank you very much for the article, that was exactly what I was looking for. I've been searching on Google like crazy but it never showed up, sadly.

    Thank you very much for your reply and the very illustrative picture. Let's see if I've understood this correctly. So currently we're talking about 3 different triangulation methods. Am I right so far? I'll try to implement triangulation method 2 and come back with a picture of the visual result. In the meantime, I understand why triangulation method 3 is a pain to work with (because the vertices are not aligned like pixels in a texture), but it does result in a perfect interpolation for every single vertex. Is there any decent way of translating, for example, a heightmap to this format? Or how do you store the height and texture data when you use triangulation method 3?
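
    On the texture-vs-vertex question above, a sketch of the idea being suggested (my illustration, under assumptions: a low-resolution "splat map" with one texel per terrain vertex, a separate terrainUV that spans the whole terrain, and bilinear filtering enabled). Because the blend weights are then filtered by the sampler, the interpolation is bilinear per quad and independent of which diagonal the quad is split on:

        Texture2D    texture0 : register(t0);
        Texture2D    texture1 : register(t1);
        Texture2D    texture2 : register(t2);
        Texture2D    splatMap : register(t3);   // R, G, B = weights for textures 0..2
        SamplerState SampleType : register(s0); // bilinear (or anisotropic) filtering

        struct PS_INPUT_SPLAT
        {
            float4 position  : SV_POSITION;
            float2 texcoord  : TEXCOORD0;       // detail-texture UV (tiled)
            float2 terrainUV : TEXCOORD1;       // 0..1 across the whole terrain
        };

        float4 main(PS_INPUT_SPLAT input) : SV_TARGET
        {
            // The sampler bilinearly interpolates the weights across each quad.
            float3 weights = splatMap.Sample(SampleType, input.terrainUV).rgb;

            float4 color = texture0.Sample(SampleType, input.texcoord) * weights.r
                         + texture1.Sample(SampleType, input.texcoord) * weights.g
                         + texture2.Sample(SampleType, input.texcoord) * weights.b;
            return saturate(float4(color.rgb, 1.0f));
        }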
  12. Imagine that we have a vertex structure that looks like this:

         struct Vertex
         {
             XMFLOAT3 position;
             XMFLOAT4 color;
         };

     The vertex shader looks like this:

         cbuffer MatrixBuffer
         {
             matrix world;
             matrix view;
             matrix projection;
         };

         struct VertexInput
         {
             float4 position : POSITION;
             float4 color    : COLOR;
         };

         struct PixelInput
         {
             float4 position : SV_POSITION;
             float4 color    : COLOR;
         };

         PixelInput main(VertexInput input)
         {
             PixelInput output;
             input.position.w = 1.0f;
             output.position = mul(input.position, world);
             output.position = mul(output.position, view);
             output.position = mul(output.position, projection);
             output.color    = input.color;
             return output;
         }

     And the pixel shader looks like this:

         struct PixelInput
         {
             float4 position : SV_POSITION;
             float4 color    : COLOR;
         };

         float4 main(PixelInput input) : SV_TARGET
         {
             return input.color;
         }

     Now let's create a quad consisting of 2 triangles and the vertices A, B, C and D:

         // Vertex A.
         vertices[0].position = XMFLOAT3(-1.0f,  1.0f, 0.0f);
         vertices[0].color    = XMFLOAT4( 0.5f,  0.5f, 0.5f, 1.0f);

         // Vertex B.
         vertices[1].position = XMFLOAT3( 1.0f,  1.0f, 0.0f);
         vertices[1].color    = XMFLOAT4( 0.5f,  0.5f, 0.5f, 1.0f);

         // Vertex C.
         vertices[2].position = XMFLOAT3(-1.0f, -1.0f, 0.0f);
         vertices[2].color    = XMFLOAT4( 0.5f,  0.5f, 0.5f, 1.0f);

         // Vertex D.
         vertices[3].position = XMFLOAT3( 1.0f, -1.0f, 0.0f);
         vertices[3].color    = XMFLOAT4( 0.5f,  0.5f, 0.5f, 1.0f);

         // 1st triangle.
         indices[0] = 0; // Vertex A.
         indices[1] = 3; // Vertex D.
         indices[2] = 2; // Vertex C.

         // 2nd triangle.
         indices[3] = 0; // Vertex A.
         indices[4] = 1; // Vertex B.
         indices[5] = 3; // Vertex D.

     This will result in a grey quad as shown in the image below. I've outlined the edges in red to better illustrate the triangles.

     Now imagine that we'd want our quad to have a different color in vertex A:

         // Vertex A.
         vertices[0].position = XMFLOAT3(-1.0f,  1.0f, 0.0f);
         vertices[0].color    = XMFLOAT4( 0.0f,  0.0f, 0.0f, 1.0f);

     That works as expected, since there's now an interpolation between the black color in vertex A and the grey color in vertices B, C and D. Let's revert the previous change and instead change the color of vertex C:

         // Vertex C.
         vertices[2].position = XMFLOAT3(-1.0f, -1.0f, 0.0f);
         vertices[2].color    = XMFLOAT4( 0.0f,  0.0f, 0.0f, 1.0f);

     As you can see, the interpolation only covers the first triangle (half of the quad), not the entire quad, because there's no edge between vertex C and vertex B. Which brings us to my question: I want the interpolation to go across the entire quad and not only across one triangle, so that regardless of which vertex we change the color of, the interpolation always spans the whole quad. Is there any efficient way of achieving this without adding more vertices and triangles? An illustration of what I'm trying to achieve is shown in the image below.

     Background

     This is just a very brief explanation of the problem's background, in case that makes it easier to understand its roots and maybe helps you find a better solution. I'm trying to texture a terrain mesh in DirectX 11. It's working, but I'm a bit unsatisfied with the result. When changing the terrain texture of a single vertex, the interpolation with the other vertices results in a hexagonal shape instead of a square shape. As the red arrows illustrate, I'd like the texture to be interpolated all the way into the corners of the quads.
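
     One way this is sometimes handled without extra geometry (a sketch of the general idea, not a drop-in fix) is to bypass the triangle interpolation and do the bilinear blend manually in the pixel shader. It assumes the four corner colors are available per quad, here via a constant buffer and a quad-local UV, both of which are assumptions about how the data would be fed in:

         // 'quadUV' is assumed to be (0,0) at the top-left corner and (1,1) at the
         // bottom-right; the corner colors are assumed to be supplied per quad.
         cbuffer QuadColors
         {
             float4 colorA; // top-left
             float4 colorB; // top-right
             float4 colorC; // bottom-left
             float4 colorD; // bottom-right
         };

         struct PixelInput
         {
             float4 position : SV_POSITION;
             float2 quadUV   : TEXCOORD0;   // quad-local coordinates in [0, 1]
         };

         float4 main(PixelInput input) : SV_TARGET
         {
             // Blend along the top and bottom edges, then vertically: the result is
             // the same no matter which diagonal the quad is split on.
             float4 top    = lerp(colorA, colorB, input.quadUV.x);
             float4 bottom = lerp(colorC, colorD, input.quadUV.x);
             return lerp(top, bottom, input.quadUV.y);
         }

     For a terrain, storing the blend weights in a bilinearly filtered texture gives the same blend, with the sampler doing the lerps.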