Posts posted by mlfarrell00


  1. Wow.. so get this.. I've been doing this work on a Mac rebooted into Windows 10 via Boot Camp (not a VM; actually running Windows on Mac hardware). The exact same OpenGL demo, when rebooted into OS X, gets 20-23 frames per second. Confirms what everyone already knows: Apple gives zero fucks about optimizing their OpenGL drivers. I knew NVIDIA had an edge on Apple's drivers, but this is ridiculous. One of the reasons I scouted ahead to learn D3D12 was to get an edge on the concepts that will likely be available when Vulkan releases, but with Apple touting their vendor-lock-in Metal API, it's a wonder if we'll ever see Vulkan on OS X at all. Food for thought, I guess. Either way, if Metal becomes the ONLY way to get performant 3D on OS X, I'll be abandoning it in favor of Windows in a heartbeat.


  2. Bam! I finally beat OpenGL. Man, the NVIDIA developers working on the OpenGL driver for Windows are on point, that's all I'm gonna say, because this was a bitch.

     

    Even after latency fixes and large heap allocations, I had to do tons of CPU-bound optimizations to thin out my VGL layer as much as possible. Things like STL containers were big bottlenecks: things I already knew about, but don't think about until optimization needs hit a certain level.
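
    For illustration, here's the kind of change that helps; this is a hypothetical hot path, not code from the engine, where swapping a per-call std::vector for a fixed-size array removes a heap allocation from every draw:

    #include <d3d12.h>

    //hypothetical hot-path helper, for illustration only: gathers vertex buffer
    //views for a draw call without touching the heap
    void gatherViews(ID3D12GraphicsCommandList *cl, const D3D12_VERTEX_BUFFER_VIEW *src, UINT count)
    {
      //std::vector<D3D12_VERTEX_BUFFER_VIEW> views(src, src + count);  //heap allocation per draw
      D3D12_VERTEX_BUFFER_VIEW views[16];  //assumed upper bound of 16 streams
      count = count < 16 ? count : 16;
      for(UINT i = 0; i < count; i++)
        views[i] = src[i];
      cl->IASetVertexBuffers(0, count, views);
    }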

     

    On OpenGL, rendering 5000 unique objects via 5000 separate draw calls amounts to about 48-52 FPS.

    Now, on my D3D12 backend, I'm achieving up to 57 FPS.

     

    I finally beat it!

    There's likely even more room for optimization on the D3D12 side, so I'm happy.

     

    What I'm doing, basically, is adding a D3D12 rendering backend to my graphics engine, which allows me to open scene files that I made with my app (http://vertostudio.com). The scenes were created using an OpenGL ES variant of the same engine, so being able to load them up in a D3D12 environment with good performance is awesome. In fact, the very same C++ graphics engine has been compiled to JS (via Emscripten) and runs on that same website inside the cloud viewer.

     

    This whole experiment was a continuation of my goal of making my graphics engine as platform-independent and versatile as possible, by offering different rendering backends for different systems. Now I've got OpenGL 3, WebGL, and D3D12, and soon Vulkan.

     

    Here's my "final" core state class for reference:

    
    
    #include "vgl.h"
    #include "System.h"
    #include "CoreStateMachine.h"
    #include "BufferArray.h"
    #include "FrameBuffer.h"
    #include "VertexArray.h"
    #include "Texture.h"
    
    using namespace std;
    using namespace Microsoft::WRL;
    
    namespace vgl
    {
      static const int GPUDescriptorHeapSize = 2048;
      static const int GlobalCBufferMaxSize = 4096;
      static const int GlobalCBufferMaxCalls = 10000;
      static const int TriangleFanEmulationBufferSize = 1000000 * 4;
    
      CoreStateMachine::CoreStateMachine()
      {
        //init cannot be done until we have the device & queue pointers
      }
    
      CoreStateMachine::~CoreStateMachine()
      {
    
      }
    
      void CoreStateMachine::shutdown()
      {
        //must be done BEFORE destruction
        for(int i = 0; i < MaxLatencyFrames; i++)
        {
          if(globalConstantBuffers[i])
          {
            globalConstantBuffers[i]->buffers[0]->Unmap(0, nullptr);
          }
        }
      }
    
      void CoreStateMachine::performDeferredInit()
      {
        D3D12_FEATURE_DATA_D3D12_OPTIONS featureOps;
        ThrowIfFailed(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS, &featureOps, sizeof(featureOps)));
        if((int)featureOps.ResourceBindingTier < (int)D3D12_RESOURCE_BINDING_TIER_2)
        {
          MessageBox(NULL, L"D3D12 Resource Tier 2 support required and not found!", L"It's over", MB_ICONERROR | MB_OK);
          throw vgl_runtime_error("It's over");
        }
    
        ThrowIfFailed(device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT, IID_PPV_ARGS(&setupCommandAllocator)));
    
        for(int i = 0; i < MaxLatencyFrames; i++)
          ThrowIfFailed(device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT, IID_PPV_ARGS(&renderCommandAllocator[i])));
    
        CD3DX12_DESCRIPTOR_RANGE descRange1, descRange2, descRange3;
        descRange1.Init(D3D12_DESCRIPTOR_RANGE_TYPE_CBV, MaxConstantBuffers, 1);
        descRange2.Init(D3D12_DESCRIPTOR_RANGE_TYPE_SRV, D3D12_DESCRIPTOR_RANGE_OFFSET_APPEND, 0);
        descRange3.Init(D3D12_DESCRIPTOR_RANGE_TYPE_SAMPLER, D3D12_DESCRIPTOR_RANGE_OFFSET_APPEND, 0);
    
        //4 or so constant buffers & 8 textures possible per draw call
        textureTableSize = MaxConstantBuffers + MaxTextures;
    
        CD3DX12_ROOT_PARAMETER rootParam[3];
        CD3DX12_DESCRIPTOR_RANGE ranges[2] = { descRange1, descRange2 };
        rootParam[0].InitAsDescriptorTable(2, ranges);
        rootParam[1].InitAsDescriptorTable(1, &descRange3);
        rootParam[2].InitAsConstantBufferView(0);
    
        CD3DX12_ROOT_SIGNATURE_DESC rootSignatureDesc;
        rootSignatureDesc.Init(3, rootParam, 0, nullptr, D3D12_ROOT_SIGNATURE_FLAG_ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT);
    
        ComPtr<ID3DBlob> signature;
        ComPtr<ID3DBlob> error;
        ThrowIfFailed(D3D12SerializeRootSignature(&rootSignatureDesc, D3D_ROOT_SIGNATURE_VERSION_1, &signature, &error));
        ThrowIfFailed(device->CreateRootSignature(0, signature->GetBufferPointer(), signature->GetBufferSize(), IID_PPV_ARGS(&rootSignature)));
    
        // Describe and create the graphics pipeline state object (PSO).
        psoDesc = {};
        psoDesc.pRootSignature = rootSignature.Get();
        psoDesc.RasterizerState = CD3DX12_RASTERIZER_DESC(D3D12_DEFAULT);
        psoDesc.RasterizerState.CullMode = D3D12_CULL_MODE_NONE;
        psoDesc.BlendState = CD3DX12_BLEND_DESC(D3D12_DEFAULT);
        psoDesc.DepthStencilState.DepthEnable = FALSE;
        psoDesc.DepthStencilState.DepthWriteMask = D3D12_DEPTH_WRITE_MASK_ALL;
        psoDesc.DepthStencilState.StencilEnable = FALSE;
        psoDesc.DepthStencilState.DepthFunc = D3D12_COMPARISON_FUNC_LESS;
        psoDesc.SampleMask = UINT_MAX;
        psoDesc.PrimitiveTopologyType = D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE;
        psoDesc.NumRenderTargets = 1;
        psoDesc.RTVFormats[0] = DXGI_FORMAT_R8G8B8A8_UNORM;
        psoDesc.DSVFormat = DXGI_FORMAT_D32_FLOAT;
        psoDesc.SampleDesc.Count = 1;
        psoDirty = true;
    
        setupFenceValue = 0;
        ThrowIfFailed(device->CreateFence(setupFenceValue, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&setupFence)));
    
        renderFenceValue = 0;
        ThrowIfFailed(device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&renderFence)));
    
        currentFrameIndex = 0;
        ThrowIfFailed(device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&frameFence)));
    
        // Create an event handle to use for frame synchronization.
        setupFenceEvent = CreateEventEx(nullptr, FALSE, FALSE, EVENT_ALL_ACCESS);
        if(setupFenceEvent == nullptr)
        {
          ThrowIfFailed(HRESULT_FROM_WIN32(GetLastError()));
        }
    
        renderFenceEvent = CreateEventEx(nullptr, FALSE, FALSE, EVENT_ALL_ACCESS);
        if(renderFenceEvent == nullptr)
        {
          ThrowIfFailed(HRESULT_FROM_WIN32(GetLastError()));
        }
    
        frameFenceEvent = CreateEventEx(nullptr, FALSE, FALSE, EVENT_ALL_ACCESS);
        if(frameFenceEvent == nullptr)
        {
          ThrowIfFailed(HRESULT_FROM_WIN32(GetLastError()));
        }
    
        auto cl = beginRenderingCommands();
        triangleFanEBOs = make_shared<BufferArray>(2);
        ushort3 eboData[2] = { { 0, 1, 2 }, { 0, 2, 3 } };
        triangleFanEBOs->provideData(0, sizeof(ushort3) * 2, eboData, BufferArray::UT_STATIC);
        triangleFanEBOs->provideData(1, TriangleFanEmulationBufferSize, nullptr, BufferArray::UT_FORCE_UPLOAD_HEAP);
        triangleFanEBOs->setInternalUsage(true);
        endRenderingCommands(cl);
    
        cbSrvHeap = make_shared<DescriptorHeap>(device.Get(), D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV, GPUDescriptorHeapSize, true);
        cpuCbSrvHeap = make_shared<DescriptorHeap>(device.Get(), D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV, textureTableSize, false);
        cbSrvHeaps = { cbSrvHeap };
        samplerHeap = make_shared<DescriptorHeap>(device.Get(), D3D12_DESCRIPTOR_HEAP_TYPE_SAMPLER, GPUDescriptorHeapSize, true);
        cpuSamplerHeap = make_shared<DescriptorHeap>(device.Get(), D3D12_DESCRIPTOR_HEAP_TYPE_SAMPLER, textureTableSize, false);
        samplerHeaps = { samplerHeap };
        vom::Texture::setHeaps(cbSrvHeap, samplerHeap);
    
        waitForSetupCommands();
        waitForRender();
      }
      
      void CoreStateMachine::setBlendFuncSourceFactor(BlendFactor srcFactor, BlendFactor dstFactor)
      {
        D3D12_BLEND blendFactors[] = {
          D3D12_BLEND_ONE,
          D3D12_BLEND_SRC_ALPHA,
          D3D12_BLEND_INV_SRC_ALPHA
        };
    
        if(psoDesc.BlendState.RenderTarget[0].SrcBlend != blendFactors[(int)srcFactor] ||
           psoDesc.BlendState.RenderTarget[0].DestBlend != blendFactors[(int)dstFactor])
        {
          psoDesc.BlendState.RenderTarget[0].SrcBlend = blendFactors[(int)srcFactor];
          psoDesc.BlendState.RenderTarget[0].SrcBlendAlpha = blendFactors[(int)srcFactor];
          psoDesc.BlendState.RenderTarget[0].DestBlend = blendFactors[(int)dstFactor];
          psoDesc.BlendState.RenderTarget[0].DestBlendAlpha = blendFactors[(int)dstFactor];
          psoDirty = true;
        }
      }
    
      void CoreStateMachine::setInputLayout(const std::vector<D3D12_INPUT_ELEMENT_DESC> &descs)
      {
        bool changed = psoInputLayout.size() != descs.size();
    
        if(!changed)
        {
          for(int i = 0; i < descs.size(); i++)
          {
            auto &da = descs[i];
            auto &db = psoInputLayout[i];
    
            //fuck wasting more CPU time right now, leaving out the semantic name string comparison
            /*
            if(da.AlignedByteOffset != db.AlignedByteOffset ||
              da.Format != db.Format || da.InstanceDataStepRate != db.InstanceDataStepRate ||
              da.InputSlot != db.InputSlot || da.InputSlotClass != db.InputSlotClass ||
              da.SemanticIndex != db.SemanticIndex || 
              (string)da.SemanticName != (string)db.SemanticName)*/
    
            if(da.AlignedByteOffset != db.AlignedByteOffset ||
              da.Format != db.Format || da.InstanceDataStepRate != db.InstanceDataStepRate ||
              da.InputSlot != db.InputSlot || da.InputSlotClass != db.InputSlotClass ||
              da.SemanticIndex != db.SemanticIndex)
            {
              changed = true;
            }
          }
        }
    
        if(changed)
        {
          psoInputLayout = descs;
          psoDesc.InputLayout = { psoInputLayout.data(), (UINT)psoInputLayout.size() };
          psoDirty = true;
        }
      }
    
      void CoreStateMachine::setShaders(ShaderEffect::ShaderProgram *shaders)
      {
        if(currentProgram != shaders)
        {
          currentProgram = shaders;
          if(shaders)
          {
            psoDesc.VS = { reinterpret_cast<UINT8*>(shaders->vertexShader.blob->GetBufferPointer()), shaders->vertexShader.blob->GetBufferSize() };
            psoDesc.PS = { reinterpret_cast<UINT8*>(shaders->pixelShader.blob->GetBufferPointer()), shaders->pixelShader.blob->GetBufferSize() };
          }
          psoDesc.CachedPSO = {};
          psoDirty = true;
        }
      }
      
      void CoreStateMachine::commitPipelineStateChanges()
      {
        auto cl = continueRenderingCommands();
        ThrowIfFailed(device->CreateGraphicsPipelineState(&psoDesc, IID_PPV_ARGS(&pipelineState)));
        cl->SetPipelineState(pipelineState.Get());
      }
    
      void CoreStateMachine::enableDepthTesting(bool b)
      {
        if(psoDesc.DepthStencilState.DepthEnable != b)
        {
          psoDesc.DepthStencilState.DepthEnable = b;
          psoDirty = true;
        }
      }
      
      void CoreStateMachine::setViewport(int x, int y, int w, int h)
      {
        auto cl = continueRenderingCommands();
        bool close = false;
        if(!cl)
        {
          cl = beginRenderingCommands();
          close = true;
        }
        D3D12_VIEWPORT vp = { (FLOAT)x, (FLOAT)y, (FLOAT)w, (FLOAT)h, 0.0f, 1.0f };
        D3D12_RECT scissor = { x, y, w, h };  //not 100% on this scissor rect, later on obtain from current FB
    
        viewport = { x, y, w, h };
        cl->RSSetViewports(1, &vp);
        cl->RSSetScissorRects(1, &scissor);
    
        if(close)
        {
          endRenderingCommands(cl);
          waitForRender();
        }
      }
      
      int4 CoreStateMachine::getViewport()
      {
        return viewport;
      }
    
      void CoreStateMachine::setColorMask(bool r, bool g, bool b, bool a)
      {
    
      }
      
      void CoreStateMachine::setDepthMask(bool mask)
      {
      }
      
      void CoreStateMachine::setCullFace(bool cullFace)
      {
        auto cf = cullFace ? D3D12_CULL_MODE_BACK : D3D12_CULL_MODE_NONE;
    
        if(psoDesc.RasterizerState.CullMode != cf)
        {
          psoDesc.RasterizerState.CullMode = cf;
          psoDirty = true;
        }
      }
      
      bool CoreStateMachine::enableBlending(bool b)
      {
        if(b != blendingOn)
        {
          if(psoDesc.BlendState.RenderTarget[0].BlendEnable != b)
          {
            psoDesc.BlendState.IndependentBlendEnable = FALSE;
            psoDesc.BlendState.RenderTarget[0].BlendEnable = b;
            blendingOn = b;
            psoDirty = true;
          }
          
          //state was changed
          return true;
        }
        
        return false;
      }
      
      void CoreStateMachine::drawIndexedPrimitives(PrimitiveType type, size_t count, IndexFormat format, size_t bufferOffsetInBytes)
      {
        if(!count)
          return;
    
        static const D3D12_PRIMITIVE_TOPOLOGY mode[] =
        {
          D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST,
          D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST, //fan
          D3D_PRIMITIVE_TOPOLOGY_LINELIST,
          D3D_PRIMITIVE_TOPOLOGY_POINTLIST
        };
        auto cl = continueRenderingCommands();
        UINT startIndexLocation = 0;
        INT baseVertexLocation = 0;
    
        if(bufferOffsetInBytes)
        {
          size_t indSz = 0;
          if(format == IF_USHORT)
          {
            indSz = sizeof(unsigned short);
          }
          else
          {
            indSz = sizeof(unsigned int);
          }
    
          startIndexLocation = bufferOffsetInBytes / indSz;
        }
    
        if(type == PT_TRIANGLE_FAN)
        {
          //this is a nightmare, and anyone stupid enough to use it for performance-critical applications deserves
          //this performance penalty
          assert(count == 4);
    
          auto currentIndexDataCPU = BufferArray::getCurrentIndexBufferUpload();
          UINT8 *data = nullptr;
          UINT fanInds[6];
          ThrowIfFailed(currentIndexDataCPU->Map(0, nullptr, reinterpret_cast<void **>(&data)));
          if(format == IF_USHORT)
          {
            for(int i = 0; i < 4; i++)
              fanInds[i] = ((USHORT *)data)[startIndexLocation+i];
          }
          else
          {
            for(int i = 0; i < 4; i++)
              fanInds[i] = ((UINT *)data)[startIndexLocation+i];
          }
          currentIndexDataCPU->Unmap(0, nullptr);
    
          //0,1,2, 0,2,3
          D3D12_RANGE range = { fanEboOffset*sizeof(uint), fanEboOffset*sizeof(uint) + sizeof(uint) * 6 };
          fanInds[5] = fanInds[3];
          fanInds[3] = fanInds[0];
          fanInds[4] = fanInds[2];
          ThrowIfFailed(triangleFanEBOs->buffers[1]->Map(0, nullptr, reinterpret_cast<void **>(&data)));
          memcpy(data+(fanEboOffset*sizeof(uint)), fanInds, sizeof(uint3) * 2);
          triangleFanEBOs->buffers[1]->Unmap(0, &range);
    
          if(range.End >= TriangleFanEmulationBufferSize)
          {
            throw vgl_runtime_error("You're drawing way too many emulated triangle-fan quads in one frame, which is EXTREMELY inefficient anyway");
          }
          
          if(fanEbosSet != 2)
          {
            triangleFanEBOs->setAsIndexBuffer(1, false);
            fanEbosSet = 2;
          }
    
          count = 6;
          startIndexLocation = fanEboOffset;
          baseVertexLocation = 0;
          fanEboOffset += 6;
          format = IF_UINT;
        }
    
        cl->IASetPrimitiveTopology(mode[(int)type]);
        cl->DrawIndexedInstanced(count, 1, startIndexLocation, baseVertexLocation, 0);
    
        drawCallIndex++;
        if(shouldAdvanceOnDraw)
        {
          advanceTextureTable();
          shouldAdvanceOnDraw = false;
        }
    
        /*descriptorTablesChanged = true;
        descriptorHeapsChanged = true;
        endRenderingCommands(cl);
        beginRenderingCommands();*/
      }
      
      void CoreStateMachine::drawPrimitiveArray(PrimitiveType type, size_t count, size_t offsetInElements)
      {
        static const D3D12_PRIMITIVE_TOPOLOGY mode[] =
        {
          D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST,
          D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST, //fan
          D3D_PRIMITIVE_TOPOLOGY_LINELIST,
          D3D_PRIMITIVE_TOPOLOGY_POINTLIST
        };
        auto cl = continueRenderingCommands();
    
        if(type == PT_TRIANGLE_FAN)
        {
          assert(count == 4);
          if(fanEbosSet != 1)
          {
            triangleFanEBOs->setAsIndexBuffer(0, true);
            fanEbosSet = 1;
          }
          drawIndexedPrimitives(PT_TRIANGLES, 6, IF_USHORT, 0);
          return;
        }
    
        cl->IASetPrimitiveTopology(mode[(int)type]);
        cl->DrawInstanced(count, 1, offsetInElements, 0);
    
        drawCallIndex++;
        if(shouldAdvanceOnDraw)
        {
          advanceTextureTable();
          shouldAdvanceOnDraw = false;
        }
    
        /*endRenderingCommands(cl);
        ThrowIfFailed(commandQueue->Signal(drawCallFence.Get(), drawCallFenceValue + 1));
        drawCallFenceValue++;
        beginRenderingCommands();*/
      }
    
    
      void CoreStateMachine::beginFrame()
      {
        currentFrameCount++;
        currentFrameIndex = currentFrameCount % MaxLatencyFrames;
    
        //wait until we've caught up latency-wise
        if(currentFrameCount > MaxLatencyFrames)
        {
          frameFence->SetEventOnCompletion(currentFrameCount - MaxLatencyFrames, frameFenceEvent);
          DWORD wait = WaitForSingleObject(frameFenceEvent, 10000);
          if(wait != WAIT_OBJECT_0)
            throw vgl_runtime_error("Failed WaitForSingleObject().  Pipeline froze up.");
    
          ThrowIfFailed(renderCommandAllocator[currentFrameIndex]->Reset());
    
          //drop any resources needed by completed frame
          renderNeededResources[currentFrameIndex].clear();
        }
    
        auto fb = dynamic_pointer_cast<vom::FrameBuffer>(vom::FrameBuffer::getScreen());
        auto cl = beginRenderingCommands();
    
        //reset some per-frame values and incrementers
        descriptorHeapIndex = 0;
        textureTableIndex = 0;
        drawCallIndex = 0;
        shouldAdvanceOnDraw = false;
        fanEbosSet = 0;
        fanEboOffset = 0;
        descriptorTablesChanged = descriptorHeapsChanged = true;
        clPsoDirty = true;
    
        fb->prepareForDraw();
      }
    
      void CoreStateMachine::endFrame()
      {
        auto fb = dynamic_pointer_cast<vom::FrameBuffer>(vom::FrameBuffer::getScreen());
    
        fb->prepareForPresent();
        endRenderingCommands(defaultRenderCommandList);
      }
    
      void CoreStateMachine::beginSetup()
      {
        //seems fine to use the render queue for this too...
        beginRenderingCommands();
      }
    
      void CoreStateMachine::endSetup(bool wait)
      {
        endRenderingCommands(defaultRenderCommandList);
        if(wait)
          waitForRender();
      }
    
      ComPtr<ID3D12GraphicsCommandList> CoreStateMachine::beginSetupCommands()
      {
        ComPtr<ID3D12GraphicsCommandList> commandList;
    
        ThrowIfFailed(device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT, setupCommandAllocator.Get(), nullptr, IID_PPV_ARGS(&commandList)));
        setupCommandLists.push_back(commandList);
        activeSetupCommandLists.push(commandList);
    
        return commandList;
      }
    
      ComPtr<ID3D12GraphicsCommandList> CoreStateMachine::continueSetupCommands()
      {
        return activeSetupCommandLists.top();
      }
    
      void CoreStateMachine::endSetupCommands(ComPtr<ID3D12GraphicsCommandList> commandList)
      {
        // Execute the outermost command list.
        /*vector<ID3D12CommandList *> ppSetupCommandLists(setupCommandLists.size());
        for(int i = 0; i < setupCommandLists.size(); i++)
        {
          auto &cl = setupCommandLists[i];
          ppSetupCommandLists[i] = cl.Get();
        }
        setupCommandQueue->ExecuteCommandLists(setupCommandLists.size(), ppSetupCommandLists.data());*/
    
        commandList->Close();
    
        ID3D12CommandList *ppCommandLists[] = { commandList.Get() };
        commandQueue->ExecuteCommandLists(_countof(ppCommandLists), ppCommandLists);
    
        activeSetupCommandLists.pop();
      }
    
      void CoreStateMachine::waitForSetupCommands()
      {
        // Signal and increment the fence value.
        const UINT64 fence = setupFenceValue;
        ThrowIfFailed(commandQueue->Signal(setupFence.Get(), fence));
        setupFenceValue++;
    
        // Wait until the previous frame is finished.
        if(setupFence->GetCompletedValue() < fence)
        {
          ThrowIfFailed(setupFence->SetEventOnCompletion(fence, setupFenceEvent));
          WaitForSingleObject(setupFenceEvent, INFINITE);
        }
    
        setupCommandLists.clear();
        while(!activeSetupCommandLists.empty())
          activeSetupCommandLists.pop();
        ThrowIfFailed(setupCommandAllocator->Reset());
      }
    
      ComPtr<ID3D12GraphicsCommandList> CoreStateMachine::beginRenderingCommands()
      {
        if(!defaultRenderCommandList)
        {
          ThrowIfFailed(device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT, renderCommandAllocator[currentFrameIndex].Get(), nullptr, IID_PPV_ARGS(&defaultRenderCommandList)));
        }
        else
        {
          ThrowIfFailed(defaultRenderCommandList->Reset(renderCommandAllocator[currentFrameIndex].Get(), nullptr));
        }
    
        defaultRenderCommandListAvailable = true;
        renderFenceValue++;
    
        return defaultRenderCommandList;
      }
    
      ComPtr<ID3D12GraphicsCommandList> CoreStateMachine::continueRenderingCommands()
      {
        if(!defaultRenderCommandListAvailable)
          return nullptr;
    
        return defaultRenderCommandList;
      }
    
      void CoreStateMachine::endRenderingCommands(ComPtr<ID3D12GraphicsCommandList> commandList)
      {
        ThrowIfFailed(commandList->Close());
    
        ID3D12CommandList *ppCommandLists[] = { commandList.Get() };
        commandQueue->ExecuteCommandLists(_countof(ppCommandLists), ppCommandLists);
        defaultRenderCommandListAvailable = false;
    
        ThrowIfFailed(commandQueue->Signal(renderFence.Get(), renderFenceValue));
        ThrowIfFailed(commandQueue->Signal(frameFence.Get(), currentFrameCount));
      }
    
      void CoreStateMachine::waitForRender()
      {
        // Signal and increment the fence value.
        const UINT64 fence = renderFenceValue;
        ThrowIfFailed(commandQueue->Signal(renderFence.Get(), fence));
    
        // Wait until the previous frame is finished.
        if(renderFence->GetCompletedValue() < fence)
        {
          ThrowIfFailed(renderFence->SetEventOnCompletion(fence, renderFenceEvent));
          WaitForSingleObject(renderFenceEvent, INFINITE);
        }
    
        for(int i = 0; i < MaxLatencyFrames; i++)
        {
          renderNeededResources[i].clear();
          ThrowIfFailed(renderCommandAllocator[i]->Reset());
        }
        descriptorHeapIndex = 0;
        textureTableIndex = 0;
        drawCallIndex = 0;
        shouldAdvanceOnDraw = false;
        fanEbosSet = 0;
        fanEboOffset = 0;
        descriptorTablesChanged = descriptorHeapsChanged = true;
      }
    
      void CoreStateMachine::preserveResourceUntilRenderComplete(ComPtr<ID3D12Pageable> resource)
      {
        renderNeededResources[currentFrameIndex].push_back(resource);
      }
    
      void CoreStateMachine::presentAndSwapBuffers(bool waitForFrame)
      {
        auto fb = dynamic_pointer_cast<vom::FrameBuffer>(vom::FrameBuffer::getScreen());
    
        fb->getSwapChain()->Present(1, 0);
        if(waitForFrame)
        {
          //waitForRender();
          fb->updateSwapFrameIndex();
        }
      }
    
      void CoreStateMachine::advanceTextureTable()
      { 
        textureTableIndex += textureTableSize;
        descriptorTablesChanged = true;
    
        //check if ran out of descriptor table space
        if(textureTableIndex + textureTableSize >= cbSrvHeap->getSize())
        {
          textureTableIndex = 0;
          descriptorHeapIndex++;
          descriptorHeapsChanged = true;
    
          if(cbSrvHeaps.size() < descriptorHeapIndex + 1)
          {
            auto cbSrv = make_shared<DescriptorHeap>(device.Get(), D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV, GPUDescriptorHeapSize, true);
            auto sampler = make_shared<DescriptorHeap>(device.Get(), D3D12_DESCRIPTOR_HEAP_TYPE_SAMPLER, GPUDescriptorHeapSize, true);
    
            cbSrvHeaps.push_back(cbSrv);
            samplerHeaps.push_back(sampler);
    
            if(DebugBuild())
            {
              vout << "Forced to grow number of GPU heaps to " << descriptorHeapIndex + 1 << endl;
            }
          }
        }
      }
    
      void CoreStateMachine::setShouldAdvanceTablesOnDraw()
      {
        shouldAdvanceOnDraw = true;
      }
    
      UINT CoreStateMachine::getSrvHeapDescriptorIndexForTextureSlot(int slot)
      {
        return textureTableIndex + MaxConstantBuffers;
      }
    
      void CoreStateMachine::prepareToDraw()
      {
        auto cl = continueRenderingCommands();
        cl->SetGraphicsRootSignature(rootSignature.Get());
    
        //prepare any buffers required by current shader program
        auto prog = currentProgram;
    
        if(prog)
        {
          if(globalConstantBuffers.empty())
          {
            globalConstantBuffers.resize(MaxLatencyFrames);
          }
    
          for(int i = 0; i < MaxLatencyFrames; i++)
          {
            if(!globalConstantBuffers[i])
            {
              const size_t sz = GlobalCBufferMaxSize * GlobalCBufferMaxCalls;
              globalConstantBuffers[i] = make_shared<BufferArray>();
              globalConstantBuffers[i]->ensureStaticSize(sz);
              globalConstantBuffers[i]->provideData(0, sz, nullptr, BufferArray::UT_FORCE_UPLOAD_HEAP);
              ThrowIfFailed(globalConstantBuffers[i]->buffers[0]->Map(0, nullptr, (void **)&globalConstantBufferData[i]));
            }
          }
    
          auto ba = globalConstantBuffers[currentFrameIndex];
    
          memcpy(globalConstantBufferData[currentFrameIndex] + GlobalCBufferMaxSize * drawCallIndex, prog->globalCBufferData, prog->globalCBufferSize);
          cl->SetGraphicsRootConstantBufferView(2, ba->buffers[0]->GetGPUVirtualAddress() + GlobalCBufferMaxSize*drawCallIndex);
        }
        
        auto currentVA = VertexArray::current();
        assert(currentVA != nullptr);
    
        if(lastUsedVA != currentVA)
        {
          currentVA->prepareForDraw();
          lastUsedVA = currentVA;
        }
        
        if(descriptorTablesChanged)
        {
          device->CopyDescriptorsSimple(textureTableSize, cbSrvHeaps[descriptorHeapIndex]->hCPU(textureTableIndex), cpuCbSrvHeap->hCPU(0), D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);
          device->CopyDescriptorsSimple(textureTableSize, samplerHeaps[descriptorHeapIndex]->hCPU(textureTableIndex), cpuSamplerHeap->hCPU(0), D3D12_DESCRIPTOR_HEAP_TYPE_SAMPLER);
        }
    
        if(descriptorHeapsChanged)
        {
          ID3D12DescriptorHeap *descHeaps[] = { cbSrvHeaps[descriptorHeapIndex]->get(), samplerHeaps[descriptorHeapIndex]->get() };
          cl->SetDescriptorHeaps(ARRAYSIZE(descHeaps), descHeaps);
          descriptorHeapsChanged = false;
        }
    
        if(psoDirty)
        {
          //todo:  these caches might be stale!  I don't know if they rebuild properly when psoDesc doesn't correspond
          if(prog->psoCache)
          {
            psoDesc.CachedPSO = { prog->psoCache->GetBufferPointer(), prog->psoCacheSize };
          }
          if(pipelineState)
          {
            preserveResourceUntilRenderComplete(pipelineState);
            pipelineState = nullptr;
          }
          if(FAILED(device->CreateGraphicsPipelineState(&psoDesc, IID_PPV_ARGS(&pipelineState))))
          {
            prog->psoCache = nullptr;
            psoDesc.CachedPSO = {};
    
            ThrowIfFailed(device->CreateGraphicsPipelineState(&psoDesc, IID_PPV_ARGS(&pipelineState)));
          }
          else
          {
            ComPtr<ID3DBlob> blob;
            ThrowIfFailed(pipelineState->GetCachedBlob(&blob));
    
            prog->psoCache = blob;
            prog->psoCacheSize = blob->GetBufferSize();
          }
          psoDirty = false;
          clPsoDirty = true;
        }
    
        if(clPsoDirty)
        {
          cl->SetPipelineState(pipelineState.Get());
          clPsoDirty = false;
        }
    
        if(descriptorTablesChanged)
        {
          cl->SetGraphicsRootDescriptorTable(0, cbSrvHeaps[descriptorHeapIndex]->hGPU(textureTableIndex));
          cl->SetGraphicsRootDescriptorTable(1, samplerHeaps[descriptorHeapIndex]->hGPU(textureTableIndex));
          descriptorTablesChanged = false;
        }
      }
    
      void CoreStateMachine::indexBufferChanged()
      {
        fanEbosSet = 0;
      }
    
      void CoreStateMachine::bindConstantBuffer(BufferArray::Pointer buffer, int bufferIndex, size_t bufferSize, int registerIndex)
      {
        auto cbvHandle = cbSrvHeap->hCPU(registerIndex);
        D3D12_CONSTANT_BUFFER_VIEW_DESC cbvDesc = {};
        auto &buf = *(buffer);
    
        assert(registerIndex > 0);
        cbvDesc.BufferLocation = buf[bufferIndex]->GetGPUVirtualAddress();
        cbvDesc.SizeInBytes = bufferSize;
        device->CreateConstantBufferView(&cbvDesc, cbvHandle);
      }
    
      void CoreStateMachine::createShaderResourceView(int slot, ID3D12Resource *resource, D3D12_SHADER_RESOURCE_VIEW_DESC *rvd, D3D12_SAMPLER_DESC *svd)
      {
        //cpu-readable descriptors
        device->CreateSampler(svd, cpuSamplerHeap->hCPU(slot));
        device->CreateShaderResourceView(resource, rvd, cpuCbSrvHeap->hCPU(MaxConstantBuffers + slot));
    
        //shader-visible descriptors
        //device->CreateSampler(svd, samplerHeap->hCPU(textureTableIndex+slot));
        //device->CreateShaderResourceView(resource, rvd, cbSrvHeap->hCPU(getSrvHeapDescriptorIndexForTextureSlot(slot)));
      }
    }
    

  3. Sort of.

     

    You want a frame of latency in there so that the CPU and GPU can actually do their work in parallel. While the GPU is rendering frame N, you want the CPU to be building the command lists and writing the constants for frame N+1. Without that parallelism, your best-case frame time is the time it takes the GPU to render the frame *plus* the time it takes the CPU to build a frame's worth of commands and data. With the CPU and GPU operating in parallel on different frames, your frame time would be max(GPU time per frame, CPU time per frame). Which of those is larger determines whether you're what we call "GPU bound" or "CPU bound". There's little value in optimising the work the GPU has to do when it takes 5ms if the CPU is only producing a new frame every 20ms (and vice versa).

     

    If you look at the D3D12HelloTriangle sample on GitHub, you have an example of what *not* to do. The "WaitForPreviousFrame" function doesn't let the CPU start building the next frame until the GPU has finished rendering the one just submitted. You'll probably get away with this in trivial samples where the combined GPU and CPU time is low enough, but the function's comments clearly call out that this is bad practice.

     

    However, if you look at the D3D12DynamicIndexing sample, you'll see that it creates a number of Frame Resources and cycles between them every frame in round-robin fashion. The CPU needs to ensure that the resources it's about to use are no longer in use by the GPU (see ::OnUpdate), but since there are at least two of these Frame Resources, they're never the ones backing the frame we just finished building. This is the system by which you can have the GPU rendering frame N while the CPU builds N+1, but waits before building N+2 until N has finished rendering.
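
    Here's a minimal sketch of that round-robin scheme; the names (FrameResource, MAX_LATENCY, beginFrame) are illustrative, not the sample's actual code:

    #include <d3d12.h>
    #include <wrl.h>
    using Microsoft::WRL::ComPtr;

    static const int MAX_LATENCY = 2;  //how many frames the CPU may run ahead

    struct FrameResource
    {
      ComPtr<ID3D12CommandAllocator> allocator;
      UINT64 fenceValue = 0;  //value the queue signals once this frame's GPU work completes
    };

    //called at the top of each frame; 'fence' is signaled with the frame number
    //after each frame's ExecuteCommandLists
    void beginFrame(UINT64 frameCount, FrameResource *frames, ID3D12Fence *fence, HANDLE fenceEvent)
    {
      FrameResource &fr = frames[frameCount % MAX_LATENCY];

      //block only if the GPU hasn't yet finished the frame that last used this slot
      if(fence->GetCompletedValue() < fr.fenceValue)
      {
        fence->SetEventOnCompletion(fr.fenceValue, fenceEvent);
        WaitForSingleObject(fenceEvent, INFINITE);
      }

      fr.allocator->Reset();       //safe now: the GPU is done with this slot
      fr.fenceValue = frameCount;  //the queue will signal this value at end of frame
    }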

     

    Hello Triangle is my nemesis. I used it as the guide for most of my work. This example seems WAY better, and I'm refactoring based off of it: https://github.com/shobomaru/HelloD3D12/tree/master/ParallelFrameRootConstant

     

    I knew that forcing the CPU to wait on the GPU before building the next command list was stupid; I just didn't realize HOW stupid it was. I suppose it's a miracle that I got 45-50 FPS with 1000 draw calls while doing that.


  4. This gist shows my implementation of a DynamicBuffer and how it's used:

     

    https://gist.github.com/anonymous/b74eabea44d6cf7cfcb2

     

    The DynamicBuffer creates a single buffer in an upload heap, large enough to store N frames' worth of dynamic data, where N is the number of frames of latency I allow (usually 2). It's a single buffer, but logically I treat it as two halves: the first half for even frames and the second half for odd frames.

     

    My GraphicsDevice tells me whether we're on an odd or an even frame (GetFrameID), and when the frame count (GetFrameCount) has changed since we last uploaded some data, I know it's time to reset back to 0 and switch sides of the buffer.

     

    My root signature is set up such that I have a slot for a root constant buffer view; that is, I don't need a descriptor for it. "Set[Graphics|Compute]RootCBV" is just a function that wraps SetGraphicsRootConstantBufferView / SetComputeRootConstantBufferView.
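
    For reference, a root signature with a slot like that can be built roughly as follows; this is a generic sketch, not the gist's actual setup:

    #include <d3d12.h>
    #include "d3dx12.h"
    #include <wrl.h>
    using Microsoft::WRL::ComPtr;

    //sketch: a root signature with a single root CBV at register b0; binding
    //constants is then just handing the API a GPU virtual address, no descriptor
    ComPtr<ID3D12RootSignature> makeRootCbvSignature(ID3D12Device *device)
    {
      CD3DX12_ROOT_PARAMETER param;
      param.InitAsConstantBufferView(0);  //shader register b0

      CD3DX12_ROOT_SIGNATURE_DESC desc;
      desc.Init(1, &param, 0, nullptr, D3D12_ROOT_SIGNATURE_FLAG_ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT);

      ComPtr<ID3DBlob> sig, err;
      D3D12SerializeRootSignature(&desc, D3D_ROOT_SIGNATURE_VERSION_1, &sig, &err);

      ComPtr<ID3D12RootSignature> rs;
      device->CreateRootSignature(0, sig->GetBufferPointer(), sig->GetBufferSize(), IID_PPV_ARGS(&rs));
      return rs;
    }

    //per draw call, a Set[Graphics]RootCBV wrapper then reduces to:
    //  cl->SetGraphicsRootConstantBufferView(0, gpuVirtualAddress);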

     

    As you can see, I map the buffer when it's created, and from then on "mapping" the buffer is just a case of deciding whether to use the first half or the second half, then offsetting that pointer by however many bytes I've already added this frame (with an assert to make sure I don't exceed the amount I said was enough for one frame). If I change GraphicsDevice::MAX_FRAMES_LATENCY to 3, the buffer is just logically treated as having three sections rather than two, allowing the CPU to get an extra frame ahead of the GPU, assuming you've got your higher-level synchronisation right.

     

    DynamicBuffer::GetGPUVirtualAddress turns a CPU pointer into a GPU pointer for the purposes of inserting into the root constants, but that is just a case of calculating an offset from the resource's GPU virtual address. I should probably have cached off the resource's GPU virtual address, but chose not to.
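
    A minimal sketch of a buffer behaving as described above (one upload-heap resource, persistently mapped, one section per frame of latency); the class name, sizes, and members are illustrative, not the gist's actual code:

    #include <d3d12.h>
    #include "d3dx12.h"
    #include <wrl.h>
    #include <cassert>
    using Microsoft::WRL::ComPtr;

    //sketch of the scheme described above: one persistently-mapped upload buffer
    //holding MAX_FRAMES_LATENCY frames of dynamic data; names are illustrative
    class DynamicBufferSketch
    {
    public:
      static const UINT64 MAX_FRAMES_LATENCY = 2;
      static const UINT64 BYTES_PER_FRAME = 4096 * 1024;  //assumed per-frame budget

      void init(ID3D12Device *device)
      {
        CD3DX12_HEAP_PROPERTIES heap(D3D12_HEAP_TYPE_UPLOAD);
        auto desc = CD3DX12_RESOURCE_DESC::Buffer(BYTES_PER_FRAME * MAX_FRAMES_LATENCY);
        device->CreateCommittedResource(&heap, D3D12_HEAP_FLAG_NONE, &desc,
          D3D12_RESOURCE_STATE_GENERIC_READ, nullptr, IID_PPV_ARGS(&buffer));
        buffer->Map(0, nullptr, (void **)&mapped);  //mapped once, for the app's lifetime
      }

      //returns a CPU pointer to 'size' fresh bytes for this frame's constants
      void *allocate(UINT64 frameCount, UINT64 size)
      {
        if(frameCount != lastFrame)  //new frame: switch sections, rewind the cursor
        {
          lastFrame = frameCount;
          cursor = 0;
        }
        size = (size + 255) & ~255ull;  //CBV GPU addresses must be 256-byte aligned
        assert(cursor + size <= BYTES_PER_FRAME);  //exceeded the per-frame budget
        void *p = mapped + (frameCount % MAX_FRAMES_LATENCY) * BYTES_PER_FRAME + cursor;
        cursor += size;
        return p;
      }

      //turns a pointer from allocate() back into an address usable for root CBVs
      D3D12_GPU_VIRTUAL_ADDRESS gpuAddress(void *p)
      {
        return buffer->GetGPUVirtualAddress() + ((UINT8 *)p - mapped);
      }

    private:
      ComPtr<ID3D12Resource> buffer;
      UINT8 *mapped = nullptr;
      UINT64 cursor = 0, lastFrame = ~0ull;
    };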

     

    Thanks a ton for this. I'm still in hack mode until I get my performance; then I'll probably write a similar class or modify my BufferArray to act this way. I did the persistent-mapping part and haven't yet noticed a performance benefit.

        //New section for root cbuffer updates
        if(prog)
        {
          if(globalConstantBuffers.empty())
          {
            globalConstantBuffers.resize(2);
          }
    
          if(!globalConstantBuffers[0])
          {
            const size_t sz = 4096 * 10000;
            globalConstantBuffers[0] = make_shared<BufferArray>();
            globalConstantBuffers[0]->ensureStaticSize(sz);
            globalConstantBuffers[0]->provideData(0, sz, nullptr, BufferArray::UT_FORCE_UPLOAD_HEAP);
            ThrowIfFailed(globalConstantBuffers[0]->buffers[0]->Map(0, nullptr, (void **)&globalConstantBufferData));
          }
    
          auto ba = globalConstantBuffers[0];
          //ba->provideData(0, prog->globalCBufferSize, prog->globalCBufferData, BufferArray::UT_FORCE_UPLOAD_HEAP);
          memcpy(globalConstantBufferData + 4096 * drawCallIndex, prog->globalCBufferData, prog->globalCBufferSize);
          cl->SetGraphicsRootConstantBufferView(2, ba->buffers[0]->GetGPUVirtualAddress() + 4096*drawCallIndex);
        }
    

    My guess is that my final issue is ignoring frame latency. That concept is entirely alien to me, since I've spent most of my career using OpenGL and other APIs that don't really surface that kind of thing.

     

    My question is: what's the exact point?

     

    My hunch is this (correct me if I'm wrong): you need latency so that you can compute the buffer data for the NEXT frame, so that by the time it's uploaded, the GPU won't stall accessing it on the next frame.

     

    Is that the reason? Is that what OpenGL does too, and I've just never noticed?


  5. Without trying to be too dismissive of your technical abilities, I feel pretty confident in saying that the fault probably lies at your door rather than in any shortcoming of the API.

     

    The ability to rapidly "map" and set new constants is one of the key areas that has become a lot faster in D3D12. The simplest scheme for efficiently updating constants is to allocate two chunks of memory, each large enough to hold all the constants that need to be written for a single frame. You can persistently map a buffer for the lifetime of the application, and you need only memcpy your constants to a monotonically increasing address during the frame, before switching to the other of the two buffers for the following frame.
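
    Concretely, "a monotonically increasing address" just means a per-frame write cursor that only moves forward, so constants from earlier draws (which the GPU may not have read yet) are never overwritten. A hedged sketch of that inner loop, with illustrative names:

    #include <d3d12.h>
    #include <cstring>

    UINT8 *frameBase;   //start of this frame's chunk of the persistently-mapped buffer
    UINT64 cursor = 0;  //reset to 0 at the start of each frame

    //per draw call: copy the constants forward and bind them by raw GPU address
    D3D12_GPU_VIRTUAL_ADDRESS writeConstants(const void *data, UINT64 size, D3D12_GPU_VIRTUAL_ADDRESS frameBaseGpu)
    {
      memcpy(frameBase + cursor, data, size);
      D3D12_GPU_VIRTUAL_ADDRESS addr = frameBaseGpu + cursor;
      cursor += (size + 255) & ~255ull;  //constant buffer views must be 256-byte aligned
      return addr;
    }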

     

    Perhaps you could explain how you've implemented constant updates?

     

    Trust me, you've got it right on. I know D3D12 can beat the pants off GL/D3D11 when used properly. What's frustrating me is that after this much effort, I'm still falling short.

    Can you explain to me what you mean by mapping to a monotonically increasing address?  

     

    Here's my new per-draw-call update method.

     

     

    With this code, I'm now getting about 45-50 FPS, compared to the OpenGL backend of the same engine, which gets a solid 60 FPS (or more).

     

    The "psoDirty" is only true about twice throughout the entire rendering.  descriptorHeapsChanged never occurs (frame fits in just the one heap).

    I'm pretty sure the problem still lies in my constant buffer updates. They performed terribly when I used a default heap (with a copy from an upload heap), and perform okay when I use an upload heap only, but they're still quite slow. When I comment out the heap updates (done via map, memcpy, unmap), FPS shoots back up to a solid 60.

     
      void CoreStateMachine::prepareToDraw()
      {
        auto cl = continueRenderingCommands();
        cl->SetGraphicsRootSignature(rootSignature.Get());
    
        //prepare any buffers required by current shader program
        auto prog = currentProgram;
    
        if(prog)
        {
          if(globalConstantBuffers.empty())
          {
            globalConstantBuffers.resize(10000);
          }
    
          if(!globalConstantBuffers[drawCallIndex])
          {
            globalConstantBuffers[drawCallIndex] = make_shared<BufferArray>();
            globalConstantBuffers[drawCallIndex]->ensureStaticSize(4096);
          }
          auto ba = globalConstantBuffers[drawCallIndex];
          ba->provideData(0, prog->globalCBufferSize, prog->globalCBufferData, BufferArray::UT_FORCE_UPLOAD_HEAP);
          cl->SetGraphicsRootConstantBufferView(2, ba->buffers[0]->GetGPUVirtualAddress());
        }
        
        auto currentVA = VertexArray::current();
        assert(currentVA != nullptr);
    
        if(lastUsedVA != currentVA)
        {
          currentVA->prepareForDraw();
          lastUsedVA = currentVA;
        }
        
        device->CopyDescriptorsSimple(textureTableSize, cbSrvHeaps[descriptorHeapIndex]->hCPU(textureTableIndex), cpuCbSrvHeap->hCPU(0), D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);
        device->CopyDescriptorsSimple(textureTableSize, samplerHeaps[descriptorHeapIndex]->hCPU(textureTableIndex), cpuSamplerHeap->hCPU(0), D3D12_DESCRIPTOR_HEAP_TYPE_SAMPLER);
    
        if(descriptorHeapsChanged)
        {
          ID3D12DescriptorHeap *descHeaps[] = { cbSrvHeaps[descriptorHeapIndex]->get(), samplerHeaps[descriptorHeapIndex]->get() };
          cl->SetDescriptorHeaps(ARRAYSIZE(descHeaps), descHeaps);
          descriptorHeapsChanged = false;
        }
    
        //might be smarter to set this up earlier if I can.. not sure what the tradeoff is here
        if(psoDirty)
        {
          if(prog->psoCache)
          {
            psoDesc.CachedPSO = { prog->psoCache->GetBufferPointer(), prog->psoCacheSize };
          }
          if(pipelineState)
          {
            preserveResourceUntilRenderComplete(pipelineState);
            pipelineState = nullptr;
          }
          if(FAILED(device->CreateGraphicsPipelineState(&psoDesc, IID_PPV_ARGS(&pipelineState))))
          {
            prog->psoCache = nullptr;
            psoDesc.CachedPSO = {};
    
            ThrowIfFailed(device->CreateGraphicsPipelineState(&psoDesc, IID_PPV_ARGS(&pipelineState)));
          }
          else
          {
            ComPtr<ID3DBlob> blob;
            ThrowIfFailed(pipelineState->GetCachedBlob(&blob));
    
            prog->psoCache = blob;
            prog->psoCacheSize = blob->GetBufferSize();
          }
          psoDirty = false;
        }
    
        cl->SetPipelineState(pipelineState.Get());
    
        if(descriptorTablesChanged)
        {
          cl->SetGraphicsRootDescriptorTable(0, cbSrvHeaps[descriptorHeapIndex]->hGPU(textureTableIndex));
          cl->SetGraphicsRootDescriptorTable(1, samplerHeaps[descriptorHeapIndex]->hGPU(textureTableIndex));
          descriptorTablesChanged = false;
        }
      }
    
      void CoreStateMachine::indexBufferChanged()
      {
        fanEbosSet = 0;
      }
    
      void CoreStateMachine::bindConstantBuffer(BufferArray::Pointer buffer, int bufferIndex, size_t bufferSize, int registerIndex)
      {
        auto cbvHandle = cbSrvHeap->hCPU(registerIndex);
        D3D12_CONSTANT_BUFFER_VIEW_DESC cbvDesc = {};
        auto &buf = *(buffer);
    
        assert(registerIndex > 0);
        cbvDesc.BufferLocation = buf[bufferIndex]->GetGPUVirtualAddress();
        cbvDesc.SizeInBytes = bufferSize;
        device->CreateConstantBufferView(&cbvDesc, cbvHandle);
      }
    
      void CoreStateMachine::createShaderResourceView(int slot, ID3D12Resource *resource, D3D12_SHADER_RESOURCE_VIEW_DESC *rvd, D3D12_SAMPLER_DESC *svd)
      {
        //cpu-readable descriptors
        device->CreateSampler(svd, cpuSamplerHeap->hCPU(slot));
        device->CreateShaderResourceView(resource, rvd, cpuCbSrvHeap->hCPU(MaxConstantBuffers + slot));
    
        //shader-visible descriptors
        //device->CreateSampler(svd, samplerHeap->hCPU(textureTableIndex+slot));
        //device->CreateShaderResourceView(resource, rvd, cbSrvHeap->hCPU(getSrvHeapDescriptorIndexForTextureSlot(slot)));
      }
    }
    

  6. It may be time to give up on D3D12. My OpenGL system massively outperforms it with a drastically simpler architecture, even on a freaking iPad. I thought I'd be able to get performance gains without dicking around too much; that was the appeal of D3D12 and the coming Vulkan to me. But man... 7 days in a row of staying up 'til 2am... and I still don't have it. This plainly is not worth it for me.

     

    My system changes "uniform" global constants very often and at unpredictable times. At a high level, the kind of precomputation that D3D12 needs in order to be fast just isn't there. It feels like instanced rendering all over again. I can't seem to find a way to efficiently copy the needed cbuffer data before draw calls without hurting performance.

     

    I know my code is suboptimal, but I expected at the very least to outperform OpenGL with this kind of baseline.

     

    Very disappointing...


  7. I'm getting about 10 FPS with only about 20 draw calls of very small meshes.

     

    Clearly I'm doing something very very wrong.

     

    Is there an issue with populating a single command list to draw 20 items? I figured that even if it isn't optimal, it shouldn't be THIS terrible. At this point, my iPad running OpenGL ES is outperforming my D3D12 engine by a massive margin.

     

    The following code runs before most of my draw calls to set up the necessary state:

      void CoreStateMachine::prepareToDraw()
      {
        auto cl = continueRenderingCommands();
        cl->SetGraphicsRootSignature(rootSignature.Get());

        //prepare any buffers required by current shader program
        auto prog = currentProgram;

        if(prog)
        {
          if(!prog->globalCBuffer)
          {
            prog->globalCBuffer = make_shared<BufferArray>();
            prog->globalCBufferDirty = true;
          }
          else
          {
            //this should be cleaned up at some point
            preserveResourceUntilRenderComplete(prog->globalCBuffer->uploadBuffers[0]);
            preserveResourceUntilRenderComplete(prog->globalCBuffer->buffers[0]);
            prog->globalCBuffer = make_shared<BufferArray>();
            prog->globalCBufferDirty = true;
          }

          if(prog->globalCBufferDirty)
          {
            prog->globalCBuffer->provideData(0, prog->globalCBufferSize, prog->globalCBufferData, BufferArray::UT_DYNAMIC);
            prog->globalCBufferDirty = false;
          }

          cl->SetGraphicsRootConstantBufferView(2, prog->globalCBuffer->buffers[0]->GetGPUVirtualAddress());
        }

        auto currentVA = VertexArray::current();
        assert(currentVA != nullptr);
        currentVA->prepareForDraw();

        device->CopyDescriptorsSimple(textureTableSize, cbSrvHeaps[descriptorHeapIndex]->hCPU(textureTableIndex), cpuCbSrvHeap->hCPU(0), D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);
        device->CopyDescriptorsSimple(textureTableSize, samplerHeaps[descriptorHeapIndex]->hCPU(textureTableIndex), cpuSamplerHeap->hCPU(0), D3D12_DESCRIPTOR_HEAP_TYPE_SAMPLER);

        if(descriptorHeapsChanged)
        {
          ID3D12DescriptorHeap *descHeaps[] = { cbSrvHeaps[descriptorHeapIndex]->get(), samplerHeaps[descriptorHeapIndex]->get() };
          cl->SetDescriptorHeaps(ARRAYSIZE(descHeaps), descHeaps);
          descriptorHeapsChanged = false;
        }

        //might be smarter to set this up earlier if I can.. not sure what the tradeoff is here
        //note: recreating the PSO from scratch on every draw call (below) is extremely
        //expensive; the later revisions above cache it behind a psoDirty flag
        if(pipelineState)
        {
          preserveResourceUntilRenderComplete(pipelineState);
          pipelineState = nullptr;
        }
        ThrowIfFailed(device->CreateGraphicsPipelineState(&psoDesc, IID_PPV_ARGS(&pipelineState)));

        cl->SetPipelineState(pipelineState.Get());

        if(descriptorTablesChanged)
        {
          cl->SetGraphicsRootDescriptorTable(0, cbSrvHeaps[descriptorHeapIndex]->hGPU(textureTableIndex));
          cl->SetGraphicsRootDescriptorTable(1, samplerHeaps[descriptorHeapIndex]->hGPU(textureTableIndex));
          descriptorTablesChanged = false;
        }
      }
    

  8. Posting this to help others who land here with a similar problem.

     

    I was right in suspecting the one global cbuffer.

    The following code allows me to reliably determine the cbuffer index of the global buffer.

    if(shadersBuilt[0])
    {
      ID3D12ShaderReflection *reflector = nullptr;
      ThrowIfFailed(D3DReflect(prog->vertexShader.blob->GetBufferPointer(), prog->vertexShader.blob->GetBufferSize(), IID_ID3D12ShaderReflection, (void **)&reflector));

      D3D12_SHADER_DESC descShader;
      ThrowIfFailed(reflector->GetDesc(&descShader));

      for(UINT i = 0; i < descShader.ConstantBuffers; i++)
      {
        auto global = reflector->GetConstantBufferByIndex(i);

        D3D12_SHADER_BUFFER_DESC desc;
        ThrowIfFailed(global->GetDesc(&desc));
        if(string(desc.Name).find("$Global") != string::npos)
        {
          vout << desc.Name << endl;

          auto var = global->GetVariableByIndex(0);
          D3D12_SHADER_VARIABLE_DESC descVar;
          ThrowIfFailed(var->GetDesc(&descVar));
          vout << descVar.Name << endl;
        }
      }

      reflector->Release();  //the reflection interface must be released manually
    }
    

  9. I'm using ANGLE to automatically translate my GLSL shader code for use in a D3D12 engine.

    I already know how to update constant buffers from the new API.
    Where I'm stuck is how to get at variables that look like this:
    uniform float4 someVar : register(c0);
    from the D3D12 API.

    Does anybody know how I can do this? How do I populate "c" register variables from DirectX 12?

    Edit: am I right to believe these go into the "$Globals" cbuffer, following the same packing rules?
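
    Follow-up: yes, these land in the "$Globals" cbuffer with the standard HLSL packing rules, as the reflection code in the post above confirms. Here's a hedged sketch of writing one such variable at its reflected offset; "someVar" and the function shape are assumptions, not the engine's actual code:

    #include <d3dcompiler.h>
    #include <d3d12shader.h>
    #include <cstring>

    //sketch: locate a loose global ("someVar" is an assumed name) inside $Globals
    //via reflection, then memcpy its value at the reflected offset into the CPU
    //copy of the cbuffer contents that gets uploaded each frame
    void writeGlobal(ID3DBlob *blob, UINT8 *cbufferData, const float value[4])
    {
      ID3D12ShaderReflection *reflector = nullptr;
      D3DReflect(blob->GetBufferPointer(), blob->GetBufferSize(),
                 IID_ID3D12ShaderReflection, (void **)&reflector);

      auto globals = reflector->GetConstantBufferByName("$Globals");
      auto var = globals->GetVariableByName("someVar");

      D3D12_SHADER_VARIABLE_DESC vd;
      var->GetDesc(&vd);

      //StartOffset already reflects HLSL packing into 16-byte registers
      memcpy(cbufferData + vd.StartOffset, value, sizeof(float) * 4);
      reflector->Release();
    }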