turanszkij

Members
  • Content count: 76
  • Joined
  • Last visited
  • Days Won: 1

turanszkij last won the day on October 13

turanszkij had the most liked content!

Community Reputation: 395 (Neutral)

2 Followers

About turanszkij

  • Rank
    Member

Personal Information

Social

  • Twitter
    turanszkij
  • Github
    turanszkij

Recent Profile Visitors

3086 profile views
  1. The DirectXMath library is inline, so no copy will happen on return; the function is just expanded in place at the call site.
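     For illustration, a minimal sketch of that point (the helper function below is made up for the example, not code from the thread):

         #include <DirectXMath.h>
         using namespace DirectX;

         // XMMatrixMultiply is defined inline in the DirectXMath headers, so with an
         // optimizing compiler this helper is expanded at the call site and the
         // XMMATRIX result is constructed in place rather than copied back from a
         // separate out-of-line call.
         XMMATRIX XM_CALLCONV ComputeWorldViewProj(FXMMATRIX world, CXMMATRIX view, CXMMATRIX proj)
         {
             return XMMatrixMultiply(XMMatrixMultiply(world, view), proj);
         }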
  2. Well, I don't know why it worked for you, with so little knowledge about your specific app. For me, I like to be very explicit about bind slot IDs for buffers and resources. I declare them in a shared header between the C++ app code and the HLSL shaders. The application then calls SetConstantBuffers() with that ID, and the shaders specify their bind locations with the same ID. HLSL is tricky because you have to concatenate the slot number onto the register letter, like this:

         cbuffer MyCBuffer : register(b4)

     For that I am using a macro like this:

         #define CBUFFER(name, slot) cbuffer name : register(b ## slot)

     Regarding the padding, you are right, HLSL does automatically pad your structs when needed, but I like to keep that explicit as well, so I insert dummy data members where padding would occur. This makes it easier to share structs with the C++ application, which in my opinion you should also do in a shared header.
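     Here is a minimal sketch of that setup; the file, struct, and slot names below (ShaderInterop.h, MaterialCB, CBSLOT_MATERIAL) are made up for the example, not actual engine code:

         // ShaderInterop.h -- included by both the C++ application and the HLSL
         // shaders, so the bind slot number lives in exactly one place.
         #define CBSLOT_MATERIAL 4

         #ifdef __cplusplus
         // C++ side: the declaration below becomes a plain struct.
         #include <DirectXMath.h>
         typedef DirectX::XMFLOAT4 float4;
         #define CBUFFER(name, slot) struct name
         #else
         // HLSL side: expand to a cbuffer with an explicit register. The extra level of
         // indirection makes sure CBSLOT_MATERIAL expands to 4 before ## pastes it onto b.
         #define CBUFFER_IMPL(name, slot) cbuffer name : register(b ## slot)
         #define CBUFFER(name, slot) CBUFFER_IMPL(name, slot)
         #endif

         CBUFFER(MaterialCB, CBSLOT_MATERIAL)
         {
             float4 baseColor;
             float  roughness;
             float  metalness;
             float  padding0;  // explicit padding instead of relying on implicit HLSL packing
             float  padding1;
         };

     The C++ side can then bind with, for example, context->PSSetConstantBuffers(CBSLOT_MATERIAL, 1, &materialCB), while the shader declares the cbuffer through the same macro, so the two sides can never go out of sync.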
  3. Hi, I am not sure what optimizations you have already tried, but here are some suggestions:
     • Use texture compression for the heightmap like you already mentioned: choose an appropriate format, for example DXT1.
     • Sample only a single channel explicitly.
     • Use mipmaps: you will need to calculate the texture UV gradients upfront, before the ray marching loop, and feed them into textureGrad (I think that's what it's called in GLSL?).
     • Use a dynamic loop for the ray marching and terminate early once you have found the intersection.
     • Use fewer steps in the ray marching.
     • Use a smaller resolution heightmap.
     • When calculating the UV derivatives, you could try using dFdyCoarse instead of the standard dFdy (or dFdx).
     You can check out my implementation, but it is written in HLSL. Hope I could help!
  4. DX11 Writing to buffer outside Map/Unmap?

    I take it we were talking about something like this:

        struct MappedMemory
        {
            void* address;
            ID3D11DeviceContext* context;
            ID3D11Buffer* buffer;

            MappedMemory(ID3D11Buffer* buffer, ID3D11DeviceContext* context) :
                context(context), buffer(buffer)
            {
                D3D11_MAPPED_SUBRESOURCE mapped = {};
                context->Map(buffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped);
                address = mapped.pData;
            }
            ~MappedMemory()
            {
                context->Unmap(buffer, 0);
            }
        };

    And that we could use it like this:

        {
            MappedMemory mem(constantbuffer, context);
            ((MyType*)mem.address)->color = float4(1, 2, 3, 4);
        }
        context->VSSetConstantBuffers(0, 1, &constantbuffer);
        context->Draw(vertexCount, 0);
  5. DX11 Writing to buffer outside Map/Unmap?

    I refrained from that solution in the case of locks as well.
  6. DX11 Writing to buffer outside Map/Unmap?

    I thought about that, and I don't like that solution, because now the destructor needs to reference the graphics device, and in most cases I must still invoke the destructor manually before the draw (with a dummy scope, for example). Calling a function explicitly is more self-documenting.
  7. DX11 Writing to buffer outside Map/Unmap?

    Well, now I have an AllocateFromRingBuffer call, which returns the pointer and an offset to use as the vertex buffer offset, and an InvalidateBufferAccess call, which just calls Unmap, so they are a bit higher level than exposing Map/Unmap directly. I meant that I wanted to avoid an explicit unmap after allocations.
  8. DX11 Writing to buffer outside Map/Unmap?

    Thank you. MSDN also states this for ID3D11DeviceContext::Unmap: "Invalidate the pointer to a resource and reenable the GPU's access to that resource." So I will do the right thing and call Unmap after writing the memory.
  9. DX11 Writing to buffer outside Map/Unmap?

    Right, I will expose the Unmap then; no big deal, I just don't really like it.
  10. If I do a buffer update with MAP_NO_OVERWRITE or MAP_DISCARD, can I just write to the buffer after I have called Unmap() on it? It seems to work fine for me (Nvidia driver), but is it actually legal to do so?

     I have a graphics device wrapper and I don't want to expose Map/Unmap, but just have a function like:

         void* AllocateFromRingBuffer(GPUBuffer* buffer, uint size, uint& offset);

     This function would just call Map on the buffer, then Unmap immediately, and then return the address of the buffer. It usually does a MAP_NO_OVERWRITE, but sometimes a WRITE_DISCARD (when the buffer wraps around). Previously the function expected the data upfront and copied it into the buffer between Map and Unmap, but now I want to extend it so that it just returns an address to write to.
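     For reference, a rough sketch of what such an allocator could look like with the Unmap exposed as a separate InvalidateBufferAccess call, as discussed in the replies above. The ring-buffer bookkeeping is simplified and the globals are placeholders, not my actual wrapper code:

         #include <d3d11.h>
         #include <cstdint>

         ID3D11Buffer* ringBuffer = nullptr; // created with D3D11_USAGE_DYNAMIC and D3D11_CPU_ACCESS_WRITE
         UINT ringSize = 0;                  // total size of the buffer in bytes
         UINT ringOffset = 0;                // current write position

         void* AllocateFromRingBuffer(ID3D11DeviceContext* context, UINT size, UINT& offset)
         {
             D3D11_MAP mapType = D3D11_MAP_WRITE_NO_OVERWRITE;
             if (ringOffset + size > ringSize)
             {
                 // Wrap around: WRITE_DISCARD lets the driver rename the buffer.
                 ringOffset = 0;
                 mapType = D3D11_MAP_WRITE_DISCARD;
             }

             D3D11_MAPPED_SUBRESOURCE mapped = {};
             context->Map(ringBuffer, 0, mapType, 0, &mapped);

             offset = ringOffset;
             ringOffset += size;

             // The buffer stays mapped; the caller writes through this pointer...
             return (uint8_t*)mapped.pData + offset;
         }

         void InvalidateBufferAccess(ID3D11DeviceContext* context)
         {
             // ...and calls this after writing, before any draw that reads the buffer.
             context->Unmap(ringBuffer, 0);
         }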
  11. Deferred device contexts

    You are probably right; I only used them a long time ago and can't remember that well.
  12. Deferred device contexts

    On the application side you can keep a single buffer resource, and you can Map it however many times you like, from different contexts as well. The allocations are done by the driver, and I assume they allocate constant buffers from the command list memory. You might want to avoid doing this a very large number of times per frame; for example, AMD GCN drivers have a 4 MB command buffer, and once you exceed that limit there is probably some sort of synchronization involved.
  13. Deferred device contexts

    When updating buffers, you have two options: UpdateSubresource() and Map(). UpdateSubresource just works on deferred contexts normally, meaning it probably creates a copy of the data and uploads it to the GPU. When using Map(), you can only use the WRITE_DISCARD or NO_OVERWRITE flags.

    WRITE_DISCARD is a buffer rename operation, meaning that it allocates a new copy of the buffer in CPU-accessible memory and gives you a pointer that you can write to. With NO_OVERWRITE, you tell the driver that you just want access to the memory and that your application will explicitly ensure there are no race conditions for the resource, so neither the GPU nor another CPU thread will be competing for the same memory.

    Take constant buffers, for example, which are usually updated with WRITE_DISCARD. If you have a global constant buffer like PerFrameVariables, then you have to update it on each deferred context that will reference it.
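    A minimal sketch of that last point (PerFrameVariables' contents and the function name are placeholders, not real engine code): every deferred context that will use the cbuffer records its own WRITE_DISCARD update, and the driver renames the buffer per command list.

        #include <d3d11.h>
        #include <cstring>

        struct PerFrameVariables
        {
            float time;
            float padding[3]; // keep the struct a multiple of 16 bytes for the cbuffer
        };

        void UpdatePerFrameCB(ID3D11DeviceContext* deferredContext,
                              ID3D11Buffer* cbuffer,
                              const PerFrameVariables& data)
        {
            // WRITE_DISCARD: the driver hands back a freshly renamed allocation that is
            // safe to write from this context without synchronizing with the others.
            D3D11_MAPPED_SUBRESOURCE mapped = {};
            deferredContext->Map(cbuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped);
            memcpy(mapped.pData, &data, sizeof(data));
            deferredContext->Unmap(cbuffer, 0);
        }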
  14. Depth-of-field

    The standard approach is definitely downsample + blur in a separate pass and blend by depth in another, because it is the fastest and still looks somewhat good enough for casual viewers. The problem is that the blur radius is not variable, the blurred parts leak into the sharp parts of the image, and a Gaussian blur is not lifelike at all and doesn't produce a nice bokeh effect. I want to implement something like the new Doom does, as described in this excellent graphics breakdown article: basically a pass that separates the foreground from the background, clever blurring (an HDR image is mandatory), then a combine.
  15. For sure, this is one of my oldest pieces of code, so it is a bit low quality.