About MJP

  • Rank
    XNA/DirectX Moderator & MVP


  1. Something like this should work for converting the MSAA render target to a regular render target that's either 2x, 4x, or 8x the width (msaaSampleCount here would come from a constant buffer or be compiled into the shader):

         Texture2DMS<uint> MSAATexture;

         uint UnpackMSAA(in float4 ScreenPos : SV_Position) : SV_Target
         {
             const uint2 srcTexelPos = uint2(ScreenPos.xy) / uint2(msaaSampleCount, 1);
             const uint srcSubSampleIdx = uint(ScreenPos.x) % msaaSampleCount;
             return MSAATexture.Load(srcTexelPos, srcSubSampleIdx);
         }
  2. Calling ID3D11DeviceContext::Map won't increase the size of the buffer. Think about it: how would that even work? Nowhere do you even have a place to tell D3D how big the buffer should be when you call Map... that's because the buffer has a fixed size, specified when you create it. What seems to be happening in your code is that Map is returning a pointer to a block of memory with size == 6 * sizeof(VertexType), and you are overrunning that buffer whenever your string is longer than a single character. To be honest I'm surprised you're not crashing with an access violation. If you enable the debug validation layer I'm sure it will also complain that you're drawing with more indices/vertices than are present in your buffer. Your second approach is definitely more in line with what you want: pre-allocate your dynamic vertex buffer to some maximum size, and make sure that you write no more than that maximum whenever you call Map. If this isn't working you probably have another bug somewhere. Have you verified that your strings are never longer than m_MaxCharInLine? You may need to clamp the size, or break those strings up into multiple Map/copy/Unmap/Draw sequences. Either way I would definitely enable the debug validation layer like I suggested earlier, since it will often let you know when you're doing things incorrectly.
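     To make the "break up into batches" idea concrete, here's a minimal C++ sketch of the batching arithmetic only (Batch, ComputeBatches, and maxCharsPerBatch are hypothetical names, not part of any D3D API). Each batch would then get its own Map/copy/Unmap/Draw sequence against the fixed-size dynamic buffer:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical sketch: split a string of 'numChars' glyphs into batches that
// each fit in a dynamic vertex buffer sized for 'maxCharsPerBatch' glyphs
// (e.g. 6 vertices per glyph quad). The caller provides an output array with
// enough room for the worst case.
struct Batch { size_t firstChar; size_t numChars; };

size_t ComputeBatches(size_t numChars, size_t maxCharsPerBatch, Batch* outBatches)
{
    assert(maxCharsPerBatch > 0);
    size_t batchCount = 0;
    for (size_t start = 0; start < numChars; start += maxCharsPerBatch)
        outBatches[batchCount++] = { start, std::min(maxCharsPerBatch, numChars - start) };
    return batchCount;
}
```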
  3. That's correct, there's no hardware resolve support for UINT formats: https://docs.microsoft.com/en-us/windows/win32/direct3ddxgi/format-support-for-direct3d-11-0-feature-level-hardware You also can't create a multisampled texture with STAGING usage, so there's no direct way to read back the data on the CPU. Instead you should write a compute shader or pixel shader that can read the raw subsample data from the MSAA render target, and then output it to a larger 2D texture or buffer.
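     The addressing for that unpack pass can be sanity-checked on the CPU. Here's a minimal C++ sketch (SrcSample and DstTexelToSrcSample are hypothetical names) of the mapping implied by a destination texture that's sampleCount times wider than the MSAA source, where sample s of source texel (x, y) lands at destination texel (x * sampleCount + s, y):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of the CPU-side addressing that mirrors the unpack
// shader: a destination texel in the widened texture maps back to a source
// texel plus a subsample index.
struct SrcSample { uint32_t x, y, sampleIdx; };

SrcSample DstTexelToSrcSample(uint32_t dstX, uint32_t dstY, uint32_t sampleCount)
{
    return { dstX / sampleCount, dstY, dstX % sampleCount };
}
```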
  4. Well, with borderless you still want to support a configurable render resolution. You may just need an upscaling step to the full display resolution, which you'll want anyway if you do dynamic resolution and/or want the UI rendered at full display resolution. Either way, your backbuffer doesn't need to match the window size. Swap chains in DX12 can handle certain amounts of upscaling during scanout when you're in "direct flip" mode (bypassing the compositor). There's some more info here: https://docs.microsoft.com/en-us/windows/win32/direct3ddxgi/for-best-performance--use-dxgi-flip-model
  5. Do you even really need to do fullscreen mode anymore? On Win10 you'll bypass the compositor as long as you have a borderless window that covers the entire monitor, which is a much nicer experience both for us and for end users.
  6. DirectX Instancing

     You should probably look into what's actually happening inside of Pass.Apply(), since that is not a part of core D3D11. I'm not really familiar with SharpDX, but it sounds like it's a wrapper over the Effects framework. If so, that function is going to be doing things like applying your shaders, filling and binding constant buffers, and binding textures. I would suggest stepping through what happens in there to get a sense of the work being done, and perhaps also taking a capture with RenderDoc to see what actual D3D11 API calls happen under the hood when you call Apply().
  7. It's probably worth your while to have some sort of "deferred cleanup" mechanism you can use to destroy things that the GPU might still be using. I've handled this in the past by having per-frame lists that a resource goes into when it's no longer needed, and then I clean them all up after waiting on the frame fence. For upload resources in particular, the way I handled it was to allocate from a large ring buffer instead of creating/destroying temporary upload resources. This gives you bounded memory usage, and frees you from having to worry about the cleanup. It also gives you a viable path for doing on-the-fly uploads that happen every frame, without having to go through expensive memory allocations. I've used this for things like large structured buffers containing parameters for every active light in the world, since you can potentially get improved GPU performance when reading from DEFAULT resources instead of UPLOAD resources. You can look at the implementation that I use for my samples here.
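     As a rough illustration of the ring buffer idea, here's a minimal C++ sketch of the allocation bookkeeping only (UploadRing and its members are hypothetical names; the actual GPU buffer, mapped pointer, and fence waits are omitted). Head and tail are monotonically increasing byte counters, so an allocation fails cleanly while the GPU is still reading the region it would overwrite:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of a ring-buffer upload allocator. 'head' counts total
// bytes ever allocated by the CPU, 'tail' counts total bytes the GPU has
// finished with; both only ever increase, and offsets wrap within 'size'.
struct UploadRing
{
    uint64_t size = 0;
    uint64_t head = 0;
    uint64_t tail = 0;

    // Returns a byte offset into the buffer, or UINT64_MAX if the region is
    // still in flight on the GPU. An allocation that would straddle the end
    // of the buffer pads past the wrap point and starts at offset 0.
    uint64_t Allocate(uint64_t numBytes)
    {
        const uint64_t offset = head % size;
        const uint64_t padding = (offset + numBytes > size) ? (size - offset) : 0;
        if (head + padding + numBytes - tail > size)
            return UINT64_MAX;
        head += padding + numBytes;
        return padding ? 0 : offset;
    }

    // Called after waiting on the frame fence: 'retiredHead' is the head value
    // captured when that frame's work was submitted, so everything allocated
    // before it is now safe to reuse.
    void Retire(uint64_t retiredHead) { tail = retiredHead; }
};
```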
  8. First of all, make sure you signal your fence *after* you Present your swap chain. The Present causes a tiny bit of GPU work to get scheduled on the queue, and if you really want to make sure that your GPU has gone idle (for instance, so you can delete everything during shutdown), then you want your fence to signal only after that bit of work for the swap chain has completed.

     Some other suggestions/notes: if you want multiple threads recording commands, then you need as many command lists and command allocators (x2 for double buffering) as you have threads running simultaneously. There's no way to have multiple threads write to a single command list simultaneously.

     A simple thing you can do is to split your Render() function into RenderBegin() and RenderEnd(), and have these denote the start and end points of your frame where recording to command lists (and touching any GPU-accessible memory!) is allowed to happen. RenderBegin() would wait for the previous frame and get the command buffers ready, and RenderEnd() would submit the command buffers, present, and signal the fence. Then you can do things like RenderBegin() -> kick off tasks for subsystems that record commands -> RenderEnd(), without the subsystems needing to wait on fences to know that they can use a command buffer.

     I'm not sure I understand the problems you're having with issuing commands during initialization. A simple approach is to have at least one command list ready that's already attached to a command allocator; once initialization is done you can either submit that command list, or just keep adding onto it when you render the frame (note that this may bloat the command allocator size if you record a ton of commands during initialization). What sorts of things exactly are you trying to do during initialization that require recording commands? If it's for initializing GPU resource memory, then I would suggest creating a separate system for that. You generally need to handle that in a special-case way, and you'll also want to submit on the COPY queue when running on dedicated video cards.

     One option you can consider is to wait on the previous frame immediately after submitting the current frame. This can make it harder to absorb transient GPU spikes since your wait is earlier, but on the upside you know for the entire next frame that your command buffers and GPU-accessible memory are ready to be written to.
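     As a rough sketch of that begin/end bookkeeping (FramePacer and its members are hypothetical names, and the fence is modeled as plain counters rather than an ID3D12Fence): frame i signals fence value i + 1 at the end of the frame, and with two sets of command allocators, frame i waits for fence value i - 1 before reusing the allocator that frame i - 2 recorded into.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of double-buffered frame pacing with a monotonically
// increasing fence value. Nothing here touches D3D12; it's just the counter
// math that RenderBegin()/RenderEnd() would wrap.
struct FramePacer
{
    static constexpr uint64_t NumBuffers = 2; // double-buffered allocators
    uint64_t cpuFrame = 0;                    // frames submitted so far

    // Fence value RenderBegin() must wait on before recording this frame
    // (0 = no wait, since the fence starts out signaled at 0).
    uint64_t WaitValue() const
    {
        return (cpuFrame < NumBuffers) ? 0 : cpuFrame - NumBuffers + 1;
    }

    // Index of the command allocator to record into this frame.
    uint64_t AllocatorIndex() const { return cpuFrame % NumBuffers; }

    // Called by RenderEnd() after Present: the value to signal on the queue.
    uint64_t SignalValue() { return ++cpuFrame; }
};
```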
  9. Lux is basically the photometric equivalent of irradiance: they're both units of flux density. For a punctual directional light computing irradiance/illuminance is really simple: you multiply the intensity of the directional light by N dot L. This means your dirlight intensity is essentially "Illuminance on a surface that's perpendicular to the directional light". If you then want to apply physical camera exposure values to your rendered luminance values in an HDR framebuffer, you can follow through this blog series.
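     As a minimal C++ sketch of that relationship (SurfaceIlluminance and perpendicularIlluminance are hypothetical names; both vectors are assumed normalized):

```cpp
#include <cassert>
#include <cmath>

// Hypothetical sketch: illuminance (lux) arriving at a surface from a
// directional light, where 'perpendicularIlluminance' is the light's intensity
// expressed as lux on a surface facing the light head-on. Back-facing surfaces
// receive nothing, hence the clamp to zero.
float SurfaceIlluminance(float perpendicularIlluminance,
                         const float N[3], const float L[3])
{
    const float nDotL = N[0] * L[0] + N[1] * L[1] + N[2] * L[2];
    return perpendicularIlluminance * std::fmax(nDotL, 0.0f);
}
```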
  10. You will only get pixel shader executions for pixels that lie within your specified viewport. With your current method you're not going to get any clipping along the Z axis in NDC space, since you're forcing z = w = 1. I would try setting w to 1.0 and outputting a proper value for Z instead of forcing it to 1 (keep in mind that Z is [0, 1] in NDC space, unlike X and Y which are [-1, 1]). So basically you'll want to generate Z by swizzling just like you're doing for X and Y, but then do Z = Z * 0.5 + 0.5 to get it into the [0, 1] range.
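     The remap itself is just an affine shift from [-1, 1] to [0, 1]; a trivial C++ sketch (RemapToNdcDepth is a hypothetical name):

```cpp
#include <cassert>

// Hypothetical sketch: shift a swizzled coordinate in [-1, 1] (like NDC x/y)
// into D3D's [0, 1] NDC depth range.
float RemapToNdcDepth(float z)
{
    return z * 0.5f + 0.5f;
}
```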
  11. "b" registers are for D3D10 and above (SM 4.0+), and "c" registers are for D3D9 (SM 2.x and 3.0). In the D3D9 era the shader programming model exposed a set of constant registers, and you would set those registers one at a time using functions like SetPixelShaderConstantF. The shader would then use the values of those constants when executing. For D3D10 and Shader Model 4.0 the constant registers were scrapped and replaced with constant buffers. Instead of dealing with one constant register at a time, you create buffer resources of arbitrary size (up to 64KB or so) and bind the entire buffer to a single slot/register. The "cb" register is used for these constant buffer slots when the shader assembly needs to read from a particular buffer. Confusingly, even though the assembly refers to the registers as "cb", in HLSL you use "register(bX)" to specify which slot you want the constant buffer bound to.
  12. Yes, if you do StructuredBuffer<float2> then you're fine: you'll get xyxyxyxy layout. Basically, your original version (option 1) is fine.
  13. Merging two buffers into one generally isn't going to change much from a GPU performance point of view. There are some minor considerations with descriptors, but for this sort of thing your performance is mainly going to be determined by the latency of loading the data itself. To optimize for that, you'll want to make sure that the data is packed in a way that maximizes cache hits. This generally means packing data together based on how you actually load/access it. So if you always access X and Y together, you'll probably want to store your data as XYXYXYXYXYXYXY so that the N threads in your warp/wave can all access their data in a coalesced load without any wasted cache space. However, if your access pattern is to only access X in one pass and then Y in another, then XXXXXXXXXXXX....YYYYYYYYYY could be more efficient.
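     To make the two layouts concrete, here's a minimal C++ sketch (type and function names are hypothetical) showing where element i's Y value lands in memory under each scheme, for 4 elements of float2 data:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch of the two layouts discussed above. Interleaved
// (xyxy...) keeps each element's X and Y adjacent; deinterleaved
// (xxxx...yyyy) groups all X values, then all Y values.
struct Interleaved   { float xy[4][2]; };             // x0 y0 x1 y1 ...
struct Deinterleaved { float x[4]; float y[4]; };     // x0..x3 then y0..y3

size_t InterleavedOffsetOfY(size_t i)
{
    return offsetof(Interleaved, xy) + (2 * i + 1) * sizeof(float);
}

size_t DeinterleavedOffsetOfY(size_t i)
{
    return offsetof(Deinterleaved, y) + i * sizeof(float);
}
```

Note how consecutive elements' Y values sit 8 bytes apart interleaved but only 4 bytes apart deinterleaved, which is what makes a Y-only pass touch half as many cache lines in the second layout.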
  14. In practice anisotropic filtering is only relevant for minification, not for magnification. You can check the TextureFilterCaps member of the D3DCAPS9 structure to see what filter types are supported for minification and magnification, and you'll find that all hardware out there will have the D3DPTFILTERCAPS_MINFANISOTROPIC bit set, but not D3DPTFILTERCAPS_MAGFANISOTROPIC. It's been a very long time since I've done any D3D9, but I believe that the debug layer will complain at you if you try to use an unsupported filter type. As for the MaxAnisotropy value, it's been even longer since I've worked with the Effect framework but I *think* it will not set the underlying sampler state at all if you omit the value from your HLSL definition. According to this the default value for MaxAnisotropy is 1, so if you never set that sampler state elsewhere that's what you'll get. I believe you can do fancy stuff with the FX format where you can define an int variable that gets set to MaxAnisotropy, and then set the value from your CPU code using ID3DXBaseEffect::SetInt.