Reitano

  1. Thank you all, you've been very helpful. @Hodgman You are so right, I shouldn't even consider code with undefined behavior. I fixed the allocator so that every memory write is now bracketed by Map and Unmap calls. On the API side, client code can use a convenient Upload method to upload small structures like camera data, and manual Map/Unmap methods to upload potentially large chunks of data, like model instances, lights, materials etc. You can find the new code at https://codeshare.io/2p7ZbV I am planning to refactor the rendering engine at a high level. The idea is to upload ALL constants in a first stage, and only at the end bind them and issue the draw calls. This should allow a single Map/Unmap call per constant buffer, like I had originally.
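    A minimal sketch of the convenience path described above, assuming a D3D11.1-style ring buffer; Upload is the method named in the post, the other names are illustrative:

```cpp
#include <d3d11_1.h>
#include <cstring>

// Hypothetical helper: bracket every write with Map/Unmap, as the fixed
// allocator now does. MAP_WRITE_NO_OVERWRITE is legal on constant
// buffers from D3D 11.1 onwards and lets the GPU keep reading earlier
// regions of the ring buffer while new data is written.
template <typename T>
void Upload(ID3D11DeviceContext* context, ID3D11Buffer* buffer,
            size_t byteOffset, const T& data)
{
    D3D11_MAPPED_SUBRESOURCE mapped = {};
    if (SUCCEEDED(context->Map(buffer, 0, D3D11_MAP_WRITE_NO_OVERWRITE, 0, &mapped)))
    {
        std::memcpy(static_cast<char*>(mapped.pData) + byteOffset, &data, sizeof(T));
        context->Unmap(buffer, 0);
    }
}
```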
  2. Thank you guys for your replies. Mapping/unmapping and then writing to the mapped memory indeed smells of undefined behaviour. So far it works on my machine but I should definitely test it on other GPUs to be more confident. I like this approach as the client code is quite concise, not requiring two calls to Map and Unmap for every constant upload operation. A pity DX11 does not have the concept of persistent mappings. As for the latency, I am now ignoring the value returned by IDXGIDevice1::GetMaximumFrameLatency and instead using a conservative latency of 5 for the allocator. I will also add a loop to block the CPU in case the number of queued frames goes above this value (which it really shouldn't). @SoldierOfLight I will read about the new presentation modes. Thanks!
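    A rough sketch of the blocking loop mentioned above, assuming one D3D11_QUERY_EVENT is issued at the end of each frame (all names are illustrative):

```cpp
#include <d3d11.h>
#include <deque>

const UINT kMaxQueuedFrames = 5; // the conservative latency from the post

// Called after Present(): block the CPU while too many frames are in flight.
void ThrottleCPU(ID3D11DeviceContext* context, std::deque<ID3D11Query*>& inFlight)
{
    while (inFlight.size() >= kMaxQueuedFrames)
    {
        BOOL done = FALSE;
        // S_OK with done == TRUE means the GPU has passed the event.
        if (context->GetData(inFlight.front(), &done, sizeof(done), 0) == S_OK && done)
        {
            inFlight.front()->Release();
            inFlight.pop_front();
        }
        // Otherwise keep polling; a Sleep(0) here would be kinder to the CPU.
    }
}
```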
  3. Hi, I am writing a linear allocator of per-frame constants using the DirectX 11.1 API. My plan is to replace the traditional constant allocation strategy, where most of the work is done by the driver behind my back, with a manual one inspired by the DirectX 12 and Vulkan APIs. In brief, the allocator maintains a list of 64 KB pages; each page owns a constant buffer managed as a ring buffer. Each page keeps a history of the N previous frames. At the beginning of a new frame, the allocator retires the frames that have been processed by the GPU and frees up the corresponding space in each page. I use DirectX 11 queries to detect when a frame is complete and the ID3D11DeviceContext1::VS/PSSetConstantBuffers1 methods to bind constant buffers with an offset. The new allocator appears to be working but I am not 100% confident it is actually correct. In particular:

    1) It relies on queries, which I am not too familiar with. Are they 100% reliable?

    2) It maps/unmaps the constant buffer of each page at the beginning of a new frame and then writes to the mapped memory as the frame is built. In pseudocode:

        BeginFrame:
            page.data = device.Map(page.buffer)
            device.Unmap(page.buffer)
        RenderFrame:
            Alloc(size, initData)
                ...
                memcpy(page.data + page.start, initData, size)
            Alloc(size, initData)
                ...
                memcpy(page.data + page.start, initData, size)

    (Note: calling Unmap at the end of a frame instead would prevent binding the mapped constant buffers and triggers an error in the debug layer.) Is this valid?

    3) I don't fully understand how many frames I should keep in the history. My intuition says it should be equal to the maximum latency reported by IDXGIDevice1::GetMaximumFrameLatency, which is 3 on my machine. But while this value works fine in a unit test, on a more complex demo I need to manually set it to 5, otherwise the allocator starts overwriting previous frames that have not completed yet. Shouldn't the swap chain Present method block the CPU in this case?

    4) Should I expect this approach to be more efficient than the one managed by the driver? I don't have meaningful profile data yet.

    Is anybody familiar with the approach described above who can answer my questions and discuss the pros and cons of this technique based on their experience? For reference, I've uploaded the (WIP) allocator code at https://paste.ofcode.org/Bq98ujP6zaAuKyjv4X7HSv. Feel free to adapt it in your engine and please let me know if you spot any mistakes. Thanks, Stefano Lanza
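    For what it's worth, a condensed sketch of the two pieces in question, event-query polling and offset binding via VSSetConstantBuffers1 (types and bookkeeping are illustrative, not the code at the link):

```cpp
#include <d3d11_1.h>

struct FrameRecord { ID3D11Query* fence; UINT ringEnd; };

// Retire completed frames at the beginning of a new one.
bool FrameComplete(ID3D11DeviceContext1* ctx, const FrameRecord& frame)
{
    BOOL done = FALSE;
    // Non-blocking poll; S_OK means the query result is available.
    return ctx->GetData(frame.fence, &done, sizeof(done), 0) == S_OK && done;
}

// Bind a sub-range of a page's constant buffer. Offsets and sizes are
// expressed in 16-byte constants and must be multiples of 16 constants
// (i.e. 256 bytes), which is why the allocator aligns its allocations.
void BindRange(ID3D11DeviceContext1* ctx, ID3D11Buffer* buffer,
               UINT byteOffset, UINT byteSize)
{
    UINT firstConstant = byteOffset / 16;
    UINT numConstants  = ((byteSize + 255) & ~255u) / 16;
    ctx->VSSetConstantBuffers1(0, 1, &buffer, &firstConstant, &numConstants);
}
```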
  4. Deferred texturing

    Thank you for the link. I am aware of LEAN mapping and the Bruneton paper. In fact, I already use a baked C-LEAN variance texture for the computation of the wave variance and the filtering of all functions that depend on the normal. The issue I am having is related to the undersampling of the displacement map and the temporal aliasing implicit in the projective grid technique. Anyway, by biasing the tessellation towards the horizon I managed to reduce these artifacts. I also refactored the pipeline and now the water geometry and shading phases are entirely decoupled, with a noticeable performance improvement on my 5-year-old laptop :) I also uploaded a new demo to my website along with new features. I'd really appreciate the feedback of anyone reading this thread! Thanks! Mandatory screenshot:
  5. My engine has a Position class which represents a 3D position in world space and internally uses doubles (__m128d SIMD registers), with utilities to add/subtract Positions and compute relative vectors in float coordinates. The Transform component contains one, and consequently all model instances, cameras, sound sources, AI agents, particle effects etc. take advantage of it. At the beginning of each frame, a new world origin is chosen, which usually coincides with the position of the main camera. All Transform components are then converted to float precision relative to it: localPosition = Transform.position.Subtract(worldOrigin). Frustum culling, rendering, water simulation, AI and other simulation tasks then operate in this float-based, camera-relative world space. It's an elegant approach and it works very well. To give an example, I am working on a new demo with an island located very far from the zero origin (longitude: 354750, latitude: 3703690) and everything works perfectly. For the depth buffer, I now use reversed depth and it magically fixed all the z-fighting artifacts I was having.
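    A condensed sketch of this Position idea (the real class reportedly uses __m128d; plain doubles here, names illustrative):

```cpp
struct Vector3f { float x, y, z; };

// Absolute world-space position stored in double precision.
struct Position
{
    double x, y, z;

    // Relative vector in float precision; accurate once both points
    // are near the per-frame world origin.
    Vector3f Subtract(const Position& o) const
    {
        return { float(x - o.x), float(y - o.y), float(z - o.z) };
    }
};

// Per-frame rebasing: everything downstream works in float space.
// Vector3f localPosition = transform.position.Subtract(worldOrigin);
```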
  6. Deferred texturing

    Thank you for the replies. I should have provided more information in my original post. I would use deferred texturing for the rendering of water normals only, not for the whole scene. This pass is particularly expensive due to the pixel shader complexity (it combines several wave layers), the many texture fetches with anisotropic filtering and the above-mentioned quad overdraw (a term I borrowed from RenderDoc). I use the projective technique for the geometry, and in order to minimize temporal aliasing, caused by sampling slightly different points on the water plane as the camera moves, the tessellation must be very high, especially near the horizon. I tried a couple of stabilization approaches but sadly none worked to my satisfaction. @MJP Luckily divergent sampling is not a problem in my case as the shader uses the same set of textures for the whole pass. I will work on an implementation over the weekend and let you know my findings. Thanks again!
  7. The most expensive pass in my rendering pipeline processes highly tessellated geometry and thus suffers from a high degree of quad overdraw. At the highest quality setting, the tessellation generates sub-pixel triangles and the performance loss is quite drastic. An obvious optimization is deferred texturing, which consists of running a pre-pass that rasterizes the geometry and saves to intermediate buffers all the data required by the original rendering pass: texture coordinates, derivatives etc. The original pass is then replaced by either a fullscreen triangle or a compute shader, with optimal quad utilization. My question is: are gradient-based sampling functions nowadays less efficient than those that do not take the gradient as an argument? That's my main worry and I could not find any information on this. I know that Deus Ex: Mankind Divided uses this technique but I'd like to have some confirmation before coding a prototype. Also, apart from the need for intermediate buffers, are there other non-obvious disadvantages? Thanks! Stefano
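    As an aside, a hypothetical D3D11 setup for the intermediate buffers of such a pre-pass, one target for the UVs and one for the four gradient components the resolve pass would feed to gradient-based sampling (the formats are a guess, not a recommendation):

```cpp
#include <d3d11.h>

void CreateDeferredTexturingTargets(ID3D11Device* device, UINT width, UINT height,
                                    ID3D11Texture2D** uvTarget,
                                    ID3D11Texture2D** gradTarget)
{
    D3D11_TEXTURE2D_DESC desc = {};
    desc.Width = width;
    desc.Height = height;
    desc.MipLevels = 1;
    desc.ArraySize = 1;
    desc.SampleDesc.Count = 1;
    desc.Usage = D3D11_USAGE_DEFAULT;
    desc.BindFlags = D3D11_BIND_RENDER_TARGET | D3D11_BIND_SHADER_RESOURCE;

    desc.Format = DXGI_FORMAT_R16G16_FLOAT;        // texture coordinates
    device->CreateTexture2D(&desc, nullptr, uvTarget);

    desc.Format = DXGI_FORMAT_R16G16B16A16_FLOAT;  // ddx(uv).xy, ddy(uv).xy
    device->CreateTexture2D(&desc, nullptr, gradTarget);
}
```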
  8. If I understood it correctly, your problem is related to the seamless filtering of a spherical function, a heightmap encoded as a cubemap in your case. That's analogous to the convolution of radiance maps for IBL. It's easier to work in 3D, with the required 2D->3D and 3D->2D mapping steps where necessary. In pseudocode:

    for each cubemap texel:
        compute the corresponding 3D direction vector D; D, along with an aperture (user-defined or fixed), defines a disk around the cubemap texel
        compute an orthogonal tangent frame for D: T, B
        take N samples inside the disk; for each sample (u, v) (**1):
            compute the corresponding 3D vector as Di = u * T + v * B + D
            compute the cubemap texel coordinates and face corresponding to Di
            fetch the cubemap with bilinear filtering (**2)
        process the N samples and compute the result. What operator are you using?
        write the result back to the cubemap

    The image processing happens in 3D space, which is continuous, and the difficulties due to corners and edges are implicitly taken care of by the mapping from/to cubemap space.

    (**1) I'd suggest uniform sampling because the tangent frame is going to be discontinuous.
    (**2) If you're doing this on the GPU, as Hodgman mentioned, modern cards offer seamless bilinear filtering of adjacent cubemap faces. If you're doing this in software, you'll have to emulate bilinear filtering yourself.

    If something is unclear or you need some code, please let me know. Stefano
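    If it helps, a sketch of the 3D->2D mapping step (**2), direction to cubemap face and face UVs, following the usual D3D face conventions (worth double-checking against your layout):

```cpp
#include <cmath>

// Map a direction to a cube face index (+X,-X,+Y,-Y,+Z,-Z = 0..5)
// and [0,1] coordinates on that face.
void DirectionToCubemap(float x, float y, float z, int& face, float& u, float& v)
{
    const float ax = std::fabs(x), ay = std::fabs(y), az = std::fabs(z);
    float ma, sc, tc;
    if (ax >= ay && ax >= az) { ma = ax; face = x > 0 ? 0 : 1; sc = x > 0 ? -z : z;  tc = -y; }
    else if (ay >= az)        { ma = ay; face = y > 0 ? 2 : 3; sc = x;               tc = y > 0 ? z : -z; }
    else                      { ma = az; face = z > 0 ? 4 : 5; sc = z > 0 ? x : -x;  tc = -y; }
    u = 0.5f * (sc / ma + 1.0f);
    v = 0.5f * (tc / ma + 1.0f);
}
```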
  9. Hi all,

    In the past months I have been working on a brand new version of Typhoon. Typhoon is an engine specialized in the simulation and rendering of oceans and underwater environments, targeting AAA games and maritime simulations. It is written in C/C++11/Lua/HLSL and currently runs on the Windows/DirectX 11 platform. I have released a demo on www.typhoon3d.com and I am now looking for testers. The requirements are Windows 7 64-bit or later and a DirectX 11 compliant card.

    Regarding water, the features are:
    - Projective grid tessellation
    - FFT ambient waves
    - Procedural waves
    - Kelvin waves
    - Ship wakes
    - Whitecaps
    - Physically based shading
    - Specular anti-aliasing (baked LEAN)
    - Reflections
    - Refractions
    - Cascaded caustics
    - Underwater godrays
    - Underwater shadows
    - Underwater defocusing
    - Seamless rendering at the water/air interface
    - Buoyancy simulation
    - Support for bathymetry maps

    After a break, I will focus on these new features:
    - Underwater reflections
    - Support for round earth
    - Wave particles
    - Convolution waves for water/body interaction
    - More wave and foam primitives (e.g. helicopter rotors, missile trails)
    - Spray effects
    - Anti-aliased caustics
    - Wide-angle cameras
    - Scuba diving pack
    - And many more engine features...

    Next year I will then focus on the seemingly impossible problem of simulating breaking waves and wave refraction in shallow water in real time, which I'd say is my programming-related dream. I will also seek potential partnerships and work on integration with other engines/products in order to finance further development.

    Please let me know your feedback and bug reports here or by email (typhoon3d@gmail.com).

    Thanks!

    Stefano Lanza
    www.typhoon3d.com
  10. Thank you for the information. I will profile the use of SV_Depth and see what overhead and performance savings it brings to my scenes. Manually rejecting pixels in the pixel shader with dynamic branching might be good enough, but we'll see. Related: what is the granularity of depth and stencil rejection on recent GPUs? And would it be more efficient to use both in cases where one suffices? For example, draw the sky on pixels whose depth == 1 AND stencil == skyRef instead of only depth == 1.
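    For reference, a sketch of the combined test mentioned at the end (plain D3D11 state setup; whether it is faster than depth alone is exactly the open question):

```cpp
#include <d3d11.h>

ID3D11DepthStencilState* CreateSkyDepthStencilState(ID3D11Device* device)
{
    D3D11_DEPTH_STENCIL_DESC desc = {};
    desc.DepthEnable    = TRUE;
    desc.DepthWriteMask = D3D11_DEPTH_WRITE_MASK_ZERO;
    desc.DepthFunc      = D3D11_COMPARISON_EQUAL;  // pass only where depth == sky depth (1)
    desc.StencilEnable  = TRUE;
    desc.StencilReadMask  = 0xFF;
    desc.StencilWriteMask = 0;
    desc.FrontFace.StencilFunc        = D3D11_COMPARISON_EQUAL; // stencil == skyRef
    desc.FrontFace.StencilPassOp      = D3D11_STENCIL_OP_KEEP;
    desc.FrontFace.StencilFailOp      = D3D11_STENCIL_OP_KEEP;
    desc.FrontFace.StencilDepthFailOp = D3D11_STENCIL_OP_KEEP;
    desc.BackFace = desc.FrontFace;

    ID3D11DepthStencilState* state = nullptr;
    device->CreateDepthStencilState(&desc, &state);
    // Bind with the reference value: context->OMSetDepthStencilState(state, skyRef);
    return state;
}
```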
  11. Is it possible in DirectX 11 to downsample a depth-stencil buffer and then re-bind it as a read-only depth-stencil buffer? I only managed to bind it as a shader resource view, but I would like, if possible, to take advantage of early depth culling and the stencil buffer. The use case is volumetric effects rendered at half resolution, although the question is relevant to lower-resolution rendering in general.
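    On the read-only part at least, D3D11 does support this via DSV creation flags; a sketch (the format is an assumption, match it to your resource):

```cpp
#include <d3d11.h>

ID3D11DepthStencilView* CreateReadOnlyDSV(ID3D11Device* device,
                                          ID3D11Texture2D* depthTexture)
{
    D3D11_DEPTH_STENCIL_VIEW_DESC desc = {};
    desc.Format        = DXGI_FORMAT_D24_UNORM_S8_UINT; // must match the resource
    desc.ViewDimension = D3D11_DSV_DIMENSION_TEXTURE2D;
    // Read-only flags allow binding this DSV for early depth/stencil
    // rejection while an SRV of the same texture is bound for reading.
    desc.Flags = D3D11_DSV_READ_ONLY_DEPTH | D3D11_DSV_READ_ONLY_STENCIL;

    ID3D11DepthStencilView* dsv = nullptr;
    device->CreateDepthStencilView(depthTexture, &desc, &dsv);
    return dsv;
}
```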
  12. Mystery solved. I was binding the depth map as R16_FLOAT instead of R16_UNORM. All works fine now, thank you again!
  13. Thank you so much! All these years of HLSL programming and I never knew about this. On a related note, according to the debug layer, my card does not support a comparison sampler with a 16-bit depth/shadow map, which forces me to either emulate PCF filtering with GatherRed or use a 32-bit depth format. First, is that true for all cards? I remember that hardware PCF worked fine in DX9 with 16-bit depth maps... My use case is volumetric shadows (8 samples per ray, 1x1 PCF). I will have to profile the two options but I suppose the reduced bandwidth will win over the additional ALU cost and register pressure. What's your recommendation? Thanks
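    For context, the comparison sampler in question would look roughly like this; whether it can be paired with an R16_UNORM shadow map is the per-card support issue raised above:

```cpp
#include <d3d11.h>

ID3D11SamplerState* CreatePCFSampler(ID3D11Device* device)
{
    D3D11_SAMPLER_DESC desc = {};
    // Comparison filter = hardware PCF: the 2x2 bilinear footprint is
    // compared against the reference depth and the results are blended.
    desc.Filter = D3D11_FILTER_COMPARISON_MIN_MAG_LINEAR_MIP_POINT;
    desc.AddressU = desc.AddressV = desc.AddressW = D3D11_TEXTURE_ADDRESS_CLAMP;
    desc.ComparisonFunc = D3D11_COMPARISON_LESS_EQUAL;

    ID3D11SamplerState* sampler = nullptr;
    device->CreateSamplerState(&desc, &sampler);
    return sampler;
}
```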
  14. I render shadowmaps the standard way, that is, by binding a depth/stencil buffer and setting a null render target and a null pixel shader. That works fine except for alpha-tested geometry. In this case I need a pixel shader to discard pixels that fail the alpha test. Of course, the DirectX debug layer complains as no render target is bound: #D3D11 WARNING: ID3D11DeviceContext::DrawIndexedInstanced: The Pixel Shader expects a Render Target View bound to slot 0, but none is bound. The results are correct but I'd like to suppress this annoying warning. Any suggestions? For example, is there something in DirectX 11 equivalent to the NULL render target hack in DirectX 9? Or should I simply create and bind a temporary render target for alpha-tested geometry? Thanks
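    One option I'm aware of, rather than binding a dummy target: hide that specific message through the debug layer's info queue (requires a device created with the debug flag; the message ID name should be double-checked against d3d11sdklayers.h):

```cpp
#include <d3d11.h>
#include <d3d11sdklayers.h>

void HideNullRTVWarning(ID3D11Device* device)
{
    ID3D11InfoQueue* queue = nullptr;
    if (SUCCEEDED(device->QueryInterface(__uuidof(ID3D11InfoQueue), (void**)&queue)))
    {
        D3D11_MESSAGE_ID hide[] = {
            D3D11_MESSAGE_ID_DEVICE_DRAW_RENDERTARGETVIEW_NOT_SET
        };
        D3D11_INFO_QUEUE_FILTER filter = {};
        filter.DenyList.NumIDs  = 1;
        filter.DenyList.pIDList = hide;
        queue->AddStorageFilterEntries(&filter); // stop storing/reporting it
        queue->Release();
    }
}
```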
  15. Instead of using a shadowmap for shadows, you could bake the visibility between the terrain and the main light into an occlusion map. For each texel, cast a ray from the corresponding terrain sample towards the light, and store 0 or 1 depending on whether the ray hits the terrain or not. The intersection code can be optimized in several ways, for instance by precaching the terrain geometry instead of evaluating the fractal function on the fly. An occlusion map has many advantages: it filters correctly, takes little memory (an 8-bit format is enough), gives you soft shadows for free and is view-independent. Of course, you can tweak its resolution depending on your memory and runtime budget. That's the approach I used more than 10 years ago for shadowing my terrains (on the CPU) and it worked very well. You should be able to prototype it on the GPU rather quickly. Just an idea!
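    A minimal CPU-side sketch of that bake, assuming a square heightfield and a normalized direction towards a light above the horizon (ly > 0); all names are illustrative:

```cpp
#include <vector>
#include <algorithm>
#include <cstdint>

std::vector<uint8_t> BakeOcclusionMap(const float* heights, int size, float cellSize,
                                      float lx, float ly, float lz) // towards the light
{
    std::vector<uint8_t> occlusion(size * size, 255); // 255 = fully lit
    const float maxH = *std::max_element(heights, heights + size * size);
    const float bias = 0.05f; // lift the ray start to avoid self-shadowing acne

    for (int z = 0; z < size; ++z)
    for (int x = 0; x < size; ++x)
    {
        float px = x * cellSize, pz = z * cellSize;
        float py = heights[z * size + x] + bias;
        // March towards the light one cell at a time.
        while (true)
        {
            px += lx * cellSize; py += ly * cellSize; pz += lz * cellSize;
            if (py > maxH) break; // above the highest peak: the ray escapes
            int ix = int(px / cellSize), iz = int(pz / cellSize);
            if (ix < 0 || iz < 0 || ix >= size || iz >= size) break; // left the map
            if (heights[iz * size + ix] > py) // hit: this texel is in shadow
            {
                occlusion[z * size + x] = 0;
                break;
            }
        }
    }
    return occlusion;
}
```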