


About zhangdoa

  1. You are (almost) correct; the one thing you may have missed is that in microfacet theory we always give the "surface" two normals: the macro-normal n and the micro-normal m. From the SIGGRAPH 2013 course Physically Based Shading in Theory and Practice - Background: Physics and Math of Shading, I suggest you read "Surface Reflectance (Specular Term)" from pg. 12 thoroughly; I hope it makes clear why it's VdotH, i.e. why we use the halfway vector h as the assumed normal of the microsurface. The second answer at https://computergraphics.stackexchange.com/questions/2494/in-a-physically-based-brdf-what-vector-should-be-used-to-compute-the-fresnel-co explains it the same way; it's crystal clear 😃 The reason we don't need kS is not that "it is included in the F term"; rather, the F term *is* kS. It's all about the nature of specular reflectance. When you split out each term and try to visualize it, you can only expect the visual result to depend on that term's parameters, not to match the complete visual appearance. When you put a point light behind the sphere, only the macro visibility term (or macro geometry term) can occlude the light, and in microfacet theory that macro V term is built up from the micro D and G terms. The F term does nothing here.
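A minimal sketch of the point about F, assuming the common Schlick approximation with a scalar F0 (the function name and parameters are mine, not from the course notes): the term is evaluated with VdotH precisely because h plays the role of the micro-normal.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Schlick's approximation of the Fresnel term. VdotH is used because,
// in microfacet theory, the halfway vector H is the micro-normal m of
// the only microfacets that can reflect V into L, so V.H (== L.H) is
// the incident angle each such microfacet sees.
float FresnelSchlick(float F0, float VdotH)
{
    float c = std::clamp(1.0f - VdotH, 0.0f, 1.0f);
    return F0 + (1.0f - F0) * c * c * c * c * c; // (1 - cos)^5 falloff
}
```

At VdotH = 1 this reduces to F0, and at grazing angles it approaches 1; F itself already behaves as the specular ratio kS, which is why no separate kS factor is needed.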
  2. zhangdoa

    3D Modern rendering process

    There are a couple of nice SIGGRAPH courses that cover the topic to some degree, for example: Physically Based Shading at Disney; Extending the Disney BRDF to a BSDF with Integrated Subsurface Scattering; Practical Multilayered Materials in Call of Duty: Infinite Warfare.
  3. "Sky light" is a vague term; let's use sky irradiance instead 😀. The injection of sky irradiance into the voxel irradiance can be approximated by a linear addition: L_voxel = L_voxel_local + Co_sky * V_sky * L_sky. L_voxel_local is all the local irradiance you gathered with your Voxel Cone Tracing and other local illumination techniques. L_sky is the sky irradiance; you can get it directly from your sky's cubemap through some downsampling and convolution, and then inject it into the voxel irradiance later. Depending on how you store the sky irradiance (spherical harmonics? spherical Gaussians? HL2 basis?), Co_sky, the sky irradiance coefficient, varies from a simple constant to normal-interleaved variables. When calculating the sky irradiance, the diffuse part is just a simple cosine-weighted convolution; for the specular part you can use the "split-sum" trick first introduced in "Real Shading in Unreal Engine 4" by Brian Karis at SIGGRAPH 2013. V_sky is the "sky visibility". To get the "sky visibility" or "shadow mask" of each voxel, you can reuse the directional-light shadow information from any previous shadow pass (or the results of any voxelized shadow technique, if you have one). The algorithm above isn't too expensive; it's basically some IBL techniques running fully in real time. If your sky irradiance doesn't change too rapidly, you can use temporal techniques to reuse previous frames' data and boost performance. That is the diffuse part of the sky irradiance. And that is really the multiple-light-bounce problem of global illumination. Since you are already playing with voxel techniques, I guess it won't be too difficult for you to figure out how to simulate light propagation among different voxels.
The real problem may be how to efficiently calculate the distant voxels' irradiance and integrate it into the nearby voxels. I'd perhaps choose a cascaded + temporal solution, but again there is no silver bullet; personally I recommend taking a look at Scalable Real-Time Global Illumination for Large Scenes from GDC 2019. They implemented some amazing real-time voxelization GI solutions, and they may answer your question better than I can 😅.
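The injection step above can be sketched in a few lines. This is an illustrative CPU-side sketch with made-up type and parameter names; a real implementation would run per voxel (and per SH/SG coefficient) on the GPU:

```cpp
#include <cassert>
#include <cmath>

struct Irradiance { float r, g, b; };

// L_voxel = L_voxel_local + Co_sky * V_sky * L_sky
Irradiance InjectSkyIrradiance(const Irradiance& local, // gathered by VCT etc.
                               const Irradiance& sky,   // convolved sky cubemap
                               float skyCoefficient,    // Co_sky: depends on the storage basis
                               float skyVisibility)     // V_sky: 0 = fully occluded, 1 = open sky
{
    float k = skyCoefficient * skyVisibility;
    return { local.r + k * sky.r,
             local.g + k * sky.g,
             local.b + k * sky.b };
}
```

A fully occluded voxel (V_sky = 0) keeps only its local irradiance, which matches the "shadow mask" interpretation of V_sky.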
  4. Your texCoord is unused; you are actually using input.TexCoord. Besides, your texture-coordinate calculation doesn't look correct. If you read the articles carefully, you'll see that perspective division converts a point from homogeneous coordinates to Cartesian coordinates, or, from a more mathematical point of view, transforms an element of projective space into Euclidean space. After the projection matrix, all points of your 3D Euclidean space end up in a 4D projective space, so you need to transform them back before deriving your texture coordinates. Note that the perspective division has already been applied to SV_POSITION by the time you use it in the pixel shader. (https://docs.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-semantics)
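To illustrate, here is a small sketch (the structs are stand-ins for HLSL's float4/float2) of going from a clip-space position to texture coordinates by hand. This is only needed for positions you pass through yourself; SV_POSITION arrives already divided:

```cpp
#include <cassert>
#include <cmath>

struct Float4 { float x, y, z, w; };
struct Float2 { float x, y; };

Float2 ClipToTexCoord(const Float4& clipPos)
{
    // Perspective division: 4D projective space -> 3D Cartesian NDC.
    float ndcX = clipPos.x / clipPos.w;
    float ndcY = clipPos.y / clipPos.w;
    // NDC [-1, 1] -> UV [0, 1]; D3D texture space has y pointing down.
    return { ndcX * 0.5f + 0.5f, -ndcY * 0.5f + 0.5f };
}
```

The center of the view (x = y = 0 after division) maps to UV (0.5, 0.5), and the top-right corner of NDC maps to UV (1, 0).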
  5. What MJP described is basically the GPU voxelization algorithm; chapter 22 of "OpenGL Insights", Octree-Based Sparse Voxelization Using the GPU Hardware Rasterizer, covers all of the fundamental theory and implementation. You may need conservative rasterization for a better voxelization result, whether handcrafted in the geometry shader or through a hardware alternative, depending on the specific API you use.
  6. In the garbage data, m_DS and m_GS point into the memory range of d3d11_3SDKLayers.dll, which looks like a linkage error. I've encountered a similar (very rare) scenario when incremental linking was enabled and the linker couldn't resolve segment offsets correctly; the debug runtime then reported weird memory addresses. Try a full clean rebuild of the solution and see if that solves it.
  7. zhangdoa

    DX12 Use Texture2DArray in D3D12

    Sorry for my misleading answer 😅. In D3D12, creating a texture takes more steps than in D3D11. You basically need to:
    1. Reserve heap memory for the resource. It can live in main memory or in dedicated video-card memory, depending on the target platform's memory architecture (UDMA?/DMA?) and the creation info you specify; different heap types have different CPU/GPU accessibility.
    2. Create a resource handle. You get an ID3D12Resource*, analogous to an ID3D11Texture2D*, for further binding and other operations.
    3. Upload the texture data to the reserved heap.
    4. Transition the resource barrier of your texture resource to its final usage state.
    5. Create the SRV or UAV that your use case requires.
    You have at least two choices for steps 1 and 2: A. Use ID3D12Device::CreateHeap for step 1 and ID3D12Device::CreatePlacedResource for step 2; or B. Use ID3D12Device::CreateCommittedResource, which combines steps 1 and 2. For step 3: as @pcmaster mentioned, you can map-write-unmap, but the resource must live in a heap the CPU can write (its D3D12_CPU_PAGE_PROPERTY must not be D3D12_CPU_PAGE_PROPERTY_NOT_AVAILABLE), which means an Upload heap or a Readback heap. The better solution is to create an Upload heap, upload your resource to it, and then issue an ID3D12GraphicsCommandList::CopyResource command to copy it into a Default heap for the best GPU accessibility. You'll need another, temporary ID3D12Resource* for the resource inside the Upload heap, created by the same process as steps 1 and 2. To create and fill the Upload-heap resource you again have at least two choices: A. Create the Upload heap and its resource handle yourself, then map-write-unmap; or B. Create the Upload heap yourself, then use the UpdateSubresources helper from d3dx12.h to upload. Step 4 is easy: ID3D12GraphicsCommandList::ResourceBarrier.
The 5th step requires you to create the SRV or UAV in a Descriptor Heap; that's another topic, but generally speaking, if you survived the texture-creation process above, it won't be a problem. Also, all command execution requires you to take explicit care of synchronization. I suggest you take a look at the DirectX 12 Graphics samples, where there are real code examples. If anyone finds any mistakes, please point them out, thanks! (MESSY D3D12 😅)
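As a rough sketch of steps 1 to 4 via the committed-resource route (option B), with error handling, the descriptor-heap step, and fence synchronization all omitted: this needs a live device and command list, so treat it as an outline rather than working code.

```cpp
#include <d3d12.h>
#include "d3dx12.h" // CD3DX12_* helpers, UpdateSubresources, GetRequiredIntermediateSize

void UploadTexture2D(ID3D12Device* device,
                     ID3D12GraphicsCommandList* cmdList,
                     const void* pixels, UINT width, UINT height,
                     ID3D12Resource** texture, ID3D12Resource** upload)
{
    // Steps 1+2 combined: CreateCommittedResource reserves the heap
    // memory and creates the resource handle in one call. A Default
    // heap gives the best GPU accessibility but no CPU access.
    CD3DX12_HEAP_PROPERTIES defaultHeap(D3D12_HEAP_TYPE_DEFAULT);
    CD3DX12_RESOURCE_DESC texDesc = CD3DX12_RESOURCE_DESC::Tex2D(
        DXGI_FORMAT_R8G8B8A8_UNORM, width, height);
    device->CreateCommittedResource(&defaultHeap, D3D12_HEAP_FLAG_NONE,
        &texDesc, D3D12_RESOURCE_STATE_COPY_DEST, nullptr,
        IID_PPV_ARGS(texture));

    // Step 3: stage the data in a CPU-writable Upload heap, then record
    // a GPU copy into the Default-heap resource.
    UINT64 uploadSize = GetRequiredIntermediateSize(*texture, 0, 1);
    CD3DX12_HEAP_PROPERTIES uploadHeap(D3D12_HEAP_TYPE_UPLOAD);
    CD3DX12_RESOURCE_DESC bufDesc = CD3DX12_RESOURCE_DESC::Buffer(uploadSize);
    device->CreateCommittedResource(&uploadHeap, D3D12_HEAP_FLAG_NONE,
        &bufDesc, D3D12_RESOURCE_STATE_GENERIC_READ, nullptr,
        IID_PPV_ARGS(upload));

    D3D12_SUBRESOURCE_DATA srcData = {};
    srcData.pData = pixels;
    srcData.RowPitch = width * 4; // assumes 4 bytes per texel
    srcData.SlicePitch = srcData.RowPitch * height;
    UpdateSubresources(cmdList, *texture, *upload, 0, 0, 1, &srcData);

    // Step 4: transition to the state the shader will read it in.
    CD3DX12_RESOURCE_BARRIER barrier = CD3DX12_RESOURCE_BARRIER::Transition(
        *texture, D3D12_RESOURCE_STATE_COPY_DEST,
        D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE);
    cmdList->ResourceBarrier(1, &barrier);
}
```

The Upload-heap resource must stay alive until the copy has actually executed on the GPU, which is where the fence synchronization mentioned above comes in.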
  8. zhangdoa

    DX12 Use Texture2DArray in D3D12

    If you are familiar with DDSTextureLoader for DirectX 11, take a look at the same tool in DirectXTK12.
  9. As @wintertime has already mentioned, templates are a good way to solve the polymorphism problem at compile time. Basically, if you combine composition over inheritance, function overloading, and template specialization elegantly, you can match or even beat the performance of the old-school vptr-and-vtable game; DOD is not always against OOD 😀. If you want to use std::vector efficiently, manage the heap allocation "yourself". It's not about implementing complex, scary stuff: at minimum, design an allocation tracker or wrapper class (the default std::allocator implementation of MSVC 19.xx on my machine is just a wrapper around the global new()) and use a factory to instantiate your component vectors, so you can guarantee they land in coherent heap memory ranges. Another option is a pre-allocated std::array if your component's maximum count is finite (at which point it hardly matters whether it's an std::array or an std::vector). Or you could implement an object pool over pre-allocated raw heap memory. Again, all options are possible; you just need to find the best one for your situation. Main memory is typically (almost always) DRAM, while the caches inside the CPU chip are typically (almost always) SRAM; however you represent the array structure in your C++ code, the hardware physically behaves the same way, and all the overhead comes from the abstraction. Who pays the bill for your sort operation 😀? Any std::sort? Any overloaded operator==() or operator<()? Do you sort without any comparisons? If so, that's awesome! But if not, you still pay the cost of branching whenever a comparison happens 😀.
I've fully agreed with DOD ever since Mike Acton broadcast the idea loudly in his CppCon talk, and I read the Bitsquid blog posts long ago while I was doing my own ECS work (they're awesome, by the way!). But my experience tells me that nothing in production is as simple and elegant as the example code. When you move to a real product, you always need to complicate things up or compromise them down. One major con of DOD, and of ECS, is that it relies on a well-designed software model (or a programmer with a blown-up brain 🤯): you must translate the domain model into computer-friendly rather than programmer-friendly tasks, and then profile often to make sure your design is really cache-friendly. If you're further interested, chapter 6 of Computer Systems: A Programmer's Perspective discusses the memory- and cache-related topics thoroughly 🍺.
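As a tiny illustration of the "allocation tracker or wrapper class" idea (all names here are mine): a minimal counting allocator that forwards to malloc, letting you assert how many heap allocations your component vectors actually make. A production version would carve from a pre-allocated arena instead of forwarding to the general-purpose heap.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

static std::size_t g_allocationCount = 0; // global for brevity only

// Minimal allocator satisfying the std Allocator requirements; it
// counts every allocation a container makes through it.
template <typename T>
struct TrackingAllocator
{
    using value_type = T;
    TrackingAllocator() = default;
    template <typename U> TrackingAllocator(const TrackingAllocator<U>&) {}

    T* allocate(std::size_t n)
    {
        ++g_allocationCount;
        return static_cast<T*>(std::malloc(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { std::free(p); }
};
template <typename T, typename U>
bool operator==(const TrackingAllocator<T>&, const TrackingAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const TrackingAllocator<T>&, const TrackingAllocator<U>&) { return false; }

// A hypothetical component type to store.
struct TransformComponent { float x, y, z; };
```

With a single reserve() up front, you can verify that all subsequent push_back calls reuse the one coherent block instead of reallocating.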
  10. Inheritance can also achieve so-called DOD, though it's more flexible to choose a composition approach, e.g.: struct A { int foo; }; struct B { A a; int bar; }; struct C : public A { int bar; }; Would you consider some sort of Singleton pattern for aggregating the component tables? If you can implement any mechanism that ensures different std::vector<T>s are allocated adjacent to each other in heap memory, they will achieve the same result. Don't reinvent the wheel if the language and the standard library have already built it for you; what you're describing is almost a custom heap-memory-management module. Take a look at how to implement your own std::allocator if you want to keep sticking with the STL. How do you know this approach actually minimizes branching? Is branching hurting your performance? Are there any practical profiling results that support your design? Are you sure this kind of design suits your target CPU architecture well enough? There is no silver bullet for a one-in-one-out factory; at least, one is hard to design. My personal approach to object instantiation is to keep things simple: I pass the object-creation information through to the final stage, where it can be written directly into the component's data; until then I don't care about the intermediate data, because it's temporary garbage, and I use RAII or similar to get rid of the footprint. Again, that's a situational solution for me. I don't always go full SoA when it makes no sense for a given business case; just let the L2 cache miss like hell until it really hurts performance significantly. Don't be a DOD fundamentalist: function pointers, lambdas, callable objects and so on are also a kind of data. Code is data, the procedure is data, everything is data when it is consumed by some module.
If you're targeting C++11 or later, why not take a look at std::function and std::packaged_task? Functional programming is a natural friend of DOD! Finally, ECS is just ECS; it's not a worthwhile solution if it isn't worthwhile for your particular problem. Hope my mumbling helps you a little bit, happy coding 👨‍💻👩‍💻
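A small sketch of the "code is data" point with std::packaged_task. Note that std::packaged_task is move-only, so it cannot be stored in a std::function; a plain vector of tasks works and still reads like any other component array.

```cpp
#include <cassert>
#include <future>
#include <utility>
#include <vector>

// Queue callables as data, drain them later, and collect results
// through futures. A scheduler could hand taskQueue to a worker
// thread unchanged.
int RunQueuedTask()
{
    std::vector<std::packaged_task<int()>> taskQueue; // behavior stored as data

    std::packaged_task<int()> task([] { return 6 * 7; });
    std::future<int> result = task.get_future();
    taskQueue.push_back(std::move(task)); // move-only: cannot go into std::function

    for (auto& t : taskQueue)
        t(); // consume the "data"

    return result.get();
}
```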
  11. zhangdoa

    making sound support in my engine

    A broadly compatible choice is OpenAL, the audio equivalent of OpenGL. The vendor's or the community's implementation (OpenAL Soft) can cover all your engine's target platforms. A few cons: macOS deprecates OpenAL support from version 10.15; the community is less active than OpenGL's; and there is no successor standard so far. Personally, I recommend you take a look at FMOD or Wwise; both are widely adopted by the industry nowadays and have advantages in maintenance, user/developer community, and software maturity. Wwise encapsulates the low-level audio machinery more tightly than FMOD; conversely, it's easier to get your hands dirty with FMOD. Both use an event-driven design for high-level communication between the host application and themselves, and both have (almost) unified implementations across platforms. But since they target shipping products, their APIs are more verbose and messy, so the learning curve can be a little steep.
  12. CPU-related questions: What's the cost of MapBuffer() in a Debug build? What's the difference between the compiler optimization levels? Do you need to optimize the O(m*n) for-loop? Do you have to submit every sprite's data every frame? Can you identify and eliminate unnecessary temporaries, such as the repeated return value of std::vector<T>::size(), or any unnecessary, expensive copy constructions? GPU-related questions: Which buffer usage pattern (GL_STATIC_DRAW, etc.) did you specify when creating and uploading the vertex buffer? Which mapping flags (GL_MAP_PERSISTENT_BIT, etc.) did you specify when mapping it? How do you handle CPU-GPU synchronization between the CPU's writes and the GPU's reads? Have you implemented any double or triple buffering? And could you share the blog post you're referencing?
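To make the double/triple-buffering question concrete, here is a hypothetical sketch of just the offset bookkeeping for a persistently mapped ring buffer (names are mine). The real version pairs each region with a GL fence (glFenceSync / glClientWaitSync) so the CPU never writes a region the GPU is still reading.

```cpp
#include <cassert>

// Triple-buffered ring: the CPU writes sprite vertex data into one
// region while the GPU reads another, so neither stalls on the other.
struct RingBuffer
{
    static constexpr int kRegions = 3; // triple buffering
    int regionSize;
    int frameIndex = 0;

    explicit RingBuffer(int sizePerRegion) : regionSize(sizePerRegion) {}

    // Byte offset into the mapped buffer the CPU may write this frame.
    int CurrentWriteOffset() const { return (frameIndex % kRegions) * regionSize; }

    void EndFrame() { ++frameIndex; } // in real code: insert a fence here
};
```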
