
zhangdoa

Everything posted by zhangdoa

  1. What MJP described is basically the GPU voxelization algorithm; chapter 22 of "OpenGL Insights", "Octree-Based Sparse Voxelization Using the GPU Hardware Rasterizer", covers all the fundamental theory and implementation. You may need conservative rasterization for a better voxelization result; whether you handcraft it in the geometry shader or use a hardware alternative depends on the specific API you're using.
  2. In the garbage data, m_DS and m_GS pointed into the memory range of d3d11_3SDKLayers.dll, which looks like a linkage error. I've encountered a similar scenario (very rarely) when incremental linking was enabled and the linker couldn't resolve segment offsets correctly; the debug runtime then reported weird memory addresses. Try a full clean rebuild of the solution and see if that solves it.
  3. zhangdoa

    Use Texture2DArray in D3D12

    Sorry for my misleading answer 😅. Creating a texture in D3D12 takes more steps than in D3D11. You basically need to:

    1. Reserve heap memory for the resource. It can live in main memory or in dedicated video memory, depending on the target platform's memory architecture (UMA vs. discrete) and the creation info you specify; different heap types have different CPU/GPU accessibility.
    2. Create a resource handle. You get an ID3D12Resource*, similar to an ID3D11Texture2D*, for further binding and other operations.
    3. Upload the texture data to the reserved heap.
    4. Transition the texture resource with a resource barrier to its final usage state.
    5. Create the SRV or UAV for your use case.

    You have two (or more) choices for steps 1 and 2: A. use ID3D12Device::CreateHeap for step 1 and ID3D12Device::CreatePlacedResource for step 2; B. use ID3D12Device::CreateCommittedResource to do both steps in one call. For step 3: as @pcmaster mentioned, you could map-write-unmap, but the resource must stay in a heap the CPU can write to (its D3D12_CPU_PAGE_PROPERTY must not be D3D12_CPU_PAGE_PROPERTY_NOT_AVAILABLE), which means an Upload heap or a Readback heap. The better solution is to create an Upload heap, upload your resource to it, and then issue an ID3D12GraphicsCommandList::CopyResource command to copy it into a Default heap for the best GPU accessibility. You need another temporary ID3D12Resource* for the resource inside the Upload heap, created by the same processes as steps 1 and 2. Again you have two (or more) choices to create and fill the Upload-heap resource: A. create the Upload heap and its resource handle yourself, then map-write-unmap; B. create the Upload heap yourself, then upload with the UpdateSubresources helper from d3dx12.h. Step 4 is easy: ID3D12GraphicsCommandList::ResourceBarrier. Step 5 requires you to create the SRV or UAV in a Descriptor Heap; that's another topic, but generally speaking, if you survived the texture creation process above, it won't be a problem. Also, all command execution requires you to take explicit care of synchronization. I suggest you take a look at the DirectX 12 graphics samples for real code, and see the sketch below. If anyone finds any mistakes, please point them out, thanks! (MESSY D3D12 😅)
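    To make the steps concrete, here is a minimal sketch of route B (CreateCommittedResource plus an Upload heap and UpdateSubresources), using the d3dx12.h helpers. device, cmdList, srvHeap, the texture dimensions (width, height, arraySize, mipLevels) and the subresourceData array are assumed to exist already; error handling and the fence synchronization are omitted:

        #include <wrl/client.h>
        #include "d3dx12.h"
        using Microsoft::WRL::ComPtr;

        // Steps 1+2 in one call: CreateCommittedResource reserves the heap
        // memory and creates the resource handle together.
        ComPtr<ID3D12Resource> texture;
        auto defaultHeap = CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_DEFAULT);
        auto texDesc = CD3DX12_RESOURCE_DESC::Tex2D(
            DXGI_FORMAT_R8G8B8A8_UNORM, width, height, arraySize, mipLevels);
        device->CreateCommittedResource(&defaultHeap, D3D12_HEAP_FLAG_NONE, &texDesc,
            D3D12_RESOURCE_STATE_COPY_DEST, nullptr, IID_PPV_ARGS(&texture));

        // The temporary resource in the Upload heap.
        ComPtr<ID3D12Resource> uploadBuffer;
        const UINT subresourceCount = arraySize * mipLevels;
        const UINT64 uploadSize = GetRequiredIntermediateSize(texture.Get(), 0, subresourceCount);
        auto uploadHeap = CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_UPLOAD);
        auto bufferDesc = CD3DX12_RESOURCE_DESC::Buffer(uploadSize);
        device->CreateCommittedResource(&uploadHeap, D3D12_HEAP_FLAG_NONE, &bufferDesc,
            D3D12_RESOURCE_STATE_GENERIC_READ, nullptr, IID_PPV_ARGS(&uploadBuffer));

        // Step 3: UpdateSubresources maps/writes the Upload heap and records the
        // copy into the Default heap; subresourceData holds one
        // D3D12_SUBRESOURCE_DATA per array slice/mip.
        UpdateSubresources(cmdList, texture.Get(), uploadBuffer.Get(), 0, 0,
            subresourceCount, subresourceData);

        // Step 4: transition to the final usage state.
        auto barrier = CD3DX12_RESOURCE_BARRIER::Transition(texture.Get(),
            D3D12_RESOURCE_STATE_COPY_DEST, D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE);
        cmdList->ResourceBarrier(1, &barrier);

        // Step 5: create a Texture2DArray SRV in a descriptor heap.
        D3D12_SHADER_RESOURCE_VIEW_DESC srvDesc = {};
        srvDesc.Shader4ComponentMapping = D3D12_DEFAULT_SHADER_4_COMPONENT_MAPPING;
        srvDesc.Format = texDesc.Format;
        srvDesc.ViewDimension = D3D12_SRV_DIMENSION_TEXTURE2DARRAY;
        srvDesc.Texture2DArray.MipLevels = mipLevels;
        srvDesc.Texture2DArray.ArraySize = arraySize;
        device->CreateShaderResourceView(texture.Get(), &srvDesc,
            srvHeap->GetCPUDescriptorHandleForHeapStart());

        // Remember: uploadBuffer must stay alive until the copy has actually
        // executed on the GPU (fence synchronization).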
  4. zhangdoa

    Use Texture2DArray in D3D12

    If you're familiar with DDSTextureLoader for DirectX 11, you can take a look at the same tool in DirectXTK12 (see the sketch below).
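    For reference, loading a DDS texture array with DirectXTK12 looks roughly like this; device and commandQueue are assumed to exist, the file name is a placeholder, and error handling is omitted:

        #include <wrl/client.h>
        #include "DDSTextureLoader.h"
        #include "ResourceUploadBatch.h"
        using Microsoft::WRL::ComPtr;

        ComPtr<ID3D12Resource> texture;

        // ResourceUploadBatch takes care of the Upload heap, the copy and the barrier.
        DirectX::ResourceUploadBatch resourceUpload(device);
        resourceUpload.Begin();

        DirectX::CreateDDSTextureFromFile(device, resourceUpload,
            L"myTextureArray.dds", texture.ReleaseAndGetAddressOf());

        // Submit the upload and wait for it to finish before using the texture.
        auto uploadFinished = resourceUpload.End(commandQueue);
        uploadFinished.wait();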
  5. As @wintertime has already mentioned, templates are a good way to solve the polymorphism problem at compile time. Basically, if you combine composition over inheritance, function overloading and template specialization in an elegant way, you can get similar or even better performance than the old-school vptr-and-vtable games; DOD is not always against OOD 😀. If you want to use std::vector efficiently, manage the heap allocation "by yourself". It's not about implementing some complex scary stuff; at minimum, design some allocation tracker or wrapper classes (the default std::allocator implementation of MSVC 19.xx on my machine is just a wrapper around the global operator new()), and use a factory to instantiate your component vectors, so you can ensure they end up in coherent heap memory ranges (see the sketch below). Another choice is a pre-allocated std::array if your component's maximum count is finite (though then it hardly matters whether it's an std::array or an std::vector). Or you could implement an object pool over pre-allocated raw heap memory. Again, all options are possible; you just need to find the best one for your situation. Main memory is typically (almost always, actually) DRAM, while the caches inside the CPU chip are typically SRAM; however you represent the array structure in your C++ code, the hardware will always behave the same physically, and all the overhead comes from the abstraction. Who paid the bill for your sort operation 😀? Any std::sort involved? Any overloaded operator==() or operator<()? Do you sort without any comparison? If so, that's awesome! But if not, you still pay the cost of branching whenever a comparison happens 😀. I've fully agreed with DOD ever since Mike Acton broadcast the idea loudly in his CppCon talk, and I read the Bitsquid blog posts a long time ago while I was also ECS-ing (they're awesome, btw!). But my experience tells me that nothing in a real product is as simple and elegant as the example code. When you move on to shipping products, you always need to complex things up or compromise them down. One major "con" of DOD, or of ECS, is that it relies heavily on a well-designed software model (or a programmer with a blown-up brain 🤯): you must model the domain model into computer-friendly rather than programmer-friendly tasks, and then you have to profile often to make sure your design really is cache-friendly. If you have further interest, read chapter 6 of Computer Systems: A Programmer's Perspective; it discusses the memory and cache related topics thoroughly 🍺.
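    A minimal sketch of the "allocation tracker" idea above, as a drop-in std::allocator replacement; TrackingAllocator and the component type in the usage comment are invented names:

        #include <cstddef>
        #include <cstdio>
        #include <vector>

        // Same behavior as the default allocator, but every allocation is
        // printed, so you can verify where your component vectors actually live.
        template <typename T>
        struct TrackingAllocator
        {
            using value_type = T;

            TrackingAllocator() = default;
            template <typename U>
            TrackingAllocator(const TrackingAllocator<U>&) {}

            T* allocate(std::size_t n)
            {
                T* p = static_cast<T*>(::operator new(n * sizeof(T)));
                std::printf("allocated %zu bytes at %p\n", n * sizeof(T), static_cast<void*>(p));
                return p;
            }

            void deallocate(T* p, std::size_t)
            {
                std::printf("deallocated block at %p\n", static_cast<void*>(p));
                ::operator delete(p);
            }
        };

        template <typename T, typename U>
        bool operator==(const TrackingAllocator<T>&, const TrackingAllocator<U>&) { return true; }
        template <typename T, typename U>
        bool operator!=(const TrackingAllocator<T>&, const TrackingAllocator<U>&) { return false; }

        // Usage: std::vector<TransformComponent, TrackingAllocator<TransformComponent>> m_Transforms;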
  6. Inheritance can also achieve so-called DOD, though it's more flexible to choose a composition approach, e.g.:

        struct A { int foo; };
        struct B { A a; int bar; };        // composition
        struct C : public A { int bar; };  // inheritance

    Would you consider some sort of Singleton pattern for the aggregation of the component tables? If you can implement any mechanism that ensures different std::vector<T> instances are allocated adjacent to each other in heap memory, they will achieve the same result. Don't reinvent the wheel if the language and the standard library have already built it for you; what you're describing is almost a custom heap memory management module. Take a look at how to implement your own std::allocator if you want to keep sticking with the STL. How do you know this approach actually minimizes branching? Is branching hurting your performance? Are there any practical profiling results that support your design? Are you sure this kind of design is suitable for your target CPU architecture? There is no silver bullet for a one-in-one-out factory, or at least one is hard to design. My personal preference for object instantiation is to keep things simple: I pass the object creation information through to the final stage, where it can be written into the component's data directly; until then I don't care about the intermediate data, because it's temporary garbage, and I use RAII or similar to get rid of the footprint. Again, it's a situational solution for me; I don't SoA everything when it makes no sense for certain business logic, and I let the L2 cache hit rate miss like hell until it really hurts performance significantly. Don't be a DOD fundamentalist: function pointers, lambdas, callable objects and whatever else are also a kind of data; the code is data, the procedure is data, everything is data when it is consumed by some module (see the sketch below). If you're targeting C++11 or later, why not take a look at std::function and std::packaged_task? Functional programming is a natural friend of DOD! Finally, ECS is just ECS; it's not a worthwhile solution if it isn't worthwhile for your particular problem. Hope my mumbling helps you a little bit, happy coding 👨‍💻👩‍💻
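    A tiny sketch of the "code is data" point, storing callables in a plain vector and consuming them like any other data; the Event type and the names are invented for illustration:

        #include <functional>
        #include <vector>

        // Hypothetical event type.
        struct Event { int type; };

        // The handlers are stored and iterated exactly like component data.
        std::vector<std::function<void(const Event&)>> g_Handlers;

        void RegisterHandler(std::function<void(const Event&)> handler)
        {
            g_Handlers.push_back(std::move(handler));
        }

        void Dispatch(const Event& e)
        {
            for (auto& handler : g_Handlers)
                handler(e); // the "procedure" is consumed here as plain data
        }

        // Usage: RegisterHandler([](const Event& e) { /* react to e.type */ });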
  7. zhangdoa

    making sound support in my engine

    A generally compatible enough choice is OpenAL, an audio equivalent of OpenGL. The vendor and community implementations (e.g. OpenAL Soft) can cover all of your engine's target platforms. A few cons: macOS deprecates OpenAL support as of version 10.15; the community is less active than OpenGL's; and there is no successor standard so far. Personally I recommend you take a look at FMOD or Wwise; both are widely adopted in the industry nowadays and have advantages in maintenance, user/dev community, and software maturity. Wwise encapsulates the low-level audio business more tightly than FMOD; in contrast, it's easier to get your hands dirty with FMOD. Both use an event-driven design for the high-level communication between the host application and themselves, and both have an (almost) unified implementation across different platforms. But since they target actual products, the APIs are more verbose and messy, so the learning curve is a little steep.
  8. CPU-related questions: What does MapBuffer() cost in a Debug build? What's the difference between the compiler optimization levels? Do you need to optimize the O(m*n) for-loop? Do you have to submit the data of every single sprite every frame? Can you identify and optimize away any unnecessary temporary variables, like the repeated return value of std::vector<T>::size(), or any unnecessary and expensive copy constructions? GPU-related questions: What buffer usage pattern (GL_STATIC_DRAW/etc.) did you specify when creating and uploading the vertex buffer? What mapping flags (GL_MAP_PERSISTENT_BIT/etc.) did you specify when mapping the vertex buffer? How do you handle CPU-GPU synchronization between the CPU-write and GPU-read operations? Have you implemented any double or triple buffering (see the sketch below)? And could you share the blog post you're referencing?
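    For the synchronization and double-buffering questions, here's a minimal sketch of a persistently mapped, fence-synchronized double buffer, assuming GL 4.4+ (ARB_buffer_storage) with a loader already initialized; kRegionSize and spriteData are placeholders:

        #include <cstdint>
        #include <cstring>

        static GLuint s_vbo = 0;
        static char* s_mapped = nullptr;
        static GLsync s_fences[2] = { nullptr, nullptr };
        static const GLsizeiptr kRegionSize = 65536; // placeholder: bytes of sprite data per frame

        void InitPersistentBuffer()
        {
            const GLbitfield flags = GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT;
            glGenBuffers(1, &s_vbo);
            glBindBuffer(GL_ARRAY_BUFFER, s_vbo);
            // Immutable storage with two regions: the CPU writes one while the GPU reads the other.
            glBufferStorage(GL_ARRAY_BUFFER, 2 * kRegionSize, nullptr, flags);
            s_mapped = static_cast<char*>(glMapBufferRange(GL_ARRAY_BUFFER, 0, 2 * kRegionSize, flags));
        }

        void UploadAndDraw(int frameIndex, const void* spriteData)
        {
            const int region = frameIndex & 1;
            // Wait until the GPU has finished reading this region (two frames ago).
            if (s_fences[region])
            {
                glClientWaitSync(s_fences[region], GL_SYNC_FLUSH_COMMANDS_BIT, UINT64_MAX);
                glDeleteSync(s_fences[region]);
            }
            std::memcpy(s_mapped + region * kRegionSize, spriteData, kRegionSize);
            // ... issue the draw call sourcing vertices from offset region * kRegionSize ...
            // The fence marks the end of the GPU commands that read this region.
            s_fences[region] = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
        }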
  9. Question: why do you have to do this "parallel computation friendly"-looking work on the CPU? Is there any simulation that needs to run on the CPU before you draw your sprites? Do you have to use different vertex positions and colors for each sprite instance? If you just want to draw one topologically identical mesh with multiple instances on the screen, I highly recommend you spend some time investigating instanced rendering and indirect rendering; both are intended to solve CPU-GPU draw-call-related bottlenecks. If your case is like that, it's better to create just one quad mesh as the billboard, then use multiple transformation matrices to draw the different instances (see the sketch below). Also, I highly recommend you try to redesign your rendering pipeline into something like "gather per-object data on the CPU → upload one large buffer to the GPU once → bind GPU data with range and offset into the large buffer → issue draw calls", leaving some of the parallel work to the GPU rather than crafting it by hand on the CPU.
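    A minimal instanced-rendering sketch for that one-quad case; instanceVBO, the quad VAO/IBO, the shader reading the mat4 at attribute locations 4-7, and the glm usage are all assumptions:

        #include <vector>
        #include <glm/glm.hpp>

        void SetupInstanceAttributes(GLuint instanceVBO)
        {
            glBindBuffer(GL_ARRAY_BUFFER, instanceVBO);
            // A mat4 attribute occupies 4 consecutive vec4 slots.
            for (int i = 0; i < 4; ++i)
            {
                glEnableVertexAttribArray(4 + i);
                glVertexAttribPointer(4 + i, 4, GL_FLOAT, GL_FALSE, sizeof(glm::mat4),
                                      reinterpret_cast<void*>(sizeof(glm::vec4) * i));
                glVertexAttribDivisor(4 + i, 1); // advance once per instance, not per vertex
            }
        }

        void DrawSprites(GLuint instanceVBO, const std::vector<glm::mat4>& transforms)
        {
            // One upload of all per-object data, then one draw call for all sprites.
            glBindBuffer(GL_ARRAY_BUFFER, instanceVBO);
            glBufferSubData(GL_ARRAY_BUFFER, 0, transforms.size() * sizeof(glm::mat4), transforms.data());
            glDrawElementsInstanced(GL_TRIANGLES, 6, GL_UNSIGNED_INT, nullptr,
                                    static_cast<GLsizei>(transforms.size()));
        }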
  10. Hey Thomas, I'd like to share some of my own architectural ideas about how I implement "ECS" in my engine. At first I followed the definition of ECS without much alteration: the Entity is an ID, a Component is a POD of some logically coherent data, and a System is the manager of the component instances. But later I realized that this architecture implies another important rule: you must design the abstraction level of each system very clearly. That means you can't easily use your gameplay-related components directly in your lower-level systems, because for lower-level systems such as rendering or networking, the data they need is actually deduced from your higher-level components. So if you organize your higher-level data into human-friendly components, you may need to add intermediate components for the lower-level systems to access more efficiently; or you may need to split or merge several of your higher-level components until you can eliminate redundant traversals over the entire component pool. My current design has two different responsibility types. One focuses on a single component type, such as LightComponent and LightComponentSystem, and is only responsible for managing the corresponding component's data changes; these play the producer role, producing intermediate data for the lower-level consumer systems. The other focuses on tasks and policies like rendering, physics simulation, and OS event handling; these typically play only the consumer role. Because my engine currently uses a job-based parallel model, each system's time-domain dependency is handled naturally (you can refer to Naughty Dog's and Bungie's talks and publications for more information). The whole data-processing pipeline doesn't have many duplicated operations thanks to this: each upstream system iterates over its components only once and feeds the result to the downstream systems when the timing is suitable. In my personal view, nowadays I'd rather consider ECS an architectural consequence of the functional programming paradigm plus DOD; if you build your game and engine with a thoroughly FP mindset, sooner or later you'll invent ECS by yourself. Don't be limited by it; try to use the more general philosophy behind it to solve your problems. Hope this helps you a little. Happy coding 🕹️🎮
  11. In my engine's graphics module I designed a ResourceBinder class to abstract all the views/descriptors across all the underlying APIs. The real mesh, texture and buffer wrapper classes are just some Mesh/Texture/GPUBufferDataComponent classes (which may be unified into one GPUMemoryComponent in the future) that only own an OpenGL/DirectX/Vulkan resource pointer or handle and a (few) ResourceBinder references. There is no ActivateTexture or PSSetSRV or BindSomethingElse interface; instead there is only an ActivateResourceBinder that user-level code can access, and all the polymorphism/implementation details are resolved at runtime (or maybe at compile time in the future). If you're familiar with DirectX (especially DX12) or Vulkan, I assume you've managed heap video memory explicitly; it's because of this trend that I wondered why we bother ourselves with "Mesh"/"Texture"/"ConstantStructuredByteOffsetBlaBlaBuffer" blob class designs. What we do every day is just upload some bytes to GPU memory and issue some computation tasks to the GPU; what we need in the end is just a raw GPU memory address and some different "views" of that memory for different usages. What we need to change is our OpenGL 2.1 mindset! Rendering client code example:

        // Start to record commands...
        auto l_renderingServer = g_pModuleManager->getRenderingServer();
        // m_SDC: SamplerDataComponent, a sampler wrapper
        // m_RPDC: RenderPassDataComponent, an aggregation of render target textures, pipeline state object and shader object
        // l_CameraGBDC: GPUBufferDataComponent, our friends whose names are UBO/SSBO/ConstantBuffer/StructuredBuffer...
        l_renderingServer->CommandListBegin(m_RPDC, 0);
        l_renderingServer->ActivateResourceBinder(m_RPDC, ShaderStage::Pixel, m_SDC->m_ResourceBinder, 17, 0);
        l_renderingServer->ActivateResourceBinder(m_RPDC, ShaderStage::Pixel, l_CameraGBDC->m_ResourceBinder, 0, 0, Accessibility::ReadOnly);
        l_renderingServer->ActivateResourceBinder(m_RPDC, ShaderStage::Pixel, SunShadowPass::GetRPDC()->m_RenderTargetsResourceBinders[0], 13, 7);
        l_renderingServer->DispatchDrawCall(m_RPDC, a_mesh_from_nowhere);
        // Deactivate ResourceBinder...
        l_renderingServer->CommandListEnd(m_RPDC);
        // Execute commands when it's a good day...

    And the ResourceBinder classes:

        class IResourceBinder
        {
        public:
            ResourceBinderType m_ResourceBinderType = ResourceBinderType::Sampler;
            Accessibility m_GPUAccessibility = Accessibility::ReadOnly;
            size_t m_ElementCount = 0;
            size_t m_ElementSize = 0;
            size_t m_TotalSize = 0;
        };

        class DX12ResourceBinder : public IResourceBinder
        {
        public:
            D3D12_CPU_DESCRIPTOR_HANDLE m_CPUHandle;
            D3D12_GPU_DESCRIPTOR_HANDLE m_GPUHandle;
        };

        // A union or a template class would be a little bit better
        class DX11ResourceBinder : public IResourceBinder
        {
        public:
            ID3D11SamplerState* m_Sampler = 0;
            ID3D11ShaderResourceView* m_SRV = 0;
            ID3D11UnorderedAccessView* m_UAV = 0;
        };

        class GLResourceBinder : public IResourceBinder
        {
        public:
            GLuint m_Handle = 0;
        };

    The real resource classes:

        class GPUBufferDataComponent // Or TextureDataComponent or blabla
        {
        public:
            // Sadly we can't change the heap type freely at runtime at present
            Accessibility m_GPUAccessibility = Accessibility::ReadOnly;
            IResourceBinder** m_ResourceBinders = 0;
            /* Trivial members here... */
        };

        class DX12GPUBufferDataComponent : public GPUBufferDataComponent // Similar for DX11/OpenGL/Vulkan, even Metal; not an API problem at all
        {
        public:
            ID3D12Resource* m_ResourceHandle = 0;
            /* The Descs... */
        };

    Literally, with this kind of design you can create any kind and any number of ResourceBinders over the same region of GPU memory: a "cubemap", a 6-slice 2D-array "texture" or a "VertexBuffer"; the only limitation is the underlying API you're targeting (we are lucky, and we are unlucky). And of course there is the cost: any kind of abstraction is a cost, as is any chase after a general and flexible solution, and any noobie code like what I showed above; they are all costs that haunt your mind at midnight before you release your Minecraft. And then I thought: why bother again? Why don't we just stick to one API tightly, straightly and nakedly, without any daydream of a free wrapping lunch?
  12. Hi everyone, hope you just had a great day writing something that shines at 60 FPS :) I've found a great talk about the GI solution in Global Illumination in Tom Clancy's The Division, a GDC talk given by Ubisoft's Nikolay Stefanov. Everything looks nice, but I have some unsolved questions: what is the "surfel" he talks about, and how is a "surfel" represented? As far as I've searched, there are only some academic papers that don't look close to my problem domain; the "surfel" those papers talk about uses points as the topology primitive rather than triangulated meshes. Are these "surfels" the same terminology and concept? From 10:55 he explains that they "store an explicit surfel list each probe 'sees'", which is literally the same as storing the surfel list of the first ray-casting hits from the probe in certain directions (which he mentions a few minutes later). So far I have a similar probe-capturing stage during the GI baking process in my engine: I get a G-buffer cubemap at each probe's position, facing along the 6 coordinate axes. But what I store in the cubemap is the rasterized texel data of world position, normal, albedo and so on, which is bounded by the resolution of the cubemap. Even if I tagged some kind of surface ID during asset creation to mimic "surfels", they still wouldn't be accurately transferred into the "explicit surfel list each probe 'sees'" if I keep doing the traditional cubemap work. Do I need to ray cast on the CPU to get an accurate result? Thanks for any kind of help.
  13. zhangdoa

    Questions about surfel

    Now that I've implemented the whole geometry-data baking process the original talk introduced, I've observed that the overlap rate of the surfels is quite tightly coupled with the probe location and density, but the surfel de-duplication process takes acceptable time if we only eliminate duplicates within a finite area. I had implemented (part of) an SVOGI module before, and it somehow inspired me that some sort of voxelization might also be good for the offline geometry-data baking, given that we've already gazed at the surfel approach for some days.
  14. zhangdoa

    Questions about surfel

    Thanks Frantic PonE, I'd heard about DDGI before, but since it depends on GPU ray tracing I didn't evaluate it further. The Ambient Dice look like quite a nice alternative to, and advance over, the traditional SH9 or HL2 approaches; I'll give them a try when I've finished the GI pipeline. Thanks for the references!
  15. zhangdoa

    Questions about surfel

    Thanks for the explanation MJP. If a surfel is a sample point, how do I eliminate the sample-rate/accuracy problem when using cubemap sampling? Since a texture has limited resolution, if two probes see the same geometry at different distances, they could "see the same exact point" but might still produce different surfels for the same region/triangle of the geometry. Or should I not worry about surfel duplication at this level?
  16. The OpenGL solution:

        WriteMemoryFromCPUSide();
        for (auto object : objects)
        {
            offset = object->offset;
            // The 3rd parameter must be aligned to GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT,
            // and the 4th parameter must be at least the block's GL_UNIFORM_BLOCK_DATA_SIZE
            glBindBufferRange(GL_UNIFORM_BUFFER, uniformBlockBindingPoint, UBO, offset * UBOElementSize, UBOElementSize);
            // issue drawcall
        }
  17. Hello everyone here on the GameDev forums, it's my first post and I'm very happy to participate in the discussions here, thanks everyone! My question is: is there any workaround to implement something in DX11 similar to DX12's CBV with an offset, and/or something in OpenGL similar to the dynamic UBO descriptor in Vulkan? My purpose is to achieve a unified per-object resource-updating design across the different APIs in my engine. I've gathered all the per-object resource updates into a few coherent memory writes in the DX12 and Vulkan rendering backends, and then record all the descriptor binding commands with the per-object offset. DX12 example code:

        WriteMemoryFromCPUSide();
        for (auto object : objects)
        {
            offset = object->offset;
            commandListPtr->SetGraphicsRootConstantBufferView(startSlot, constantBufferPtr->GetGPUVirtualAddress() + offset * elementSize);
            // record drawcall
        }

    Vulkan example code:

        WriteMemoryFromCPUSide();
        for (auto object : objects)
        {
            offset = object->offset;
            // the last parameter points at the per-object dynamic offset(s)
            vkCmdBindDescriptorSets(commandBufferPtr, VK_PIPELINE_BIND_POINT_GRAPHICS, pipelineLayoutPtr, firstSet, setCount, descriptorSetPtr, dynamicOffsetCount, &offset);
            // record drawcall
        }

    I have an idea to record the per-object offset as an index/UUID, then use an explicit UBO/SSBO array in OpenGL and a StructuredBuffer in DX11, but I'd still need to submit the index to GPU memory at some point. Hope everyone has a good day!
  18. Thanks for the help MJP! For anyone who runs into this problem, here is example code for binding part of a constant buffer in DX11 (requires the D3D11.1 ID3D11DeviceContext1 interface):

        WriteMemoryFromCPUSide();
        for (auto object : objects)
        {
            offset = object->offset;
            // Each shader constant is 16 bytes (4 * 32-bit components), and both the
            // first-constant offset and the constant count passed to VSSetConstantBuffers1
            // must be multiples of 16 constants (i.e. 256 bytes). Assume
            // constantBufferElementSize is a multiple of 16 (bytes) * 16 (constants) = 256 bytes,
            // so l_constantCount always satisfies the multiple-of-16 requirement.
            unsigned int l_constantCount = constantBufferElementSize / 16;
            unsigned int l_firstConstant = offset * l_constantCount;
            deviceContext->VSSetConstantBuffers1(startSlot, 1, &ConstantBufferPtr, &l_firstConstant, &l_constantCount);
            // issue drawcall
        }