About Hiwas

  1. Nested namespaces useless typing?

    I tend to agree with most here that 'it depends'. Take your path example; oddly enough, that is exactly a case I run into very often. IO::Path::* can be appropriate given that you might also have AI::Path::* or DAG::Path::*. Yes, I could rely on prefixing such as "IOPath", "AIPath" or "DAGPath" and avoid the namespace, but to me the purpose of the namespace here is to allow each API to use the most appropriate and obvious names without stomping on the others. Once out of headers and into the CPPs where you use such things, simply typing 'using namespace IO/AI/DAG' generally takes care of the extra typing involved. The rare exceptions where you need combinations (serialization comes to mind) are the only places you usually need to keep typing the prefixes.

    As to the depth of the namespaces, I tend to believe 2-3 deep is appropriate, with the only exceptions usually being things like an internal namespace that pushes into the 4+ range. 2-3 may seem a bit much, but it all comes out of usability needs. Probably the most common header name in nearly any library you get is '#include "Math.h[pp]"'. This is a common problem if you are using 3rd-party libraries: whose math file are you including with that? So, in order to make sure I don't have that problem, there is *always* a prefix directory in my libraries: '#include "LIB/Math.hpp"', which guarantees I get *my* math header file. Out of this flows the rule I use that including a file should be as close to the namespace of the object it provides as possible, i.e. '#include "LIB/IO/Path.hpp"' gives me LIB::IO::Path ready for use.

    While my rules don't work for everyone and are a bit excessive for normal everyday work, there is one rule I suggest anyone follow: "Be consistent". If you adopt a set of rules and styles, apply it everywhere with as few exceptions as possible. Consistency trumps individual decisions folks may dislike pretty much every time.
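A minimal sketch of the layout described above, using the LIB/IO/AI names from the post (the types and members are illustrative, not from any real library): two namespaces can each use the obvious name `Path` without clashing, and a using-directive in a .cpp removes most of the extra typing.

```cpp
#include <cassert>
#include <string>

// Each library gets a root namespace (here "LIB") with domain namespaces
// nested one level down. On disk this mirrors the include path, so
// '#include "LIB/IO/Path.hpp"' would give you LIB::IO::Path.
namespace LIB {
namespace IO {
    // A path in the file-system sense.
    struct Path {
        std::string value;
    };
}
namespace AI {
    // A path in the navigation sense -- same obvious name, no clash.
    struct Path {
        int waypointCount = 0;
    };
}
}

// In a .cpp, a using-directive takes care of the extra typing.
std::string Describe() {
    using namespace LIB::IO;
    Path p{"/assets/mesh.bin"};
    return p.value;
}
```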
  2. Yes, you can. This was actually the first approach I tried, but it is very bad for the GPU in that it severely under-utilizes parallelism. As a quick and dirty 'get it running' solution it is fine, though. If you read that tutorial again, you might catch that they talk about there being implicit 'subpasses' in each render pass; it's kind of 'pre', 'your pass', 'post'. Setting dependencies on those hidden subpasses is how you can control renderpass-to-renderpass dependency, or at least ordering.

    Nope, I still expose quite a bit of that for things not directly within a render pass, since it is common to all three APIs, or a no-op where Metal does some of it for you. The most specific case is the initial upload of a texture, where you still have to do the transition to CPU visible, copy the data, then transition to GPU visible and copy to GPU-only memory. While I have considered hiding all these items, the goal is to keep the adapter as close to the underlying APIs as possible and maintain the free-threaded, externally synchronized model of DX12/Vulkan and, to a lesser degree, Metal. Hiding transitions and such would mean building more thread handling and synchronization into that low level than I desire. This would be detrimental since I'm generating command buffers via 32 threads (Threadripper is fun), and it would really suck if the lowest-level adapter were introducing synchronization at the CPU level in order to automate hiding a couple of barrier & transition calls. Long story short, the middle-layer 'rendering engine' will probably hide all those details; I just haven't really bothered much and hand-code a lot of it, since I'm more focused on game systems than on rendering right now.
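To make the "dependencies on the implicit subpasses" idea concrete, here is a sketch of filling in a VkSubpassDependency that targets the implicit "pre" subpass via VK_SUBPASS_EXTERNAL. This is a configuration fragment only (it requires the Vulkan headers, and the stage/access masks are an assumed common case: a color attachment written earlier and sampled here).

```cpp
#include <vulkan/vulkan.h>

// VK_SUBPASS_EXTERNAL refers to whatever executed before (as srcSubpass)
// or after (as dstSubpass) this render pass -- the hidden 'pre'/'post'
// subpasses mentioned above. This is how ordering between render passes
// can be expressed.
VkSubpassDependency MakeExternalDependency() {
    VkSubpassDependency dep{};
    dep.srcSubpass      = VK_SUBPASS_EXTERNAL;  // the implicit 'pre' subpass
    dep.dstSubpass      = 0;                    // our first real subpass
    dep.srcStageMask    = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
    dep.dstStageMask    = VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT;
    dep.srcAccessMask   = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
    dep.dstAccessMask   = VK_ACCESS_SHADER_READ_BIT;
    dep.dependencyFlags = 0;
    return dep;
}
```

The returned struct goes into the pDependencies array of VkRenderPassCreateInfo.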
  3. The short answer is that you are likely to have problems and will need to do something to introduce dependencies between render passes. The longer answer is that this is very driver-specific and you might get away with it, at least for a while. The problem here is that, unlike Dx12, Vulkan does do a little driver-level work to help you out, which you usually want to duplicate in Dx12 yourself anyway. Basically, though, if you issue 10 render passes, the Vulkan driver does not have to heed the order you sent them to the queue unless you have supplied some form of dependency information or explicit synchronization. Vulkan is allowed to, and probably will, issue the render passes out of order, in parallel, or completely backwards depending on the drivers involved. Obviously this means that the draw commands associated with each begin/next subpass execute out of order.

    When I implemented my wrapper, I ended up duplicating the entire concept of render/sub passes in Dx12; it was the most appropriate solution I could find which would allow me to solve the various issues on the three primary rendering APIs I wanted to support. The primary reason for putting the work into this was exactly the problems you are asking about. At least in my case, when I really dug in and understood render passes, I realized I had basically been doing exactly the same things, except scattered through my rendering code. By pushing it down to an abstraction, it cleaned things up quite a lot and made for a much cleaner API. Additionally, at a later time, it will make it considerably easier to optimize for better parallelism and GPU utilization, since all the transitions and issuance are in one place that I can get clever with, without having to reorganize large swaths of code. So, yup, I'd suggest you consider implementing the subpass portion, because it has a lot of benefits and solves the problem you are asking about.
  4. Engine Core Features

    Hardware/OS-specific items should still have an interface layer in your core so that it is consistent no matter how you implement the backend. In that regard, you really should add input device handling (keyboard, mouse & joystick) and window management items. Depending on goals, a lot of folks roll window management into the rendering engine, but I tend to think it should be separated for quite a few reasons. Just my $0.02.
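A minimal sketch of the interface-layer idea: the core only ever sees abstract interfaces, and each OS backend implements them. All names here (IWindow, IInputDevice, the Null backends) are illustrative assumptions, not from the post.

```cpp
#include <memory>
#include <string>

// The engine core codes against these; backends live elsewhere.
struct IWindow {
    virtual ~IWindow() = default;
    virtual void SetTitle(const std::string& title) = 0;
    virtual std::string Title() const = 0;
};

struct IInputDevice {
    virtual ~IInputDevice() = default;
    virtual bool IsKeyDown(int keyCode) const = 0;
};

// A trivial "null" backend -- handy for headless tests and servers.
class NullWindow final : public IWindow {
public:
    void SetTitle(const std::string& title) override { title_ = title; }
    std::string Title() const override { return title_; }
private:
    std::string title_;
};

class NullInput final : public IInputDevice {
public:
    bool IsKeyDown(int) const override { return false; }
};

// One factory per platform; the core never names a concrete type.
std::unique_ptr<IWindow> CreatePlatformWindow() {
    return std::make_unique<NullWindow>();  // swapped per platform at build time
}
```

Keeping window management behind its own interface (rather than inside the renderer) is what makes a null/headless backend like this possible at all.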
  5. I guess I didn't explain it well enough, or our terminology is getting mixed up. What you are describing is a multi-step process merged into a single step, which is inherently going to have redundant data and cause this sort of problem. Break the pipeline down into several steps; this is what I usually end up with:

    Intermediate (my 'all' format, or Collada, Obj, whatever) -> Basic prepared data (all data, but in a GPU-usable form, just not optimized, including data which will be ignored by various materials) -> Material-bound data (unused data removed and, optionally, full vertex cache optimization) -> GPU ready

    What I was describing is that the offline asset processing only does the first two steps; the material binding that reduces the vertex elements to only what is needed is what I was describing as the runtime cache portion. There are a number of reasons to leave the material binding until runtime, primary among them that the runtime is the only thing which actually knows what makes sense. For instance, if I use a mesh with a pipeline that expects UVs and another pipeline that is the same except it doesn't use UVs, it is generally best to just reuse the same mesh and ignore the UVs in the second case. All said and done, this is a case of premature optimization until you actually understand what the game needs and what combinations make the most sense. So, a little runtime cost is well worth it until later.
  6. I look at this a little differently. I tend to break this into two separate items: what you have, and then another class which represents what the graphics API expects. The general idea is that from Max/Maya I spit out the intermediate structure which contains all the data available. This is a 'slow' item since it is bloated and not formatted in a manner usable by the graphics APIs. Then I create the low-level immutable graphics representation from the intermediate data, which has done all the copies and interleaving you are mentioning. This does mean that when I ask to render a mesh, I load the big bloated data and perform the conversion step. This probably sounds like what you are already doing, but there is a trick here.

    When I make a request, I generate a unique hash that represents the intermediate mesh, the target graphics API and any custom settings involved in the intermediate-to-graphics conversion process. I then look in a cache for that key. If I have already performed the conversion, I just grab the immutable cached version and use it; if I have not, I perform the one-time conversion and store it in the cache, potentially even saving it to disk for the next run. Later in development, or whenever it becomes appropriate, you can offline-generate all these variations, remove the intermediates and 'only' use the final graphics data. This split eventually becomes your asset pipeline, and if you leave the intermediate handling in the engine, you can still use it for fallbacks to older API capabilities as needed. A one-time startup and processing overhead is not too much to ask of the end user, so long as it is not hours of processing, of course.
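The hash-keyed cache trick might look something like this sketch. Everything here is illustrative (the key derivation, GpuMesh, the names); the point is only the shape: compute a key from mesh identity + target API + settings, return the cached conversion if present, convert once otherwise.

```cpp
#include <cstdint>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>

struct GpuMesh { int vertexCount = 0; };  // stand-in for the immutable GPU form

static int g_conversions = 0;  // counts real conversions, for demonstration

// Key = intermediate mesh identity + target API + conversion settings.
// A cheap hash-combine here; a real pipeline would hash actual content.
uint64_t MakeKey(const std::string& meshId, const std::string& api, uint32_t flags) {
    uint64_t h = std::hash<std::string>{}(meshId);
    h ^= std::hash<std::string>{}(api) + 0x9e3779b97f4a7c15ull + (h << 6) + (h >> 2);
    h ^= std::hash<uint32_t>{}(flags) + 0x9e3779b97f4a7c15ull + (h << 6) + (h >> 2);
    return h;
}

std::shared_ptr<GpuMesh> GetOrConvert(const std::string& meshId,
                                      const std::string& api, uint32_t flags) {
    static std::unordered_map<uint64_t, std::shared_ptr<GpuMesh>> cache;
    const uint64_t key = MakeKey(meshId, api, flags);
    auto it = cache.find(key);
    if (it != cache.end())
        return it->second;            // already converted: reuse the immutable copy
    ++g_conversions;                  // one-time conversion (could also persist to disk)
    auto mesh = std::make_shared<GpuMesh>(GpuMesh{42});
    cache.emplace(key, mesh);
    return mesh;
}
```

Offline pre-generation later just means pre-populating (or serializing) this cache and dropping the intermediate data.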
  7. The cost of virtual functions is usually greatly exaggerated in many posts on the subject. That is not to say they are free, but assuming they are evil is simply short-sighted. Basically, you should only concern yourself with the overhead if you think the function in question is going to be called >10000 times a frame, for instance. An example: say I have the two API calls "virtual void AddVertex(Vector3& v);" and "virtual void AddVertices(Vector<Vector3>& vs);". If you add 100000 vertices with the first call, the overhead of the indirection and lack of inlining is going to kill your performance. On the other hand, if you fill the vector with the vertices (where the addition can be inlined and optimized by the compiler) and then use the second call, there is very little overhead to be concerned with.

    So, given that the 3D APIs no longer supply individual vertex get/set functions and everything is in bulk containers such as vertex buffers and index buffers, there is almost nothing to be worried about regarding usage of virtual functions. My API wrapper around DX12, Vulkan & Metal is behind a bunch of pure virtual interfaces, and performance does not change when I compile the DX12 lib statically and remove the interface layer. As such, I'm fairly confident that you should have no problems unless you do something silly like the above example. Just keep in mind there are many caveats involved due to CPU variations, memory speed, cache hit/miss based on usage patterns, etc., and the only true way to get numbers is to profile something working. I would consider my comments a rule-of-thumb safety in most cases, though.
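The two interface styles from the example above, sketched out (IMeshBuilder and its implementation are illustrative, using std::vector in place of the post's Vector): the per-vertex virtual pays one indirect call per vertex, while the bulk virtual pays one indirect call per batch and lets the compiler inline the actual insertions.

```cpp
#include <vector>

struct Vector3 { float x, y, z; };

// Per-vertex virtual: N indirect calls. Bulk virtual: 1 indirect call,
// with the loop inside free to be inlined and optimized.
struct IMeshBuilder {
    virtual ~IMeshBuilder() = default;
    virtual void AddVertex(const Vector3& v) = 0;
    virtual void AddVertices(const std::vector<Vector3>& vs) = 0;
};

class MeshBuilder final : public IMeshBuilder {
public:
    void AddVertex(const Vector3& v) override { verts_.push_back(v); }
    void AddVertices(const std::vector<Vector3>& vs) override {
        verts_.insert(verts_.end(), vs.begin(), vs.end());
    }
    std::size_t Count() const { return verts_.size(); }
private:
    std::vector<Vector3> verts_;
};
```

Filling the std::vector first (plain, inlinable code) and crossing the virtual boundary once is the pattern the post recommends; it mirrors how modern APIs only deal in bulk buffers anyway.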
  8. While folks are correct that this is the poster-child use case for mutable, keep in mind that the contract for mutable changed in C++11 if this code is ever to be multi-threaded. As of C++11, the contract for mutable also includes a statement of thread safety: const member functions are expected to be safe to call concurrently. A use case such as this in a multi-threaded engine will likely fail pretty miserably, and you need to protect the cacheResult_ value. I'm only pointing this out in case you intend to multi-thread any of this code; if not, it doesn't impact you.
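A minimal sketch of protecting a lazily computed cache behind a const interface (the class and Compute() are illustrative; cacheResult_ is the name from the post). The mutex must itself be mutable, since it is locked from a const method.

```cpp
#include <mutex>
#include <optional>

// Since C++11, const methods are expected to be callable concurrently,
// so the mutable cache needs a mutable mutex guarding it.
class Expensive {
public:
    int Result() const {
        std::lock_guard<std::mutex> lock(mutex_);  // guards cacheResult_
        if (!cacheResult_)
            cacheResult_ = Compute();              // computed at most once
        return *cacheResult_;
    }
private:
    static int Compute() { return 6 * 7; }         // stand-in for real work
    mutable std::mutex mutex_;
    mutable std::optional<int> cacheResult_;
};
```

If the cached value is a plain scalar, a std::atomic with a compare-exchange is a cheaper alternative to the mutex.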
  9. In a general way, that is fairly close to a very simplistic solution. Unfortunately, at this level it is really all about how clever the drivers get when they solve the path through the DAG generated by the subpasses. They could do the very simplistic thing of just issuing a vkCmdPipelineBarrier with top- and bottom-of-pipe flags set between subpasses with dependencies, or they could look at the subpass attachments in detail and figure out a more refined approach. Since this is all just a state-transition chain, building a simple DAG allows for a much more optimized approach to issuing a mix of pipeline and memory barriers. I can't find the article I remember that describes some of this, but this one may be of interest, as it is related: https://gpuopen.com/vulkan-barriers-explained/
  10. In the subpass descriptions you have arrays of VkAttachmentReference, which is a uint and a layout. The uint is the 0-based index into the VkRenderPassCreateInfo structure's pAttachments array, where you listed all of the attachments for the render pass. So, effectively, what I'm saying with those is:

    // assume pRenderPass and pSubPass point to the respective Vk structures,
    // and i indexes the subpass's input attachment references
    theImageWeWantToMessWith = pRenderPass->pAttachments[ pSubPass->pInputAttachments[ i ].attachment ];

    That is effectively what is going on behind the scenes to figure out which image to call memory barriers on. So, when I said attachment 0 and 1, I was talking about the index into the VkRenderPassCreateInfo structure's pAttachments array. Note that the render pass info does not separate inputs/outputs etc.; it just takes one big list, and only subpasses care about usage. Hope that clarifies things.
  11. I recently wrote an abstraction for this mechanism so my graphics API would not be D3D12-specific. Given that, I can only really describe this from the point of view of writing the code, but since things seem to be working, I believe the details I figured out are pretty close to accurate. First off, you need to look at the three related info structures again, since they most certainly do tell you exactly which images are being referenced; it is just a bit indirect. Basically, there is an array of all images used in the overall pass, found in the render pass info structure, and subpasses reference these images via 0-based indexing. As to the behavior, at the start and end of each subpass the API issues an image transition barrier, if needed, to put the attachment in the requested layout. So, for instance, if you were doing a post-processing blur, you might end up with the following chain of events:

    NextSubPass
      Transition attachment 0 to writable
      .. Draw your scene
    NextSubPass
      Transition attachment 0 to readable
      Transition attachment 1 to writable
      .. Draw post-processing quad to run the vertical blur with input attachment 0 and output attachment 1
    NextSubPass
      Transition attachment 0 to writable
      Transition attachment 1 to readable
      .. Draw post-processing quad to run the horizontal blur with input attachment 1 and output attachment 0

    So the attachments involved ping-pong between readable and writable as required for the post-processing to occur. Hopefully this makes sense and helps you out. I had to look at those structures quite a few times until I figured out the details. The structures themselves are pretty simple; it's just the relationships that are hard to see until you try and fail a couple of times to get the correct behavior.
  12. If I'm understanding the problem, any time you mess with the vectors of RenderPassContainer, PipelineContainer or Material, you are going to have a bunch of invalid pointers, correct? I.e., this is the standard problem of iterator invalidation after modification in most STL containers. There are a number of ways around the issue, but you need to clarify your intentions.

    The first and easiest, if you don't need the content to be contiguous in memory, is to remove ownership from the containers and use pointers instead. The extra indirection solves the problem for the most part at a minimal cost; given that this sort of thing happens a couple hundred times a frame, it should have limited to no noticeable overhead.

    A second solution, which is a bit more complicated but maintains the linear memory layout, is to use indices into the arrays instead of pointers. Adding new items will work without problems; removing items just means walking through all the vectors looking for indices >= the removed item and erasing them or subtracting one. This means you need to move the add/remove interface to a top-level owner of all the vectors so it can iterate them, but that's generally a good idea to centralize the API anyway.

    The third solution is even a little more complicated but has properties which I needed in my system. I extend the index idea by adding a version tag to each index. Since I'm promising myself I will never have more than 65k materials in the system at any given time, this handle is a simple 32-bit value: 16 bits of index and 16 bits of version. Now, when I go to get the material via the handle, I first check that the version stored in the handle and the version in the slot match; if not, I return nullptr. If the caller gets a nullptr, they look up the material by hash and, assuming it still exists, fix their internal handle.

    The reason for this solution is that I only ever add/remove things dynamically in tools or debug builds, and the whole check-and-refetch thing compiles out to nothing in release, but it is still fast enough that debug builds are not hobbled by a bunch of overhead. Again, though, this all circles back to what your requirements are. I'd personally start with the second solution, as it is easy, fast and leaves the important properties of your layout in place. The third solution is not suggested unless you start doing a lot of hot reloading, which is why I wanted it.
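The 16-bit-index / 16-bit-version handle from the third solution might be sketched like this (MaterialHandle, Slot and MaterialTable are illustrative names; only the bit layout and the version-check-on-get come from the post):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// 32-bit handle: low 16 bits index, high 16 bits version.
struct MaterialHandle {
    uint32_t value = 0;
    static MaterialHandle Make(uint16_t index, uint16_t version) {
        return {uint32_t(index) | (uint32_t(version) << 16)};
    }
    uint16_t Index() const   { return uint16_t(value & 0xFFFFu); }
    uint16_t Version() const { return uint16_t(value >> 16); }
};

struct Slot { std::string name; uint16_t version = 0; bool alive = false; };

class MaterialTable {
public:
    MaterialHandle Add(const std::string& name) {
        slots_.push_back({name, 1, true});
        return MaterialHandle::Make(uint16_t(slots_.size() - 1), 1);
    }
    void Remove(MaterialHandle h) {
        Slot& s = slots_[h.Index()];
        s.alive = false;
        ++s.version;  // any handle still holding the old version is now stale
    }
    // Returns nullptr when the handle's version no longer matches;
    // the caller then re-looks-up by hash and fixes its handle.
    const std::string* Get(MaterialHandle h) const {
        const Slot& s = slots_[h.Index()];
        if (!s.alive || s.version != h.Version())
            return nullptr;
        return &s.name;
    }
private:
    std::vector<Slot> slots_;
};
```

In a release build where nothing is added or removed at runtime, the version check can be compiled out, which is the property the post is after.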
  13. C# OOP in game programming

    Well, the other option, which I tend to prefer, is none of the above. I try to encapsulate the concept of doing damage as a third object in the group. The intention is to keep the details of how damage is calculated out of the entities, so all the rules are in one place instead of split between attack and receive-damage functions. It also means that weapons can generate these objects, so you can have different rules for different weapons, or even multiple damage objects generated by the same weapon. Additionally, this allows a better data-driven design, since you write a few damage-type objects, parameterize them, and then just fill in the details for each new weapon. The utility of this, of course, depends on your type of game. If you only have 5 weapons and they generally just remove damage until zero, there is no reason to do this. If you intend to have 10+ different weapons and many variations, that's when the separation becomes well worth the more indirect approach.
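The thread above is about C#, but a sketch of the damage-as-a-third-object idea in C++ (to match the other examples on this page) looks like this; all names and numbers are illustrative:

```cpp
#include <vector>

// The rules live in Damage and Apply(), not in attacker or target.
struct Damage {
    int amount = 0;
    float armorPierce = 0.0f;  // 0..1, fraction of armor ignored
};

struct Entity {
    int health = 100;
    int armor  = 10;
};

// One place that knows how damage is calculated and applied.
void Apply(const Damage& d, Entity& target) {
    int effectiveArmor = int(target.armor * (1.0f - d.armorPierce));
    int dealt = d.amount - effectiveArmor;
    if (dealt < 0) dealt = 0;
    target.health -= dealt;
}

// A weapon is just a data-driven generator of damage objects, and can
// emit more than one per attack.
std::vector<Damage> FireCrossbow() {
    return { {30, 0.5f} };  // one bolt: 30 damage, ignores half the armor
}
```

Adding a new weapon is then just new data (a new list of Damage values), with the entities untouched.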
  14. You answered the question: just run a post-process pass which sends the texture to the swap chain(s). That is likely the best, if not the only, method to get the images set up correctly. It is also quite fast; your network transfer is going to be the big bottleneck.
  15. This is one of those 'can of worms' sorts of questions, as there are just so many different problems with porting. I'll just start listing things and I'm sure others will add to it. Here are some of the 'technical' difficulties:

    - Graphics engine. The consoles all have different graphics APIs; even the Xbox is not 'exactly' D3D, so you have to go through and port certain pieces.
    - OS in general. Different calls to do simple things like getting the current working directory, creating threads, file IO, etc.
    - Equivalency of APIs. For instance, if you use IOCP on Windows, expect to rewrite the entire system for each of the other platforms, as they all do async work differently.
    - TRCs, i.e. requirements you must meet to get onto the platforms. For instance, a difficult one for many games is that you can't show a static screen for more than x (6 on many, if I remember correctly?) seconds. You need a loading animation, progress bar or something to tell the user things have not crashed.
    - Different memory configurations. Some consoles have dedicated areas of memory for different things; sometimes this is enforced, sometimes it is not. Often you need many different memory allocators in order to utilize this difference.
    - Different compilers. While not 'as' bad as it used to be, there are still different compilers, versions of the compilers, library support, outright bugs in ports at the SDK level, etc.

    This is just touching the surface of all the problems you can/will run into. Of course, there are also gameplay and input changes to deal with:

    - Often you need to revamp your UIs for the consoles, unless the game was specifically written in a console style up front.
    - Different display resolution requirements. I believe you are still required to support 480p on many of the consoles.
    - The Switch presents some issues since, when detached, it's a really tiny screen; will folks be able to deal with your UI on that screen?
    - Input: hope you didn't use mouse/keyboard in a manner that won't port well to gamepads.

    How folks usually deal with this is as you say: spend about a year porting things. Otherwise you have to start with the support built in from day one and keep everything running on all the targets. As an indie dev, I suggest not worrying about this much; more than likely, only if your game does really well and has potential on a console would you have to worry about it. At which point you can try to do it yourself, or get folks who do this sort of thing all the time.