

Member Since 14 Feb 2007

#5294144 responsiveness of main game loop designs

Posted by Hodgman on Today, 06:56 AM

Frankly, this revelation has put me into shock at what passes for "good engineering" in "professional" game development, and it is making me seriously reconsider whether my time spent on this forum is time well spent.

Making games run faster, use modern hardware more efficiently, and feel more responsive to the user... is bad?


There's no need to be rude just because you got an answer that contradicted your asserted preconception and gave you an opportunity to learn new techniques that are applicable to certain situations. If you're going to simply ignore anything that doesn't fit your preconceptions, and invent new self-contradictory reasons why you should ignore them, despite evidence disproving your objections, then no, you're not making good use of your time.

#5294115 DX9 Doubling Memory Usage in x64

Posted by Hodgman on Today, 12:39 AM

Adding to what vstrakh suggested, your VERTEX structure could be responsible if it uses some datatype that expands, such as 32-bit floats on x86 to 64-bit floats on x64. If you have big vertex buffers that would potentially double their total size then.

The VERTEX structure contains floats, which are 32-bit on both platforms.
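To illustrate the point: the poster's actual VERTEX struct isn't shown, so the layout below is a hypothetical stand-in. A struct made only of floats has the same size on x86 and x64; it's pointer-sized members that grow. The sizes assumed here are those of mainstream x86/x64 compilers (MSVC, GCC, Clang), not guarantees of the C++ standard.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical vertex layout made only of 32-bit floats:
// 8 floats = 32 bytes on both x86 and x64.
struct VERTEX {
  float position[3];
  float normal[3];
  float uv[2];
};

// A layout that *would* grow on x64: pointers go from 4 to 8 bytes,
// and the compiler may insert padding before them to keep them aligned.
struct VertexWithPointer {
  float position[3];
  const void* userData;
};

// float is IEEE-754 single precision (4 bytes) on both targets for these compilers.
static_assert(sizeof(float) == 4, "float is 32-bit on x86 and x64");
static_assert(sizeof(VERTEX) == 8 * sizeof(float), "no pointer-sized members, no padding");
```

If the memory really doubles, the culprit is more likely something other than the float vertex data.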

#5294103 (Physically based) Hair shading

Posted by Hodgman on Yesterday, 09:41 PM

Not hair exactly, but for anisotropic materials in general:

IIRC the Disney BRDF paper that all the game PBR papers seem to cite includes two forms of GGX -- the common (isotropic) one, but also an anisotropic version.

I'm using that at the moment, but yeah, now IBL is a problem. I allow importance sampling at runtime, which solves it, but it's not really feasible except as an "Uber" detail option.
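For reference, here is a sketch of the anisotropic GGX (GTR2) distribution in the form given in Burley's 2012 Disney BRDF notes, plus the roughness/anisotropy remapping used there. The Vec3 type and the function names are made up for the example, and the 0.001 floor on alpha is just a guard against division by zero. With equal alphas it reduces to ordinary isotropic GGX.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };
inline double Dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

constexpr double kPi = 3.14159265358979323846;

// Anisotropic GGX normal distribution:
//   D(h) = 1 / (pi * ax * ay * ((h.t/ax)^2 + (h.b/ay)^2 + (h.n)^2)^2)
// h: unit half vector; n, t, b: orthonormal normal/tangent/bitangent frame;
// ax, ay: roughness along the tangent and bitangent.
double AnisotropicGGX(const Vec3& h, const Vec3& n, const Vec3& t, const Vec3& b,
                      double ax, double ay) {
  const double hx = Dot(h, t) / ax;
  const double hy = Dot(h, b) / ay;
  const double hn = Dot(h, n);
  const double d = hx * hx + hy * hy + hn * hn;
  return 1.0 / (kPi * ax * ay * d * d);
}

struct Alphas { double ax, ay; };

// Burley's mapping from artist-facing roughness and anisotropic (both in [0, 1]) to (ax, ay).
Alphas AnisotropicAlphas(double roughness, double anisotropic) {
  const double aspect = std::sqrt(1.0 - 0.9 * anisotropic);
  const double a = roughness * roughness;
  return {std::fmax(0.001, a / aspect), std::fmax(0.001, a * aspect)};
}
```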


IIRC (again), the Frostbite PBR presentation introduced a nice hack, which just bends the IBL lookup vector based on the anisotropy data. It's a completely empirical model rather than a physically based one... but it creates the right impression for the viewer and is better than doing nothing. You can also fiddle with using an anisotropic texture filter and passing your own ddx/ddy values into TextureCube::SampleGrad to try and blur your IBL probe along the anisotropy direction (just more hacks though).
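The "bend the lookup vector" idea can be sketched as below. This is one commonly used variant, similar to the one described in Google's Filament documentation, and not necessarily the exact formula from the Frostbite presentation. The Vec3 helpers, names and the [-1, 1] anisotropy convention are assumptions for the example. The reflection is taken around a normal blended toward the view vector with its component along the anisotropy direction removed, which smears the cubemap lookup in a plausible-looking way.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };
inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 operator*(Vec3 a, double s) { return {a.x * s, a.y * s, a.z * s}; }
inline double Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 Cross(Vec3 a, Vec3 b) {
  return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
inline Vec3 Normalize(Vec3 a) { return a * (1.0 / std::sqrt(Dot(a, a))); }
inline Vec3 Lerp(Vec3 a, Vec3 b, double t) { return a + (b - a) * t; }

// Empirical (not physically based) IBL lookup direction for anisotropic surfaces.
// n, t, b: unit normal/tangent/bitangent; v: unit vector from the surface to the eye;
// anisotropy in [-1, 1], its sign picks tangent or bitangent as the stretch direction.
// Degenerate if v is parallel to the chosen direction.
Vec3 AnisotropicIblDirection(Vec3 n, Vec3 t, Vec3 b, Vec3 v, double anisotropy) {
  const Vec3 anisoDir = anisotropy >= 0.0 ? b : t;
  const Vec3 anisoTangent = Cross(anisoDir, v);
  const Vec3 anisoNormal = Cross(anisoTangent, anisoDir);  // v minus its anisoDir component
  const Vec3 bent = Normalize(Lerp(n, anisoNormal, std::fabs(anisotropy)));
  return bent * (2.0 * Dot(bent, v)) - v;                  // reflect(-v, bent)
}
```

With anisotropy = 0 this is the usual reflection vector, so it can be dropped in where the isotropic cubemap lookup used to be.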

#5294101 Destructible Mesh w/ Physics

Posted by Hodgman on Yesterday, 09:12 PM

Is bullet the suggested physics engine? What about physX? I'd really love to roll my own so I have control of the code but I think it's going to be a pain.

Bullet is very good. I'm using it in my racing game at the moment :D

I've got all of the Bullet code inside my engine project directly, not even linked as a DLL/static library - which gives you a lot of control over exactly how you use all its parts.

One other downside is that Bullet is not a google'able word. Searching for e.g. "bullet collision detection" will mostly bring up results that have nothing to do with the "Bullet" physics library :lol:


PhysX is better. Its CCD is amazing compared to Bullet's and I think it's generally faster. It's also got all the fancy GPU-side stuff if you care, but personally it just pisses me off that NVidia develops this as a weapon in their marketplace battle against AMD (i.e. it's a conflict of interest). If your engine is multi-threaded, you can also very easily plug PhysX into your engine's threading system to get it to perform work across multiple cores.

The catch is that you don't get source without paying, and you only get Windows support without paying. If you want to support other platforms or get the source code, you better have a real game budget, or a silver tongue :(

Oh, it actually has proper center of mass controls too :wink:


Rolling your own would be an intense learning experience, but without years of practice, would likely be much slower and less stable than Bullet -- unless you're focusing on a very small and specialized area.

#5294099 Are Third Party Game Engines the Future

Posted by Hodgman on Yesterday, 09:04 PM

Those numbers aren't completely correct - You have to clear a certain threshold per quarter before Epic take their cut, and then you only pay on the amount over the threshold and only for that quarter.
So, you have to have $3000/quarter gross before you have to pay them a penny.

Yeah I remembered that half way through, but didn't think it changed the numbers enough to bother fixing them :lol:

In my indie example, it's pretty fair to assume the sales all happen in one quarter (most games have a sales spike with no real tail), in which case Unreal costs them $4850 instead of $5k. If they tail over two quarters, then they pay $4700, which is still more than 5x the Unity cost in that example.

If they're unfortunate enough to take a whole year to reach their sales target, Unreal comes down to 4.9x more expensive than Unity :lol:
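The threshold arithmetic above as a small calculation. This is a sketch of the royalty model as described in this thread (5% of each quarter's gross above $3,000), not Epic's actual license text, with Unity priced at the $75 per person per month quoted in the thread; the function names are just for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Royalty model as described in this thread: each quarter, pay `rate`
// on the gross revenue above `thresholdPerQuarter`.
double UnrealRoyalty(const std::vector<double>& grossPerQuarter,
                     double rate = 0.05, double thresholdPerQuarter = 3000.0) {
  double royalty = 0.0;
  for (double gross : grossPerQuarter) {
    royalty += rate * std::max(0.0, gross - thresholdPerQuarter);  // only the amount over the threshold
  }
  return royalty;
}

// Unity as quoted in the thread: a flat fee per person per month.
double UnitySeatCost(int seats, int months, double perSeatPerMonth = 75.0) {
  return seats * months * perSeatPerMonth;
}
```

For $100k gross: one quarter gives 5% of $97k = $4850, two quarters of $50k give $4700, and four quarters of $25k give $4400, against $900 for two Unity seats over six months (about 4.9x).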

That's also a ridiculously small scale example -- two people working on entry-level wages for six months and trying to break even. I feel like Unreal would only be cheap for hobbyists who don't need to make money. Of course, it depends on what its productivity boost actually amounts to -- if Unreal lets you complete the project in 3 months instead of 6, then it may be worth this higher price.


In the AAA case (or even "double A"... what do we call middle tier games these days?) the $3k threshold makes negligible difference. At that scale, it's probably a better investment to build your own tech.

As a side note, in my country there are tax offsets for R&D work, which can actually reduce your payable tax to a negative number (i.e. the government can end up paying you for building an engine). These kinds of incentives can be the icing on the cake :)

#5294026 Are Third Party Game Engines the Future

Posted by Hodgman on Yesterday, 05:07 AM

For AAA the ability to have a fully-functional engine with complete asset pipelines is just too good. The ability to have all of your artists create things, get them in game, etc. is a tremendous amount of work and getting that all done on the spot by downloading an engine like Unreal is just often cost-effective. From a business standpoint yes pre-made engines are absolutely a good option in most cases.

That's the thing though, Unreal isn't actually cheap or cost effective.
For a AAA game, 1 million sales would be a very low number. Let's say a AAA game also sells for $60 retail. Unreal want 5% of the retail takings - which would be $3 million in this example.
For three million dollars, you could pay the wages of thirty talented programmers for a year and build an engine that suits your needs exactly.

A more likely number is 10M sales, which means your engine cost is $30M -- for that much you could fund a huge amount of your own engine/tools tech development :)

Some time ago I worked on a sub-AAA console game, which retailed for $100 (local currency - equiv to US$60), wholesaled for $80, and had production/distribution/royalty costs of $30 -- that's $50 left over. After paying 30% tax on it, that's $35 in the publisher's pocket (and none in the developer's pocket :lol:).
This game needed to sell just 50k units to break even for the publisher - which means a total budget of ~$1.75M... but the retail takings would be $5M, so Unreal's price would be $250k.
If you assume a 50/50 budget split between marketing and development, that's a development cost of $875k, so Unreal's price of "just 5%" would actually have been ~29% of the development budget!
Spending nearly one third of your budget on off-the-shelf tech is a big ask -- which does motivate many developers to maintain their own tech. If you only upgrade your engine gradually for every game made, you can get away with as little as one full-time engine programmer in your company :wink:
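The console example spelled out as a calculation. All inputs are the figures from the post; the struct and function names are just for illustration.

```cpp
#include <cassert>

// Inputs from the post (local currency).
struct ConsoleTitle {
  double retailPrice;        // shelf price
  double wholesalePrice;     // what the retailer pays the publisher
  double perUnitCosts;       // production + distribution + royalties
  double taxRate;            // tax on the publisher's margin
  double breakEvenUnits;     // units needed to break even
  double engineRoyaltyRate;  // "just 5%" of retail takings
  double devShareOfBudget;   // 50/50 marketing vs development split
};

struct BudgetBreakdown {
  double netPerUnit, totalBudget, retailTakings, engineRoyalty, devBudget, royaltyShareOfDev;
};

BudgetBreakdown Analyse(const ConsoleTitle& t) {
  BudgetBreakdown r{};
  r.netPerUnit = (t.wholesalePrice - t.perUnitCosts) * (1.0 - t.taxRate);  // (80 - 30) * 0.7, about 35
  r.totalBudget = r.netPerUnit * t.breakEvenUnits;                         // about 1.75M
  r.retailTakings = t.retailPrice * t.breakEvenUnits;                      // 100 * 50k = 5M
  r.engineRoyalty = t.engineRoyaltyRate * r.retailTakings;                 // about 250k
  r.devBudget = t.devShareOfBudget * r.totalBudget;                        // about 875k
  r.royaltyShareOfDev = r.engineRoyalty / r.devBudget;                     // about 0.29
  return r;
}

const ConsoleTitle kExampleTitle{100.0, 80.0, 30.0, 0.30, 50000.0, 0.05, 0.5};
```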

I also know a lot of indie companies who use Unity over Unreal because of the cost.
Let's say you plan to sell 10k units on Steam at $10 each. That's $100k retail, so Unreal costs $5k. However, after Steam and the tax-man take their cut, you get $49k, which is enough to fund a very, very small game... say, two people for half a year. Unity is $75/mo per person, so that would be $900 in this case, or more than 5x cheaper than Unreal.

In that case though, $900 for an engine is probably going to be much cheaper than your own time -- so you'll find that for indies it will often make sense to license an off-the-shelf engine, while for AAA it makes a lot of sense to build your own.

#5294024 Changing a descriptor set's buffer memory every frame?

Posted by Hodgman on Yesterday, 04:22 AM

That means my only(?) option is to use a descriptor set.
The idea is to bind the descriptor set inside the secondary command buffer recording, then update the descriptor set with the new data every frame, right before executing the secondary command buffer.

Just to get the terminology right - a descriptor set is a group of descriptors. A descriptor is a small structure that points to a resource.
You can either update a descriptor to point to a different resource, or just update the data within that existing resource.

Since the memory of the descriptor set's buffer changes every frame (=non-coherent) it has to be created without the VK_MEMORY_PROPERTY_HOST_COHERENT_BIT flag.

That's not what coherent/non-coherent means. Memory coherency means that two processors see the same version of events in memory. Coherency is an issue for multi-core CPU design too -- when one core writes to memory, that write might be stored in the core's cache for some time before actually reaching RAM. This means that other cores will see a non-coherent view of RAM. CPU manufacturers solve this by connecting the caches of all the cores together and following a coherency protocol, e.g. MESI.

By default, the CPU and GPU are not coherent because the CPU is accessing RAM via its cache, and the GPU is accessing it directly -- so the GPU won't see any values that are lingering in the CPU's cache.
Programmers can achieve coherency themselves, via functions like vkFlushMappedMemoryRanges/etc (internally, this ensures that the CPU's writes have actually reached RAM, and tells the GPU to invalidate any caches that it may be using).
Or, if your hardware supports it, some PCs are capable of auto-magically establishing a coherent view of RAM. For example, these systems may be able to route the GPU's RAM read request to flow via the CPU's L2 cache, so that the latest values are picked up without the need for any flushing/invalidation commands. The downside is that this will be a longer route, so the latency will be increased -- so coherent memory heaps are ok for things like command buffers or some constant updates, but not so good for textures :)

In your case, you should be able to put your data in coherent or non-coherent heaps, as long as you follow the guidelines to achieve coherency yourself via Flush/etc...
As for the barrier -- vkFlushMappedMemoryRanges occurs on the CPU timeline and flushes the CPU cache out to RAM. The barrier occurs on the GPU timeline and invalidates any values that already exist in the GPU's cache, so that it will actually fetch fresh values from RAM - but as in your quote, this happens already for each command buffer submission.
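One practical detail of the "achieve coherency yourself" route: Vulkan requires each range passed to vkFlushMappedMemoryRanges to respect VkPhysicalDeviceLimits::nonCoherentAtomSize. The offset must be a multiple of it, and the size must be a multiple of it unless the range runs to the end of the allocation (or is VK_WHOLE_SIZE). A minimal helper that widens an arbitrary range to satisfy that, using plain integers so it doesn't need the Vulkan headers; FlushRange is a hypothetical stand-in for the offset/size fields of VkMappedMemoryRange.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for VkMappedMemoryRange::offset / ::size (VkDeviceSize is a uint64_t).
struct FlushRange { uint64_t offset; uint64_t size; };

// Expands [offset, offset + size) outward to nonCoherentAtomSize boundaries,
// clamping to the end of the allocation (which the spec allows for the last range).
// Offsets are relative to the start of the VkDeviceMemory object, not the mapped pointer.
FlushRange AlignFlushRange(uint64_t offset, uint64_t size, uint64_t atomSize, uint64_t allocationSize) {
  assert(atomSize > 0 && offset + size <= allocationSize);
  const uint64_t begin = offset / atomSize * atomSize;                  // round down
  uint64_t end = (offset + size + atomSize - 1) / atomSize * atomSize;  // round up
  if (end > allocationSize) end = allocationSize;
  return {begin, end - begin};
}
```

The result would go into a VkMappedMemoryRange (sType VK_STRUCTURE_TYPE_MAPPED_MEMORY_RANGE, plus the VkDeviceMemory handle) that is flushed after the CPU writes and before the submit that reads the data.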

Another thing I'm wondering about:
The memory of the buffer is updated and used by the pipeline every frame. What happens if a frame has been queued already, but not fully drawn, and I'm updating the buffer for the next frame already?
Would/Could that affect the queued frame? If so, could that be avoided with an additional barrier (source = VK_ACCESS_SHADER_READ_BIT, destination = VK_ACCESS_HOST_WRITE_BIT ("Wait for all shader reads to be completed before allowing the host to write"))?
Would it be better to use more than 1 buffer/descriptor set (+ more than 1 secondary command buffer), and swap between them each frame? If so, would 2 be enough (Even for mailbox present mode), or would I need as many as I have swapchain images?

Whenever the CPU is updating data that will be used by the GPU, you need to take care as the GPU is usually one frame behind the CPU. This usually means double or even triple-buffering your data. This is usually achieved by creating two (or more) resources and binding a different one each frame. This would also mean creating two descriptor sets, and two of your secondary command buffers...
You also need to use two (or more) fences to make sure that the CPU/GPU don't get too far ahead of each other. e.g. for double buffering, at the start of frame N, you must first wait on the fence that tells you that the GPU has finished frame N-2.
Once you've implemented this fencing scheme, you can use this one mechanism to ensure safe access to all of your per-frame resources.
e.g. once you know for a fact that the GPU is only ever 1 frame behind the CPU, then any resource that's more than 1 frame old is safe for the CPU to recycle/overwrite/reuse... and anything younger than that must be treated as if it's still being used by the GPU...
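The fencing scheme above boils down to two rules for double buffering: frame N uses slot N % 2, and before touching that slot the CPU waits until the GPU has finished frame N-2. Below is a toy sketch of that logic, with a counter standing in for the VkFences; nothing here calls Vulkan, and ToyGpu/RunFrames are hypothetical names. The toy GPU only finishes work when the CPU waits on it, i.e. it lags as far behind as the scheme allows.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kFramesInFlight = 2;  // double buffering; 3 for triple buffering

// Per-frame resources (buffer, descriptor set, secondary command buffer...) live in this slot.
constexpr std::size_t SlotForFrame(uint64_t frame) {
  return static_cast<std::size_t>(frame % kFramesInFlight);
}

// Frame whose GPU work must be complete before the CPU starts `frame`
// (frames are numbered from 1; 0 means there is nothing to wait for).
constexpr uint64_t FrameToWaitFor(uint64_t frame) {
  return frame > kFramesInFlight ? frame - kFramesInFlight : 0;
}

struct ToyGpu {
  uint64_t submitted = 0;  // last frame the CPU submitted
  uint64_t completed = 0;  // last frame the GPU finished (the "fence" value)
  void WaitFor(uint64_t frame) { if (frame > completed) completed = frame; }  // like vkWaitForFences
  bool StillUsing(uint64_t frame) const { return frame > completed && frame <= submitted; }
};

// Runs `frames` CPU frames; returns false if the CPU ever overwrote a slot the GPU was still reading.
bool RunFrames(uint64_t frames) {
  ToyGpu gpu;
  std::array<uint64_t, kFramesInFlight> lastWriter{};  // frame that last wrote each slot
  for (uint64_t frame = 1; frame <= frames; ++frame) {
    gpu.WaitFor(FrameToWaitFor(frame));
    const std::size_t slot = SlotForFrame(frame);
    if (gpu.StillUsing(lastWriter[slot])) return false;  // would be a CPU/GPU race
    lastWriter[slot] = frame;  // now safe to update this slot's data / descriptor set
    gpu.submitted = frame;     // like vkQueueSubmit signalling this slot's fence
  }
  return true;
}
```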

So, if you want to edit your descriptor set, or edit the resources that it points to... you're not allowed to until the GPU has finished consuming them. You can solve this by double buffering as above -- two resources, so you can have two sets of values in flight... which means two descriptor sets in flight... which means pre-creating two versions of your command buffer :(

Alternatively, you can use a single descriptor set (not double-buffered, never updated) and a single resource (not double-buffered, but updated on the GPU timeline instead of the CPU timeline) :)
If these updates occur on the GPU timeline, then there's no need to double buffer the resource, which means there's no need for multiple descriptor sets.
However, this also introduces its own pitfalls... To perform this update on the GPU timeline, you now need the "main" version of the resource, which is referenced by the descriptor set and read by your shaders. You also need a double-buffered "upload" resource, which is written to by the CPU each frame. You then submit a command buffer that instructs the GPU to copy from (one of) the upload resources to the "main" resource.

#5294003 Destructible Mesh w/ Physics

Posted by Hodgman on 28 May 2016 - 11:10 PM

If they're all disconnected from each other, it would be 15 rigid bodies with 1 collision shape each.
Bullet is a pain in the ass with this -- there is no ability to set the center of mass of an object; the center of mass is always its local origin. Also, shapes are always centered around the rigid body's local origin :(
So yep, to deal with this you've got to make some kind of "physics object bind pose" system, which exists in between your gameplay/rendering code and the bullet API.
You'll probably also need to wrap your collision shape in a btCompoundShape, which allows the child shape to exist at an offset, which allows you to adjust the center of mass.

In my engine, I've got this kind of crap to get the kind of matrices that the graphics system is expecting:
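	// ToMat4/ToBtTransform stand for the engine's own Mat4 <-> btTransform conversions (assumed helpers).
	Mat4 GetPhysicsMatrix() const
	{
		const btTransform& centreOfMassOffset = m_scene->m_centreOfMassOffset[m_body->getUserIndex()];
		return ToMat4( m_body->getWorldTransform() * centreOfMassOffset );
	}
	void SetPhysicsMatrix( const Mat4& tx )
	{
		// Exact inverse of GetPhysicsMatrix: bodyWorld = graphicsWorld * offset^-1
		const btTransform inverseOffset = m_scene->m_centreOfMassOffset[m_body->getUserIndex()].inverse();
		m_body->setWorldTransform( ToBtTransform( tx ) * inverseOffset );
	}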

#5294000 Destructible Mesh w/ Physics

Posted by Hodgman on 28 May 2016 - 10:13 PM

Which physics library are you using?

Usually there's a distinction between a rigid body, and the collision shapes that are attached to it.

In some libraries, it should be no problem for the collision shape to be offset from the local origin, and have the center of mass exist at the middle of that shape.

#5293999 Question about GI and Pipelines

Posted by Hodgman on 28 May 2016 - 09:58 PM

These are a good start:




NVidia doesn't like to share information publicly... usually only publishing presentations like this, and their marketing people love to step in and deliberately attempt to blur the line between hardware design and software techniques (which are often cross-vendor, or even applicable to older NVidia GPUs)...


But AMD and Intel give out enough info that you could write your own hardware drivers if you wanted to! In fact, Intel started being this open so that the Linux community could/would write their own drivers :)






^^ The Instruction Set Architecture documents explain the way that the hardware actually works -- or, the language(s) that their driver is translating all your D3D calls into.




AMD has also recently started the http://gpuopen.com/ site, which has some gems on there.

#5293987 Question about GI and Pipelines

Posted by Hodgman on 28 May 2016 - 07:19 PM

Except that a Z prepass will reduce the effects of overdraw by only making changes to visible fragments :P. However, it requires the scene to be rendered twice. I've heard that it's possible to do it just once by using Compute Shaders to do the prepass for you, but I can never find information on that.

There's also "pixel quad efficiency", which will be lower in forward.
GPUs always run pixel shaders on a 2x2 "quad" of pixels, not actually on individual pixels. If your triangle edges cut through this 2x2 sized grid, then there will be some wasted computation -- the GPU will execute the pixel shader on the full quad, and then throw away the results that aren't needed.
So, a triangle that covers a single pixel still pays for four pixel shader invocations, making pixel-sized triangles roughly 4x as expensive per covered pixel as triangles that fully cover 2x2 quads.
In forward rendering with high-poly meshes, this can be a big inefficiency. In deferred, you only pay this inefficiency during the GBuffer creation step, but not during your lighting step (especially if lighting via compute shader).
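A toy way to see the quad cost: rasterize one triangle on a small grid, count the pixels it actually covers, and count the 2x2 quads it touches (each of which runs 4 invocations). This is a simplified CPU model, assuming pixel centres at +0.5, inclusive edges and no top-left fill rule; real rasterizers differ in the details, and the names are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec2 { float x, y; };

// Edge function: twice the signed area of triangle (a, b, p).
inline float Edge(Vec2 a, Vec2 b, Vec2 p) { return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x); }

struct QuadStats {
  int coveredPixels = 0;  // pixels whose centre lies inside the triangle
  int quadsLaunched = 0;  // 2x2 quads touched; each runs 4 pixel shader lanes
  double Efficiency() const { return quadsLaunched ? coveredPixels / (4.0 * quadsLaunched) : 0.0; }
};

QuadStats MeasureQuadEfficiency(Vec2 a, Vec2 b, Vec2 c, int width, int height) {
  const float area = Edge(a, b, c);
  const int quadsX = (width + 1) / 2, quadsY = (height + 1) / 2;
  std::vector<bool> quadTouched(static_cast<std::size_t>(quadsX * quadsY), false);
  QuadStats s;
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      const Vec2 p{x + 0.5f, y + 0.5f};
      float w0 = Edge(b, c, p), w1 = Edge(c, a, p), w2 = Edge(a, b, p);
      if (area < 0) { w0 = -w0; w1 = -w1; w2 = -w2; }  // accept either winding
      if (w0 >= 0 && w1 >= 0 && w2 >= 0) {
        ++s.coveredPixels;
        const std::size_t q = static_cast<std::size_t>((y / 2) * quadsX + x / 2);
        if (!quadTouched[q]) { quadTouched[q] = true; ++s.quadsLaunched; }
      }
    }
  }
  return s;
}
```

A triangle covering one pixel centre gives 1 useful lane out of 4 (25% efficiency), while a triangle covering whole quads gives 100%.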

IIRC, AMD also runs 16 quads per work group, Nvidia runs 8, and Intel runs 2. GPUs may or may not be able to run multiple triangles within a single group.
I can't remember this detail so... if the GPU is unable to pack quads from multiple triangles into a work group, then on AMD your triangles need to cover 16 quads (16 to 64 pixels) in order to get full work group efficiency.

Lastly, there's shader complexity. An AMD compute unit can "hyperthread" between 1 and 10 work-groups simultaneously, in theory (up to 640 pixels). However, the actual number of work-groups that it can "hyperthread" like this (which they call the "occupancy" value) depends on how complex your shader is (actually: how many temporary registers it requires). A simple shader can have occupancy of 10, while a complex shader might have occupancy of 1 or 2.
This is an extremely important value to optimize your shaders for, as in very, very basic terms, the latency of memory fetches can be divided by this number - i.e. it's an opportunity to make RAM seem 10x faster.
Forward uses a single, complex shader to do everything in one pass. Deferred breaks that shader in half, and does it in two simpler passes, which makes optimizing for occupancy easier.
The lighting pass of deferred can also be done on the compute hardware, which opens up optimization techniques that are not available to pixel shaders... and on modern AMD hardware, also lets you use "async compute" to run it in parallel with rasterization workloads.
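As a rough illustration of how register usage caps occupancy, here's the usual back-of-the-envelope calculation for AMD GCN, where the "work-groups" above are 64-wide wavefronts (hence 10 x 64 = 640 pixels). It assumes 256 vector registers per lane per SIMD, allocation in blocks of 4 and a cap of 10 waves, and it ignores the scalar-register and groupshared-memory limits, which can lower occupancy further.

```cpp
#include <algorithm>
#include <cassert>

// Waves that can be resident on one GCN-style SIMD, limited only by vector register (VGPR) usage.
int EstimateWavesPerSimd(int vgprsUsed) {
  constexpr int kVgprsPerLane = 256, kGranule = 4, kMaxWaves = 10;
  assert(vgprsUsed > 0 && vgprsUsed <= kVgprsPerLane);
  const int allocated = (vgprsUsed + kGranule - 1) / kGranule * kGranule;  // round up to the allocation granule
  return std::min(kMaxWaves, kVgprsPerLane / allocated);
}
```

E.g. a simple shader using 24 VGPRs gets the full 10 waves, while a complex one using 128 only gets 2.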

So basically, there's never a simple winner :lol:

#5293934 New Cryengine/Unity/Unreal 3rd/2nd/1st person shooter

Posted by Hodgman on 28 May 2016 - 08:33 AM

So, how much funding you got for this already?
...because I don't know if you're aware, but what you've described would cost millions (pounds, dollars...) to make. Many, many millions. And if you're not cutting corners, then many, many more millions than the games that do apparently cut corners... So, like, a hundred million?
If you don't have a hundred million dollars, then you should probably try to design a game that's possible to make with less money than that.
P.S. Infinite Detail is not a game engine. It's a point cloud visualization tool that's not at all suitable for games, with terrible, terrible hype videos full of mistruths from a snake oil salesman...

Pretty certain it's a joke. Even the fact that he puts great but old games like Crusader: No Remorse on there indicates that he's older, and shouldn't be inexperienced enough to suggest this seriously.

One would hope, but he posted it from an account whose name led me to his personal details after a few minutes of googling... so it's not just a fake troll account.

#5293927 Question about GI and Pipelines

Posted by Hodgman on 28 May 2016 - 07:39 AM

No, it would be possible with either. 
...but, the voxel dataset itself can also be deferred or forward lit :lol: You can voxelize material properties and then light the voxels, or you can light them during the voxelization process! Which gives forward scene + forward voxels, forward scene + deferred voxels, deferred scene + forward voxels, deferred scene + deferred voxels... 
Lighting in general tends to be more efficient in a deferred pipeline, which is why people use it :)

You can write most of your lighting code to not care about whether you're using forward or deferred, and then have two different sets of shaders that both call this shared lighting code. That lets you fairly easily support both for testing :)
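In shader terms that usually means one shared lighting function that takes a surface description, plus two thin entry points. A C++ sketch of that structure, using plain Lambert N.L lighting; all type and function names are hypothetical, and a real G-buffer would pack and quantize its data rather than store full floats.

```cpp
#include <algorithm>
#include <cassert>

struct Vec3 { float x, y, z; };
inline float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct Surface { Vec3 albedo; Vec3 normal; };          // what the lighting code needs
struct DirectionalLight { Vec3 dirToLight; Vec3 color; };

// Shared lighting code: knows nothing about forward vs deferred.
Vec3 ShadeSurface(const Surface& s, const DirectionalLight& l) {
  const float nDotL = std::max(0.0f, Dot(s.normal, l.dirToLight));
  return {s.albedo.x * l.color.x * nDotL, s.albedo.y * l.color.y * nDotL, s.albedo.z * l.color.z * nDotL};
}

// Forward: build the surface from material inputs and light it in the same pass.
Vec3 ForwardPixel(Vec3 albedo, Vec3 normal, const DirectionalLight& l) {
  return ShadeSurface({albedo, normal}, l);
}

// Deferred: the G-buffer pass stores the surface, a later lighting pass reads it back.
struct GBufferTexel { Vec3 albedo; Vec3 normal; };
GBufferTexel WriteGBuffer(Vec3 albedo, Vec3 normal) { return {albedo, normal}; }
Vec3 DeferredLightingPass(const GBufferTexel& g, const DirectionalLight& l) {
  return ShadeSurface({g.albedo, g.normal}, l);
}
```

Because both paths funnel into ShadeSurface, switching pipelines for testing shouldn't change the lighting result.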

#5293926 Steamworks <16?

Posted by Hodgman on 28 May 2016 - 07:27 AM

Search for "germany sole trader". e.g. here are the different kinds of businesses: http://www.commercial-register.com/legalformsgermany.html

GmbH (equiv to LLC) is what you'd ideally get as it means that the business is a separate legal entity from you (important if someone sues you!), but "sole trader" is the easiest option.


Apparently "sole trader" is Einzelunternehmen?




And you will need to talk to a real accountant / tax consultant. Apparently you have something called the Bundessteuerberaterkammer who might be of help.

#5293921 Steamworks <16?

Posted by Hodgman on 28 May 2016 - 06:42 AM

Yeah as above, you (probably) at least need to be a sole proprietor. You will have to check the laws in your country for how you declare income gained from sales.


e.g. In my country, to declare income gained from sales or contract-work, you must be a business. So, I am registered as a sole proprietor so that I carry out independent work by myself. This was a quick 10 minute registration on the tax office website for free, and just gives me a "business number" which is required for taxation purposes. There is no legal distinction between myself and this "business" -- if the business earns money then I report it on my personal tax forms, and if the business owes money then I pay it out of my personal bank account.


If it's not possible for you to register your own sole proprietorship, you may be able to get a family member to do it for you... however, this will impact their own personal taxation situation. For example, they may end up paying more tax on their existing job...

You will have to speak to an accountant within your country to find out the best kind of company/business to create.