Member Since 15 Dec 2001
Offline Last Active Private

#5281207 no good way to prevent these errors?

Posted by on 14 March 2016 - 08:35 AM

location structs were a later addition to the game, and the entity struct was never re-factored to use location structs

Then refactor the code to make things like this go away?

Speaking as someone who has seen your code posted here before, the amount of time and the errors don't surprise me in the slightest...

#5281206 how could unlimited speed and ram simplify game code?

Posted by on 14 March 2016 - 08:33 AM

of course i'd imagine that under the hood, the PC/OS would turn single thread code into multi core code automatically (if we even still needed multi cores), and it would also apply any possible hardware graphics acceleration automatically. with unlimited ram and speed, there'd be the resources necessary to do this, while presenting a single thread non-hardware-accelerated API to the application.

Of course you wouldn't need multiple cores - you have unlimited cycles so you couldn't do any more work than you are already doing because you do it all at once anyway.

Same with 'hardware accelerated' - the concept no longer exists. Your software executes instantly, you don't need to 'hardware accelerate' anything.

Everything is software.

#5281170 how could unlimited speed and ram simplify game code?

Posted by on 14 March 2016 - 02:46 AM

The thing is that not having to worry about optimizations means that code would be easier to write and cleaner, not the other way around.


Right now, while some sections of code are harder to read than others, those others often have structure enforced on them by the very need to compartmentalise things for processing.

Remove the processing requirement and, by human nature, you begin to remove the structural element and thus the code becomes harder to reason about. Everyone here is focusing on subsections ("no collision detection trickery!", "just raytrace everything!") without thinking about how the average person would implement them given time constraints and a need to 'get it done'.

Even in today's code, with limited resources, people will sometimes just throw things in to hit deadlines and take the performance hit with a view to 'sort it out later' - it's rare enough that the fix actually happens today, now imagine a world where you no longer have to worry about fixing it later because hey, it "just works" fine...

Also, and this is key I feel, most programmers in the industry are a bit shit.
They are a programming version of data entry monkeys, unable to design their way out of a damp paper bag.
Now, imagine those people let loose in a codebase without any performance constraints or cares....

If I'm lucky the screaming in my head will stop in a few days...

(oh, and before you doubt the 'most programmers are shit' thing, I present "Hg's do a random thing on when I don't know how to merge" - in use by people across the world right now!)

#5281142 how could unlimited speed and ram simplify game code?

Posted by on 13 March 2016 - 06:16 PM

That they should and if they would are two totally different things; I've seen enough code by enough people to know what would happen...

#5281121 how could unlimited speed and ram simplify game code?

Posted by on 13 March 2016 - 04:10 PM

so, how could unlimited ram and clock cycles simplify game coding?

It wouldn't.

Oh, it might be nice for the first person working on a game, because you wouldn't have to give a shit about structure, design or anything else - just bung everything in a function and off it goes.

Of course, by the time the 10th person comes along to work on that bit of code all they are going to find is a horrible mess with no structure or reasoning behind it, which they will find impossible to maintain, leading to either a) them rewriting it or b) murder.

("But they could just add another bit...", "Sure, because bugs never come from unintended interaction of code...")

#5278241 AMD - Anisotropic filtering with nearest point filtering not working properly?

Posted by on 26 February 2016 - 02:32 AM

As i said, it works fine on Nvidia and Intel chipsets. But on AMD it doesn't.

Just as a minor point and to reinforce something Hodgman said - just because it works as expected somewhere doesn't mean it is 'right'.

I have found in the past, typically with OpenGL, that when it comes to implementation AMD tend to stick closer to the letter of the spec, where NV take a more 'whatever works' approach. This can mean that NV will let things go where AMD will throw up errors or do what you don't expect - annoying but there it is :|

#5278034 Right way to handle people complaining about price?

Posted by on 25 February 2016 - 02:51 AM

A few people have passed comment that it was stupid/daft/whatever of them to write an engine for the game but consider this; 5 years ago the engine landscape was very different.

Unity had only recently come to Windows (and, from what I recall of it at the time, wasn't that great), UE4's open source model didn't exist, most other engines were either utter tosh or required you to pay to get access to them; in the context of THEN starting from the ground up makes sense.

If we were having this discussion in 2021 then I'd agree it might be a bit silly, but as it was...

#5275906 instancing with multiple meshes

Posted by on 16 February 2016 - 07:15 AM

IIRC DX11 doesn't have the equivalent of that; you have to issue multiple indirect draw calls, which still requires N draw calls for N sets of instances (it's a bit of API trickery in the GL world; it still breaks down to multiple draw packets on the backend).

AMD supports it in Dx11 via their "AMD GPU services" extension library (in their GPUOpen GitHub). They also do support it in HW - the GPU's command processing front-end decodes the single multi-draw-call.

Ah, I stand corrected.
Note to self; re-read the GCN docs at some point...

#5275893 instancing with multiple meshes

Posted by on 16 February 2016 - 05:17 AM

IIRC DX11 doesn't have the equivalent of that; you have to issue multiple indirect draw calls, which still requires N draw calls for N sets of instances (it's a bit of API trickery in the GL world; it still breaks down to multiple draw packets on the backend).

If you want to do 'multiple meshes in one draw call' then, depending on your hardware target, the OP's original idea will work.

You require a buffer with all the vertex data in, and another buffer which lets you offset index in to it on a per instance (or, I guess, per group of instances) frequency.
Bind the buffers as inputs, but not as traditional vertex buffers.
Then issue your instancing draw with the correct vertices and instance count (which would be total required).
In your vertex shader you'd use the incoming values (vertex id and instance id) to index in to the offset and vertex buffers as required to pull the vertex data yourself. (vertex and index buffers can be 'null' for this.)
At which point you proceed as normal.

One important point with this method is that your meshes have to have the same vertex count; this is because vertex count is a function of the input to the draw call so you'll always get say 64 vertices processed per instance. You could, of course, use degenerate triangles to have meshes with varying numbers of output verts but you'll still pay the vertex shader cost for them.
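To make the indexing concrete, here is a minimal CPU-side sketch of the fetch the vertex shader would do. It is an illustration, not the code from the presentation: `Vertex`, `MergedMeshes`, `appendMesh` and `fetchVertex` are all hypothetical names, and the padding scheme (repeat the last vertex to get degenerate triangles) is just one way to bring every mesh up to the same vertex count.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical vertex layout; a real one would also carry normals, UVs, etc.
struct Vertex {
    float x, y, z;
};

// All meshes packed back to back in one buffer, each padded to the same
// vertex count so a single instanced draw can cover any of them.
struct MergedMeshes {
    std::uint32_t verticesPerMesh = 0;
    std::vector<Vertex> vertices;
};

// Appends a mesh and returns its offset into the shared vertex buffer.
// Short meshes are padded by repeating the last vertex, which yields
// degenerate (zero-area) triangles: nothing is drawn for them, but the
// vertex shader cost is still paid.
std::uint32_t appendMesh(MergedMeshes& merged, const std::vector<Vertex>& mesh) {
    assert(!mesh.empty() && mesh.size() <= merged.verticesPerMesh);
    const auto offset = static_cast<std::uint32_t>(merged.vertices.size());
    merged.vertices.insert(merged.vertices.end(), mesh.begin(), mesh.end());
    merged.vertices.resize(offset + merged.verticesPerMesh, mesh.back());
    return offset;
}

// CPU stand-in for the vertex shader fetch: the draw supplies only a vertex
// id and an instance id, and the shader pulls the vertex data itself using
// the per-instance offset.
Vertex fetchVertex(const MergedMeshes& merged,
                   const std::vector<std::uint32_t>& instanceOffsets,
                   std::uint32_t vertexId, std::uint32_t instanceId) {
    assert(instanceId < instanceOffsets.size());
    assert(vertexId < merged.verticesPerMesh);
    return merged.vertices[instanceOffsets[instanceId] + vertexId];
}
```

The draw itself would then be issued with `verticesPerMesh` vertices and one instance per entry in the offset buffer; on the GPU the two ids would come from the vertex id and instance id inputs rather than function arguments.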

D3D12 (and I assume Vulkan) offer more interesting ways of doing this as you can construct your own multi-draw indirect buffers from a compute shader which also allows changing of vertex buffers per draw packet; that's a tad more complicated however ;)
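As a rough idea of what such an indirect buffer contains, the sketch below builds the per-draw records on the CPU. On D3D12 this would be done in a compute shader writing to a GPU buffer; the CPU version is only here to show the data. `DrawArgs` mirrors the four-field layout of `D3D12_DRAW_ARGUMENTS`, while `MeshRange` and `buildDrawArgs` are hypothetical names, and per-draw vertex buffer changes are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Same four fields, in the same order, as D3D12_DRAW_ARGUMENTS.
struct DrawArgs {
    std::uint32_t vertexCountPerInstance;
    std::uint32_t instanceCount;
    std::uint32_t startVertexLocation;
    std::uint32_t startInstanceLocation;
};

// Hypothetical description of where one mesh lives in a shared vertex buffer.
struct MeshRange {
    std::uint32_t firstVertex;
    std::uint32_t vertexCount;
};

// Emits one draw record per mesh that has at least one instance to draw.
// Instances are assumed to be stored grouped by mesh, so each record starts
// where the previous one's instances ended.
std::vector<DrawArgs> buildDrawArgs(const std::vector<MeshRange>& meshes,
                                    const std::vector<std::uint32_t>& instanceCounts) {
    assert(meshes.size() == instanceCounts.size());
    std::vector<DrawArgs> args;
    std::uint32_t nextInstance = 0;
    for (std::size_t i = 0; i < meshes.size(); ++i) {
        if (instanceCounts[i] == 0) {
            continue;  // nothing visible for this mesh, so no draw packet
        }
        args.push_back({meshes[i].vertexCount, instanceCounts[i],
                        meshes[i].firstVertex, nextInstance});
        nextInstance += instanceCounts[i];
    }
    return args;
}
```

Unlike the merged-buffer approach, each record carries its own vertex count, so meshes no longer need padding to a common size.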

On the D3D11/Vertex shader route however this presentation covers what I wrote above, but maybe in a slightly clearer way; http://www.gdcvault.com/play/1020624/Advanced-Visual-Effects-with-DirectX - the bit in question is 'Merge Instancing'.

#5275711 Vulkan is Next-Gen OpenGL

Posted by on 15 February 2016 - 03:04 AM

AFAIK (and I hope I'm out of date in my knowledge):
• AMD fully supports this -- for example, shadow-mapping draw-calls are bottlenecked by the HW rasterizer, leaving the compute cores mostly idle. Post-processing on the other hand generally fully saturates all the compute cores. Async compute means that you can render your shadow maps and perform post processing at exactly the same time, making use of compute resources that traditionally would've been sitting idle during that time.
• NVidia pretends to support this -- they'll let you set up the graphics/general queue and the dedicated compute-only queue, and let you submit your shadow draw calls into the former and your post-process dispatch calls into the latter... but in the hardware, they'll still only be working on one at a time - either the draw calls, or the dispatch calls, but never both at once.

Pretty much, although based on my reading of the hardware spec the Maxwell 1 hardware can work on compute and gfx at the same time but due to the way the dispatcher is arranged it won't always do so.

So on AMD hardware you have the ACEs and the Gfx hardware queues feeding in to a dispatcher to the CU blocks - the work can be picked from any of them and is balanced based on what is going on. So on a high end card you'll have 9 hardware pipes backed by 65 software pipes feeding into the CU array.
(This is true of all GCN hardware, it is the number of ACEs that differ.)
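For anyone wondering where the 9 and 65 come from, this is the arithmetic as I read it, assuming a high end part with 8 ACEs servicing 8 software queues each, plus a single graphics pipe with one queue of its own. The struct and function names are made up for the example.

```cpp
#include <cstdint>

// Hypothetical description of a GCN front end.
struct GcnFrontEnd {
    std::uint32_t aceCount;      // hardware compute pipes (ACEs)
    std::uint32_t queuesPerAce;  // software queues serviced by each ACE
};

// Every ACE is a hardware pipe; the graphics front end adds one more.
constexpr std::uint32_t hardwarePipes(GcnFrontEnd f) {
    return f.aceCount + 1;
}

// Each ACE services several software queues; graphics adds one more
// (an assumption made for this sketch).
constexpr std::uint32_t softwareQueues(GcnFrontEnd f) {
    return f.aceCount * f.queuesPerAce + 1;
}
```

With `GcnFrontEnd{8, 8}` that works out to 9 hardware pipes and 65 software queues; a part with fewer ACEs scales both numbers down.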

NV on the other hand work in one of two modes;
- Graphics/Mixed mode
- Compute only

In compute only, ever since Kepler '2' (aka Titan/780), it has had 32 queues (although NVs lack of information doesn't say what kind; I'm assuming 'software') to pull work from.

However Maxwell 1 and backwards could only pull from a single queue in 'mixed mode' and relied on the dispatcher to throw more than one lot of work in to their CUDA array.
Now, when all this hit, I put forward the idea that you could, with driver assistance on a per-game basis, probably work around this problem - you'd need to massage the commands going in to the queue so you could interleave gfx and compute sanely (I'm assuming they must do something like this to let PhysX and Gfx play nicely? or maybe PhysX and gfx doesn't play nicely...) - however that raises the question of 'why not for this benchmark?' given they apparently had access to it for some time; couldn't or didn't think it would matter? (and couldn't feeds in to my physx/gfx question.)

With Maxwell 2 this, afaik, largely goes away - mixed mode can pull from 32 queues as with compute (1 gfx, 31 compute; assuming 'software' as before) but there is still the central dispatcher setup which might still cause some limitations with regards to the amount of work 'in flight' - it is, after all, effectively doing the work of all of AMD's ACEs and gfx front end in one chunk of silicon, but a lack of technical details makes it hard to say what is going on and how it'll be impacted.

I fully expect Pascal to fix all these problems for NV however - if it doesn't have a reworked front end for this kind of work load I'll be surprised.
(also the fact MW1 and MW2 are setup like this doesn't surprise me; imo NV hardware has always been a 'good enough' implementation of things, in that it'll do what is required for now but doesn't work as well in to the future. GCN seems to be more 'future proof' in some regards.)

It'll be also interesting to see what AMD's Polaris brings to the table; they appear to be adding some more dispatcher logic so I'm looking forward to hopefully getting a GCN-style deep dive on that data.

(relevant link for the compute/dispatch stuff; AMD dives deep on async shading)

#5275675 D3D12: Copy Queue and ResourceBarrier

Posted by on 14 February 2016 - 04:58 PM

The copy queue 'could' be optimised; it is a bit hardware specific.

For example on Intel I doubt you'd get any payback; AMD, however, have dedicated DMA hardware on their GPUs so copying can be handled separately from other operations. (Same with compute queues; GCN has up to 8 hardware queues each servicing up to 8 software queues - although if memory serves currently you can only create one unique queue per type with D3D12.)

#5275673 Vulkan is Next-Gen OpenGL

Posted by on 14 February 2016 - 04:55 PM

Maxwell 1 supports it, but there is an 'issue' with how the hardware dispatches work which means that if not fed in a certain way problems can appear.

As I recall Maxwell 2 has a slightly changed dispatcher front end which makes the problem vanish (or if not 'vanish' then isn't noticeable).

"Supports" is always a fun word in graphics land.

(If you go and read NV's whitepaper on Maxwell 1 hardware it is clear why it would have a problem with certain workloads - the driver could probably work around some of them on a case by case basis.)

FWIW I generally think GCN is a better overall arch, although that might be down to AMD telling us things and not hiding details; NV have had a massive driver advantage for some time however, and a few hardware tricks which are better for games when combined with tuning.

#5275488 Vulkan is Next-Gen OpenGL

Posted by on 12 February 2016 - 06:51 PM

From an IHV point of view Fermi is dead and gone; by the time Vulkan focused engines appear (as in totally killed GL support) it'll be another couple of card releases down the line and matter even less.
Financially it makes zero sense to invest in Fermi based tech at this point - it is already two generations old and isn't selling after all. (Pascal is due later this year too...)

Sucks for the developers who can't get one, but such is life... and it's not like GL is going away any time soon so you can still continue development and, to be frank, chances are the majority of games going forward will be either AAA in-house or built on engines such as UE4 and Unity, both of which will support the tech so it Just Works on target platforms.

Same reason you won't see Vulkan on pre-GCN hardware from AMD, and I'd be surprised if the 1.0 hardware got much love at this point...

#5275060 C++ and Lumberyard engine

Posted by on 09 February 2016 - 05:44 PM

Unity reuses their engine between major releases and just rewrites the parts that are new, whereas Epic claims to rewrite their engine with every major release

Both engines are continual evolutions; bits of UE4 were indeed rewritten from UE3 but a lot was still the same on release (majority of the core systems for example). I think there was some miscommunication and people made some leaps/assumptions which led to this "UE4 was rewritten" legend.

Don't get me wrong; bits of both engines have been rewritten and reworked over time, what was good yesterday isn't going to be good today after all, but no one does a dump and rewrite, and certainly not in the time scale suggested.

(Having worked for both companies I feel I've got a degree of authority on this subject ;) )

#5273143 Vulkan is Next-Gen OpenGL

Posted by on 29 January 2016 - 02:56 AM

OK, firstly, there was nothing about 'one API per hardware vendor being bad' said - it was "we do not need another API. OpenGL is fine".
That is what is said.

Secondly, yes "write the code once with one API and get it to work everywhere" is a nice idea.. but it is a pipedream even today.

D3D : Does a pretty good job of ironing out the wrinkles of the hardware but even for that you end up with "work around for driver funkiness" code paths which apply per driver/GPU.
OpenGL : See above but worse with added extension funkiness.
OpenGL|ES : Android is where 'one code path' goes to die.

On top of that you have the PS4's API, the subtle differences which are the Xbox360's API, the Xbox One's API, the PS3's API, Wii/WiiU, PSP and more recently Metal.

People are, and were, already writing multiple backends to get the best from hardware so that piece of FUD was already a reality for many.
(And, frankly, I don't expect Vulkan to change that all that much either - Windows will still require per-GPU work arounds and code paths for Reasons, maybe Android will improve if Google somehow get a handle on things, but even that'll suffer the same problem.)

But, again, that wasn't the argument made.

The argument made was 'OpenGL is all you need.' and that 'another API wasn't needed'.

Which was my whole point back in the first post that touched on this and yet people seem to be having a problem understanding a simple concept?