
phantom

Member Since 15 Dec 2001

#5293276 what good are cores?

Posted by phantom on 24 May 2016 - 05:22 PM

Honestly, the fact that you think this means you are a good 15 years behind the curve right now - threads have been a pretty big deal for some time and are far from 'bling bling'.

Threads and cores are two different things imho; having hundreds of threads doesn't imply you need many cores.


Well, it seems like he isn't using threading at all, so that's the '15 years' bit.
But yeah... I was kinda still thinking in line with what Hodgman said, as my mental setup is basically the same, where 'working threads' == 'cores' (more or less), so I tend to get sloppy with my wording.

I believe it's actually pretty hard to usefully use more than 1-2 cores full time.


Disagree.
Arrange your data correctly and things flow nicely; when combined with a job system you can scale pretty well.
(There is a degree of 'hard' in doing that... but... again... see the '15 years' comment, as people have been working on and solving this problem for a while; the PS3 was the poster boy for 'jobs', and today the 'jobs' mentality extends beyond the CPU cores to GPU compute work too.)
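To show the shape of the thing, a minimal sketch (mine, not from any engine) of the sort of job system meant here: a fixed pool of workers pulling closures off a shared queue; all names are illustrative.

```cpp
// Minimal illustrative job system: a fixed pool of worker threads pulling
// jobs from a shared queue. Structure and names are illustrative only.
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class JobSystem {
public:
    explicit JobSystem(unsigned workers = std::thread::hardware_concurrency()) {
        for (unsigned i = 0; i < workers; ++i)
            pool_.emplace_back([this] { WorkerLoop(); });
    }
    ~JobSystem() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            quit_ = true;
        }
        cv_.notify_all();
        for (auto& t : pool_) t.join();
    }
    void Submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        cv_.notify_one();
    }
private:
    void WorkerLoop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return quit_ || !jobs_.empty(); });
                if (quit_ && jobs_.empty()) return;
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job(); // run outside the lock so all workers stay busy in parallel
        }
    }
    std::vector<std::thread> pool_;
    std::queue<std::function<void()>> jobs_;
    std::mutex mutex_;
    std::condition_variable cv_;
    bool quit_ = false;
};
```

Arrange your data so each job touches an independent chunk and the workers scale out across however many cores you have.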


#5293215 what good are cores?

Posted by phantom on 24 May 2016 - 09:07 AM

so, what good are cores (at this point in time)? can they do anything truly useful? or just more BS bling bling chrome on high end machines?


Honestly, the fact that you think this means you are a good 15 years behind the curve right now - threads have been a pretty big deal for some time and are far from 'bling bling'.


#5284729 GLSL Mega-shader Subdivision

Posted by phantom on 02 April 2016 - 03:24 AM

The short answer is yes: shader subdivision, as you call it, is the main way to improve performance.

The longer answer requires a bit of hardware knowledge :)

GPUs have banks of registers; each execution unit has a maximum number of registers it can use across all shader programs currently being executed on that unit. The more registers a shader requires, the smaller the number of shader instances an execution unit can keep going at once - and they like to have plenty going to hide latency.

These registers are allocated statically; if your shader says "hey, I want 10 registers", then even if on a given run it only uses 4, the hardware will still have allocated all 10 to it. So if the hardware only had room for, say, 40 registers, grabbing 10 means it can only have 4 instances of your shader in flight at once; had it allocated 4 instead, it could have run 10, potentially doubling the overall speed of execution.
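To make that arithmetic concrete, a toy sketch - the 40-register budget is the illustrative figure from above, not a real hardware number:

```cpp
// Back-of-envelope occupancy: how many shader instances fit in a register
// budget. All numbers are illustrative, not taken from any real GPU.
#include <cstdio>

int InstancesInFlight(int registerBudget, int registersPerInstance) {
    return registerBudget / registersPerInstance; // allocation is static per instance
}

int main() {
    const int budget = 40; // per execution unit (illustrative)
    std::printf("10 regs -> %d instances in flight\n", InstancesInFlight(budget, 10)); // 4
    std::printf(" 4 regs -> %d instances in flight\n", InstancesInFlight(budget, 4));  // 10
}
```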

The reason this happens, certainly with large shaders, is that the compiler, via static analysis of the code, has to assume the worst-case setup - that you'll use all 10 of those registers all the time. It also has to allocate them statically, so 'if' statements and the like can introduce registers which might never be used but have to be reserved 'just in case'; the more 'if' statements and loops, the more you increase this requirement, aka 'register pressure'.

So, it isn't the GPU being intolerant of unused code; it is the compiler having to assume the worst and request the maximum number of registers that could be needed.

As to how many, well, as few as possible while keeping occupancy high is the unfortunately vague answer.

You could write a tiny shader for every case, but this brings CPU overhead to issue the extra draw calls (less of an issue with Vulkan/DX12) and GPU overhead, as small batches still cause problems (the GPU front end itself can only track so much work in flight).

The best you can do is try not to make them too big, use the tools from the GPU makers to see what kind of resources your shader will take, and make a judgement call from there - sometimes using a couple of extra registers won't hurt, as you'll keep the number of shader instances executing ('in flight') high enough and will be able to issue fewer draw calls.
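For illustration, one common way to do the subdivision is to compile specialised variants of a single mega-shader source by prepending #defines, so dead branches (and the registers they would reserve) get stripped at compile time. A hedged host-side sketch; the function and feature names are hypothetical:

```cpp
// Hypothetical host-side variant generation: one mega-shader source,
// specialised per feature combination by prepending #defines so the GLSL
// compiler can strip unused branches (and the registers they would reserve).
#include <string>
#include <vector>

std::string BuildVariantSource(const std::string& megaShaderSource,
                               const std::vector<std::string>& features) {
    std::string source = "#version 330 core\n"; // illustrative GLSL version
    for (const auto& f : features)
        source += "#define " + f + " 1\n";      // e.g. "USE_NORMAL_MAP" (example name)
    return source + megaShaderSource;           // hand to glShaderSource/glCompileShader
}
```

The trade-off is exactly the one above: more variants means leaner shaders and better occupancy, but also more pipeline switches and smaller batches.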


#5283212 Per Triangle Culling (GDC Frostbite)

Posted by phantom on 24 March 2016 - 01:04 PM

Eh?

There are slides in that very deck (slides 83-85) which show that per-triangle culling is certainly worth it - yes, it has a very GCN focus, but that's what happens when the consoles all use the same GPU arch.

As to backface culling: of course NV show a greater speed-up vs AMD - their hardware is set up to process more triangles per clock than AMD's, so they can also cull more per clock. (AMD, on the other hand, have focused more on compute and async functionality, which in the longer term could be the smarter move.)

So, you are probably right: if we could get this working with async compute on NV hardware you might not see the same improvement (or maybe you would; fewer triangles to set up is fewer, after all), but given the lack of async compute support on NV hardware that isn't likely to happen for a while... (and from what I've been hearing, the next chip isn't going to fix that problem either; keep an eye on NV PR - if they go full anti-async spin, more than they have already of course, then we'll know...)
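For reference, the heart of a per-triangle backface cull is just a determinant/winding test on the clip-space positions. A hedged sketch of that test alone (the actual compute pass in the talk also handles small-primitive, degenerate and frustum cases):

```cpp
// Sketch of the backface test a per-triangle culling pass performs: the
// sign of the 3x3 determinant of the clip-space (x, y, w) rows gives the
// winding. Illustrative only; real passes cull several other cases too.
struct Vec4 { float x, y, z, w; };

bool IsBackfacing(const Vec4& a, const Vec4& b, const Vec4& c) {
    float det = a.x * (b.y * c.w - b.w * c.y)
              - b.x * (a.y * c.w - a.w * c.y)
              + c.x * (a.y * b.w - a.w * b.y);
    return det <= 0.0f; // assumes counter-clockwise front faces
}
```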


#5282927 questions about index drawing

Posted by phantom on 23 March 2016 - 11:47 AM

Index drawing is the main form of drawing.

A vertex is not just the position; it is the combination of all the attributes which makes it up, so as soon as one thing is different it is two vertices.

In practice, however, this isn't a problem; very few vertices on the average model have this issue (identical position, different uv/normal).

The main reason for index drawing is in fact to reduce GPU overhead; if you reference the same index twice then the GPU can use a cached result as the calculation must be the same for both instances - which is the reason that if any input varies it is a new vertex; the output isn't the same after all.
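To illustrate, a minimal sketch of building an index buffer by de-duplicating whole vertices, where the identity of a vertex is the full attribute bundle (types and layout are illustrative):

```cpp
// Build an index buffer by de-duplicating complete vertices: two entries
// are the same vertex only if every attribute matches. Types illustrative.
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

struct Vertex {
    float px, py, pz, nx, ny, nz, u, v; // position, normal, texcoord
    bool operator<(const Vertex& o) const {
        return std::tie(px, py, pz, nx, ny, nz, u, v)
             < std::tie(o.px, o.py, o.pz, o.nx, o.ny, o.nz, o.u, o.v);
    }
};

void BuildIndexed(const std::vector<Vertex>& raw,
                  std::vector<Vertex>& vertices,
                  std::vector<uint32_t>& indices) {
    std::map<Vertex, uint32_t> seen;
    for (const Vertex& v : raw) {
        auto it = seen.find(v);
        if (it == seen.end()) { // any attribute differs -> it's a new vertex
            it = seen.emplace(v, static_cast<uint32_t>(vertices.size())).first;
            vertices.push_back(v);
        }
        indices.push_back(it->second); // repeats let the GPU reuse cached results
    }
}
```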


#5282472 Opengl- Diffrent Output for Adreno and Mali GPU

Posted by phantom on 21 March 2016 - 05:40 PM

I know I say this a lot but Android and the graphics API support there is a big clusterfuck of shit.

You've run into a driver bug; this will never be updated or fixed, so the best you can do is figure out a workaround, detect the OS/device/driver combo, and activate the workaround when it is detected.
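A hedged sketch of that detect-and-activate pattern using the standard GL query strings; the matched substring is just an example - you'd match the exact device/driver combo you reproduced the bug on:

```cpp
// Detect a known-bad device/driver combo via the standard GL query strings
// and flip a workaround flag. The "Mali" substring is purely an example.
#include <cstring>
#include <GLES2/gl2.h>

bool NeedsDriverWorkaround() {
    // Requires a current GL context; glGetString returns NULL otherwise.
    const char* renderer = reinterpret_cast<const char*>(glGetString(GL_RENDERER));
    return renderer && std::strstr(renderer, "Mali") != nullptr;
}
```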


#5281837 casting double* to float*

Posted by phantom on 18 March 2016 - 05:57 AM

Studios do lots of dumb and insane things, however - I'd never treat a coding standard produced by one as being 'right' for anything beyond their own usage.


#5281473 how could unlimited speed and ram simplify game code?

Posted by phantom on 16 March 2016 - 06:46 AM

You'd deadlock the universe and god would have to reboot it.


#5281458 how could unlimited speed and ram simplify game code?

Posted by phantom on 16 March 2016 - 03:52 AM

Bad programmers who make a career out of writing needless boilerplate that actually does very little would proliferate.


Bad programmers would be the only programmers, but that's because the wage for programmers would basically crash.

Right now you pay for experience and expertise so that the stuff that is made runs well and works well, but in a world where any old shit executes at the same speed, as long as someone can get the right answer, who gives a damn?

Of course at this point degrees in software engineering, computer science and related fields become all but worthless; none of it matters because brute force all the things! and wages would be so low that you couldn't hope to repay the loans required to get the worthless knowledge.

Which is about the only saving grace of this reality; I wouldn't have to deal with it at a professional level because no one would pay me the amount I'd want to deal with it :D


#5281457 how could unlimited speed and ram simplify game code?

Posted by phantom on 16 March 2016 - 03:49 AM

Why would you need a lookup table?
Just find someone else's code which generates values and dump it in your code base - even if you need to run 1,000,000 iterations to get the number it wouldn't matter, it executes instantly after all.
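In that spirit, a toy example (mine, not from the thread): brute-force a constant with a million iterations at startup instead of shipping a lookup table.

```cpp
// Toy example of "just recompute it": brute-force pi with a million
// Leibniz terms at startup instead of shipping a precomputed table.
#include <cstdio>

double BruteForcePi() {
    double sum = 0.0;
    for (long k = 0; k < 1000000; ++k)
        sum += (k % 2 == 0 ? 1.0 : -1.0) / (2.0 * k + 1.0);
    return 4.0 * sum;
}

int main() { std::printf("pi ~= %.6f\n", BruteForcePi()); }
```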


#5281375 how could unlimited speed and ram simplify game code?

Posted by phantom on 15 March 2016 - 01:38 PM

Well, that's kind of the thing;

- we have infinite, in which case throw everything out the window because nothing matters
- we have near-infinite, in which case everything still applies as it does today; you just pick your poison

Or to put it another way:
- program in JavaScript, because screw efficiency, layout, control, structure and all that stuff; I'll take the hit
- program in C++, because it allows more efficiency, control, structure and speed, so while dev time will be longer we'll have better runtime performance

(And yes, I'm holding up JavaScript, and indeed the whole clusterfuck which is the web, as an example of how much unreadable bullshit you get when you throw structure out the window...)


#5281221 C++ Self-Evaluation Metrics

Posted by phantom on 14 March 2016 - 10:25 AM

If you turn up to the interview wearing just speedos you can get away with an 8.5.


#5281207 no good way to prevent these errors?

Posted by phantom on 14 March 2016 - 08:35 AM

location structs were a later addition to the game, and the entity struct was never re-factored to use location structs


Then refactor the code to make things like this go away?

Speaking as someone who has seen your code posted here before, the amount of time and the errors don't surprise me in the slightest...
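For what it's worth, the refactor being suggested is the mechanical kind; a hypothetical before/after (field names invented):

```cpp
// Hypothetical sketch of the suggested refactor: fold the duplicated
// coordinate fields into the later-added location struct.
struct Location { float x, y, z; };

// Before: coordinates inlined; easy to update one copy and forget another.
struct EntityOld { float x, y, z; /* ... */ };

// After: one source of truth; anything taking a Location works on entities too.
struct EntityNew { Location loc; /* ... */ };
```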


#5281206 how could unlimited speed and ram simplify game code?

Posted by phantom on 14 March 2016 - 08:33 AM

of course i'd imagine that under the hood, the PC/OS would turn single thread code into multi core code automatically (if we even still needed multi cores), and it would also apply any possible hardware graphics acceleration automatically. with unlimited ram and speed, there'd be the resources necessary to do this, while presenting a single thread non-hardware-accelerated API to the application.


Eh?
Of course you wouldn't need multiple cores - you have unlimited cycles so you couldn't do any more work than you are already doing because you do it all at once anyway.

Same with 'hardware accelerated' - the concept no longer exists. Your software executes instantly, you don't need to 'hardware accelerate' anything.

Everything is software.


#5281170 how could unlimited speed and ram simplify game code?

Posted by phantom on 14 March 2016 - 02:46 AM

The thing is that not having to worry about optimizations means that code would be easier to write and cleaner, not the other way around.


Disagree.

Right now, while some sections of code are harder to read than others, most code has structure enforced on it by the very need to compartmentalise things for processing.

Remove the processing requirement and, by human nature, you begin to remove the structural element and thus the code becomes harder to reason about. Everyone here is focusing on subsections ("no collision detection trickery!", "just raytrace everything!") without thinking about how the average person would implement them given time constraints and a need to 'get it done'.

Even in today's code, with limited resources, people will sometimes just throw things in to hit deadlines and take the performance hit with a view to 'sort it out later' - and it's rare enough that the 'later' ever happens as it is; now imagine a world where you no longer have to worry about fixing it at all because, hey, it "just works" fine...

Also, and this is key I feel, most programmers in the industry are a bit shit.
They are a programming version of data entry monkeys, unable to design their way out of a damp paper bag.
Now, imagine those people let loose in a codebase without any performance constraints or cares....

If I'm lucky the screaming in my head will stop in a few days...

(oh, and before you doubt the 'most programmers are shit' thing, I present Hg's "do a random thing when I don't know how to merge" - in use by people across the world right now!)



