

#5198508 Current-Gen Lighting

Posted by Hodgman on 16 December 2014 - 05:46 AM

but I wasn't sure if even current consoles were capable of that yet.

Even PS3/360/mobile games do PBR these days... just with more approximations.


Use a nice BRDF (start with Cook-Torrance / normalized Blinn-Phong), use IBL for ambient (pre-convolved with an approximation of your BRDF, so that you end up with ambient-diffuse and ambient-specular terms), use gamma decoding on inputs (sRGB->linear when reading colour textures), render to a high-precision target (Float16, etc.) and tone-map it down to gamma-encoded 8-bit (do linear->sRGB / linear->gamma as the last step of tone-mapping).
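As a minimal sketch of the gamma/tone-mapping steps (function names are illustrative, and Reinhard is just a stand-in for whatever tone-map operator you use):

```cpp
#include <cmath>

// sRGB -> linear ("gamma decode"), applied when reading colour textures.
float SrgbToLinear(float c) {
    return (c <= 0.04045f) ? c / 12.92f
                           : std::pow((c + 0.055f) / 1.055f, 2.4f);
}

// linear -> sRGB ("gamma encode"), the very last step of tone-mapping.
float LinearToSrgb(float c) {
    return (c <= 0.0031308f) ? c * 12.92f
                             : 1.055f * std::pow(c, 1.0f / 2.4f) - 0.055f;
}

// Simple Reinhard operator as a stand-in tone-mapper: squashes [0, inf)
// HDR values from the Float16 target into [0, 1) before gamma encoding.
float ToneMap(float hdr) {
    return hdr / (1.0f + hdr);
}
```

In a real renderer these run in shader code, and the sRGB conversions are often done for free by the texture-sampling / render-target hardware.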


Ideally you'll do Bloom/DOF/motion-blur before tone-mapping, but on older hardware you might do it after (to get better performance, but with worse quality).

#5198292 GPU skinned object picking

Posted by Hodgman on 15 December 2014 - 05:57 AM

You almost never see triangle-based picking on skinned meshes (for pixel-perfect accuracy), except maybe in editor apps.
Generally you'll make a bounding box for each bone, then transform the boxes on the CPU and trace rays against them.
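The trace itself is just a ray-vs-box "slab" test per bone - typically done in each bone's local space (transform the ray by the inverse bone matrix instead of transforming the box's corners). A minimal sketch, with illustrative names:

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };

// Classic "slab" test: clip the ray's t-interval against each axis pair.
// Returns true if the ray (origin + t*dir, t >= 0) hits the box.
bool RayIntersectsBox(const Vec3& origin, const Vec3& dir,
                      const Vec3& boxMin, const Vec3& boxMax) {
    float tMin = 0.0f, tMax = 1e30f;
    const float o[3]  = { origin.x, origin.y, origin.z };
    const float d[3]  = { dir.x, dir.y, dir.z };
    const float lo[3] = { boxMin.x, boxMin.y, boxMin.z };
    const float hi[3] = { boxMax.x, boxMax.y, boxMax.z };
    for (int i = 0; i < 3; ++i) {
        if (std::fabs(d[i]) < 1e-8f) {
            // Ray parallel to this slab: must already be inside it.
            if (o[i] < lo[i] || o[i] > hi[i]) return false;
        } else {
            float t0 = (lo[i] - o[i]) / d[i];
            float t1 = (hi[i] - o[i]) / d[i];
            if (t0 > t1) std::swap(t0, t1);
            tMin = std::max(tMin, t0);
            tMax = std::min(tMax, t1);
            if (tMin > tMax) return false; // slabs' intervals don't overlap
        }
    }
    return true;
}
```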

#5197762 Occlusion Culling - Combine OctTree with CHC algorithm?

Posted by Hodgman on 12 December 2014 - 07:28 AM

The CHC++ algorithm (Which builds on CHC) takes care of both of these problems.
There’s no synchronization between the CPU and GPU, unfinished queries are simply queued and used in the next frame.
There’s no “popping” either. I can’t quite recall how that’s solved in the algorithm, but they have it covered.

From the paper, notice they say it reduces sync points, not eliminates sync points. In the one chart where they compare with the 'OPT' dataset, their average case is very close as they claim, but their worst case is 2x the frametime / half the framerate, probably due to a sync.

If an object has been invisible but is now visible this frame, you either have to conservatively draw it anyway, or sync (or pop artefacts). So in worst-case situations, if you don't pop and don't sync, you can't cull either...

#5197742 Occlusion Culling - Combine OctTree with CHC algorithm?

Posted by Hodgman on 12 December 2014 - 02:59 AM

Before you go too far down that path, you might want to evaluate alternatives to CHC++...
Using GPU occlusion queries for CPU occlusion culling means that either you suffer a one-frame delay in culling results, leading to horrible popping artefacts, or you halve your framerate by introducing CPU/GPU sync points. IMHO, it's simply not a feasible approach.

#5197481 Physical Based Models

Posted by Hodgman on 10 December 2014 - 06:24 PM

Yeah, I'm with Promit - even if two different engines/art tools both use "PBR", they probably store their specular/roughness/etc maps in completely different ways...

The best solution would be if the art files used only mask textures and material names - e.g. a greyscale map specifying which bits are clean copper, another map for "worn blue paint", etc., etc... You could then write/use a tool to convert those masks into textures that are correct for your engine.

It's always been an issue that games made up of "asset packs" will have no consistency unless retouched by an art team... PBR has just exaggerated this existing problem.

#5197142 is there a name for this type of data structure?

Posted by Hodgman on 09 December 2014 - 06:40 AM

The "interface with Update/Render methods" pattern is often called a "game object" or a "game entity".

These days I consider it an anti-pattern...

Sure, you could implement any bit of software using it - though non-"realtime"/interactive software would be a bad fit...
However, by choosing to put everything into a single "entity" list you're making some serious trade-offs! Sure, your main loop becomes incredibly simple - for each entity, Update; for each entity, Render... But in exchange you're giving up the ability to have any high-level knowledge of your application's data (and data-flows), and you're giving up all knowledge of your program's flow-of-control. The order that things happen in is left up to the chance ordering of your entity list...

You end up with silly bugs, like sometimes your homing missiles are targeting a position that lags by one frame, because you're unable to ensure that all movement logic has completed before executing targeting logic... You become cursed to watch, helplessly, as your co-workers expand the interface to include PreUpdate, PostUpdate... PostPostUpdate... to work around such bugs :lol:

When considering the potential 4x boosts of a quad-core CPU, you're left scratching your head, because any of those Update calls could be touching any bit of the game-state, leading you to consider abhorrent ideas like per-object locks and even more non-deterministic update orders...

This pattern is single-handedly responsible for the oft-repeated myth that "games are not a good fit for multithreading" or "games are hard to multithread"... And on that note - last-gen consoles made triple-core CPUs and NUMA co-processing standard requirements for all games. Current-gen consoles are making hex-core CPUs standard. Games are now leading the way in making consumer software take advantage of parallel processors.

IMHO, having a larger number of smaller, more homogeneous lists of self contained objects (or 'systems' of objects), combined with a more complex main loop in which the flow of data and control is explicit and easy to follow, is a far, far superior approach.
It's easier to keep everything sensibly deterministic, easier to understand and maintain, easier to optimize, easier to spread over multiple CPU cores...
Hence all the rage about 'component systems', 'data oriented design', and multithreaded game engines in recent times.
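As a tiny sketch of that alternative (all names hypothetical): each "system" owns a homogeneous list, and the main loop spells out the ordering explicitly, so the homing-missile lag bug above can't happen:

```cpp
#include <vector>
#include <cstddef>

// Homogeneous, self-contained data per "system" - no opaque entity list.
struct World {
    std::vector<float> positions;   // one per ship
    std::vector<float> velocities;  // one per ship
    std::vector<float> missileAims; // one aim target per ship, for simplicity
};

void UpdateMovement(World& w, float dt) {
    for (std::size_t i = 0; i < w.positions.size(); ++i)
        w.positions[i] += w.velocities[i] * dt;
}

void UpdateTargeting(World& w) {
    // Runs strictly after movement, so it sees this frame's positions,
    // never last frame's.
    for (std::size_t i = 0; i < w.positions.size(); ++i)
        w.missileAims[i] = w.positions[i];
}

void Frame(World& w, float dt) {
    // The flow of data and control is explicit in the main loop -
    // also the natural place to fork/join across CPU cores.
    UpdateMovement(w, dt);
    UpdateTargeting(w);
}
```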

#5197055 is there a better way top refer to assets in a game?

Posted by Hodgman on 08 December 2014 - 05:38 PM

and the next step after that would be stepping up to skinned meshes vs rigid body animation. but i don't think even dx11 and the latest card could do it: 125 characters onscreen at once without slowdowns at 15fps. that would be 62 characters at once at 30 fps, or 31 skinned mesh characters onscreen at once at 60fps. games can't really do this yet can they? total war draws a lots of characters, but they're not high resolution, like a character in a typical shooter.

In the sports games I've worked on we've usually got ~30 players and referees on the field, plus ~32 low-detail spectators (which are then instanced/impostered to fill up to 100000 stadium seats). That's at 30hz on DX9/2006-era consoles (with about half the frame time being spent on post-processing), and 60Hz on the new DX11/2014-era ones.
Bigger rival companies were doing it at 60Hz in the DX9 era too...

Play any newish Assassin's Creed or Hitman game and you'll see crowds of easily 100 animated NPCs, which the player can interact with (interrupt/push/etc).

Going back a ways, any Quake 3 derived shooter (e.g. every Call of Duty game) supports 32 player multiplayer on DX9.

Quake 3 was on the cusp between the CPU doing the skinning and the GPU's vertex shader taking over that role. These days almost everyone uses GPU skinning. GPUs can crunch *millions* of pixels per frame with highly complex pixel shaders, so 30 characters * 10k verts is a breeze.

#5196896 Does glMapBuffer() Allocate Client-Side Memory?

Posted by Hodgman on 07 December 2014 - 10:20 PM

I thought operating systems typically provide ALL memory, regardless of where in the system it's located, its own unique range of memory addresses. For example, memory address 0x00000000 to 0x2000000 point to main system memory while 0x20000001 to 0x2800000 all point to the GPU's memory.

Regarding PHYSICAL RAM, maybe... but we work with VIRTUAL addresses at all times these days.
If your process needs to access some physical RAM, the OS has to give you a range of virtual addresses, and then 'map' those virtual addresses to the physical resources you've allocated.
By default, there will be no virtual addresses corresponding to any VRAM. Also, a quirk of modern desktop OSes means only a small portion of VRAM can be mapped to CPU-side virtual addresses at one time (hence all the unmapping).

In practice, if you're using the no-overwrite/unsynchronized map flags/hints, you've got the best chance at being given an actual pointer to VRAM! If so, this means that when writing to those addresses, you'll skip the CPU's caches and go via a write-combining buffer for maximum throughput (another reason for the mandatory unmap - in this case, the driver needs to flush the CPU's write-combine cache), but if you read from that pointer, well, it's going to be dog slow (no cache, non-local resource = bad).

With any other map flags (except perhaps in write-discard/orphaning situations), the driver will almost certainly internally allocate some extra CPU-side RAM, and copy through to the GPU itself.

#5196387 Blending without changing the alpha source

Posted by Hodgman on 04 December 2014 - 10:59 PM

Yeah, the D3D11 equivalent is:
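(The snippet didn't survive archiving - below is a sketch of the state it presumably showed.) In D3D11 the colour and alpha blend factors are configured separately, so src=ZERO / dest=ONE on the alpha channel leaves the render target's alpha untouched, like GL's glBlendFuncSeparate. The `device`/`blendState` names are assumed to exist in the surrounding code:

```cpp
// Standard alpha blending for colour, but leave destination alpha as-is.
D3D11_BLEND_DESC desc = {};
desc.RenderTarget[0].BlendEnable           = TRUE;
desc.RenderTarget[0].SrcBlend              = D3D11_BLEND_SRC_ALPHA;
desc.RenderTarget[0].DestBlend             = D3D11_BLEND_INV_SRC_ALPHA;
desc.RenderTarget[0].BlendOp               = D3D11_BLEND_OP_ADD;
desc.RenderTarget[0].SrcBlendAlpha         = D3D11_BLEND_ZERO; // don't add src alpha
desc.RenderTarget[0].DestBlendAlpha        = D3D11_BLEND_ONE;  // keep dest alpha
desc.RenderTarget[0].BlendOpAlpha          = D3D11_BLEND_OP_ADD;
desc.RenderTarget[0].RenderTargetWriteMask = D3D11_COLOR_WRITE_ENABLE_ALL;
device->CreateBlendState(&desc, &blendState);
```

Alternatively, you can simply clear the alpha bit in RenderTargetWriteMask so alpha is never written at all.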


#5196243 Accurately estimating programming cost?

Posted by Hodgman on 04 December 2014 - 07:16 AM

You need an experienced lead programmer -who is familiar with their team of programmers- to make the estimates ;)

You can't just make estimates in a vacuum.
Said lead will break down the design into a list of technical features, identify the dependencies on other features, make a rough list of tasks, then refine it into more precise tasks. They'll estimate all those tasks with regards to the capabilities of their team.
If they've been given 5 veterans who they've worked with in the past, you're going to get a much lower total estimate than if they've been given 15 university graduates.
Ideally the actual staff will have input on generating these estimates (and then the lead might multiply the staff's numbers by Pi just to be safe).

Sometimes you might have to commit some experienced programmers to the project first, in a "pre-production" phase, so they can experiment on different approaches to solving the design requirements before estimates can even be guessed at...
e.g. If you haven't yet chosen an engine, you'll probably want your core team to evaluate your options and make that decision before going on to create the detailed task list.

#5196232 Multithreading with diferred context and CommandLists

Posted by Hodgman on 04 December 2014 - 06:01 AM

You won't see a GPU memory increase, but a system/CPU memory leak.
IIRC, there's a way to enumerate/query which D3D objects exist, and also check for object leaks at shutdown. You should be seeing a huge pile of leaked command lists...
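If memory serves (this is my assumption to verify, not something from the post), the debug layer's live-object report is the easiest way to see the leak - it requires the device to be created with the D3D11_CREATE_DEVICE_DEBUG flag:

```cpp
// Dump every live D3D11 object (leaked command lists included) to the
// debugger's output window. 'device' is assumed to exist in surrounding code.
ID3D11Debug* debug = nullptr;
if (SUCCEEDED(device->QueryInterface(__uuidof(ID3D11Debug),
                                     reinterpret_cast<void**>(&debug)))) {
    debug->ReportLiveDeviceObjects(D3D11_RLDO_DETAIL);
    debug->Release();
}
```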

#5196051 Graphics card features

Posted by Hodgman on 03 December 2014 - 06:14 AM

NVidia's Voxel Global Illumination isn't a GPU feature, it's a middleware library that you can license (for $$$$$$) on their GameWorks page.

Likewise, MFAA is a shader they'll sell you for cash...
Their marketing team is pretty disingenuous these days.

Internally, that Voxel GI library depends on partially resident textures / tiled resources.
In GL there's an extension, and in D3D you need to create an 11.1 device.
Technically, it would work fine on AMD cards too...

AMD actually beat NVidia to support tiled resources tier 2, so NVidia's marketing team is going nuts.

#5195984 Alpha blending

Posted by Hodgman on 02 December 2014 - 07:05 PM

Even without knowing what kind of GPU you're using, 1ms to fill the entire screen with a terrain shader sounds pretty cheap!
That's a mere 6% of your frametime for a 60Hz budget.

#5195982 NULL vs nullptr

Posted by Hodgman on 02 December 2014 - 06:54 PM

I have two very large codebases that disagree with you very hard on this one. You're forgetting generic code, methinks.

You cut off 'The other use is inside templates'.
IMHO, the only place I'd use it is deep inside some template metamagic. For day to day code, like initializing local pointer variables, there's no difference between it and 0 (AKA NULL), so the best thing to do is just stick with existing conventions of the rest of the codebase (which in my case is literal 0).

Also, many kinds of serialization and file streaming APIs. I've run into overload problems with things like `Write(int)` vs `Write(char const*)`, and needed to write out explicit NULL values often enough (there are still issues with all the implicit conversions between integer types, but those are solved already by type suffixes; pointers were not, until now).

Personally that's a non-problem to me. Passing literals into serialization functions is simply bad code™.
What if there are overloads for Write(void*), Write(Object*), etc? Your code for serializing a "null string" by hijacking nullptr is hard to read for someone new to the code (having to read through multiple files to answer "what does Write(nullptr) mean? Oh, it's a missing char array!"), but more importantly, it ties your client code to an implementation detail of the serialization library -- each of those calls is assuming that Write(char*) is the only pointer overload. If another pointer overload is added later, all your "null string" code breaks!

Hiding explicit programmer intentions behind implicit conversions should get you reprimanded in code review.

Even with the integer overloads, I'd consider passing literals into a heavily overloaded function like this 'Write' to be bad code™.
In any case, you are explicitly specifying one overload to call, but the code makes it non obvious.
It's much more readable to assign to a variable first, use a cast, or a template arg.
Write(0); // <-- no!
u32 i = 0; Write(i);
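A self-contained illustration of the overload-resolution point (the Write overloads here are hypothetical, returning tags so the chosen overload is visible):

```cpp
#include <cstdint>

// Hypothetical overload set: which one does a bare literal select?
int Write(int)         { return 1; } // integers
int Write(const char*) { return 2; } // strings

// Write(0)       -> 1: a literal 0 is an int, so the integer overload wins.
// Write(nullptr) -> 2: nullptr only converts to the pointer overload.
// Naming the type first makes the chosen overload obvious at the call site:
int WriteCount(std::int32_t i) { return Write(i); }
int WriteString(const char* s) { return Write(s); }
```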

#5195872 NULL vs nullptr

Posted by Hodgman on 02 December 2014 - 06:45 AM

In C, NULL is "(void*)0", but in C++ it's just "0".
I never use it in C++, and prefer to just write 0 myself (or member() in an initializer list, rather than member(0)).

IMHO, nullptr is nearly useless - it helps avoid the above-mentioned artificial case where you've got an overloaded function with a void* version and an int version, and want to call it with a hard-coded zero... IMHO that's not really a problem that needs solving, ever!
The other use is inside templates, where you can easily statically-assert that a variable is of pointer-type by initializing it with nullptr.
I still use 0 as a general initializer for pointers when writing C++11...
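That template use can be sketched like so (the helper name is illustrative):

```cpp
// Assigning nullptr only compiles when T is a pointer(-like) type, so this
// template doubles as a static assertion of pointer-ness.
template <typename T>
void ResetToNull(T& value) {
    value = nullptr; // compile error if T is int, float, etc.
}
```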