Member Since 14 Feb 2007

#5148644 Custom view matrices reduces FPS phenomenally

Posted by Hodgman on 21 April 2014 - 10:03 PM

There are cases where it can be a good idea to keep view-projection and world matrices separate. Say you've got 10k static objects: if you merge these transforms, the CPU has to perform 10k world*viewProj multiplications and upload the 10k resulting matrices every frame. If they're kept separate, the CPU only has to upload the new viewProj matrix and doesn't have to change any per-object data at all (but of course the GPU now has to do 10k*numVerts matrix concatenations instead).
The "right" decision depends entirely on the game (and target hardware).
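
To put rough numbers on that trade-off, here's a minimal sketch of a per-frame cost model. The struct and function names are made up for illustration (they're not from any engine or API), and it only counts operations; it says nothing about which option is actually faster on given hardware.

```cpp
#include <cstddef>

// Hypothetical per-frame cost model for the two strategies.
struct FrameCost {
    std::size_t cpuMultiplies;     // world * viewProj products computed on the CPU
    std::size_t matricesUploaded;  // matrices that must be sent to the GPU this frame
    std::size_t gpuMultiplies;     // extra per-vertex concatenations done on the GPU
};

// Merged: the CPU concatenates and uploads one matrix per object, every frame.
constexpr FrameCost mergedOnCpu(std::size_t objects) {
    return FrameCost{objects, objects, 0};
}

// Separate: only the new viewProj is uploaded (static world matrices are untouched),
// but every vertex pays for an extra concatenation on the GPU.
constexpr FrameCost keptSeparate(std::size_t objects, std::size_t vertsPerObject) {
    return FrameCost{0, 1, objects * vertsPerObject};
}
```

With 10k objects of (say) 500 vertices each, `mergedOnCpu(10000)` is 10k CPU multiplies plus 10k uploads, while `keptSeparate(10000, 500)` is a single upload plus 5 million extra GPU operations.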

#5148491 Package File Format

Posted by Hodgman on 21 April 2014 - 12:25 AM

In most that I've seen, the header is the file table ;)
This part is read into RAM on startup and kept there. It lets you perform a lookup by filename and retrieve the offset and size of a file within the archive.

The last engine that I used kept paths within filenames (e.g. an asset might be called "foo/bar/baz.type.platform"), so it was just a flat/non-hierarchical table containing these long names.

On my current engine, I actually ignore paths completely, basically moving all assets into a single directory when building an archive (e.g. the above would just be "baz.type.platform"). This means that you can't have two assets with the same name, but I see this as a positive feature rather than a negative ;-)
During development, assets are stored as individual files (not packaged) so it's easy to make changes to the game. Also, any folder structure can be used during development (during dev, that file might be stored in the "foo/bar" directory, the game doesn't care).
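
A flat file table like the one described could be sketched as below. This is only an illustration: `FileEntry`, `FileTable` and their members are hypothetical names, and a hash map is just one reasonable choice of container.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical table entry: where one file lives inside the archive.
struct FileEntry {
    std::uint64_t offset;  // byte offset from the start of the archive
    std::uint64_t size;    // size of the file in bytes
};

// Flat, non-hierarchical table keyed by asset name, kept in RAM after startup.
class FileTable {
public:
    // Returns false if the name is already present, so duplicate asset
    // names are rejected when the archive is built.
    bool add(const std::string& name, FileEntry entry) {
        return entries_.emplace(name, entry).second;
    }

    // Returns the entry for `name`, or nullptr if the archive has no such file.
    const FileEntry* find(const std::string& name) const {
        auto it = entries_.find(name);
        return it == entries_.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<std::string, FileEntry> entries_;
};
```

Looking up "baz.type.platform" then gives you the offset to seek to and the number of bytes to read.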

#5148440 C++ std::move() vs std::memcpy()

Posted by Hodgman on 20 April 2014 - 06:36 PM

Are you talking in general, or concerning POD types only?

If in general, then you can't memcpy objects. If specifically POD, then they're likely equivalent.

[edit] i.e. you're not allowed to use memcpy on non-POD objects, so the question only applies to POD objects....
Assuming you're breaking the rules by memcpying non-POD objects, haven't you answered your own question -- any time that the move constructor does any real work (refcounting, swapping pointers, etc), then obviously this work won't be done by a memcpy.
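
A small sketch of the difference, with made-up types (`Vec3` and `Buffer` are illustrative only). Note that it checks `std::is_trivially_copyable` rather than POD-ness, since that is the property the standard ties memcpy to.

```cpp
#include <cstring>
#include <type_traits>
#include <utility>

// Plain data: copying the bytes is the same as copying the object.
struct Vec3 { float x, y, z; };

// Hypothetical owning type whose move constructor does real work.
class Buffer {
public:
    explicit Buffer(int n) : data_(new int[n]()), size_(n) {}
    Buffer(Buffer&& other) noexcept : data_(other.data_), size_(other.size_) {
        other.data_ = nullptr;  // the source gives up ownership...
        other.size_ = 0;        // ...which a memcpy would never do
    }
    Buffer(const Buffer&) = delete;
    Buffer& operator=(const Buffer&) = delete;
    ~Buffer() { delete[] data_; }
    int size() const { return size_; }

private:
    int* data_;
    int size_;
};

static_assert(std::is_trivially_copyable<Vec3>::value,
              "memcpy is allowed for Vec3");
static_assert(!std::is_trivially_copyable<Buffer>::value,
              "memcpy on Buffer would leave two owners of one allocation");

// Allowed, and likely compiles to the same thing as a plain copy or move.
inline Vec3 copyWithMemcpy(const Vec3& src) {
    Vec3 dst;
    std::memcpy(&dst, &src, sizeof dst);
    return dst;
}
```

If you memcpy'd a `Buffer`, both objects would end up pointing at the same allocation and it would be deleted twice.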

#5148305 Negatives to Triple Buffering?

Posted by Hodgman on 20 April 2014 - 12:37 AM

Why would it always give an FPS improvement?

All buffering causes input latency. With show-A/draw-B, show-B/draw-A (double buffering), the GPU is one frame behind the CPU, which adds 16ms of input lag (at 60Hz).
Triple buffering is show-A/draw-C, show-B/draw-A, show-C/draw-B, meaning the GPU is two frames behind the CPU, creating 33ms of latency (at 60Hz).
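
The arithmetic behind those figures, as a tiny sketch (the function names are mine; it assumes vsync, so each buffered frame costs exactly one refresh period):

```cpp
// Length of one refresh interval in milliseconds.
constexpr double framePeriodMs(double refreshHz) {
    return 1000.0 / refreshHz;
}

// Latency added by the GPU running `framesBehind` frames behind the CPU.
constexpr double bufferingLatencyMs(int framesBehind, double refreshHz) {
    return framesBehind * framePeriodMs(refreshHz);
}
```

At 60Hz, one frame behind is about 16.7ms and two frames behind is about 33.3ms.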

You can minimize this latency by disabling vsync, which means that the 'show/draw' blocks aren't rounded up to 16.6ms... But as you mentioned, this causes horrible jitter.
A constant 60fps/16.6ms per frame looks smooth. Having odd frames take 15ms and even frames take 1ms looks absolutely awful: movements aren't smooth, and everything seems to jerk back and forth as perceived velocities are messed up by the varying presentation intervals...

As a thought experiment, think for a moment how our frame timers / delta time work in games - they're actually all wrong/hacky!!!
What most games do is: measure how much time has passed since the last frame, advance all physics by this amount, then draw all objects. That update/draw work may or may not actually take "delta time" seconds to complete.
What we really want to be using is the time until THIS frame will be displayed, not the time taken by the PREVIOUS frame!
If it completes (and appears on the screen) in less time, then our game will appear to be running too fast. If it completes in more time, our game will appear to be in slow motion. However, the next frame will either slow down or speed up so that overall, game time stays in sync with real time.
Vsync is our friend here, because it tries to force all our frames to take the same amount of time (so previous time == current time, resulting in accurate predictions and smooth overall animation).
If you have a large number of buffered frames, and no vsync, then the length of time that each frame appears on the monitor for can be extremely variable. Some frames may be dropped altogether! This means our time predictions are very wrong (previous != current), so our game will alternate between too fast and too slow on a frame by frame basis, looking "jittery".
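
Here's a simplified model of that effect, as a sketch only. It assumes each frame is simulated using the previous frame's duration and then stays on screen for its own duration; `perceivedSpeed` is a made-up helper, not something from a real engine.

```cpp
#include <cstddef>
#include <vector>

// For each frame i >= 1: the simulation advanced by frame i-1's duration, but the
// result is shown for frame i's duration. A ratio above 1 means motion looks too
// fast on that frame, below 1 means it looks too slow, and exactly 1 is smooth.
inline std::vector<double> perceivedSpeed(const std::vector<double>& frameMs) {
    std::vector<double> ratios;
    for (std::size_t i = 1; i < frameMs.size(); ++i) {
        ratios.push_back(frameMs[i - 1] / frameMs[i]);
    }
    return ratios;
}
```

With a steady input such as {16, 16, 16} every ratio is 1. With alternating 15ms/1ms frames the ratios swing between 15 and 1/15, which is the "too fast, too slow" jitter, even though total game time still matches real time.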

Triple buffering with vsync is very nice for anti-jittering, as it allows you to even out your costs over a few frames. E.g. if you've got most frames taking 10ms, but one takes 20ms, the extra buffering can allow the GPU to still keep displaying one frame every 16ms. But as above, it results in an extra frame's worth of latency between the user pressing a button and seeing the result, which is fine for many games, but not for a competitive FPS.

Another disadvantage is RAM usage. An extra (uncompressed) 1080p buffer is quite a decent number of megabytes!
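
For a rough figure, assuming 4 bytes per pixel (e.g. 8-bit RGBA; your actual backbuffer format may differ):

```cpp
#include <cstddef>

// Size in bytes of one uncompressed colour buffer.
constexpr std::size_t bufferBytes(std::size_t width, std::size_t height,
                                  std::size_t bytesPerPixel) {
    return width * height * bytesPerPixel;
}

// Convenience conversion to mebibytes.
constexpr double toMiB(std::size_t bytes) {
    return bytes / (1024.0 * 1024.0);
}
```

`bufferBytes(1920, 1080, 4)` is 8,294,400 bytes, or roughly 7.9 MiB per extra buffer.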

Yes, D3D is capable: you just create the extra backbuffers and use the appropriate present calls.

#5148254 Custom view matrices reduces FPS phenomenally

Posted by Hodgman on 19 April 2014 - 06:12 PM

whereas the GPU will need to do it per vertex.

If your graphics drivers are any good, it will only do the multiplication once. If your driver can't perform this optimization, switch gpu vendor.
I'd love to see proof of this. In my experience, if you ask the GPU to perform operations on uniforms per vertex/pixel, then the GPU will do so. The only "preshaders" that I've seen that are reliable are ones that modify your shader code ahead of time and generate x86 routines for patching your uniforms...
Anyway, even if this does work on 1 vendor, you're just playing into the hands of their marketing department by deliberately writing bad code that's going to (rightfully) run slow for two thirds of your users, and act as a marketing tool for one vendor :(

#5148020 Keyboards that don't Key

Posted by Hodgman on 18 April 2014 - 06:16 PM

Yeah, PCs used to beep when you'd hold these impossible combinations, but my last 2 PCs haven't. Is that just because motherboard speakers have been phased out?

Expensive typist or gaming keyboards won't have this flaw, but common ones will. Not too much you can do but allow key bindings to be customized.

#5148012 References or Pointers. Which syntax do you prefer?

Posted by Hodgman on 18 April 2014 - 05:36 PM

A common style that I've seen is const-references for in-params (which should act like pass-by-value, but are expensive to copy) and pointers for all out params.
The rationale behind this is that it makes out-params very obvious at the call-site, similar to how in other languages the caller must use an 'out' keyword.
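
A minimal sketch of that style (`findLongest` is just an invented example function):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// In-param: const reference (expensive to copy, never modified).
// Out-params: pointers, so the caller has to write '&' at the call site.
// Returns false if there is nothing to report or an out-param is null.
inline bool findLongest(const std::vector<std::string>& words,
                        std::string* outWord, std::size_t* outIndex) {
    if (words.empty() || outWord == nullptr || outIndex == nullptr) {
        return false;
    }
    std::size_t best = 0;
    for (std::size_t i = 1; i < words.size(); ++i) {
        if (words[i].size() > words[best].size()) {
            best = i;
        }
    }
    *outWord = words[best];
    *outIndex = best;
    return true;
}
```

At the call site, `findLongest(words, &longest, &index)` makes it obvious which arguments get written to, whereas with non-const references all three would look the same.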

It'd be nice if the syntax were a little cleaner in C++, but then it'd also be nice if owning pointers (std::unique_ptr) were a built-in feature like in Rust. C++ is not for people afraid of typing or syntax spew. :P

That would completely kill C++ as being a pay-for-what-you-use / opt-in language. Most embedded systems that I've worked on still use raw-pointers, or a smart pointer that acts like a raw one, but only performs leak detection during development.

#5147998 Speed - Texture Lookups and Structured Buffers

Posted by Hodgman on 18 April 2014 - 04:15 PM

Your buffer method has about double the floats, so double the memory bandwidth of the texture method. The fact that it also ran in around double the time is an indication that your shader is bottlenecked by memory bandwidth.
Try to reduce your memory requirements as much as possible - e.g. 3x8bit normals and 16 or 24 bit depth ;)

The more recent tiled/clustered deferred renderers improve bandwidth by shading more than 1 light at a time -- i.e. they'll read the gbuffer, shade 10 lights, add them and return the sum, thus amortizing the gbuffer read and ROP/OM costs.

#5147855 Problem with 32-bit PNG

Posted by Hodgman on 18 April 2014 - 05:06 AM

With the last parameter to stbi_load, you're telling it that you only want a 3-channel result, regardless of the source data. That parameter should be zero to say that you want however many are in the file.

At the moment, you're branching based on how many channels were in the file, but your pixel data is always being converted to 3-channel for all cases (leading to GL trying to interpret RGB as RGBA -- giving you pixels containing RGBR GBRG BRGB...)
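
The rule can be sketched without the library itself. The helper below only mirrors the behaviour of stbi_load's last (desired channels) parameter as described above; `PixelFormat` is a hypothetical stand-in for whichever GL format enums you pass to your texture upload call.

```cpp
// Number of channels in the pixel data you get back: a non-zero request
// overrides whatever the file contains, zero means "as stored in the file".
constexpr int channelsInReturnedPixels(int channelsInFile, int desiredChannels) {
    return desiredChannels != 0 ? desiredChannels : channelsInFile;
}

// Hypothetical stand-in for the texture format chosen at upload time.
enum class PixelFormat { Grey, GreyAlpha, RGB, RGBA, Unknown };

// Pick the upload format from the channel count of the *returned* data.
constexpr PixelFormat formatFor(int channels) {
    return channels == 1 ? PixelFormat::Grey
         : channels == 2 ? PixelFormat::GreyAlpha
         : channels == 3 ? PixelFormat::RGB
         : channels == 4 ? PixelFormat::RGBA
                         : PixelFormat::Unknown;
}
```

For a 32-bit PNG loaded with a request of 3, `channelsInReturnedPixels(4, 3)` is 3, so the data has to be uploaded as RGB. Branching on the file's channel count (4) is what produces the RGBR GBRG pattern. Pass 0 and the two numbers agree.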

#5147850 Do you have any tips to share on making/creating a level editor?

Posted by Hodgman on 18 April 2014 - 04:40 AM

Sony recently open sourced a tonne of code for making editors here: https://github.com/SonyWWS/ATF

As for language, at the last 4 jobs I've had, it's been C++ for the engine and C# (mostly) for the tools because productivity is generally more important than optimization for tools.

#5147832 Globals

Posted by Hodgman on 18 April 2014 - 01:22 AM

I haven't used Unreal since 2.5, but their code used to be "typical C++ bullshit", incorrect-OO everywhere bad code ;P

Just because a building is popular, or had a lot of tenants, it doesn't mean that the foundations are sound or that renovations would be easy ;)

#5147831 Preparing Textures to Reduce Shimmering

Posted by Hodgman on 18 April 2014 - 01:16 AM

The way that you generate your mipmaps also has an impact. Standard mipmapping should completely eliminate texture shimmering, at the cost of blurring details. Some art is configured to use a filter other than bilinear during generation in order to preserve details a bit more.

#5147779 10-bit Monitors

Posted by Hodgman on 17 April 2014 - 07:37 PM

IIRC, you might have to use a Direct3DDevice9Ex instead of a Direct3DDevice9.

#5147737 Component Entity and Data-Orientated Design

Posted by Hodgman on 17 April 2014 - 03:36 PM

1- duplicate data where required. Nothing wrong with a physics position and a render position.

2- sort your pools using a common ID (e.g. the entity ID), so that the iteration order of multiple pools will be the same.
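
Both points together might look something like this. It's a sketch with invented component types, and it assumes each entity has at most one component per pool.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Duplicated data: physics and rendering each keep their own position.
struct PhysicsComponent { std::uint32_t entity; float x; };
struct RenderComponent  { std::uint32_t entity; float x; };

// Orders a pool by entity ID so that all pools iterate in the same order.
template <typename T>
void sortByEntity(std::vector<T>& pool) {
    std::sort(pool.begin(), pool.end(),
              [](const T& a, const T& b) { return a.entity < b.entity; });
}

// Walks both sorted pools in lockstep and copies physics positions into the
// render components of entities that appear in both. Returns how many matched.
inline std::size_t syncPositions(const std::vector<PhysicsComponent>& physics,
                                 std::vector<RenderComponent>& render) {
    std::size_t p = 0, r = 0, synced = 0;
    while (p < physics.size() && r < render.size()) {
        // Precondition check: IDs must be non-decreasing in both pools.
        assert(p == 0 || physics[p - 1].entity <= physics[p].entity);
        assert(r == 0 || render[r - 1].entity <= render[r].entity);
        if (physics[p].entity < render[r].entity) {
            ++p;  // entity has physics but no render component
        } else if (render[r].entity < physics[p].entity) {
            ++r;  // entity has a render component but no physics
        } else {
            render[r].x = physics[p].x;
            ++p; ++r; ++synced;
        }
    }
    return synced;
}
```

Because both pools are sorted by the same key, the sync is a single linear pass over contiguous memory with no per-entity lookups.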

#5147341 Shadowsampler comparison

Posted by Hodgman on 16 April 2014 - 05:28 AM

Yep. Probably good to have a few different shaders for low-end vs high-end GPUs.
But, 12 HW-PCF samples is actually 48 texels worth of data, so it's pretty good value ;)