Member Since 14 Feb 2007
Offline Last Active Today, 07:15 AM

#5148950 Vertex shader deformation and normals

Posted by Hodgman on Today, 07:17 AM

You need to know the slope of your procedural function at that position (and convert that slope into a normal).


For some functions, you might be able to analytically solve this problem -- rearrange the math to tell you what the derivative of that function is.

For other functions, you can solve it numerically by computing the same function 3 times, once as you have it, once with a small x offset, and once with a small z offset. Then compare the 3 heights, which tells you the slope.
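Both approaches can be sketched on the CPU in C++. This is only an illustration: `height` is a made-up stand-in for your procedural function (sin(x)*cos(z) here), and `Vec3`, `numericalNormal` and `analyticNormal` are hypothetical names. The same arithmetic carries over to a vertex shader.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Hypothetical procedural height function; stands in for whatever the shader evaluates.
inline float height(float x, float z) { return std::sin(x) * std::cos(z); }

inline Vec3 normalize(Vec3 v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    assert(len > 0.0f);
    return { v.x / len, v.y / len, v.z / len };
}

// Numerical approach: evaluate the function three times and compare the heights.
inline Vec3 numericalNormal(float x, float z, float eps = 1e-3f) {
    float h  = height(x, z);        // as you have it
    float hx = height(x + eps, z);  // small x offset
    float hz = height(x, z + eps);  // small z offset
    float dhdx = (hx - h) / eps;    // slope along x
    float dhdz = (hz - h) / eps;    // slope along z
    // For a surface y = h(x, z), the (unnormalized) normal is (-dh/dx, 1, -dh/dz).
    return normalize({ -dhdx, 1.0f, -dhdz });
}

// Analytic approach: differentiate sin(x)*cos(z) by hand.
inline Vec3 analyticNormal(float x, float z) {
    float dhdx =  std::cos(x) * std::cos(z);
    float dhdz = -std::sin(x) * std::sin(z);
    return normalize({ -dhdx, 1.0f, -dhdz });
}
```

The two results should agree to within the error introduced by the offset size; a smaller `eps` reduces the approximation error but increases floating-point noise.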

#5148948 Vmax,Vmin Compute Shader

Posted by Hodgman on Today, 07:11 AM

When you compile HLSL code, it gets turned into D3D bytecode, and then the nVidia driver compiles this into nVidia assembly.

(Or when you compile GLSL, the nVidia driver directly compiles it into nVidia assembly).


If their drivers are any good, then simply write the equivalent code in HLSL/GLSL, and their driver will convert it into the most appropriate internal instructions.

#5148944 Game Engine using SDL 2

Posted by Hodgman on Today, 06:55 AM

You can remove a lot of new/delete usage from your classes, e.g.

class Global
{
public:
	Global(SDL_Window& window, SDL_Renderer& renderer, Logger& logger)
		: input()
		, screen(window, renderer)
		, logger(logger)
		, audio(logger)
		, gfx(window, renderer, logger)
	{
	}

	// Members are constructed in declaration order and destructed in reverse.
	Input input;
	Screen screen;
	Logger& logger;
	AudioManager audio;
	GraphicsManager gfx;
};

This is exactly equivalent to your original code, but removes the need for you to remember to write a destructor that deletes everything (and the chance that you'll forget to add a matching delete for every new)... The member variables will be constructed automatically when a Global object is constructed, and will be destructed automatically when that object is destructed.


You're also not following the rule of three -- if your class has a destructor, then it must also have a copy constructor and an assignment operator (the above re-write avoids all this hassle by just not having an explicit destructor).
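For the cases where you really do need to own a raw resource, here is a minimal sketch of what following the rule of three looks like. `Buffer` is a hypothetical class invented for illustration; it is not from your engine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical class that owns a raw buffer, used only to illustrate the rule.
class Buffer {
public:
    explicit Buffer(std::size_t size) : size_(size), data_(new char[size]()) {}

    // 1. Destructor releases the resource.
    ~Buffer() { delete[] data_; }

    // 2. Copy constructor makes a deep copy instead of sharing the pointer.
    Buffer(const Buffer& other) : size_(other.size_), data_(new char[other.size_]) {
        std::memcpy(data_, other.data_, size_);
    }

    // 3. Copy assignment copies first, then frees the old buffer.
    Buffer& operator=(const Buffer& other) {
        if (this != &other) {
            char* copy = new char[other.size_];
            std::memcpy(copy, other.data_, other.size_);
            delete[] data_;
            data_ = copy;
            size_ = other.size_;
        }
        return *this;
    }

    char* data() { return data_; }
    std::size_t size() const { return size_; }

private:
    std::size_t size_;
    char* data_;
};
```

Without the copy constructor and assignment operator, two `Buffer` objects would end up deleting the same pointer.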

#5148898 When do developers stop working on a game before release

Posted by Hodgman on Today, 01:33 AM

Less than zero in many cases ;)

If a game is running behind schedule, but the publisher has already committed to a certain release date, then you've got to enter the "beta" phase early and fix any bugs that would make Sony/MS/Nintendo refuse to print discs, but not bother fixing gameplay bugs that will annoy players.

Then, once you've got the OK from Sony/MS/Nintendo to print discs (containing a buggy version of the game), you get to work on a "day zero patch" that will be available for download on the release date, and then keep working on the next patch that will finally fix most of the gameplay bugs...


But yeah, outside of these silly situations, I agree that you'd want the game to be "finished" 2-3 months before the release date. After it's finished, Sony/MS/Nintendo might come back to you with some reasons why they're failing your submission, and you'll have to make some emergency bug-fixes before submitting again. If you're doing a worldwide simultaneous release, you'd probably want to add an extra month on to your distribution time to ensure that the physical discs can get everywhere they're needed.

#5148648 Package File Format

Posted by Hodgman on 21 April 2014 - 10:40 PM

This part is read into Ram on startup and kept there.

No split ?
The header/table should only be a few KB, so it's easy to justify 'wasting' RAM on storing that whole table to make loading files easier.

How can you differentiate if it has the same name ?

I don't. If the artists have "level1/concrete.png" and "level2/concrete.png", then the engine tools give an error, asking them to delete or rename one of them.

What's about the compression ? That still a question.

Using zlib or LZMA SDK is pretty common. Use it to compress each individual file, then in the header/table you can store the offset, compressed size and uncompressed size of each file. When loading a file, malloc the uncompressed size, then stream the compressed data off disk and through your decompression library into the malloc'ed buffer.
Unless the user has an SSD or RAM-disk, this should actually be a lot faster than loading uncompressed files! (As long as you've got the spare CPU time to do the decompression)
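A rough sketch of the kind of table described above. The struct layout, field widths and the `FileTable` name are assumptions made for illustration; a real format will pick its own. The actual decompression call (zlib, LZMA SDK, etc.) is deliberately left out.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical table entry; field names and widths are illustrative only.
struct FileEntry {
    std::uint64_t offset;            // where the compressed bytes start in the archive
    std::uint32_t compressedSize;    // number of bytes to stream off disk
    std::uint32_t uncompressedSize;  // number of bytes to allocate for the result
};

// The whole table is small enough to be read once at startup and kept in RAM.
class FileTable {
public:
    void add(const std::string& name, FileEntry entry) { entries_[name] = entry; }

    // Lookup by filename; returns nothing if the archive has no such file.
    std::optional<FileEntry> find(const std::string& name) const {
        auto it = entries_.find(name);
        if (it == entries_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::unordered_map<std::string, FileEntry> entries_;
};
```

To load a file you would look up its entry, allocate `uncompressedSize` bytes, then read `compressedSize` bytes starting at `offset` and feed them through the decompressor into that allocation.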

#5148644 Custom view matrices reduces FPS phenomenally

Posted by Hodgman on 21 April 2014 - 10:03 PM

There are cases where it can be a good idea to keep view-projection and world matrices separate. Say you've got 10k static objects. If you merge these transforms, the CPU has to perform 10k world*viewProj operations and upload the 10k resultant matrices every frame. If they're kept separate, the CPU only has to upload the new viewProj matrix and doesn't have to change any per-object data at all (but of course the GPU now has to do the 10k*numVerts matrix concatenations instead).
The "right" decision depends entirely on the game (and target hardware).

#5148491 Package File Format

Posted by Hodgman on 21 April 2014 - 12:25 AM

In most formats I've seen, the header is the file table ;)
This part is read into RAM on startup and kept there. It lets you perform a lookup by filename and retrieve the offset and size of a file within the archive.

The last engine that I used had paths within filenames (e.g. an asset might be called "foo/bar/baz.type.platform") - so just a flat/non-hierarchical table containing these long names.

On my current engine, I actually ignore paths completely, basically moving all assets into a single directory when building an archive (e.g. the above would just be "baz.type.platform"). This means that you can't have two assets with the same name, but I see this as a positive feature rather than a negative ;-)
During development, assets are stored as individual files (not packaged) so it's easy to make changes to the game. Also, any folder structure can be used during development (during dev, that file might be stored in the "foo/bar" directory, the game doesn't care).
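As a sketch of the "ignore paths, reject duplicates" idea, a build tool could flatten each path and report any clashes. `flattenName` and `findDuplicateNames` are hypothetical helpers written for this example, not code from my engine.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Strip any directory part: "foo/bar/baz.type.platform" -> "baz.type.platform".
inline std::string flattenName(const std::string& path) {
    std::size_t slash = path.find_last_of("/\\");
    return slash == std::string::npos ? path : path.substr(slash + 1);
}

// Returns the flattened names that occur more than once (sorted),
// so the tool can ask someone to delete or rename one of them.
inline std::vector<std::string> findDuplicateNames(const std::vector<std::string>& paths) {
    std::set<std::string> seen;
    std::set<std::string> duplicates;
    for (const std::string& path : paths) {
        std::string name = flattenName(path);
        if (!seen.insert(name).second) duplicates.insert(name);
    }
    return std::vector<std::string>(duplicates.begin(), duplicates.end());
}
```

With "level1/concrete.png" and "level2/concrete.png" in the input, the result contains "concrete.png", which is the point where the tool would raise its error.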

#5148440 C++ std::move() vs std::memcpy()

Posted by Hodgman on 20 April 2014 - 06:36 PM

Are you talking in general, or concerning POD types only?

If in general, then you can't memcpy objects. If specifically POD, then they're likely equivalent.

[edit]i.e. You're not allowed to use memcpy on non-POD objects, so the question only applies to POD objects....
Assuming you're breaking the rules by memcpying non-POD objects, haven't you answered your own question -- any time that the move constructor does any real work (refcounting, swapping pointers, etc), then obviously this work won't be done by a memcpy.
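A small sketch of the difference. `Pod` is a hypothetical type made up for the example. The check uses `std::is_trivially_copyable`, which is the trait the standard actually ties memcpy to; for POD types it is always true.

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <type_traits>
#include <utility>

struct Pod { int id; float weight; };  // hypothetical plain-data type

static_assert(std::is_trivially_copyable<Pod>::value,
              "memcpy is allowed for this type");
static_assert(!std::is_trivially_copyable<std::unique_ptr<int>>::value,
              "memcpy would skip the ownership transfer for this type");

// For plain data, moving is just a copy, so memcpy gives the same result.
inline Pod copyWithMemcpy(const Pod& src) {
    Pod dst;
    std::memcpy(&dst, &src, sizeof(Pod));
    return dst;
}

// For an owning type, the move constructor does real work: it nulls the source
// so that only one object ends up owning (and later deleting) the int.
inline bool sourceIsEmptyAfterMove() {
    std::unique_ptr<int> a(new int(42));
    std::unique_ptr<int> b(std::move(a));
    return a == nullptr && b != nullptr && *b == 42;
}
```

A memcpy of the `unique_ptr` would leave both objects holding the same pointer, and both would try to delete it.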

#5148305 Negatives to Triple Buffering?

Posted by Hodgman on 20 April 2014 - 12:37 AM

Why would it always give an FPS improvement?

All buffering causes input latency. With show-A/draw-B, show-B/draw-A (double buffering), the GPU is one frame behind the CPU, which adds 16.6ms of input lag (at 60Hz).
Triple buffering is show-A/draw-C, show-B/draw-A, show-C/draw-B, meaning the GPU is two frames behind the CPU, creating 33ms of latency (at 60Hz).
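The arithmetic behind those numbers, as a tiny sketch (`bufferingLatencyMs` is a made-up helper name):

```cpp
#include <cassert>

// Latency added when the GPU runs `framesBehind` frames behind the CPU,
// with vsync holding every frame to one refresh interval.
constexpr double bufferingLatencyMs(int framesBehind, double refreshHz) {
    return framesBehind * 1000.0 / refreshHz;
}

// Double buffering: one frame behind at 60Hz, roughly 16.7ms.
// Triple buffering: two frames behind at 60Hz, roughly 33.3ms.
```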

You can minimize this latency by disabling vsync, which means that the 'show/draw' blocks aren't rounded up to 16.6ms... But as you mentioned, this causes horrible jitter.
A constant 60fps/16.6ms per frame looks smooth. Having odd frames take 15ms and even frames take 1ms looks absolutely awful, movements aren't smooth, everything seems to be jerking back and forth as their perceived velocities are messed up by the varying presentation intervals...

As a thought experiment, think for a moment how our frame timers / delta time work in games - they're actually all wrong/hacky!!!
What most games do is measure how much time has passed since the last frame, advance all physics by this amount, then draw all objects. The update/draw work may or may not actually take "delta time" seconds to complete.
What we really want to be using is the time until THIS frame will be displayed, not the time taken by the PREVIOUS frame!
If it completes (and appears on the screen) in less time, then our game will appear to be running too fast. If it completes in more time, our game will appear to be in slow motion. However, the next frame will either slow down or speed up so that overall, game time stays in sync with real time.
Vsync is our friend here, because it tries to force all our frames to take the same amount of time (so previous time == current time, resulting in accurate predictions and smooth overall animation).
If you have a large number of buffered frames, and no vsync, then the length of time that each frame appears on the monitor for can be extremely variable. Some frames may be dropped altogether! This means our time predictions are very wrong (previous != current), so our game will alternate between too fast and too slow on a frame by frame basis, looking "jittery".
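To make the "previous != current" problem concrete, here is a sketch that takes the time each frame actually spent on screen and reports how far off the usual delta-time guess was. `timingErrorsMs` is a hypothetical helper, and treating the first frame as a perfect prediction is an assumption made to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Given how long each frame was actually displayed (in ms), return the per-frame
// error of a game that advances frame i by the duration of frame i-1.
// Positive = simulated more time than was shown (looks too fast),
// negative = simulated less time than was shown (looks like slow motion).
inline std::vector<double> timingErrorsMs(const std::vector<double>& displayedMs) {
    std::vector<double> errors;
    errors.reserve(displayedMs.size());
    for (std::size_t i = 0; i < displayedMs.size(); ++i) {
        // Frame 0 has no previous frame, so assume it predicted perfectly.
        double simulated = (i == 0) ? displayedMs[0] : displayedMs[i - 1];
        errors.push_back(simulated - displayedMs[i]);
    }
    return errors;
}
```

With steady vsync'ed frames every error is zero. With frames alternating between 15ms and 1ms, the errors alternate between +14ms and -14ms, which is the frame-by-frame fast/slow flip that reads as jitter.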

Triple buffering with vsync is very nice for anti-jittering, as it allows you to even out your costs over a few frames. E.g. if you've got most frames taking 10ms, but one takes 20ms, the extra buffering can allow the GPU to still keep displaying one frame every 16ms. But as above, it results in an extra frame's worth of latency between the user pressing a button and seeing the result, which is fine for many games, but not for a competitive FPS.

Another disadvantage is RAM usage. An extra (uncompressed) 1080p buffer is quite a decent number of megabytes!
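As a rough worked figure, assuming a 32-bit (4 bytes per pixel) back buffer with no MSAA, which is an assumption on my part since the format wasn't stated:

```cpp
#include <cassert>
#include <cstdint>

// Size in bytes of one uncompressed colour buffer.
constexpr std::uint64_t bufferBytes(std::uint32_t width, std::uint32_t height,
                                    std::uint32_t bytesPerPixel) {
    return static_cast<std::uint64_t>(width) * height * bytesPerPixel;
}

// 1920 * 1080 * 4 = 8,294,400 bytes, i.e. a little under 8 MiB per extra buffer.
static_assert(bufferBytes(1920, 1080, 4) == 8294400, "1080p at 4 bytes per pixel");
```

Wider formats or multisampling multiply that figure accordingly.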

Yes, D3D is capable of this; you just create the extra backbuffers and use the appropriate present calls.

#5148254 Custom view matrices reduces FPS phenomenally

Posted by Hodgman on 19 April 2014 - 06:12 PM

whereas the GPU will need to do it per vertex.

If your graphics drivers are any good, it will only do the multiplication once. If your driver can't perform this optimization, switch gpu vendor.
I'd love to see proof of this. In my experience, if you ask the GPU to perform operations on uniforms per vertex/pixel, then the GPU will do so. The only "preshaders" that I've seen that are reliable are ones that modify your shader code ahead of time and generate x86 routines for patching your uniforms...
Anyway, even if this does work on 1 vendor, you're just playing into the hands of their marketing department by deliberately writing bad code that's going to (rightfully) run slow for two thirds of your users, and act as a marketing tool for one vendor :(

#5148020 Keyboards that don't Key

Posted by Hodgman on 18 April 2014 - 06:16 PM

Yeah, PCs used to beep when you'd hold these impossible combinations, but my last 2 PCs haven't. Is that just because motherboard speakers have been phased out?

Expensive typist or gaming keyboards won't have this flaw, but common ones will. Not too much you can do but allow key bindings to be customized.

#5148012 References or Pointers. Which syntax do you prefer?

Posted by Hodgman on 18 April 2014 - 05:36 PM

A common style that I've seen is const-references for in-params (which should act like pass-by-value, but are expensive to copy) and pointers for all out params.
The rationale behind this is that it makes out-params very obvious at the call-site, similar to how in other languages the caller must use an 'out' keyword.
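A minimal sketch of that style. `findLongest` is a made-up function for illustration only.

```cpp
#include <cassert>
#include <string>
#include <vector>

// In-param by const reference (expensive to copy, must not be modified);
// out-param by pointer, so the caller has to write '&' at the call-site.
inline bool findLongest(const std::vector<std::string>& words, std::string* outLongest) {
    assert(outLongest != nullptr);
    if (words.empty()) return false;
    *outLongest = words.front();
    for (const std::string& w : words) {
        if (w.size() > outLongest->size()) *outLongest = w;
    }
    return true;
}
```

At the call-site this reads as `findLongest(words, &longest);`, where the `&` plays the same role as an 'out' keyword: you can see which argument gets written to without looking up the declaration.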

It'd be nice if the syntax were a little cleaner in C++, but then it'd also be nice if owning pointers (std::unique_ptr) were a built-in feature like in Rust. C++ is not for people afraid of typing or syntax spew. :P

That would completely kill C++ as being a pay-for-what-you-use / opt-in language. Most embedded systems that I've worked on still use raw-pointers, or a smart pointer that acts like a raw one, but only performs leak detection during development.

#5147998 Speed - Texture Lookups and Structured Buffers

Posted by Hodgman on 18 April 2014 - 04:15 PM

Your buffer method has about double the floats, so double the memory bandwidth of the texture method. The fact that it also ran in around double the time is an indication that your shader is bottlenecked by memory bandwidth.
Try to reduce your memory requirements as much as possible - e.g. 3x8bit normals and 16 or 24 bit depth ;)

The more recent tiled/clustered deferred renderers improve bandwidth by shading more than 1 light at a time -- i.e. they'll read the gbuffer, shade 10 lights, add them and return the sum, thus amortizing the gbuffer read and ROP/OM costs.

#5147855 Problem with 32-bit PNG

Posted by Hodgman on 18 April 2014 - 05:06 AM

With the last parameter to stbi_load, you're telling it that you only want a 3-channel result, regardless of the source data. That parameter should be zero to say that you want however many are in the file.

At the moment, you're branching based on how many channels were in the file, but your pixel data is always being converted to 3-channel for all cases (leading to GL trying to interpret RGB as RGBA -- giving you pixels containing RGBR GBRG BRGB...)
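The RGBR GBRG BRGB pattern is easy to reproduce without any image library. This sketch uses letters in place of colour bytes, and `readPixels` is a hypothetical helper written just for the demonstration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Split tightly packed pixel bytes into pixels of `channels` bytes each,
// the way a consumer that was told "this is N-channel data" would read them.
inline std::vector<std::string> readPixels(const std::string& bytes, std::size_t channels) {
    assert(channels > 0);
    std::vector<std::string> pixels;
    for (std::size_t i = 0; i + channels <= bytes.size(); i += channels) {
        pixels.push_back(bytes.substr(i, channels));
    }
    return pixels;
}
```

Four 3-channel pixels ("RGBRGBRGBRGB") read back correctly as RGB, RGB, RGB, RGB when `channels` is 3, but come out as RGBR, GBRG, BRGB when the reader is told 4. That is the mismatch between the channel count you requested from the loader and the format you hand to GL.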

#5147850 Do you have any tips to share on making/creating a level editor?

Posted by Hodgman on 18 April 2014 - 04:40 AM

Sony recently open sourced a tonne of code for making editors here: https://github.com/SonyWWS/ATF

As for language, at the last 4 jobs I've had, it's been C++ for the engine and C# (mostly) for the tools because productivity is generally more important than optimization for tools.