
#5216787 Dynamic Octree, Loose Octree is the best solution?

Posted by on 16 March 2015 - 12:35 AM

Best solution to what problem?

#5216743 What's the difference between these books?

Posted by on 15 March 2015 - 05:51 PM

I've only read the second book -- it implements an entire (ray-traced) rendering system (including all the filtering, sampling, lighting, etc. algorithms required) in the "literate programming" style. You can get the source code from their site / from GitHub.

#5216587 Stuttering problem - first attempt at fixed timestep game loop

Posted by on 15 March 2015 - 03:31 AM

You're measuring time in milliseconds, so your maximum measurement error is <1ms.
Your update interval is 16ms, so the maximum error as a percentage is (1ms/16ms =) <6.25% of a frame, which is quite large. You could try using a more accurate timer and see if that helps.

To track down stutter issues in my loop, I ended up printf'ing a lot of variables every frame - the current frame number, the number of times update was called, the amount of time simulated, the amount of time actually elapsed, the time left in the accumulator, the time taken by the render function, etc.
Try printing these out, take note of the (approximate) frame number whenever you notice a stutter, and see if you can spot any irregularities, such as stutters corresponding to a sudden spike in the number of updates per frame.
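
Something like this rough sketch of a fixed-timestep loop with per-frame logging (not your code - the names and the 60Hz rate are just placeholders), using a microsecond-precision clock:

#include <chrono>
#include <cstdio>

int main()
{
    using Clock = std::chrono::high_resolution_clock;
    const double dt = 1.0 / 60.0;               // fixed update interval, in seconds
    double accumulator = 0.0;
    Clock::time_point previous = Clock::now();

    for (long frame = 0; frame < 600; ++frame)  // ~10 seconds at 60fps
    {
        Clock::time_point now = Clock::now();
        double elapsed = std::chrono::duration<double>(now - previous).count();
        previous = now;
        accumulator += elapsed;

        int updates = 0;
        while (accumulator >= dt)
        {
            // update(dt);
            accumulator -= dt;
            ++updates;
        }

        Clock::time_point renderStart = Clock::now();
        // render(accumulator / dt);             // interpolation factor
        double renderTime = std::chrono::duration<double>(Clock::now() - renderStart).count();

        // One line per frame: spikes in 'elapsed' or 'updates' stand out immediately.
        std::printf("frame %ld elapsed=%.3fms updates=%d leftover=%.3fms render=%.3fms\n",
                    frame, elapsed * 1000.0, updates, accumulator * 1000.0, renderTime * 1000.0);
    }
    return 0;
}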

#5216579 What are your opinions on DX12/Vulkan/Mantle?

Posted by on 15 March 2015 - 01:57 AM


What is an "IHV"?
Independent hardware vendors - Intel, AMD, nVidia, Qualcomm, PowerVR, etc

#5216403 Questions about GPGPU

Posted by on 14 March 2015 - 01:34 AM

The algorithms you use, and your own knowledge of GPUs, will have 100x more impact on performance than your choice of language.

#5216392 Why learn game programming?

Posted by on 13 March 2015 - 11:29 PM

I was thinking, why learn game programming when I can just jump into an engine like Unreal, Unity or Game Maker and create games really fast?
What advantage do I get from learning C++ and then using SDL to create a game, over using something like Unreal?

Unreal, Unity, etc games still require programming...

#5216221 Array of structs vs struct of arrays, and cache friendliness

Posted by on 13 March 2015 - 01:03 AM

It depends

There's no silver bullet. The rule is: look at the data, look at the processes, and make sure the two fit together.
The SoA version you've shown - an array of x's, an array of y's, etc. - is popular when doing SIMD processing of types that don't natively fit the SIMD width.
e.g. on desktop CPUs we have the SSE instruction set, which can operate on 4 floats at a time.
A Vec3 only has three floats but we can define Vec3 as:
struct Vec3 { float x, y, z, unused; };
And then accelerate all our math with SSE. This isn't perfectly optimal though, as we're only using 3/4ths of the hardware potential.
With your x/y/z array example, you can load 4 x's, 4 y's and 4 z's into 3 registers, and then do math on them with 100% hardware utilization.
Often you'll end up actually using SoAoS or AoSoA like this:

struct FourVecThrees {
  float x[4];
  float y[4];
  float z[4];
};
FourVecThrees data[(numThings+3)/4];

P.S. this is what GPUs do these days, except with something more like SixtyFourVecThrees.
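
e.g. a minimal sketch of using that layout with SSE (self-contained, so it re-declares the struct with explicit 16-byte alignment to keep the aligned loads legal; Add4 is just a made-up name):

#include <xmmintrin.h> // SSE intrinsics

struct alignas(16) FourVecThrees
{
  float x[4];
  float y[4];
  float z[4];
};

// a += b for 4 vectors at once, with all 4 SIMD lanes doing useful work.
inline void Add4(FourVecThrees& a, const FourVecThrees& b)
{
  _mm_store_ps(a.x, _mm_add_ps(_mm_load_ps(a.x), _mm_load_ps(b.x)));
  _mm_store_ps(a.y, _mm_add_ps(_mm_load_ps(a.y), _mm_load_ps(b.y)));
  _mm_store_ps(a.z, _mm_add_ps(_mm_load_ps(a.z), _mm_load_ps(b.z)));
}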

#5216182 What are your opinions on DX12/Vulkan/Mantle?

Posted by on 12 March 2015 - 06:47 PM

- Memory residency management. The presenters were talking along the lines of the developers being responsible for loading/unloading graphics resources from VRAM to System Memory whenever the loads are getting too high. This should be an edge case but it's still an entirely new engine feature.

Yeah it's going to be interesting to see what solutions different engines end up using here.
The simplest thing I can think of is to maintain a Set<Resource*> alongside every command buffer. Whenever you bind a resource, add it to the set. When submitting the command buffer, you can first use that set to notify Windows of the VRAM regions that are required to be resident.

The fail case there is when that residency request is too big... As you're building the command buffer, you'd have to keep track of an estimate of the VRAM residency requirement, and if it gets too big, finish the current command buffer and start a new one.
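
Roughly this kind of thing (a sketch only - Resource, CommandBuffer and the budget check are hypothetical engine-side types, not any real API):

#include <unordered_set>
#include <cstdint>

struct Resource { uint64_t vramSize; /* ... */ };

struct CommandBuffer
{
  std::unordered_set<Resource*> referenced; // every resource bound while recording
  uint64_t residencyEstimate = 0;           // running estimate of VRAM that must be resident

  void OnBind(Resource* r)
  {
    if (referenced.insert(r).second)        // only count each resource once
      residencyEstimate += r->vramSize;
  }

  bool OverBudget(uint64_t budget) const { return residencyEstimate > budget; }
};

// At submit time: hand 'referenced' to the OS/driver residency API, then submit.
// If OverBudget() fires while recording, close this command buffer and start a new one.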

- Secondary threads for resource loading/shader compilation. This is actually a really good thing that I'm excited for, but it does mean I need to change my render thread to start issuing new jobs and maintaining. It's necessary, and for the better good, but another task nonetheless.

If you're using D3D11, you can start working on it now.
If you're on GL, you can start doing it for buffers/textures via context resource sharing... But it's potentially a lot of GL-specific code that you're not going to need in your new engine.

- Root Signatures/Shader Constant management
Again really exciting stuff, but seems like a huge potential for issues, not to mention the engine now has to be acutely aware of how frequently the constants are changed and then map them appropriately.

Yeah if you can give frequency hints in your shader code, it might make your life easier.

When compiling a shader, I imagine you'd first try to fit all of its parameters into the root, and then fall back to other strategies if they don't fit.

The simplest strategy is putting everything required for your shader into a single big descriptor set, and having the root just contain the link to that set. I imagine a lot of people might start with something like that to begin with.

I don't have an update-frequency hinting feature, but my shader system does already group texture/buffer bindings together into "ResourceLists".
e.g. A DX11 shader might have material data in slots t0/t1/t2 and a shadowmap in t3. In the shader code, I declare a ResourceList containing the 3 material textures, and a 2nd ResourceList containing the shadowmap.
The user can't bind individual resources to my shader, they can only bind entire ResourceLists.
I imagine that on D3D12, these ResourceLists can actually just be DescriptorSets, and the root can just point to them.
So, not describing frequency, but at least describing which bindings are updated together.
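
To make that concrete, a rough sketch of the idea (ResourceList/ShaderInstance are placeholders for my own engine wrappers, not a real API):

#include <vector>

struct Texture;

// A group of bindings that are always updated together.
// On D3D11 this maps to a contiguous range of t# slots; on D3D12 the same group
// could become one descriptor set/table, with the root just holding a pointer to it.
struct ResourceList
{
  std::vector<Texture*> textures;
};

struct ShaderInstance
{
  // e.g. list 0 = the 3 material textures (t0-t2), list 1 = the shadowmap (t3).
  // Users bind whole lists, never individual slots.
  void BindResourceList(int listIndex, const ResourceList& list);
};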

I'll also be adding in architecture for Compute Shaders for the first time, so I'm worried that I might be biting off too much at once.

Yeah, I haven't done a robust compute wrapper before either. So far I'm doing the same stateless-job kind of thing as I've already done for graphics.
With the next-generation APIs there are a few extra hassles with compute -- after a dispatch you almost always have to submit a barrier, so that the next draw/dispatch call will stall until the preceding compute shader has actually completed.

The same goes for passes that render to a render-target. e.g. In a post-processing chain (where each draw reads the results of the previous one) you need barriers after each draw to transition from RT to texture, which has the effect of inserting those necessary stalls.
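
On D3D12 that RT-to-texture transition boils down to roughly this (a sketch; 'target' and 'commandList' stand for whatever resource the previous pass wrote to and the ID3D12GraphicsCommandList you're recording into):

// Transition the previous pass's render target so the next pass can sample it.
D3D12_RESOURCE_BARRIER barrier = {};
barrier.Type                   = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
barrier.Transition.pResource   = target; // ID3D12Resource* written by the previous draw
barrier.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
barrier.Transition.StateBefore = D3D12_RESOURCE_STATE_RENDER_TARGET;
barrier.Transition.StateAfter  = D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE;
commandList->ResourceBarrier(1, &barrier);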

I think a lot of code is going to get much cleaner as a result of all this. A lot of really gross batching and state management/filtering code is just going to go away.

For simple ports, you might be able to leverage that ugly code :D
In the D3D12 preview from last year, they mentioned that when porting 3DMark, they replaced their traditional state-caching code with a PSO/bundle cache, and still got more than a 2x performance boost over DX11.

I think that the people who are going to have the hardest time of it are those who have the heaviest investment in what's become a traditional API usage over the past few years: lots of batching and instancing, in other words.

Stuff that's designed for traditional batching will probably be very well suited to the new "bundle" API.

I am a bit concerned about sync issues. Sync between CPU and GPU (or even the GPU with itself) can lead to some really awful, hard-to-track down bugs. It's bad because you might think that you're doing it right, but then you make a small tweak to a shader and suddenly you have artifacts.

Here's hoping the debuggers are able to detect sync errors. The whole "transition" concept, which is a bit more abstracted than the reality, should help debuggers here. Even if the debugger can just put its hands up and say "you did *something* non-deterministic in that frame", then at least we'll know our app is busted.

#5216166 Map Buffer Range Super Slow?

Posted by on 12 March 2015 - 05:22 PM

A buffer sized for about 3 frames worth of data is generally safe enough. You can map with unsynchronized, append, and when the buffer fills you just begin again at the start of the buffer without needing to fence or orphan. I'd still feel safer adding a fence just in case you hit a non-typical frame.

Never do this without fences! You can't assume a maximum of two frames of latency unless you use a fence to enforce it.
You're gambling that your users won't be GPU-bottlenecked, and that their driver won't try to "help" by adding extra command buffering -- betting graphical corruption against the prize of not having to make a handful of API calls at the end of each frame. That's not a good wager.

The simplest solution is to place a fence at the end of every frame and have the CPU block on the previous frame's fence (max 1 frame of latency) or the fence from two frames ago (max 2 frames of latency).

You can then safely size your buffers to be max_size_required_per_frame * (max_latency + 1), and otherwise safely make assumptions about max latency.
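
A minimal sketch of that fence-per-frame pattern in GL (assuming a max latency of 2 frames and ignoring error handling):

static const int kMaxLatency = 2;
static GLsync g_frameFence[kMaxLatency] = {};
static int    g_frameIndex = 0;

void EndFrame()
{
    // Block until the frame submitted kMaxLatency frames ago has fully completed.
    if (g_frameFence[g_frameIndex])
    {
        glClientWaitSync(g_frameFence[g_frameIndex], GL_SYNC_FLUSH_COMMANDS_BIT, 1000000000ULL); // 1s timeout
        glDeleteSync(g_frameFence[g_frameIndex]);
    }
    // Fence marking the end of this frame's commands.
    g_frameFence[g_frameIndex] = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    g_frameIndex = (g_frameIndex + 1) % kMaxLatency;

    // Any ring-buffer space used by that old frame is now safe to overwrite.
}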

BTW, many games have done this for a long time, just to put a limit on buffered GPU frames, as more buffering results in more input lag. FPS or "twitchy" games often limit themselves to one frame's latency, which also has the bonus of reducing the memory requirements of your ring buffers.

#5216014 How do you stop your game from being stolen/torrented?

Posted by on 12 March 2015 - 03:39 AM

Steam gives you some DRM solutions built in. If you only sell via steam, you can just rely on those.

For an online game, you can authenticate all the users when they talk to your servers. This way, pirates will be locked out of the online portions of the game.
Quite a few devs are making their single-player games actually reliant on online servers, just to make piracy harder. e.g. one dev I work with doesn't ship most of their code with the game; instead, an online server streams scripts to clients on demand...
That kind of thing really annoys players though...

#5216000 Saving old gamestates

Posted by on 12 March 2015 - 02:16 AM

Games like Quake, Half-Life and Counter-Strike don't just save the previous game state - they keep more like the previous 20+ of them!

Planetary Annihilation saves every single game state in the entire match, so that you can rewind to any point in time.

So yes, it's feasible.
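
If it helps, a minimal sketch of keeping the last N states in a ring buffer (GameState and the history length are just placeholders):

#include <array>

struct GameState { /* positions, velocities, inputs, tick number, ... */ };

constexpr int kHistory = 32; // keep the last 32 snapshots

struct StateHistory
{
    std::array<GameState, kHistory> states;
    long newestTick = -1;

    void Push(long tick, const GameState& s) { states[tick % kHistory] = s; newestTick = tick; }

    // Only valid for the most recent kHistory ticks.
    const GameState& Get(long tick) const    { return states[tick % kHistory]; }
};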

#5215811 What IDEs are recommended for Lua and/or Python?

Posted by on 11 March 2015 - 04:34 AM

For Lua: I use VS with a Lua language pack for writing it (because I already have VS open when writing my C++ code), and Tilde for writing/debugging it.

#5215785 Map Buffer Range Super Slow?

Posted by on 11 March 2015 - 12:36 AM

Also around orphaning (glBufferData with NULL) and the GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT | GL_MAP_INVALIDATE_BUFFER_BIT flag combo, are these valued as doing the same thing? In the past I have had some issues with graphical artifacts when only trying the GL_MAP_INVALIDATE_BUFFER_BIT flag, but cleared them up using orphaning so I'm unsure if I was doing it correctly

No, they're completely different things.

Orphaning allocates an entirely new buffer for you to write new data into, and any new draw commands will reference this new buffer.

Any existing draw commands which have already been submitted (but haven't yet been consumed by the GPU) will still use the old allocation, with the old data, so there's no chance of graphical corruption. After all those commands are executed by the GPU, the driver will garbage-collect this orphaned allocation.


Unsynchronized mapping just gives you a pointer to the existing allocation for that buffer, with zero synchronization or safety. You're making a sacred promise to the driver that you will not overwrite any part of the data that could potentially be used by existing draw commands. You need to implement your own ring-buffer or similar allocation strategy, and use GL fences/events to tell when it's safe to overwrite different parts of the buffer.
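
Side by side, the two patterns look roughly like this (a sketch; vbo, the sizes and ringOffset are values you'd manage yourself, and with unsynchronized maps you also need the fence scheme discussed above):

// Orphaning: ask the driver for a fresh allocation; any already-submitted draws keep the old one.
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, bufferSize, NULL, GL_STREAM_DRAW); // orphan the old storage
void* dst = glMapBufferRange(GL_ARRAY_BUFFER, 0, dataSize,
                             GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
// ... write data, then glUnmapBuffer(GL_ARRAY_BUFFER)

// Unsynchronized: same allocation, no safety net - you must guarantee (via your own
// ring-buffer offsets plus fences) that this range isn't still in use by the GPU.
glBindBuffer(GL_ARRAY_BUFFER, vbo);
void* dst2 = glMapBufferRange(GL_ARRAY_BUFFER, ringOffset, dataSize,
                              GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT);
// ... write data, then glUnmapBuffer(GL_ARRAY_BUFFER)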


From your timing data, we can guess that the extra GPU memory allocation management involved in orphaning is costing you about 10μs per buffer per frame... which is pretty good!

#5215624 How do I know if I'm an intermediateprogramming level?

Posted by on 10 March 2015 - 04:31 AM

[Mod note: hid the last three posts which were the passive aggressive beginning of a flame-thrower war]

#5215580 Questions about GPGPU

Posted by on 09 March 2015 - 10:52 PM

I know compute shaders use HLSL, Nvidia has CUDA etc and so on. My question is can C++ be made to run on GPU?

Microsoft has a language extension called C++ AMP, which allows you to write C++ code where some parts run on the CPU and some parts run on the GPU.
It's designed for people writing C++ programs that crunch a large amount of data, who want an easy way to take advantage of GPU power available in their PCs.
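
For a taste of it, something like this minimal sketch (squaring every element; it only builds with Visual C++, since AMP is an MS extension):

#include <amp.h>
#include <vector>
using namespace concurrency;

void square_all(std::vector<float>& data)
{
    array_view<float, 1> av(static_cast<int>(data.size()), data);
    // The lambda body runs on the GPU (or a fallback accelerator).
    parallel_for_each(av.extent, [=](index<1> i) restrict(amp)
    {
        av[i] = av[i] * av[i];
    });
    av.synchronize(); // copy results back to 'data'
}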

It's not designed for use in games.

If yes, will it be faster than HLSL and CUDA ?