Member Since 14 Feb 2007

#5183912 Should I wait for OpenGL NG?

Posted by Hodgman on 29 September 2014 - 05:15 PM

GLNG is a complete unknown at this point; it could be out next year, or in five years.
If you're ok with attaching your project to that kind of waiting game, then sure, wait...

You say you don't know GL - do you know any other graphics APIs?
Graphics APIs are a lot like programming languages - learning your first is hard, but learning new ones after that is easy.
If you haven't learned one before, then jump into GL or D3D11 now, so that when GLNG actually exists you'll be able to pick it up quickly.

#5183682 New trends in gaming: voxels are the key to real-time raytracing in near-term...

Posted by Hodgman on 29 September 2014 - 02:26 AM

Also, every year or so someone will bring up a technology where the company, such as Euclideon, keeps claiming it will allow "infinite detail" or "unlimited detail" using voxel-based rendering.
These are basically precomputed, voxel-based octrees that were all the rage in the 1970s. Storage speed and transmission speed have both increased, but still it is only mildly useful in games.  There were many different algorithms in the '70s and '80s over it. Marching Cubes, a moderately efficient voxel isosurface algorithm, was released and patented in 1987. The patent hurt the research field rather painfully until it expired in 2005.

And in today's news...
Holy shit, scanning a religious place, those guys might have got a new logo but WTF their marketing team is missing the basics.

As a side note there's a company near here doing more or less the same thing... except they will give you pretty nice meshes and tons of metadata on request.
Who done what?

The "Unlimited Idiocy" people uploaded a new video, with the same condescending & misleading voice-over from their CEO, so it's time for the idiotic hype train to arrive again.

I'll C&P my response to that article from FB:
They've decided to target the GIS industry, where their tech actually makes sense, after over a decade of failure as a games middleware company. Not looking forward to all the red-herring cries of *HYPE* and "FAKE!" that flood the Internets whenever these snake-oil salesmen poke into the gaming world... Those are red herrings because yes their tech is legit, but no it's not actually that useful for most people. If it was, they wouldn't have failed to sell it to gamedevs for all these years.

If your art generation pipeline is based around laser scanning, your geometry is completely static, you're not already making use of your CPU for gameplay code or whatever, you don't care about using the GPU for rendering (maybe you moved your gameplay code there already), pre-baked lighting and shading is adequate, you have terabytes of storage available, and sub-30Hz "interactive" frame-rates are ok with you... then yeah, hype4dayz...

#5183267 Source control with git hub/lab?

Posted by Hodgman on 27 September 2014 - 04:12 AM

Really? A new branch for every change? Never heard anyone doing that.

That's pretty much the basic rule when using git. You could even go as far as saying "master is for merging, not for developing". Just make it your goal that master is always in a decent state, builds and isn't a messy construction site.
We're using Gerrit at work, which requires all changes that get pushed to be reviewed before they get merged to master (plus, a build is automatically started to verify the project still compiles... in the future, we might make unit tests still passing a requirement as well).

We generally work on our own local master branch day to day, but origin/master (the master branch on the central repo) is kept in a working state.
When you're happy with the state of your own local master branch, you run a script that pushes to a temporary remote branch and notifies the build server. That server then compiles your code and runs the game through a bunch of automatic tests on every platform. If all the compilation and testing passes, then the server pushes your changes into origin/master.
That's basically just a fancy system to ensure that no one pushes to origin/master unless their code actually compiles and has been tested first. You could do the same with pure discipline ;)

Also, if you want to be nice to your co-workers, you clean up your own local master before pushing. e.g. if you've done a series of small commits relating to the same feature, you might use git rebase --interactive to squish a few of them together.
We only really use branches if you're working on a long-running task which can't be committed in parts because that would break things, and multiple people have to collaborate on finishing it.

#5183132 code for doing something while the game is off

Posted by Hodgman on 26 September 2014 - 08:32 AM

Or run a server. The "while closed" logic runs on your server. The game (client) retrieves data from your server when the user plays it.
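
One way the server side of this could look, as a rough sketch: rather than ticking every player continuously, the server stores a last-seen timestamp and catches the state up in one step when the client reconnects. That catch-up approach is my assumption, not something stated above, and `PlayerState`, `applyOfflineProgress` and the 8-hour cap are all hypothetical names/values for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical per-player record stored on the server.
struct PlayerState {
    std::int64_t lastSeenUnixSeconds; // when the server last updated this player
    std::int64_t gold;                // resource that accrues while the game is closed
    std::int64_t goldPerHour;         // accrual rate
};

// Advances the player's state to 'nowUnixSeconds'. Called by the server when the
// client reconnects, so nothing has to run on the player's machine in between.
inline void applyOfflineProgress(PlayerState& p, std::int64_t nowUnixSeconds) {
    const std::int64_t maxOfflineSeconds = 8 * 60 * 60; // assumed design cap
    std::int64_t elapsed = nowUnixSeconds - p.lastSeenUnixSeconds;
    // Clamp to [0, cap] so clock skew or very long absences can't break the economy.
    elapsed = std::max<std::int64_t>(0, std::min(elapsed, maxOfflineSeconds));
    p.gold += p.goldPerHour * elapsed / 3600;
    p.lastSeenUnixSeconds = nowUnixSeconds;
}
```

Because the timestamp comes from the server's clock, the player can't gain anything by changing the clock on their own device.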

#5182800 New game in development

Posted by Hodgman on 24 September 2014 - 10:09 PM

Recruitment threads must be posted in the classifieds section.

#5182766 What kind of performance to expect from real-time particle sorting?

Posted by Hodgman on 24 September 2014 - 06:50 PM

Nope, the point of using bitonic sort is that it can be completely parallel. E.g. In the Wikipedia network diagram, each horizontal line could be its own thread, performing (synchronized) comparisons at the vertical lines, resulting in a sorted buffer at the end-

Here's an explanation of the algorithm from a GPU point of view, but it's old, so their example implementation is a pixel shader, not a compute shader-

I would assume that the nVidia and AMD sample collections would probably include a compute-shader implementation somewhere. If not, a quick google brings up some CUDA/OpenCL ones you could look at.
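
To make the "completely parallel" part concrete, here's a minimal CPU sketch of the bitonic network in C++ (not a compute shader, and not taken from any vendor sample). The function name is made up; the point is that the innermost loop touches disjoint pairs, so each `i` could be one GPU thread with a barrier between passes.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sorts 'keys' ascending with a bitonic sorting network.
// Precondition: keys.size() is a power of two (pad the buffer otherwise).
inline void bitonicSort(std::vector<float>& keys) {
    const std::size_t n = keys.size();
    assert((n & (n - 1)) == 0 && "size must be a power of two");
    for (std::size_t k = 2; k <= n; k <<= 1) {          // size of the sequences being merged
        for (std::size_t j = k >> 1; j > 0; j >>= 1) {  // compare distance for this pass
            // Each iteration below works on its own pair (i, i ^ j); no two pairs in
            // the same pass overlap, which is what lets a GPU run them all at once.
            for (std::size_t i = 0; i < n; ++i) {
                const std::size_t partner = i ^ j;
                if (partner > i) {
                    const bool ascending = (i & k) == 0;
                    if ((keys[i] > keys[partner]) == ascending)
                        std::swap(keys[i], keys[partner]);
                }
            }
        }
    }
}
```

The two outer loops are the sequential part: for n elements there are log2(n) * (log2(n) + 1) / 2 passes, and each pass would be one dispatch (or one barrier-separated step inside a thread group).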

As for the bucket idea - you could try a counting sort / radix sort, which can also be parallelized.
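
A rough C++ sketch of the counting-sort route, assuming each particle's depth has already been quantized into a bucket index (the function name and the back-to-front ordering are my choices for illustration). All three passes have well-known parallel forms: histogram via atomic adds, prefix sum via a parallel scan, then a scatter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns particle indices ordered back-to-front (largest depth bucket first).
// 'bucketOf[i]' is particle i's quantized depth in [0, numBuckets).
inline std::vector<std::uint32_t> countingSortBackToFront(
        const std::vector<std::uint32_t>& bucketOf, std::uint32_t numBuckets) {
    // Pass 1: histogram, indexed by reversed bucket so far particles come first.
    std::vector<std::uint32_t> start(numBuckets + 1, 0);
    for (std::uint32_t b : bucketOf) {
        assert(b < numBuckets);
        ++start[(numBuckets - 1 - b) + 1];
    }
    // Pass 2: prefix sum turns counts into each bucket's first output slot.
    for (std::uint32_t r = 0; r < numBuckets; ++r) start[r + 1] += start[r];
    // Pass 3: scatter every particle index into its bucket's range.
    std::vector<std::uint32_t> order(bucketOf.size());
    for (std::size_t i = 0; i < bucketOf.size(); ++i) {
        const std::uint32_t r = numBuckets - 1 - bucketOf[i];
        order[start[r]++] = static_cast<std::uint32_t>(i);
    }
    return order;
}
```

Particles that land in the same bucket keep their original relative order, so the result is only as accurate as the depth quantization.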

#5182653 Handling Uniform Locations?

Posted by Hodgman on 24 September 2014 - 08:36 AM

I have some old hardware that doesn't have the explicit uniform locations extension (even though it is still good hardware), and i would like to support it. It is a pain to use glGetUniformLocation, it causes so much redundant code to be written. Also the fact that uniforms and such can get optimized out, thus return an invalid value (iirc), which you then have to check for to avoid triggering an opengl error, which just piles onto the redundant code. I was wondering if anyone had any tips or can share how they handle this elegantly?

Just pretend that you're using cbuffers / UBO's anyway, then emulate them using glUniform. Instead of creating actual openGL UBO instances, just malloc some memory instead to emulate them.
Make a struct containing a uniform location, an offset (bytes into a cbuffer structure), and the type (e.g. vec4, vec2...).  
e.g. struct VariableDesc { int location, offset, type; };
For each shader, for each cbuffer, for each variable in the cbuffer, attempt to make one of these "VariableDesc" structures (failing if you get a location of -1). You'll end up with an array of VariableDescs for each cbuffer of each shader.
When you want to draw something with a particular shader (and a set of bound cbuffer/UBO instances), iterate through these arrays. For each VariableDesc item, read the data at the specified offset, call the gl function of the specified type, passing the specified location. e.g.
for( int i=0; i!=shader.numCBuffers; ++i )
  for( int j=0; j!=shader.variableDescs[i].count; ++j ) {
    const VariableDesc& v = shader.variableDescs[i][j];
    const char* data = ((const char*)cbuffer[i]) + v.offset;
    switch( v.type ) {
      case TypeVec4f: glUniform4fv( v.location, 1, (const GLfloat*)data ); break;
      // ...one case per supported type
    } }
Now you can keep your engine simple, pretending that you're using UBOs everywhere, while emulating support for them on crappy old GL2-era GPUs without hard-coding any glUniform calls.
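
A slightly fuller sketch of the same idea, written so it compiles without a GL context: the two callbacks stand in for glGetUniformLocation and the glUniform*fv family, and `MemberInfo`, `buildDescs` and `applyCBuffer` are hypothetical names. The -1 check happens once at load time, so the per-draw loop never sees an optimized-out uniform.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

enum UniformType { TypeVec2f, TypeVec4f };

struct VariableDesc { int location; std::size_t offset; UniformType type; };

// Hypothetical description of one member of a cbuffer struct.
struct MemberInfo { std::string name; std::size_t offset; UniformType type; };

// Builds the VariableDesc array for one shader and one cbuffer layout.
// 'getLocation' stands in for glGetUniformLocation(program, name); members that
// were optimized out report -1 and are simply skipped.
inline std::vector<VariableDesc> buildDescs(
        const std::vector<MemberInfo>& layout,
        const std::function<int(const std::string&)>& getLocation) {
    std::vector<VariableDesc> out;
    for (const MemberInfo& m : layout) {
        const int loc = getLocation(m.name);
        if (loc != -1) out.push_back({loc, m.offset, m.type});
    }
    return out;
}

// Uploads one emulated cbuffer (plain malloc'ed memory) to the bound shader.
// 'upload' stands in for a switch over the type that calls glUniform2fv,
// glUniform4fv, and so on.
inline void applyCBuffer(
        const std::vector<VariableDesc>& descs, const void* cbufferData,
        const std::function<void(int, UniformType, const float*)>& upload) {
    for (const VariableDesc& v : descs) {
        const char* bytes = static_cast<const char*>(cbufferData) + v.offset;
        upload(v.location, v.type, reinterpret_cast<const float*>(bytes));
    }
}
```

In a real renderer you'd run buildDescs once per shader/cbuffer pair after linking, and applyCBuffer once per bound cbuffer before each draw.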

#5182579 Debugging dip in FPS

Posted by Hodgman on 23 September 2014 - 11:25 PM

Are you using mipmaps?

#5182563 Marvelous Designer: Opinions? Workflow Advices?

Posted by Hodgman on 23 September 2014 - 09:32 PM

Then there is the whole thing with the cloth pattern creation. To say the whole thing is giving me headaches is an understatement. I have no idea about cloth sewing in RL, so while creating a basic skirt or shirt might be easy in MD, as soon as it gets more complicated, I start to struggle.

At my last job, the artists ordered actual physical clothes as reference, then cut them apart along the seams to see how they were constructed, and then got stuck into using MD to recreate similar clothing ;) (and then finished up in Maya/ZBrush/etc).

#5182034 Is it possible to apply a variable number of shadow maps in one pass?

Posted by Hodgman on 22 September 2014 - 12:05 AM

You can also pack a large number of shadow maps into a single texture. It's common to create a large (e.g. 2048 x 2048) depth map, and then divide it up into a large number of smaller regions for different shadow maps.

e.g. within that single 2048 texture you could have 2x 1024 maps + 4x 512 maps + 8x 256 maps + 32x 128 maps (or any other combination -- more lights = fewer pixels per light, fewer lights = more pixels per light)...


You could then have an array of view-proj matrices for each light, and an array of texture coordinate min/max values for each light, specifying which part of the large shadow buffer belongs to each light.
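
Here's a minimal sketch of generating those per-light min/max texture coordinates. It assumes square power-of-two shadow maps given largest-first, and places them along a Z-order (quadtree) curve, which packs such a list without gaps; that packing scheme, and the names `AtlasRect` and `packShadowAtlas`, are my own choices for illustration.

```cpp
#include <cassert>
#include <vector>

struct AtlasRect { float minU, minV, maxU, maxV; };

// Extracts every second bit of v (bits 0, 2, 4, ...) into a compact integer.
inline unsigned compactBits(unsigned v) {
    unsigned r = 0;
    for (unsigned bit = 0; bit < 16; ++bit) r |= ((v >> (2 * bit)) & 1u) << bit;
    return r;
}

// Lays out square shadow maps inside a square atlas of 'atlasSize' texels.
// 'tileSizes' must be powers of two sorted largest-first, atlasSize a power of
// two no larger than 32768. Returns one UV rectangle per tile, or an empty
// vector if the tiles do not all fit.
inline std::vector<AtlasRect> packShadowAtlas(unsigned atlasSize,
                                              const std::vector<unsigned>& tileSizes) {
    std::vector<AtlasRect> rects;
    unsigned long long cursor = 0; // texels consumed so far, counted in Z-order
    const unsigned long long capacity = 1ull * atlasSize * atlasSize;
    const float inv = 1.0f / static_cast<float>(atlasSize);
    for (unsigned s : tileSizes) {
        if (cursor + 1ull * s * s > capacity) return {};
        // De-interleave the Z-order index into the tile's top-left texel.
        const unsigned x = compactBits(static_cast<unsigned>(cursor));
        const unsigned y = compactBits(static_cast<unsigned>(cursor >> 1));
        rects.push_back({x * inv, y * inv, (x + s) * inv, (y + s) * inv});
        cursor += 1ull * s * s;
    }
    return rects;
}
```

In the shader, a light's shadow-space UV in [0,1] is then remapped with something like `atlasUV = mix(rect.min, rect.max, uv)` before sampling the big depth texture.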

#5182033 DX12 - Documentation / Tutorials?

Posted by Hodgman on 21 September 2014 - 11:59 PM

  • But if you never use more than the physically available VRAM in a scene, why would you ever need to page data in and out during that scene?
  • Is it not only when new things come into the scene and old stuff leaves the scene we gotta put them in and out of gpu memory? 
  • Does this memory management stuff really increase framerate? Or is it just loading times that get better?
  • Also, what data is actually "paged out", unless the compute shader make some data for the CPU, what is there to "copy out"?

1) On an embedded system / game console -- yep, if you stay below the limits, nothing will be paged in/out.
On a desktop PC, you're not the only user of the GPU. Every single other process that's running is likely using the GPU as well -- Windows is using the GPU to draw the desktop, etc. So, you're actually using a "virtual GPU" -- Windows lets you think you're the only one using it, and behind the scenes the OS has a fancy manager that combines all the "virtual GPU" objects together and lets them all share a single physical GPU. At any time, Windows might have to evict some of your data from VRAM so that it can put some of its own data (or data from another app) in there instead.

2/4) Ignoring the above situation, where on PC we're using a virtualized GPU, shared with many processes... It's only when a new resource is created, or:
When the total resources used by your game is bigger than the available VRAM *and* the set of resources that are referenced by this frame's command buffer contains resources that aren't currently in VRAM -- in this situation, those resources need to be moved into VRAM. In order to do this we probably have to make space by kicking some other resources out of VRAM -- so we have to find some resources that are not present in the current set, and then memcpy them back to main RAM first.

The data that's paged in and out of VRAM could be anything -- the things paged in are whatever is required for this frame, and the things paged out could be anything else currently present in VRAM.
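
As a toy model of that decision (nothing like what a real driver or OS does internally), here's a C++ sketch: given what's resident and what this frame references, work out what has to be paged in and what can be kicked out to make room. `PagingPlan` and `planPaging` are hypothetical, and victims are picked in id order where a real system would use something like least-recently-used.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

struct PagingPlan { std::vector<int> pageIn, evict; };

// 'sizes' maps resource id -> size in bytes, 'resident' is what is in VRAM now,
// 'required' is everything referenced by this frame's command buffer.
inline PagingPlan planPaging(const std::map<int, std::size_t>& sizes,
                             const std::set<int>& resident,
                             const std::set<int>& required,
                             std::size_t vramBytes) {
    PagingPlan plan;
    std::size_t used = 0, needed = 0;
    for (int id : resident) used += sizes.at(id);
    // Anything required but not resident has to be copied into VRAM.
    for (int id : required)
        if (!resident.count(id)) { plan.pageIn.push_back(id); needed += sizes.at(id); }
    // Make room by evicting residents that this frame does not reference.
    for (int id : resident) {
        if (used + needed <= vramBytes) break;
        if (!required.count(id)) { plan.evict.push_back(id); used -= sizes.at(id); }
    }
    return plan;
}
```

If the required set alone is bigger than VRAM, this sketch simply runs out of things to evict; handling that case is outside its scope.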

3) The stuff in (1)/(2) is just about implementing virtual memory on the GPU. This is completely unrelated to performance, it's just a convenience feature of modern computers that allows developers to not care how much physical RAM actually exists.
e.g. on a PC that only has 1GB of physical RAM, you can still write a program containing the statement: new char[2ull*1024*1024*1024] (allocate 2GB; the ull suffix stops the size overflowing a 32-bit int), and it will work, thanks to virtual memory.

Windows (and hence Direct3D/OpenGL) do the same thing on the GPU. If you go over the physical limits, they continue to work, because you're using virtual memory and virtual devices.


On D3D11 and earlier, the stuff in (1)/(2) is completely automatic -- as it's building a command buffer, every time you bind a texture/buffer/etc, D3D internally adds it to a Set<ID3D11Resource*>. When submitting the command list, D3D passes this set of resources down to Windows, so it knows which virtual allocations have to be physically present that frame.

In D3D12, all these nice automatic features are being removed, so it's going to be our responsibility to create this set of required resources ourselves.



This is getting confusing though, because there are actually two things being discussed here. Above is how virtual VRAM works. On D3D11 it's automatic; on D3D12 we'll just have to create the Set<ID3D11Resource*> ourselves (or more like struct VramRegion{ size_t offset; size_t size; }; Set<VramRegion> ...;).


Now forget about virtual memory. Ideally, we ignore the fact that we're on Windows (using a virtual GPU) and we try to ensure that we don't use more RAM than is physically available. In this situation, virtual memory isn't a concern. This is the situation that console game developers are in (or full-screen games on PC with Windows GPU compositing disabled and no background processes using the GPU).


All of the new custom memory management stuff means that you don't have to rely on virtual memory. If the GPU only has 1GB of physical RAM, you can allocate 1GB of RAM and not one byte more. If you want to move stuff in and out of RAM, then instead of relying on Windows doing it, you can implement your own schemes.


It's extremely common for games to implement texture-streaming systems - a level might have 2GB of texture data, but the GPU budget for textures is only 256MB. As you walk around the level, the game engine is changing the resolution of different textures, streaming new data from disc to VRAM depending on which part of the level you're in.

Same with vertex data and model LOD's, etc, etc...

With manual memory management, this kind of stuff is much easier to implement, as well as being more efficient...

e.g. Normally, you create a texture and then start drawing objects with it. Internally, the driver has to insert "Wait" commands before your draw-calls, which will stall the GPU if the texture transfer hasn't yet completed... This really isn't what the game engine wants.

Now, the game engine can explicitly tell the driver that it wants to asynchronously begin transferring the texture, and it would like to receive a notification when this transfer is complete. The engine can then draw objects with a low-res texture in the meantime, and then switch to drawing with the high-res texture once it's actually been transferred. This removes the potential GPU stalls.
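
The engine-side bookkeeping for that can be tiny. A minimal sketch, assuming the API gives you some kind of "transfer complete" notification (the class and its handle type are hypothetical, not D3D12 types):

```cpp
#include <cassert>

// Hypothetical handle type; a real engine would hold API texture objects here.
using TextureHandle = int;

class StreamedTexture {
public:
    StreamedTexture(TextureHandle lowRes, TextureHandle highRes)
        : lowRes_(lowRes), highRes_(highRes), highResReady_(false) {}

    // Called from whatever "transfer complete" notification the API provides.
    // If that arrives on another thread, the flag would need to be atomic.
    void onTransferComplete() { highResReady_ = true; }

    // Never blocks: falls back to the low-res copy until the upload has finished.
    TextureHandle handleForDraw() const { return highResReady_ ? highRes_ : lowRes_; }

private:
    TextureHandle lowRes_;
    TextureHandle highRes_;
    bool highResReady_;
};
```

The draw code only ever asks handleForDraw(), so it never has to wait on the transfer.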

e.g. #2 - with regular ID3D11Texture's etc, you have no control over where in memory your textures get allocated. With manual management, the game engine can pre-reserve a huge block to use for texture streaming. When textures are no longer needed, those areas of the block can be marked as being 'free', and then asynchronous background transfers can be used to implement compaction-based defragmentation of VRAM. A large amount of memory is usually wasted through fragmentation -- having control over how things are allocated, and being able to write your own defragmenter, lets you reclaim this waste and effectively have more available RAM.
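
A toy version of such a block, to show the bookkeeping only: it tracks offsets inside one pre-reserved region and compacts live allocations over the holes. The class is hypothetical, allocation is a simple bump pointer (so holes are only reusable after compaction), and the actual GPU copies that a real defragmenter would schedule are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Allocation { int id; std::size_t offset, size; bool live; };

class StreamingBlock {
public:
    explicit StreamingBlock(std::size_t capacity) : capacity_(capacity), top_(0) {}

    // Bump-allocates from the end of the used area; false if there is no room.
    bool allocate(int id, std::size_t size) {
        if (top_ + size > capacity_) return false;
        allocs_.push_back({id, top_, size, true});
        top_ += size;
        return true;
    }

    // Marks a texture's region as free; the hole stays until compact() runs.
    void release(int id) {
        for (Allocation& a : allocs_) if (a.id == id) a.live = false;
    }

    // Slides live allocations down over the holes and returns the bytes reclaimed.
    // A real implementation would issue an asynchronous copy per moved region.
    std::size_t compact() {
        const std::size_t before = top_;
        allocs_.erase(std::remove_if(allocs_.begin(), allocs_.end(),
                                     [](const Allocation& a) { return !a.live; }),
                      allocs_.end());
        top_ = 0;
        for (Allocation& a : allocs_) { a.offset = top_; top_ += a.size; }
        return before - top_;
    }

    // Returns the current offset of 'id', or the capacity if it is not allocated.
    std::size_t offsetOf(int id) const {
        for (const Allocation& a : allocs_) if (a.id == id && a.live) return a.offset;
        return capacity_;
    }

private:
    std::size_t capacity_, top_;
    std::vector<Allocation> allocs_; // kept in offset order
};
```

Since compaction changes offsets, anything that cached an offset (descriptors, views) has to be refreshed after compact() runs.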

#5181715 How do u build game engines ?

Posted by Hodgman on 20 September 2014 - 09:44 AM

Instead of using an all-in-one game-engine library, you use many libraries like FMOD, PhysX, Direct3D, Lua, RakNet, Rocket, etc...

#5181710 DX12 - Documentation / Tutorials?

Posted by Hodgman on 20 September 2014 - 09:17 AM

Conservative rasterization should allow for some film-quality AA solutions to be developed. It would be possible to store per-pixel, a list of every triangle that in any way intersects with that pixel's bounds. You could then resolve that list in post, sorting and clipping them against each other to get the actual area covered by each primitive.

#5181680 Temporal coherence and render queue sorting

Posted by Hodgman on 19 September 2014 - 10:53 PM

How do you deal with sorting of transparent polygons that belong to the same sub-mesh?

Games usually just ignore this problem altogether!
Occasionally, I've seen people pre-sort their triangles for some small number of directions (e.g. 8) and generate 8 different index buffers for those orderings. When submitting the mesh, you compare the view direction to those 8, and use the index buffer offset of the closest match.
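
Picking the closest match is just a max-dot-product search. A small sketch, assuming normalized directions and that the pre-sorted orderings are stored back to back in one index buffer (the helper names are made up):

```cpp
#include <cassert>
#include <cstddef>

struct Vec3 { float x, y, z; };

inline float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Returns the index of the pre-sorted ordering whose direction is closest to
// 'viewDir', i.e. has the largest dot product. All vectors are assumed normalized.
inline std::size_t closestSortDirection(const Vec3* sortDirs, std::size_t count,
                                        const Vec3& viewDir) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < count; ++i)
        if (dot(sortDirs[i], viewDir) > dot(sortDirs[best], viewDir)) best = i;
    return best;
}

// First index to draw from, when every ordering occupies 'indicesPerOrdering'
// consecutive entries of a shared index buffer.
inline std::size_t firstIndexFor(std::size_t ordering, std::size_t indicesPerOrdering) {
    return ordering * indicesPerOrdering;
}
```

Remember to transform the view direction into the mesh's local space first, since that's the space the orderings were baked in.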

#5181528 Poor STL threads performance

Posted by Hodgman on 19 September 2014 - 07:25 AM

Creating and destroying threads are costly operations. You ideally want to have a small number of permanent threads (about the same number as you have hardware-threads) and keep them busy over the life of your app.
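
A bare-bones sketch of that pattern using only the standard library: the workers are created once and pull jobs from a shared queue until the pool is destroyed. `ThreadPool` is a made-up class, not a standard one, and it skips things a production pool would want (futures, exceptions, work stealing).

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class ThreadPool {
public:
    explicit ThreadPool(unsigned count = std::thread::hardware_concurrency()) {
        if (count == 0) count = 1; // hardware_concurrency() may return 0 if unknown
        for (unsigned i = 0; i < count; ++i)
            workers_.emplace_back([this] { workerLoop(); });
    }

    // Finishes all queued jobs, then joins the workers.
    ~ThreadPool() {
        { std::lock_guard<std::mutex> lock(mutex_); stopping_ = true; }
        wake_.notify_all();
        for (std::thread& t : workers_) t.join();
    }

    void submit(std::function<void()> job) {
        { std::lock_guard<std::mutex> lock(mutex_); jobs_.push(std::move(job)); }
        wake_.notify_one();
    }

private:
    void workerLoop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return; // stopping, and nothing left to run
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job(); // run outside the lock so other workers can keep pulling jobs
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::function<void()>> jobs_;
    std::vector<std::thread> workers_;
    bool stopping_ = false;
};
```

Create one of these at startup and submit work to it every frame, instead of spawning a std::thread per task.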