
Matias Goldberg

Member Since 02 Jul 2006

#5215030 What are your opinions on DX12/Vulkan/Mantle?

Posted by Matias Goldberg on 06 March 2015 - 03:50 PM

I feel like to fully support these APIs I need to almost abandon the previous APIs support in my engine since the veil is so much thinner, otherwise I'll just end up adding the same amount of abstraction that DX11 does already, kind of defeating the point.

But it depends. For example if you were doing AZDO OpenGL, many of the concepts will already be familiar to you.
However, for example, AZDO never dealt with textures as thin as Vulkan or D3D12 do so you'll need to refactor those.
If you weren't following AZDO, then it's highly likely that the way you were using the old APIs is incompatible with the new ones.

Actually there are ways to do a kind of multithreading in OpenGL 4: (...). There is also glBufferStorage + IndirectDraw, which allows you to access a buffer of instanced data that can be written like any other buffer, e.g. concurrently.
But it's not as powerful as Vulkan or DX12, which allow you to issue any command, not just instanced ones.

Actually, DX12 & Vulkan are following exactly the same path glBufferStorage + IndirectDraw did. It just got easier, was made thinner, and can now handle other misc aspects from within multiple cores (texture binding, shader compilation, barrier preparation, etc.).
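To make the glBufferStorage + IndirectDraw idea concrete, here is a minimal sketch (function names and draw parameters are mine, purely illustrative): the indirect command layout is a plain struct fixed by the GL spec, so worker threads can fill disjoint ranges of a command array concurrently with no locking, and the render thread submits the whole array with a single glMultiDrawElementsIndirect. The actual GL calls need a context and are reduced to comments here.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Layout mandated by the GL spec for glMultiDrawElementsIndirect.
struct DrawElementsIndirectCommand {
    uint32_t count;         // index count per draw
    uint32_t instanceCount;
    uint32_t firstIndex;
    uint32_t baseVertex;
    uint32_t baseInstance;
};

// Each worker fills a disjoint slice of the command array; no locking
// is needed because the ranges never overlap.
void fillCommands(std::vector<DrawElementsIndirectCommand>& cmds,
                  size_t begin, size_t end) {
    for (size_t i = begin; i < end; ++i) {
        cmds[i] = { 36u, 1u,                     // e.g. one cube, one instance
                    uint32_t(i * 36), 0u,        // firstIndex, baseVertex
                    uint32_t(i) };               // baseInstance = per-draw id
    }
}

std::vector<DrawElementsIndirectCommand> buildCommands(size_t numDraws,
                                                       size_t numThreads) {
    std::vector<DrawElementsIndirectCommand> cmds(numDraws);
    std::vector<std::thread> workers;
    const size_t per = (numDraws + numThreads - 1) / numThreads;
    for (size_t t = 0; t < numThreads; ++t) {
        const size_t begin = t * per;
        const size_t end = std::min(numDraws, begin + per);
        if (begin >= end) break;
        workers.emplace_back(fillCommands, std::ref(cmds), begin, end);
    }
    for (auto& w : workers) w.join();
    // In real code: memcpy into a (persistently mapped) GL_DRAW_INDIRECT_BUFFER
    // and issue one glMultiDrawElementsIndirect for the whole array.
    return cmds;
}
```

The point of the sketch is that the expensive per-draw work becomes plain memory writes, which parallelize trivially; only the final submission stays on the render thread.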

The rest was covered by Promit's excellent post.

#5214737 Vulkan is Next-Gen OpenGL

Posted by Matias Goldberg on 05 March 2015 - 08:36 AM

subjecting yourself to the tortures that OpenGL driver writers had to endure for so long (and still will unless they got promoted).
The OpenGL API is significantly flawed, which is specifically why these kinds of major upgrades have been requested for so long.


That might be fun as a pet project but otherwise I don’t see the point(...)

IMO the point is that instead of having one GL implementation per vendor, we could have just one running on top of Vulkan. So if it doesn't work on my machine due to an implementation bug, I can at least be 90% certain it won't work on your machine either.
In principle it's no different from ANGLE which translates GL calls and shaders into DX9.
However ANGLE is limited to ES2/WebGL-like functionality and DX9 is a high level API with high overhead; while running on top of Vulkan could deliver very acceptable performance and support the latest GL functionality.

#5214490 Vulkan is Next-Gen OpenGL

Posted by Matias Goldberg on 04 March 2015 - 08:57 AM

THIS. A lot of people don't seem to get that these are very low level APIs with a focus on raw memory manipulation and baking of objects/commands that are needed very frequently. You destroyed a texture while it was still in use?

Come on, times have changed. Current game engines use multithreading, and multithreading is one of the best ways to kill your game project, yet people still manage to code games :)

It's not really the same. Multithreading problems can be debugged and there's a lot of literature and tools to understand them.
It's much harder to debug a problem that locks up your entire system every time you try to analyze it.

I'm currently at the state of handling many things by buffers and in the application itself and that with OGL2.1 (allocate buffer, manage double/triple buffering yourself, handling buffer sync yourself etc.). Most likely I use only a few % of the API at all. I think that a modern OGL architecture (AZDO, using buffers everywhere including UBOs etc) will be close to what you could expect from vulkan and that if they expose some vulkan features as extensions (command buffer), then switching over to vulkan will not be a pain in the ass.

If you're already doing AZDO with explicit synchronization then you will find these new APIs pleasing indeed. However, there are breaking changes, like how textures are loaded and bound. Since there's no hazard tracking, you can't issue a draw call that uses a texture until it is actually in GPU memory. Drivers also used to handle residency for you; since they no longer do, out-of-GPU-memory errors can be much more common unless you write your own residency solution.
Then, in the case of D3D12, there's PSOs, which fortunately you should be already emulating them for forward compatibility.

Indeed, professional developers won't have many problems; whatever annoyance they may have is obliterated by the happiness from the performance gains. I'm talking from a rookie's perspective.

#5214486 Litterature about GPU architecture ?

Posted by Matias Goldberg on 04 March 2015 - 08:39 AM

Perhaps this is a bit of shameless self-promotion, but I talked a bit about memory operations on modern hardware; it may be of interest to you.

They're a bit outdated, but the ATI Radeon 2000 programming guide and Depth In Depth from Emil Persson explain a lot of background concepts that are still relevant today (Hi Z, Z Compression, Early Z, Fast Z Clear, dynamic branching and divergence).
Watching his two recent talks on modern archs is also useful for spotting the differences.

#5214367 Vulkan is Next-Gen OpenGL

Posted by Matias Goldberg on 03 March 2015 - 11:03 PM

Remember, Vulkan is going to be a huge pain in the ass compared to GL. The Vulkan API is _much_ cleaner, yes, but it also eschews all the hand-holding and conveniences of GL and forces you to manage all kinds of hardware state and resource migration manually. Vulkan does not _replace_ OpenGL; it simply provides yet another alternative.

The same is true in Microsoft land: D3D11.3 is being released alongside D3D12, bringing the new hardware features to the older API because the newer API is significantly more complicated to use due to the greatly thinner abstractions; it's expected that the average non-AAA developer will want to stick with the older, easier APIs.

THIS. A lot of people don't seem to get that these are very low level APIs with a focus on raw memory manipulation and baking of objects/commands that are needed very frequently. You destroyed a texture while it was still in use? BAM! Graphics corruption (or worse, BSOD). You wrote to a constant buffer while it was still in use? Let the random jumping of objects begin! You manipulated the driver buffers and had an off-by-1 error? BAM! Crash or BSOD. Your shader has a loop and is reading the count from uninitialized memory? BAM! TDR kicks in or the system becomes highly unresponsive.
You need to change certain states more frequently than you thought? Too bad, turns out you need to make some architectural modifications to do what you want efficiently.

It's hard. But I love it; with great power comes great responsibility. None of this is a show-stopper for people used to low level programming. But it is certainly not newbie friendly like D3D11 or GL were (if you considered those newbie friendly). Anyway, a lot of people learned hardcore programming back in the DOS days when it was a wild west. So maybe this is a good thing.

#5213319 Render Queue Design

Posted by Matias Goldberg on 27 February 2015 - 08:41 AM

You seem to be missing the base theory on which L. Spiro built his posts/improvements.

The article Order your draw calls around from 2008 should shed light on your questions.

#5213316 does g_p2DTex->SetResource(); moves GPU memory?

Posted by Matias Goldberg on 27 February 2015 - 08:23 AM

It just changes pointers.


But what happens inside of D3D11 is much more complex actually. The driver may have decided to page out Texture B from GPU memory because you were not using it (and probably it was running out of VRAM). If that's the case, setting Texture B means the driver will copy the data back from system RAM to VRAM.

And if it's really, really running out of space, it may page out Texture A to make room for Texture B (though it is extremely rare that a driver will page out one texture for another when both are going to be used in the same frame; in that case the driver will probably signal an out-of-GPU-memory error. But if tex A was used in the previous frame and tex B in the next one, this might happen).


Also, on a lot of hardware out there switching textures has a "relatively costly" CPU-side driver overhead, as the driver needs to prepare all the texture descriptors that have changed. On some hardware this is quite cheap (almost free); on other hardware it has a cost, as all the hardware texture registers have to be reset.


All of this is a lot of overhead. While GPU-side this is just switching pointers, internally:

  • The driver needs to track how often textures are being used, and decide to page out the ones that have remained unused for some time.
  • The driver needs to check if the texture needs to be paged in.
  • For some hardware, the driver may need to set all texture descriptors again (not just the ones that have changed) and bring the GPU to a temporary "mini-halt".
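The first bullet can be sketched as a tiny least-recently-used tracker (class and method names are mine, a toy model, not any driver's actual code): every bind stamps the texture with the current frame number, and when VRAM runs low the driver would evict whatever has gone unused the longest.

```cpp
#include <cstdint>
#include <unordered_map>

// Toy model of the per-texture usage tracking a driver performs internally.
class ResidencyTracker {
public:
    // Called on every texture bind: stamp the most recent use.
    void onBind(uint32_t textureId, uint64_t frame) {
        lastUsed_[textureId] = frame;
    }

    // Least-recently-used texture: the candidate a driver would page out
    // to system RAM when VRAM runs low. Returns 0 if nothing is tracked.
    uint32_t evictionCandidate() const {
        uint32_t victim = 0;
        uint64_t oldest = UINT64_MAX;
        for (const auto& [id, frame] : lastUsed_) {
            if (frame < oldest) { oldest = frame; victim = id; }
        }
        return victim;
    }

private:
    std::unordered_map<uint32_t, uint64_t> lastUsed_;
};
```

This is exactly the bookkeeping that bindless texturing and the new APIs push out of the driver and into your own code.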

OpenGL 4 with the bindless texture extension gets rid of all this driver overhead because it places the burden of managing texture residency on you (however **only** DX11-level hardware from NVIDIA and AMD supports bindless; Intel cards can't support it due to hw limitations); and DX12 promises to place the burden on the developer too (which is a good thing for us performance squeezers).


While we wait for the future to arrive, texture arrays are the next best thing; they allow you to choose between textures in the shader and only call SetResource very infrequently, while indirectly controlling residency (if you pack 16 textures together in the same array, the driver has to page them in/out as a whole pack). Though they have their disadvantages too (textures must share the same pixel format and resolution, and paging in/out has lower granularity).
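A minimal sketch of the packing idea (all names and the 16-slice group size are illustrative, not any engine's API): textures are grouped by format and resolution, and each group of up to 16 becomes one array that the driver pages in and out as a unit; the shader then indexes the slice.

```cpp
#include <cstdint>
#include <map>
#include <tuple>

// Textures can only share an array if format and resolution match.
struct TexKey {
    uint32_t format, width, height;
    bool operator<(const TexKey& o) const {
        return std::tie(format, width, height) <
               std::tie(o.format, o.width, o.height);
    }
};

// Where a texture ends up: which array object, and which slice within it.
struct ArraySlot { uint32_t array, slice; };

class TextureArrayPacker {
    static constexpr uint32_t kSlices = 16; // pack 16 textures per array
public:
    // Assigns the next free (array, slice) for this format/resolution group;
    // a new array starts whenever the previous one fills up.
    ArraySlot add(const TexKey& key) {
        uint32_t& count = counts_[key];
        ArraySlot slot{ count / kSlices, count % kSlices };
        ++count;
        return slot;
    }
private:
    std::map<TexKey, uint32_t> counts_;
};
```

At draw time the slice index travels with the instance data (or a constant), so SetResource is only needed when switching between whole arrays.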

#5213020 DXVA-HD Question

Posted by Matias Goldberg on 25 February 2015 - 10:35 PM

But the one thing that I can't find is how to specify the input file.

You don't.
The DXVA interface doesn't deal with file formats like mp4/mkv. You need to open the file yourself, demux it, read the video stream, and send it to the DXVA interface for decoding. Basically you have the engine but not a car or the wheels. You can use the engine to power a boat.


If, for example, your project is about playing live streams, then you don't need to deal with mp4 files or demuxing. You send the raw stream in your own format via UDP and send it directly for decoding once it arrives on the client PC.

For reference I'd recommend taking a look at Media Player Classic Home Cinema's source code. It is open source and the best video player I've seen for Windows.

#5212911 Appending to an append buffer several times

Posted by Matias Goldberg on 25 February 2015 - 07:36 AM

You can, but keep an eye on performance. The more you write to a UAV, the less scalable the performance will be across multithreaded hardware, which means performance may be greatly affected with each additional use if the GPU can't hide the latency.

#5212298 Succesful titles from non AAA studios (recent)

Posted by Matias Goldberg on 22 February 2015 - 12:26 PM

To answer OP's question... Flappy Bird.

Now I better run before I get shot and a war starts.

#5211726 Hiding savedata to prevent save backup

Posted by Matias Goldberg on 19 February 2015 - 12:39 PM

1. Just name the save "sound.bak" or something. Really simple but also very easy to "crack"!

Just mask it as an asset, exploiting a file format that allows putting more stuff at the end of the stream while regular file viewers ignore your save data (i.e. png, jpg, pdf), like AngeCryption does (see the slides).
Just make sure you don't really depend on that asset in case the file saving goes corrupt.

2. Save the data to some silly folder like "C:/appdata/flashdata/fakecompany/sound.bak". But it's ugly to create a folder on the user's computer, and what if this folder is cleaned out (since it's not supposed to be affiliated with the game)? Then the user will lose their progress.

If you do that, your program enters malware territory.

3. Save a timestamp to the savefile and keep a registry of the timestamps somewhere. If the savefile is replaced they will mismatch and you can refuse to load that savegame. But what if the player backs up the registry too? Which means I have to "hide" the registry file as well.

What happens if the clock goes kaputt? Quite common if the battery died. You'll just annoy your users.
Timestamps aren't reliable.

Also be aware that the process of safely saving a file (that is, following good practices) inherently involves performing an internal backup (assuming no obfuscation): you first rename your Save.dat to Save.dat.old; then write your new Save.dat; and finally delete Save.dat.old.
If the system crashes or the power goes off, you first check whether there's a Save.dat.old and verify that Save.dat is valid and doesn't crash when loaded. Once Save.dat is known to be ok, delete Save.dat.old; otherwise delete Save.dat and rename Save.dat.old back to Save.dat.
This way your user won't lose their entire progress, just the last bit of progress they made (the power went off while saving... after all).
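The rename dance above, sketched with C++17 std::filesystem (the validity check is a stand-in you would replace with real save verification, e.g. a checksum):

```cpp
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Stand-in for real save validation (checksum, version check, trial load...).
bool isValidSave(const fs::path& p) {
    return fs::exists(p) && fs::file_size(p) > 0;
}

// Safe save: the previous file survives as Save.dat.old until the new
// one has been fully written and verified.
void saveGame(const fs::path& save, const std::string& data) {
    fs::path old = save;
    old += ".old";
    if (fs::exists(save)) fs::rename(save, old);    // 1. backup current save
    std::ofstream(save, std::ios::binary) << data;  // 2. write (closed here)
    if (isValidSave(save)) fs::remove(old);         // 3. drop the backup
}

// Recovery at startup: a surviving .old means the last save was interrupted.
void recover(const fs::path& save) {
    fs::path old = save;
    old += ".old";
    if (!fs::exists(old)) return;
    if (isValidSave(save)) {
        fs::remove(old);        // new save is fine; discard backup
    } else {
        fs::remove(save);       // new save is corrupt; roll back
        fs::rename(old, save);
    }
}
```

Note that step 2 relies on the stream being closed (the temporary's destructor) before verification; a real implementation would also fsync before deleting the backup.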

Bear in mind that for solutions that rely on writing to two or more locations to verify the save hasn't been tampered with, you have to be very careful that writing to all those files ends up being an atomic operation; otherwise your "anticheat" technology will turn against honest users who just experienced a system crash or a power outage and now have a valid save file with a corrupt verification file.

Why prevent cheating on single player games? Cheating is part of the fun. Otherwise TAS Video communities wouldn't prosper.

#5211441 Strange CPU cores usage

Posted by Matias Goldberg on 18 February 2015 - 08:30 AM

If you check the docs from the libs you're using, audio stuff in SDL is multithreaded.


Starting with Windows Vista, all audio is software based, unlike Win XP, which could have hardware acceleration. This could easily explain the higher CPU usage.

Just check with a profiler or with ProcessExplorer which threads are active.

#5209331 glTexSubImage3D, invalid enum

Posted by Matias Goldberg on 07 February 2015 - 05:52 PM

Then you've been using GL wrongly or are out of touch with the driver team (looking up the devs on Twitter is also a good idea). Often they've fixed my bug reports within a week and included the fix in the next driver update.
Yes, sRGB textures got broken in one of their releases and got fixed in the next driver release; that was a long time ago by now. I've been doing very bleeding-edge OpenGL 4.4 and lots of low level management, and I haven't run into problems that weren't fixed after being reported.

#5208465 SDL2 and Linux [RANT]

Posted by Matias Goldberg on 03 February 2015 - 03:10 PM

Roots was correct; my anger was excessive considering it is free software.


However, a good part of that anger was fueled by the fact that one major bug (maximizing, resizing and restoring) was not only reported in 2012, but also had multiple patches proposed that were never applied. This made me question the developers' will to push the software forward on the Linux platform.

Add that to the other bugs, and my anger went off the charts. I mean, a program that hangs if the video drivers aren't really, really up to date (i.e. first try to create a GL context; if that fails, try again with a lower spec) can't be deployed (the number of bug reports would be too high); which means I would have to seriously reconsider using SDL.


However, the fact that two of those major bugs (which strongly affect end-user deployment) were fixed within a day of this post restores my faith in the software; it's living up to its good reputation.

#5208403 Best practices for packing multiple objects into buffers

Posted by Matias Goldberg on 03 February 2015 - 08:46 AM

Hold on. Hold on. I see a lot of outdated advice.


In 2014 OpenGL 4 changed: AZDO was introduced. It fundamentally changed how the API should be used; most of it can still be used on GL3 hardware (as long as the driver is up to date with the necessary extensions).

Unfortunately, it still contains a lot of backwards compatibility baggage; hopefully GLNext will address that issue.


There are no more immutable / dynamic / streaming distinctions. GL_ARB_buffer_storage was introduced in GL 4.4; which is the new way of creating buffer objects in GL using the function glBufferStorage instead of glBufferData. The "immutability" doesn't mean that the contents of the buffer can't be changed, it just means that the size of the object (and access flags) can't be changed afterwards (just like in D3D11...); which is something that glBufferData allowed.


The new access flags are much more low level. It is not streaming / static / dynamic anymore. It's just:

  1. Whether it can be written to by the CPU using glMap*. (CPU -> GPU)
  2. Whether it can be read from by the CPU using glMap*. (CPU <- GPU)
  3. Persistent mapping flags (only available to GL4/D3D11 hardware; on GL3/D3D10 hw you can still use GL_UNSYNCHRONIZED_BIT when mapping as an inferior but still very good workaround)

A buffer that has no read and write flags will 99% likely be allocated in dedicated GPU memory; the current recommendation is that you should keep very few pools of these (i.e. 1 big pool of 128MB for all your data: vertices, indices, texture buffer objects, uniform buffer objects, indirect buffer objects, etc.; beware that if you make it too big, you may run out of GL memory due to fragmentation, just like regular malloc practices on a resource-constrained device).

Whereas adding CPU access flags may force the driver to allocate the memory in a place where both the GPU & CPU can access it directly (most likely if you just use CPU -> GPU access flags), or where only the CPU can access it, with the driver later copying it to/from the real GPU memory (somewhat likely if you include CPU <- GPU flags, and almost certain if you use both CPU <-> GPU access flags). These buffers should be kept small (between 4MB and 32MB each).
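The "one big pool" recommendation can be sketched as a trivial suballocator (a sketch under assumptions: the pool size and the 256-byte alignment, which mirrors typical UBO offset alignment, are illustrative; real code also needs freeing and defragmentation, which is where the malloc-style fragmentation concerns come in):

```cpp
#include <cstddef>
#include <new>

// Suballocates ranges out of one large GPU buffer (e.g. a 128 MB buffer
// created once with glBufferStorage); returns byte offsets into it.
class BufferPool {
public:
    explicit BufferPool(size_t capacity) : capacity_(capacity) {}

    // Bump-allocate 'bytes' at the next aligned offset.
    size_t allocate(size_t bytes, size_t alignment = 256) {
        // Round the head up to the requested alignment (power of two).
        const size_t offset = (head_ + alignment - 1) & ~(alignment - 1);
        if (offset + bytes > capacity_)
            throw std::bad_alloc(); // pool exhausted: the fragmentation case
        head_ = offset + bytes;
        return offset;
    }

private:
    size_t capacity_;
    size_t head_ = 0;
};
```

All of your vertices, indices, UBOs, etc. then become (buffer, offset, size) triples into the one pool instead of separate GL buffer objects.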


You can use these buffers with CPU access flags as intermediate buffers to upload data to your buffers in video memory, or as dynamic memory to write every frame. The difference depends on how you place your fences (i.e. for your "dynamic" buffers, you want to use just one fence for all dynamic buffers; one per frame) and how much memory you reserve (see the AZDO slides; dynamic content uses a triple buffer scheme, so you will allocate 3x as much as needed); but the access flags passed to glBufferStorage are exactly the same.


Thanks to persistent mapping (or GL_UNSYNCHRONIZED_BIT), you will control the filling of the buffers with CPU access flags manually using fences.
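The triple-buffer scheme boils down to offset arithmetic (a sketch; the glFenceSync/glClientWaitSync calls need a GL context, so they appear only as comments):

```cpp
#include <cstddef>
#include <cstdint>

// A dynamic buffer allocated at 3x its per-frame size; each frame the CPU
// writes a different third while the GPU may still be reading the other two.
struct DynamicBuffer {
    size_t perFrameSize;

    // Byte offset the CPU may safely write during this frame.
    size_t writeOffset(uint64_t frameNumber) const {
        // Before writing, wait on the fence placed when this third was last
        // submitted (3 frames ago):
        //   glClientWaitSync(fence[frameNumber % 3], ...);
        // After submitting this frame's draws, place a new fence:
        //   fence[frameNumber % 3] = glFenceSync(...);
        return (frameNumber % 3) * perFrameSize;
    }
};
```

Because the fence guarding a given third is three frames old by the time you reuse it, the wait is almost always already signaled and the write proceeds without stalling.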

The last section of the ARB_buffer_storage spec contains example code of how to use a buffer with write access flags to upload data to a buffer with no CPU access flags; in other words; mimicking what you would do in D3D11 with a "staging buffer" to fill a "default buffer".

Note however that you can use persistent mapping to read from the GPU, but you can't use GL_UNSYNCHRONIZED_BIT to read from the GPU; that's the only gotcha.


By keeping all your static/immutable meshes in one big buffer object (basically, most of your data), you can use a single glMultiDrawIndirect (MDI for short) to render all meshes that use the same shader and vertex format. Even if MDI isn't present (i.e. GL3 hardware), you can still use instancing and avoid switching VAOs most of the time (you only need to switch VAOs if the vertex format is different, or if the mesh uses a different buffer object).

MDI can't be used to render two meshes whose data lives in different buffer objects.
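The batching rule above can be sketched as follows (the Draw struct and sort key are illustrative): sorting draws by shader, vertex format, and source buffer yields runs where each run maps to exactly one glMultiDrawIndirect submission, since MDI cannot cross buffer objects.

```cpp
#include <algorithm>
#include <cstdint>
#include <tuple>
#include <vector>

// Illustrative per-draw record; real render queues carry more state.
struct Draw { uint32_t shader, vertexFormat, buffer, meshId; };

// Counts how many MDI submissions a draw list needs after sorting: one per
// run of identical (shader, vertexFormat, buffer).
size_t countBatches(std::vector<Draw> draws) {
    auto key = [](const Draw& d) {
        return std::tuple(d.shader, d.vertexFormat, d.buffer);
    };
    std::sort(draws.begin(), draws.end(),
              [&](const Draw& a, const Draw& b) { return key(a) < key(b); });
    size_t batches = draws.empty() ? 0 : 1;
    for (size_t i = 1; i < draws.size(); ++i)
        if (key(draws[i]) != key(draws[i - 1])) ++batches;
    return batches;
}
```

This is also why packing most static data into one buffer matters: the fewer distinct buffers in the sort key, the longer the runs and the fewer submissions.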


This is basically low level management, which means it's not newbie friendly; and I haven't seen tutorials yet; so expect to bang your head against the wall a couple of times; but it pays off in the long term, and this is where modern GL is heading.


apitest is excellent reference code for modern GL programming practices. It shows how to efficiently wait on a fence and render with these types of buffers.