
Matias Goldberg

Member Since 02 Jul 2006
Last Active Yesterday, 11:43 PM

#5276213 Vulkan is Next-Gen OpenGL

Posted on 17 February 2016 - 04:30 PM

Anybody know of any open source graphics engines going to be built around these new APIs? The most popular open source engine I know is Ogre, and that is built around DX9. I would love to see a graphics engine that is built specifically around these new APIs.

Ogre 2.1 is built around AZDO OpenGL, D3D11, and mostly prepared for D3D12 & Vulkan.

Our biggest issue is some old code for Textures that needs a big refactor, which is currently the biggest issue when implementing the D3D12 & Vulkan RenderSystems.

#5275755 Do game developers still have any reason to support Direct3D 10 cards?

Posted on 15 February 2016 - 09:47 AM


The D3D9 to D3D10 shift was a very peculiar one. It wasn't just performance, D3D10 introduced a few improvements that were very handy and easy to support for adding "extras" (note: these extras could've been easily backported to D3D9, but MS just wasn't interested).
For example:
1) Z Buffer access, 2) Z Out semantic, 3) sending to multiple RTTs via the geometry shader, 4) access to individual MSAA samples, 5) separate alpha blending, 6) dynamic indexing in the pixel shader, 7) real dynamic branching in the pixel shader.

You're misremembering how groundbreaking D3D10 was :)
Uptake of D3D10 was excruciatingly slow, as it didn't really introduce any killer new feature (geometry shaders did not live up to the hype), and it lacked XP compatibility, which was a big deal at the time. In my experience, a lot of people seem to have stuck with D3D9 until D3D11 came out. Most of the stuff you mention is accessible from the D3D9 API:
1) D3D9-era GPUs had multiple different vendor-specific extensions for this, which was painful. D3D10-era GPUs all support the "INTZ" extension (side note: you see a LOT of games that use the D3D9 API, but list the GeForce 8800 as their min-spec, which is the first NV D3D10 GPU -- my guess is because being able to reliably read depth values is kinda important)
2/5/7) Are in the D3D9 core API.
3) is a D3D10 API feature, but performance wasn't that great...
4) is a D3D10.1 API feature (and compatible GPU), but wasn't put to great use until proper compute shaders appeared in D3D11 :)
6) is emulatable in D3D9 but requires you to use a tbuffer instead of a cbuffer (as cbuffers weren't buffers in D3D9).
I've actually been toying with the idea of doing a D3D9/WinXP build of my current game, as a 10-years-too-late argument against all the D3D10 hype of the time :D
You can actually do a damn lot with that API, albeit a lot less efficiently than the D3D11 API does it! I'd be able to implement a lot of the D3D11-era techniques... but with quite significant RAM and shader efficiency overheads. Still, would be fun to see all those modern techniques running badly on Windows XP (or an XP app, emulated on Wine!!).


Actually I did. I was referring to those screenshots he posted about Saints Row & Bioshock's DX10-only features:

  • Reflections. I guess they used GS to write to multiple RTTs at once. Otherwise it doesn't make sense for it to be DX10-only (from a technical point of view). While GS didn't live up to the hype, that doesn't mean people didn't try. Probably they didn't gain performance. But porting it to DX9 would mean creating two codepaths (one single-pass using a GS, another multi-pass without GS). Note, however, that hybrids did actually improve performance. A hybrid would use instancing to multiply the geometry and a Geometry Shader to output to multiple RTTs, while still being multipass. Instead of writing to all 6 faces in one pass, write to 3 faces in 2 passes, or 2 faces in 3 passes. This kind of leverage allowed finding a sweet spot in performance.
  • Ambient occlusion: Clearly they're doing a dynamic loop in the pixel shader which would explain why they'd need DX10. Or maybe they wanted Z Buffer access and didn't bother with the INTZ hack.
  • DirectX10 detail surfaces: I suspect they mean multiple diffuse/normal textures overlaid on top of each other, taking advantage of array textures. Or maybe they enabled some Geometry Shaders somewhere for extra effects, like on a wall or something.

All of these features can definitely be done on DX9. But on the lazy side, you have to admit they're much easier to implement on DX10 (or like you said, doing it in DX9 would require more RAM or some other kind of overhead).

Like you said, DX10 wasn't that groundbreaking; but the features (that could've easily been backported to DX9, but weren't; save for vendor hacks like the INTZ one) that were added allowed games to include "turn on / turn off" kind of effects when running in DX10 mode.

#5275703 Do game developers still have any reason to support Direct3D 10 cards?

Posted on 15 February 2016 - 12:12 AM

The D3D9 vs D3D10/11 situation and the D3D10 vs D3D11 one are not exactly the same.
Supporting multiple DX versions means we need to aim for the lowest common denominator. This cripples performance, because optimizations are ruled out by the oldest path (unless we spent a disproportionate amount of resources maintaining two completely different code paths).
This means a game well-designed to run D3D11 will be significantly more efficient than a game that aims to run on D3D11, 10 & 9.
The D3D9 to D3D10 shift was a very peculiar one. It wasn't just performance, D3D10 introduced a few improvements that were very handy and easy to support for adding "extras" (note: these extras could've been easily backported to D3D9, but MS just wasn't interested).
For example: Z Buffer access, Z Out semantic, sending to multiple RTTs via the geometry shader, access to individual MSAA samples, separate alpha blending, dynamic indexing in the pixel shader, real dynamic branching in the pixel shader.
All stuff that made certain postprocessing FXs much easier. Therefore it was possible to offer DX10-only effects like you see in Bioshock that can be turned on and off, just to "spice up" the experience when you had a recent GPU running on Vista.
But moving from D3D10 to D3D11... there weren't many features introduced, but those few features... oh boy, they were critical. Let's take Assassin's Creed Unity for example: its frustum culling and dispatch of draws live in a compute shader! We're not talking about an effect you can turn on and off. We're talking about the bare bones of its rendering infrastructure depending on a feature unavailable to D3D10 cards. Supporting D3D10 cards could well mean rewriting 70% or more of its entire rendering engine, which would likely also affect the asset pipeline and the map layout.


There are only a few D3D11-only things that can be used to spice up the graphics while still turning them off for D3D10, tessellation comes to mind.

#5275583 Do game developers still have any reason to support Direct3D 10 cards?

Posted on 13 February 2016 - 04:44 PM

Since in Ogre 2.1 we aim to support both DX10 & DX11 hardware, I've gotta say DX10 hardware's DirectCompute limitations are currently giving me a big PITA.

AMD's Radeon HD 2000-4000 hardware didn't even get a driver upgrade to support DirectCompute. So even if you limit yourself to the structured buffers available to DX10 hardware, these cards won't even run these compute shaders (despite the hardware being completely capable of doing so). I don't know about Intel DX10 GPUs, but I suspect it's the same deal.


AFAIK only NVIDIA DX10 GPUs got the upgrade.

#5275464 MSVC generating much slower code compared to GCC

Posted on 12 February 2016 - 04:23 PM


The first version calls std::vector<>::size() every iteration. The second does so only once and stores the value in a local variable.

I would have thought that something as trivial as size() would have gotten inlined out? Though at least the implementation I'm looking at computes the size by creating the beginning and end iterators and subtracting them, so maybe that isn't much of a savings anyway.


It's not trivial at all. Consider the following:

m_sum = 0;
for( size_t i=0; i != m_vec.size(); ++i )
  m_sum += someFunc( m_vec[i] );

Unless the full body of someFunc is available at compilation time (and even then, someFunc must not do anything invisible to the compiler), the compiler literally can't know whether someFunc() will alter m_sum, or push or remove elements from m_vec; hence m_vec.size() must be fetched from memory on every single iteration, and so must m_sum.

However if it were changed to:

size_t tmpSum = 0;
const size_t vecSize = m_vec.size();
for( size_t i=0; i != vecSize; ++i )
  tmpSum += someFunc( m_vec[i] );

m_sum = tmpSum;

Now the compiler knows for sure neither tmpSum nor vecSize will change regardless of what happens inside someFunc (unless someFunc corrupts memory, of course) and can keep their values in a register instead of refetching on every single loop iteration.


It's far from trivial; in fact, most of the time the right word is "impossible". This is where the argument of "let the compiler do its job; don't worry about optimization, they're really good" falls short. Yes, compilers have improved tremendously in the last 15 years, but they aren't fortune tellers. They can't optimize away stuff that might change the expressed behavior (I say expressed because what's intended may have more relaxed requirements than what's being expressed in code).
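To make the aliasing hazard concrete, here's a minimal sketch (the class and values are hypothetical, mine, not from the thread) where someFunc really does mutate m_vec mid-loop, so hoisting m_vec.size() out of the loop would change the program's behavior:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical class illustrating the point above: someFunc is a member
// that MAY mutate both m_vec and m_sum, so inside run() the compiler
// must re-read m_vec.size() and m_sum from memory on every iteration.
struct Accum
{
    std::vector<int> m_vec;
    int m_sum = 0;

    int someFunc( int x )
    {
        if( x == 0 )
            m_vec.push_back( 1 ); // Grows the vector mid-loop!
        return x;
    }

    void run()
    {
        m_sum = 0;
        // Hoisting m_vec.size() here would be a miscompile: the loop
        // is expected to also visit elements appended by someFunc.
        for( size_t i = 0; i != m_vec.size(); ++i )
            m_sum += someFunc( m_vec[i] );
    }
};
```

Starting from a single element {0}, the loop runs twice: the first iteration appends an element, and the re-fetched size() makes the loop pick it up.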

#5275351 request HLSL support for sqrt() for integers

Posted on 11 February 2016 - 05:05 PM


IEEE compliance doesn't guarantee determinism across different GPUs

No, IEEE compliance does in fact guarantee "determinism" across different GPUs.


No. It does not.

You need a guarantee the generated ISA performs all calculations in the exact same order, which you don't get with IEEE conformance, not even if you use the same HLSL compiler. IEEE also doesn't guarantee the intermediate precision in which calculations are made.
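A tiny sketch of why evaluation order matters (my example, not from the thread): IEEE 754 pins down the rounding of each individual operation, but says nothing about how a compiler or ISA associates a longer expression, and different associations round differently:

```cpp
#include <cassert>

// Both functions compute "a + b + c", differing only in association.
// Each individual add is perfectly IEEE compliant, yet the results differ.
float sumLeft( float a, float b, float c ) { return ( a + b ) + c; }
float sumRight( float a, float b, float c ) { return a + ( b + c ); }

// With a = 1e20f, b = -1e20f, c = 1.0f:
//   sumLeft  -> (1e20 + -1e20) + 1  =  0 + 1     = 1
//   sumRight -> 1e20 + (-1e20 + 1)  =  1e20 - 1e20 = 0
//   (adding 1 to -1e20 rounds back to -1e20, as 1 is far below one ulp)
```

So two IEEE compliant compilers (or the same HLSL compiler targeting two GPUs) are free to produce either answer.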

As you can see, I was talking about graphics, not simulation, where e.g. one ulp of color difference "most likely" will not matter.

One ulp of difference in Dolphin "most likely" broke emulation of Pokémon Snap. "Most likely" was also causing bizarre glitches.

#5275224 Clinical studies on overlooking stupid bugs

Posted on 10 February 2016 - 11:11 PM

QtCreator will show in purple everything that is covered by a half-open bracket or parenthesis when the cursor touches the offending bracket.
It's very useful. See that "mMaxTexUnitReached( 0 ))" has no matching '(':

Edit: I saw your other post. With practice you'll quickly learn to recognize that when your IDE's autoformatting is doing something you don't expect (like over- or under-indenting your lines), it probably means you've just introduced a syntax error.

#5274910 Cache misses and VTune

Posted on 08 February 2016 - 02:55 PM

I'm sorry, but the data you've shown is exactly what's supposed to happen.


You're completely random-accessing and looping through 253MB of data, which obviously does not fit in the cache, and VTune is telling you that you're DRAM bound. This is exactly what will happen if the first iteration indexes the float[5] and float[26600000]; and the next iteration indexes the float[99990] and the float[7898]. The cache is effectively useless, and all the bottlenecks will be in the DRAM.


What do you expect it to tell you?
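For illustration, a sketch of the access pattern being described (the function names and sizes are mine): both loops sum the same data and produce the same result, but the second walks it through a shuffled index table, so every element access lands on an effectively random cache line, which is exactly the DRAM-bound pattern VTune reports.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Cache-friendly traversal: consecutive elements, hardware prefetcher wins.
float sumSequential( const std::vector<float> &data )
{
    return std::accumulate( data.begin(), data.end(), 0.0f );
}

// Random-access traversal: same data, same result, but each access hits
// an unpredictable cache line, so a large enough array thrashes the cache.
float sumShuffled( const std::vector<float> &data )
{
    std::vector<size_t> indices( data.size() );
    std::iota( indices.begin(), indices.end(), size_t( 0 ) );
    std::mt19937 rng( 42 ); // Fixed seed for reproducibility.
    std::shuffle( indices.begin(), indices.end(), rng );

    float sum = 0.0f;
    for( size_t idx : indices )
        sum += data[idx]; // Effectively random cache-line accesses.
    return sum;
}
```

At 253MB the second version's bottleneck is DRAM latency, and no profiler setting will make that look like anything else.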

#5274800 Query Timestamp inconsistency

Posted on 07 February 2016 - 06:04 PM

If you are on Windows 7, make sure to disable Aero.

Also for best results perform these queries in exclusive fullscreen.


The compositor's presentation can seriously skew your measurements.

#5274589 Any alternatives to automatic class instantiation via macro?

Posted on 05 February 2016 - 09:04 PM

I agree with everyone... on desktop.


Unfortunately Android and iOS came to crash the party, where there is no main: the former enters Native Code via a Java loader that loads a .so library with a set of arbitrarily-named JNI function calls, and the latter enters the system by overloading AppDelegate.


Considering these two bastards, if they need to be supported, the macro idea suddenly looks more appealing; although I personally still prefer letting the user write these JNI loaders or iOS AppDelegates himself, instead of trying to do it for him (especially when the user needs to release resources or be notified of low memory conditions).

If a macro tries to do it for the user, when something goes wrong there's always that weird feeling that it's the fault of the macro's overloaded method (i.e. "I bet the macro's main system isn't informing me of low memory conditions even though the app is receiving them").

#5274581 Why are there no AAA games targeted towards the young adult audience?

Posted on 05 February 2016 - 08:01 PM

This theory also seems to apply towards why there aren't many games that cover political or sociological themes.

The first Assassin's Creed games were strongly loaded with political and sociological themes.
I still remember fondly the long discussions about politics, religion, morality and ethics between Altair and Al Mualim (even though I met a lot of people who disliked those moments... "boring" they said).

The second game is about a teenager seeking revenge for the unjust sentencing to death of half of his family (quite common in that era), involving real world events like the attempted murder of Lorenzo de' Medici, the Pazzi conspiracy, the suspected poisoning of the Doge of Venice Giovanni Mocenigo, the Borgia family's drama, and well... someone summarized it for me. It also covers topics like thievery, extreme poverty, and prostitution.

Some people may have played AC II as just a dude that kills people with cutscenes inbetween; but it's actually strongly charged with a lot of content if you pay attention to the story.

#5274389 Why are there no AAA games targeted towards the young adult audience?

Posted on 04 February 2016 - 09:36 PM

According to Wikipedia, young adult is between 14-20 years old.


I was under the impression most games target that audience already.

Also according to Wikipedia, YA literature often treats topics such as depression, drug & alcohol abuse, identity, sexuality, familial struggles and bullying.


Perhaps you meant to ask why there aren't more games covering these topics, which is a very different type of question. If that's the case, beware that the target market is mostly the same as for current games, so such games would be up against a lot of strong, established competition.

#5274064 [Debate] Using namespace should be avoided ?

Posted on 03 February 2016 - 10:19 AM

using namespace has little to no way of being disabled once it is declared, which is why it can be a PITA at header level.

At .cpp file level it sounds more sane. But if you try "Unity builds" to speed up compilation, using namespace at .cpp file level comes back to bite you. Which makes "using namespace" more friendly at an enclosed scope, e.g.:

void myFunc( int a )
{
     using namespace std; //Only available at myFunc level.
     //...
}


Typing std:: is not a big deal, so I try to avoid "using namespace" as much as possible. Furthermore, "using" pollutes autocomplete.


There are legitimate cases where it's appropriate, but use it with discretion and care.
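The unity-build hazard mentioned above can be sketched like this (the namespaces and file names are hypothetical): two .cpp files that each compile fine on their own get concatenated into one translation unit, and their file-level using directives suddenly coexist.

```cpp
#include <cassert>

// Simulated "unity build": the contents of file_a.cpp and file_b.cpp
// pasted into a single translation unit to speed up compilation.
namespace Audio { struct Buffer { int channels; }; }
namespace Video { struct Buffer { int width; }; }

using namespace Audio; // was at the top of file_a.cpp -- fine in isolation.
using namespace Video; // was at the top of file_b.cpp -- fine in isolation.

// Buffer b;  // ERROR in the unity build: ambiguous --
//            // Audio::Buffer or Video::Buffer?

// Qualified names still work, which is the "just type it" argument above.
Audio::Buffer makeAudioBuffer() { return Audio::Buffer{ 2 }; }
```

Scoping the using directive inside each function (as in the snippet earlier) avoids the clash entirely.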

#5273997 Multi-threaded deferred setup

Posted on 02 February 2016 - 09:54 PM

For "read once" (ie, not read again on the next frame) dynamic data such as constants it's not worth copying it over to the GPU. Just leave the data in the UPLOAD heap and read it from there.

Actually, on GCN, performing a Copy via a Copy queue allows the GPU to start copying the data across the bus using its DMA engines while it does other work (like rendering the current frame), which might result in higher overall performance (particularly if you're bound by the bus, or latency is an issue).


However, it hurts all other GPUs that don't have a DMA engine (particularly Intel integrated GPUs and AMD APUs, which don't need this transfer at all, and it takes away precious bandwidth).

#5273783 Shader Permutations

Posted on 01 February 2016 - 09:37 PM

You may be interested in how we tackled it in Ogre 2.1 with the Hlms (see section 8 HLMS).

Basically, 64 bits will soon look like not enough flags to handle all the permutations. But like Hodgman said, many of these options are mutually exclusive; or most of the combinations aren't used.


The solution we went for was, at creation time, to create a 32-bit hash of the shader based on all the options (which are stored in an array), and store this hash in the Renderable.

Then at render time we pull the right shader from the cache using the "final hash". The final hash is produced by merging the Renderable's one with the Pass hash. A pass hash contains all settings that are common to all Renderables and may change per pass (i.e. during the shadow map pass vs another receiver pass vs extra pass that doesn't use shadow mapping for performance reasons).

You only need to access the cache when the hash changes between the previous and next Renderable, which is why it is a good idea to sort your Renderables first.
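The scheme above can be sketched roughly like this (all names and the hash-mixing function are hypothetical placeholders, not Ogre's actual code): merge the Renderable's creation-time hash with the pass hash, and only touch the cache when the merged hash differs from the previous draw's.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

typedef uint32_t ShaderHash;
struct GpuProgram { int dummy = 0; }; // Stand-in for a compiled shader.

// Merge the Renderable's hash with the per-pass hash. A simple mix for
// illustration; the real combination function may differ.
ShaderHash finalHash( ShaderHash renderableHash, ShaderHash passHash )
{
    return renderableHash ^ ( passHash * 0x9E3779B9u );
}

struct ShaderCache
{
    std::unordered_map<ShaderHash, GpuProgram> cache;
    ShaderHash lastHash = 0;
    GpuProgram *lastProgram = nullptr;

    // Only hit the map when the merged hash actually changes -- which is
    // why sorting Renderables by hash first pays off.
    GpuProgram *get( ShaderHash renderableHash, ShaderHash passHash )
    {
        const ShaderHash merged = finalHash( renderableHash, passHash );
        if( lastProgram && merged == lastHash )
            return lastProgram;
        lastHash = merged;
        lastProgram = &cache[merged]; // Compile-on-miss elided here.
        return lastProgram;
    }
};
```

With sorted Renderables, consecutive draws with the same material and pass short-circuit on the cached pointer and never touch the map.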


Source 2's slides suggest a similar thing to map their PSOs (see slides 13-23; the PPT has the animated version).


While a 64-bit permutation mask works well for simple to moderately complex scenes, it will eventually fall short; especially if you need to adapt to very dynamic scenarios or have lots of content. However, implementing a 64-bit permutation mask is a good exercise to get a good idea of the pros and cons of managing shaders.