
Matias Goldberg

Member Since 02 Jul 2006
Offline Last Active May 23 2015 03:57 PM

#5229515 no Vsync? why should you

Posted by Matias Goldberg on 17 May 2015 - 07:16 PM

Believe me I've tried it with a fixed game/ physics update time of 1/60th of a second, but just didn't work out (after spending days).

Then you didn't try hard enough. Read the Fix your timestep article again, and try again.

Basically almost every game runs on a fixed time step.
A variable timestep can end up in bugs that only reproduce under very specific framerate scenarios: players walking through buildings or invisible walls, the game suddenly freezing because an FPS spike (high or low) produced a NaN or a similarly impossible value, bullets that never hit their target, scripted events that never trigger (imagine beating a level but the game never acknowledges the win, so you're stuck!). The list of bugs that can happen due to variable timesteps is never ending, and they are insanely hard to debug and fix.

It's a really, really bad idea.
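For reference, the core of the fixed-timestep pattern the "Fix your timestep" article describes can be sketched in a few lines; everything here (World, simulate, runFrame, the 1/60 step) is illustrative, not code from any particular engine:

```cpp
#include <cassert>
#include <cmath>

struct World { double position = 0.0; };

void simulate(World &w, double dt) {
    // Physics always advances by the same dt, so results are reproducible
    // regardless of how fast or slow frames are rendered.
    w.position += 10.0 * dt;
}

void runFrame(World &w, double frameTime, double &accumulator) {
    const double step = 1.0 / 60.0; // fixed simulation step
    accumulator += frameTime;       // bank whatever wall-clock time elapsed
    while (accumulator >= step) {   // catch up in fixed increments
        simulate(w, step);
        accumulator -= step;
    }
    // The leftover (accumulator / step) is what you'd use to interpolate
    // between the previous and current state when rendering.
}
```

The simulation advances in rigid steps no matter how irregular the frame times are, which is exactly why the framerate-dependent bugs above can't occur.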

#5229368 no Vsync? why should you

Posted by Matias Goldberg on 16 May 2015 - 04:13 PM

In that case the game will not work as expected (annoying collision detection issues).

You should not be simulating your game using variable framerate. Fix your timestep.

#5229209 Sorting draw calls by distance

Posted by Matias Goldberg on 15 May 2015 - 02:10 PM

Aras already has a blog post dedicated to the OP's question.

#5228263 Back Face Culling idea

Posted by Matias Goldberg on 10 May 2015 - 02:40 PM

I didn't make the link between that article which talks about points in the positive triangle area, and direction of front face and back face.

Ooops. Sorry. Right blog, wrong link. See Fill rules.


Edit: The barycentric conspiracy article is still important to understand what's going on. Through the math exposed there, the author arrives at the code:

if (w0 >= 0 && w1 >= 0 && w2 >= 0)
    renderPixel(p, w0, w1, w2);

to determine whether a point is inside a triangle.

w0, w1 & w2 are calculated using:

int w0 = orient2d(v1, v2, p); //F12
int w1 = orient2d(v2, v0, p); //F20
int w2 = orient2d(v0, v1, p); //F01

He obtains the formula from his barycentric conspiracy math derivation, where he defines F12, F20 & F01 using a counter-clockwise scheme.

If a point is inside a triangle but the vertices are given in clockwise order, then w0, w1 & w2 will all be negative. This is a damn good mathematical property.

Counterclockwise triangles can be checked with "if (w0 >= 0 && w1 >= 0 && w2 >= 0)", while clockwise triangles can be checked with "if (w0 <= 0 && w1 <= 0 && w2 <= 0)".

Another way to flip this is to alter the definition of w0, w1 & w2 so that you use F21, F02 and F10 instead; then check for all positive for clockwise triangles, and all negative for counterclockwise ones.


In other words, checking whether a triangle is counterclockwise or clockwise is a matter of checking whether all three coefficients are positive or negative (or toggling the formula to flip this identity). No cross product involved, no dot product involved. No need for a triangle normal at all. In fact, this can be solved using integer arithmetic, with no need for floating point!


Since during rendering we only care that triangles are in one particular order, we don't need to do anything extra: we just leave "if (w0 >= 0 && w1 >= 0 && w2 >= 0)", and vertices in the wrong winding order will automatically be left out.
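Putting the pieces above together, a minimal sketch of the integer-only winding/inside test (the Point struct and function names here are mine, not from the blog):

```cpp
#include <cassert>

struct Point { int x, y; };

// Twice the signed area of triangle (a, b, c).
// Positive when a->b->c winds counterclockwise (in a y-up coordinate
// system), negative when clockwise, zero when collinear.
// Pure integer arithmetic: no cross product with normals, no floats.
int orient2d(const Point &a, const Point &b, const Point &c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// A point p is inside (or on the edge of) a counterclockwise triangle
// v0, v1, v2 when all three edge functions are non-negative.
// For a clockwise triangle the same three values all come out negative,
// so this test also rejects wrong-winding triangles for free.
bool insideCCW(const Point &v0, const Point &v1, const Point &v2,
               const Point &p) {
    int w0 = orient2d(v1, v2, p); // F12
    int w1 = orient2d(v2, v0, p); // F20
    int w2 = orient2d(v0, v1, p); // F01
    return w0 >= 0 && w1 >= 0 && w2 >= 0;
}
```

Note that orient2d evaluated on the three vertices themselves gives the triangle's winding directly, which is the "nearly free" orientation test rasterizers use.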

#5228008 Back Face Culling idea

Posted by Matias Goldberg on 08 May 2015 - 02:58 PM

Whaa..? That's not how rasterizers detect triangle orientation. See the 'Oriented edges' section from Fabian Giesen. It basically exploits a mathematical property and is nearly free.

#5227954 Terrible Broken Age sales...!? (steamspy)

Posted by Matias Goldberg on 08 May 2015 - 08:09 AM

but Kickstarter backers should probably not count as people who bought Broken Age, since they didn't buy it, they Kickstarted it, and all that money probably went into development, so it is not earnings or profit for the creators (Double Fine).

Here's a shocker for you: people who don't use Kickstarter fund the development out of their own pocket (or someone else's, if using the traditional publishing model) and need to recover their costs with the sales. Some don't even manage to break even despite selling millions of copies (*cough* Tomb Raider *cough*).

Leaving the backers out is not fair.

#5227559 help~~how to recreate direct3d device in dx11 like dx9

Posted by Matias Goldberg on 06 May 2015 - 12:45 PM

To add to what has been said: because this error is often caused by a driver hang, driver crash, or the GPU taking too long (and sometimes, faulty hardware), the problem often goes away by asking the user to update their GPU drivers.

#5227374 3D SCANNING: Need advice on getting into the industry

Posted by Matias Goldberg on 05 May 2015 - 02:18 PM

As with anything, these kinds of things move forward by having the right contacts.


If you're serious about this and know nobody to get you in touch with the right people, prepare a good demo / showcase and buy some booth space at the next GDC, SIGGRAPH, PAX or similar conferences where game developers are likely to attend (GDC is the one with the most traction for the kind of stuff you're looking for). Showcase your technology and arrange meetings with developers at those conferences; everyone meets everybody there, and be sure to do this early because people's agendas fill up quickly! Convince game devs that your technology is good, easy to use, and cheap. If you can show how to scan an object live at the booth, you'll definitely get the attention you want.


Then cross your fingers and good luck.



PS. Bear in mind that scanning things like real guns & vehicles usually needs a license to use them in-game due to lookalike & copyright issues; that is the first problem you'll encounter when you meet with game dev executives (only a few companies like EA are used to licensing and using real-world content).

They will also need to be convinced that the cost of your device and of operating it (which includes retouches from an artist to make the output game friendly, i.e. lowering the vertex count, removing artifacts) is lower than the cost of hiring an artist to model the asset from scratch (or that somehow the two can be complementary).

#5226528 How to learn advanced CryEngine type graphics technologies

Posted by Matias Goldberg on 30 April 2015 - 10:15 AM



Ambiguous slides are great because then you need to think about the problem a bit.

Eh, sometimes you can clearly see the author just doesn't want to spill his precious beans. When I see that, I don't see it as "Oooh, learning opportunity!", I see it more as "Oh, what an asshole!". If you do research and you publish it, explain it properly; otherwise don't publish at all, since it becomes just a matter of showing off at that point.



Even showing off is better than nothing. Sometimes you just want to get some new ideas not how they should be implemented.


Just found this blog post that talks more about this.



He's referring to the many research papers, often coming from universities, that do this (present a new technique with no code or means to reproduce the results), making them useless for any practical purpose.

Crytek's slides are a bit different, because they're just explaining 'this is what we did'. It is already proven to work (it's in their games), and obviously they won't hand you everything on a silver platter so you can reap their efforts with copy-pasting. They give you enough to figure out the technique on your own.

Of course the more they share, the better.

#5225959 Map SRVs of dynamic buffers with NO_OVERWRITE

Posted by Matias Goldberg on 27 April 2015 - 05:28 PM

In addition to what mhagain said (it only works for resources created with CreateBuffer): this requires D3D11.1, which ships with Windows 8 (Windows 7 only gets partial D3D11.1 support via the Platform Update).

On older systems you will get this error.

#5224694 do float operations give different results in different GPUs?

Posted by Matias Goldberg on 21 April 2015 - 09:18 AM

I've never tried it, but I think the HLSL compiler has an option (/Gis) to force IEEE float strictness, aimed at the scientific computing crowd.

Even when using this option, it won't change the fact that the driver will generate whatever ISA it wants.

The IEEE flag is there to force the fxc compiler to optimize without ignoring the existence of NaNs, infinity, denormals, and floating point error introduced by things like fused multiply-add != multiply followed by add, or non-intuitive floating point behaviour like (a + b) + c != a + (b + c) (i.e. the equivalent of the "strict" floating point option in MSVC for C++).

This is indeed useful for scientific applications, but the determinism problem remains.


Different drivers can produce different results due to different ISA, and different GPUs will differ in their results.
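The non-associativity the IEEE flag guards against is easy to demonstrate on the CPU (plain C++ here, but the same applies to whatever ISA the driver emits):

```cpp
#include <cassert>

// Floating-point addition is not associative: grouping a sum differently
// changes the result. An optimizer (or a different GPU's ISA) that is free
// to reassociate or fuse operations can therefore produce bit-different
// output from the same source.
float sumLeft(float a, float b, float c)  { return (a + b) + c; }
float sumRight(float a, float b, float c) { return a + (b + c); }
```

With a = 1e8f, b = -1e8f, c = 1.0f, the left grouping yields 1.0f while the right grouping yields 0.0f, because c is absorbed when first added to b (1.0f is below half an ulp of 1e8f).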

#5224325 Why high level languages are slow

Posted by Matias Goldberg on 19 April 2015 - 10:53 AM

With some exceptions, most of the posts in this thread are entirely missing the point, mostly derailing into GC problems, language wars or compiler optimizations.

That is not what the article is about.

First, compilers work at the instruction level. As Mike Acton showed at CppCon 2014, there can easily be a 10:1 ratio or more between time spent on memory accesses and time spent on code execution.
Compiler optimizers work on the "1" of that ratio, while being completely unable to do anything about the "10", where one could easily gain 5x to 10x by optimizing memory access patterns.

This is a problem shared by ASM, C, C++, C#, Java, Lua and many other programming languages. The difference is that asm, C & C++ let you do something about the problem with minimal effort.
In C# & Java it takes significant effort, or you end up fighting the language by ignoring its recommended or intended programming patterns.

Furthermore, the recommended/intended programming patterns encourage cache thrashing, an inability to hide latency, and bandwidth saturation. One simple example is the lack of the const modifier.
C/C++ allows returning const pointers to grant read-only access to a memory region. Sure, you can const_cast that pointer and break the assumption, but const casting comes with a "DO IT AT YOUR OWN RISK" label. You're breaking a promise.

Java & C# have no such concept, and instead encourage returning a cloned copy. The advantage is that no one can break the promise, because modifications to the clone will not affect the original memory region. The disadvantage is that the memory copy blows the cache & contributes to bandwidth saturation. Many language designers assume the general case is infinite RAM (even C!), so they don't care about cloning memory regions (memory exhaustion is rarely a problem on modern systems with >4GB of RAM); but they ignore the fact that bandwidth & cache are very expensive. Even though memory management has always been a problem (as pointed out by another poster), historically the problem was that memory was limited, and on PC you would hit the HDD when you ran out. The current problem is an entirely different beast: hitting the HDD is very rare, but memory bandwidth is a relatively new bottleneck, and memory latency is often the visible cause of performance slowdowns.
I remember reading the explanation of why C# couldn't have const memory regions, and it was pretty compelling. However, that doesn't change the fact that the feature isn't there, and its absence costs performance.
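The const-based read-only access described above costs nothing at runtime in C++; a tiny sketch (Mesh and its members are hypothetical names):

```cpp
#include <cassert>
#include <cstddef>

// C++ can hand out a read-only view of internal storage with no copy:
// the caller sees the same memory, but the type system forbids writes.
class Mesh {
    float vertices[3] = {1.0f, 2.0f, 3.0f};
public:
    // Read-only access: no clone, so no extra cache or bandwidth cost
    // beyond the reads the caller actually performs.
    const float *getVertices() const { return vertices; }
    std::size_t vertexCount() const { return 3; }
};
```

The clone-returning equivalent in a const-less language would touch every byte of the array on every call, which is exactly the cache & bandwidth cost discussed above.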

Even though such programs may not even use 1GB of RAM, they may be saturating the bandwidth or thrashing the cache. And suddenly performance goes to hell.

Another problem with storing everything in the heap is that the allocator may need to take a lock every time. Even though there are mitigations, heap allocation is something C# does compulsively.

Resorting to C/C++ for "compute bound" parts while doing everything else in C# is also missing the point. Again, this is not a problem of being compute bound: an extremely good optimizing compiler can produce efficient code out of C# & Java with good compute performance (which may even surpass a not-so-good C++ compiler!). The problem is about memory.
Code written in typical C# style results in what profiling veterans call "the cluster bomb". If you want to find out why the program is running slow, launching it in a profiler reveals that the program doesn't spend a high percentage of its time in any particular routine; rather, the inefficiencies are distributed across the entire codebase, adding up wasted time incrementally via branch misprediction, hidden memory allocations, unnecessary memcpys, virtual functions, the CPU waiting for data to arrive, etc. The worst case scenario ever, and a PITA to solve.

C & C++ aren't perfect either. Most virtual functions in a C++ project could be resolved at compile time, but the compiler can't do anything about it because of the guarantees the language demands (what if an external application imports from or hooks into the EXE/DLL and overrides the class? Bam! You can't take away the vtable) and because of the lack of language tools to help the compiler remove the virtual call (there is no easy way to tell the compiler "clone this vector<Base> N times, one for each derived class, and when we add a pointer, add it to the type-specific vector, so that when we iterate we iterate everything in order without resolving any virtual table at all").
Also, DOD (Data Oriented Design) feels like fighting OOP (Object Oriented Programming), but it shouldn't be like this: OOP is about the relationships between constructs called "objects", while DOD is about how those objects lay out their data in memory and how the code operates on it.
However, most languages I know of (except for HLSL & GLSL, and perhaps ISPC?) intrinsically tie the code to the memory layout and execution flow, which makes DOD & OOP almost antagonistic.
I'm still waiting for a language that takes my beautiful OOP-structured code and spits out DOD-friendly code. I'm convinced this is possible.
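The "one vector per derived class" idea can be sketched roughly like this (Missile, Particle and Scene are made-up names, not from any real codebase):

```cpp
#include <cassert>
#include <vector>

// Instead of a vector<Base*> paying a virtual call per element, keep each
// concrete type in its own contiguous vector and iterate them in turn.
struct Missile  { float pos = 0.0f; void update() { pos += 2.0f; } };
struct Particle { float pos = 0.0f; void update() { pos += 1.0f; } };

struct Scene {
    std::vector<Missile>  missiles;   // homogeneous, cache-friendly, no vtable
    std::vector<Particle> particles;

    void updateAll() {
        for (Missile &m : missiles)   m.update(); // statically dispatched,
        for (Particle &p : particles) p.update(); // inlinable by the compiler
    }
};
```

Each loop body is statically dispatched and walks contiguous memory; the price is that adding a new type means adding a new vector and a new loop, which is exactly the bookkeeping one wishes the language could do automatically.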

However, the thing with most high level languages is that writing anything other than the intended way is hard or painful. This is a good thing: it makes coding easier, homogenizes code across developers, leads to fewer bugs, and even a monkey could use it. The problem is that currently these "intended ways" do not take cache, bandwidth, or even lock contention in the heap into account.
The difference between a veteran & a rookie developer in high level languages often boils down to big O notation (the chosen algorithm). But once you take away the algorithmic differences, a veteran has a hard time optimizing the code further, because the language won't let them, or makes it painfully difficult.

#5221991 SPIR-V Macro Compiling

Posted by Matias Goldberg on 07 April 2015 - 09:43 PM

Errr... SPIR-V is not a language; it's an IR/IL that aims to be simple, to reduce the chance of driver errors. The functionality you describe would be implemented by the compiler that produces the SPIR-V, as it is pointless to burden the driver with that.
You can come up with your own language that compiles into SPIR-V if you want.

#5221838 GS output data that's not needed by the PS - where's it tossed?

Posted by Matias Goldberg on 07 April 2015 - 08:07 AM

Stop thinking in terms of registers (which is an abstraction that no longer applies to modern GPUs) and start thinking in terms of memory.

The GS will store data you will later not be using, so you're wasting memory and bandwidth.


When the PS loads the data, it is larger and thus won't pack into the cache as tightly as it could.


Whether any of this is an issue depends on whether you're ALU, bandwidth or cache pollution bottlenecked.

#5221415 Vertices and indices in the same buffer?

Posted by Matias Goldberg on 04 April 2015 - 06:57 PM

On modern OpenGL 4: you can use glBufferStorage to create a single chunk of memory that can be used for vertex, index, constant, UAV and texture buffers. The first binding point the memory is used with can serve as a hint to the driver for performance, but it's just a hint (and afaik... ignored by modern implementations).

On D3D12: See OpenGL 4.

On D3D11: you can use the same buffer for vertex & index data, but constant buffers have their own; UAVs and texture buffers have their own binding points as well.


On WebGL: every buffer can only be used for one specific type, for security reasons. The browser can assume what contents the buffer will contain and perform a lot of validation up front; as long as you don't modify the buffer again, the validation triggers only when you upload data. If WebGL allowed multiple binding points, the validation would have to trigger every time the buffer is bound differently or with a different range, which could cripple performance.