Matias Goldberg

Member Since 02 Jul 2006
Offline Last Active Yesterday, 06:06 PM

#5281393 Vulkan is Next-Gen OpenGL

Posted by on 15 March 2016 - 04:03 PM

You can build the debug layer yourself, which fixes the DLL issue.

#5281380 How can I manage multiple VBOs and shaders?

Posted by on 15 March 2016 - 02:17 PM

On my computer, I can draw 125 000 times the same cube, while unbinding-rebinding it for EVERY draw call (It's stupid, but I did it for benchmarking purpose) with still 50 frames per second.

The driver will detect that you're rebinding the same VBO and ignore it. What you measured was the cost of the call instructions and of checking whether the VBO was already bound, not the actual cost of changing bound VBOs.
A modern computer definitely cannot handle 125,000 real VBO swaps per frame at 50 Hz.
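
To make the point above concrete, here is a minimal sketch of how a driver could filter redundant binds. It is not real OpenGL and not any vendor's actual implementation: LazyBindState, bindBuffer() and draw() are hypothetical names, and the assumption is that the driver only commits a bind when a draw call really needs a different buffer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of lazy state tracking. Nothing here calls OpenGL.
struct LazyBindState
{
    std::uint32_t requestedVbo = 0; // what the application last asked for
    std::uint32_t committedVbo = 0; // what the hardware was last told to use
    std::uint64_t bindCalls    = 0; // every bind the application issued
    std::uint64_t realSwitches = 0; // binds that actually changed hardware state

    void bindBuffer( std::uint32_t vbo )
    {
        ++bindCalls;
        requestedVbo = vbo; // cheap: just remember the request
    }

    void draw()
    {
        if( requestedVbo != committedVbo )
        {
            committedVbo = requestedVbo;
            ++realSwitches; // the expensive path
        }
    }
};

// The benchmark from the quoted post: unbind, rebind the same VBO, draw.
inline LazyBindState unbindRebindSameVbo( std::uint64_t drawCalls )
{
    LazyBindState state;
    for( std::uint64_t i = 0; i < drawCalls; ++i )
    {
        state.bindBuffer( 0u );
        state.bindBuffer( 1u );
        state.draw();
    }
    return state;
}

// A benchmark that cannot be filtered: each draw uses a different VBO
// than the previous one.
inline LazyBindState alternateVbos( std::uint64_t drawCalls )
{
    LazyBindState state;
    for( std::uint64_t i = 0; i < drawCalls; ++i )
    {
        state.bindBuffer( 1u + static_cast<std::uint32_t>( i & 1u ) );
        state.draw();
    }
    return state;
}

// Usage: the same number of draws, very different amounts of real work.
inline void checkExample()
{
    assert( unbindRebindSameVbo( 125000 ).bindCalls == 250000 );
    assert( unbindRebindSameVbo( 125000 ).realSwitches == 1 );
    assert( alternateVbos( 125000 ).realSwitches == 125000 );
}
```

In this model the unbind/rebind loop pays for 250,000 calls but only one real switch, so a benchmark has to alternate between different buffers to measure the cost of swapping.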

#5280899 Selling a game on Steam- Steam and VAT cuts...?

Posted by on 12 March 2016 - 09:18 AM

I guess which is why you want to use a retailer like Steam rather than running your own store. Ain't nobody got time to research the implications of selling something to 300 jurisdictions...

It would be easy if it weren't for the EU that screwed up. Traditionally, "civilized" countries applied the VAT from imports (as seen from the country of the buyer) by withholding at the payment issuer (e.g. typically the credit card). Even if you pay with PayPal, PayPal gets the money via a credit card transaction, a bank transfer, or another PayPal transfer.


Only the last one is hard to withhold (paying w/ PayPal using funds from a previous PayPal transaction), and even then, legislation in "decent" countries makes the buyer liable for paying the VAT. After all, the VAT is a tax imposed on the consumer, not the seller (normally the seller is the one who deposits the VAT funds with the tax agency since it's the most practical thing to do, but not in this case).

But no... the EU wanted to force the seller to register with their tax administration agencies even if the seller never set foot there, and even though these EU countries don't even have jurisdiction to enforce such a thing. I ranted about it in detail on my website.

You know there is a threshold to vat right? Check with an accountant, but I'm sure that you don't even have to deal with it at all until your turnover per year is greater than £20K...

Yes, but the problem with the EU's new legislation is that you have to check the threshold of every single European country, and watch out in case there's a country without a threshold.

It's a mess.

#5280773 Selling a game on Steam- Steam and VAT cuts...?

Posted by on 11 March 2016 - 02:18 PM

It's strange that VAT/GST are supposed to be paid on the increase in value - e.g. the difference in retail and wholesale, but my quick research on EU VAT says that for digital goods this isn't taken into account.

You're right about that. However it's the seller who must increase the value. If the seller didn't, then it is assumed the price already included the raise.

#5280639 Some questions on Data Oriented Design

Posted by on 10 March 2016 - 11:56 PM

It seems that Data Oriented Design is commonly brought up as the antithesis to Object Oriented Design. To me, OOD has always been about encapsulation, and it seems having that here would only be an even greater advantage. To combine the two, couldn't you just shift your thinking from objects as the data, to being views of the data? That way you could reorganize your internal data structures around every day, and as long as your OOP interface remains stable the rest of your code will be perfectly happy.

You're not wrong about this. In fact I always say OOP and DOD are not necessarily opposites. One is about the relationship between constructs called "objects" while the other is about how data is laid out in memory and processed.

However, no current language separates the two (in fact, current languages make them almost antagonistic in most cases), and a language that wanted to reconcile OOD & DOD would have to solve several challenges; the result would probably have different trade-offs from the ones we see now. Nothing is free.

Can anyone here give some advice on how/where the best ways/places to apply this model would be, while still maintaining clarity and sanity in the rest of the code base?

Others have done a good job explaining it so I won't repeat it. Basically boils down to hotspots where performance becomes the highest priority; and not being an idiot. I once optimized the performance of a 3D visualization tool by 3x just by changing some per-vertex data they were using every frame from being stored inside an std::map to an std::vector.
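
As a rough illustration of that std::map to std::vector change: the post doesn't say what the per-vertex data was, so VertexData and its weight field below are made up, and the only assumption is that vertices were keyed by a dense index. A sketch, not the actual fix:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical per-vertex payload; the original post does not describe it.
struct VertexData
{
    float weight;
};

// Node-based container: each element lives in its own heap allocation,
// so a per-frame traversal chases pointers all over memory.
inline std::map<std::size_t, VertexData> buildMap( std::size_t vertexCount )
{
    std::map<std::size_t, VertexData> data;
    for( std::size_t i = 0; i < vertexCount; ++i )
        data[i] = VertexData{ static_cast<float>( i % 7 ) };
    return data;
}

// Contiguous container: the vertex index is simply the array index.
inline std::vector<VertexData> buildVector( std::size_t vertexCount )
{
    std::vector<VertexData> data( vertexCount );
    for( std::size_t i = 0; i < vertexCount; ++i )
        data[i].weight = static_cast<float>( i % 7 );
    return data;
}

inline float sumWeights( const std::map<std::size_t, VertexData> &data )
{
    float sum = 0.0f;
    for( const auto &entry : data )
        sum += entry.second.weight;
    return sum;
}

inline float sumWeights( const std::vector<VertexData> &data )
{
    float sum = 0.0f;
    for( const VertexData &vertex : data )
        sum += vertex.weight;
    return sum;
}

// Usage: both layouts produce the same answer; only the memory layout
// (and therefore the cache behaviour) differs.
inline void checkExample()
{
    assert( sumWeights( buildMap( 1000 ) ) == sumWeights( buildVector( 1000 ) ) );
}
```

The interface barely changes, which is why this kind of fix is cheap to try once a profiler points at the hotspot. How much it helps depends on the data and access pattern.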


There's one more thing I will add: DOD is applied best to known problems. Problems that are well understood and/or you've already written an implementation. To apply DOD you need to know the data and its usage patterns. To know that, you already need to have had something working to analyze. Or have read it somewhere.

While the general notions can apply to OOP (i.e. keep data contiguous, don't use convoluted indirections, avoid excessive allocations), applying the fun stuff of DOD is usually a bad idea for an algorithm you know little about, have little experience with, or that is still in the prototype stage.

#5280179 Selling a game on Steam- Steam and VAT cuts...?

Posted by on 08 March 2016 - 09:39 AM

Like braindigitalis said, VAT is complex and you need to get an accountant. Steam may pay VAT, but you may also have to pay VAT.


Don't add the percentages. A 21% VAT and Valve taking a 30% cut doesn't mean they take 51% and pay you 49%. It doesn't work like that.

If they pay 21% in VAT and take a cut of 30%; all out of 9.99, that means:

9.99 / 1.21 = 8.25619

8.25619 * 30% = Valve takes usd 2.48.

8.25619 * 70% = Valve pays you usd 5.77.


usd 5.77 is about 57% of 9.99, not 49%.
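
The same arithmetic as a small sketch, in case you want to plug in other rates. splitSale and SaleSplit are names made up for this example, and it assumes (as the post does) that the shelf price already includes VAT and that the store's cut is taken from the net amount:

```cpp
#include <cassert>

// Hypothetical helper types for the calculation above.
struct SaleSplit
{
    double net;       // price with VAT removed
    double storeCut;  // what the store keeps
    double developer; // what the developer receives
};

// shelfPrice is VAT-inclusive; vatRate and storeShare are fractions
// (0.21 for 21%, 0.30 for 30%).
inline SaleSplit splitSale( double shelfPrice, double vatRate, double storeShare )
{
    assert( vatRate >= 0.0 && storeShare >= 0.0 && storeShare <= 1.0 );
    SaleSplit split;
    split.net       = shelfPrice / ( 1.0 + vatRate ); // divide, don't subtract 21%
    split.storeCut  = split.net * storeShare;
    split.developer = split.net - split.storeCut;
    return split;
}

// Developer share as a fraction of the VAT-inclusive shelf price.
inline double developerFraction( double shelfPrice, double vatRate, double storeShare )
{
    return splitSale( shelfPrice, vatRate, storeShare ).developer / shelfPrice;
}

// Usage: 9.99 with 21% VAT and a 30% cut leaves the developer well above 49%.
inline void checkExample()
{
    const SaleSplit split = splitSale( 9.99, 0.21, 0.30 );
    assert( split.net > 8.2561 && split.net < 8.2563 );
    assert( split.storeCut > 2.47 && split.storeCut < 2.48 );
    assert( split.developer > 5.77 && split.developer < 5.78 );
    assert( developerFraction( 9.99, 0.21, 0.30 ) > 0.57 );
}
```

The key line is the division by 1.21: VAT is a percentage of the net price, not of the shelf price, which is why the percentages can't simply be added.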

Note, however, that you may also have to pay some VAT on those 5.77; but if that's the case, it will probably also mean Steam has to pay you more than $5.77, because something called VAT credit arises. Wikipedia has an example of how it works. Everyone in the chain pays VAT, but it also depends on legislation (e.g. exports to other countries may not be subject to VAT, but imports are).


That is assuming the final price was usd 9.99, and Steam didn't instead increase the price to usd 12.08 (9.99 * 1.21). I don't know how they operate in that regard.


There's a lot of "if"s involved. Get an accountant.

#5280171 The GPU comes too late for the party...

Posted by on 08 March 2016 - 08:54 AM

> On Optimus systems, the Intel card is always the one hooked to the monitor.
> If you call get() query family for getting timestamp information, you're forcing the NV driver to ask Intel drivers to get information (i.e. have we presented yet?).

Even if it asks the Intel driver, it doesn't seem to slow it down, considering that I submit hundreds of queries per frame. And why should it delay the rendering?

Because the NV driver needs to wait on the Intel driver. In a perfect world the NV driver would be asynchronous if you're asynchronous as well. But in the real world, where Optimus devices can't get VSync straight, I wouldn't count on that. At all. There are several layers of interface communication (NV stack, DirectX, WDDM, and Intel stack), and a roundtrip is going to be a deep hell where a synchronous path is likely the only one implemented.


> Edit: Also if it's Windows 7, try disabling Aero. Also check with GPUView (also check this out).

It is not a matter of performance, but a matter of delay and synchronisation. All rendering on the GPU falls more or less exactly in the SwapBuffers call, as if the driver waits too long before submitting the command queue and is eventually forced to by the SwapBuffers call.

Aero adds an extra layer of latency. Anyway, not the problem since you're in Windows 10.

I have Windows 10 btw. I looked into GPUView, but it seems to track only DirectX events.

GPUView tracks GPU events such as DMA transfers, page commits, memory eviction, screen presentation. All of which is API agnostic and thus works with DirectX and OpenGL (I've successfully used GPUView with OpenGL apps). IIRC it also supports some DX-only events, but that's not really relevant for an OGL app.

#5280090 The GPU comes too late for the party...

Posted by on 07 March 2016 - 07:33 PM

On Optimus systems, the Intel card is always the one hooked to the monitor.

If you call get() query family for getting timestamp information, you're forcing the NV driver to ask Intel drivers to get information (i.e. have we presented yet?).


Does this problem go away if you never call glQueryCounter, glGetQueryObjectiv & glGetInteger64v(GL_TIMESTAMP)?


Calling glFlush could help you force the NV drivers to start processing sooner, but it's likely the driver will just ignore that call (since it's often abused by dumb programmers).


I also recall NV driver config in their control panel having a Triple Buffer option that only affected OpenGL. Try disabling it.


Edit: Also if it's Windows 7, try disabling Aero. Also check with GPUView (also check this out).

#5279772 [D3D12] What is the benefit of descriptor table ranges?

Posted by on 05 March 2016 - 10:04 PM

In addition to everything Hodgman said, remember DX12 had to support three architectures from NVIDIA, one architecture from AMD, and like three from Intel (and they might have considered some mobile GPUs in their Nokia phones as well).

These cards differed vastly in their texture support. What doesn't make sense in one hw architecture, makes sense in another (Intel & Fermi, I'm looking at youuuuu)

#5279547 Design question: Anti-aliasing and deferred rendering.

Posted by on 04 March 2016 - 03:47 PM

You can't go from non-MSAA GBuffer & light passes to MSAA tonemapping. It just doesn't work like that and makes no sense.


Overview of MSAA is a good introduction on how MSAA works. MSAA resolve filters is also a good read.


You have to start with MSAA GBuffers, avoid resolving them, and resolve the tonemapped result. This is what makes Deferred Renderers + MSAA so damn difficult. SV_Coverage + stencil tricks can help save bandwidth.

#5279366 Fastest way to draw Quads?

Posted by on 03 March 2016 - 05:53 PM


A wavefront can't work on several instances? InstanceID is stored in a scalar register?

The instanceID is in a vector register on AMD GPUs, so you can have multiple instances within the same wavefront.


InstanceID is in a VGPR, yes. Multiple instances could be in the same wavefront? Probably. Multiple instances are in the same wavefront? Not in practice.

I don't know if there are limitations within the rasterizer (e.g. vertices pulled from a wavefront are assumed to be from the same instance), but simple things like:

Texture2D tex[8];
tex[instanceId].Load( ... );

would cause colossal divergence issues. Analyzing a shader for these hazardous details to decide whether instances should share wavefronts is more expensive than just sending each instance to its own group of wavefronts. Being vertex bound is pretty rare nowadays unless you hit a pathological case (such as having 4 vertices per draw).

#5279127 Fastest way to draw Quads?

Posted by on 02 March 2016 - 07:36 PM

Is there a rationale for the performance penalty encountered for small mesh vs drawIndexed method by the way?

GCN's wavefront size is 64.
That means GCN works on 64 vertices at a time.
If you make two DrawPrimitive calls of 32 vertices each, GCN will process them using 2 wavefronts, wasting half of its processing power.

It's actually a bit more complex, as GCN has compute units, and each CU has 4 SIMD units. Each SIMD unit can execute between 1 and 10 wavefronts. There's also some fixed function parts like the rasterizer which may have some overhead when involving small meshes.

Long story short, it's all about load balancing, and small meshes leave a lot of idle space; hence the sweetspot is around 128-256 vertices for NVIDIA, and around 500-600 vertices for AMD (based on benchmarks).
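
A back-of-the-envelope sketch of the 64-wide wavefront arithmetic above. It only models the simplification stated in the post (vertices from different draws never share a wavefront) and ignores everything else, such as index reuse, the CU/SIMD scheduling and the fixed-function stages. The function names are made up for this example.

```cpp
#include <cassert>
#include <vector>

constexpr unsigned kWavefrontSize = 64; // GCN wavefront width, per the post

// Wavefronts needed for one draw: round up to a whole wavefront.
inline unsigned wavefrontsForDraw( unsigned vertexCount )
{
    return ( vertexCount + kWavefrontSize - 1u ) / kWavefrontSize;
}

// Fraction of vertex shader lanes doing useful work across a list of draws,
// assuming each draw gets its own wavefronts.
inline double laneUtilization( const std::vector<unsigned> &vertexCountPerDraw )
{
    unsigned long long usedLanes  = 0;
    unsigned long long totalLanes = 0;
    for( unsigned vertexCount : vertexCountPerDraw )
    {
        usedLanes += vertexCount;
        totalLanes += static_cast<unsigned long long>( wavefrontsForDraw( vertexCount ) ) *
                      kWavefrontSize;
    }
    if( totalLanes == 0 )
        return 0.0; // no draws, nothing to measure
    return static_cast<double>( usedLanes ) / static_cast<double>( totalLanes );
}

// Usage: two 32-vertex draws waste half the lanes; one 64-vertex draw wastes none;
// a single quad drawn on its own uses 4 lanes out of 64.
inline void checkExample()
{
    const std::vector<unsigned> twoSmallDraws = { 32u, 32u };
    const std::vector<unsigned> oneFullDraw   = { 64u };
    const std::vector<unsigned> oneQuad       = { 4u };
    assert( laneUtilization( twoSmallDraws ) == 0.5 );
    assert( laneUtilization( oneFullDraw ) == 1.0 );
    assert( laneUtilization( oneQuad ) == 0.0625 );
}
```

This is only the lane-level part of the story. The measured sweet spots quoted above come from benchmarks, not from this formula.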

#5278753 Fastest way to draw Quads?

Posted by on 29 February 2016 - 03:38 PM

See Vertex Shader Tricks by Bill Bilodeau regarding point sprites. Hint: it's none of the ways you mentioned.

#5278460 How to find the cause of framedrops

Posted by on 27 February 2016 - 12:50 PM

Hi Oogst!


Learn to use GPUView (also check this out). It's scary at first, but it's an invaluable tool for discovering stutters like the one you're facing.

#5278336 "Modern C++" auto and lambda

Posted by on 26 February 2016 - 12:06 PM

auto x = 7.0; //This is an insane strawman example
auto x = 7.0f; //This is an insane strawman example
auto x = 7.; //This is an insane strawman example
auto x = 7; //This is an insane strawman example
There's absolutely no reason to use auto where


Just to be clear, I had to pull this example because Scott Meyers is literally recommending using auto everywhere, including for literals. And his books are widely read among freshmen trying to learn C++.


Besides, you misinterpreted the example. It doesn't have to be literals. I could do the same with:

auto x = time; //This is double
auto x = acceleration; //This is a float
auto x = time * acceleration; //This is...? (will be a double, probably a float is much better fit)
auto x = sizeBytes; //This is an unsigned integer. 64-bits
auto x = lengthBytes; //This is a signed integer

Except now it's not obvious at all. The last one (lengthBytes) can end up inducing undefined behavior, while the first three could cause precision or performance issues, because I have no idea whether I'm working with doubles or floats unless I spend effort checking each variable's type; something the person who wrote it obviously didn't care about, since they decided to use auto.
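
For reference, this is what the compiler actually deduces in a compilable version of that snippet. The declared types are my assumption, based on the comments in the example (a 64-bit unsigned size and a 64-bit signed length), and the last line is an extra case added here to show signed and unsigned mixing silently. It illustrates the hidden-type problem, not the specific undefined behavior case.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Returns the result of mixing the signed length with the unsigned size.
inline std::uint64_t deductionDemo()
{
    // Assumed declarations; the post only gives names and intended types.
    double        time         = 0.5;
    float         acceleration = 9.8f;
    std::uint64_t sizeBytes    = 16;
    std::int64_t  lengthBytes  = -1;

    auto a = time;                    // double
    auto b = acceleration;            // float
    auto c = time * acceleration;     // double: the float operand is promoted
    auto d = sizeBytes;               // unsigned 64-bit
    auto e = lengthBytes;             // signed 64-bit
    auto f = lengthBytes * sizeBytes; // unsigned 64-bit: -1 is converted first

    static_assert( std::is_same<decltype( a ), double>::value, "a is double" );
    static_assert( std::is_same<decltype( b ), float>::value, "b is float" );
    static_assert( std::is_same<decltype( c ), double>::value, "c is double, not float" );
    static_assert( std::is_same<decltype( d ), std::uint64_t>::value, "d is unsigned" );
    static_assert( std::is_same<decltype( e ), std::int64_t>::value, "e is signed" );
    static_assert( std::is_same<decltype( f ), std::uint64_t>::value, "f is unsigned" );

    (void)a; (void)b; (void)c; (void)d; (void)e; // silence unused warnings
    return f;
}

// Usage: -1 * 16 does not come out as -16; it wraps to a huge unsigned value.
inline void checkExample()
{
    assert( deductionDemo() == UINT64_MAX - 15u );
}
```

None of this is visible at the call site when every declaration says auto, which is the whole complaint.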



for( Foo x : m_foos )

...the exact same flaw would occur.
Further, I would suggest by default one should use const-reference, and only go to non-const reference or value (or r-value) if the code needs it (for example, if it's just integers).
If you use good coding practices, then "auto x : m_foos" stands out just as badly as "Foo x : m_foos".

Yes, but it is far more obvious that it is a hard copy. It's self-evident. People who write "for( auto x : m_foos )" most of the time actually expect "auto &x" or didn't care. I mean, isn't the compiler smart enough to understand I meant auto& x and not auto x? Isn't it supposed to deduce that automatically? This is the kind of BS I had to fix in an actual, real C++ project for a client who was having performance problems. Their programmers didn't realize auto x was making deep copies.
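
A minimal way to see those copies for yourself. Foo here is a made-up type with a counting copy constructor, not the client's code:

```cpp
#include <cassert>
#include <vector>

// Hypothetical element type that counts how many times it gets copied.
struct Foo
{
    static int copies;
    int payload = 0;

    Foo() = default;
    Foo( const Foo &other ) : payload( other.payload ) { ++copies; }
    Foo &operator=( const Foo & ) = default;
};
int Foo::copies = 0;

// "auto x" deduces Foo, so every iteration copy-constructs an element.
inline int copiesWhenIteratingByValue( const std::vector<Foo> &foos )
{
    Foo::copies = 0;
    int sum = 0;
    for( auto x : foos )
        sum += x.payload;
    (void)sum;
    return Foo::copies;
}

// "const auto &x" binds a reference; nothing is copied.
inline int copiesWhenIteratingByConstRef( const std::vector<Foo> &foos )
{
    Foo::copies = 0;
    int sum = 0;
    for( const auto &x : foos )
        sum += x.payload;
    (void)sum;
    return Foo::copies;
}

// Usage: 1000 elements means 1000 copies per loop with "auto x".
inline void checkExample()
{
    const std::vector<Foo> foos( 1000 );
    assert( copiesWhenIteratingByValue( foos ) == 1000 );
    assert( copiesWhenIteratingByConstRef( foos ) == 0 );
}
```

With a trivial payload the copies are cheap. With a Foo that owns strings or containers, each of those copies is a deep copy, which is the kind of cost the client was paying.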


Like frob said, that's the fault of people who try to pretend C++ is not a strongly typed language, since in (almost?) every weakly typed language out there, "auto x : m_foos" means a reference and not a hard copy.

Obviously making auto a reference by default instead of a hard copy is not the solution. That would create another storm of problems. But perhaps people should stop recommending to use auto everywhere, or stop pretending C++ is not a strongly typed language.