nVidia's bindless graphics

Linky.

The principle is simple: offer direct, C-style pointer dereferencing inside the shader for immediate data access. It requires only a one-time setup and the occasional runtime reconfiguration, effectively getting rid of all the bind functions, which tend to sit inside tight loops and cause increasingly frequent cache misses. It also removes the cap on the number of objects accessible from a shader.
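To make that concrete, the C-side flow looks roughly like this (a minimal sketch, assuming a context that exposes GL_NV_shader_buffer_load and GL_NV_vertex_buffer_unified_memory, e.g. through GLEW; the variable names are illustrative):

#include <GL/glew.h>

static const GLfloat vertexData[] = { 0,0,0, 1,0,0, 0,1,0 };
GLsizeiptr vboSize = sizeof(vertexData);
GLsizei vertexCount = 3;
GLuint vbo;
GLuint64EXT vboAddr;

/* One-time setup: create the VBO, fetch its raw GPU address, and make it
   resident so the address stays valid. */
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, vboSize, vertexData, GL_STATIC_DRAW);
glGetBufferParameterui64vNV(GL_ARRAY_BUFFER, GL_BUFFER_GPU_ADDRESS_NV, &vboAddr);
glMakeBufferResidentNV(GL_ARRAY_BUFFER, GL_READ_ONLY);

/* Per draw: no glBindBuffer - attribute 0 is fed straight from the address. */
glEnableClientState(GL_VERTEX_ATTRIB_ARRAY_UNIFIED_NV);
glEnableVertexAttribArray(0);
glVertexAttribFormatNV(0, 3, GL_FLOAT, GL_FALSE, 3 * sizeof(GLfloat));
glBufferAddressRangeNV(GL_VERTEX_ATTRIB_ARRAY_ADDRESS_NV, 0, vboAddr, vboSize);
glDrawArrays(GL_TRIANGLES, 0, vertexCount);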

Neat stuff. Only problem is, it's nVidia-specific.

I do use nVidia (exclusively, even) and I respect their position in the industry more than ATI's, but I'm not an nVidia fanboy and, even though CUDA and bindless graphics are fantastic things in their own right, I won't give up ATI compatibility to write shaders that I have no idea how, or even whether, I could port later on. At least that's my initial reaction. However, bindless graphics does promise up to a 7.5x speed increase (the truth is closer to 2-2.5x in most less specific situations), which is actually pretty fantastic.

I'm not ready to forfeit my current approach and sell my soul to a single company, but I did find this pretty compelling and would like some input on where you guys think this may lead. IMO the obvious outcome is an increasing "C-ification" of shaders, which will most likely grow into outright C++ features at some point, gradually blurring the gap between the two; seeing as shaders were born from assembly, like programming in general, this doesn't seem like an overly weird outcome. Another question is when the ARB will get around to this - the article is from 2009, and today I'm finding it hard to locate any information on ATI's or the ARB's stance on the subject.
There's a two-year-old thread on it here, but not much discussion.

Seeing that 2 years later it's rarely mentioned (except in the context of "remember that neat nVidia extension with pointer syntax?"), I'm guessing it hasn't seen very wide adoption.

It's essentially a way to make OGL drivers simpler, by pulling typically driver-owned work out into the application layer instead. Going by history, simplifying OGL drivers doesn't seem to be high on the ARB's agenda.

Also, note that those 7x speed-up figures apply only to the CPU-side cost of setting up draw calls.
The scene that I'm working on at the moment only spends about 1ms of CPU time submitting draw-calls to the driver (~3% of the total CPU frame time), so it's not something I really care about optimising further.
I played around with this some time ago. Interesting idea, but since it depends quite a bit on a particular GPU memory architecture, this is a nightmare to push to ARB status (read: won't happen).

I am in a similar situation as Hodgman, the performance improvements I got on a quick real-world test were close to zero, which basically stopped any further consideration of an NV-only path right there. It aims to solve a bottleneck which doesn't exist in my engine (I'm heavily shader limited). YMMV.
I was very excited when I saw Bindless for the first time. I thought it was the greatest extension of the modern OpenGL era.

Since the application I was developing at that time was CPU bound, with thousands of GL draw calls per frame, Bindless looked like salvation. The speedup was up to 2x (not 7.5x, of course; that figure came from a synthetic benchmark), but it was great. Being so excited, I started to use Bindless in every possible situation, and I was wondering why it wasn't implemented even more broadly. For example, there should be a version of glBindBufferBase() that accepts a uint64EXT address instead of a buffer ID. That was the only function in my code that dealt with VBO IDs (so I had to track both addresses and IDs just because of TF). I asked the NV development team to broaden the vertex_buffer_unified_memory extension with just that one function in order to have "a complete solution", but without success. They decided to keep the scope rather limited and stuck to the initial version. That was a while ago, and since then nothing has changed.
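The double bookkeeping looks roughly like this (a rough sketch; "size", "program" and the "results" uniform are made-up names, not from the spec):

GLsizeiptr size = 4096;
GLuint tfBuf;
GLuint64EXT tfAddr;

glGenBuffers(1, &tfBuf);
glBindBuffer(GL_TRANSFORM_FEEDBACK_BUFFER, tfBuf);
glBufferData(GL_TRANSFORM_FEEDBACK_BUFFER, size, NULL, GL_DYNAMIC_COPY);
glGetBufferParameterui64vNV(GL_TRANSFORM_FEEDBACK_BUFFER, GL_BUFFER_GPU_ADDRESS_NV, &tfAddr);
glMakeBufferResidentNV(GL_TRANSFORM_FEEDBACK_BUFFER, GL_READ_ONLY);

/* TF output still has to go through the ID-based bind point... */
glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, tfBuf);

/* ...while the consuming shader reads the same memory through the address. */
GLint loc = glGetUniformLocation(program, "results");
glUniformui64NV(loc, tfAddr);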

In short:

- I think bindless is a great concept, and I hope that in some future version of OpenGL binding will totally disappear.

- If you are confined to NV hardware, you have nothing to lose by using Bindless; furthermore, you'll benefit if your application is CPU bound.

- I think neither AMD nor the ARB is willing to accept Bindless in the foreseeable future.

- Furthermore, even NV is stuck at the initial version (yes, there have been some minor changes since April 2009, but some functions are still missing).

- In any case, I'll continue to use Bindless in my research applications (but not in commercial ones, unless they target customers who can also be forced to buy NV hardware, or the hardware is included in a complete solution :cool: ).


Neat stuff. Only problem is, it's nVidia-specific.


Like the others have said, it will never make it to ATI/AMD or ARB.

And if you do like the extension, it's up to you whether you use it or not. You just need different paths in your renderer to deal with the case where the extension doesn't exist. If you don't want to write different code paths, that's also up to you.
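Something along these lines (a sketch, assuming a compatibility context where glGetString(GL_EXTENSIONS) is still available; draw_bindless() and draw_bound() are hypothetical renderer paths):

#include <string.h>
#include <GL/gl.h>

static int has_extension(const char *name)
{
    const char *all = (const char *)glGetString(GL_EXTENSIONS);
    return all != NULL && strstr(all, name) != NULL;
}

void render(void)
{
    if (has_extension("GL_NV_shader_buffer_load") &&
        has_extension("GL_NV_vertex_buffer_unified_memory"))
        draw_bindless();  /* NV-only path using raw GPU addresses */
    else
        draw_bound();     /* portable glBindBuffer path */
}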
Sig: http://glhlib.sourceforge.net
an open source GLU replacement library. Much more modern than GLU.
float matrix[16], inverse_matrix[16];
glhLoadIdentityf2(matrix);
glhTranslatef2(matrix, 0.0, 0.0, 5.0);
glhRotateAboutXf2(matrix, angleInRadians);
glhScalef2(matrix, 1.0, 1.0, -1.0);
glhQuickInvertMatrixf2(matrix, inverse_matrix);
glUniformMatrix4fv(uniformLocation1, 1, GL_FALSE, matrix);
glUniformMatrix4fv(uniformLocation2, 1, GL_FALSE, inverse_matrix);

I hope that in some future version of OpenGL binding will totally disappear.


That was the hope/plan with Longs Peak from about 3 or 4 years ago... as you can probably tell, that didn't happen, and I honestly wouldn't pin my hopes on it happening in the next 5 years either...
I personally thought the whole idea was much more useful for direct data access than for binding. Nowadays the ratio of bind calls to rasterized pixels has dropped considerably, especially with the use of things like VAOs, grouped instancing, etc., which diminishes the relative cost of binding. However, as I think Hodgman mentioned in the thread he linked above, the ability to pack arbitrary data into a stream and have essentially unhindered memory access at the ARB level would be a huge benefit and, IMO most importantly, would greatly simplify many aspects of shader design.
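For instance, on the shader side the pointer syntax lets you walk arbitrary structures directly (a hedged sketch of a GLSL fragment shader stored as a C string; the Light layout and lightCount are made up for illustration):

static const char *fs =
    "#version 120\n"
    "#extension GL_NV_shader_buffer_load : require\n"
    "struct Light { vec4 pos; vec4 color; };\n"
    "uniform Light *lights;   // raw GPU address, set with glUniformui64NV\n"
    "uniform int lightCount;\n"
    "void main() {\n"
    "    vec4 c = vec4(0.0);\n"
    "    for (int i = 0; i < lightCount; ++i)\n"
    "        c += lights[i].color;   // plain pointer dereference\n"
    "    gl_FragColor = c;\n"
    "}\n";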

I personally thought the whole idea was much more useful for direct data access than for binding.


That's right! Having direct access enables writing shaders in a more CPU-like manner.


But you'll soon discover that a pointer-based glBindBufferBase() is missing if you try to use transform feedback (I have used TF along with Bindless to simulate a missing compute shader). Of course, you could use shader_buffer_store, but it is not part of the Bindless pair (shader_buffer_load & vertex_buffer_unified_memory), and it requires SM5 hardware.

Indeed, those three extensions are very useful, and I hope that NVIDIA will broaden their specifications.

