Sign in to follow this  
AvCol

Fixed linked list buffer size defeating the purpose?

Recommended Posts

AvCol    102
I was looking up depth peeling type algorithms for rendering shadow volumes in fog and then noticed that this is really the same sort of thing you do for OIT. So I had a closer look at AMD's slides from last year's GDC explaining what they did in the mecha demo.

The thing I don't get about it is that when you are creating a buffer resource in direct x you have to declare a size to the buffer right? Sure enough in AMD's slides they say when declaring the UAV to store the linked list nodes the buffer "must be large enough to store all fragments".

But isn't the whole point of the linked list so that it *doesn't* use a fixed amount of memory? So why use linked lists if they use a fixed amount of memory anyway?

Share this post


Link to post
Share on other sites
SHilbert    652
[quote name='AvCol' timestamp='1305636812' post='4811917']
I was looking up depth peeling type algorithms for rendering shadow volumes in fog and then noticed that this is really the same sort of thing you do for OIT. So I had a closer look at AMD's slides from last year's GDC explaining what they did in the mecha demo.

The thing I don't get about it is that when you are creating a buffer resource in direct x you have to declare a size to the buffer right? Sure enough in AMD's slides they say when declaring the UAV to store the linked list nodes the buffer "must be large enough to store all fragments".

But isn't the whole point of the linked list so that it *doesn't* use a fixed amount of memory? So why use linked lists if they use a fixed amount of memory anyway?
[/quote]

I don't know the details of their particular use case, but I would guess they are using a linked list because you can do constant time insertions or deletions anywhere in the list.

Share this post


Link to post
Share on other sites
AvCol    102
[quote name='SHilbert' timestamp='1305654079' post='4812006']
I don't know the details of their particular use case, but I would guess they are using a linked list because you can do constant time insertions or deletions anywhere in the list.
[/quote]

Well the thing is the GPU has no allocator so something like deletions is impossible without quickly running out of memory. Insertions are possible but jumping from node to node on a gpu is far too slow. This is why when sorting each pixel's list they copy it over to a temp array first.

Their implementation is thus: They have one continuous block of memory as an HLSL structured buffer that stores some structure plus an index into that buffer. They then have a second buffer which simply stores an index for each pixel into the first buffer. I will call these the linked list buffer and the pointer buffer. Their allocator if it can be called that just points to the next unoccupied space in the linked list buffer (i.e it is just one atomic counter). Whenever a new fragment is drawn (anywhere on the screen) it needs to be added to the linked list buffer. This is done in three steps 1) Set the new node's link (the new node is at linked_list_buffer[allocator_counter] ) to the value in the pointer buffer for this pixel 2) Set the pointer buffer for this pixel to the allocator's counter 3) Increment the allocator counter.

As you can see their allocator can not track where free memory is so deletions would make memory run out very quickly. Besides, as I mentioned earlier jumping from node to node and modfiying their links is too slow on the GPU so they recommend to never do this.

What they are using it for is an A-buffer. They could have just as easily declared a fixed 32 floats per pixel but opted to use a linked list like structure instead, and I am wondering why. Originally I thought it was because they were relying on the fact that a linked list would use less memory 99% of the time as not each pixel has 32 triangles drawn on it every frame, but then I found out you had to declare a fixed amount of memory anyway,

In fact, you have to reserve more memory to get the same amount of accuracy cause you have to store the link, not to mention that each pixel needs its own start node pointer (just an index into the big pre-allocatied buffer).

Now I doubt that that's the whole story cause the guys at AMD are probably pretty smart, I am just wondering what they know that I do not.I don't see any advantage to using a gpu linked list.

Share this post


Link to post
Share on other sites
SHilbert    652
[quote name='AvCol' timestamp='1305669796' post='4812120']
[quote name='SHilbert' timestamp='1305654079' post='4812006']
I don't know the details of their particular use case, but I would guess they are using a linked list because you can do constant time insertions or deletions anywhere in the list.
[/quote]

Well the thing is the GPU has no allocator so something like deletions is impossible without quickly running out of memory. Insertions are possible but jumping from node to node on a gpu is far too slow. This is why when sorting each pixel's list they copy it over to a temp array first.

Their implementation is thus: They have one continuous block of memory as an HLSL structured buffer that stores some structure plus an index into that buffer. They then have a second buffer which simply stores an index for each pixel into the first buffer. I will call these the linked list buffer and the pointer buffer. Their allocator if it can be called that just points to the next unoccupied space in the linked list buffer (i.e it is just one atomic counter). Whenever a new fragment is drawn (anywhere on the screen) it needs to be added to the linked list buffer. This is done in three steps 1) Set the new node's link (the new node is at linked_list_buffer[allocator_counter] ) to the value in the pointer buffer for this pixel 2) Set the pointer buffer for this pixel to the allocator's counter 3) Increment the allocator counter.

As you can see their allocator can not track where free memory is so deletions would make memory run out very quickly. Besides, as I mentioned earlier jumping from node to node and modfiying their links is too slow on the GPU so they recommend to never do this.

What they are using it for is an A-buffer. They could have just as easily declared a fixed 32 floats per pixel but opted to use a linked list like structure instead, and I am wondering why. Originally I thought it was because they were relying on the fact that a linked list would use less memory 99% of the time as not each pixel has 32 triangles drawn on it every frame, but then I found out you had to declare a fixed amount of memory anyway,

In fact, you have to reserve more memory to get the same amount of accuracy cause you have to store the link, not to mention that each pixel needs its own start node pointer (just an index into the big pre-allocatied buffer).

Now I doubt that that's the whole story cause the guys at AMD are probably pretty smart, I am just wondering what they know that I do not.I don't see any advantage to using a gpu linked list.
[/quote]

I'd be interested in a link to these slides, if they're publically available -- it sounds like an interesting topic. I could probably get them myself from the GDC Vault if I could manage to look up my password. Having not seen them, though, my guesses are:
[list=1][*]There may still be less RAM used (or at least touched per frame) overall anyway, because it may be the case that most pixels need far less than 32 linked list nodes (although you'd think they'd still have to prepare for the worst case!);[*]It may be more cache friendly, because if you store 32 slots per pixel, one after another, and only 20% are used, you have a lot of unused RAM in between your used RAM. Also, with fixed buffers your writes end up splayed all over the place, instead of one after the other; although linked lists display this behavior when reading back.[/list]
I'm putting my bets on it being more compact or cache efficient for their particular case, because it does seem like an odd structure to use if they didn't have a good reason.

Share this post


Link to post
Share on other sites
AvCol    102
[quote]
There may still be less RAM used (or at least touched per frame) overall anyway, because it may be the case that most pixels need far less than 32 linked list nodes (although you'd think they'd still have to prepare for the worst case!);
[/quote]

I was thinking this too, and I share the exact same complaint. To what I mean to use this algorithm for (light volumes or volumetric shadows if you will) it would look very silly if all of a sudden some pixels started being lit when they shouldn't be. I can see how OIT would be more forgiving to such missing information though.

You're right about increased cache coherency during the writing stage too, I hadn't considered that.

Here is the direct link to the slides:

[url="http://developer.amd.com/gpu_assets/OIT%20and%20Indirect%20Illumination%20using%20DX11%20Linked%20Lists_forweb.ppsx"]http://developer.amd.com/gpu_assets/OIT%20and%20Indirect%20Illumination%20using%20DX11%20Linked%20Lists_forweb.ppsx

[/url]You can find more of AMD's stuff on their conferences page

[url="http://developer.amd.com/documentation/presentations/Pages/default.aspx"]http://developer.amd...es/default.aspx[/url]

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this