Compute Shader Invocations

Started by
4 comments, last by Yours3!f 11 years, 6 months ago
Hello,

I am new to OpenGL and currently working on a particle system which makes use of the compute shader. I've got two questions. The first is about the compute shader itself. I create the particles and store them in a shader storage buffer so I can access their positions in the compute shader. Now I want to create a thread for every particle, which computes its new position. So I dispatch a one-dimensional set of work groups.

#define WORK_GROUP_SIZE 128
_shaderManager->useProgram("computeProg");
glDispatchCompute((_numParticles/WORK_GROUP_SIZE), 1, 1);
glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);

Compute shader:

#version 430
struct particle{
    vec4 currentPos;
    vec4 oldPos;
};

layout(std430, binding=0) buffer particles{
    particle p[];
};

layout (local_size_x = 128, local_size_y = 1, local_size_z = 1) in;
void main(){
    uint gid = gl_GlobalInvocationID.x;

    p[gid].currentPos.x += 100;
}



But somehow not all particles are affected. I am doing this the same way it was done in this example but it doesn't work.
http://education.sig...eShader_6pp.pdf

When I want to render 128,000 particles, the code above would dispatch 128,000/128 = 1,000 one-dimensional work groups, each of size 128. Doesn't it thus create 128 * 1,000 = 128,000 threads which execute the code in the compute shader above, so that all particles are affected? Each thread would have a different ID in gl_GlobalInvocationID.x because all work groups are one-dimensional. Am I missing something?
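The arithmetic in this paragraph can be checked on the CPU. The following is only a sketch: `workGroupCount` and `totalInvocations` are hypothetical helper names, not OpenGL functions. It rounds the group count up, since the plain integer division in the dispatch call above would drop the remainder whenever the particle count is not a multiple of the group size.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of work groups needed so that every particle
// gets one invocation. Rounds up, so a count that is not a multiple of the
// group size still gets a final, partially used group.
std::uint32_t workGroupCount(std::uint32_t numParticles, std::uint32_t groupSize) {
    assert(groupSize > 0);
    return (numParticles + groupSize - 1) / groupSize;
}

// Hypothetical helper: total invocations launched by a 1D dispatch of
// `groups` work groups with `groupSize` invocations each.
std::uint64_t totalInvocations(std::uint32_t groups, std::uint32_t groupSize) {
    return static_cast<std::uint64_t>(groups) * groupSize;
}
```

For 128,000 particles and a group size of 128 this gives 1,000 groups and 128,000 invocations, which matches the reasoning above. When the count is rounded up, the shader would additionally need a bounds check on `gid` (for example against a particle-count uniform, which the posted shader does not have).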

My other question is relating to glDrawArrays().
The vertex shader receives all the vertices from the shader storage buffer and passes them through to the geometry shader, where I emit 4 vertices to create a quad on which I map my texture in the fragment shader. The structure which is stored in the shader storage buffer for every particle looks like this:

struct Particle{
    glm::vec4 _currPosition;
    glm::vec4 _prevPosition;
};

When I draw the scene I do the following:

glBindBuffer(GL_ARRAY_BUFFER, BufferID);
glVertexAttribPointer(0, 4, GL_FLOAT, GL_FALSE, sizeof(glm::vec4), 0);
glEnableVertexAttribArray(0);
glEnableClientState(GL_VERTEX_ARRAY);
glDrawArrays(GL_POINTS, 0, _numParticles*2);
glDisableClientState(GL_VERTEX_ARRAY);
glBindBuffer(GL_ARRAY_BUFFER, 0);


Somehow, when I just call glDrawArrays(GL_POINTS, 0, _numParticles), not all particles are rendered. Why does this happen?
I suspect the number of vec4 members in the particle struct is the reason, but I am not sure. Could somebody explain it please?
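One possible explanation, offered as a sketch rather than a confirmed diagnosis: the stride passed to glVertexAttribPointer is sizeof(glm::vec4), while each particle occupies two vec4s. Assuming the buffer is a tightly packed array of Particle, the vertex fetch would then alternate between _currPosition and _prevPosition, so _numParticles vertices would cover only half of the particles. The code below simulates that fetch on the CPU; `Vec4`, `fetchX` and `makeParticles` are hypothetical stand-ins, not part of OpenGL or glm.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

struct Vec4 { float x, y, z, w; };          // stand-in for glm::vec4
struct Particle { Vec4 curr; Vec4 prev; };  // same layout as the struct above

// Mimics the vertex fetch: vertex i reads one Vec4 at byte offset i * stride.
// Returns the x component seen by each of the first `vertexCount` vertices.
std::vector<float> fetchX(const std::vector<Particle>& buf,
                          std::size_t vertexCount, std::size_t stride) {
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(buf.data());
    const std::size_t totalBytes = buf.size() * sizeof(Particle);
    std::vector<float> xs;
    for (std::size_t i = 0; i < vertexCount; ++i) {
        assert(i * stride + sizeof(Vec4) <= totalBytes);  // stay inside the buffer
        Vec4 v;
        std::memcpy(&v, bytes + i * stride, sizeof(Vec4));
        xs.push_back(v.x);
    }
    return xs;
}

// Builds n particles: current x is 1, 2, 3, ... and previous x is -1, -2, -3, ...
std::vector<Particle> makeParticles(std::size_t n) {
    std::vector<Particle> buf(n);
    for (std::size_t i = 0; i < n; ++i) {
        const float id = static_cast<float>(i + 1);
        buf[i].curr = Vec4{id, 0.0f, 0.0f, 1.0f};
        buf[i].prev = Vec4{-id, 0.0f, 0.0f, 1.0f};
    }
    return buf;
}
```

With a stride of sizeof(Vec4), the first four vertices see the current and previous positions of particles 1 and 2 only; with a stride of sizeof(Particle), they see the current positions of particles 1 to 4. If that is what is happening here, passing sizeof(Particle) as the stride argument of glVertexAttribPointer and drawing _numParticles points might be worth trying. It could also account for the compute shader seeming to miss particles, since half of the drawn points would be previous positions that the shader never changes.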

Regards,
StanLee
"When I want to render 128,000 particles, the code above would dispatch 128,000/128 = 1,000 one-dimensional work groups, each of size 128. Doesn't it thus create 128 * 1,000 = 128,000 threads which execute the code in the compute shader above, so that all particles are affected? Each thread would have a different ID in gl_GlobalInvocationID.x because all work groups are one-dimensional. Am I missing something?"
Well, the way you'd usually want to do this is by exploiting the GPU's local data share (LDS), which lets the threads within a work group cooperate efficiently. When I used OpenCL on GPUs, the best work group size (when working with images) was about 16x16x1. This is because each work group has a limited amount of local memory (about 4 KB I think, but this may vary). The maximum work group size was 256 in every direction (x, y, z), and you had to make sure that your global work size was divisible by the local work group size.
So to answer your question: the GPU does not run that many threads at once (because it doesn't have that many compute cores). It runs some of them and then gradually works its way through the data.
In this case your local work group size should IMO be 256 to take maximum advantage.
Your global work size should be a multiple of it, i.e. 500 x 256 = 128000.
This way your local ID would range over 0...255, 0, 0
and your global ID over 0...127999, 0, 0.
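The relation between these ID ranges can be written out. In GLSL, gl_GlobalInvocationID is gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID. The sketch below evaluates the x component of that formula on the CPU; `globalInvocationX` is a hypothetical helper name used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the x component of
// gl_GlobalInvocationID = gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID.
std::uint32_t globalInvocationX(std::uint32_t workGroupId,
                                std::uint32_t localSizeX,
                                std::uint32_t localId) {
    assert(localId < localSizeX);  // local IDs run from 0 to localSizeX - 1
    return workGroupId * localSizeX + localId;
}
```

Both 1,000 groups of 128 and 500 groups of 256 end at a global ID of 127999, so either local size covers 128,000 particles exactly once.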

so I believe changing this:
glDispatchCompute((_numParticles/WORK_GROUP_SIZE), 1, 1);
to this would solve the issue:
glDispatchCompute(_numParticles, 1, 1);
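One caveat: unlike OpenCL's global work size, the arguments of glDispatchCompute are numbers of work groups, not numbers of invocations. The sketch below counts how many invocations of a 1D dispatch would index a real particle and how many would land past the end of the buffer; `DispatchStats` and `classifyInvocations` are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

struct DispatchStats {
    std::uint64_t inRange;     // invocations whose global ID hits a real particle
    std::uint64_t outOfRange;  // invocations whose global ID is past the last particle
};

// Hypothetical helper: classifies the invocations of a 1D dispatch of
// `numGroups` work groups with `localSizeX` invocations each.
DispatchStats classifyInvocations(std::uint32_t numGroups,
                                  std::uint32_t localSizeX,
                                  std::uint32_t numParticles) {
    const std::uint64_t total = static_cast<std::uint64_t>(numGroups) * localSizeX;
    DispatchStats stats;
    stats.inRange = total < numParticles ? total : numParticles;
    stats.outOfRange = total - stats.inRange;
    return stats;
}
```

With a local size of 128, dispatching 1,000 groups for 128,000 particles leaves nothing out of range, whereas dispatching 128,000 groups launches 16,384,000 invocations, most of which would index past the end of the particle array unless the shader checks `gid` against the particle count.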

I'm not sure about the glDrawArrays issue.
Thanks for your reply! :)

I've tested your suggestion but unfortunately nothing moves at all. :/

So I decided to update all positions in a single thread, just for testing purposes. I changed the compute shader:

layout (local_size_x = 1, local_size_y = 1, local_size_z = 1) in;
void main(){
    uint gid = gl_GlobalInvocationID.x;

    if(gid == 0){
        for(int i = 0; i < maxParticles; ++i){
            p[i].currentPos.x += 100;
        }
    }
}

And called:

glDispatchCompute(1, 1, 1);


But somehow the same problem occurs and not all particles are moving. I just don't get it. Is there good documentation somewhere for the new compute shaders in OpenGL?

Regards Stan
There are the specs. It's such a new feature that the only drivers I know about (NVIDIA) are still "very beta". I have two examples here: https://github.com/p...er/experimental which are also particle systems. They worked on the first version of the 4.3 driver; I haven't tested them on the current one.
oh well :)

I can't really help you beyond this point as I don't have an NVIDIA video card. But as japro said, try the specs; they're usually very helpful.
You can also try to implement the whole thing on the CPU side in C++, then move it gradually to the GPU. That is, try moving 1 particle, then 2, then all of them.
If nothing works, you still have the option of using OpenCL until stable drivers, programming examples, tutorials etc. come out.
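The CPU-first approach suggested above could start from something like the following. This is only a sketch: `Vec4` is a stand-in for glm::vec4, and `updateParticles` is a hypothetical function that applies the same update as the posted compute shader to the first `count` particles, so it can be tried with 1 particle, then 2, then all of them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec4 { float x, y, z, w; };                  // stand-in for glm::vec4
struct Particle { Vec4 currentPos; Vec4 oldPos; };  // mirrors the shader struct

// Hypothetical CPU reference: applies the compute shader's update
// (currentPos.x += 100) to the first `count` particles only.
void updateParticles(std::vector<Particle>& particles, std::size_t count) {
    assert(count <= particles.size());
    for (std::size_t i = 0; i < count; ++i) {
        particles[i].currentPos.x += 100.0f;
    }
}
```

Comparing this reference against the contents of the buffer read back from the GPU would show whether the compute shader or the drawing code is at fault.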
Just found this today, take a look at it.
source code in description
