StanLee

Computing Matrices on the GPU

This topic is 772 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.



Hi,

 

I need to create a lot of model-view-projection (MVP) matrices (100 or more), each built from a normal and a position vector that are stored in textures.

I already implemented this by downloading the two textures and creating the matrices on the client side. As expected, this is very slow, and the performance impact is too large.

How can I do this entirely on the GPU? I don't even need the matrices on the client side. They are only used later for rendering.

 

My idea so far is to use an SSBO (Shader Storage Buffer Object) to store the matrices. I take every sample point on my texture (each corresponds to one matrix) and put it into a VBO, which is then rendered to a screen quad. This should result in one fragment shader thread per sample point. In the fragment shader I then sample the position and the normal from my textures, construct an MVP matrix, and store it in the SSBO. To get the right SSBO index, each sample point passes its corresponding index to the vertex shader.
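The thread never says how a matrix is derived from a position and a normal, so the following is only a CPU-side sketch of the arithmetic one shader thread might perform. It assumes the normal becomes the local Z axis of an orthonormal basis and the position becomes the translation. The function names and the `view_proj` input are hypothetical, not taken from the original post.

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def model_from_position_normal(position, normal):
    """Row-major 4x4 matrix: local Z axis = normal, origin = position (assumed convention)."""
    z = normalize(normal)
    # Pick a helper axis that is not (nearly) parallel to the normal.
    helper = [0.0, 1.0, 0.0] if abs(z[1]) < 0.99 else [1.0, 0.0, 0.0]
    x = normalize(cross(helper, z))
    y = cross(z, x)
    return [
        [x[0], y[0], z[0], position[0]],
        [x[1], y[1], z[1], position[1]],
        [x[2], y[2], z[2], position[2]],
        [0.0, 0.0, 0.0, 1.0],
    ]

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transform_point(m, p):
    v = [p[0], p[1], p[2], 1.0]
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

# Hypothetical usage: an identity view-projection stands in for the real camera.
view_proj = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
model = model_from_position_normal([1.0, 2.0, 3.0], [0.0, 0.0, 2.0])
mvp = mat_mul(view_proj, model)
```

In a shader the same steps would run once per sample point, with the two texture fetches replacing the `position` and `normal` arguments.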

 

The question is: is this the fastest way to generate the matrices, or are there faster options? I also thought about storing the matrices in 2D textures, but with a 4-channel texture that would mean 4 texture accesses later to fetch one matrix, and I don't know whether that is faster than using SSBOs.
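To make the two storage options concrete, here is a small sketch of the addressing each one implies. It assumes one matrix column per RGBA texel, matrices packed linearly into the texture, and 32-bit floats with a tightly packed array of 4x4 matrices in the buffer (64 bytes per matrix). The helper names and the texture width are made up for illustration.

```python
FLOATS_PER_MATRIX = 16
BYTES_PER_FLOAT = 4
TEXELS_PER_MATRIX = 4  # one RGBA texel holds one 4-component column

def texel_coords(matrix_index, column, tex_width):
    """Integer (x, y) of the texel storing `column` of matrix `matrix_index`."""
    linear = matrix_index * TEXELS_PER_MATRIX + column
    return linear % tex_width, linear // tex_width

def ssbo_byte_offset(matrix_index):
    """Byte offset of a matrix in a tightly packed array of 4x4 float matrices."""
    return matrix_index * FLOATS_PER_MATRIX * BYTES_PER_FLOAT

# Hypothetical usage: fetching matrix 17 from a 64-texel-wide texture
# takes four lookups, while the buffer needs a single offset.
fetches = [texel_coords(17, c, 64) for c in range(TEXELS_PER_MATRIX)]
offset = ssbo_byte_offset(17)
```

Whether four texel fetches are actually slower than one buffer read depends on the hardware and caching, so this only shows the bookkeeping, not a performance result.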

 

Regards,

Stan


I am using version 4.5. Unfortunately, the proprietary framework I am working with does not support compute shaders yet.


Why not compute the matrices in the vertex shader of whatever you're rendering with the set of them?

 

Pass (handwaving) a matrix number as a vertex attribute and use that to create the MVP needed to transform the vertex.

 

Most recent hardware can access textures in the vertex shader. There is a value you can query to see whether it's possible on the hardware in play.


But wouldn't this mean two texture accesses (normal and position texture) plus matrix construction per vertex thread? I have models with a vertex count of 200,000 and more.

It would surprise me if this were faster than my proposed method. 


I don't have experience doing this on the GPU, but are you currently threading your work on the CPU? 100 objects doesn't seem like much.


I don't have experience doing this on the GPU, but are you currently threading your work on the CPU? 100 objects doesn't seem like much.

 

I'd suggest that threading isn't even necessary here. This reads a lot like a misguided attempt to save memory by storing just 6 floats per object rather than all 16 of a 4x4 matrix. In this case it may well be a more useful optimization to burn the extra memory in exchange for more efficient ALU use. 100 objects is, quite frankly, chickenfeed: 1996-class hardware could deal with that easily enough.


 

I don't have experience doing this on the GPU, but are you currently threading your work on the CPU? 100 objects doesn't seem like much.

 

I'd suggest that threading isn't even necessary here. This reads a lot like a misguided attempt to save memory by storing just 6 floats per object rather than all 16 of a 4x4 matrix. In this case it may well be a more useful optimization to burn the extra memory in exchange for more efficient ALU use. 100 objects is, quite frankly, chickenfeed: 1996-class hardware could deal with that easily enough.

 

It's not about saving memory. As I have stated, I need to create MVP matrices based on normals and positions which are stored in two different textures. Doing this on the CPU means downloading the textures from the server side, extracting the position and normal, and then computing the matrices, which are then sent to the server again. I am already doing this, but downloading textures every frame is too much of a performance killer.


 

It's not about saving memory. As I have stated I need to create MVP matrices based on normals and positions which are stored on two different textures

 

Explain what you mean. What are these 2 textures? I assume they are dynamic since you said memory isn't an issue, otherwise you would just run through and create MVP matrices once.  So I assume because of that, you mean these are 100 dynamic objects and I assume because you are using textures that these are on the GPU and you are updating the positions and normals each frame on the GPU?

 

So what is the end goal? What technique are you implementing (GPU particles with objects)?

 

 

I already implemented this by downloading the two textures and creating the matrices on the client side. As expected, this is very slow, and the performance impact is too large.

 

extracting the position and normal, and then computing the matrices, which are then sent to the server again.

 

What exactly are you trying to do? So far I think you are approaching something the wrong way, because you are talking about 2 matrix multiplies per object (roughly 64 multiplies each), so 2 × 64 × 100 objects ≈ 12,800 operations.
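As a quick check of that estimate, the sketch below counts the scalar operations in a naive 4x4 matrix multiply and scales the result to 100 objects with 2 multiplies each. The function name is made up for illustration, and real hardware would use fused or vectorized instructions, so the counts are only a rough upper bound on scalar work.

```python
def mat4_mul_counted(a, b):
    """Naive 4x4 multiply that also returns the number of multiplies and adds used."""
    muls = adds = 0
    out = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            acc = a[i][0] * b[0][j]
            muls += 1
            for k in range(1, 4):
                acc += a[i][k] * b[k][j]
                muls += 1
                adds += 1
            out[i][j] = acc
    return out, muls, adds

identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
_, muls_per_product, adds_per_product = mat4_mul_counted(identity, identity)

objects = 100
products_per_object = 2  # the figure assumed in the post above
total_muls = objects * products_per_object * muls_per_product
total_adds = objects * products_per_object * adds_per_product
```

Either way the total is in the low tens of thousands of scalar operations per frame, which supports the point that the arithmetic itself is not the bottleneck.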

 

Also, if you are reading textures from the GPU, the CPU will stall. You need to look at pixel buffer objects.

Edited by dpadam450


Also,

 

 

I have models with a vertex count of 200,000 and more.

 

What is your product? 200K is absurd. You have many other issues right now with the GPU. You are probably drawing such tiny triangles that your GPU is basically dying. If that 200K object isn't like an entire city block and is something say the size of a car, then your triangle density is causing your GPU to lose a ton of performance.

Edited by dpadam450
