matt77hias

DirectCompute thread groups



Is [numthreads(1, GROUP_SIZE, GROUP_SIZE)] as efficient as [numthreads(GROUP_SIZE, GROUP_SIZE, 1)]?

CUDA confused me by disabling its z dimension.

Edited by matt77hias


Personally, I assume there is no hardware for that kind of dimensional thread partitioning at all, and that it is just a convenience to make things easier for us.

I did not look at any ISA output to prove that, but I know that putting the thread ID into a register is faster than constantly reading it from the built-in API variable (Vulkan and AMD), so I doubt there are three hardware registers holding three indices all the time for nothing.

Does anyone know?
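To make the "just a convenience" idea concrete, here is a minimal C++ sketch (not shader code) of how a 3D thread ID collapses to a single index. The flatten formula follows the documented definition of SV_GroupIndex; the names GroupDim, flatten, covers_each_index_once and wave_count are hypothetical, and the wave packing in wave_count is only an assumption about how hardware might group lanes, not something confirmed in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Group dimensions as they would appear in [numthreads(x, y, z)].
struct GroupDim { std::uint32_t x, y, z; };

// Total number of threads in one group.
std::uint32_t thread_count(GroupDim d) { return d.x * d.y * d.z; }

// Flattened index, following the documented SV_GroupIndex formula:
// z * dimx * dimy + y * dimx + x.
std::uint32_t flatten(GroupDim d, std::uint32_t tx, std::uint32_t ty,
                      std::uint32_t tz) {
    return tz * d.x * d.y + ty * d.x + tx;
}

// Returns true if every flattened index in [0, thread_count) is
// produced exactly once by walking the 3D thread IDs.
bool covers_each_index_once(GroupDim d) {
    std::vector<std::uint32_t> hits(thread_count(d), 0);
    for (std::uint32_t tz = 0; tz < d.z; ++tz)
        for (std::uint32_t ty = 0; ty < d.y; ++ty)
            for (std::uint32_t tx = 0; tx < d.x; ++tx) {
                const std::uint32_t i = flatten(d, tx, ty, tz);
                assert(i < hits.size());
                ++hits[i];
            }
    for (std::uint32_t h : hits)
        if (h != 1) return false;
    return true;
}

// ASSUMPTION: the hardware packs consecutive flattened indices into
// waves of wave_size lanes. Real hardware may behave differently.
std::uint32_t wave_count(GroupDim d, std::uint32_t wave_size) {
    return (thread_count(d) + wave_size - 1) / wave_size;
}
```

Under that assumption, GroupDim{1, 8, 8} and GroupDim{8, 8, 1} both describe 64 threads with the same set of flattened indices and the same wave count, so any efficiency difference would have to come from how the hardware launches waves or from memory access patterns, which this sketch does not model.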

 


Some GCN hardware has a halved wave spawn rate if you use the Z dimension; I am not sure whether that is still true. Also on GCN, there is an input VGPR per dimension and no combined one, at least on PS4 (taken from a compute ISA: s14 = s_tgid_x, s15 = s_tgid_y, v0 = v_thread_id_x, v1 = v_thread_id_y).

You could look at the ISA in PIX for AMD to confirm all of that on PC.

Edited by galop1n
