
Pixel processing, lock-step and pixel sleeping

This topic is 4093 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.



I've been reading some of these pixel processing threads, and have a few questions.

1) If nVidia's region size is something like 64x64 (this is unverified right?), that means that 4096 pixels can be in flight at once? And there's only one region going through the pipes at a time?

2) So these pixels are all being processed by the same shader and are in lock-step (instruction pointer at the same place?). Also, does lock-step mean one instruction is processed for a pixel, then it moves onto the next pixel? Or can several instructions execute first?

3) 1 pixel taking a different branch would mean all 4096 have to execute both branches?

4) If a pixel needs to wait for texture access, it goes to sleep. All pixels in the region will eventually need to receive their texel before the entire region can go to the next instruction? (Cause of lock-step)

Sorry if these seem basic, but these answers will definitely help clear things up for me.

I'll take a stab at some of these, although I'm certainly no architecture expert.

Quote:
Original post by Unfadable
1) If nVidia's region size is something like 64x64 (this is unverified right?), that means that 4096 pixels can be in flight at once? And there's only one region going through the pipes at a time?

64x64 is pretty close to the actual amount on current cards. In any case, having an exact number is much less important than knowing the general branching efficiency possible on various architectures (for choosing implementations, etc.).

Quote:
Original post by Unfadable
2) So these pixels are all being processed by the same shader and are in lock-step (instruction pointer at the same place?). Also, does lock-step mean one instruction is processed for a pixel, then it moves onto the next pixel? Or can several instructions execute first?

It is a data-parallel architecture, i.e. N processors are executing the same instruction on N different pieces of data at the same time. Thus if control paths differ within a group, the alternate paths must be pushed onto a stack and revisited later.
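To make the "push and revisit" idea concrete, here is a rough Python sketch of one if/else executed in lock-step over a small group of pixels. It is only a toy model and not how any particular GPU implements it: `run_lockstep`, the mask representation and the `pending` stack are names I made up for illustration.

```python
def run_lockstep(values, cond, then_op, else_op):
    """Run one if/else over a group of pixels that share an instruction stream.

    Toy model (assumption): each control path is issued once for the whole
    group, and an active mask decides which lanes keep the result.
    Returns (results, passes), where passes counts the paths issued.
    """
    taken = [cond(v) for v in values]

    # Stack of (active_mask, op) paths that still have to be executed.
    pending = []
    if not all(taken):
        pending.append(([not t for t in taken], else_op))
    if any(taken):
        pending.append((taken, then_op))

    results = list(values)
    passes = 0
    while pending:
        mask, op = pending.pop()
        passes += 1
        # Every lane steps through the path; only active lanes are updated.
        results = [op(v) if active else r
                   for v, active, r in zip(values, mask, results)]
    return results, passes


def is_even(v):
    return v % 2 == 0

def times_ten(v):
    return v * 10

def negate(v):
    return -v

mixed = run_lockstep([1, 2, 3, 4], is_even, times_ten, negate)
uniform = run_lockstep([2, 4, 6, 8], is_even, times_ten, negate)
```

With mixed inputs both paths are issued for the group (`passes` is 2), while a group whose pixels all agree only pays for one path.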

Quote:
Original post by Unfadable
3) 1 pixel taking a different branch would mean all 4096 have to execute both branches?

Effectively yes. This will be somewhat architecture dependent, but think of it that way.
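As a back-of-the-envelope illustration of why the group size matters, the sketch below counts instruction slots for an if/else over a pixel grid, assuming every pixel in a square lock-step block pays for each path that any pixel in that block takes. The function name, the block sizes and the per-path costs are all hypothetical numbers chosen for the example, not measurements of real hardware.

```python
def branch_cost(taken, block, then_cost, else_cost):
    """Instruction slots spent on an if/else over a 2D grid of pixels.

    taken: list of rows of booleans, the branch outcome per pixel.
    block: side length of the square lock-step group (hypothetical).
    Assumption: a block pays for a path if any of its pixels takes it.
    """
    height, width = len(taken), len(taken[0])
    total = 0
    for by in range(0, height, block):
        for bx in range(0, width, block):
            tile = [taken[y][x]
                    for y in range(by, min(by + block, height))
                    for x in range(bx, min(bx + block, width))]
            per_pixel = 0
            if any(tile):
                per_pixel += then_cost   # someone needs the "then" path
            if not all(tile):
                per_pixel += else_cost   # someone needs the "else" path
            total += per_pixel * len(tile)
    return total


# A 64x64 region where exactly one pixel takes the expensive path.
grid = [[False] * 64 for _ in range(64)]
grid[0][0] = True

cost_64 = branch_cost(grid, 64, then_cost=10, else_cost=1)
cost_4 = branch_cost(grid, 4, then_cost=10, else_cost=1)
cost_1 = branch_cost(grid, 1, then_cost=10, else_cost=1)
```

Under these assumptions the single divergent pixel makes the whole 64x64 block pay for both paths, while a 4x4 granularity confines the extra cost to one small tile.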

Quote:
Original post by Unfadable
4) If a pixel needs to wait for texture access, it goes to sleep. All pixels in the region will eventually need to receive their texel before the entire region can go to the next instruction? (Cause of lock-step)

As I understand it, the large granularity of NVIDIA's current cards is related to how they hide texture read latency. A large number of pixels is always kept "in flight" so that latency can be hidden effectively. That said, this still operates on blocks of pixels. Whether or not the block is 64x64 I don't know, but the concept is what is important here.
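A very simplified way to see why so many pixels need to be in flight: suppose each group of pixels does some ALU work, then waits on a texture fetch, and the hardware switches to another group while it waits. The model below is an assumption of mine for illustration (the names and the cycle counts are made up, and real schedulers are more involved), but it shows the basic relationship between latency and the amount of work kept in flight.

```python
import math

def groups_needed(latency, alu_cycles):
    """Groups required so the ALUs never idle during a texture fetch.

    Assumption: each group alternates alu_cycles of math with one fetch
    of `latency` cycles, and switching between groups is free.
    """
    return math.ceil(latency / alu_cycles) + 1

def alu_utilization(groups, latency, alu_cycles):
    """Fraction of cycles the ALUs are busy with `groups` in flight."""
    return min(1.0, groups * alu_cycles / (alu_cycles + latency))


# Hypothetical numbers: 200-cycle fetch, 10 cycles of math per fetch.
needed = groups_needed(latency=200, alu_cycles=10)
one_group = alu_utilization(1, latency=200, alu_cycles=10)
enough_groups = alu_utilization(needed, latency=200, alu_cycles=10)
```

With a single group the ALUs sit idle most of the time in this model; only once enough groups are in flight does the fetch latency disappear behind other groups' math.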

Please note that ATI's architecture works in a similar way; it just has a smaller granularity. They also use a ring bus memory controller to manage texture accesses and latency hiding.

There was a good post recently where Eric Lengyel was talking about some of this with respect to reverse engineering the NVIDIA command buffer. It is probably worth looking up for more information.
