
What are ROPs & TMUs?


Hey guys, I'm trying to understand GPU specs, for which I need to know what ROPs (Render Output Units) and TMUs (Texture Memory Units) are. Can someone give me a quick overview of their purpose? Also, are the Texture Filtering Units and Texture Addressing Units related to TMUs? I read the Wikipedia articles, but I'm still not sure about their role in the graphics pipeline (being a shader programmer, I know it from a higher-level perspective). Thanks!

The ROP is the unit that performs the final step in getting a pixel onto the screen. When pixels are computed, for example via a shader, they must be tested against their associated depth and blended according to the associated blend functions. It's the ROP's job to perform this blending and then place the pixel value at its location "on the screen" (which is really just a location in memory).
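
Conceptually, per pixel it does something like this (a rough sketch in shader-style pseudocode, not code you write yourself; the variable names are just illustrative):


// srcColor / srcDepth come from the pixel shader,
// dstColor / dstDepth live in the render target.
// The blend factors come from the current blend state
// (e.g. srcAlpha and 1 - srcAlpha for ordinary alpha blending).
if( srcDepth <= dstDepth )                      // depth test
{
    dstColor = srcColor * srcBlendFactor
             + dstColor * dstBlendFactor;       // blend and write the colour
    dstDepth = srcDepth;                        // write the depth
}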

The TMUs I know about are Texture Mapping Units, and they're fairly old technology; I can't find anything on "Texture Memory Units". I can tell you that nowadays textures are loaded into GPU memory instead of system RAM. Otherwise, I'm not sure what you're asking about.

Texture Mapping Units are those which fetch a colour for you when you sample a texture at a given position. TMUs do the conversion from the texture's pixel format to float4, perform the filtering with neighbouring texels according to the sampler settings, and possibly do special functions like the native shadow mapping depth comparison. Each of them also has a cache of its own to speed up consecutive accesses to nearby positions, as far as I know.
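
For what it's worth, the filtering part of that job, written out as shader code, looks roughly like this (a sketch only; myPointSampler and texSize are made-up names, and the real hardware works on the texture's native format before handing back a float4):


// myPointSampler: the same texture bound with point (nearest) filtering.
// texSize: its resolution, e.g. float2( 512, 512 ).
// One bilinear sample done by hand: fetch the 2x2 texel neighbourhood
// with point sampling and weight the texels by the fractional position.
float2 texelPos = UV * texSize - 0.5;          // position in texel space
float2 base     = floor( texelPos );
float2 f        = texelPos - base;             // bilinear weights

float4 t00 = tex2D( myPointSampler, ( base + float2( 0.5, 0.5 ) ) / texSize );
float4 t10 = tex2D( myPointSampler, ( base + float2( 1.5, 0.5 ) ) / texSize );
float4 t01 = tex2D( myPointSampler, ( base + float2( 0.5, 1.5 ) ) / texSize );
float4 t11 = tex2D( myPointSampler, ( base + float2( 1.5, 1.5 ) ) / texSize );

float4 filtered = lerp( lerp( t00, t10, f.x ),
                        lerp( t01, t11, f.x ), f.y );


The TMU does all of that (plus the format conversion and caching) in dedicated hardware, which is why a single tex2D call is so much cheaper than emulating the filter in the shader.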

Quote:
Original post by dbzprogrammer
The ROP is the unit that performs the final step in getting a pixel onto the screen. When pixels are computed, for example via a shader, they must be tested against their associated depth and blended according to the associated blend functions. It's the ROP's job to perform this blending and then place the pixel value at its location "on the screen" (which is really just a location in memory).


But then, something that I don't understand is what is the purpose of having more Pixel Shader Units than ROPs, as you won't be able to write the results because the ROPs are going to be a bottleneck. For example, if you have 24 pixel shaders (in a non-unified architecture) and 8 ROPs, you can produce 24 pixels per clock but only write 8, so the pixel shaders will need to wait for the ROPs to become available. What am I missing?

Quote:
Original post by Schrompf
Texture Mapping Units are those which fetch a colour for you when you sample a texture at a given position. TMUs do the conversion from the texture's pixel format to float4, perform the filtering with neighbouring texels according to the sampler settings, and possibly do special functions like the native shadow mapping depth comparison. Each of them also has a cache of its own to speed up consecutive accesses to nearby positions, as far as I know.


Ok, thanks!

From the Xenos GPU Wikipedia page:
Quote:
16 texture filtering units (TF) and 16 texture addressing units (TA)
16 filtered samples per clock
Maximum texel fillrate: 8 gigatexels per second (16 textures × 500 MHz)
16 unfiltered texture samples per clock


Maybe TFs and TAs are then just specialized TMUs? Maybe TFs are the ones able to perform bilinear, trilinear and anisotropic reads, and TAs the ones that perform unfiltered reads (POINT/NEAREST).

Quote:
Original post by IrYoKu1
But then, something that I don't understand is what is the purpose of having more Pixel Shader Units than ROPs


Because the most complex operation ROPs can do is to read 'A' from the target output, read 'B' from the shader output, and compute (A * D + B * E), where D and E may be any value (e.g. alpha blending). The simplest operation they can do is nothing at all (the pixel was discarded); second to that comes writing 'B', which is just writing what the shader computed.
In other words, ROP operations aren't complex: one or two additions and two multiplications at most.

On the other hand, pixel shader units can be VERY busy, because there's almost no limit to what they can do (usually around 30-50 instructions, many texture lookups, and a couple of special instructions like log, pow or sqrt).

Suppose the following example:
* 16 shader units (let's call them SUs) & 8 ROPs
* The ROPs are idle. The SUs are computing lots of square roots, additions and multiplications.
* The 16 SUs finish their job and go on to the next 16 pixels. The ROPs start their job.
* The 16 SUs are working on pixels 16...31, the ROPs are working on pixels 0...7.
* The 16 SUs are working on pixels 16...31, the ROPs are working on pixels 8...15.
* The ROPs are idle again, the 16 SUs are still busy.
* The 16 SUs finish their job and go on to the next 16 pixels. The ROPs start their job again.

See? You have 16 SUs and 8 ROPs, and yet the bottleneck is in the SUs.
Ideally, the shader operations are balanced well enough that as soon as the SUs are done, the ROPs have just finished with the last group of pixels, so that no component is ever idle.

The reverse situation happens when your pixel shader is completely trivial: something like returning a constant colour, or doing a texture lookup without applying any lighting.
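
For instance, a pixel shader as trivial as this one (diffuseMap is just an arbitrary sampler) finishes so quickly that the ROPs, not the shader units, end up being the limit:


sampler2D diffuseMap;

// One texture fetch, no lighting: the shader units produce pixels faster
// than 8 ROPs can blend and write them, so now the ROPs are the bottleneck.
float4 main( float2 UV : TEXCOORD0 ) : COLOR0
{
    return tex2D( diffuseMap, UV );
}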

Cheers

Quote:
Original post by IrYoKu1
Maybe TFs are the ones able to perform bilinear, trilinear and anisotropic reads, and TAs the ones that perform unfiltered reads (POINT/NEAREST).


No. TFs perform the bilinear, trilinear, anisotropic (or no) filtering for you, so that you don't have to read multiple adjacent texels from a texture and weight them (i.e. do the filtering) in the shader units.

Texture addressing units are in charge of mapping texture coordinates to texels.
For example, you submit tex2D( mySampler, float2( 0.5, 0.5 ) ), but if the texture resolution is 512x512, what you actually intended was to read the texel at position (255, 255) (or (256, 256), depending on the alignment convention).
They are also in charge of applying the texture addressing mode you asked for: wrap, clamp, or border.
Otherwise, before reading in the pixel shader you would have to do something like this:


// Wrap: repeat the texture outside the [0, 1] range
// (equivalent to ((UV * textureResolution) % textureResolution) / textureResolution for positive UV)
UV = frac( UV );
color = tex2D( mySampler, UV );

// Clamp: pin the coordinates to the [0, 1] range
UV = saturate( UV );   // i.e. max( min( UV, 1.0f ), 0.0f )
color = tex2D( mySampler, UV );

// Border: return a constant colour outside the [0, 1] range
if( any( UV > 1.0f ) || any( UV < 0.0f ) )
    color = borderColor;
else
    color = tex2D( mySampler, UV );



Those precious (and ugly) operations are done automatically for you in dedicated, specialized hardware.
The same goes for filtering.
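
And just for completeness, the coordinate-to-texel mapping mentioned above boils down to something like this (illustrative only; the exact rounding depends on the API's half-texel convention, hence "255 or 256"):


// textureResolution would be float2( 512, 512 ) in the example above.
int2 texel = (int2)( UV * textureResolution );   // (0.5, 0.5) -> texel (256, 256)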

Cheers
Dark Sylinc

So, in that case TF units and TMUs are just different names for the same thing, right?

(taking into account Schrompf's definition of a TMU:
Quote:

Texture Mapping Units are those which fetch a colour for you when you sample a texture at a given position. TMUs do the conversion from the texture's pixel format to float4, perform the filtering with neighbouring texels according to the sampler settings, and possibly do special functions like the native shadow mapping depth comparison. Each of them also has a cache of its own to speed up consecutive accesses to nearby positions, as far as I know.

)

[Edited by - IrYoKu1 on October 30, 2009 4:41:52 PM]

Just found that a TMU = TA + TF:

Quote:

Each TMU contains 4 TA (Texture Addressing) and 8 TF (Texture Filtering) units
• TA:TF = 1:2, allowing full-speed FP16 bilinear filtering and free 2xAF


From these slides:
http://http.download.nvidia.com/developer/cuda/seminar/TDCI_Arch.pdf

If somebody knows of a document that explains all this architecture stuff, please post it...

Texture Addressing:
http://msdn.microsoft.com/en-us/library/ee422486(VS.85).aspx

The point is that a filtering unit can only compute one bilinear sample per cycle. If you are using trilinear or AF, the filtering units loop a few times to generate the sample while the TAUs sit idle. So the IHVs started to use fewer TAUs than TFUs, because most texture samples aren't plain bilinear anymore. With TAU:TFU = 1:2 you get trilinear or 2x bilinear AF for free (1 TAU cycle); 2x trilinear AF still needs 2 TAU cycles.

But the TAUs are getting cheaper compared to the TFUs (new compression modes etc.), and shaders nowadays read a lot of data that should not be filtered, so the latest generations have gone back to TAU:TFU = 1:1.
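
To connect that back to what a shader programmer actually controls, here are a few D3D9-style sampler states annotated with the per-sample costs those figures imply on a hypothetical TA:TF = 1:2 part (my reading of the numbers above, not vendor-verified):


sampler2D bilinearSmp  = sampler_state { MinFilter = LINEAR;      MagFilter = LINEAR; MipFilter = POINT;  };
// 1 address, 1 bilinear fetch: half of the TFs sit idle.

sampler2D trilinearSmp = sampler_state { MinFilter = LINEAR;      MagFilter = LINEAR; MipFilter = LINEAR; };
// 1 address, 2 bilinear fetches (one per mip level): "free" trilinear.

sampler2D aniso2Smp    = sampler_state { MinFilter = ANISOTROPIC; MagFilter = LINEAR; MipFilter = LINEAR;
                                         MaxAnisotropy = 2; };
// 2 trilinear taps = 4 bilinear fetches and 2 addresses: 2 TAU cycles.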

Quote:
Original post by CV
The point is that a filtering unit can only compute one bilinear sample per cycle. If you are using trilinear or AF, the filtering units loop a few times to generate the sample while the TAUs sit idle. So the IHVs started to use fewer TAUs than TFUs, because most texture samples aren't plain bilinear anymore. With TAU:TFU = 1:2 you get trilinear or 2x bilinear AF for free (1 TAU cycle); 2x trilinear AF still needs 2 TAU cycles.

But the TAUs are getting cheaper compared to the TFUs (new compression modes etc.), and shaders nowadays read a lot of data that should not be filtered, so the latest generations have gone back to TAU:TFU = 1:1.


How did you learn this stuff? Can you recommend a book/URL for learning these low-level GPU architecture/organization details? The closest thing I've found is a chapter in GPU Gems 2 that talks about the architecture of the GeForce 6 series.

Thanks!

Quote:
Original post by IrYoKu1
How did you learn this stuff? Can you recommend a book/URL for learning these low-level GPU architecture/organization details? The closest thing I've found is a chapter in GPU Gems 2 that talks about the architecture of the GeForce 6 series.
Thanks!


There's no "here are our secrets" book, simply because there's a lot of NDA-covered material and trade secrets involved. You get the big picture, but you don't know the exact internals of each video card unless you work at NVIDIA or ATI.

With that said, you can always read the tips each hardware vendor publishes, which tell you a lot about how their parts work (sometimes you just figure it out with a little trial & error and by putting the pieces together).
Also, reading old stuff helps a lot. "TMUs" were very new in fixed-function DX7 hardware, where being able to blend two or three textures in a single pass was cutting edge because the "card has multiple texturing units". There used to be a lot of specific hardware designed for each task, but it had a problem: it wasn't programmable or flexible.
Now with shaders everything's integrated and the borders between the different hardware components are blurrier. Even the shaders' limits are blurry: we used to talk about "pixel shader units" and "vertex shader units", but today's GPUs come with "unified shaders" that can do both.

Useful links which can explain how the hardware works (even when they don't talk about the HW directly):
NVIDIA GPU Programming Guide
Depth In-Depth
How MSAA works
ATI explaining how dynamic flow control works in their GPUs

Oh, and as you may have noticed, visiting NVIDIA's and ATI's developer sites and reading all their papers helps a lot.

Cheers
Dark Sylinc

Quote:
Original post by IrYoKu1
How did you learn this stuff?

To be honest... I don't know.
I was always interested in new architectures (CPU/GPU) and tried to understand the design decisions. Mostly it's just a hobby. (Having access to all the console SDKs might help a bit sometimes. ;))

Newer architectures are a lot more complicated, so it's a good idea to start with older ones.

Here are some more resources:
anandtech.com (e.g. http://www.anandtech.com/video/showdoc.aspx?i=2870)
ixbtlabs.com
chip-architect.org (old CPU architectures)
Beyond3D forum
3DCenter forum (if you understand German)
CUDA SDK documentation
AMD Stream SDK documentation
Conference papers/presentations
Hardware design lectures at university


