What are ROPs & TMUs?

13 comments, last by IrYoKu1 14 years, 5 months ago
Hey guys, I'm just trying to understand GPU specs, for which I need to know what ROPs (Render Output Units) and TMUs (Texture Memory Units) are. Can someone give me a quick overview of their purpose? Also, are the Texture Filtering Units and Texture Addressing Units related to TMUs? I read the Wikipedia articles, but I am still not sure about their role in the graphics pipeline (being a shader programmer, I know it from a higher-level perspective). Thanks!
The ROP is the unit that performs the final procedure in displaying your screen. When pixels are computed, for example, via shader, they must be blended according to their associated depth and associated blend functions. It's the ROP's job to perform this blending and then place the pixel value at its location "on the screen" (which is actually just a place in memory effectively).

The TMUs I know about are Texture Mapping Units, and they're fairly old technology. I can't find anything on Texture Memory Units. I can tell you that nowadays textures are loaded into GPU memory instead of system RAM. Otherwise, I'm not sure what you're asking about.
We should do this the Microsoft way: "WAHOOOO!!! IT COMPILES! SHIP IT!"
Texture Mapping Units are those which fetch a colour for you when you sample a texture at a given position. TMUs do the conversion from the texture's pixel format to float4, perform the filtering with neighbouring texels according to the sampler settings, and possibly do special functions like the native shadow mapping depth comparison. Each of them also has a cache of its own to speed up consecutive accesses to nearby positions, as far as I know.
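To make the "pixel format to float4" step concrete, here is a minimal CPU-side sketch in Python. It assumes an 8-bit-per-channel unsigned-normalized format with red in the low byte; the function name and the byte order are illustrative assumptions, not a description of any particular GPU.

```python
def decode_rgba8_unorm(texel: int) -> tuple[float, float, float, float]:
    """Unpack a 32-bit RGBA8 texel into four floats in [0, 1].

    Assumed layout (hypothetical): R in bits 0-7, G in 8-15, B in 16-23, A in 24-31.
    """
    r = texel & 0xFF
    g = (texel >> 8) & 0xFF
    b = (texel >> 16) & 0xFF
    a = (texel >> 24) & 0xFF
    # Unsigned-normalized: the integer range 0..255 maps linearly onto 0.0..1.0.
    return (r / 255.0, g / 255.0, b / 255.0, a / 255.0)


if __name__ == "__main__":
    print(decode_rgba8_unorm(0xFF0000FF))  # opaque red under the assumed layout
```

The shader only ever sees the resulting four floats, which is why the same shader code works regardless of the texture's storage format.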
----------
Gonna try that "Indie" stuff I keep hearing about. Let's start with Splatter.
Quote:Original post by dbzprogrammer
The ROP is the unit that performs the final procedure in displaying your screen. When pixels are computed, for example, via shader, they must be blended according to their associated depth and associated blend functions. It's the ROP's job to perform this blending and then place the pixel value at its location "on the screen" (which is actually just a place in memory effectively).


But then, something that I don't understand is the purpose of having more Pixel Shader Units than ROPs, as you won't be able to write the results because the ROPs are going to be a bottleneck. For example, if you have 24 pixel shaders (in a non-unified architecture) and 8 ROPs, you can produce 24 pixels per clock but only write 8, so the pixel shaders will need to wait for the ROPs to become available. What am I missing?

Quote:Original post by Schrompf
Texture Mapping Units are those which fetch a colour for you when you sample a texture at a given position. TMUs do the conversion from the texture's pixel format to float4, perform the filtering with neighbouring texels according to the sampler settings, and possibly do special functions like the native shadow mapping depth comparison. Each of them also has a cache of its own to speed up consecutive accesses to nearby positions, as far as I know.


Ok, thanks!

From the Xenos GPU Wikipedia page:
Quote: 16 texture filtering units (TF) and 16 texture addressing units (TA)
16 filtered samples per clock
Maximum texel fillrate: 8 gigatexels per second (16 textures x 500 MHz)
16 unfiltered texture samples per clock
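
The fillrate figure in the quote is just units times clock. A tiny Python check, where the function name and the one-sample-per-unit-per-clock default are assumptions made for illustration:

```python
def texel_fillrate(units: int, clock_hz: float, samples_per_unit_per_clock: int = 1) -> float:
    """Peak texels per second, assuming every unit delivers its samples on every clock."""
    return units * samples_per_unit_per_clock * clock_hz


if __name__ == "__main__":
    # Figures taken from the quote above: 16 units at 500 MHz.
    print(texel_fillrate(16, 500e6) / 1e9, "gigatexels per second")
```

This is a theoretical peak; it says nothing about how often the units are actually kept busy.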


Maybe then TFs and TAs are just specialized TMUs? Maybe TFs are the ones able to perform bilinear, trilinear and anisotropic reads, and TAs the ones that perform unfiltered reads (POINT, NEAREST).
Quote:Original post by IrYoKu1
But then, something that I don't understand is the purpose of having more Pixel Shader Units than ROPs


Because the most complex operation ROPs can do is read 'A' from the render target, read 'B' from the shader's output, and compute (A * D + B * E), where D and E may be any value (e.g. for alpha blending). The simplest operation they can do is nothing at all (the pixel was discarded); second to that comes writing 'B', which is just writing what the shader computed.
In other words, the ROPs' operations aren't complex: one or two additions and two multiplications at most.
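
As a rough illustration of how little work that is, here is a Python sketch of the (A * D + B * E) operation. The function names are hypothetical, and standard "source-over" alpha blending is used as one example choice of D and E; real hardware offers more blend modes than this.

```python
def rop_blend(a, b, d, e):
    """Per-channel A * D + B * E.

    a: colour already in the render target, b: colour produced by the shader,
    d, e: per-channel blend factors. All are 4-tuples of floats (RGBA).
    """
    return tuple(ac * dc + bc * ec for ac, bc, dc, ec in zip(a, b, d, e))


def alpha_blend(a, b):
    """Source-over blending: D = 1 - B.alpha, E = B.alpha (an assumed, common setup)."""
    alpha = b[3]
    return rop_blend(a, b, (1.0 - alpha,) * 4, (alpha,) * 4)


def overwrite(a, b):
    """No blending: D = 0, E = 1, so the shader's colour simply replaces the target."""
    return rop_blend(a, b, (0.0,) * 4, (1.0,) * 4)


if __name__ == "__main__":
    target = (0.0, 0.0, 1.0, 1.0)   # opaque blue already in the framebuffer
    shaded = (1.0, 0.0, 0.0, 0.5)   # half-transparent red from the shader
    print(alpha_blend(target, shaded))
    print(overwrite(target, shaded))
```

Each output channel costs two multiplications and one addition, which is tiny next to a typical pixel shader.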

On the other hand, pixel shader units can be VERY busy because there's nearly no limit to what they can do (usually around 30-50 instructions, many texture lookups, and a couple of special instructions like log, pow or sqrt).

Suppose the following example:
* 16 Shader units (let's call it SU) & 8 ROPs
* ROPs are idle. SUs are computing lots of square roots, additions, multiplications.
* The 16 SUs finished their job. Go to next 16 pixels. ROPs start their job
* 16 SUs are working on pixels 16...31, ROPs are working on pixels 0...7
* 16 SUs are working on pixels 16...31, ROPs are working on pixels 8...15
* ROPs are idle again, 16 SUs still busy
* The 16 SUs finished their job. Go to next 16 pixels. ROPs start their job again.

See? You have 16 SUs, 8 ROPs and yet the bottleneck is in the SUs.
Ideally, the shader operations are balanced enough that, as soon as the SUs are done, the ROPs have just finished with the last group of pixels, so that no component is ever idle.

The reverse situation (the ROPs being the bottleneck) would happen when your pixel shader is completely trivial, something like returning a constant colour or doing a texture lookup without applying lighting.
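
The timeline above can be played out with a toy model in Python. Everything here is a simplifying assumption for illustration: pixels move in batches of one per SU, each ROP writes one pixel per cycle, and the SUs hand a finished batch to the ROPs before starting the next one. The function and parameter names are made up.

```python
import math


def simulate(num_su: int, num_rop: int, shader_cycles: int, num_batches: int):
    """Return (total_cycles, bottleneck) for a toy two-stage SU -> ROP pipeline."""
    # Cycles the ROPs need to write one batch of num_su pixels.
    rop_cycles = math.ceil(num_su / num_rop)

    su_start = 0   # cycle at which the SUs begin the current batch
    rop_free = 0   # cycle at which the ROPs finish their previous batch
    for _ in range(num_batches):
        su_done = su_start + shader_cycles
        # The batch is handed over once it is shaded AND the ROPs are free.
        rop_start = max(su_done, rop_free)
        rop_free = rop_start + rop_cycles
        # The SUs move on to the next batch as soon as the hand-over happens.
        su_start = rop_start

    if shader_cycles > rop_cycles:
        bottleneck = "SU"
    elif shader_cycles < rop_cycles:
        bottleneck = "ROP"
    else:
        bottleneck = "balanced"
    return rop_free, bottleneck


if __name__ == "__main__":
    print(simulate(16, 8, shader_cycles=40, num_batches=3))  # heavy shader
    print(simulate(16, 8, shader_cycles=1, num_batches=3))   # trivial shader
```

With a 40-cycle shader the ROPs sit idle most of the time even though there are half as many of them; with a 1-cycle shader the SUs end up waiting on the ROPs instead.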

Cheers
Ok thanks! I finally understand it
Quote:Original post by IrYoKu1
Maybe TFs are the ones able to perform bilinear, trilinear and anisotropic reads, and TAs the ones that perform unfiltered reads (POINT, NEAREST).


No. TFs perform the bilinear, trilinear or anisotropic filtering (or no filtering at all) for you, so that you don't have to read multiple adjacent texels from a texture and weight them (i.e. filter) in the shader units.

Texture addressing units are in charge of mapping texture coordinates to texels.
For example, you submit tex2D( mySampler, float2( 0.5, 0.5 ) ), but if the texture resolution is 512x512, what you intended is to read the texel at position (255, 255) (or (256, 256), depending on the alignment).
They are also in charge of performing the texture addressing mode you asked for: wrap, clamp, or border.
Otherwise, before reading in the pixel shader you would have to do something like this:

// Wrap: keep only the fractional part of UV, so it repeats every 1.0
UV = frac( UV );
color = tex2D( mySampler, UV );

// Clamp: restrict UV to the [0, 1] range
UV = max( min( UV, 1.0f ), 0.0f );
color = tex2D( mySampler, UV );

// Border: anything outside [0, 1] gets the border colour
if( any( UV > 1.0f ) || any( UV < 0.0f ) )
   color = borderColor;
else
   color = tex2D( mySampler, UV );


Those precious (and ugly) operations are done automatically for you in dedicated, specialized hardware.
Same happens with filtering.
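
To show what the addressing and filtering units save you from writing, here is a CPU-side Python sketch of a bilinear sample with wrap, clamp and border addressing. It is a simplified model under stated assumptions: a single-channel texture stored as a list of rows, texel centres at half-integer positions, and hypothetical function names. Real hardware differs in precision and details.

```python
import math


def address(coord: int, size: int, mode: str):
    """Apply an addressing mode to an integer texel coordinate.

    Returns None when the border colour should be used instead of a texel.
    """
    if mode == "wrap":
        return coord % size                    # Python's % is never negative here
    if mode == "clamp":
        return min(max(coord, 0), size - 1)
    if mode == "border":
        return coord if 0 <= coord < size else None
    raise ValueError(f"unknown addressing mode: {mode}")


def fetch(tex, x: int, y: int, mode: str, border: float) -> float:
    """Read one texel after addressing (the TA's part in this toy model)."""
    ax = address(x, len(tex[0]), mode)
    ay = address(y, len(tex), mode)
    if ax is None or ay is None:
        return border
    return tex[ay][ax]


def sample_bilinear(tex, u: float, v: float, mode: str = "clamp", border: float = 0.0) -> float:
    """Weight the four nearest texels (the TF's part in this toy model)."""
    # Map normalized UV to texel space; centres are assumed to sit at +0.5.
    x = u * len(tex[0]) - 0.5
    y = v * len(tex) - 0.5
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0                    # fractional weights in [0, 1)

    t00 = fetch(tex, x0, y0, mode, border)
    t10 = fetch(tex, x0 + 1, y0, mode, border)
    t01 = fetch(tex, x0, y0 + 1, mode, border)
    t11 = fetch(tex, x0 + 1, y0 + 1, mode, border)

    top = t00 * (1.0 - fx) + t10 * fx
    bottom = t01 * (1.0 - fx) + t11 * fx
    return top * (1.0 - fy) + bottom * fy


if __name__ == "__main__":
    tex = [[0.0, 1.0],
           [0.0, 1.0]]
    print(sample_bilinear(tex, 0.5, 0.5))                # halfway between the columns
    print(sample_bilinear(tex, 1.0, 0.25, mode="wrap"))  # right edge blends with column 0
    print(sample_bilinear(tex, 1.0, 0.25, mode="clamp")) # right edge repeats column 1
```

That is four fetches, four addressing operations and three interpolations for one bilinear sample, all of which the dedicated units do without spending shader instructions.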

Cheers
Dark Sylinc
So, in that case TF units and TMUs are just different names for the same thing, right?

(taking into account Schrompf's definition of TMU:
Quote:
Texture Mapping Units are those which fetch a colour for you when you sample a texture at a given position. TMUs do the conversion from the texture's pixel format to float4, perform the filtering with neighbouring texels according to the sampler settings, and possibly do special functions like the native shadow mapping depth comparison. Each of them also has a cache of its own to speed up consecutive accesses to nearby positions, as far as I know.

)

[Edited by - IrYoKu1 on October 30, 2009 4:41:52 PM]
Just found that TMU=TF+TA

Quote:
Each TMU contains 4 TA (Texture Addressing) and 8 TF (Texture Filtering)
• TA:TF = 1:2, allowing full-speed FP16 bilinear filtering and free 2xAF


From these slides:
http://http.download.nvidia.com/developer/cuda/seminar/TDCI_Arch.pdf

If somebody knows of a document that explains all this architecture stuff, please post it...

This topic is closed to new replies.
