[Software rasterizer] UV coordinate range vs. texel data index range

Started by
6 comments, last by JNT 10 years, 3 months ago

Hi,

I recently noticed a small problem with my software rasterizer. Basically, I previously converted UV coordinates with the range [0.0, 1.0] to a data location (for sampling a texture's texels) using the following code:


int x = int(u * (textureWidth - 1));
int y = int(v * (textureHeight - 1));

I could then use the XY values to access the texel data that is stored linearly, like so:


texel32 Texture::getTexel(int x, int y) { return texels[y * textureWidth + x]; }

Pretty standard stuff. However, when mapping a quad with the entirety of a low-resolution texture (2x2 texels) using UV coordinates ranging the full [0.0, 1.0], I noticed that this does not work as I expected. In essence, the quad is mapped predominantly by only a single texel, rather than all four sharing equal space on the quad.

I know exactly why this happens (because the texels at the edges are sampled *only* when the UV coordinates reach *exactly* 1.0, while coordinates below 1.0 are converted to 0 when casting to int), but I can't figure out a good solution. I tried using the full width and height of the texture during conversion. This does map the entire range of the texture to the quad, but I have to make sure to wrap the texture coordinates for when the coordinates reach exactly 1.0, at which point I would otherwise make an invalid data access:


int x = int(u * textureWidth) % textureWidth;
int y = int(v * textureHeight) % textureHeight;

However, I still get a sliver of texels at the edges of the quad where the coordinates wrap around to 0.0 from 1.0. Technically, this is the correct result, but it is not the result that I am expecting. I am expecting a quad mapped with the full texture, and nothing else.

I tried interpolating UV coordinates within the range of (start, end) rather than [start, end], but the results are unreliable using the implementation I tried. Another potential solution is to clamp the texture coordinates to the edges of the texture, but that would introduce branching (?) on a per-pixel basis, and that can't be good for performance.

My question is, is there an elegant solution for converting UV coordinates with the range [0.0, 1.0] to absolute coordinates with the range [0.0, N) where N is a texture dimension?

PS: If you are having problems visualizing the visual artifacts caused by the code above I could post some images later.

Just this should be fine:
int x = int(u * textureWidth);
int y = int(v * textureHeight);
Yep, you've got to implement wrap/clamp/border addressing modes for the general case.
However, these shouldn't be needed at all for the specific case where, for example, you're rendering a 256x256 image to a 256x256 render-target using a full-screen quad.
In this specific case, your pixels / shading-fragments are located at 0.5, 1.5, 2.5 ... 255.5 --- so when interpolating between a tex-coord of 0.0 and 1.0, the left-most pixel uses the coord of 0.001953125 and the right-most pixel uses a coord of 0.998046875, and these map to the integers of 0 and 255.

I know exactly why this happens (because the texels at the edges are sampled *only* when the UV coordinates reach *exactly* 1.0 while coordinates below 1.0 are converted to 0 when casting to int), but I can't figure out a good solution.

You are truncating when casting to int; the solution is to round. The simplest way is:

int x = int(u * (textureWidth - 1) + 0.5f);
int y = int(v * (textureHeight - 1) + 0.5f);

I know exactly why this happens (because the texels at the edges are sampled *only* when the UV coordinates reach *exactly* 1.0 while coordinates below 1.0 are converted to 0 when casting to int), but I can't figure out a good solution.

You are truncating when casting to int; the solution is to round. The simplest way is:

int x = int(u * (textureWidth - 1) + 0.5f);
int y = int(v * (textureHeight - 1) + 0.5f);

Actually, this only shifts the texture by half a texel in each dimension (unless you do something more than *just* rounding). That results in some goofy-looking texels at the borders of the quad:

[attached image: ybzHNEV.png]

Just this should be fine:
int x = int(u * textureWidth);
int y = int(v * textureHeight);
Yep, you've got to implement wrap/clamp/border addressing modes for the general case.
However, these shouldn't be needed at all for the specific case where, for example, you're rendering a 256x256 image to a 256x256 render-target using a full-screen quad.
In this specific case, your pixels / shading-fragments are located at 0.5, 1.5, 2.5 ... 255.5 --- so when interpolating between a tex-coord of 0.0 and 1.0, the left-most pixel uses the coord of 0.001953125 and the right-most pixel uses a coord of 0.998046875, and these map to the integers of 0 and 255.

This sounds logical to me since it is a similar solution to what I have tried before. I'll get back to this thread with the results or further questions.

I know exactly why this happens (because the texels at the edges are sampled *only* when the UV coordinates reach *exactly* 1.0 while coordinates below 1.0 are converted to 0 when casting to int), but I can't figure out a good solution.

You are truncating when casting to int; the solution is to round. The simplest way is:

int x = int(u * (textureWidth - 1) + 0.5f);
int y = int(v * (textureHeight - 1) + 0.5f);

Actually, this only shifts the texture by half a texel in each dimension (unless you do something more than *just* rounding). That results in some goofy-looking texels at the borders of the quad:

[attached image: ybzHNEV.png]


Yes, of course; that's what you asked for. Your 2x2 quad will now look equally mapped.

You've already pointed out the other proper way yourself:

int x = Clamp(int(u * textureWidth), 0, textureWidth - 1);
int y = Clamp(int(v * textureHeight), 0, textureHeight - 1);

while Clamp is

int Clamp(int a, int min, int max) { return a < min ? min : a > max ? max : a; }

I know exactly why this happens (because the texels at the edges are sampled *only* when the UV coordinates reach *exactly* 1.0 while coordinates below 1.0 are converted to 0 when casting to int), but I can't figure out a good solution.

You are truncating when casting to int; the solution is to round. The simplest way is:

int x = int(u * (textureWidth - 1) + 0.5f);
int y = int(v * (textureHeight - 1) + 0.5f);

Actually, this only shifts the texture by half a texel in each dimension (unless you do something more than *just* rounding). That results in some goofy-looking texels at the borders of the quad:

[attached image: ybzHNEV.png]


Yes, of course; that's what you asked for. Your 2x2 quad will now look equally mapped.

You've already pointed out the other proper way yourself:


int x = Clamp(int(u * textureWidth), 0, textureWidth - 1);
int y = Clamp(int(v * textureHeight), 0, textureHeight - 1);

while Clamp is

int Clamp(int a, int min, int max) { return a < min ? min : a > max ? max : a; }

I wouldn't consider that equally mapped. Actually, mapping using int x = int(u * width) % width looks better than rounding, but is still not quite correct. Regardless, to clarify my original post, I want the texture *uniformly* mapped across the quad without wrapping. Clamping seems very expensive to do if I'm sampling a texel at each pixel. Hodgman's solution seems reasonable, as it interpolates the range (start, end) rather than [start, end] and should therefore not need to introduce branching.

The code I posted (and you tried) is how GPUs implement nearest-neighbour texture addressing. This means your issue is actually with how you're addressing your pixels (which affects how you interpolate your vertex coords). You're not alone here -- D3D9 suffers from the same issue, where all its pixel addresses are off by 0.5 from where they should be :/

With clamping / wrapping -- these don't necessarily have to use branches. Clamping (as shown above with the ternary "?:" operator) should be compiled to a CMOV instruction instead of a branch, and wrapping can use modulo.
Texture fetching is very likely to be bottlenecked by memory access speeds / cache misses, so this extra handful of addressing instructions hopefully won't actually impact performance much.

The code I posted (and you tried) is how GPUs implement nearest-neighbour texture addressing. This means your issue is actually with how you're addressing your pixels (which affects how you interpolate your vertex coords). You're not alone here -- D3D9 suffers from the same issue, where all its pixel addresses are off by 0.5 from where they should be :/

With clamping / wrapping -- these don't necessarily have to use branches. Clamping (as shown above with the ternary "?:" operator) should be compiled to a CMOV instruction instead of a branch, and wrapping can use modulo.
Texture fetching is very likely to be bottlenecked by memory access speeds / cache misses, so this extra handful of addressing instructions hopefully won't actually impact performance much.

I see. Here's the code I am currently working with (it's been the same rasterization algorithm that I have used for ages):


void mglSWMonoRasterizer::RenderScanline(int x1, int x2, int, mmlVector<2> t1, mmlVector<2> t2, unsigned int *pixels)
{
    const int width = framebuffer->GetWidth();
    if (x1 >= width || x2 < 0 || x1 == x2) { return; }
    const mmlVector<2> dt = (t2 - t1) * (1.0f / (x2 - x1));
    t1 += dt * 0.5f; // UV sample offset
    if (x1 < 0) {
        t1 += dt * (float)(-x1);
        x1 = 0;
    }
    if (x2 > width) { x2 = width; }
    for (int x = x1; x < x2; ++x) {
        pixels[x] = *m_currentTexture->GetPixelUV(t1[0], t1[1]);
        t1 += dt;
    }
}

void mglSWMonoRasterizer::RenderTriangle(mmlVector<5> va, mmlVector<5> vb, mmlVector<5> vc)
{
    mmlVector<5> *a = &va, *b = &vb, *c = &vc;
    if ((*a)[1] > (*b)[1]) { mmlSwap(a, b); }
    if ((*a)[1] > (*c)[1]) { mmlSwap(a, c); }
    if ((*b)[1] > (*c)[1]) { mmlSwap(b, c); }

    const mmlVector<5> D12 = (*b - *a) * (1.f / ((*b)[1] - (*a)[1]));
    const mmlVector<5> D13 = (*c - *a) * (1.f / ((*c)[1] - (*a)[1]));
    const mmlVector<5> D23 = (*c - *b) * (1.f / ((*c)[1] - (*b)[1]));
    const mmlVector<2> UVSampleOffset = mmlVector<2>::Cast(&D13[3]) * 0.5f;

    const int width = framebuffer->GetWidth();
    const int height = framebuffer->GetHeight();
    
    int sy1 = (int)ceil((*a)[1]);
    int sy2 = (int)ceil((*b)[1]);
    const int ey1 = mmlMin2((int)ceil((*b)[1]), height);
    const int ey2 = mmlMin2((int)ceil((*c)[1]), height);
    
    unsigned int *pixels = framebuffer->GetPixels(sy1);
    
    const float DIFF1 = ceil((*a)[1]) - (*a)[1];
    const float DIFF2 = ceil((*b)[1]) - (*b)[1];
    const float DIFF3 = ceil((*a)[1]) - (*c)[1];
    
    if (D12[0] < D13[0]){
        mmlVector<5> start = (D12 * DIFF1) + *a;
        mmlVector<5> end = (D13 * DIFF3) + *c;
        // UV sample offset
        mmlVector<2>::Cast(start+3) += UVSampleOffset;
        mmlVector<2>::Cast(end+3) += UVSampleOffset;
        if (sy1 < 0){
            start += D12 * ((float)-sy1);
            end += D13 * ((float)-sy1); // doesn't need to be corrected any further
            pixels += width * (-sy1); // doesn't need to be corrected any further
            sy1 = 0;
        }

        for (int y = sy1; y < ey1; ++y, pixels+=width, start+=D12, end+=D13){
            RenderScanline((int)start[0], (int)end[0], y, mmlVector<2>::Cast(start+3), mmlVector<2>::Cast(end+3), pixels);
        }
        
        start = (D23 * DIFF2) + *b;
        // UV sample offset
        mmlVector<2>::Cast(start+3) += UVSampleOffset;
        if (sy2 < 0){
            start += D23 * ((float)-sy2);
            sy2 = 0;
        }

        for (int y = sy2; y < ey2; ++y, pixels+=width, start+=D23, end+=D13){
            RenderScanline((int)start[0], (int)end[0], y, mmlVector<2>::Cast(start+3), mmlVector<2>::Cast(end+3), pixels);
        }
    } else {
        mmlVector<5> start = (D13 * DIFF3) + *c;
        mmlVector<5> end = (D12 * DIFF1) + *a;
        // UV sample offset
        mmlVector<2>::Cast(start+3) += UVSampleOffset;
        mmlVector<2>::Cast(end+3) += UVSampleOffset;
        if (sy1 < 0){
            start += D13 * ((float)-sy1); // doesn't need to be corrected any further
            end += D12 * ((float)-sy1);
            pixels += width * (-sy1); // doesn't need to be corrected any further
            sy1 = 0;
        }
        
        for (int y = sy1; y < ey1; ++y, pixels+=width, start+=D13, end+=D12){
            RenderScanline((int)start[0], (int)end[0], y, mmlVector<2>::Cast(start+3), mmlVector<2>::Cast(end+3), pixels);
        }
        
        end = (D23 * DIFF2) + *b;
        // UV sample offset
        mmlVector<2>::Cast(end+3) += UVSampleOffset;
        if (sy2 < 0){
            end += D23 * ((float)-sy2);
            sy2 = 0;
        }

        for (int y = sy2; y < ey2; ++y, pixels+=width, start+=D13, end+=D23){
            RenderScanline((int)start[0], (int)end[0], y, mmlVector<2>::Cast(start+3), mmlVector<2>::Cast(end+3), pixels);
        }
    }
}

I have commented all the places where I offset the UV coordinates, but this still only works sporadically, and I seem to overshoot and wrap around the texture when the quad is oriented in different ways. I really have no clue why this is.

How would one go about avoiding this situation altogether? Do you know of any resources that cover proper rasterization (with emphasis on "proper")? The basics are simple, but there are small details in rasterization (such as air-tight sub-pixel precision and texturing) that are often only dealt with in passing, or most often not at all, in the resources I have consulted.

This topic is closed to new replies.
