Explaining Image downsample

Started by
24 comments, last by Khatharr 11 years ago

That's the pixel on the next row down. Stride is the width of the image times the size of each pixel (4 bytes).

So if the width is 100, the stride is 400 bytes and

pDest[0] is the first byte of the top left pixel of the 2x2 square being considered and pDest[0 + stride] = pDest[400] which is the first byte of the pixel on the next row down of the source image.
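The arithmetic above can be sketched as two small helpers. This is only an illustration: the names strideBytes and byteOffset are made up here, and the rows are assumed to be tightly packed 32-bit pixels with no padding.

```cpp
#include <cstddef>
#include <cstdint>

// Bytes per pixel for a 32-bit format, as in the thread.
constexpr std::size_t kBytesPerPixel = sizeof(std::uint32_t);

// Stride in bytes of one row, assuming no padding at the end of the row.
constexpr std::size_t strideBytes(std::size_t widthPixels) {
    return widthPixels * kBytesPerPixel;
}

// Offset of the first byte of pixel (x, y) from the start of the image.
constexpr std::size_t byteOffset(std::size_t x, std::size_t y, std::size_t stride) {
    return y * stride + x * kBytesPerPixel;
}
```

For a width of 100, strideBytes(100) is 400, so byteOffset(0, 1, 400) lands on byte 400, the pixel directly below the top-left one.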

"Most people think, great God will come from the sky, take away everything, and make everybody feel high" - Bob Marley

Stride is used to pad the width of an image so that its width in memory aligns to a power of two, which makes the pixels easier to reference with bit-math.

Sometimes stride is the 'true length' of a row, sometimes it's the 'added length' of a row. For instance, if you have a texture with width 100 there's a good chance that it's stored as a width 128 texture. The stride is either 128 or 28, depending on what API you're working with. When the image is rendered only the 100 pixel section gets drawn, but having the width stored as 128 means that you can use bit-math to select a row very quickly in the hardware.

Edit - Ah, yeah. In some cases the stride is stored in byte length rather than pixel length as well. It should be documented somewhere in what you're working with, but basically pixel + stride + 1 means the pixel that's one row down and one column to the right from where you're at. In this case it's clearly defined as 'int stride = pixelsPerRow * sizeof(uint32_t);' in the code itself.

As a term, whenever you see 'stride' you should just realize that it's dealing with the 'real length' of a row rather than the apparent length.
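A rough sketch of the width-100-stored-as-128 case, assuming the padding really is to the next power of two (which, as noted elsewhere in the thread, is not true of every format). The names nextPowerOfTwo, RowLayout, paddedLayout and rowStart are hypothetical.

```cpp
#include <cstddef>

// Smallest power of two that is >= n (for n > 0).
std::size_t nextPowerOfTwo(std::size_t n) {
    std::size_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

struct RowLayout {
    std::size_t visibleWidth;  // pixels that actually get drawn
    std::size_t storedWidth;   // 'true length' of a row in memory
    std::size_t padding;       // 'added length' at the end of each row
};

// Layout of a row whose stored width is padded up to a power of two.
RowLayout paddedLayout(std::size_t width) {
    const std::size_t stored = nextPowerOfTwo(width);
    return RowLayout{width, stored, stored - width};
}

// With a power-of-two stored width, the index of the first pixel in a row
// is a shift instead of a multiply.
std::size_t rowStart(std::size_t row, unsigned log2StoredWidth) {
    return row << log2StoredWidth;
}
```

For width 100 this gives a stored width of 128 and a padding of 28, which are the two values an API might report as the "stride".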

void hurrrrrrrr() {__asm sub [ebp+4],5;}

There are ten kinds of people in this world: those who understand binary and those who don't.

Note that the code posted doesn't actually calculate the stride width, it just multiplies the number of pixels per row by 4, so any calculation based on a stride that isn't the width must be done outside the function shown in the first post.

Not all strides are powers of 2; it depends on the graphics file format (BMP files, for example, pad each row to the next multiple of 4 bytes).
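As an illustration of a stride that is not a power of two, here is the usual way to compute the row size of an uncompressed BMP, where each row is padded to a multiple of 4 bytes. The function name bmpRowStride is made up for this sketch.

```cpp
#include <cstddef>

// Bytes per row of an uncompressed BMP: round the row's bit count up to a
// whole number of 32-bit words, then convert words to bytes.
std::size_t bmpRowStride(std::size_t widthPixels, std::size_t bitsPerPixel) {
    return ((widthPixels * bitsPerPixel + 31) / 32) * 4;
}
```

A 101-pixel-wide 24-bit image needs 303 bytes of pixel data per row, so its stride is 304; a 100-pixel-wide 32-bit image needs no padding and has a stride of 400.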

"Most people think, great God will come from the sky, take away everything, and make everybody feel high" - Bob Marley

Thanks for replies.

So the evaluation of the statement r = pSrc8[0] + pSrc8[4] + pSrc8[stride+0] + pSrc8[stride+4];

is: r = first red byte + 4th red byte of 4th pixel, red byte at row 800 and red byte 404 at row 800?

It's 'red byte I'm on' + 'red byte to my right' + 'red byte below me' + 'red byte below me and to my right'.

Adding the stride moves you down one row.

If you think of a 10x10 grid, the stride is 10. If the grid is represented as an array of 100 bytes then index 4 is the 5th cell in the top row. Index 4 + stride is index 14, which is the 5th cell in the second row.
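Putting the "me + right + below + below-right" idea into a whole loop might look roughly like this. It is a sketch, not the function from the first post: downsample2x is a hypothetical name, and it assumes 4 bytes per pixel, even width and height, and tightly packed rows.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Halve an image in each dimension by averaging every 2x2 square of pixels.
std::vector<std::uint8_t> downsample2x(const std::vector<std::uint8_t>& src,
                                       std::size_t width, std::size_t height) {
    const std::size_t stride = width * 4;  // bytes per source row
    const std::size_t dstWidth = width / 2;
    const std::size_t dstHeight = height / 2;
    std::vector<std::uint8_t> dst(dstWidth * dstHeight * 4);

    for (std::size_t y = 0; y < dstHeight; ++y) {
        for (std::size_t x = 0; x < dstWidth; ++x) {
            // First byte of the top-left pixel of the 2x2 source square.
            const std::uint8_t* p = &src[(2 * y) * stride + (2 * x) * 4];
            std::uint8_t* d = &dst[(y * dstWidth + x) * 4];
            for (std::size_t c = 0; c < 4; ++c) {
                // This byte + the one to its right + below + below-right.
                const unsigned sum = p[c] + p[c + 4] + p[c + stride] + p[c + stride + 4];
                d[c] = static_cast<std::uint8_t>((sum + 2) / 4);  // round to nearest
            }
        }
    }
    return dst;
}
```

The offsets 4 and stride are the same ones used in the statement being asked about: +4 moves one pixel to the right, +stride moves one row down.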

One way to simplify this kind of process is to use a union to represent a color value:


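#include <cstdint>

// Note: the anonymous struct is a compiler extension in C++, and which
// byte of u32 each channel maps to depends on the machine's endianness.
union uColor {
  uint32_t u32;            // the whole pixel as one 32-bit value
  struct {
    unsigned char alpha;
    unsigned char green;
    unsigned char blue;
    unsigned char red;
  };
};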

You can create a color:

uColor col;

then reference the uint value:

col.u32

or a specific channel:

col.red

You cast an array of texels as uColor* then use the union to easily access the channels without trying to work with two index scales between uint and uchar.

void hurrrrrrrr() {__asm sub [ebp+4],5;}

There are ten kinds of people in this world: those who understand binary and those who don't.
To downsize an image with an even higher quality result, i.e. more true to the original image in terms of overall brightness levels, you can perform dithering as well.
E.g. take the red colour channel values of four pixels. Let's say they are 100, 101, 102, and 103.
Sum them up, you get 406.
Divide by 4 you get 101, remainder 2.
Because the remainder is 2, rounding by adding two before dividing produces a final result of 102. The problem is that the actual correct answer is 101.5.

Big deal right? Nobody cares! You can't represent the 0.5 anyway, and rounding to the nearest integer beats always rounding down, or always rounding up!
That's all largely true, and the person adding 2 at least cares enough to recognise the problem, but do they know the best solution...
Consider what happens if the majority of the remainders cause rounding to go up. E.g. What if all the pixels were the same colour? Overall the image brightness has changed. Yes it's barely noticeable in most cases, and 99% of people won't care about it, but it's there.
Wouldn't it be better if we took half of those pixels and rounded those down instead? Then the overall brightness would be the same.
Better still, we could add 0, 1, 2, or 3 before dividing by 4 each 1/4th of the time, which helps for the cases where the remainder is odd.

This tends to be done one of several ways:
In a patterned approach - pattern dithering.
In a randomised way, adding anything from 0-3 randomly before the division - random dithering.
Or, where the error term is accumulated as we travel across the pixels in each row - Floyd-Steinberg style dithering.
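A minimal sketch of the patterned variant, assuming a 2x2 pattern that uses each of the biases 0, 1, 2 and 3 exactly once. The particular arrangement in kBias and the name ditheredAverage are illustrative choices, not taken from any specific application.

```cpp
#include <cstddef>
#include <cstdint>

// Bias added before the divide, chosen by the output pixel's position.
// Each value 0..3 appears once per 2x2 block of output pixels.
constexpr unsigned kBias[2][2] = {{0, 2}, {3, 1}};

// Average a sum of four channel values, rounding up or down depending on
// where the output pixel sits in the pattern.
inline std::uint8_t ditheredAverage(unsigned sum, std::size_t outX, std::size_t outY) {
    return static_cast<std::uint8_t>((sum + kBias[outY & 1][outX & 1]) / 4);
}
```

With a sum of 406 at every position, a 2x2 block of outputs comes out as 101, 102, 102 and 101, which averages to the exact 101.5 instead of drifting up to 102 everywhere.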

These are all techniques that subtly increase the quality of the resulting image. It certainly isn't so important in real-time rendering, and it is less important the higher the bit-depth. But when working with, say, 256-colour images in an image editing program, this stuff really makes a difference. Most people will be using an image processing application that will probably happen to use one of the above techniques anyway, so it's all done for you. If you're the one writing such an application, you might need to know this stuff in order to produce images of the same quality as other applications. I just thought I'd share it anyway.
"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms

Thanks for all the input.

What should be added to the above function to downsample to any size ?

Ah, that's quite a different task.
As you know, the above code only downsamples by exactly a factor of two in each dimension (a factor of four in pixel count). To downsample by a different amount involves picking a strategy such as "bilinear interpolation", or "bicubic interpolation".
Perhaps try looking up those terms.
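As a starting point, bilinear interpolation for an arbitrary target size could be sketched like this. To keep it short it handles a single 8-bit channel only; resizeBilinear is a hypothetical name, all four dimensions are assumed to be greater than zero, and rows are assumed to be tightly packed.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Resize a single-channel 8-bit image to any size by blending the four
// source pixels nearest to each destination pixel's centre.
std::vector<std::uint8_t> resizeBilinear(const std::vector<std::uint8_t>& src,
                                         std::size_t srcW, std::size_t srcH,
                                         std::size_t dstW, std::size_t dstH) {
    std::vector<std::uint8_t> dst(dstW * dstH);
    for (std::size_t y = 0; y < dstH; ++y) {
        // Map the destination pixel centre back into source coordinates.
        const double fy = std::max(0.0, (y + 0.5) * srcH / dstH - 0.5);
        const std::size_t y0 = static_cast<std::size_t>(fy);
        const std::size_t y1 = std::min(y0 + 1, srcH - 1);
        const double wy = fy - y0;  // weight of the lower row
        for (std::size_t x = 0; x < dstW; ++x) {
            const double fx = std::max(0.0, (x + 0.5) * srcW / dstW - 0.5);
            const std::size_t x0 = static_cast<std::size_t>(fx);
            const std::size_t x1 = std::min(x0 + 1, srcW - 1);
            const double wx = fx - x0;  // weight of the right column
            // Blend horizontally on both rows, then vertically between them.
            const double top = src[y0 * srcW + x0] * (1.0 - wx) + src[y0 * srcW + x1] * wx;
            const double bottom = src[y1 * srcW + x0] * (1.0 - wx) + src[y1 * srcW + x1] * wx;
            dst[y * dstW + x] =
                static_cast<std::uint8_t>(top * (1.0 - wy) + bottom * wy + 0.5);
        }
    }
    return dst;
}
```

Note that this reads only four source pixels per output pixel, so when shrinking by much more than a factor of two it skips source pixels entirely; averaging over the whole covered area, as the 2x2 code does, avoids that.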
"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms

"One way to simplify this kind of process is to use a union to represent a color value: [...] You cast an array of texels as uColor* then use the union to easily access the channels without trying to work with two index scales between uint and uchar."

Actually, that's undefined behaviour. You are not allowed to read from a union member other than the one that was most recently written. Casting a pointer to unsigned char to a pointer to a structure is also undefined behaviour, since C++ gives no guarantee that such a cast is valid (e.g. due to alignment).

It's not defined by C++. It is defined by the compiler. I've never come across a compiler that had a problem doing it correctly, even without pragmas. Endianness may be a concern if you port it, but that's easily solved. In short, if you're iterating through units and casting them as byte arrays then you're already in the 'grey area' even though it's something that has to be done all the time. May as well make it easier to work with.

I should probably have used uint8_t there, though, since I used uint32_t.
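For anyone who would rather stay out of the grey area, the channels can also be pulled out with shifts and masks, which behaves the same on every compiler and endianness. This is only a sketch: the 0xAARRGGBB layout is an assumption made for illustration (it is not the layout of the union above), and all the function names are made up.

```cpp
#include <cstdint>
#include <cstring>

// Channel accessors for an assumed 0xAARRGGBB pixel layout.
inline std::uint8_t alphaOf(std::uint32_t c) { return static_cast<std::uint8_t>(c >> 24); }
inline std::uint8_t redOf(std::uint32_t c)   { return static_cast<std::uint8_t>((c >> 16) & 0xFF); }
inline std::uint8_t greenOf(std::uint32_t c) { return static_cast<std::uint8_t>((c >> 8) & 0xFF); }
inline std::uint8_t blueOf(std::uint32_t c)  { return static_cast<std::uint8_t>(c & 0xFF); }

// Build a pixel from its four channels.
inline std::uint32_t packARGB(std::uint8_t a, std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    return (static_cast<std::uint32_t>(a) << 24) | (static_cast<std::uint32_t>(r) << 16) |
           (static_cast<std::uint32_t>(g) << 8) | static_cast<std::uint32_t>(b);
}

// Read a 32-bit pixel out of a byte buffer without casting the pointer,
// so alignment of the buffer does not matter.
inline std::uint32_t loadPixel(const std::uint8_t* bytes) {
    std::uint32_t value;
    std::memcpy(&value, bytes, sizeof value);
    return value;
}
```

Compilers generally turn the memcpy into a plain load, so this should not cost anything compared with the cast.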

void hurrrrrrrr() {__asm sub [ebp+4],5;}

There are ten kinds of people in this world: those who understand binary and those who don't.

This topic is closed to new replies.
