Explaining Image Downsampling


Hi All,

I tried to figure out what this function does but couldn't, even after working through it on paper. Would someone explain it to me? An explanation with ASCII art or some images would be much appreciated.

static void inline resizeRow(uint32_t *dst, uint32_t *src, uint32_t pixelsPerRow)
{
    uint8_t * pSrc8 = (uint8_t *)src;
    uint8_t * pDest8 = (uint8_t *)dst;
    int stride = pixelsPerRow * sizeof(uint32_t);
    int x;
    int r, g, b, a;

    for (x=0; x<pixelsPerRow; x++)
    {
        r = pSrc8[0] + pSrc8[4] + pSrc8[stride+0] + pSrc8[stride+4];
        g = pSrc8[1] + pSrc8[5] + pSrc8[stride+1] + pSrc8[stride+5];
        b = pSrc8[2] + pSrc8[6] + pSrc8[stride+2] + pSrc8[stride+6];
        a = pSrc8[3] + pSrc8[7] + pSrc8[stride+3] + pSrc8[stride+7];
        pDest8[0] = (uint8_t)((r + 2)/4); // average with rounding
        pDest8[1] = (uint8_t)((g + 2)/4);
        pDest8[2] = (uint8_t)((b + 2)/4);
        pDest8[3] = (uint8_t)((a + 2)/4);
        pSrc8 += 8; // skip forward 2 source pixels
        pDest8 += 4; // skip forward 1 destination pixel
    }
}

Edited by Ahmed Egyptian


It's halving the size of 2 rows of the source image by adding together the r, g, b and a components of each 2x2 pixel square in the 2 rows of the source image, adding 0.5 to each average as well (so adding 2 to the total), then dividing by 4, to get the destination colour.

Looks like the for loop should go to pixelsPerRow - 1 though, otherwise it will read pixels off the right hand side on the last iteration.

Presumably the function is called in a loop for every other row of the source image.

ApochPiQ    23000
Downsampling is basically just taking an image at a higher resolution and redrawing it at a lower resolution.

For example, suppose I have a 10x10 pixel image, and downsample it to 5x5. There are a number of ways to do this. First, I could simply skip every other pixel:

Source image: 0 1 2 3 4 5 6 7 8 9

Destination image: 0 2 4 6 8

And then I skip a row, and repeat this process on the next row.

This is going to make things look a little bad, though. A better approach is to average the pixels:
Source image:

A B
C D

Destination pixel at 0,0 is (A + B + C + D) / 4.

It looks like your code is doing the second method.
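
To make the difference between the two methods concrete, here is a minimal sketch for a single-channel (greyscale) image. The helper names `downsampleSkip` and `downsampleAverage` are hypothetical and not part of the code being discussed; the sketch assumes the width and height are even.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: method 1, keep the top-left pixel of every 2x2 block.
std::vector<uint8_t> downsampleSkip(const std::vector<uint8_t>& src, int width, int height)
{
    std::vector<uint8_t> dst;
    for (int y = 0; y + 1 < height; y += 2)
        for (int x = 0; x + 1 < width; x += 2)
            dst.push_back(src[y * width + x]);
    return dst;
}

// Hypothetical helper: method 2, average each 2x2 block, rounding to nearest.
std::vector<uint8_t> downsampleAverage(const std::vector<uint8_t>& src, int width, int height)
{
    std::vector<uint8_t> dst;
    for (int y = 0; y + 1 < height; y += 2) {
        for (int x = 0; x + 1 < width; x += 2) {
            int sum = src[y * width + x]       + src[y * width + x + 1]
                    + src[(y + 1) * width + x] + src[(y + 1) * width + x + 1];
            dst.push_back(static_cast<uint8_t>((sum + 2) / 4));
        }
    }
    return dst;
}
```

On a black and white checkerboard, skipping keeps only the black pixels, while averaging produces mid-grey, which is why averaging usually looks better.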


I think the loop should probably go from 0 to pixelsPerRow / 2 or else x should increment by 2 every iteration...
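
One possible correction along the lines suggested here, as a sketch rather than a definitive fix: the loop runs once per destination pixel, i.e. half the source width. The names `resizeRowFixed` and `downsampleHalf` are hypothetical; the sketch assumes tightly packed 4-byte pixels (no row padding) and even source dimensions.

```cpp
#include <cstddef>
#include <cstdint>

// Averages two adjacent source rows into one destination row.
// srcWidth is the SOURCE width in pixels; the loop produces srcWidth / 2 pixels.
static inline void resizeRowFixed(uint32_t* dst, const uint32_t* src, std::size_t srcWidth)
{
    const uint8_t* s = reinterpret_cast<const uint8_t*>(src);
    uint8_t* d = reinterpret_cast<uint8_t*>(dst);
    const std::size_t stride = srcWidth * sizeof(uint32_t); // bytes per source row

    for (std::size_t x = 0; x < srcWidth / 2; ++x) {
        for (std::size_t c = 0; c < 4; ++c) {
            // this pixel + right neighbour + pixel below + pixel below-right
            int sum = s[c] + s[4 + c] + s[stride + c] + s[stride + 4 + c];
            d[c] = static_cast<uint8_t>((sum + 2) / 4); // average with rounding
        }
        s += 8; // skip forward 2 source pixels
        d += 4; // skip forward 1 destination pixel
    }
}

// Hypothetical driver: calls the row function for every other source row.
void downsampleHalf(uint32_t* dst, const uint32_t* src,
                    std::size_t srcWidth, std::size_t srcHeight)
{
    for (std::size_t y = 0; y + 1 < srcHeight; y += 2)
        resizeRowFixed(dst + (y / 2) * (srcWidth / 2), src + y * srcWidth, srcWidth);
}
```

With this bound, the last iteration reads source pixels `srcWidth - 2` and `srcWidth - 1`, so nothing is read past the end of either row.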


Thanks for your answer, I have made an imaginary bitmap, so I got 4*4 pixels assuming 1 Pixel is 3 bytes RGB, and I added them, but how come from two rows I get 4*4 pixels?

Would someone please show me some pics ?

Regarding also those lines:

    pDest8[0] = (uint8_t)((r + 2)/4); // average with rounding
    pDest8[1] = (uint8_t)((g + 2)/4);
    pDest8[2] = (uint8_t)((b + 2)/4);
    pDest8[3] = (uint8_t)((a + 2)/4);
    pSrc8 += 8; // skip forward 2 source pixels
    pDest8 += 4; // skip forward 1 destination pixel


He skipped 4 bytes, but on the second iteration he writes to 0,1,2,3 of the destination again; shouldn't it be 4,5,6,7?


That's because it's a pointer, not an array. pSrc[offset] is just shorthand for *(pSrc + offset), but you add 4 to pSrc each iteration, so if you have

    char myCharArray[10000] = { /* some data */ };
    char* pSrc = myCharArray;
    pSrc[0] = 10;
    pSrc += 4;
    pSrc[0] = 11;

that's the same as writing 10 to myCharArray[0] and 11 to myCharArray[4].

EDIT: Oops, mixed up order of pSrc and pBase in first attempt.

EDIT2: Changed pBase into myCharArray... easier to understand...


Thanks for your explanation. I have worked a lot in C and C++ and I never knew that! I'm going to review pointers now.

If it's OK with you, would you please draw a simple bitmap to illustrate the algorithm?

I would like also to modify it so that it works for any scale, not just by two.


I'm terrible at drawing ;)

It's not easy to make it work for scales other than a simple division of the width either... you need to use a different way of weighting the pixel values since the sample points you take aren't centred on the pixels of the source image. Best to use a graphics library to do the downsizing for you (if you need to do it at runtime) or if you just need to resize a lot of images once use an image filtering program (or something like photoshop).


I have used the OpenCV function cv::resize and it is slow on ARM. That's why I want to write my own and convert it to ARM NEON.


What is meant by stride + 0 ,  stride + 1 ?


That's the pixel on the next row down. Stride is the width of the image times the size of each pixel (4 bytes).

So if the width is 100, the stride is 400 bytes.

pSrc8[0] is the first byte of the top left pixel of the 2x2 square being considered, and pSrc8[0 + stride] = pSrc8[400] is the first byte of the pixel directly below it, on the next row down of the source image.
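
The same indexing can be written as a small helper. This is only an illustrative sketch: `byteOffset` is a hypothetical name, and it assumes 4 bytes per pixel with the stride given in bytes, as in the posted function.

```cpp
#include <cstddef>

// Byte offset of channel c (0..3) of the pixel at column x, row y,
// when rows are strideBytes apart and each pixel occupies 4 bytes.
inline std::size_t byteOffset(std::size_t x, std::size_t y, std::size_t c,
                              std::size_t strideBytes)
{
    return y * strideBytes + x * 4 + c;
}
```

For a 100-pixel-wide image (stride 400), `byteOffset(0, 1, 0, 400)` is 400 and `byteOffset(1, 1, 0, 400)` is 404, matching `pSrc8[stride+0]` and `pSrc8[stride+4]` in the original code.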

Khatharr    8812

Stride is used to pad the width of an image so that its width in memory aligns to a power of two, which makes the pixels easier to reference with bit-math.

Sometimes stride is the 'true length' of a row, sometimes it's the 'added length' of a row.  For instance, if you have a texture with width 100 there's a good chance that it's stored as a width 128 texture. The stride is either 128 or 28, depending on what API you're working with. When the image is rendered only the 100 pixel section gets drawn, but having the width stored as 128 means that you can use bit-math to select a row very quickly in the hardware.

Edit - Ah, yeah. In some cases the stride is stored in byte length rather than pixel length as well. It should be documented somewhere in what you're working with, but basically pixel + stride + 1 means the pixel that's one row down and one column to the right from where you're at. In this case it's clearly defined as 'int stride = pixelsPerRow * sizeof(uint32_t);' in the code itself.

As a term, whenever you see 'stride' you should just realize that it's dealing with the 'real length' of a row rather than the apparent length.

Edited by Khatharr


Note that the code posted doesn't actually calculate the stride width, it just multiplies the number of pixels per row by 4, so any calculation based on a stride that isn't the width must be done outside the function shown in the first post.

Not all strides are powers of 2; it depends on the graphics file format (BMP files, for example, round each row up to the next multiple of 4 bytes).
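
As a small illustration of a padded stride, the following sketch rounds a row size up to a multiple of 4 bytes, which is the rule BMP uses. The function name `paddedStride` is hypothetical.

```cpp
#include <cstddef>

// Number of bytes from the start of one row to the start of the next
// when each row is padded up to a multiple of 4 bytes.
inline std::size_t paddedStride(std::size_t widthPixels, std::size_t bytesPerPixel)
{
    const std::size_t rowBytes = widthPixels * bytesPerPixel;
    return (rowBytes + 3) & ~static_cast<std::size_t>(3);
}
```

A 5-pixel-wide 24-bit row is 15 bytes of pixel data, so its stride is 16. With 4-byte pixels the row size is already a multiple of 4, which is why the posted function can use `pixelsPerRow * sizeof(uint32_t)` directly when there is no other padding.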


Thanks for replies.

So the evaluation of that statement   r = pSrc8[0] + pSrc8[4] + pSrc8[stride+0] + pSrc8[stride+4];

is :    r = first red byte + 4th red byte of 4th pixel, red byte at row 800  and red byte 404 at row 800 ?

Edited by Ahmed Egyptian

Khatharr    8812

It's 'red byte I'm on' + 'red byte to my right' + 'red byte below me' + 'red byte below me and to my right'.

Adding the stride moves you down one row.

If you think of a 10x10 grid, the stride is 10. If the grid is represented as an array of 100 bytes, then index 4 is the 5th cell in the top row. Index 4 + stride is index 14, which is the 5th cell in the second row.

One way to simplify this kind of process is to use a union to represent a color value:

union uColor {
    uint32_t u32;
    struct {
        unsigned char alpha;
        unsigned char green;
        unsigned char blue;
        unsigned char red;
    };
};


You can create a color:

uColor col;

then reference the uint value:

col.u32

or a specific channel:

col.red

You cast an array of texels as uColor* then use the union to easily access the channels without trying to work with two index scales between uint and uchar.

iMalc    2466
To downsize an image with an even higher quality result, i.e. more true to the original image in terms of overall brightness levels, you can perform dithering as well.
E.g. take the red colour channel values of four pixels. Let's say they are 100, 101, 102, and 103.
Sum them up, you get 406.
Divide by 4 and you get 101, remainder 2.
Because the remainder is 2, the rounding, by adding two before dividing, produces a final result of 102. The problem is that the actual correct answer is 101.5.

Big deal right? Nobody cares! You can't represent the 0.5 anyway, and rounding to the nearest integer beats always rounding down, or always rounding up!
That's all largely true, but well actually the person adding 2 cares enough to at least recognise the problem, but do they know the best solution...
Consider what happens if the majority of the remainders cause rounding to go up. E.g. what if all the pixels were the same colour? Overall the image brightness has changed. Yes, it's barely noticeable in most cases, and 99% of people won't care about it, but it's there.
Wouldn't it be better if we took half of those pixels and rounded those down instead? Then the overall brightness would be the same.
Better still, we could add 0, 1, 2, or 3 before dividing by 4 each 1/4th of the time, which helps for the cases where the remainder is odd.

This tends to be done one of several ways:
In a patterned approach - pattern dithering.
In a randomised way, adding anything from 0-3 randomly before the division - random dithering.
Or, where the error term is accumulated as we travel across the pixels in each row - Floyd-Steinberg style dithering.
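
As a rough sketch of the patterned approach, the rounding bias can be chosen from a small repeating table indexed by the destination pixel position. The function name `averageDithered` and the particular 2x2 pattern are assumptions made for illustration, not something taken from an existing library.

```cpp
#include <cstdint>

// sum is the total of four 8-bit channel values (0..1020).
// The bias cycles through 0, 1, 2 and 3 over each 2x2 block of destination
// pixels, so on average the rounding error cancels out.
inline uint8_t averageDithered(int sum, int dstX, int dstY)
{
    static const int bias[2][2] = { {0, 2},
                                    {3, 1} };
    return static_cast<uint8_t>((sum + bias[dstY & 1][dstX & 1]) / 4);
}
```

With the example sum of 406, the four positions of the pattern give 101, 102, 102 and 101, whose mean is exactly 101.5, whereas always adding 2 would give 102 everywhere.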

These are all techniques that subtly increase the quality of the resulting image. It certainly isn't so important in real-time rendering, and it is less important the higher the bit-depth. But when working with, say, 256-colour images in an image editing program, this stuff really makes a difference. Most people will be using an image processing application that will probably happen to use one of the above techniques anyway, so it's all done for you. If you're the one writing such an application, you might need to know this stuff in order to produce images of the same quality as other applications. I just thought I'd share it anyway.

Edited by iMalc


Thanks for all the input.

What should be added to the above function to downsample to any size ?

iMalc    2466

Ah, that's quite a different task.
As you know, the above code only downsamples by a factor of exactly two in each dimension (so four source pixels become one). To downsample by a different amount involves picking a strategy such as "bilinear interpolation" or "bicubic interpolation".
Perhaps try looking up those terms.
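
For orientation, here is a minimal single-channel sketch of bilinear resizing to an arbitrary size. It is a sketch under stated assumptions, not a tuned implementation: `resizeBilinear` is a hypothetical name, the image is greyscale and tightly packed, and pixel centres are assumed to sit at half-integer coordinates. When shrinking by much more than a factor of two, plain bilinear sampling skips source pixels, so averaging over the covered area generally gives better results.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

std::vector<uint8_t> resizeBilinear(const std::vector<uint8_t>& src,
                                    int srcW, int srcH, int dstW, int dstH)
{
    std::vector<uint8_t> dst(static_cast<std::size_t>(dstW) * dstH);
    const float scaleX = static_cast<float>(srcW) / dstW;
    const float scaleY = static_cast<float>(srcH) / dstH;

    for (int y = 0; y < dstH; ++y) {
        // Map the centre of the destination pixel into source coordinates.
        const float fy = std::max(0.0f, (y + 0.5f) * scaleY - 0.5f);
        const int y0 = std::min(static_cast<int>(fy), srcH - 1);
        const int y1 = std::min(y0 + 1, srcH - 1);
        const float wy = fy - y0; // weight of the lower row

        for (int x = 0; x < dstW; ++x) {
            const float fx = std::max(0.0f, (x + 0.5f) * scaleX - 0.5f);
            const int x0 = std::min(static_cast<int>(fx), srcW - 1);
            const int x1 = std::min(x0 + 1, srcW - 1);
            const float wx = fx - x0; // weight of the right column

            // Blend horizontally on both rows, then vertically.
            const float top    = src[y0 * srcW + x0] * (1.0f - wx) + src[y0 * srcW + x1] * wx;
            const float bottom = src[y1 * srcW + x0] * (1.0f - wx) + src[y1 * srcW + x1] * wx;
            dst[static_cast<std::size_t>(y) * dstW + x] =
                static_cast<uint8_t>(top * (1.0f - wy) + bottom * wy + 0.5f);
        }
    }
    return dst;
}
```

For an RGBA image the same weights would be applied to each of the four channels separately.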

rozz666    896

One way to simplify this kind of process is to use a union to represent a color value: [...] You cast an array of texels as uColor* then use the union to easily access the channels without trying to work with two index scales between uint and uchar.

Actually, that's undefined behaviour. You are not allowed to read from a member that wasn't written to directly. Also, casting a pointer to unsigned char to a structure is also undefined behaviour, since there is no guarantee by C++ that such cast is valid (e.g. due to alignment).

Edited by rozz666

Khatharr    8812

It's not defined by C++. It is defined by the compiler. I've never come across a compiler that had a problem doing it correctly, even without pragmas. Endianness may be a concern if you port it, but that's easily solved. In short, if you're iterating through uints and casting them as byte arrays then you're already in the 'grey area', even though it's something that has to be done all the time. May as well make it easier to work with.

I should probably have used uint8_t there, though, since I used uint32_t.

Edited by Khatharr

rozz666    896

Which compiler are you referring to? With gcc, casting a pointer to bytes to a pointer to a union still breaks the strict aliasing rule. As for union "casting", have a look here: http://stackoverflow.com/questions/1812348/a-question-about-union-in-c/1812359#1812359

Khatharr    8812

?

Maybe GCC has changed since I used it. I started using that kind of union for PSP dev with cygwin and g++ and later used it in VS2K8 and 2010.

This code runs under 2010 without complaint. I don't have g++ since my reformat, does this cause errors/warnings?

#include <stdio.h>

union uColor {
    unsigned int uint;
    struct {
        unsigned char alpha;
        unsigned char blue;
        unsigned char green;
        unsigned char red;
    };
    //unsigned char ary[4];
};

int main() {
    unsigned pixels[100];
    uColor* cols = (uColor*)pixels;
    for(int i = 0; i < 100; ++i) {
        cols[i].uint = 0;
        cols[i].alpha = 255;
    }
    printf("%08x\n", cols[12].uint);
    return 0;
}



rozz666    896

This code runs under 2010 without complaint.

Indeed, VC implements nonstandard behaviour: http://msdn.microsoft.com/en-us/library/ms177255.aspx

I don't have g++ since my reformat, does this cause errors/warnings?

It works, but it doesn't matter. Undefined behaviour may sometimes work.
In your case, I would just use std::memcpy. It's safe and VC will optimise it (into MOVs).

Khatharr    8812

The thing is, I don't really see a problem with the cast. As long as the union is being formed correctly by the compiler it shouldn't cause any kind of problem. Just because it's not defined by the C++ standard doesn't necessarily mean it's anathema to all who dare.

In fact...

http://www.cplusplus.com/doc/tutorial/other_data_types/

says...

One of the uses a union may have is to unite an elementary type with an array or structures of smaller elements. For example:

union mix_t {
    long l;
    struct {
        short hi;
        short lo;
    } s;
    char c[4];
} mix;

As long as you're aware of the potential problems involved in union access, type puns, packing, alignment, endianness - none of which should be an issue here - the behavior should be consistent.

I don't understand how std::memcpy relates to this?

rozz666    896

The thing is, I don't really see a problem with the cast. As long as the union is being formed correctly by the compiler it shouldn't cause any kind of problem. Just because it's not defined by the C++ standard doesn't necessarily mean it's anathema to all who dare.

Actually, it is. Unless you have an explicit specification of compiler behavior (like VC), you can't rely on it.

http://www.cplusplus.com/doc/tutorial/other_data_types/

says...

Quote

One of the uses a union may have is to unite an elementary type with an array or structures of smaller elements. For example:

union mix_t {
    long l;
    struct {
        short hi;
        short lo;
    } s;
    char c[4];
} mix;

The site is wrong.

As long as you're aware of the potential problems involved in union access, type puns, packing, alignment, endianness - none of which should be an issue here - the behavior should be consistent.

It shouldn't. The language gives no guarantees, so it can fail randomly and you have no right to complain.

I don't understand how std::memcpy relates to this?

It provides you with a safe approach. You can copy your bytes into a struct instead of casting. It's defined.
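
A minimal sketch of what the std::memcpy approach could look like for a 32-bit pixel. The names `Rgba8`, `unpackPixel` and `packPixel` are hypothetical; which byte holds which channel still depends on the pixel format and on endianness, so the struct only exposes the four bytes in memory order.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper type: the four bytes of a pixel, in memory order.
struct Rgba8 {
    uint8_t c[4];
};

static_assert(sizeof(Rgba8) == sizeof(uint32_t), "Rgba8 must be exactly 4 bytes");

// Copy the bytes of a packed pixel into the struct (no pointer casts, no union).
inline Rgba8 unpackPixel(uint32_t pixel)
{
    Rgba8 out;
    std::memcpy(&out, &pixel, sizeof out);
    return out;
}

// Copy the four bytes back into a packed pixel.
inline uint32_t packPixel(const Rgba8& in)
{
    uint32_t pixel;
    std::memcpy(&pixel, &in, sizeof pixel);
    return pixel;
}
```

Because the copy has a fixed, small size, compilers typically reduce it to a plain register move, so this is usually no slower than the union or the cast.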