# RGB-HSV use SSE

This topic is 3838 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

__inline void ConvertRGB2HSV(
unsigned char sR,unsigned char sG,unsigned char sB,int *dH,int *dS,int *dV)
{
float fTemp,fDelta;
int nMin,nMax;
nMin = min( sR, min( sG, sB ));
nMax = max( sR, max( sG, sB ));
fDelta = nMax - nMin;
*dV = nMax;
if(nMax == nMin)
*dH = *dS = 0;
else
{
fTemp = fDelta / nMax;
*dS= (int) (fTemp * 255);
if(sR == nMax)
fTemp = (float)(sG - sB) / fDelta;
else if(sG == nMax)
fTemp = 2.0 + ((float)(sB - sR) / fDelta);
else
fTemp = 4.0 + ((float)(sR - sG) / fDelta);
fTemp *= 60;
if(fTemp < 0)
fTemp += 360;
if(fTemp == 360)
fTemp = 0;
*dH = (int)fTemp;
}
}

unsinged char sR,sG,sB;
int dH,dS,dV;
int nMaxH,nMinH,nMaxS,nMinS,nMaxV,nMinV;
nMaxH = nMaxS = nMaxV = 0;
nMinH = nMinS = nMinV = 360;
RGBTRIPLE *pPixel = (RGBTRIPLE *)pSourceBuffer;
//for(int j = 0; j < 576; j++)
//for(int i = 0; i < 768; i++)
int i=0;
while(i++ < 768 * 576)
{
sR = pPixel->rgbtRed;
sG = pPixel->rgbtGreen;
sB = pPixel->rgbtBlue;
ConvertRGB2HSV( sR, sG, sB, &dH, &dS, &dV);
if( dH > nMaxH)
nMaxH = dH;
if( dH < nMinH)
nMinH = dH;
if( dS > nMaxS)
nMaxS = dS;
if( dS < nMinS)
nMinS = dS;
if( dV > nMaxV)
nMaxV = dV;
if( dV < nMinV)
nMinV = dV;
pPixel++;
}
ConvertRGB2HSV(...) is too slow.I want to rewrite this function (or the all code)using SSE (C++ code with inline Assembly ,like as: _asm{ movaps xmm1,xmm2 ... }) The compiler is VC6(SP5),support inline assembly with SSE instructions. I want to use SSE,Who can help me?Please!

##### Share on other sites
Your algorithm appears to have too many branches to benefit from SIMD reliably. About the only place where you could apply some parallelization is in the final "determine minmax" six comparisons, but these only account for 4% of your execution time anyway. Where did you intend to apply SSE?

My first check would be to determine whether the call is actually inlined (on VC2005, for instance, it isn't).

##### Share on other sites
Quote:
 Original post by ToohrVykYour algorithm appears to have too many branches to benefit from SIMD reliably. About the only place where you could apply some parallelization is in the final "determine minmax" six comparisons, but these only account for 4% of your execution time anyway. Where did you intend to apply SSE?My first check would be to determine whether the call is actually inlined (on VC2005, for instance, it isn't).

If the compiler is smart, it's not going to be inlined any time soon - the function is far too heavy.

Now, this alorithm is quite difficult to vectorize. However, there are some possible areas of improvement:
* Avoid unnecessary data type changes (and casts). And remember, 2.0 is a double, and 2.0f is a float.
* fDelta shall contain 1,0f / static_cast<float>(nMax - nMin) - you can then get rid of many divisions.

I would rewrite the code like this:
void ConvertRGB2HSV(unsigned char sR,                    unsigned char sG,                    unsigned char sB,                    int *dH,int *dS,int *dV){  float fTemp, fDelta;  int nMin, nMax;  nMin = std::min(sR, std::min(sG, SB));  nMax = std::max(sR, std::max(sG, SB));  *dV = nMax;  if (nMin == nMax) {    *dH = *dS = 0;  } else {    // no need to use any float to get dS    *dS = ((nMax - nMin) * 255) / nMax;    // we only compute fDelta here: we incorporate the    // the *60, and we store the inverse of the final value    fDelta = 60.0f / static_cast<float>(nMax - nMin);    // constants also include the *60, so we avoid the final    // multiplication    if(sR == nMax) {      fTemp = static_cast<float>(sG - sB) * fDelta;    } else if(sG == nMax) {        fTemp = 120.0f + static_cast<float>(sB - sR) * fDelta;    } else {      fTemp = 240.0f + static_cast<float>(sR - sG) * fDelta;    }    // cast now - integer comparisons and operations are    // going to be faster than the same operation on floats    *dH = static_cast<int>(fTemp);    if (*dH < 0) *dH += 360;    else if (*dH >= 360) *dH -= 360;  }}

Now, I won't say you that this version performs incrediby faster than your version. One thing that can be improved is to use a fake float-to-int conversion - since you're dealing with a range that allow you to remove all checks AND you are not using your result anymore.
You can include the small function:
namespace {  // this is, of course, heavily MS oriented. If you are using GCC,   // the syntax will change.  inline int fake_ftol(float v)  {    int retval;    __asm fld a    __asm fistp retval    return retval;      }}
*dH = static_cast<int>(fTemp);

becomes
*dH = fake_ftol(fTemp);

We are in the realm of micro-optimization, so don't expect a terrible boost.

You can also avoid passing 6 parameters for each function call - just pass a const reference to the RGBTRIPLE and a non-const reference to a HSVTRIPLE:
struct HSVTRIPLE{  int h, s, v;}void ConvertRGB2HSV(const RGBTRIPLE& rgb, HSVTRIPLE& hsv){  // ...}RGBTRIPLE *pSrc = (RGBTRIPLE *)pSourceBuffer;RGBTRIPLE *pSrcEnd = pPixel + (768 * 576)// let's say we're storing the results in an imageHSVTRIPLE *pDestBuffer = new HSVTRIPLE[768*576];HSVTRIPLE *pDest = pDestBuffer;while(pSrc < pSrcEnd){  ConvertRGB2HSV(*pPixel, *pDest);  // update the max and min values  // ...  ++pDestBegin;  ++pSrc;}

You'll avoid as much as 4 push per function call - and that's quite important for a function which is called that often.

##### Share on other sites
Thank you very much!

##### Share on other sites
Quote:
 Original post by Emmanuel DelogetIf the compiler is smart, it's not going to be inlined any time soon - the function is far too heavy.

If the compiler managed to determine that the ConvertRGB2HSV function was only called within this code, and was always called exactly once (per loop iteration) when that code was run, then it could inline it to save both time and space.

So, not inlining would be the situation of a compiler that is smart, but not smart enough [wink]

##### Share on other sites
Entire algorithm could also be rewritten like this:
void ConvertRGB2HSV(unsigned char sR,                    unsigned char sG,                    unsigned char sB,                    int *dH,int *dS,int *dV){  float fTemp, fDelta;  int nMin, nMax;  nMin = std::min(sR, std::min(sG, SB));  nMax = std::max(sR, std::max(sG, SB));  *dV = nMax;  if (nMin == nMax) {    *dH = *dS = 0;  } else {    // no need to use any float to get dS    *dS = ((nMax - nMin) * 255) / nMax;  }}

Hue does not have minimum or maximum, it's a cyclical value. It's also not independant of other two - the relations between H, S and V are not linear in the way R, G and B are.

Whatever the resulting value ranges will be, they will have no real meaning - increasing single H, S or V value is non-linear to changes in RGB - as such, HSV min/max values are meaningless.

See the definition of hue for why finding min/max for it doesn't make sense.

Given the (17, 55, 18) and (56, 7, 100) H, S and V values, the minimum would be: 17, 7, 18) - which is meaningless color in RGB space. The only thing that would matter would be the last one (18 or 100), but you cannot calculate the values for H and S from that value alone, due to non-linearity.

And for finding brightness, you should use the YUV (YCrCb) transform, which gives meaningful value (from human perspective).

##### Share on other sites
Quote:
Original post by ToohrVyk
Quote:
 Original post by Emmanuel DelogetIf the compiler is smart, it's not going to be inlined any time soon - the function is far too heavy.

If the compiler managed to determine that the ConvertRGB2HSV function was only called within this code, and was always called exactly once (per loop iteration) when that code was run, then it could inline it to save both time and space.

So, not inlining would be the situation of a compiler that is smart, but not smart enough [wink]

Thou be ruining my expectations!

Thanks god, I'm drunk, so I don't care much [smile]

##### Share on other sites

This topic is 3838 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.