Efficient 24/32-bit sRGB to linear float image conversion on CPU

Graphics and GPU Programming

Started by Chris_F April 11, 2015 07:53 PM

3 comments, last by Hodgman 9 years ago

3,030

Author

April 11, 2015 07:53 PM

Does anyone know of some efficient ways of converting 24/32-bit sRGB to linear floating point on the CPU? I haven't got access to a CPU with AVX2 instructions yet, but I am intrigued by the new gather instructions. I was thinking that these could possibly be used for this type of conversion, such as in this example below. The LUT would be 256x4 bytes, so I imagine it would fit entirely into L1 data cache.


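#include <immintrin.h>
// Converts 8 bytes (two RGBA8 pixels) to 8 floats through a 256-entry LUT.
__m256 RGBA8toRGBA32F(const char* pixel_data, const float* LUT)
{
    // Load just the 8 bytes that _mm256_cvtepu8_epi32 consumes (unaligned is fine).
    __m128i bytes = _mm_loadl_epi64((const __m128i*)pixel_data);
    return _mm256_i32gather_ps(LUT, _mm256_cvtepu8_epi32(bytes), 4);
}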

IYP

1,545

April 11, 2015 09:00 PM

For a given sRGB (sR, sG, sB) and Gamma, the RGB (R, G, B) will be:

R=pow(sR,1/Gamma)

G=pow(sG,1/Gamma)

B=pow(sB,1/Gamma)

But I don't think you can do this with AVX2's functions.

Blog: HG3D Graphics

Chris_F

3,030

Author

April 11, 2015 09:44 PM

IYP said:

"For a given sRGB (sR, sG, sB) and Gamma, the RGB (R, G, B) will be:

R=pow(sR,1/Gamma)

G=pow(sG,1/Gamma)

B=pow(sB,1/Gamma)

But I don't think you can do this with AVX2's functions."

sRGB isn't a gamma, see: http://entropymine.com/imageworsener/srgbformula/

IYP

1,545

April 11, 2015 10:44 PM

Read http://en.wikipedia.org/wiki/Gamma_correction#Windows.2C_Mac.2C_sRGB_and_TV.2Fvideo_standard_gammas and this passage from the second paragraph of http://en.wikipedia.org/wiki/SRGB:

"Unlike most other RGB color spaces, the sRGB gamma cannot be expressed as a single numerical value. The overall gamma is approximately 2.2, consisting of a linear (gamma 1.0) section near black, and a non-linear section elsewhere involving a 2.4 exponent and a gamma (slope of log output versus log input) changing from 1.0 through about 2.3."

It says that sRGB does not use a single gamma, and that the GPU uses a table to convert the output RGB to sRGB. The link you sent itself uses 2.4 as the gamma.

Blog: HG3D Graphics

Hodgman

52,717

April 12, 2015 02:29 AM

It involves a gamma curve, but isn't a gamma curve - pow2.2 is a good approximation, but for accuracy it's important to use the real formula with the linear tail.
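$$C_{\text{linear}} = \begin{cases} \dfrac{C_{\text{srgb}}}{12.92}, & C_{\text{srgb}} \le 0.04045 \\ \left(\dfrac{C_{\text{srgb}} + a}{1 + a}\right)^{2.4}, & C_{\text{srgb}} > 0.04045 \end{cases}$$
Where $a = 0.055$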

I'd implement your look-up-table version, and a plain ALU version and profile them in a real usage situation. The LUT version's performance will depend heavily on how much pressure is on the cache.

For the ALU version you can do both sides of the discontinuity and then select the correct side branchlessly.
a = srgb*(1/12.92);
b = pow((srgb+0.055)*(1/1.055),2.4);
rgb = srgb <= 0.04045 ? a : b;
^ That final ternary statement can be implemented with conditional moves/shuffles, masking and adding (ANDing and ORing), etc...

...but the pow is costly, so maybe you do want to use a real branch if any elements in the vec4/vec8 need it.

n.b. to SSEize the pow, you can use exp/log instead:
b = exp(log((srgb+0.055)*(1/1.055))*2.4);
...and get an exp/log implementation from a library like http://gruntthepeon.free.fr/ssemath/

[edit]

To write this kind of SIMD code, I've recently been using the ISPC language, which lets you write the algorithm once and then compile it to SSE2/AVX/AVX2/etc... Gathers/scatters will be emulated on the older instruction sets though, of course.

. 22 Racing Series .
