It involves a gamma curve, but isn't a pure gamma curve. pow(x, 2.2) is a good approximation, but for accuracy it's important to use the real formula with the linear tail.
I'd implement both your look-up-table version and a plain ALU version, and profile them in a real usage situation. The LUT version's performance will depend heavily on how much pressure is on the cache.
For the ALU version you can do both sides of the discontinuity and then select the correct side branchlessly.
a = srgb*(1/12.92);
b = pow((srgb+0.055)*(1/1.055),2.4);
rgb = srgb <= 0.04045 ? a : b;
^ That final ternary expression can be implemented with conditional moves/shuffles, or with masking and combining (ANDing and ORing), etc...
...but the pow is costly, so maybe you do want to use a real branch if any elements in the vec4/vec8 need it.
n.b. to SSEize the pow, you can use exp/log instead:
b = exp(log((srgb+0.055)*(1/1.055))*2.4);
...and get an exp/log implementation from a library like http://gruntthepeon.free.fr/ssemath/
[edit]
To write this kind of SIMD code, I've recently been using the ISPC language, which lets you write the algorithm once and then compile it to SSE2/AVX/AVX2/etc... Gathers/scatters will be emulated on the older instruction sets though, of course.