I took the data file and did a simple matrix multiplication and let the compiler optimize it.
With -O3 -ffast-math and single-precision floats, my machine needed about 3 ms to transform them all.
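
For reference, a minimal sketch of the kind of loop this refers to. The post does not give the data layout or the matrix size, so the 3D points, the row-major 3x3 matrix, and the names Vec3f and transform_points_f32 are all assumptions made for illustration. Compiled with something like gcc -O3 -ffast-math, a loop of this shape is a good candidate for auto-vectorization.

```c
#include <stddef.h>

/* Hypothetical point layout: the original data format is not specified. */
typedef struct { float x, y, z; } Vec3f;

/* Multiply every point by a row-major 3x3 matrix m (assumed size).
 * Each point is copied into a local before writing, so in and out
 * may point to the same array. */
void transform_points_f32(const float m[9], const Vec3f *in, Vec3f *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        const Vec3f p = in[i];
        out[i].x = m[0] * p.x + m[1] * p.y + m[2] * p.z;
        out[i].y = m[3] * p.x + m[4] * p.y + m[5] * p.z;
        out[i].z = m[6] * p.x + m[7] * p.y + m[8] * p.z;
    }
}
```

Nothing here is tuned by hand; the point is only that a plain loop gives the compiler enough to work with.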
Fixed-point arithmetic works too, but needs about twice the time.
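
A sketch of what a fixed-point variant could look like. The post does not say which format was used, so Q16.16 in an int32_t, the 3x3 matrix, and the names q16_16, Vec3q and transform_points_q16 are assumptions, not the code that was actually timed.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed format: Q16.16, i.e. 16 integer bits and 16 fractional bits. */
typedef int32_t q16_16;
#define Q_FRAC_BITS 16

/* Hypothetical point layout, matching nothing in particular from the post. */
typedef struct { q16_16 x, y, z; } Vec3q;

/* Dot product of one matrix row with a point. The products are
 * accumulated in 64 bits and shifted back to Q16.16 once at the end.
 * Right-shifting a negative value is implementation-defined in C;
 * GCC and Clang perform an arithmetic shift, which is assumed here. */
static q16_16 dot3_q16(const q16_16 *row, Vec3q p)
{
    int64_t acc = (int64_t)row[0] * p.x
                + (int64_t)row[1] * p.y
                + (int64_t)row[2] * p.z;
    return (q16_16)(acc >> Q_FRAC_BITS);
}

/* Multiply every point by a row-major 3x3 matrix of Q16.16 values. */
void transform_points_q16(const q16_16 m[9], const Vec3q *in, Vec3q *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        const Vec3q p = in[i];
        out[i].x = dot3_q16(&m[0], p);
        out[i].y = dot3_q16(&m[3], p);
        out[i].z = dot3_q16(&m[6], p);
    }
}
```

The widening to 64 bits and the extra shift are work the float version does not have, which may account for part of the difference, though I have not measured where the time goes.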