# Fastest way to convert WORDs to floats

This topic is 6312 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

Ok, I have a sound program I'm working on and it uses DirectSound's capture interface to collect a stereo audio stream. So, in case you don't know, the data stream comes in packed DWORDs: one WORD of each is the left channel, and the other WORD is the right channel. Now, I need to extract each WORD into two floating point arrays (one for left, one for right) to do some DSP processing... <sigh> It's damn slow the way I have it now. I know I have a lot going against me... but I can't change the way sound cards collect data...
  int l=-1; int r=-2; for(int j=0;j

##### Share on other sites
Assuming that the target is a PC (with DX, it must be), couldn't you use MMX for all of the DSP processing, using fixed-point arithmetic? I assume that you just need the floating point for precision.

##### Share on other sites
First, a question: you're copying from two float arrays into two target arrays. Is this what you want?

If I remember correctly from my embedded days, it's faster to use incrementing pointers than indexing. Have you tried something like this?
    WORD *rSrc, *lSrc;
    float *rTgt, *lTgt;
    rSrc = m_dwBuffer;
    lSrc = m_pvBuffer; // should this be m_dwBuffer + 1?
    rTgt = fRight;
    lTgt = lRight;
    int count = m_dwBufferSize / 2;
    while (count--)
    {
        *lTgt++ = (float) *lSrc; lSrc += 2;
        *rTgt++ = (float) *rSrc; rSrc += 2;
    }

There's probably some other way to optimize this too. I've never been good with micro-optimization.
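For reference, the pointer-walking idea above can be written as a small standalone function. This is only a sketch under assumptions: the function name and parameters are made up for illustration, and it treats the samples as signed 16-bit values (`int16_t` rather than the unsigned `WORD`), since 16-bit PCM audio is signed. It reads one interleaved buffer, left sample first.

```cpp
#include <cstddef>
#include <cstdint>

// Split interleaved 16-bit stereo (L0 R0 L1 R1 ...) into two float arrays.
// `frames` counts left/right pairs, not bytes.
// (Hypothetical helper; name and signature are not from the thread.)
void deinterleave_s16(const std::int16_t* src, std::size_t frames,
                      float* left, float* right)
{
    while (frames--) {
        *left++  = static_cast<float>(*src++);  // first word of the pair: left
        *right++ = static_cast<float>(*src++);  // second word of the pair: right
    }
}
```

Whether this beats an indexed loop depends on the compiler; a modern optimizer will often generate the same code for both, so it is worth timing rather than assuming.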

DSP is very CPU-intensive, which is why they make separate chips to do it. My senior project for my Bachelor's degree used PC-based DSP on sound samples from CDs. Granted, all I had was a 386 in those days, but it took a tremendous amount of time to do FFTs. It was fun, though. Good luck.

##### Share on other sites
It takes about the same amount of time to do my FFT as it does to make the floating point copy!

The FFT usually only does 1024 points at a time, 2048 tops, often 512 as well (though other powers of two are options, up to 32768).

The next thing that takes forever is graphing the result, but it could skip frames and no one would ever know the difference.

...
My FFT's quick because I cheated. Usually FFTs are designed for embedded applications where RAM still costs lots of \$ for some reason. So my FFT does an out-of-place BRS (bit reverse sort), which saves a minute amount of power (no swaps, just copies), but requires twice the amount of storage (2k, oh no!). The big savings comes from pre-calculating ALL the coefficients (takes a meg or two of RAM... a lot more for higher order FFTs), which is generally not an option on an embedded device. Hence, there is no trig in any of the FFT loops, just a simple dereference.
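To make the two tricks concrete, here is a rough sketch of a radix-2 FFT with an out-of-place bit-reversed copy and table lookups instead of trig calls in the loops. It is not the poster's code: the names are invented, the size is fixed at 1024 for brevity, and it uses one shared table of N/2 entries rather than the much larger per-coefficient tables described above.

```cpp
#include <cmath>
#include <cstddef>

constexpr std::size_t FFT_N = 1024;   // transform size, must be a power of two
constexpr int FFT_LOG2 = 10;          // log2(FFT_N)

static std::size_t fft_rev[FFT_N];    // bit-reversed index of each position
static float fft_cos[FFT_N / 2];      // cos(2*pi*k/N)
static float fft_sin[FFT_N / 2];      // sin(2*pi*k/N)

// Build the tables once; no trig is called after this.
void fft_init()
{
    const double two_pi = 6.283185307179586476925;
    for (std::size_t i = 0; i < FFT_N; ++i) {
        std::size_t r = 0;
        for (int b = 0; b < FFT_LOG2; ++b)
            if (i & (std::size_t(1) << b))
                r |= std::size_t(1) << (FFT_LOG2 - 1 - b);
        fft_rev[i] = r;
    }
    for (std::size_t k = 0; k < FFT_N / 2; ++k) {
        fft_cos[k] = static_cast<float>(std::cos(two_pi * k / FFT_N));
        fft_sin[k] = static_cast<float>(std::sin(two_pi * k / FFT_N));
    }
}

// Forward FFT of a real input. The input is copied (not swapped) into the
// output arrays in bit-reversed order, then the butterflies run in place there.
void fft_real(const float* in, float* re, float* im)
{
    for (std::size_t i = 0; i < FFT_N; ++i) {
        re[fft_rev[i]] = in[i];
        im[fft_rev[i]] = 0.0f;
    }
    for (std::size_t len = 2; len <= FFT_N; len <<= 1) {
        const std::size_t half = len / 2;
        const std::size_t step = FFT_N / len;      // stride into the tables
        for (std::size_t base = 0; base < FFT_N; base += len) {
            for (std::size_t j = 0; j < half; ++j) {
                const float wr =  fft_cos[j * step];
                const float wi = -fft_sin[j * step]; // w = exp(-2*pi*i*j/len)
                const std::size_t a = base + j, b = a + half;
                const float tr = re[b] * wr - im[b] * wi;
                const float ti = re[b] * wi + im[b] * wr;
                re[b] = re[a] - tr;  im[b] = im[a] - ti;
                re[a] += tr;         im[a] += ti;
            }
        }
    }
}
```

Call `fft_init()` once at startup, then `fft_real(samples, re, im)` per block; the inner loops only read the tables.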

...
Stoffel: No, that was a typo, it's supposed to be m_pvBuffer for both converts. It currently takes me about 125MHz a channel; I was hoping to get down to about 60. Right now my blitting routines are rather inefficient, but I have a better idea of how to fix them up than the itof.

...
NickB
I might be able to use 128bit fixed point math... The results vary from about 1e6 to 1e-6, and are displayed on a logarithmic graph, I guess that would only require 40bits... with 128 I could use 64 bits for both sides of the decimal...
Is there a signed 128bit integer type? Then the conversion would just be a xor, load, & shift!
Do you have to do anything special other than shift the bits?

##### Share on other sites
Floating point numbers are stored in a particular format - some bytes are dedicated to significant binary figures, and then some more bytes are dedicated to how far left or right the number is shifted... how far the binary equivalent of the decimal point needs to move. To the best of my knowledge, the first 2/3 (rounded to the nearest byte if need be) of the bytes of all floating point types are the significant figures, and the remainder are the shifts. Theoretically, you could put the integer directly into the significant figures, and set the shift to zero. This would be relatively fast, but I've never tried it. Do some experiments on floating-point numbers - looking at the hex storage of them, determine the format, and do what you can.

I hope that helps.

##### Share on other sites
Intel uses the IEEE {something or other} standard, as most processor architectures do, for floating point numbers. if you want to convert 2-byte words to 32-bit floats, the assembler code is:

    {
        short s;
        float f;

        s = 10;
        __asm {
            fild s
            fstp f
        }
        // f {approx.} = 10.0
    }

which is the same as "f = (float)s";

you can try other methods of conversion, but it probably won't be faster than the Intel FPU. if you are doing the same type of calculations on a lot of numbers, you can use the Streaming SIMD functions { Pentium III } or, as mentioned above, MMX. those are most likely the fastest methods.
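As an illustration of the SIMD route, here is a sketch using SSE2 intrinsics, which convert four stereo frames per iteration. Note the assumptions: SSE2 arrived with the Pentium 4, so this is a later instruction set than the Pentium III's Streaming SIMD mentioned above; the function name is invented; and the left sample is assumed to sit in the low word of each packed DWORD. A plain scalar loop handles the tail and serves as a fallback where SSE2 is not available.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2 1
#endif

// Convert interleaved signed 16-bit stereo frames to two float arrays.
// (Hypothetical helper; not from the thread.)
void stereo_s16_to_float(const std::int16_t* src, std::size_t frames,
                         float* left, float* right)
{
    std::size_t i = 0;
#ifdef HAVE_SSE2
    for (; i + 4 <= frames; i += 4) {
        // Load four frames = eight 16-bit samples (unaligned load is allowed).
        __m128i v = _mm_loadu_si128(
            reinterpret_cast<const __m128i*>(src + 2 * i));
        // Left: move the low word to the top, shift back with sign extension.
        __m128i l = _mm_srai_epi32(_mm_slli_epi32(v, 16), 16);
        // Right: an arithmetic shift brings the high word down, sign-extended.
        __m128i r = _mm_srai_epi32(v, 16);
        _mm_storeu_ps(left + i,  _mm_cvtepi32_ps(l));
        _mm_storeu_ps(right + i, _mm_cvtepi32_ps(r));
    }
#endif
    for (; i < frames; ++i) {   // scalar tail, or the whole job without SSE2
        left[i]  = static_cast<float>(src[2 * i]);
        right[i] = static_cast<float>(src[2 * i + 1]);
    }
}
```

The shift pair does the de-interleaving and the sign extension in one go, so no separate unpack step is needed before the integer-to-float conversion.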

##### Share on other sites
Something else you might consider is that memory access is slow. I guess you're converting the data into two float arrays in one routine and then converting them into WORDs in another. If you merged the two routines into one it would save a LOT of memory accesses. But you'd lose the cleanness of having your conversion routines separate from your hardware-dependent D-Sound routines.

Another good point of merging the routines (especially if you've got a good compiler) is that the result from your conversion would (probably) be in a CPU register, and could therefore be converted to a float without reading memory == more speed.

Instead of:

    for all elements
    {
        fft convert,
        write to memory
    }
    for all elements
    {
        convert to word,
        write
    }

You would have:

    for all elements
    {
        fft convert,
        float convert,
        write
    }

##### Share on other sites
<takes the bong away from Beer Hunter>
That's how fixed point numbers work, not floating point!
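For anyone following along, a 32-bit IEEE 754 float is 1 sign bit, 8 exponent bits (biased by 127) and 23 fraction bits. The sketch below, with made-up helper names, pulls those fields apart. It also shows one way the "put the integer straight into the significant figures" idea can be made to work: the exponent cannot be zero, but if it is set so the float equals 2^23, the low fraction bits count in steps of exactly 1. No claim is made that this beats `fild`/`fstp`; it would need timing.

```cpp
#include <cstdint>
#include <cstring>

// Fields of a 32-bit IEEE 754 float (hypothetical helper type).
struct FloatFields {
    std::uint32_t sign;      // 1 bit
    std::uint32_t exponent;  // 8 bits, biased by 127
    std::uint32_t fraction;  // 23 bits, implicit leading 1 not stored
};

FloatFields split_float(float f)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);   // well-defined way to read the bits
    return { bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu };
}

// Convert a signed 16-bit sample without an int-to-float instruction:
// drop the offset sample into the low fraction bits of 2^23, then subtract.
float s16_to_float_bits(std::int16_t s)
{
    std::uint32_t offset = static_cast<std::uint16_t>(s) ^ 0x8000u; // s + 32768
    std::uint32_t bits = 0x4B000000u | offset;                      // 2^23 + offset
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f - 8421376.0f;                                          // 2^23 + 32768
}
```

For example, `split_float(10.0f)` gives sign 0, exponent 130 (that is, 127 + 3) and fraction 0x200000, because 10 = 1.25 × 2^3.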

...
jenova:
Can you rep the fild & fstp ops?

...
Beelzebub:
that's not exactly what happens, more like this:
SoundCard->Words
Words->Float
Float->FFT->more floats
log(floats)->DWords
DWords->DirectDrawBuffer

...
Anyone, are there multiply & division ops for MMX regs?

##### Share on other sites
i don't think you can "rep" fpu instructions. almost positive. yes, there are multiply "mmx" instructions, and there are multiply-and-add "mmx" instructions. there are no "division" instructions. not sure about streaming SIMD.

##### Share on other sites
Hi Magmai,

Sorry I've only just got round to looking at this thread again. You say that you've got numbers in the range of -1e6 to 1e6, ie between + and - 1 million. I reckon that with that range you should be able to use only a 32bit number, with the 1st bit as the sign bit, the 2nd to the 21st bit representing the integer part and the remaining 11 bits representing the non-integer part:
ie organised like so:
siiiiiiiiiiiiiiiiiiiinnnnnnnnnnn

where s=sign bit, i=integer bits, n=non-integer parts

converting from 16 bit is easy - just load into the registers, shift left 16 bits, then shift arithmetical right 4 bits (I think) which you can do 2 at a time.
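A scalar sketch of that fixed-point layout, with hypothetical names, might look like the following. With 11 fraction bits the net effect of the conversion is a multiply by 2^11, which matches "shift left 16, then arithmetic shift right 5"; a right shift of 4 would leave 12 fraction bits instead, so the shift count is worth double-checking against the chosen layout.

```cpp
#include <cstdint>

const int FRAC_BITS = 11;   // sign + 20 integer bits + 11 fraction bits

// 16-bit sample -> 32-bit fixed point. Multiplying avoids left-shifting a
// negative value, which older C++ standards leave undefined.
std::int32_t s16_to_fixed(std::int16_t s)
{
    return static_cast<std::int32_t>(s) * (1 << FRAC_BITS);
}

// Multiply two fixed-point values: widen, multiply, drop the extra fraction
// bits. Assumes the compiler shifts negative values arithmetically, as
// mainstream compilers do.
std::int32_t fixed_mul(std::int32_t a, std::int32_t b)
{
    return static_cast<std::int32_t>(
        (static_cast<std::int64_t>(a) * b) >> FRAC_BITS);
}

// Back to float, e.g. for display.
float fixed_to_float(std::int32_t x)
{
    return static_cast<float>(x) / static_cast<float>(1 << FRAC_BITS);
}
```

With this layout one unit in the last place is 1/2048, so values down near 1e-6 would not be representable; that limit is a property of the 11-bit fraction, not of the code.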

wrt the divides - what do you need these for? if it's just for converting back to 16bit (or even 32 bit), just use shifts. If you are dividing by a pre-calculated constant, you could always pre-calculate 1 divided by the constant, then at run time multiply by this number.
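The reciprocal trick is short enough to show in full. This is a minimal float sketch with an invented function name; the same idea carries over to fixed point by storing the reciprocal in the fixed-point format.

```cpp
#include <cstddef>

// Divide every element by the same constant using one division up front.
// (Hypothetical helper; not from the thread.)
void scale_by_reciprocal(float* data, std::size_t count, float divisor)
{
    const float inv = 1.0f / divisor;   // the only division
    for (std::size_t i = 0; i < count; ++i)
        data[i] *= inv;                 // can differ from a true divide in the last bit
}
```

When the divisor is a power of two the results are exact; for other constants the rounding can differ very slightly from a real division, which rarely matters for a graph.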

hope this helps a bit,

NickB