Audio volume as a floating point value?

Started by
15 comments, last by Extrarius 19 years, 8 months ago
Having last worked at an audio console manufacturer, I can tell you that their products all use DSP processors to manipulate audio, and the current generation is built on 40-bit floating point SHARC processors.

24-bit integer is better than 16-bit for the increased range and/or precision; 16-bit floating point has greater range but lower precision; 24-bit floating point or higher has much, much greater range than 16-bit int (of course), and greater precision as well.

I would rather have 24-bit int vs. 16-bit float for all FINAL storage values - but float makes for a much better intermediate calculation format.
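The "float for intermediates, int for final storage" point can be sketched roughly like this: accumulate in float, then clamp once when converting back to the integer format. (This is a minimal illustration, not anyone's actual mixer code; the function name and fixed two-source signature are made up for the example.)

```c
#include <stdint.h>

/* Mix two 16-bit PCM samples with per-source gains, accumulating in
 * float so intermediate sums can exceed the int16 range, then clamp
 * once at the very end. Mixing directly in int16 would wrap or clip
 * on every addition. */
static int16_t mix_to_int16(int16_t a, int16_t b, float gain_a, float gain_b)
{
    float acc = gain_a * (float)a + gain_b * (float)b;
    if (acc > 32767.0f)  acc = 32767.0f;   /* clamp to int16 range */
    if (acc < -32768.0f) acc = -32768.0f;
    return (int16_t)acc;
}
```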
Quote: Original post by Xai
I would rather have 24-bit int vs. 16-bit float for all FINAL storage values - but float makes for a much better intermediate calculation format.

I can buy that. I work in low-power land, but I suppose if I had all the juice in the world I'd want 64-bit floating point precision for my internal calcs.

Can you _really_ tell the difference between a 24-bit and a 16-bit DAC? Like, double-blind-test certain?
Krylloan: I agree with what you said, except that you missed the last comment I made: the integers don't have to represent a linear scale. You could use a spline-based algorithm of some sort to get exactly the level of accuracy and range you want, without worrying about how to squeeze an extra tenth of a bit out of the exponent. You could store the spline as part of the sample (if you're using files, or possibly in the setup phase for streams) and have exactly the right precision in each part of any sample =-)

Stoffel: Only 64 bits? Why not go for 128, and then you can use fixed point to get more accuracy than you want as well as a larger range =-) Hell, if I had 'all the juice [computing power] in the world' I'd probably go quite a bit higher.
"Walk not the trodden path, for it has borne its burden." -John, Flying Monk
Extrarius: When talking of integers here, I think most of us are referring to values whose _final_ representation is a linear scale (e.g. constant voltage steps between values in the output). You could classify floats as integers mapped onto a non-linear scale.


Quote: Krylloan: BTW: I'm an avid supporter of integers and think floating point numbers should really only be used in rare circumstances, but this is one of them.
Quote: Stoffel: Can you tell us why?


Because precision is more important at lower overall volumes. In a game where gunshots can reach, say, 100 dB on a decent speaker system, you don't want a minimum representable volume of 55 dB, because quiet sounds like footsteps would never get rendered.

I think it is important to maintain a decent S/N ratio through the entire audible amplitude range, especially in games, since the ratio of maximum to minimum (audible) amplitude is likely to be a lot higher than in music or communications.

Most sound environments are centered about 0: there is no DC component, because there is typically no constant-velocity air current, and it is undesirable to have fast-moving air around a recording device (you've probably all heard recorded wind noise on microphones that wasn't really audible to humans at the time of recording, since ears neutralise it). So floating point data types, whose precision is finest around zero, match the center of most sounds, which is good.

Change in volume is what we hear, and a sound sample recorded this way (as differences between successive values) with very small negative feedback (to prevent DC build-up) is likely to be superior to a direct voltage-level sample, until you start considering frequencies close to the sampling frequency. Unfortunately, nobody uses this except maybe in compressed sound.
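The difference-recording idea above is essentially delta coding. A minimal sketch (function names made up; the negative-feedback leak term is omitted for brevity, as noted in the comment):

```c
#include <stddef.h>
#include <stdint.h>

/* Delta coding sketch: store the difference between successive
 * samples instead of the absolute level. Decoding integrates the
 * deltas back. A real implementation would add a tiny leak factor
 * to the accumulator to prevent DC build-up; omitted here. */
static void delta_encode(const int16_t *in, int16_t *out, size_t n)
{
    int16_t prev = 0;
    for (size_t i = 0; i < n; ++i) {
        out[i] = (int16_t)(in[i] - prev);
        prev = in[i];
    }
}

static void delta_decode(const int16_t *in, int16_t *out, size_t n)
{
    int16_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        acc = (int16_t)(acc + in[i]);
        out[i] = acc;
    }
}
```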
Er, I think my dB measurements may have been incorrect, since I was only multiplying by 10 (not 20, which is required for sound pressures). So you might want to double them. This changes some things, sorry.
By the way, the concept of INTEGER isn't some made-up thing to do with C data types; we are talking about the math that would be needed to properly perform various operations ...

An "integer" on a "logarithmic scale" isn't an integer for most useful purposes. Sure, you can add 6 dB to 36 dB as an int, but then you get 42 dB, which is NOT what 6 dB added to 36 dB sounds like.

Adding numbers on a logarithmic scale requires more complex math that is NOT built into the chip, and is therefore very slow compared to int and float operations (if computers had such circuits in them, languages like C would have a "log" data type :).

For example, with sound, adding 36 dB + 36 dB yields about 39 dB (because on the dB scale, +3 dB means double the power). So what people do in computers is convert the sound level (in dB) into a linear scale FIRST, and store that as an int ...

For instance, a table for the linear conversion might simply be this (although you may multiply these values by any factor you prefer):

3 dB == 30
6 dB == 60
9 dB == 120
12 dB == 240
15 dB == 480
...
27 dB == 7680
30 dB == 15360
33 dB == 30720
36 dB == 61440

So you see, a 16-bit int is already exhausted at about 37 dB if we use a precision of 30 for the first 3 dB. We could of course have used a precision of about 8 and made it to 42 dB, but no matter what precision we use, we cannot simultaneously get more than a few numbers in the first 10 dB and still have enough range to reach 60 dB of distance.
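The table above (value doubling every 3 dB, starting at 30) can be generated by a shift, which also shows exactly where 16 bits run out. A quick sketch, assuming dB values that are multiples of 3:

```c
#include <stdint.h>

/* Rebuild the table above: 30 at 3 dB, doubling every 3 dB.
 * Valid for db in {3, 6, 9, ...}; 39 dB already exceeds the
 * 16-bit unsigned maximum of 65535. */
static uint32_t db_to_linear(int db)
{
    return 30u << ((db - 3) / 3);   /* 3 -> 30, 6 -> 60, ..., 36 -> 61440 */
}
```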

Of course, 32-bit integers pretty much fix this, as then you have about 4 billion as your max ... so using the same starting scale we can reach about 84 dB of FULL PRECISION dynamic range (which is actually pretty amazing). But believe it or not, using this scale for the internal computations of DSP systems that provide sound processing such as reverb, compression, and gain can very easily cause clipping and banding.

Hence the reason floating point numbers are preferred. The benefit of floating point is this: the precision for any given sample is fixed relative to the magnitude of the sample, not at a predetermined absolute step. So a quiet sound (in 16-bit float) has, say, 11 binary digits of precision and a low exponent, in the quiet range; a loud sound has the same 11 bits of precision, but a higher exponent. Then adding two loud sounds is very accurate, adding two quiet sounds is very accurate, and it is only when you add a quiet and a loud sound together that you lose detail (the quiet sound has less effect on the loud sound than it should - but since the magnitude difference is large, no one will notice anyway, because the quiet sound is expected to be washed out by the loud one).

Now, this 16-bit float sound system would really suck, because 11 bits of precision is NOT ENOUGH for sound. 32-bit floating point is really great: it has a 24-bit effective mantissa (23 stored bits plus an implied leading bit), giving the same precision as a 24-bit integer, while also having a sign bit and 8 bits of exponent. A 24-bit float is a decent compromise, because it has as much precision as the 16-bit int it will eventually be converted into, while providing the ungodly dynamic range only a floating point number can yield.
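The "quiet sound washed out by loud sound" effect above is easy to demonstrate: once the quiet sample's bits fall entirely below the loud sample's 24-bit significand, the addition changes nothing. A minimal check (the helper name is invented for the example):

```c
/* Returns 1 if adding `quiet` to `loud` in 32-bit float changes
 * nothing at all, i.e. the quiet sample is completely absorbed.
 * 2^24 = 16777216 is the edge of the 24-bit significand, so 0.5f
 * added to it rounds away entirely. */
static int absorbed(float loud, float quiet)
{
    return (loud + quiet) == loud;
}
```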
My point about a 'logarithmic integral scale' is that it is a storage type, just as floating point is. If you're talking about the math you'll be doing, then you should be talking about integers and fractions rather than integers and floating points (since floating point is a storage type - or at least I've never heard it used in math).
I should have been clearer, but my point was that you needn't use a traditional storage type (like a simple int or floating point number) to get good precision, and that using some kind of quantization table, as most audio/video/image/etc. compression does, is probably a good idea, since even with 'uncompressed' audio you're still losing information to the limits of the representation type.
"Walk not the trodden path, for it has borne its burden." -John, Flying Monk

This topic is closed to new replies.
