Audio volume as a floating point value?

15 comments, last by Extrarius 19 years, 7 months ago
I'm just wondering about something: audio (i.e. wave data) has its volume stored as integers, right? Wouldn't it be possible and beneficial to use floating point numbers instead? Unless there's some hardware restriction, I see no reason not to, as it could easily remove a lot of the distortion that comes from extremely loud sounds. ...or are floating point volumes already being used?
Distortion from extremely loud sounds results from clipping; it has nothing to do with rounding or precision.
An X-bit floating point number has less accuracy than an X-bit integer because it has a much larger range, so it is beneficial to use an integer.

The best accuracy would probably come from using some kind of non-linear quantization table on the samples. A simple logarithmic scale seems useful, since dB is a logarithmic scale.
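
As a rough illustration of the logarithmic quantization idea above, here is a minimal sketch of mu-law companding, one well-known scheme of this kind. The function names, the choice of mu = 255 and the 8-bit quantizer are assumptions made for the example, not something proposed in this thread.

```python
import math

MU = 255  # assumed value; 255 is the constant commonly used for 8-bit mu-law


def mu_law_compress(x: float, mu: int = MU) -> float:
    """Map a linear sample in [-1, 1] onto a logarithmic scale in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y: float, mu: int = MU) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def quantize(y: float, bits: int = 8) -> float:
    """Round a value in [-1, 1] to the nearest of the signed levels for `bits`."""
    levels = 2 ** (bits - 1) - 1
    return round(y * levels) / levels


quiet = 0.001  # a very quiet sample relative to full scale
linear = quantize(quiet)                                      # plain 8-bit rounding
companded = mu_law_expand(quantize(mu_law_compress(quiet)))   # log-scale rounding
print(abs(linear - quiet), abs(companded - quiet))
```

With the same 8 bits, the companded path keeps a quiet sample that plain linear rounding flattens to zero, at the cost of coarser steps near full scale.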
"Walk not the trodden path, for it has borne it's burden." -John, Flying Monk
I can't say I'm an expert on the topic, but I believe floating point values are superior to integers as time-domain sound sample types.



AP: Distortion on loud sounds results from clipping, sure, and it has nothing to do with rounding or precision, true. But it's extremely hard to get clipping with floating point numbers: their maximum magnitudes are normally at least hundreds of times larger than what's being used.
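
A small sketch of the headroom point, assuming a mixer that saturates 16-bit integers (the helper names here are hypothetical): two loud samples clip when summed as integers, while a float sum keeps the true value and can be scaled back down afterwards.

```python
INT16_MIN = -32768
INT16_MAX = 32767


def mix_int16(a: int, b: int) -> int:
    """Sum two 16-bit samples, saturating at the integer limits (clipping)."""
    return max(INT16_MIN, min(INT16_MAX, a + b))


def mix_float(a: float, b: float) -> float:
    """Sum two samples in floating point; no clamping is needed at this size."""
    return a + b


clipped = mix_int16(30000, 30000)                 # peak is lost, stuck at 32767
preserved = mix_float(30000.0, 30000.0) * 0.5     # full sum kept, then attenuated
print(clipped, preserved)
```

Whatever is finally sent to a 16-bit integer output still has to fit in range, so the gain has to come down somewhere before that last step.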



Extrarius: An X-bit floating point number has less _average_ accuracy over the whole range. But for sounds, the greater the magnitude, the less absolute accuracy matters, and the signal-to-noise ratio is on average higher for floating point numbers when looked at from a logarithmic point of view.

The loudest sound a 16-bit sample can make is 2^15 times as large as the quietest. (= 3.27x10^4 = 45 db).

The loudest sound a 16-bit floating point number (6 exponent bits) can make is 2^42 times as large as the quietest. (= 4.39x10^12 = 126db).

At 0db (relative to peak integer volume):
Integer maximum error (relative) = ~3x10^-5. (45db)
FP maximum error (relative) = ~1.9x10^-3. (27db)

At -20db (relative to peak integer volume):
Integer maximum error (relative) = ~3.1x10^-3. (25db)
FP maximum error (relative) = ~1.9x10^-3. (27db)

At -45db (relative to peak integer volume):
Integer maximum error (relative) = 1. (0db)
FP maximum error (relative) = ~1.9x10^-3. (27db)
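
The figures above can be recomputed with a short script. This is only a sketch under stated assumptions: the 16-bit float is taken to have 1 sign bit, 6 exponent bits and 9 mantissa bits, the maximum error is taken as one step of the representation relative to the signal, and decibels are computed as 10*log10(ratio), which is the convention these numbers follow. The more common convention for amplitude ratios, 20*log10(ratio), would double every db figure. All names are made up for the example.

```python
import math

INT_PEAK = 2 ** 15        # largest magnitude of a 16-bit integer sample
FP_MANTISSA_BITS = 9      # assumed: 16 bits - 1 sign bit - 6 exponent bits


def db(ratio: float) -> float:
    """Decibels as used in the figures above: 10 * log10(ratio)."""
    return 10 * math.log10(ratio)


def int_relative_error(level_db: float) -> float:
    """One integer step relative to a signal `level_db` below integer peak."""
    amplitude = INT_PEAK * 10 ** (level_db / 10)
    return min(1.0, 1.0 / amplitude)


def fp_relative_error() -> float:
    """One mantissa step relative to the signal; the same at every level
    as long as the value stays within the normal exponent range."""
    return 2.0 ** -FP_MANTISSA_BITS


for level in (0, -20, -45):
    int_err = int_relative_error(level)
    fp_err = fp_relative_error()
    print(f"{level:>4}db  int {int_err:.2e} ({db(1 / int_err):.0f}db)"
          f"  fp {fp_err:.2e} ({db(1 / fp_err):.0f}db)")
```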



The inverse of the maximum relative error (the db figure in brackets) is sort of the signal-to-noise ratio, so higher is better. For a 20db s/n ratio, think of listening to a sound at 60db while 40db static is playing in the background.

And these margins shift further in favour of floating point with greater bit widths.

Many speakers can put out sounds much louder than 45db above the human limit of hearing.



SAMPLES: Most game sound samples are normalised, but a significant proportion of them have sections where the maximum volume is a very small fraction of the overall maximum volume of the sample. (Eg a weapon firing / reloading sequence, or a mechanical sound with a very loud clang at the end).

INTERNAL: The internal sound channels are likely to blend lots of high-volume and low-volume sounds. During segments of quietness you would prefer not to have a significant decrease in s/n ratio. Floating point samples will be more important here.

OUTPUT: Speakers are also able to be more precise at lower volumes, so a floating point DAC is more suitable from the speaker's point of view.


Extrarius: floating point is a non-linear scale, as close to the logarithmic scale as you can get while still maintaining an acceptable degree of operations per second.


BTW: I'm an avid supporter of integers and think floating point numbers should really only be used in rare circumstances, but this is one of them. (The other big one being color intensity). I think using floating point numbers for model/world co-ordinates should be considered a big no-no, but everyone seems to do it.

The main advantage of using integers in sound is the speed of calculations, and a floating point software sound solution is going to require more CPU grunt, which is not necessarily insignificant.
Not to mention that sound cards won't handle floating point values - hence a 16 bit sound card has an amplitude range of 65536 values, and an 8 bit card has a mere 256.
Unless you're talking about redesigning the way sound cards currently operate.

[Website] [+++ Divide By Cucumber Error. Please Reinstall Universe And Reboot +++]

Er, some cards do. Just not cheap ones. My friend was looking at importing (NZ) a card from somewhere that was 24-bit floating point. A quick search on google showed some professional cards with float, but nothing in the consumer range that I could see.

Audigy 2 has 24 bit. I can't see whether that's int or float though.

But good point, if your output is 16-bit integer I suppose there's not much you can do with the greater flexibility.

Another point: some (most?) sound compressors use logarithmic amplitude data.
Frankly, I can't hear much difference between 16 and 24 bit - well; 24 sounds a little crisper, but not much to me.
I was reading this topic in view of game development, (what with this site being called GameDev.net and all) and so ignored the high-end floating point cards, and was thinking of your bog-standard card in your home PC.

[Website] [+++ Divide By Cucumber Error. Please Reinstall Universe And Reboot +++]

This page goes over the basics of DAC (digital to analog conversion).

But all you really need is one bit anyway.
Quote:Original post by Krylloan
BTW: I'm an avid supporter of integers and think floating point numbers should really only be used in rare circumstances, but this is one of them.

Can you tell us why? The output waveform is necessarily filtered, so the output waveform should be smoothed. At the kind of output sample rates you'd get for high fidelity audio, I really doubt you need a voltage level between two existing levels. Also remember that your ears are sensitive to the change in output level, not the level itself, so it's fundamentally different than color. Constant 0V output sounds exactly the same as constant -1V and +1V: silence. Sound is only interesting when it is transitioning between voltage levels, so if the waveform's always moving then you're only going to get very small gains as you move from 16-bit to above.

Quote:
The main advantage of using integers in sound is the speed of calculations, and a floating point software sound solution is going to require more CPU grunt, which is not necessarily insignificant.


I think this argument puts the cart before the horse. ADCs have historically had linear levels, therefore fixed point is fine.

Also, if you move to a mantissa + exponent method, you've automatically made your ADC curve non-linear. It actually might be needed (I think there might be some non-linearity already in the system, but now I'm confusing myself about whether that was really for audio ADCs or for communications; it's been a while). But if you're going to do that, a non-linear ADC is probably fine. (There's a term for this but I'm blanking on it now).

EDIT: I've been saying ADC when I mean DAC. Is it Friday? Why yes, yes it is.
FWIW, Csound does its calculations internally with floating-point numbers, then outputs integer PCM data. It does seem to make things easier. E.g., if you want to compose your sound out of a thousand little voices each with a peak amplitude of 3 (assume 16-bit sound), the quantization might not matter most of the time statistically (law of averages) but it could get really nasty if sounds synchronize in certain ways.

Of course, it could probably all be done in fixed-point too, but I don't suppose it would be much benefit. (There's also a "compile-time" flag to use either float or double values globally when rendering your sound.)
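
To make the thousand-voices case concrete, here is a minimal sketch (not Csound's actual code; the function names and the sample value are assumptions) of the worst case where all voices line up: rounding each voice to an integer before summing loses the signal entirely, while summing in floating point and rounding once at the output keeps it.

```python
INT16_MIN = -32768
INT16_MAX = 32767


def to_pcm16(value: float) -> int:
    """Round a mixed value to the nearest integer and clamp to 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, round(value)))


def mix_quantize_each(values: list[float]) -> int:
    """Integer-style mixing: every voice is rounded before it is summed."""
    return to_pcm16(sum(round(v) for v in values))


def mix_quantize_once(values: list[float]) -> int:
    """Float-style mixing: sum at full precision, round only at the output."""
    return to_pcm16(sum(values))


# 1000 synchronized voices, each momentarily at 0.4 on a 16-bit scale
# (well under the peak amplitude of 3 mentioned above).
voices = [0.4] * 1000
print(mix_quantize_each(voices), mix_quantize_once(voices))
```

When the voices are not synchronized, the per-voice rounding errors tend to average out, which is the "law of averages" case; the sketch only shows the nasty one.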

