Gagyi

HDR sound?


I really think that HDR sound must exist, just like HDR lighting, but I haven't found any hardware or API that supports it. Do you know of any? Or do I have to do it manually? If so, how (preferably with DirectSound)?

Sound on a CD uses 16-bit samples (at ~44100 samples per second).

Half-precision HDR imaging uses 16-bit colour channels.

Therefore consumer-grade sound was high-range long before consumer-grade visualization.

Actually, most sound cards sold today can output at least a 24-bit, 96 kHz signal. AFAIK professional sound cards can even use 32-bit floating point; professional sound-mixing applications generally support this format for its convenience in filter implementation and its flexible range. However, most real-time mixers do not need bit depths above 24, because it is difficult or impossible for the listener to notice any improvement beyond that, and the sound author has already done the pre-mixing, which is what actually requires the higher precision.

Some sound APIs (such as DirectSound) let you configure the sample rate and bit depth programmatically. When creating your sound buffers, set the members of the associated DSBUFFERDESC structure appropriately to request buffers with the desired sample rate and bit depth.
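For example, a minimal sketch of requesting a specific format through DirectSound might look like this (assuming an already-initialized IDirectSound8* named dsound; the format values are illustrative):

```cpp
#include <dsound.h>

// Describe the PCM format we want: 16-bit, 44.1 kHz, stereo.
WAVEFORMATEX wfx = {};
wfx.wFormatTag      = WAVE_FORMAT_PCM;
wfx.nChannels       = 2;
wfx.nSamplesPerSec  = 44100;
wfx.wBitsPerSample  = 16;
wfx.nBlockAlign     = wfx.nChannels * wfx.wBitsPerSample / 8;
wfx.nAvgBytesPerSec = wfx.nSamplesPerSec * wfx.nBlockAlign;

// Ask for a secondary buffer in that format, with volume control enabled.
DSBUFFERDESC desc = {};
desc.dwSize        = sizeof(desc);
desc.dwFlags       = DSBCAPS_CTRLVOLUME;
desc.dwBufferBytes = wfx.nAvgBytesPerSec;   // one second of audio
desc.lpwfxFormat   = &wfx;

IDirectSoundBuffer* buffer = nullptr;
HRESULT hr = dsound->CreateSoundBuffer(&desc, &buffer, nullptr);
```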

Yes, I know it's 16-24 bit, and I do use 16-bit sound, but I was talking about automatic dynamic volume, sorry.
So, is there some way to automatically adjust the sound volume depending on the current environment (louder when there are only birds singing, softer when a bomb explodes near you), so that you always hear footsteps and don't go deaf when loud sounds come in?

HDR in graphics is not automatic either; the programmer generally finds the black, midpoint and white values from the high-range color data and adjusts the final brightness transfer function accordingly.

It is very common in games to set the master sound pressure according to the loudest possible sounds in a given scene. This is equivalent to evaluating the white point in graphics and mapping the maximum scene brightness to that of the display system.

And how do I do it? Simply lock the primary buffer, read out the data and adjust the volume? That is pretty much straightforward, but is it efficient enough?

Generally, you know in advance how loud your samples are. In a game scenario, you should store your samples at normalized (full) volume and set the playback volume of each sample as a property of your "actors".

If you want to adjust the overall sound volume in a 3D environment, just sort your sound-generating objects by (maximum loudness / distance) to find the maximum audible volume, and use it to scale the master volume. You don't even have to actually sort anything; just take the maximum of that expression across your objects, as in the sketch below.

This is equivalent to a simple white-point compensation in graphics.
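A rough sketch of that maximum search (the Emitter type, its field names and the distance clamp are my own illustrative assumptions):

```cpp
#include <algorithm>
#include <vector>

// Illustrative emitter; in a real game this data would live on your actors.
struct Emitter {
    float maxLoudness;  // peak loudness of the normalized sample
    float distance;     // current distance to the listener
};

// Find the loudest audible emitter; this acts as the scene's "white point".
float findMaxAudible(const std::vector<Emitter>& emitters) {
    float maxAudible = 0.0f;
    for (const Emitter& e : emitters) {
        float d = std::max(e.distance, 1.0f);  // clamp so nearby sounds don't blow up
        maxAudible = std::max(maxAudible, e.maxLoudness / d);
    }
    return maxAudible;
}

// The master volume is then scaled so the loudest emitter maps to full scale:
//   float masterScale = 1.0f / std::max(findMaxAudible(emitters), 1.0f);
```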

Gagyi, if I understand your original post correctly, this is an idea that I've had as well. In real life the sound of footsteps (for example) is much, much quieter than the sound of a gun firing. In a game, if you put these two sounds at their correct relative volumes, and made sure that the gun sound wasn't overly loud (i.e., no clipping distortion), then you'd have to make the footsteps so incredibly quiet that the user would never hear them. Instead, it makes sense to me to use a dynamic audio compression algorithm: increase the volume of the footsteps and other quiet sounds when there are no loud sounds present, but fade them back down to a low level when loud sounds are present. I don't know anything about DirectX, but I think all this would require is:
1) Set the gains on all of your sound samples to realistic relative volumes (if gun sounds have a gain of 1.0, then footsteps should use a much, much smaller gain).
2) On the mixed audio output, run a dynamic range compression algorithm (this is the equivalent of tone mapping in visual HDR).

http://en.wikipedia.org/wiki/Audio_level_compression
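For the curious, a bare-bones feed-forward compressor over a mixed float buffer might look something like this (the threshold and ratio are illustrative tuning values, not anything from the article above):

```cpp
#include <cmath>
#include <cstddef>

// Compress everything above `threshold` towards it by `ratio`, per sample.
void compress(float* samples, std::size_t count,
              float threshold = 0.5f, float ratio = 4.0f) {
    for (std::size_t i = 0; i < count; ++i) {
        float mag = std::fabs(samples[i]);
        if (mag > threshold) {
            float compressed = threshold + (mag - threshold) / ratio;
            samples[i] = (samples[i] < 0.0f) ? -compressed : compressed;
        }
    }
}
```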

EDIT: Nik02's post above is a simpler way to do #2 above that doesn't require the overhead of running an algorithm on the mixed audio output

Fire up Half-Life 2 and drop a grenade next to your feet.

All sound quietens, except for a high-pitched whine in your ears. You gradually regain your hearing after a few seconds.

Is this what you mean?

Quote:
Original post by deadstar
Fire up Half-Life 2 and drop a grenade next to your feet.

All sound quietens, except for a high-pitched whine in your ears. You gradually regain your hearing after a few seconds.

Is this what you mean?


Yes, kinda, but with a bit faster hearing recovery :)

Quote:

If you want to adjust the overall sound volume in a 3D environment, just sort your sound-generating objects by (maximum loudness / distance) to find the maximum audible volume, and use it to scale the master volume...


I thought that grabbing the primary sound buffer's contents somehow (after the 3D and mixing stuff) and then adjusting the volume would be easy, but this is just as easy and much faster, I guess. Thanks for helping :)

BTW, isn't it 1/distance^2? Or 1/distance^(rolloff factor)?

Quote:
Original post by taby
Sound on a CD uses 16-bit samples (at ~44100 samples per second).

Half-precision HDR imaging uses 16-bit colour channels.

Therefore consumer-grade sound was high-range long before consumer-grade visualization.


You realize this is comparing apples and oranges, right? Sound needs more than 8 bits just to sound decent. On the other hand, the eye cannot distinguish between 8-bit color channels and 16-bit color channels (unless your monitor can actually produce brighter colors for 16-bit channels; I don't think any monitors do).

16-bit color channels are actually more "comparable" (this is still apples and oranges) to 32-bit floating-point audio. 32-bit audio is used for generating and mixing the audio, but it still often gets converted to 16-bit in the end. So 32-bit audio is more of an intermediate representation... and the same applies to 16-bit color channels. They may be useful for certain kinds of rendering, but at the end of the process you still end up with 8 bits per channel (or something that would be indistinguishable from that).

- Kef

Quote:
Original post by Gagyi

BTW, isn't it 1/distance^2? Or 1/distance^(rolloff factor)?



Sound pressure decreases in inverse proportion to distance (p ∝ 1/r), but sound intensity does indeed decrease with the square of distance (I ∝ 1/r²).

The actual volume multiplication factor should be adapted to your soundstage. Most computers have very insensitive, low-quality speakers at quite low volume, so you can't hope to achieve strictly realistic sound pressures anyway.

For example, a gunshot in an FPS game would blow out most computer speakers if it were reproduced at the intensity of a real one; yet you need to hear whispers and footsteps in the same game at the same amplifier volume level and with the same speakers.

Quote:
Original post by Nik02
Quote:
Original post by Gagyi

BTW, isn't it 1/distance^2? Or 1/distance^(rolloff factor)?



Sound pressure decreases in inverse proportion to distance (p ∝ 1/r), but sound intensity does indeed decrease with the square of distance (I ∝ 1/r²).

The actual volume multiplication factor should be adapted to your soundstage. Most computers have very insensitive, low-quality speakers at quite low volume, so you can't hope to achieve strictly realistic sound pressures anyway.

For example, a gunshot in an FPS game would blow out most computer speakers if it were reproduced at the intensity of a real one; yet you need to hear whispers and footsteps in the same game at the same amplifier volume level and with the same speakers.


The parallels between sound and vision here are interesting: unless you have an expensive HDR screen, HDR imagery must be tonemapped to achieve similar effects on standard hardware. Are there any games that run a volume-mapping system on the sound? I've played many games where loud explosions produce a ringing noise and your hearing slowly fades back in, but I'm pretty sure it's hardcoded.

Quote:
Original post by Pox

The parallels between sound and vision here are interesting: unless you have an expensive HDR screen, HDR imagery must be tonemapped to achieve similar effects on standard hardware. Are there any games that run a volume-mapping system on the sound? I've played many games where loud explosions produce a ringing noise and your hearing slowly fades back in, but I'm pretty sure it's hardcoded.


I believe this effect is implemented in almost all games as a simple scaling of the sound actor volumes.

It may be of interest to note that the new Microsoft sound API, XAudio2, natively supports programmable filtering and mixing; this could be used to implement a more sophisticated response curve for "tone mapping" of sound effects.
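A minimal sketch of where such a curve could hook in (COM and error handling omitted; the response curve itself is an arbitrary illustration, not something XAudio2 provides):

```cpp
#include <xaudio2.h>
#include <algorithm>

IXAudio2* xaudio = nullptr;
IXAudio2MasteringVoice* master = nullptr;

// Bring up the engine and the mastering voice that everything mixes into.
XAudio2Create(&xaudio, 0, XAUDIO2_DEFAULT_PROCESSOR);
xaudio->CreateMasteringVoice(&master);

// Map the scene's current peak loudness (0..1, tracked by the game)
// through a response curve onto the master volume.
float scenePeak = 0.8f;  // hypothetical value
master->SetVolume(1.0f / std::max(scenePeak, 0.25f));
```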

Quote:
Original post by Nik02
Quote:
Original post by Pox

The parallels between sound and vision here are interesting: unless you have an expensive HDR screen, HDR imagery must be tonemapped to achieve similar effects on standard hardware. Are there any games that run a volume-mapping system on the sound? I've played many games where loud explosions produce a ringing noise and your hearing slowly fades back in, but I'm pretty sure it's hardcoded.


I believe this effect is implemented in almost all games as a simple scaling of the sound actor volumes.

It may be of interest to note that the new Microsoft sound API, XAudio2, natively supports programmable filtering and mixing; this could be used to implement a more sophisticated response curve for "tone mapping" of sound effects.


Scaling? How do you mean? Obviously most games have volume scaling based on distance and, in some cases, the medium the sound travels through, but what I'm talking about is something that takes gradual effect over time, like tone mapping: loud noises having an effect on other sounds for extended periods, etc. Don't know if there's any point in my posts, but yeah.

Quote:
Original post by Pox
Quote:
Original post by Nik02
Quote:
Original post by Pox

The parallels between sound and vision here are interesting: unless you have an expensive HDR screen, HDR imagery must be tonemapped to achieve similar effects on standard hardware. Are there any games that run a volume-mapping system on the sound? I've played many games where loud explosions produce a ringing noise and your hearing slowly fades back in, but I'm pretty sure it's hardcoded.


I believe this effect is implemented in almost all games as a simple scaling of the sound actor volumes.

It may be of interest to note that the new Microsoft sound API, XAudio2, natively supports programmable filtering and mixing; this could be used to implement a more sophisticated response curve for "tone mapping" of sound effects.


Scaling? How do you mean? Obviously most games have volume scaling based on distance and, in some cases, the medium the sound travels through, but what I'm talking about is something that takes gradual effect over time, like tone mapping: loud noises having an effect on other sounds for extended periods, etc. Don't know if there's any point in my posts, but yeah.


It is fairly trivial to implement a peak-volume tracking system yourself: when a loud sound is played, scale the volume of all samples (except the "ringing noise" effect) down, and then, over a short amount of time, scale them back up to their original levels.

Incidentally, this is also analogous to how HDR tone mapping works in graphics. The level correction is usually spread across several seconds instead of applied instantly, in order to emulate the behavior of our eyes.
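A tiny sketch of such a ducker (the recovery rate is an arbitrary tuning value):

```cpp
#include <algorithm>

// Duck the master volume when a loud event occurs, then recover over time.
struct VolumeDucker {
    float current = 1.0f;        // current master volume scale, 0..1
    float recoveryPerSec = 0.5f; // how quickly hearing "comes back"

    void onLoudSound(float duckTo) {   // e.g. duckTo = 0.2f for a grenade
        current = std::min(current, duckTo);
    }
    void update(float dt) {            // call once per frame with the frame time
        current = std::min(1.0f, current + recoveryPerSec * dt);
    }
};
```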

Yes, indeed, but HOW do I detect the peak? To do that I'd have to trace the mixed primary buffer, which is not cheap.

Quote:
Original post by Gagyi
Yes, indeed, but HOW do I detect the peak? To do that I'd have to trace the mixed primary buffer, which is not cheap.


If your samples are normalized (and I highly recommend this), then all your samples have roughly the same peak levels. Classification then becomes as simple as evaluating the general loudness levels of your sound-emitting objects.

There is really no automatic way to do this, but the method I outline here is time-tested and robust.

Unless you have an extremely exotic application (for example, you are writing a real-time mixer), it is strongly discouraged to muck about with the primary sound buffer. Locking sound resources is always a bad idea if you don't actually need to do it. (Loading the sounds in the first place of course requires a lock.)

There are many ways to keep all the sounds at the same loudness throughout the run of the game...

DON'T lock the primary buffer; instead, write a simple DSP object that processes the audio data in real time and attach it to the primary buffer. DirectSound uses DMOs (DirectX Media Objects) that can be attached to its buffers.
XAudio2 (I recommend this over DirectSound for many reasons) uses XAPO, the API for writing XAudio2 DSPs.
XAudio2 is the new DirectX audio API that is intended to replace DirectSound.
The DSP code is very simple: compute the RMS of the audio data and adjust its gain so that the output always has the same RMS value, as sketched below.
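A sketch of that levelling step over a float buffer (the target RMS is an illustrative tuning value; a real XAPO or DMO would wrap this in the respective interface):

```cpp
#include <cmath>
#include <cstddef>

// Rescale the buffer so its RMS always lands on targetRms.
void levelToRms(float* samples, std::size_t count, float targetRms = 0.1f) {
    if (count == 0) return;
    double sumSq = 0.0;
    for (std::size_t i = 0; i < count; ++i)
        sumSq += samples[i] * samples[i];
    const float rms = static_cast<float>(std::sqrt(sumSq / count));
    if (rms > 1e-6f) {                 // don't amplify near-silence
        const float gain = targetRms / rms;
        for (std::size_t i = 0; i < count; ++i)
            samples[i] *= gain;
    }
}
```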

32-bit audio makes no sense. It's a perfect example of marketing overriding what engineering and human physiology tell us. It's useful internally for maintaining precision when doing signal processing on the audio, but for transmission it should be dithered back down to 24 bits, as that already exceeds the dynamic range of the ear (which is about 120 dB, whereas 24-bit gives 144 dB).
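For reference, each bit of sample depth adds about 6.02 dB of dynamic range (20·log10(2) ≈ 6.02 dB per bit), so 16-bit gives roughly 96 dB and 24-bit roughly 144 dB.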

