Mixing Algorithm

Math and Physics Programming

Started by Grain February 04, 2009 05:38 PM

15 comments, last by LorenzoGatti 15 years, 2 months ago

500

Author

February 04, 2009 05:38 PM

For certain reasons I need to mix sound streams in software. Currently I am just averaging them together: (snd1 + snd2 + ... + sndN) / N. This kind of works, but it's not correct: if all sound streams are silent at a particular moment with the exception of snd1, then the final output is only 1/N of snd1, which it really shouldn't be if you think about it logically. I tried simply adding them, but with more than 2 streams the sum quickly reaches the max values and clamps, which seriously distorts the sound. Should I add the Log() of each stream, or multiply them all together and take the Nth root? I'm just shooting in the dark at this point. I need to know the proper way to do it. How does the sound card do it in hardware? Ideally I'd like to emulate that.

DekuTree64

1,170

February 04, 2009 06:45 PM

Yes, the basic theory is just to add up all the sounds. The simplest way to reduce clipping is to divide by some constant, to make everything quieter so it will clip less when it gets loud. You're doing this one already, but you don't necessarily have to divide by the number of channels. Chances are you won't have all channels playing most of the time, and even if you do, they'll cancel each other out pretty often. So you can usually get away with at least N/2 as the divider, and let it clip a little bit when things get very loud.
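As a rough illustration of the sum-then-divide idea, here is a minimal Python sketch. It assumes 16-bit signed samples held in equal-length lists; the function name `mix_fixed_divisor` and the sample values are made up for the example.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def mix_fixed_divisor(streams, divisor):
    """Sum equal-length int16 streams sample by sample, divide by a constant, then clamp."""
    mixed = []
    for samples in zip(*streams):
        value = int(sum(samples) / divisor)  # int() truncates toward zero
        mixed.append(max(INT16_MIN, min(INT16_MAX, value)))  # clamp instead of wrapping
    return mixed

# Four hypothetical streams: only one is active at the first sample,
# all four peak together at the second, and two cancel at the third.
streams = [
    [20000, 30000, -30000],
    [0, 30000, -30000],
    [0, 30000, 30000],
    [0, 30000, 0],
]
print(mix_fixed_divisor(streams, len(streams) / 2))  # [10000, 32767, -15000]
```

With N/2 as the divider, the lone sound is only halved rather than quartered, and the output clips only at the moment all four streams peak together.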

Another technique that I've seen on some hardware is to use a semi-non-linear scale on the final output. It let you set a threshold value and a scale value, and if the output sample was above the threshold, scaled it down by the value before clipping. So everything below the threshold was unaffected, and everything above the threshold was actually still linear, but just increased more slowly so as to squeeze a bit more range into the upper sample values.
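A minimal Python sketch of one reading of that threshold-and-scale scheme: only the part of the sample above the threshold is scaled, and the result is then clipped. The function name `knee` and the default threshold and scale values are assumptions for illustration, not the settings of any particular hardware.

```python
def knee(sample, threshold=24000, scale=0.25, limit=32767):
    """Leave samples below the threshold alone; compress the excess above it, then clip."""
    magnitude = abs(sample)
    if magnitude > threshold:
        # Still linear above the threshold, just with a shallower slope.
        magnitude = threshold + (magnitude - threshold) * scale
    magnitude = min(int(magnitude), limit)  # final hard clip
    return magnitude if sample >= 0 else -magnitude

print(knee(10000))   # 10000, below the threshold, so unchanged
print(knee(40000))   # 28000, the excess of 16000 is scaled down to 4000
print(knee(100000))  # 32767, still clips once the compressed value exceeds the limit
```

The summed signal has to be kept in a type wider than 16 bits before this step, otherwise the overshoot is already lost by the time the curve is applied.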

You could use an actual nonlinear scale, such as logarithm, on the output before clipping. But that will affect the sound of things at lower volumes too. I prefer only altering it if things get very loud, so it will be less noticeable.

Another method is to adjust the divider dynamically, to boost the volume when things are quiet, and drop it when loud. Usually you want to adjust it over a short period of time and not instantly, so it's less noticeable... but there could also be some clipping during that time if there's a sudden loud sound, like an explosion. I haven't actually used this technique before, so I don't know if it really sounds good, or how to effectively tweak it.
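To make the trade-off concrete, here is a hedged Python sketch of one possible logarithmic curve. It borrows the shape of mu-law companding; the function name `log_curve`, the normalisation by the stream count, and the `mu` default are all assumptions, and samples are taken to be floats in [-1, 1] per stream.

```python
import math

def log_curve(total, num_streams, mu=255.0):
    """Map a float sum in [-num_streams, num_streams] into [-1, 1] with a logarithmic curve."""
    x = min(abs(total) / num_streams, 1.0)   # normalise so the loudest possible sum is 1.0
    y = math.log1p(mu * x) / math.log1p(mu)  # steep near zero, flat near full scale
    return math.copysign(y, total)

# A quiet sum comes out far louder than a plain divide-by-N would give,
# which is exactly the "affects lower volumes too" side effect.
quiet = log_curve(0.04, 4)
print(quiet > 0.04 / 4)  # True
```

The curve never clips, but because it is nonlinear everywhere it adds distortion at every level, not just on loud peaks.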
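A minimal Python sketch of a dynamically adjusted divider, under several assumptions: samples are floats in [-1, 1], the divider never drops below 1 (so this version only turns loud passages down and never boosts quiet ones), and the `attack` and `release` rates are arbitrary illustrative values, not tuned ones.

```python
def mix_with_dynamic_divider(streams, attack=0.5, release=0.001, ceiling=1.0):
    """Mix float streams, letting the divider chase the level needed to avoid clipping."""
    divider = 1.0
    out = []
    for samples in zip(*streams):
        total = sum(samples)
        needed = max(1.0, abs(total) / ceiling)  # divider that would just avoid clipping
        # Move quickly toward a larger divider (loud passage), slowly back toward 1.
        rate = attack if needed > divider else release
        divider += (needed - divider) * rate
        out.append(max(-ceiling, min(ceiling, total / divider)))  # may clip during the attack
    return out

quiet = mix_with_dynamic_divider([[0.1] * 10, [0.2] * 10])  # passes through at about 0.3
loud = mix_with_dynamic_divider([[1.0] * 50] * 4)           # divider settles near 4
```

The first few loud samples still hit the clamp while the divider catches up, which matches the caveat about sudden loud sounds.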

yahastu

154

February 04, 2009 08:02 PM

Instead of averaging the volume at each instant, take the max volume across all sound channels.

Grain

500

Author

February 04, 2009 10:08 PM

Quote:Original post by yahastu
Instead of averaging the volume at each instant, take the max volume across all sound channels.

Take it and do what with it?

Rockoon1

104

February 05, 2009 04:14 AM

Summing and then scaling by a constant value (< 1.0) is correct, even if a bunch of channels are "silent".

Using 1/n means that you will NEVER clip, so it is often a very good choice prior to playback. (If you need more sample precision, changing this coefficient isn't the proper way to get it.)

During playback you simply let the user alter the amplitude as desired, where the "100%" coefficient could be equal to n.

Think about it in terms of a real-world piano. Each string inside the piano is independent of the others. C2 vibrating doesn't affect the amplitude of F#2's vibrations, and so forth.

Now, with computers, we have the problem that summing things up could sum to outside of our sample precision, which is generally a bad thing. This is magnified by the fact that when someone records something, usually they want to give that recording the full benefits of the sample precision in use (i.e., they make it just loud enough to not blow out the sample precision), so that recorded pin drop is standardized to be much louder than it should be in relation to other recordings.

...and since most programs play back this audio data that is also just at the limits, unmodulated, users tend to have their global volume settings set such that anything that ISN'T at these limits is too damn quiet. So here we are, riding the edge of the limits, and there is very little we can do about it but compensate for these facts by trying to ride the edge ourselves.

Now, ideally we would be able to say that if the global volume is (A) and the sample bounces around +/-(X), then the decibel level just outside the speaker is a computable quantity of (D) decibels, such that we could allow the user to configure his audio output for a desired decibel level... but things simply aren't ideal and never will be.

Grain

500

Author

February 05, 2009 08:16 AM

Quote:Original post by Rockoon1
Think about it in terms of a real-world piano. Each string inside the piano is independent of the others. C2 vibrating doesn't affect the amplitude of F#2's vibrations, and so forth.

And by that same logic, the string's amplitude will be the same regardless of the number of other strings in the piano.

I realize the computer clipping problem; however, there must be a way around it. I know it's possible because the sound card does it all the time. If you have several programs that all play sound at or near the saturation point, they will still overlap and somehow not clip and distort. For example, if you play a YouTube video, MP3s in a media player, and a game all at the same time, they blend together nicely without clipping. This is what I want to be able to do.

The only other operation I can think of, other than the ones mentioned above, is to simply overlay the waveforms, using whichever sample is greatest at any given time. But I don't think that is right either.

alvaro

21,604

February 05, 2009 10:11 AM

If you run several sounds near saturation, I am sure the combined sound will have clipping distortions. You can handle situations where the sum is too large by saturating instead of overflowing, and that might make things a bit less catastrophic, but if you want sounds to be added together, you should try to adjust each sound's amplitude much lower than saturation.

Rockoon1

104

February 05, 2009 09:04 PM

Quote:Original post by Grain
I realize the computer clipping problem; however, there must be a way around it. I know it's possible because the sound card does it all the time. If you have several programs that all play sound at or near the saturation point, they will still overlap and somehow not clip and distort.

This is incorrect. Feel free to record from the 'Stereo Mix' or 'What U Hear' sound source and plot the waveform while multiple programs are playing. There will be tons of (now visible) clipping.

The difference you are experiencing, most likely, between this case and the case of an incomplete software mixer...

...is that the sound card's mixer drivers will saturate the audio data at -32768 or +32767 when the data overflows, rather than allow it to 2's-complement overflow à la C integer math (which creates wild sign changes).

see: http://en.wikipedia.org/wiki/Saturation_arithmetic
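To see the difference in numbers, here is a small Python sketch that emulates both behaviours for 16-bit samples. Python integers do not overflow, so the wraparound is simulated with modular arithmetic; the function names are made up for the example.

```python
def add_wrapping(a, b):
    """Emulate truncating the sum back to 16 bits in two's complement."""
    return (a + b + 32768) % 65536 - 32768

def add_saturating(a, b):
    """Clamp the sum to the int16 range instead of letting it wrap."""
    return max(-32768, min(32767, a + b))

print(add_wrapping(30000, 10000))    # -25536, a loud positive peak flips sign
print(add_saturating(30000, 10000))  # 32767, the peak is flattened but keeps its sign
```

The saturated result is still distorted, since the top of the waveform is cut off, but it stays close to the intended value, while the wrapped result jumps to the opposite end of the range.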

Quote:Original post by Grain
The only other operation I can think of, other than the ones mentioned above, is to simply overlay the waveforms, using whichever sample is greatest at any given time. But I don't think that is right either.

It's very wrong.

Aressera

3,144

February 05, 2009 11:56 PM

Program a compressor/limiter?

It can't be that difficult. Just find the RMS average of the samples over a particular interval; if it is over a certain level, scale the output by some ratio. You can add other features like a smooth transition from scaling to 1:1 output (soft knee), attack and release times, etc.
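A bare-bones Python sketch of that block-based idea, assuming float samples in [-1, 1]. The names `rms` and `compress_block` and the default threshold and ratio are illustrative assumptions. The ratio is applied to the linear RMS level to keep the example short, whereas real compressors normally work in decibels, and there is no soft knee, attack, or release here.

```python
import math

def rms(block):
    """Root-mean-square level of a non-empty block of samples."""
    return math.sqrt(sum(s * s for s in block) / len(block))

def compress_block(block, threshold=0.5, ratio=4.0):
    """Scale the whole block down if its RMS level exceeds the threshold."""
    level = rms(block)
    if level <= threshold:
        return list(block)  # quiet enough: 1:1 output
    # Only the part of the level above the threshold is reduced by the ratio.
    target = threshold + (level - threshold) / ratio
    gain = target / level
    return [s * gain for s in block]

loud = [0.9, -0.9, 0.9, -0.9]
print(rms(compress_block(loud)))  # roughly 0.6: the 0.4 of excess level becomes 0.1
```

Applying one gain per block causes audible steps at block boundaries, which is what the attack and release times in a full compressor are there to smooth out.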

Also, I'd recommend doing everything with floats if you can. Most professional DSPs use floats internally so that they don't have to deal with integer overflow and can then clip to a certain range (usually +/- 1) if the output needs to be converted to some integral bit depth.
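A minimal Python sketch of mixing in floating point and converting to 16-bit integers only at the end. It assumes each stream is a list of floats in [-1, 1]; the function name `mix_float_to_int16` and the symmetric scaling by 32767 are choices made for the example.

```python
def mix_float_to_int16(streams):
    """Sum float streams, clip once to [-1, 1], then convert to 16-bit integers."""
    out = []
    for samples in zip(*streams):
        total = sum(samples)                # floats cannot wrap around on overflow
        total = max(-1.0, min(1.0, total))  # clip once, at the very end of the chain
        out.append(int(round(total * 32767)))
    return out

streams = [[0.5, 0.9, -0.9],
           [0.25, 0.9, -0.9]]
print(mix_float_to_int16(streams))  # [24575, 32767, -32767]
```

Because the intermediate sum keeps its true value, any gain reduction, limiter, or curve can be inserted before the final clip without having lost the overshoot.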

Aressera

3,144

February 06, 2009 12:04 AM

Another option would be to program a gain "rider" which keeps a running RMS average of last N input samples and adjusts the output to correspond to a certain optimal value. The output might still clip in this model, but if your rider window was small enough (~10 ms) you probably wouldn't notice.
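A rough Python sketch of such a rider, assuming float samples and a 44.1 kHz sample rate, so that the default `window` of 441 samples is about 10 ms. The function name `gain_rider`, the `target_rms` value, and the `max_gain` cap (added so that near-silence is not boosted without limit) are assumptions for illustration.

```python
import math
from collections import deque

def gain_rider(samples, window=441, target_rms=0.25, max_gain=4.0):
    """Scale each sample so the running RMS of the last `window` inputs approaches the target."""
    history = deque(maxlen=window)
    sum_sq = 0.0
    out = []
    for s in samples:
        if len(history) == window:
            sum_sq -= history[0] ** 2  # drop the sample that is about to leave the window
        history.append(s)
        sum_sq += s * s
        level = math.sqrt(max(sum_sq, 0.0) / len(history))  # guard against rounding below 0
        gain = min(max_gain, target_rms / level) if level > 0 else 1.0
        out.append(s * gain)  # not clipped here; a limiter could follow
    return out

steady = gain_rider([0.5] * 1000)  # settles at the target level of 0.25
```

The running sum of squares makes each update constant time, though over very long runs it can accumulate floating-point error, so recomputing it from the window now and then would be a sensible precaution.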

This could also be paired with a compressor/limiter (rider first in signal chain) in order to guarantee that there will be no clipping and that the limited output will not sound wonky like comps/limiters pushed too hard.
