
[XAudio2 2.7] - Source Voice questions


Tispe    1468

Hello

 

I am interested in using XAudio2 for my DX9 game, and I can't find whether there are any limits on how many Source Voices one can create in XAudio2.

 

I am tempted to create thousands, one for each sound in the game, and wrap both the voice and the data in a single Sound Object. That way I can play the object whenever I need to just by calling its "Play" method. Is this good/bad?

 

 

Also, if I have thousands of these sound objects, can all of their buffers be submitted at creation, or must this be done right before you Start the voice?

// Once at init, or before every Start call?
pSourceVoice->SubmitSourceBuffer( &buffer );

void MySoundObjectClass::Play()
{
    // pSourceVoice->SubmitSourceBuffer( &buffer ); needed here?
    pSourceVoice->Start( 0 );
}

Tispe    1468

It occurred to me that I might want to have one Source Voice for each entity that can emit sound, and keep the audio data in a separate place. That way many things can play from the same data. But how many Source Voices can I have? Is 10,000 too many? Must 10,000 entities share, say, 50 Source Voices instead?

Icebone1000    1958

I think you should approach it like this: "how many simultaneous sounds do I want to be able to play?" -> that is your number of voices.

 

More specifically: "how many sounds of this type do I want to be able to play simultaneously?", since a voice needs its wave format information. I don't understand it very deeply, but I never had problems creating voices from the wave data of a single wave file and using them for all the others.

 

In my engine the sound code is really raw, but what I did was create "audio tracks". A track can play a single sound at a time, but it can play any sound you pass in. So in my engine I create something like audioTracks<12> m_tracks, and it works like a circular buffer: if I pass in a 13th sound while the first is still playing, it interrupts it and starts the new one. An optimal number is one large enough that interrupts never happen. (You could detect whether the oldest track is still playing and increase the number of tracks, I guess, but note that if you have a really high number of sounds playing at the same time, an interrupt wouldn't be noticed since everything would be noise anyway, so having a reasonable limit is common sense.)
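Roughly, a sketch of that circular track idea looks like this (the class name, the Stop/Flush handling, and the fixed size are illustrative, not my actual engine code; it assumes every source voice was created with a wave format compatible with the buffers you pass in):

#include <xaudio2.h>

// Fixed-size "audio track" pool used round-robin.
template <size_t N>
class AudioTracks
{
public:
    AudioTracks() : m_next( 0 ) {}

    // Plays a buffer on the next track, interrupting whatever it was playing.
    void Play( const XAUDIO2_BUFFER& buffer )
    {
        IXAudio2SourceVoice* voice = m_voices[m_next];
        m_next = (m_next + 1) % N;            // circular: wrap back to track 0

        voice->Stop( 0 );                     // interrupt the old sound, if any
        voice->FlushSourceBuffers();          // drop its queued data
        voice->SubmitSourceBuffer( &buffer ); // queue the new sound
        voice->Start( 0 );
    }

    IXAudio2SourceVoice* m_voices[N];         // created elsewhere at init

private:
    size_t m_next;
};

// e.g. AudioTracks<12> m_tracks;  m_tracks.Play( someBuffer );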

 

Don't take my opinion for granted, I'm not very experienced with audio.


Burnt_Fyr    1665

I think the limit on sources would very much depend on the hardware available. As Icebone mentioned, source voices should be sized for polyphony, not for how many sounds you want to be able to play in total.

I have broken my audio system down before, but I'll reiterate. My mixer class works like a hardware mixer: I create a number of source voices, a number of submix voices, and a number of master voices. I tend to think in hardware terms, so my source voices are akin to the number of simultaneous tracks. The submix voices are like mixdown tracks, where I can take, for instance, all the UI sounds and mix them together, giving a single fader to control the UI levels. The submix voices are all connected to a number of master tracks, for stereo, 5.1, or whatever mixing.

The next piece of the puzzle is my sampler class, which holds sound buffers after loading. It is akin to a MIDI sampler, allowing sounds to be cached. When I want to play something, I can grab a random track from a pool, as Icebone mentioned, or put it on a specific track.
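In plain XAudio2 2.7 calls, that routing looks roughly like this (the UI submix, the format values, and the names are placeholders for the idea, not my actual mixer class; error handling omitted):

#include <xaudio2.h>

// A submix voice acts as a single "fader" for a group of sounds,
// sitting between the source voices and the mastering voice.
void SetupUiGroup( IXAudio2* pXAudio2, const WAVEFORMATEX& waveFormat )
{
    IXAudio2MasteringVoice* pMaster = NULL;
    pXAudio2->CreateMasteringVoice( &pMaster );          // stereo, 5.1, etc.

    IXAudio2SubmixVoice* pUiSubmix = NULL;
    pXAudio2->CreateSubmixVoice( &pUiSubmix, 2, 44100 ); // a stereo "mixdown track"

    // Route a source voice to the UI submix instead of straight to the master.
    XAUDIO2_SEND_DESCRIPTOR send  = { 0, pUiSubmix };
    XAUDIO2_VOICE_SENDS     sends = { 1, &send };

    IXAudio2SourceVoice* pUiVoice = NULL;
    pXAudio2->CreateSourceVoice( &pUiVoice, &waveFormat, 0,
                                 XAUDIO2_DEFAULT_FREQ_RATIO, NULL, &sends );

    pUiSubmix->SetVolume( 0.5f );                        // one fader for all UI sounds
}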

 

TL;DR: Use enough source voices to ensure that sounds are not clipped or delayed, based on the maximum polyphony you expect in game. 32-64 should be more than enough for most situations.

Tispe    1468

So Source Voices are better off not living in the same object as the entities? They should live inside the audio class, where you can submit a "PlayThisTrack(TrackID, pEntity);" request so that a pool of available Source Voices can pick it up and begin playing that sound?

 

What if I want to move the sound around in 3D space? Then I need to tie the current Source Voice to the position of the entity, which means a pointer from the Source Voice to the entity to get that 3D position and update the voice's location each frame. Is this the way to go?

Burnt_Fyr    1665

While I've not gotten too deep into positional audio, I would essentially use X3DAudio with my existing setup. When rendering positional audio, set your listener and emitter, calculate the DSP settings, and then apply them to the source voice before submitting buffers.

 

http://msdn.microsoft.com/en-us/library/windows/desktop/ee415798%28v=vs.85%29.aspx
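To make that concrete, here is a rough per-frame sketch along the lines of that documentation (it assumes a mono source voice, a stereo output, and that the X3DAUDIO_HANDLE was already filled once at startup with X3DAudioInitialize( channelMask, X3DAUDIO_SPEED_OF_SOUND, x3dInstance ); the function and parameter names are made up):

#include <xaudio2.h>
#include <x3daudio.h>

// Per-frame positional update for one emitting entity.
void Update3DSound( X3DAUDIO_HANDLE x3dInstance,
                    const X3DAUDIO_LISTENER& listener,   // camera position/orientation
                    const X3DAUDIO_VECTOR& entityPos,    // the emitting entity
                    IXAudio2SourceVoice* pSourceVoice,
                    IXAudio2MasteringVoice* pMasterVoice )
{
    X3DAUDIO_EMITTER emitter = {0};
    emitter.Position            = entityPos;
    emitter.ChannelCount        = 1;
    emitter.CurveDistanceScaler = 1.0f;

    float matrix[2] = {0};             // 1 source channel x 2 output channels
    X3DAUDIO_DSP_SETTINGS dsp = {0};
    dsp.SrcChannelCount     = 1;
    dsp.DstChannelCount     = 2;
    dsp.pMatrixCoefficients = matrix;

    X3DAudioCalculate( x3dInstance, &listener, &emitter,
                       X3DAUDIO_CALCULATE_MATRIX | X3DAUDIO_CALCULATE_DOPPLER,
                       &dsp );

    // Volume panning plus a pitch shift for doppler.
    pSourceVoice->SetOutputMatrix( pMasterVoice, 1, 2, dsp.pMatrixCoefficients );
    pSourceVoice->SetFrequencyRatio( dsp.DopplerFactor );
}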

Tispe    1468

The IXAudio2::CreateSourceVoice method requires the wave format of any buffer that will play on it, such that a Source Voice can only play buffers matching the format it was created with.

 

With this compatibility constraint between buffers and voices, one must make sure to create enough voices to accommodate buffers with varying formats. This translates to many pools of voices, one pool for each wave format. This can quickly grow into a ton of Source Voices.
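Just to illustrate the kind of bookkeeping I mean (the key struct here is made up and only covers simple PCM-style formats):

#include <windows.h>
#include <xaudio2.h>
#include <map>
#include <vector>

// Hypothetical key: just enough of WAVEFORMATEX to decide whether a buffer
// can go to a voice created with that format.
struct FormatKey
{
    WORD  nChannels;
    DWORD nSamplesPerSec;
    WORD  wBitsPerSample;

    bool operator<( const FormatKey& o ) const
    {
        if( nChannels      != o.nChannels )      return nChannels      < o.nChannels;
        if( nSamplesPerSec != o.nSamplesPerSec ) return nSamplesPerSec < o.nSamplesPerSec;
        return wBitsPerSample < o.wBitsPerSample;
    }
};

// One pool of idle Source Voices per wave format.
std::map< FormatKey, std::vector<IXAudio2SourceVoice*> > voicePools;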

 

 

I have spent some time thinking about how this can be simplified in code. One option I have come up with is to create a new Source Voice every time a sound is to be played. I don't know if there are any performance issues with creating potentially up to 100 Source Voices each second and submitting buffers to them for playback, but if it can be done I would much prefer it. If I can do this, the format issue and the pool management go away.

 

One issue to think of is garbage collection. After some time, thousands of Source Voices have been created and sit unused, so there might have to be a callback or some destructor that takes care of them.

 

Is this something worth considering?
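Something like this is roughly what I picture (names are made up; note the XAudio2 docs say DestroyVoice must not be called from inside a voice callback, so the callback only flags the voice and the game thread destroys it later; it also assumes each buffer is submitted with Flags = XAUDIO2_END_OF_STREAM so OnStreamEnd fires):

#include <windows.h>
#include <xaudio2.h>
#include <vector>

// Marks a fire-and-forget voice as finished.
struct OneShotCallback : public IXAudio2VoiceCallback
{
    volatile LONG finished;
    OneShotCallback() : finished( 0 ) {}

    void STDMETHODCALLTYPE OnStreamEnd() { InterlockedExchange( &finished, 1 ); }

    // Unused notifications.
    void STDMETHODCALLTYPE OnVoiceProcessingPassStart( UINT32 ) {}
    void STDMETHODCALLTYPE OnVoiceProcessingPassEnd() {}
    void STDMETHODCALLTYPE OnBufferStart( void* ) {}
    void STDMETHODCALLTYPE OnBufferEnd( void* ) {}
    void STDMETHODCALLTYPE OnLoopEnd( void* ) {}
    void STDMETHODCALLTYPE OnVoiceError( void*, HRESULT ) {}
};

struct OneShot
{
    IXAudio2SourceVoice* voice;
    OneShotCallback*     callback;
};

std::vector<OneShot> activeOneShots;

// Called once per frame from the game thread: the "garbage collection".
void ReapFinishedVoices()
{
    for( size_t i = 0; i < activeOneShots.size(); )
    {
        if( activeOneShots[i].callback->finished )
        {
            activeOneShots[i].voice->DestroyVoice();
            delete activeOneShots[i].callback;
            activeOneShots.erase( activeOneShots.begin() + i );
        }
        else
        {
            ++i;
        }
    }
}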

Burnt_Fyr    1665

I just make sure the audio is in a format compatible with the voices ahead of time. Most data will be in CD-standard format, either mono or stereo, 16-bit 44.1 kHz. Data not in that format can be converted by your tool chain.

 

My pooled tracks are usually mono for sound effects, and I make a few stereo tracks for background music. There is really no need for anything more than this. If your sound effects are stereo, they already have positional cues embedded in the sound itself, such as mic placement.

Tispe    1468

The X3DAudio API really bothers me. It wants to be initialized just so it can calculate a volume matrix array to be applied to a voice. Is there an X3DAudioCalculate() function that does not need an X3DInstance as a parameter? Do I need to clean up the X3DInstance when I am finished with it?

 

It seems like, to get the illusion of 3D sound, X3DAudio only adjusts volume levels on each channel but does not delay the sound per channel. Is this true?

 

After the IXAudio2Voice::SetOutputMatrix method is called, do I need to keep the pLevelMatrix array around until the sound is finished playing, or can I deallocate it immediately?

Burnt_Fyr    1665

X3DAudio needs to be initialized so that its outputs will match the speaker configuration you are using, and to match the scale of units to your application. In general, yes, you need to release COM interfaces for objects.

 

For true-to-life stereo you would need two listeners, set as far apart as the character's ears. This is the technique that was used for the voices in Pixar's Monsters Inc. movie. (http://video.sina.com.cn/v/b/44064572-1604540395.html)

 

But that is overkill, IMHO. What is the delay heard between one ear and the other? 340.29 m/s is the velocity of sound at sea level, and your head is roughly 0.2 meters across, so we are looking at less than a millisecond of delay between ears. I read somewhere long ago that humans cannot, in general, discern delays shorter than about 9 ms. A much better cue for positional audio is the Doppler effect, along with the filtering effect caused by the shape of the ears. All of these, however, can be calculated by setting flags in your call to X3DAudioCalculate. The delay calculation only works with stereo speaker setups, though, as we humans are binaural beasts after all.
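Spelled out, that back-of-the-envelope figure is 0.2 m / 340.29 m/s ≈ 0.59 ms, well under that 9 ms threshold.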

 

Not having my codebase with me at the moment, I would assume that your level matrix can be cleaned up right away. According to M$

 

IXAudio2Voice::GetOutputMatrix always returns the levels most recently set by IXAudio2Voice::SetOutputMatrix. However, they may not actually be in effect yet: they only take effect the next time the audio engine runs after the IXAudio2Voice::SetOutputMatrix call (or after the corresponding IXAudio2::CommitChanges call, if IXAudio2Voice::SetOutputMatrix was called with a deferred operation ID).

 

So once SetOutputMatrix has been called, the voice has saved a copy internally, even if it has not been applied to the hardware yet.

