[SlimDX/Xaudio2] Playing buffer from memory

6 comments, last by ArjunNair 14 years, 1 month ago
I'm trying to do something simple, and I've almost got it working, but not quite well enough. Here's the deal: I fill a sample buffer of 882 floats, and every 20 milliseconds I convert those 882 floats to a byte buffer of size 882 * 4 * 2 (4 = bytes per float, 2 = number of channels) and submit it to XAudio2 to play. The sample rate is therefore 44100 (882 * 50, since this happens 50 times a second). In another thread (I presume XAudio2 runs in a different thread?), I have a render loop that needs to execute every 20 milliseconds or so, which is also the rate at which I push buffers to the audio device. So theoretically, I should be able to push 20 ms worth of audio data to XAudio2, wait for it to finish playing and then render. Repeat ad infinitum. Here's the code I've come up with:

public class SoundManager
    {
       
        const int BUFFER_COUNT = 1;
        private XAudio2 device;
        private MasteringVoice masteringVoice;
        private SourceVoice sourceVoice;
        private WaveFormat waveFormat;
        private int bytesPerSample;
        private AudioBuffer buffer;
        float[] sampleData = new float[882];
       
        byte[] bData = new byte[882 * 4 * 2];
        int currentBuffer = 0;
        int samplePos = 0;
        private bool isPlaying = false;

       public SoundManager(IntPtr handle, short BitsPerSample, short Channels, int SamplesPerSecond)
        {
            WaveFormat format = new SlimDX.Multimedia.WaveFormat();
            format.BitsPerSample = BitsPerSample; //32
            format.Channels = Channels; //2
            format.SamplesPerSecond = SamplesPerSecond; //44100
            format.BlockAlignment = (short)(format.Channels * format.BitsPerSample / 8);
            format.AverageBytesPerSecond = format.SamplesPerSecond * format.BlockAlignment;
           
            format.FormatTag = WaveFormatTag.Pcm;  
            device = new XAudio2();
            masteringVoice = new MasteringVoice(device);
            sourceVoice = new SourceVoice(device, format);
            sourceVoice.BufferEnd += new EventHandler<ContextEventArgs>(sourceVoice_BufferEnd);
            sourceVoice.StreamEnd += new EventHandler(sourceVoice_StreamEnd);
            sourceVoice.BufferStart += new EventHandler<ContextEventArgs>(sourceVoice_BufferStart);
            //bData = new byte[sampleData.Length * 4 * Channels];
            buffer = new AudioBuffer();
            buffer.AudioData = new System.IO.MemoryStream(); 
            
            sourceVoice.SubmitSourceBuffer(buffer);
            waveFormat = format;
            bytesPerSample = waveFormat.BitsPerSample / 8;
            
        }

       void sourceVoice_BufferStart(object sender, ContextEventArgs e)
       {
            isPlaying = true;
       }

       void sourceVoice_StreamEnd(object sender, EventArgs e)
       {
           //isPlaying = false;
       }

       void sourceVoice_BufferEnd(object sender, ContextEventArgs e)
       {
          isPlaying = false;
          //buffer.Dispose();
       }

       public void Play() { sourceVoice.Start(); }
       public void Stop() { sourceVoice.Stop(); }
  
       //soundOut is sampled data.
       public void AddSample(float soundOut)
       {
         
           if (sourceVoice.State.BuffersQueued < 1 )
           {
               if (samplePos < 882)
               {
                   sampleData[samplePos++] = soundOut;
                   return;
               }
              
               for (int i = 0, j = 0; j < samplePos; i += 8, j++)
               {
                   float data = (float)sampleData[j];
                   byte[] tmp = System.BitConverter.GetBytes(data);
                   
                   // Left channel (bytes i .. i + 3)
                   bData[i] = tmp[0];
                   bData[i + 1] = tmp[1];
                   bData[i + 2] = tmp[2];
                   bData[i + 3] = tmp[3];
                   // Right channel (bytes i + 4 .. i + 7), same sample
                   bData[i + 4] = tmp[0];
                   bData[i + 5] = tmp[1];
                   bData[i + 6] = tmp[2];
                   bData[i + 7] = tmp[3];
               }
               
               buffer.AudioData.SetLength(0);
               buffer.AudioData.Write(bData, 0, bData.Length);
               buffer.AudioData.Position = 0;
               
               buffer.AudioBytes = bData.Length;
               buffer.Flags = BufferFlags.EndOfStream; 
               sourceVoice.SubmitSourceBuffer(buffer);
               samplePos = 0;
               sampleData[samplePos++] = soundOut;
           }        
       }

       public bool FinishedPlaying()
       {
           return !isPlaying;
       }

    }


In the main loop, I simply block till buffer finishes playing:
while (!sm.FinishedPlaying())
{
}

My expectation was that a buffer of 882 * 4 * 2 bytes would take 20 ms to play, after which the BufferEnd event would set isPlaying to false, so my render loop would be synchronized to ~50 frames per second.

Unfortunately, I find that the frame rate is much greater than 50 and both sound and video fairly whizz along! 

So then, could someone point out where I've gone wrong in my assumptions or in my code and how to fix it?
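
For reference, the 20 ms figure above can be checked with a little arithmetic. This is only a sketch: BufferTiming and its method names are hypothetical helpers, not part of SlimDX or XAudio2, and it assumes uncompressed PCM where every frame holds one sample per channel.

```csharp
using System;

// Hypothetical helper (not a SlimDX type): relates a buffer's size in bytes to its playing time.
public static class BufferTiming
{
    // Bytes in one frame, i.e. one sample for every channel.
    public static int BlockAlign(int channels, int bitsPerSample)
    {
        return channels * bitsPerSample / 8;
    }

    // Playing time in milliseconds of a buffer that holds byteCount bytes.
    public static double DurationMs(int byteCount, int sampleRate, int channels, int bitsPerSample)
    {
        int frames = byteCount / BlockAlign(channels, bitsPerSample);
        return 1000.0 * frames / sampleRate;
    }
}
```

With the numbers from this post, BufferTiming.DurationMs(882 * 4 * 2, 44100, 2, 32) works out to 20 ms, so the buffer size itself is consistent with 50 buffers per second.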
No one? Could someone at least point me to an example that shows how the BufferEnd event works with streaming buffers?
One obvious thing: if you are passing float data you need to use WaveFormatTag.IeeeFloat and not WaveFormatTag.Pcm. Also, don't set EndOfStream on your buffer unless it is the last buffer submitted.

If I understand correctly from looking at the source, SlimDX handles the AudioData property of the buffer differently depending on what kind of stream it is. I have had the best luck using SlimDX.DataStream (instead of MemoryStream).

When I do this I typically use two or three buffers in rotation. I just use the buffer end event to release a wait handle on another thread, which then fills and submits any available buffers. There is a way to pass an ID of sorts through the ContextEventArgs (sorry, I don't have the details; my code is not available right now). You need to be careful not to touch any buffers that are in use, and also to make sure that there is always some data submitted to the voice.
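
A rough sketch of the rotation idea, with no SlimDX calls in it at all: BufferRing, Acquire and OnBufferEnd are hypothetical names, and OnBufferEnd only stands in for whatever you do inside the voice's buffer end handler. It assumes a single thread fills buffers and that buffers finish playing in the order they were submitted.

```csharp
using System;
using System.Threading;

// Hypothetical sketch: a small pool of byte buffers used in rotation.
public sealed class BufferRing
{
    private readonly byte[][] buffers;
    private readonly SemaphoreSlim free;   // counts buffers that are not queued on the voice
    private int next;                      // index of the next buffer to hand out

    public BufferRing(int count, int bytesPerBuffer)
    {
        buffers = new byte[count][];
        for (int i = 0; i < count; i++)
        {
            buffers[i] = new byte[bytesPerBuffer];
        }
        free = new SemaphoreSlim(count, count);
    }

    // Number of buffers that can be acquired right now without blocking.
    public int FreeCount
    {
        get { return free.CurrentCount; }
    }

    // Blocks until a buffer is free, then returns it for filling and submitting.
    // Only one thread should call this, because 'next' is not synchronized.
    public byte[] Acquire(out int index)
    {
        free.Wait();
        index = next;
        next = (next + 1) % buffers.Length;
        return buffers[index];
    }

    // Call once per finished buffer (e.g. from the buffer end handler).
    // Because buffers complete in submission order, the oldest one becomes reusable.
    public void OnBufferEnd()
    {
        free.Release();
    }
}
```

The filling thread would call Acquire, write its samples, and submit; with two or three buffers in the ring there is always something queued while the next one is being filled. The index returned by Acquire is the sort of ID that could be passed along as the buffer's context.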

hope this helps
turnpast, thanks for the suggestions. A bit of poking around later:

* Changing the WaveFormatTag from PCM to IeeeFloats didn't really change anything apart from making the sound slightly louder?

* Didn't succeed in creating a DataStream to use with the audio buffer. Kept throwing an exception. I must be creating it incorrectly but I'll be damned if I can find an example that uses DataStream for the AudioData. :)

In the end I did manage to sync video and audio by using multiple data buffers (not multiple audio buffers; I have only one AudioBuffer and recycle it) and using the buffer's Context value, passed back through ContextEventArgs, to signal which buffer has finished playing. If it's the one I expect, I let the main loop continue.

However, I can hear slight crackling and odd blips while the audio plays. Maybe using a DataStream will smooth that out too...
Quote:Original post by ArjunNair
* Changing the WaveFormatTag from PCM to IeeeFloats didn't really change anything apart from making the sound slightly louder?

Changing this should have a much more severe effect on the sound than just the volume (unless you are expecting to just hear strange noise).
When sending floating point PCM data make sure your values are in the -1 to +1 range -- exceeding the range can cause artifacts.
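
To see why the format tag matters, here is a small sketch of what the bytes of a float look like if they are read as a signed 32-bit integer sample instead. That the Pcm tag with 32 bits per sample is interpreted as integer samples is my assumption here; FormatTagDemo and its methods are hypothetical names used only for illustration.

```csharp
using System;

// Hypothetical demo: the same four bytes, read as a float and as a 32-bit integer.
public static class FormatTagDemo
{
    // Reinterprets the four bytes of a float as a signed 32-bit integer.
    public static int FloatBitsAsInt(float value)
    {
        return BitConverter.ToInt32(BitConverter.GetBytes(value), 0);
    }

    // Level of that integer relative to integer full scale (int.MaxValue).
    public static double AsFractionOfFullScale(float value)
    {
        return FloatBitsAsInt(value) / (double)int.MaxValue;
    }
}
```

For example, 0.0f stays at zero either way, while 1.0f has the bit pattern 0x3F800000, which as an integer sample sits at roughly half of full scale. Values in between do not scale linearly at all, which is why most float data turns into noise under the wrong tag.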
Quote:
* Didn't succeed in creating a DataStream to use with the audio buffer. Kept throwing an exception. I must be creating it incorrectly but I'll be damned if I can find an example that uses DataStream for the AudioData. :)

When I last looked at the SlimDX source it appeared that if you did not use a DataStream then an additional copy was made of all the data in the stream. It may not make any difference if you are just working with small buffers.

Quote:
However, I can hear slight crackling and odd blips while the audio plays. Maybe using a DataStream will smooth that out too...

Lots of things can cause this stuff and converting to DataStream probably won't fix it. Review all your sample rates and sizes and the form of your data. If you have not already, read through the unmanaged XAudio2 docs; they kinda suck, but there is so little info about this stuff out there.

Quote:Original post by turnpast
Changing this should have a much more severe effect on the sound than just the volume (unless you are expecting to just hear strange noise).
When sending floating point PCM data make sure your values are in the -1 to +1 range -- exceeding the range can cause artifacts.


My samples are all either 1's or 0's anyway, so all are within the +1 range.
I actually haven't figured out how to do a non-float PCM to be honest. I tried using a pure 2 byte array to hold the samples instead of float but couldn't get a single squeak out of the speakers! Obviously there's more to it than just the backing store datatype! lol.

Quote:
When I last looked at the SlimDX source it appeared that if you did not use a DataStream then an additional copy was made of all the data in the stream. It may not make any difference if you are just working with small buffers.


Thanks for the info. You might have just saved me from more bungling about in the dark! Also, my buffers are only 882 samples' worth, as I mentioned, so I suppose I'll stick to MemoryStream then.

Quote:
Lots of things can cause this stuff and converting to DataStream probably won't fix it. Review all your sample rates and sizes and the form of your data. If you have not already, read through the unmanaged XAudio2 docs; they kinda suck, but there is so little info about this stuff out there.



The lack of proper docs has been most frustrating, I tell you. A shame, because SlimDX and XAudio2 are easily two of the most impressive, easy to use (the docs notwithstanding!) technologies I've seen of late. I should know because I'm a DirectX to SlimDX convert. ;)

Also, shouldn't using the UserFilters parameter and choosing something other than None have *some* effect on my samples? Well, there is none whatsoever! The sound levels and clarity stay put. At least to my none-too-discerning ear...
Quote:Original post by ArjunNair
My samples are all either 1's or 0's anyway, so all are within the +1 range.

This could explain all the crackling and odd blips. I don't even want to think about your poor drivers and the seizures they must be having. :)

Quote:
I actually haven't figured out how to do a non-float PCM to be honest. I tried using a pure 2 byte array to hold the samples instead of float but couldn't get a single squeak out of the speakers!

If I recall correctly about PCM:
32 bit samples must be ieee floats in the -1 to +1 range
16 bit samples must be signed shorts in the range short.MinValue to short.MaxValue
8 bit samples must be unsigned bytes in the byte.MinValue to byte.MaxValue range.

So to convert from 32 to 16 you can probably just multiply your float value by short.MaxValue.
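
Something along these lines, as a minimal sketch: PcmConvert and FloatToInt16 are made-up names, and clamping out-of-range input (and mapping NaN to silence) is my own choice rather than anything XAudio2 requires.

```csharp
using System;

// Hypothetical helper: converts one float sample in the -1 to +1 range to signed 16-bit PCM.
public static class PcmConvert
{
    public static short FloatToInt16(float sample)
    {
        // Treat NaN as silence so the cast below is always well defined.
        if (float.IsNaN(sample)) return 0;

        // Clamp so that values outside -1..+1 cannot overflow the short range.
        if (sample > 1f) sample = 1f;
        if (sample < -1f) sample = -1f;

        return (short)Math.Round(sample * short.MaxValue);
    }
}
```

Scaling by short.MaxValue maps -1 to -32767 rather than short.MinValue, which wastes one step of range but keeps the conversion symmetric around zero.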

Quote:
Also, shouldn't using the UserFilters parameter and choosing something other than None have *some* effect on my samples? Well, there is none whatsoever! The sound levels and clarity stay put. At least to my none-too-discerning ear...


The filters you can apply to a voice are all frequency related (high-pass, low-pass etc) and I have never tried to use them. What do you expect to hear?
Quote:Original post by turnpast
If I recall correctly about PCM:
32 bit samples must be ieee floats in the -1 to +1 range
16 bit samples must be signed shorts in the range short.MinValue to short.MaxValue
8 bit samples must be unsigned bytes in the byte.MinValue to byte.MaxValue range.


Cheers! Managed to convert the data to 16 bit PCM's using shorts with the above bit of info and also smoothed out the inputs to be more discrete than just 1's or 0's. ;) Sound quality is much better now although I can still make out very very small "dips" in the sound here and there. Probably either not accurate enough sample conversion or buffer under-runs. Will have to investigate.


Quote:
The filters you can apply to a voice are all frequency related (high-pass, low-pass etc) and I have never tried to use them. What do you expect to hear?


Not sure. I was hoping they'll cut out some of the "noise" I was hearing in my samples. But it looks like it's my own sample conversion that's at fault rather than anything else.

This is the bit of code I've ended up with eventually:

       //soundOut is an averaged sample value in the range 0 to 1.
       public void AddSample(float soundOut)
       {
           if (samplePos >= SAMPLE_SIZE)
           {
               playBuffer = currentBuffer;
               currentBuffer = (currentBuffer + 1) % BUFFER_COUNT;
               samplePos = 0;
               readyToPlay = true;
               if (!isMute)
                   PlayBuffer();
           }
           sampleData[currentBuffer][samplePos++] = soundOut;
       }

       public void PlayBuffer()
       {
           if (sourceVoice.State.BuffersQueued < BUFFER_COUNT)
           {
               //Note: four bytes are written per step, so bytesPerSample must be the size
               //of one stereo frame (2 channels * 2 bytes) for the writes not to overlap.
               for (int i = 0, j = 0; j < SAMPLE_SIZE; i += bytesPerSample, j++)
               {
                   short data = (short)((sampleData[playBuffer][j] - 0.5f) * 2 * (short.MaxValue - 1)); //Normalizes the sample data
                   byte[] tmp = System.BitConverter.GetBytes(data);
                   //Left channel
                   bData[playBuffer][i] = tmp[0];
                   bData[playBuffer][i + 1] = tmp[1];
                   //Right channel, same sample
                   bData[playBuffer][i + 2] = tmp[0];
                   bData[playBuffer][i + 3] = tmp[1];
               }
               buffer.AudioData.SetLength(0);
               buffer.AudioData.Write(bData[playBuffer], 0, bData[playBuffer].Length);
               buffer.AudioData.Position = 0;

               buffer.AudioBytes = bData[playBuffer].Length;
               buffer.Flags = BufferFlags.None;
               buffer.Context = (IntPtr)playBuffer;
               sourceVoice.SubmitSourceBuffer(buffer);

               readyToPlay = false;
               submittedBuffers++;
               //samplePos = 0;
           }
       }
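
For comparison, the per-sample BitConverter calls in the conversion loop could be replaced by filling a short array and copying it to bytes in one go. This is just a sketch of that alternative: StereoPacker and Pack are hypothetical names, it uses the same 0 to 1 mapping as the code above, and it assumes the input really stays inside 0 to 1 (anything outside would overflow the cast).

```csharp
using System;

// Hypothetical helper: packs mono samples in the 0..1 range into interleaved 16-bit stereo bytes.
public static class StereoPacker
{
    public static byte[] Pack(float[] mono)
    {
        // Two shorts per input sample: left and right carry the same value.
        short[] frames = new short[mono.Length * 2];
        for (int j = 0; j < mono.Length; j++)
        {
            short s = (short)((mono[j] - 0.5f) * 2f * (short.MaxValue - 1));
            frames[2 * j] = s;       // left
            frames[2 * j + 1] = s;   // right
        }

        // Copy the shorts to bytes in the machine's native byte order,
        // the same order BitConverter.GetBytes would produce.
        byte[] bytes = new byte[frames.Length * sizeof(short)];
        Buffer.BlockCopy(frames, 0, bytes, 0, bytes.Length);
        return bytes;
    }
}
```

For 882 input samples this returns 882 * 2 * 2 = 3528 bytes, i.e. 20 ms of 16-bit stereo at 44100 Hz.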
