# Sounds Like a Hack


While the side project is shelved for now, I did discover something interesting that might be useful to other XNA users working with sound. XNA 3.0 added the SoundEffect API to bypass the complexity of XACT. Unfortunately, the sounds must still be authored ahead of time and can only be instantiated through the content pipeline... Right?

NOT SO!

This is a total unabashed hack, but it works on both PC and 360. I successfully generated a sine wave at runtime with custom loop points, and it works great (after getting the units right for the loop point that is, but more on that later).

Also, this is NOT suitable for "interactive" audio, which is to say you can't have a rolling buffer of continuously generated sound data. It almost works for that, but the gap between buffers is noticeable, and especially jarring on the 360. Here's to hoping they improve that in a future XNA release. Nevertheless, the ability to generate sound effects at runtime still provides interesting possibilities.

Anyway, down to business. The first thing that bars our way is the fact that SoundEffect has no public constructor. This can be easily remedied with the crowbar that is reflection:
```csharp
_SoundEffectCtor = typeof(SoundEffect).GetConstructor(
    BindingFlags.NonPublic | BindingFlags.Instance,
    Type.DefaultBinder,
    new Type[] { typeof(byte[]), typeof(byte[]), typeof(int), typeof(int), typeof(int) },
    null);
```

As can be seen, SoundEffect has a private constructor that takes 2 byte arrays and 3 ints. Fantastic. So... what are they?
Digging deeper with Reflector (which is a tool any .NET developer should have handy) we find that the first byte array is a WAVEFORMATEX structure, and the second byte array is the PCM data. The first 2 ints are the loop region start and the loop region length (measured in samples, NOT bytes), and the final int is the duration of the sound in milliseconds. I'm not sure why that's a parameter, since it could be computed from the wave format and the data itself, but whatever.

While most of the parameters are straightforward, we'll need to construct a WAVEFORMATEX byte by byte. Fortunately, the MSDN page for it tells us what we need to know. Eventually, I came up with this:
```csharp
#if WINDOWS
static readonly byte[] _WaveFormat = new byte[]
{ // WAVEFORMATEX, little endian
    0x01, 0x00,             // wFormatTag
    0x02, 0x00,             // nChannels
    0x44, 0xAC, 0x00, 0x00, // nSamplesPerSec
    0x10, 0xB1, 0x02, 0x00, // nAvgBytesPerSec
    0x04, 0x00,             // nBlockAlign
    0x10, 0x00,             // wBitsPerSample
    0x00, 0x00              // cbSize
};
#elif XBOX
static readonly byte[] _WaveFormat = new byte[]
{ // WAVEFORMATEX, big endian
    0x00, 0x01,             // wFormatTag
    0x00, 0x02,             // nChannels
    0x00, 0x00, 0xAC, 0x44, // nSamplesPerSec
    0x00, 0x02, 0xB1, 0x10, // nAvgBytesPerSec
    0x00, 0x04,             // nBlockAlign
    0x00, 0x10,             // wBitsPerSample
    0x00, 0x00              // cbSize
};
#endif
```

The first thing that should be apparent is that it's different for the PC and the 360. This is because the 360 is big-endian, whereas PCs are little. This also applies to the PCM data itself.

The first member is the format of the wave (0x1 for PCM). Next is the number of channels (2 for stereo). The sample rate (44100Hz in hex). Bytes per second (sample rate times atomic size). Bytes per atomic unit (two 2-byte samples). Bits per sample (16), and size of the extended data block (0 since PCM doesn't have one). This will give us a pretty standard 44.1kHz, 16-bit, stereo wave to work with. It could just as easily be made mono with the appropriate adjustments.
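If you'd rather compute the header than hand-author the bytes, the same 18-byte structure can be assembled at runtime. This is just a sketch under the same assumptions (16-bit PCM, and the helper names `BuildWaveFormat`, `WriteShort`, and `WriteInt` are mine, not XNA's):

```csharp
// Sketch: build the 18-byte WAVEFORMATEX for 16-bit PCM at runtime.
// BuildWaveFormat/WriteShort/WriteInt are hypothetical helpers, not XNA API.
static byte[] BuildWaveFormat(short channels, int sampleRate, short bitsPerSample)
{
    short blockAlign = (short)(channels * bitsPerSample / 8); // bytes per atomic unit
    int avgBytesPerSec = sampleRate * blockAlign;

    byte[] format = new byte[18];
    WriteShort(format, 0, 1);               // wFormatTag: 1 = PCM
    WriteShort(format, 2, channels);        // nChannels
    WriteInt(format, 4, sampleRate);        // nSamplesPerSec
    WriteInt(format, 8, avgBytesPerSec);    // nAvgBytesPerSec
    WriteShort(format, 12, blockAlign);     // nBlockAlign
    WriteShort(format, 14, bitsPerSample);  // wBitsPerSample
    WriteShort(format, 16, 0);              // cbSize: PCM has no extended block
    return format;
}

static void WriteShort(byte[] buffer, int offset, short value)
{
#if XBOX // the 360 is big-endian
    buffer[offset + 0] = (byte)(value >> 8);
    buffer[offset + 1] = (byte)value;
#else
    buffer[offset + 0] = (byte)value;
    buffer[offset + 1] = (byte)(value >> 8);
#endif
}

static void WriteInt(byte[] buffer, int offset, int value)
{
#if XBOX
    for (int i = 0; i < 4; i++)
        buffer[offset + i] = (byte)(value >> (8 * (3 - i)));
#else
    for (int i = 0; i < 4; i++)
        buffer[offset + i] = (byte)(value >> (8 * i));
#endif
}
```

On a PC build, `BuildWaveFormat(2, 44100, 16)` should produce exactly the little-endian byte table above.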

The next parameter is the sound data itself. This is stored as a series of 16-bit values alternating between the left and right channels. Here's a snippet that generates a sine wave:
```csharp
_WavePos = 0.0F;
float waveIncrement = MathHelper.TwoPi * 440.0F / 44100.0F;
for (int i = 0; i < _SampleData.Length; i += 4)
{
    short sample = (short)(Math.Round(Math.Sin(_WavePos) * 4000.0));
#if WINDOWS
    _SampleData[i + 0] = (byte)(sample);
    _SampleData[i + 1] = (byte)(sample >> 8);
    _SampleData[i + 2] = (byte)(sample);
    _SampleData[i + 3] = (byte)(sample >> 8);
#elif XBOX
    _SampleData[i + 0] = (byte)(sample >> 8);
    _SampleData[i + 1] = (byte)(sample);
    _SampleData[i + 2] = (byte)(sample >> 8);
    _SampleData[i + 3] = (byte)(sample);
#endif
    _WavePos += waveIncrement;
}
```

This will generate a 440Hz (A) tone. Again notice the endian difference, and how the 16-bit sample is sliced into 2 bytes for placement into the array. It's written to the array twice so that the tone will sound in both channels.

Next we have the loop region. The loopStart is the inclusive sample offset of the beginning of the loop, and loopStart + loopLength is the exclusive ending sample. In this context, a "sample" includes both the left and right channel values, so really a 4-byte atomic block. If you pass in values measured in bytes, playback will run past the end of your sound and the app will die a sudden and painful death.
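Since the units are what tripped me up, it helps to do the byte-to-frame conversion explicitly. A minimal sketch, assuming 16-bit stereo (4 bytes per frame; the names here are mine):

```csharp
// Loop-point units: the constructor wants sample frames, not bytes.
// For 16-bit stereo, one frame is 4 bytes (2 channels * 2 bytes each).
const int BytesPerFrame = 4;

static int BytesToFrames(int byteOffset)
{
    return byteOffset / BytesPerFrame;
}

// e.g. a one-second stereo buffer is 176400 bytes, which is 44100 frames,
// so looping the whole thing means loopStart = 0, loopLength = 44100.
```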

Finally, the duration parameter. I just calculate the length of the sound in milliseconds and pass it in (soundData.Length * 250 / 44100, which is just bytes divided by the 4-byte frame size, divided by 44100 frames per second, times 1000ms). I'm not sure if this parameter actually has an effect on anything, but it's still prudent to set it.

Once you have all this, you can just invoke the constructor and supply your arguments, and you should get a nice new SoundEffect from which you can spawn instances and play it just as you would with one you'd get from the content pipeline.
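For completeness, the invocation might look something like this. The variable names are mine, and the duration math assumes the 44.1kHz stereo format from above:

```csharp
// Invoke the private SoundEffect constructor obtained via reflection above.
int loopStart = 0;
int loopLength = _SampleData.Length / 4;            // sample frames, NOT bytes
int durationMs = _SampleData.Length * 250 / 44100;  // bytes / 4 / 44100 * 1000

SoundEffect effect = (SoundEffect)_SoundEffectCtor.Invoke(new object[]
{
    _WaveFormat,   // the WAVEFORMATEX bytes
    _SampleData,   // the raw PCM data
    loopStart,
    loopLength,
    durationMs
});

// From here, play it exactly as you would a content-pipeline SoundEffect.
```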

That about covers it. Certainly not as useful as full real-time audio would be, but I thought it was cool anyway, and hopefully it will prove useful in at least some scenarios.

This is FANTASTIC! Great work :D

Very nice! I use XNA a bit and could use something like this.

Nice hack, thanks for sharing!

Did you try modifying the sample data buffer after passing it to the SoundEffect? I guess they make a copy of it, but just in case... (anyway, we would need the sound's playing position... so as you said, it's not usable for continuous dynamic sound generation)
