Decoding Audio for XAudio2 with Microsoft Media Foundation

Published January 15, 2016 by Michael Fyffe (tragiccode), posted by tragiccode

As I was learning XAudio2 I came across countless tutorials showing how to read uncompressed .wav files and feed them into an XAudio2 source voice. Worse, most of these tutorials reinvented the wheel by parsing and validating the .wav file by hand (even the MSDN sample "How to: Load Audio Data Files in XAudio2" performs such manual parsing). Reinventing the wheel is rarely a good thing, and you also might not want to ship uncompressed audio files in your game because, well... they are just too big! The .mp3 compression format reduces audio file size by roughly 10x with little noticeable degradation in sound quality. That would certainly be great for the music your games play!

Microsoft Media Foundation

Microsoft Media Foundation, as described by Microsoft, is the next-generation multimedia platform for Windows. It was introduced as a replacement for DirectShow and offers capabilities such as the following:
  1. Playing Media
  2. Transcoding Media
  3. Decoding Media
  4. Encoding Media
NOTE: I use Media to represent audio, video, or a combination of both

The Pipeline Architecture

Media Foundation is well architected and consists of many components. These components are designed to connect together like Lego pieces to produce what is known as a Media Foundation pipeline. A full Media Foundation pipeline reads a media file from some location, such as the file system, sends it through one or more optional components that can transform the media in some way, and finally sends it to a renderer that forwards the media to some output device.

The Media Foundation Source Reader

The Source Reader was introduced to allow applications to use features of Media Foundation without having to build a full MF pipeline. For example, you might want to read and possibly decode an audio file and then pass it to the XAudio2 engine for playback. A Source Reader can be thought of as a component that reads an audio file and produces media samples for your application to consume in any way you see fit.

Media Types

Media Types are used in MF to describe the format of a particular media stream, which may have come from a file on disk, for example. Your application generally uses media types to determine the format and the type of media in the stream. Objects within Media Foundation, such as the source reader, use them as well, for instance to load the correct decoder for the output media type you ask for.

Parts of a Media Type

Media Types consist of 2 parts that provide information about the type of media in a data stream. The 2 parts are described below:
  1. A Major Type
    1. The Major Type indicates the type of data (audio or video)
  2. A Sub Type
    1. The Sub Type indicates the format of the data (compressed mp3, uncompressed wav, etc)

Getting our hands dirty

With the basics out of the way, let's see how we can use Media Foundation's Source Reader to read any type of audio file (compressed or uncompressed) and extract the bytes to send to XAudio2 for playback.

First things first: before we can use Media Foundation we must load and initialize the framework within our application. This is done with a call to MFStartup(MF_VERSION). We should also be good citizens and unload it once we are done with it by calling MFShutdown(). This is a great opportunity to use the RAII idiom and create a class that handles all of this for us.

    struct MediaFoundationInitialize {
        MediaFoundationInitialize() {
            HR(MFStartup(MF_VERSION));
        }
        ~MediaFoundationInitialize() {
            HR(MFShutdown());
        }
    };

    int __stdcall wWinMain(HINSTANCE, HINSTANCE, PWSTR, int) {
        MediaFoundationInitialize mf{};
        return 0;
    }

Once Media Foundation has been initialized, the next thing we need to do is create the source reader. This is done using the MFCreateSourceReaderFromURL() factory function, which accepts the following 3 arguments.
  1. Location to the media file on disk
  2. Optional list of attributes that will configure settings that affect how the source reader operates
  3. The output parameter of the newly allocated source reader
    int __stdcall wWinMain(HINSTANCE, HINSTANCE, PWSTR, int) {
        MediaFoundationInitialize mf{};

        // Create Attribute Store
        ComPtr<IMFAttributes> sourceReaderConfiguration;
        HR(MFCreateAttributes(sourceReaderConfiguration.GetAddressOf(), 1));
        HR(sourceReaderConfiguration->SetUINT32(MF_LOW_LATENCY, true));

        // Create Source Reader
        ComPtr<IMFSourceReader> sourceReader;
        HR(MFCreateSourceReaderFromURL(
            L"C:\\Users\\TraGicCode\\Desktop\\394506-n-a--1450673416.mp3",
            sourceReaderConfiguration.Get(),
            sourceReader.GetAddressOf()));

        return 0;
    }

Notice that we set 1 attribute for our source reader:
  1. MF_LOW_LATENCY - This attribute tells the source reader that we want data as quickly as possible, for near-real-time operations
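The listings in this article wrap every call in HR(...), but the helper itself is never shown. A minimal sketch of what it might look like follows, assuming it simply throws when an HRESULT signals failure. The ComException type is hypothetical, and the non-Windows stand-ins exist only so the snippet builds anywhere; on Windows the real definitions come from <windows.h>.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

#ifdef _WIN32
#include <windows.h>  // real HRESULT, FAILED, S_OK, E_FAIL
#else
// Stand-ins (assumption) so the sketch also compiles off Windows.
using HRESULT = std::int32_t;
constexpr HRESULT S_OK = 0;
constexpr HRESULT E_FAIL = static_cast<HRESULT>(0x80004005u);
inline bool FAILED(HRESULT hr) { return hr < 0; }
#endif

// Hypothetical exception type that carries the failing HRESULT.
struct ComException : std::runtime_error {
    HRESULT result;
    explicit ComException(HRESULT hr)
        : std::runtime_error("COM call failed, HRESULT=" + std::to_string(hr)),
          result(hr) {}
};

// Throws if the call failed; otherwise does nothing.
inline void HR(HRESULT hr) {
    if (FAILED(hr)) {
        throw ComException{hr};
    }
}
```

One caveat with a throwing helper like this: destructors are implicitly noexcept, so if MFShutdown() failed inside ~MediaFoundationInitialize() the program would terminate. A production version might log the failure there instead of throwing.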
With the source reader created and attached to our media file, we can query it for the native media type of the file. This lets us do some validation, such as verifying that the file is indeed an audio file, and check whether it's compressed so that we can branch off and perform the extra work MF needs to decompress it.

    int __stdcall wWinMain(HINSTANCE, HINSTANCE, PWSTR, int) {
        MediaFoundationInitialize mf{};

        // Create Attribute Store
        ComPtr<IMFAttributes> sourceReaderConfiguration;
        HR(MFCreateAttributes(sourceReaderConfiguration.GetAddressOf(), 1));
        HR(sourceReaderConfiguration->SetUINT32(MF_LOW_LATENCY, true));

        // Create Source Reader
        ComPtr<IMFSourceReader> sourceReader;
        HR(MFCreateSourceReaderFromURL(
            L"C:\\Users\\TraGicCode\\Desktop\\394506-n-a--1450673416.mp3",
            sourceReaderConfiguration.Get(),
            sourceReader.GetAddressOf()));

        // Query information about the media file
        ComPtr<IMFMediaType> nativeMediaType;
        HR(sourceReader->GetNativeMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, 0, nativeMediaType.GetAddressOf()));

        // Check if media file is indeed an audio file
        GUID majorType{};
        HR(nativeMediaType->GetGUID(MF_MT_MAJOR_TYPE, &majorType));
        if (MFMediaType_Audio != majorType) {
            throw NotAudioFileException{};
        }

        // Check if media file is compressed or uncompressed
        // (note: this reads MF_MT_SUBTYPE, not MF_MT_MAJOR_TYPE)
        GUID subType{};
        HR(nativeMediaType->GetGUID(MF_MT_SUBTYPE, &subType));
        if (MFAudioFormat_Float == subType || MFAudioFormat_PCM == subType) {
            // Audio file is uncompressed
        } else {
            // Audio file is compressed
        }

        return 0;
    }

If the audio file happens to be compressed (for example, an .mp3 file), we need to tell the source reader that we would like it to decode the audio so that it can be sent to our audio device. This is done by creating a partial media type object and setting the major type and subtype for the kind of output we would like. When this is passed to the source reader, it searches the system for registered decoders that can perform the requested conversion.
Calling IMFSourceReader::SetCurrentMediaType() succeeds if a suitable decoder exists and fails otherwise.

    int __stdcall wWinMain(HINSTANCE, HINSTANCE, PWSTR, int) {
        MediaFoundationInitialize mf{};

        // Create Attribute Store
        ComPtr<IMFAttributes> sourceReaderConfiguration;
        HR(MFCreateAttributes(sourceReaderConfiguration.GetAddressOf(), 1));
        HR(sourceReaderConfiguration->SetUINT32(MF_LOW_LATENCY, true));

        // Create Source Reader
        ComPtr<IMFSourceReader> sourceReader;
        HR(MFCreateSourceReaderFromURL(
            L"C:\\Users\\TraGicCode\\Desktop\\394506-n-a--1450673416.mp3",
            sourceReaderConfiguration.Get(),
            sourceReader.GetAddressOf()));

        // Query information about the media file
        ComPtr<IMFMediaType> nativeMediaType;
        HR(sourceReader->GetNativeMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, 0, nativeMediaType.GetAddressOf()));

        // Check if media file is indeed an audio file
        GUID majorType{};
        HR(nativeMediaType->GetGUID(MF_MT_MAJOR_TYPE, &majorType));
        if (MFMediaType_Audio != majorType) {
            throw NotAudioFileException{};
        }

        // Check if media file is compressed or uncompressed
        GUID subType{};
        HR(nativeMediaType->GetGUID(MF_MT_SUBTYPE, &subType));
        if (MFAudioFormat_Float == subType || MFAudioFormat_PCM == subType) {
            // Audio file is uncompressed
        } else {
            // Audio file is compressed
            // Inform the SourceReader we want uncompressed data
            // This causes it to look for decoders to perform the request we are making
            ComPtr<IMFMediaType> partialType = nullptr;
            HR(MFCreateMediaType(partialType.GetAddressOf()));

            // We want Audio
            HR(partialType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Audio));
            // We want uncompressed data
            HR(partialType->SetGUID(MF_MT_SUBTYPE, MFAudioFormat_PCM));

            HR(sourceReader->SetCurrentMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, nullptr, partialType.Get()));
        }

        return 0;
    }

Now that the source reader is configured, we must next create a WAVEFORMATEX object from the source reader's current media type. This data structure essentially represents the fmt chunk of a RIFF file. It is needed so that XAudio2, or more generally anything that wants to play the audio, knows the format of the data, including the rate at which playback should happen. This is done by calling MFCreateWaveFormatExFromMFMediaType(), which is a free function rather than a method of IMFSourceReader. As used here, it takes the following 3 parameters (a fourth, optional flags parameter is left at its default):
  1. The current media type of the source reader
  2. The address of a WAVEFORMATEX pointer that receives a structure allocated by the function (free it with CoTaskMemFree() when you are done with it)
  3. The address of an unsigned int that will be filled in with the size, in bytes, of the above struct
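To make the role of this structure concrete, here is a small sketch of the arithmetic a player can do with its fields. PcmFormat is a hypothetical stand-in that mirrors a few WAVEFORMATEX field names so the snippet compiles without the Windows headers; the relationships shown are the ones that hold for uncompressed PCM data.

```cpp
#include <cstdint>

// Hypothetical stand-in mirroring a few WAVEFORMATEX fields (not the real struct).
struct PcmFormat {
    std::uint16_t nChannels;       // e.g. 2 for stereo
    std::uint32_t nSamplesPerSec;  // e.g. 44100
    std::uint16_t wBitsPerSample;  // e.g. 16
};

// Bytes occupied by one sample frame (one sample for every channel).
constexpr std::uint32_t BlockAlign(const PcmFormat& f) {
    return f.nChannels * (f.wBitsPerSample / 8u);
}

// Bytes consumed per second of playback.
constexpr std::uint32_t AvgBytesPerSec(const PcmFormat& f) {
    return f.nSamplesPerSec * BlockAlign(f);
}

// Playback length, in seconds, of a decoded buffer of the given size.
constexpr double DurationSeconds(const PcmFormat& f, std::uint64_t byteCount) {
    return static_cast<double>(byteCount) / AvgBytesPerSec(f);
}
```

For CD-quality stereo (2 channels, 44,100 Hz, 16 bits) this gives 4 bytes per frame and 176,400 bytes per second, so a three-minute song decodes to roughly 32 MB of PCM data. That is the size problem the compressed file avoids on disk.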
    int __stdcall wWinMain(HINSTANCE, HINSTANCE, PWSTR, int) {
        MediaFoundationInitialize mf{};

        // Create Attribute Store
        ComPtr<IMFAttributes> sourceReaderConfiguration;
        HR(MFCreateAttributes(sourceReaderConfiguration.GetAddressOf(), 1));
        HR(sourceReaderConfiguration->SetUINT32(MF_LOW_LATENCY, true));

        // Create Source Reader
        ComPtr<IMFSourceReader> sourceReader;
        HR(MFCreateSourceReaderFromURL(
            L"C:\\Users\\TraGicCode\\Desktop\\394506-n-a--1450673416.mp3",
            sourceReaderConfiguration.Get(),
            sourceReader.GetAddressOf()));

        // Query information about the media file
        ComPtr<IMFMediaType> nativeMediaType;
        HR(sourceReader->GetNativeMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, 0, nativeMediaType.GetAddressOf()));

        // Check if media file is indeed an audio file
        GUID majorType{};
        HR(nativeMediaType->GetGUID(MF_MT_MAJOR_TYPE, &majorType));
        if (MFMediaType_Audio != majorType) {
            throw NotAudioFileException{};
        }

        // Check if media file is compressed or uncompressed
        GUID subType{};
        HR(nativeMediaType->GetGUID(MF_MT_SUBTYPE, &subType));
        if (MFAudioFormat_Float == subType || MFAudioFormat_PCM == subType) {
            // Audio file is uncompressed
        } else {
            // Audio file is compressed
            // Inform the SourceReader we want uncompressed data
            // This causes it to look for decoders to perform the request we are making
            ComPtr<IMFMediaType> partialType = nullptr;
            HR(MFCreateMediaType(partialType.GetAddressOf()));

            // We want Audio
            HR(partialType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Audio));
            // We want uncompressed data
            HR(partialType->SetGUID(MF_MT_SUBTYPE, MFAudioFormat_PCM));

            HR(sourceReader->SetCurrentMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, nullptr, partialType.Get()));
        }

        ComPtr<IMFMediaType> uncompressedAudioType = nullptr;
        HR(sourceReader->GetCurrentMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, uncompressedAudioType.GetAddressOf()));

        // The function allocates the WAVEFORMATEX for us
        WAVEFORMATEX* waveformatex = nullptr;
        unsigned int waveformatlength = 0;
        HR(MFCreateWaveFormatExFromMFMediaType(uncompressedAudioType.Get(), &waveformatex, &waveformatlength));

        // Release the structure allocated by MFCreateWaveFormatExFromMFMediaType
        CoTaskMemFree(waveformatex);

        return 0;
    }

Lastly, we synchronously read all of the audio from the file and store it in a vector. NOTE: In production software you would definitely not want to synchronously read the whole file into memory. This is only meant for this example.

    int __stdcall wWinMain(HINSTANCE, HINSTANCE, PWSTR, int) {
        MediaFoundationInitialize mf{};

        // Create Attribute Store
        ComPtr<IMFAttributes> sourceReaderConfiguration;
        HR(MFCreateAttributes(sourceReaderConfiguration.GetAddressOf(), 1));
        HR(sourceReaderConfiguration->SetUINT32(MF_LOW_LATENCY, true));

        // Create Source Reader
        ComPtr<IMFSourceReader> sourceReader;
        HR(MFCreateSourceReaderFromURL(
            L"C:\\Users\\TraGicCode\\Desktop\\394506-n-a--1450673416.mp3",
            sourceReaderConfiguration.Get(),
            sourceReader.GetAddressOf()));

        // Query information about the media file
        ComPtr<IMFMediaType> nativeMediaType;
        HR(sourceReader->GetNativeMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, 0, nativeMediaType.GetAddressOf()));

        // Check if media file is indeed an audio file
        GUID majorType{};
        HR(nativeMediaType->GetGUID(MF_MT_MAJOR_TYPE, &majorType));
        if (MFMediaType_Audio != majorType) {
            throw NotAudioFileException{};
        }

        // Check if media file is compressed or uncompressed
        GUID subType{};
        HR(nativeMediaType->GetGUID(MF_MT_SUBTYPE, &subType));
        if (MFAudioFormat_Float == subType || MFAudioFormat_PCM == subType) {
            // Audio file is uncompressed
        } else {
            // Audio file is compressed
            // Inform the SourceReader we want uncompressed data
            // This causes it to look for decoders to perform the request we are making
            ComPtr<IMFMediaType> partialType = nullptr;
            HR(MFCreateMediaType(partialType.GetAddressOf()));

            // We want Audio
            HR(partialType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Audio));
            // We want uncompressed data
            HR(partialType->SetGUID(MF_MT_SUBTYPE, MFAudioFormat_PCM));

            HR(sourceReader->SetCurrentMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, nullptr, partialType.Get()));
        }

        ComPtr<IMFMediaType> uncompressedAudioType = nullptr;
        HR(sourceReader->GetCurrentMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, uncompressedAudioType.GetAddressOf()));

        // The function allocates the WAVEFORMATEX for us
        WAVEFORMATEX* waveformatex = nullptr;
        unsigned int waveformatlength = 0;
        HR(MFCreateWaveFormatExFromMFMediaType(uncompressedAudioType.Get(), &waveformatex, &waveformatlength));

        std::vector<BYTE> bytes;

        while (true) {
            // Get Sample (declared inside the loop so each sample is released)
            ComPtr<IMFSample> sample;
            DWORD flags{};
            HR(sourceReader->ReadSample(MF_SOURCE_READER_FIRST_AUDIO_STREAM, 0, nullptr, &flags, nullptr, sample.GetAddressOf()));

            // Check for eof
            if (flags & MF_SOURCE_READERF_ENDOFSTREAM) {
                break;
            }

            // ReadSample can succeed without returning a sample (e.g. a gap in the stream)
            if (!sample) {
                continue;
            }

            // Convert data to contiguous buffer
            ComPtr<IMFMediaBuffer> buffer;
            HR(sample->ConvertToContiguousBuffer(buffer.GetAddressOf()));

            // Lock Buffer & copy to local memory
            BYTE* audioData = nullptr;
            DWORD audioDataLength{};
            HR(buffer->Lock(&audioData, nullptr, &audioDataLength));

            for (size_t i = 0; i < audioDataLength; i++) {
                bytes.push_back(*(audioData + i));
            }

            // Unlock Buffer
            HR(buffer->Unlock());
        }

        // Release the structure allocated by MFCreateWaveFormatExFromMFMediaType
        CoTaskMemFree(waveformatex);

        return 0;
    }

Now that we have the WAVEFORMATEX object and a vector holding our audio data, we are ready to send it to XAudio2 for playback!

    int __stdcall wWinMain(HINSTANCE, HINSTANCE, PWSTR, int) {
        MediaFoundationInitialize mf{};

        // Create Attribute Store
        ComPtr<IMFAttributes> sourceReaderConfiguration;
        HR(MFCreateAttributes(sourceReaderConfiguration.GetAddressOf(), 1));
        HR(sourceReaderConfiguration->SetUINT32(MF_LOW_LATENCY, true));

        // Create Source Reader
        ComPtr<IMFSourceReader> sourceReader;
        HR(MFCreateSourceReaderFromURL(
            L"C:\\Users\\TraGicCode\\Desktop\\394506-n-a--1450673416.mp3",
            sourceReaderConfiguration.Get(),
            sourceReader.GetAddressOf()));

        // Query information about the media file
        ComPtr<IMFMediaType> nativeMediaType;
        HR(sourceReader->GetNativeMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, 0, nativeMediaType.GetAddressOf()));

        // Check if media file is indeed an audio file
        GUID majorType{};
        HR(nativeMediaType->GetGUID(MF_MT_MAJOR_TYPE, &majorType));
        if (MFMediaType_Audio != majorType) {
            throw NotAudioFileException{};
        }

        // Check if media file is compressed or uncompressed
        GUID subType{};
        HR(nativeMediaType->GetGUID(MF_MT_SUBTYPE, &subType));
        if (MFAudioFormat_Float == subType || MFAudioFormat_PCM == subType) {
            // Audio file is uncompressed
        } else {
            // Audio file is compressed
            // Inform the SourceReader we want uncompressed data
            // This causes it to look for decoders to perform the request we are making
            ComPtr<IMFMediaType> partialType = nullptr;
            HR(MFCreateMediaType(partialType.GetAddressOf()));

            // We want Audio
            HR(partialType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Audio));
            // We want uncompressed data
            HR(partialType->SetGUID(MF_MT_SUBTYPE, MFAudioFormat_PCM));

            HR(sourceReader->SetCurrentMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, nullptr, partialType.Get()));
        }

        ComPtr<IMFMediaType> uncompressedAudioType = nullptr;
        HR(sourceReader->GetCurrentMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, uncompressedAudioType.GetAddressOf()));

        // The function allocates the WAVEFORMATEX for us
        WAVEFORMATEX* waveformatex = nullptr;
        unsigned int waveformatlength = 0;
        HR(MFCreateWaveFormatExFromMFMediaType(uncompressedAudioType.Get(), &waveformatex, &waveformatlength));

        std::vector<BYTE> bytes;

        while (true) {
            // Get Sample (declared inside the loop so each sample is released)
            ComPtr<IMFSample> sample;
            DWORD flags{};
            HR(sourceReader->ReadSample(MF_SOURCE_READER_FIRST_AUDIO_STREAM, 0, nullptr, &flags, nullptr, sample.GetAddressOf()));

            // Check for eof
            if (flags & MF_SOURCE_READERF_ENDOFSTREAM) {
                break;
            }

            // ReadSample can succeed without returning a sample (e.g. a gap in the stream)
            if (!sample) {
                continue;
            }

            // Convert data to contiguous buffer
            ComPtr<IMFMediaBuffer> buffer;
            HR(sample->ConvertToContiguousBuffer(buffer.GetAddressOf()));

            // Lock Buffer & copy to local memory
            BYTE* audioData = nullptr;
            DWORD audioDataLength{};
            HR(buffer->Lock(&audioData, nullptr, &audioDataLength));

            for (size_t i = 0; i < audioDataLength; i++) {
                bytes.push_back(*(audioData + i));
            }

            // Unlock Buffer
            HR(buffer->Unlock());
        }

        // Create XAudio2 stuff
        auto xAudioEngine = CreateXAudioEngine();
        auto masteringVoice = CreateMasteringVoice(xAudioEngine);
        auto sourceVoice = CreateSourceVoice(xAudioEngine, *waveformatex);

        XAUDIO2_BUFFER xAudioBuffer{};
        xAudioBuffer.AudioBytes = static_cast<UINT32>(bytes.size());
        xAudioBuffer.pAudioData = bytes.data();
        xAudioBuffer.pContext = nullptr;
        // Tell XAudio2 that no more buffers follow this one
        xAudioBuffer.Flags = XAUDIO2_END_OF_STREAM;

        sourceVoice->Start();
        HR(sourceVoice->SubmitSourceBuffer(&xAudioBuffer));

        // Sleep for some time so we can hear the song; this keeps the main thread from exiting
        // XAudio2 plays the sound on a separate audio thread :)
        Sleep(1000000);

        // Release the structure allocated by MFCreateWaveFormatExFromMFMediaType
        CoTaskMemFree(waveformatex);

        return 0;
    }

And there you have it. Not too bad if you ask me!
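The byte-by-byte push_back loop in the listing works, but the copy can also be done in one call per buffer. The sketch below shows the idea on plain memory so that it runs without Media Foundation. AppendChunk is a hypothetical helper name; in the real loop the pointer and length would be the values returned by IMFMediaBuffer::Lock().

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends `length` bytes starting at `data` to the end of `bytes`.
// A single range insert lets the vector grow at most once per chunk
// instead of checking capacity for every byte.
inline void AppendChunk(std::vector<std::uint8_t>& bytes,
                        const std::uint8_t* data,
                        std::size_t length) {
    if (data == nullptr || length == 0) {
        return;  // nothing to copy
    }
    bytes.insert(bytes.end(), data, data + length);
}
```

Inside the article's read loop the equivalent one-liner would be bytes.insert(bytes.end(), audioData, audioData + audioDataLength);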


Comments

turanszkij

Thanks for this, I will definitely check this out!

January 18, 2016 06:41 PM
dgreen02
Hey first off great article, second I just wanted to mention a few things:

1) Previously I've disregarded the .mp3 format after learning there was a fee associated with using the algorithm for any commercial purposes ? ( http://mp3licensing.com/help/developers.html ). I've not looked into this for years, but I read most of those patents are expiring soon?

2) While it is true an uncompressed .wav file will eat up more storage space, most projects are compressed before being distributed so it's mostly a non-issue with storage prices these days? In addition to the computational gain by not having to deal with the decompression process in a game (which might be necessary if using 1000s of sounds, but you suggested this is mostly for music). I usually stream the music into memory, which is an option with XAudio2.

I only mention these things because I thought a while ago how cool it would be to have .mp3 playback, letting the user drop their music/.mp3s into a folder in the game's directory...then I was informed about the fees by a publisher at the time.

Anyways this is a great article, I just wanted to point out the following, keep up the good work!

- Dan
January 20, 2016 08:18 AM
tragiccode

Awesome feedback!

January 21, 2016 03:43 PM
mr_tawan

2) While it is true an uncompressed .wav file will eat up more storage space, most projects are compressed before being distributed so it's mostly a non-issue with storage prices these days? In addition to the computational gain by not having to deal with the decompression process in a game (which might be necessary if using 1000s of sounds, but you suggested this is mostly for music). I usually stream the music into memory, which is an option with XAudio2.

I think it's probably better to ship with audio-specific compression (e.g. ... FLAC?) and do the decompression at installation. It should save some more space taken by the installer.

Also, considering most PCs nowadays have a few GB of main memory, we could also load the uncompressed PCM data into memory and stream it from there (given that there are only a few large music files loaded at the same time). Having uncompressed audio in memory would also eliminate the latency from the decoder, which is quite important for music games. This wouldn't work with mobile games, though....

January 27, 2016 04:45 PM