  • 01/15/16 07:24 PM

    Decoding Audio for XAudio2 with Microsoft Media Foundation

    General and Gameplay Programming

    tragiccode

    As I was learning XAudio2 I came across countless tutorials showing how to read in uncompressed .wav files and feed them into an XAudio2 source voice. Worse, most of these tutorials reinvented the wheel by parsing and validating the .wav file by hand (even the MSDN sample "How to: Load Audio Data Files in XAudio2" performs such manual parsing). Reinventing the wheel is rarely a good thing, and you also might not want to use uncompressed audio files in your game because, well... they are just too big! The .mp3 compression format reduces audio file size by roughly 10x, usually without a noticeable loss in sound quality. This would certainly be great for the music your games play!
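
    To put a rough number on that size difference, here is a small back-of-the-envelope sketch. The 3-minute track length, the CD-quality PCM settings (44.1 kHz, 16-bit, stereo), and the 128 kbit/s MP3 bitrate are assumptions chosen for illustration, and the function names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Bytes of uncompressed PCM audio:
// sample rate * channels * bytes per sample * duration in seconds.
inline std::uint64_t pcmBytes(std::uint32_t sampleRate, std::uint32_t channels,
                              std::uint32_t bitsPerSample, std::uint32_t seconds)
{
    assert(bitsPerSample % 8 == 0); // whole bytes per sample only
    return std::uint64_t{sampleRate} * channels * (bitsPerSample / 8) * seconds;
}

// Bytes of a constant-bitrate compressed stream:
// (bits per second / 8) * duration in seconds.
inline std::uint64_t cbrBytes(std::uint32_t bitsPerSecond, std::uint32_t seconds)
{
    return std::uint64_t{bitsPerSecond} / 8 * seconds;
}

// Assumed example values:
//   pcmBytes(44100, 2, 16, 180) == 31,752,000 bytes (about 31.8 MB)
//   cbrBytes(128000, 180)       ==  2,880,000 bytes (about 2.9 MB)
```

    With these assumed numbers the ratio works out to roughly 11x. Other bitrates and sample formats will give different ratios.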

    Microsoft Media Foundation

    Microsoft Media Foundation, as described by Microsoft, is the next-generation multimedia platform for Windows. It was introduced as a replacement for DirectShow and offers capabilities such as the following:
    1. Playing Media
    2. Transcoding Media
    3. Decoding Media
    4. Encoding Media
    NOTE: I use "media" to mean audio, video, or a combination of both.

    The Pipeline Architecture

    Media Foundation is well architected and consists of many components. These components are designed to connect together like Lego pieces to produce what is known as a Media Foundation pipeline. A full pipeline reads a media file from some location, such as the file system, sends it through one or more optional components that can transform the media in some way, and finally sends it to a renderer that forwards the media to an output device.
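
    The source, transform, and renderer stages can be pictured with a tiny sketch in plain C++. None of these types are Media Foundation types: Sample, Source, Transform, Sink, and runPipeline are hypothetical names used only to illustrate how the pieces connect.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-ins, not Media Foundation interfaces.
using Sample    = std::vector<float>;
using Source    = std::function<bool(Sample&)>;        // fills a sample, returns false at end of stream
using Transform = std::function<void(Sample&)>;        // modifies a sample in place
using Sink      = std::function<void(const Sample&)>;  // consumes (for example renders) a sample

// Pull samples from the source, run each through every transform in order,
// and hand the result to the sink. Returns how many samples were delivered.
inline std::size_t runPipeline(const Source& source,
                               const std::vector<Transform>& transforms,
                               const Sink& sink)
{
    assert(source && sink); // both ends of the pipeline are required
    std::size_t delivered = 0;
    Sample sample;
    while (source(sample)) {
        for (const auto& transform : transforms) {
            transform(sample);
        }
        sink(sample);
        ++delivered;
    }
    return delivered;
}
```

    The transform list may be empty, which matches the idea that the middle components are optional.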

    The Media Foundation Source Reader

    The Source Reader was introduced to allow applications to use features of Media Foundation without having to build a full MF pipeline. For example, you might want to read and possibly decode an audio file and then pass it to the XAudio2 engine for playback. A Source Reader can be thought of as a component that reads a media file and produces media samples for your application to consume in any way you see fit.

    Media Types

    Media types are used in MF to describe the format of a particular media stream, for example one read from a file. Your application generally uses media types to determine the format and the type of media in the stream. Objects within Media Foundation, such as the Source Reader, use them as well, for example to load the correct decoder for the output media type you request.

    Parts of a Media Type

    Media Types consist of 2 parts that provide information about the type of media in a data stream. The 2 parts are described below:
    1. A Major Type
      1. The Major Type indicates the type of data (audio or video)
    2. A Sub Type
      1. The Sub Type indicates the format of the data (compressed MP3, uncompressed PCM, etc)
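
    As a rough mental model, a media type can be pictured as a pair of tags. The sketch below is a deliberate simplification: Media Foundation identifies both parts with GUIDs (such as MFMediaType_Audio and MFAudioFormat_PCM), whereas the enums, the MediaType struct, and the helper functions here are hypothetical and exist only for illustration.

```cpp
#include <cassert>

// Hypothetical simplification: real media types carry GUIDs, not enums.
enum class MajorType { Audio, Video };
enum class SubType { Pcm, Float, Mp3, Aac, H264 };

struct MediaType {
    MajorType major; // what kind of data the stream carries
    SubType sub;     // how that data is encoded
};

// Audio is treated as uncompressed when it is integer PCM or floating-point PCM.
inline bool isUncompressedAudio(const MediaType& type)
{
    return type.major == MajorType::Audio &&
           (type.sub == SubType::Pcm || type.sub == SubType::Float);
}

// An audio stream needs a decoder whenever it is not already uncompressed.
inline bool needsDecoder(const MediaType& type)
{
    assert(type.major == MajorType::Audio); // only meaningful for audio streams
    return !isUncompressedAudio(type);
}
```

    The same two questions (is it audio, and is it already uncompressed) are the ones the real code asks of the source reader's native media type.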

    Getting our hands dirty

    With the basics out of the way, let's now see how we can use Media Foundation's Source Reader to read in any type of audio file (compressed or uncompressed) and extract the bytes to be sent to XAudio2 for playback. First things first: before we can begin using Media Foundation we must load and initialize the framework within our application. This is done with a call to MFStartup(MF_VERSION). We should also be good citizens and unload it once we are done using it with MFShutdown(). This seems like a great opportunity to use the RAII idiom to create a class that handles all of this for us.

        struct MediaFoundationInitialize
        {
            MediaFoundationInitialize()
            {
                HR(MFStartup(MF_VERSION));
            }

            ~MediaFoundationInitialize()
            {
                HR(MFShutdown());
            }
        };

        int __stdcall wWinMain(HINSTANCE, HINSTANCE, PWSTR, int)
        {
            MediaFoundationInitialize mf{};

            return 0;
        }

    Once Media Foundation has been initialized the next thing we need to do is create the source reader. This is done using the MFCreateSourceReaderFromURL() factory function, which accepts the following 3 arguments:
    1. Location to the media file on disk
    2. Optional list of attributes that will configure settings that affect how the source reader operates
    3. The output parameter of the newly allocated source reader
        int __stdcall wWinMain(HINSTANCE, HINSTANCE, PWSTR, int)
        {
            MediaFoundationInitialize mf{};

            // Create Attribute Store
            ComPtr<IMFAttributes> sourceReaderConfiguration;
            HR(MFCreateAttributes(sourceReaderConfiguration.GetAddressOf(), 1));
            HR(sourceReaderConfiguration->SetUINT32(MF_LOW_LATENCY, true));

            // Create Source Reader
            ComPtr<IMFSourceReader> sourceReader;
            HR(MFCreateSourceReaderFromURL(
                L"C:\\Users\\TraGicCode\\Desktop\\394506-n-a--1450673416.mp3",
                sourceReaderConfiguration.Get(),
                sourceReader.GetAddressOf()));

            return 0;
        }

    Notice we set 1 attribute for our source reader:
    1. MF_LOW_LATENCY - This attribute informs the source reader that we want data as quickly as possible, for near-real-time operations
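
    The listings in this article wrap most calls in HR(...) without showing its definition. Below is a minimal sketch of what such a helper might look like, combined with the RAII idea from earlier. Everything in it is an assumption made for illustration: HResult, failed, fakeStartup, fakeShutdown, and FrameworkGuard are hypothetical stand-ins so the sketch compiles without the Windows SDK. In a real program they would correspond to HRESULT, FAILED(), MFStartup(MF_VERSION), and MFShutdown().

```cpp
#include <cassert>
#include <stdexcept>

// Stand-in so the sketch compiles anywhere; on Windows this would be HRESULT.
using HResult = long;

// Mirrors the FAILED() macro: negative values indicate failure.
inline bool failed(HResult hr) { return hr < 0; }

// Hypothetical equivalent of HR(): throw when a call reports failure.
inline void HR(HResult hr)
{
    if (failed(hr)) {
        throw std::runtime_error("call returned a failure code");
    }
}

// Hypothetical stand-ins for MFStartup / MFShutdown that only count calls.
inline int& activeFrameworks() { static int count = 0; return count; }
inline HResult fakeStartup()  { ++activeFrameworks(); return 0; }
inline HResult fakeShutdown() { --activeFrameworks(); return 0; }

// RAII guard: start in the constructor, shut down in the destructor.
class FrameworkGuard {
public:
    FrameworkGuard() { HR(fakeStartup()); }

    ~FrameworkGuard()
    {
        const HResult hr = fakeShutdown();
        assert(!failed(hr)); // report in debug builds, never throw from a destructor
        (void)hr;
    }

    // Copying would shut the framework down twice, so forbid it.
    FrameworkGuard(const FrameworkGuard&) = delete;
    FrameworkGuard& operator=(const FrameworkGuard&) = delete;
};
```

    One design choice worth noting in this sketch: the destructor does not throw, because an exception escaping a destructor during stack unwinding terminates the program.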
    With the source reader created and attached to our media file we can query it for the native media type of the file. This allows us to do some validation, such as verifying that the file is indeed an audio file, and to check whether it is compressed so that we can branch off and perform the extra work MF needs to decompress it.

        int __stdcall wWinMain(HINSTANCE, HINSTANCE, PWSTR, int)
        {
            MediaFoundationInitialize mf{};

            // Create Attribute Store
            ComPtr<IMFAttributes> sourceReaderConfiguration;
            HR(MFCreateAttributes(sourceReaderConfiguration.GetAddressOf(), 1));
            HR(sourceReaderConfiguration->SetUINT32(MF_LOW_LATENCY, true));

            // Create Source Reader
            ComPtr<IMFSourceReader> sourceReader;
            HR(MFCreateSourceReaderFromURL(
                L"C:\\Users\\TraGicCode\\Desktop\\394506-n-a--1450673416.mp3",
                sourceReaderConfiguration.Get(),
                sourceReader.GetAddressOf()));

            // Query information about the media file
            ComPtr<IMFMediaType> nativeMediaType;
            HR(sourceReader->GetNativeMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, 0, nativeMediaType.GetAddressOf()));

            // Check if media file is indeed an audio file
            GUID majorType{};
            HR(nativeMediaType->GetGUID(MF_MT_MAJOR_TYPE, &majorType));
            if (MFMediaType_Audio != majorType)
            {
                throw NotAudioFileException{};
            }

            // Check if media file is compressed or uncompressed
            GUID subType{};
            HR(nativeMediaType->GetGUID(MF_MT_SUBTYPE, &subType));
            if (MFAudioFormat_Float == subType || MFAudioFormat_PCM == subType)
            {
                // Audio file is uncompressed
            }
            else
            {
                // Audio file is compressed
            }

            return 0;
        }

    If the audio file happens to be compressed (such as when reading an .mp3 file) then we need to inform the source reader that we would like it to decode the audio so that it can be sent to our audio device. This is done by creating a partial media type object and setting the major type and subtype for the kind of output we would like. When this is passed to the source reader, it looks through the system for a registered decoder that can perform the requested conversion. Calling IMFSourceReader::SetCurrentMediaType() succeeds if such a decoder exists and fails otherwise.

        int __stdcall wWinMain(HINSTANCE, HINSTANCE, PWSTR, int)
        {
            MediaFoundationInitialize mf{};

            // Create Attribute Store
            ComPtr<IMFAttributes> sourceReaderConfiguration;
            HR(MFCreateAttributes(sourceReaderConfiguration.GetAddressOf(), 1));
            HR(sourceReaderConfiguration->SetUINT32(MF_LOW_LATENCY, true));

            // Create Source Reader
            ComPtr<IMFSourceReader> sourceReader;
            HR(MFCreateSourceReaderFromURL(
                L"C:\\Users\\TraGicCode\\Desktop\\394506-n-a--1450673416.mp3",
                sourceReaderConfiguration.Get(),
                sourceReader.GetAddressOf()));

            // Query information about the media file
            ComPtr<IMFMediaType> nativeMediaType;
            HR(sourceReader->GetNativeMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, 0, nativeMediaType.GetAddressOf()));

            // Check if media file is indeed an audio file
            GUID majorType{};
            HR(nativeMediaType->GetGUID(MF_MT_MAJOR_TYPE, &majorType));
            if (MFMediaType_Audio != majorType)
            {
                throw NotAudioFileException{};
            }

            // Check if media file is compressed or uncompressed
            GUID subType{};
            HR(nativeMediaType->GetGUID(MF_MT_SUBTYPE, &subType));
            if (MFAudioFormat_Float == subType || MFAudioFormat_PCM == subType)
            {
                // Audio file is uncompressed
            }
            else
            {
                // Audio file is compressed
                // Inform the SourceReader we want uncompressed data
                // This causes it to look for decoders to perform the request we are making
                ComPtr<IMFMediaType> partialType;
                HR(MFCreateMediaType(partialType.GetAddressOf()));

                // We want Audio
                HR(partialType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Audio));

                // We want uncompressed data
                HR(partialType->SetGUID(MF_MT_SUBTYPE, MFAudioFormat_PCM));

                HR(sourceReader->SetCurrentMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, nullptr, partialType.Get()));
            }

            return 0;
        }

    Now that we have the source reader configured we must next create a WAVEFORMATEX object from the source reader's current media type. This data structure essentially represents the fmt chunk in a RIFF file. It is needed so that XAudio2, or more generally anything that wants to play the audio, knows the format of the data and the speed at which playback should happen. This is done by calling the free function MFCreateWaveFormatExFromMFMediaType(). It takes the following 3 parameters (plus an optional flags argument that we leave at its default):
    1. The Current Media Type of the Source Reader
    2. The address of a WAVEFORMATEX pointer that receives a structure allocated by the function (release it with CoTaskMemFree() when done)
    3. The address of an unsigned int that will be filled in with the size of the above struct
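
    For uncompressed PCM, the fields of that structure are related by simple arithmetic, and this is what tells a player how fast to consume the data. The sketch below uses a hypothetical PcmFormat struct that mirrors only the relevant WAVEFORMATEX fields (the real structure comes from the Windows SDK), so it can be compiled anywhere. The makePcmFormat helper is likewise made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the PCM-relevant WAVEFORMATEX fields.
struct PcmFormat {
    std::uint16_t channels;       // nChannels
    std::uint32_t samplesPerSec;  // nSamplesPerSec
    std::uint16_t bitsPerSample;  // wBitsPerSample
    std::uint16_t blockAlign;     // nBlockAlign: bytes per frame (one sample for every channel)
    std::uint32_t avgBytesPerSec; // nAvgBytesPerSec: bytes consumed per second of playback
};

// Derive the dependent fields the way they are defined for PCM:
//   blockAlign     = channels * bitsPerSample / 8
//   avgBytesPerSec = samplesPerSec * blockAlign
inline PcmFormat makePcmFormat(std::uint16_t channels,
                               std::uint32_t samplesPerSec,
                               std::uint16_t bitsPerSample)
{
    assert(channels > 0 && bitsPerSample % 8 == 0);
    PcmFormat format{};
    format.channels = channels;
    format.samplesPerSec = samplesPerSec;
    format.bitsPerSample = bitsPerSample;
    format.blockAlign = static_cast<std::uint16_t>(channels * bitsPerSample / 8);
    format.avgBytesPerSec = samplesPerSec * format.blockAlign;
    return format;
}
```

    For example, 16-bit stereo at 44,100 Hz gives a block alignment of 4 bytes and a rate of 176,400 bytes per second.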
        int __stdcall wWinMain(HINSTANCE, HINSTANCE, PWSTR, int)
        {
            MediaFoundationInitialize mf{};

            // Create Attribute Store
            ComPtr<IMFAttributes> sourceReaderConfiguration;
            HR(MFCreateAttributes(sourceReaderConfiguration.GetAddressOf(), 1));
            HR(sourceReaderConfiguration->SetUINT32(MF_LOW_LATENCY, true));

            // Create Source Reader
            ComPtr<IMFSourceReader> sourceReader;
            HR(MFCreateSourceReaderFromURL(
                L"C:\\Users\\TraGicCode\\Desktop\\394506-n-a--1450673416.mp3",
                sourceReaderConfiguration.Get(),
                sourceReader.GetAddressOf()));

            // Query information about the media file
            ComPtr<IMFMediaType> nativeMediaType;
            HR(sourceReader->GetNativeMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, 0, nativeMediaType.GetAddressOf()));

            // Check if media file is indeed an audio file
            GUID majorType{};
            HR(nativeMediaType->GetGUID(MF_MT_MAJOR_TYPE, &majorType));
            if (MFMediaType_Audio != majorType)
            {
                throw NotAudioFileException{};
            }

            // Check if media file is compressed or uncompressed
            GUID subType{};
            HR(nativeMediaType->GetGUID(MF_MT_SUBTYPE, &subType));
            if (MFAudioFormat_Float == subType || MFAudioFormat_PCM == subType)
            {
                // Audio file is uncompressed
            }
            else
            {
                // Audio file is compressed
                // Inform the SourceReader we want uncompressed data
                // This causes it to look for decoders to perform the request we are making
                ComPtr<IMFMediaType> partialType;
                HR(MFCreateMediaType(partialType.GetAddressOf()));

                // We want Audio
                HR(partialType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Audio));

                // We want uncompressed data
                HR(partialType->SetGUID(MF_MT_SUBTYPE, MFAudioFormat_PCM));

                HR(sourceReader->SetCurrentMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, nullptr, partialType.Get()));
            }

            ComPtr<IMFMediaType> uncompressedAudioType;
            HR(sourceReader->GetCurrentMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, uncompressedAudioType.GetAddressOf()));

            WAVEFORMATEX* waveformatex = nullptr;
            UINT32 waveformatlength = 0;
            HR(MFCreateWaveFormatExFromMFMediaType(uncompressedAudioType.Get(), &waveformatex, &waveformatlength));

            // The structure was allocated by Media Foundation; release it when done
            CoTaskMemFree(waveformatex);

            return 0;
        }

    Lastly, we synchronously read all the audio from the file and store it in a vector. NOTE: In production software you would definitely not want to synchronously read the whole file into memory. This is only meant for this example.

        int __stdcall wWinMain(HINSTANCE, HINSTANCE, PWSTR, int)
        {
            MediaFoundationInitialize mf{};

            // Create Attribute Store
            ComPtr<IMFAttributes> sourceReaderConfiguration;
            HR(MFCreateAttributes(sourceReaderConfiguration.GetAddressOf(), 1));
            HR(sourceReaderConfiguration->SetUINT32(MF_LOW_LATENCY, true));

            // Create Source Reader
            ComPtr<IMFSourceReader> sourceReader;
            HR(MFCreateSourceReaderFromURL(
                L"C:\\Users\\TraGicCode\\Desktop\\394506-n-a--1450673416.mp3",
                sourceReaderConfiguration.Get(),
                sourceReader.GetAddressOf()));

            // Query information about the media file
            ComPtr<IMFMediaType> nativeMediaType;
            HR(sourceReader->GetNativeMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, 0, nativeMediaType.GetAddressOf()));

            // Check if media file is indeed an audio file
            GUID majorType{};
            HR(nativeMediaType->GetGUID(MF_MT_MAJOR_TYPE, &majorType));
            if (MFMediaType_Audio != majorType)
            {
                throw NotAudioFileException{};
            }

            // Check if media file is compressed or uncompressed
            GUID subType{};
            HR(nativeMediaType->GetGUID(MF_MT_SUBTYPE, &subType));
            if (MFAudioFormat_Float == subType || MFAudioFormat_PCM == subType)
            {
                // Audio file is uncompressed
            }
            else
            {
                // Audio file is compressed
                // Inform the SourceReader we want uncompressed data
                // This causes it to look for decoders to perform the request we are making
                ComPtr<IMFMediaType> partialType;
                HR(MFCreateMediaType(partialType.GetAddressOf()));

                // We want Audio
                HR(partialType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Audio));

                // We want uncompressed data
                HR(partialType->SetGUID(MF_MT_SUBTYPE, MFAudioFormat_PCM));

                HR(sourceReader->SetCurrentMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, nullptr, partialType.Get()));
            }

            ComPtr<IMFMediaType> uncompressedAudioType;
            HR(sourceReader->GetCurrentMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, uncompressedAudioType.GetAddressOf()));

            WAVEFORMATEX* waveformatex = nullptr;
            UINT32 waveformatlength = 0;
            HR(MFCreateWaveFormatExFromMFMediaType(uncompressedAudioType.Get(), &waveformatex, &waveformatlength));

            std::vector<BYTE> bytes;

            while (true)
            {
                // Get Sample
                ComPtr<IMFSample> sample;
                DWORD flags{};
                HR(sourceReader->ReadSample(MF_SOURCE_READER_FIRST_AUDIO_STREAM, 0, nullptr, &flags, nullptr, sample.GetAddressOf()));

                // Check for eof
                if (flags & MF_SOURCE_READERF_ENDOFSTREAM)
                {
                    break;
                }

                // ReadSample can succeed without producing a sample
                if (!sample)
                {
                    continue;
                }

                // Convert data to contiguous buffer
                ComPtr<IMFMediaBuffer> buffer;
                HR(sample->ConvertToContiguousBuffer(buffer.GetAddressOf()));

                // Lock Buffer & copy to local memory
                BYTE* audioData = nullptr;
                DWORD audioDataLength{};
                HR(buffer->Lock(&audioData, nullptr, &audioDataLength));
                bytes.insert(bytes.end(), audioData, audioData + audioDataLength);

                // Unlock Buffer
                HR(buffer->Unlock());
            }

            // The structure was allocated by Media Foundation; release it when done
            CoTaskMemFree(waveformatex);

            return 0;
        }

    Now that we have the WAVEFORMATEX object and a vector holding our decoded audio, we are ready to send it to XAudio2 for playback!

        int __stdcall wWinMain(HINSTANCE, HINSTANCE, PWSTR, int)
        {
            MediaFoundationInitialize mf{};

            // Create Attribute Store
            ComPtr<IMFAttributes> sourceReaderConfiguration;
            HR(MFCreateAttributes(sourceReaderConfiguration.GetAddressOf(), 1));
            HR(sourceReaderConfiguration->SetUINT32(MF_LOW_LATENCY, true));

            // Create Source Reader
            ComPtr<IMFSourceReader> sourceReader;
            HR(MFCreateSourceReaderFromURL(
                L"C:\\Users\\TraGicCode\\Desktop\\394506-n-a--1450673416.mp3",
                sourceReaderConfiguration.Get(),
                sourceReader.GetAddressOf()));

            // Query information about the media file
            ComPtr<IMFMediaType> nativeMediaType;
            HR(sourceReader->GetNativeMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, 0, nativeMediaType.GetAddressOf()));

            // Check if media file is indeed an audio file
            GUID majorType{};
            HR(nativeMediaType->GetGUID(MF_MT_MAJOR_TYPE, &majorType));
            if (MFMediaType_Audio != majorType)
            {
                throw NotAudioFileException{};
            }

            // Check if media file is compressed or uncompressed
            GUID subType{};
            HR(nativeMediaType->GetGUID(MF_MT_SUBTYPE, &subType));
            if (MFAudioFormat_Float == subType || MFAudioFormat_PCM == subType)
            {
                // Audio file is uncompressed
            }
            else
            {
                // Audio file is compressed
                // Inform the SourceReader we want uncompressed data
                // This causes it to look for decoders to perform the request we are making
                ComPtr<IMFMediaType> partialType;
                HR(MFCreateMediaType(partialType.GetAddressOf()));

                // We want Audio
                HR(partialType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Audio));

                // We want uncompressed data
                HR(partialType->SetGUID(MF_MT_SUBTYPE, MFAudioFormat_PCM));

                HR(sourceReader->SetCurrentMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, nullptr, partialType.Get()));
            }

            ComPtr<IMFMediaType> uncompressedAudioType;
            HR(sourceReader->GetCurrentMediaType(MF_SOURCE_READER_FIRST_AUDIO_STREAM, uncompressedAudioType.GetAddressOf()));

            WAVEFORMATEX* waveformatex = nullptr;
            UINT32 waveformatlength = 0;
            HR(MFCreateWaveFormatExFromMFMediaType(uncompressedAudioType.Get(), &waveformatex, &waveformatlength));

            std::vector<BYTE> bytes;

            while (true)
            {
                // Get Sample
                ComPtr<IMFSample> sample;
                DWORD flags{};
                HR(sourceReader->ReadSample(MF_SOURCE_READER_FIRST_AUDIO_STREAM, 0, nullptr, &flags, nullptr, sample.GetAddressOf()));

                // Check for eof
                if (flags & MF_SOURCE_READERF_ENDOFSTREAM)
                {
                    break;
                }

                // ReadSample can succeed without producing a sample
                if (!sample)
                {
                    continue;
                }

                // Convert data to contiguous buffer
                ComPtr<IMFMediaBuffer> buffer;
                HR(sample->ConvertToContiguousBuffer(buffer.GetAddressOf()));

                // Lock Buffer & copy to local memory
                BYTE* audioData = nullptr;
                DWORD audioDataLength{};
                HR(buffer->Lock(&audioData, nullptr, &audioDataLength));
                bytes.insert(bytes.end(), audioData, audioData + audioDataLength);

                // Unlock Buffer
                HR(buffer->Unlock());
            }

            // Create XAudio2 stuff (these helper functions are defined elsewhere)
            auto xAudioEngine = CreateXAudioEngine();
            auto masteringVoice = CreateMasteringVoice(xAudioEngine);
            auto sourceVoice = CreateSourceVoice(xAudioEngine, *waveformatex);

            XAUDIO2_BUFFER xAudioBuffer{};
            xAudioBuffer.AudioBytes = static_cast<UINT32>(bytes.size());
            xAudioBuffer.pAudioData = bytes.data();
            xAudioBuffer.pContext = nullptr;

            sourceVoice->Start();
            HR(sourceVoice->SubmitSourceBuffer(&xAudioBuffer));

            // Sleep for some time to hear the song by preventing the main thread from exiting
            // XAudio2 plays the sound on a separate audio thread :)
            Sleep(1000000);

            // The structure was allocated by Media Foundation; release it when done
            CoTaskMemFree(waveformatex);

            return 0;
        }

    And there you have it. Not too bad if you ask me!
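
    The final listing sleeps for a fixed 1,000,000 milliseconds. As an alternative, the wait could be derived from the amount of decoded data and the byte rate stored in the wave format (the nAvgBytesPerSec field). The helper below is a hypothetical sketch of that calculation in portable C++; playbackMilliseconds is a made-up name, and it assumes uncompressed PCM data played at its normal rate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Playback length in milliseconds for PCM data with a known byte rate.
// byteCount      : total size of the decoded audio in bytes
// avgBytesPerSec : bytes consumed per second of playback (nAvgBytesPerSec)
// The result is rounded down to a whole millisecond.
inline std::uint64_t playbackMilliseconds(std::size_t byteCount,
                                          std::uint32_t avgBytesPerSec)
{
    assert(avgBytesPerSec > 0); // a zero byte rate would divide by zero
    return static_cast<std::uint64_t>(byteCount) * 1000 / avgBytesPerSec;
}
```

    In the listing above this would be called with bytes.size() and waveformatex->nAvgBytesPerSec, and the result (plus a small safety margin) could be passed to Sleep() in place of the fixed value.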





    User Feedback


    Hey first off great article, second I just wanted to mention a few things:

    1) Previously I've disregarded the .mp3 format after learning there was a fee associated with using the algorithm for any commercial purposes ? ( http://mp3licensing.com/help/developers.html ). I've not looked into this for years, but I read most of those patents are expiring soon?

    2) While it is true an uncompressed .wav file will eat up more storage space, most projects are compressed before being distributed so it's mostly a non-issue with storage prices these days? In addition to the computational gain by not having to deal with the decompression process in a game (which might be necessary if using 1000s of sounds, but you suggested this is mostly for music). I usually stream the music into memory which is an option w/XAudio2.

    I only mention these things because I thought a while ago how cool it would be to have .mp3 playback, letting the user drop their music/.mp3s into a folder in the game's directory...then I was informed about the fees by a publisher at the time.

    Anyways this is a great article, I just wanted to point out the following, keep up the good work!

    - Dan


    Awesome feedback! 


    2) While it is true an uncompressed .wav file will eat up more storage space, most projects are compressed before being distributed so it's mostly a non-issue with storage prices these days? In addition to the computational gain by not having to deal with the decompression process in a game (which might be necessary if using 1000s of sounds, but you suggested this is mostly for music). I usually stream the music into memory which is an option w/XAudio2.

    I think it's probably better to ship with audio-specific compression (e.g. FLAC?) and do the decompression at installation. It should save some more of the space taken by the installer.

     

    Also, considering most PCs nowadays have a few GB of main memory, we could also load the uncompressed PCM data into memory and stream it from there (given that only a few large music files are loaded at the same time). Having uncompressed audio in memory would also eliminate the latency from the decoder, which is quite important for music games. This wouldn't work for mobile games, though....
