ranearal

Ogg Vorbis encoding adding pop to end of the sound

I am writing an Ogg Vorbis encoder. It seems to be working pretty well, except that it adds a small pop at the end of the sound. I based the encoder on the encoder example that comes with the vorbis library, with only minor changes to suit my needs; most notably, I have the wav file in memory and it is mono. Any help on this would be greatly appreciated. I have been staring at it far too long. The sound comes out just slightly longer than the wav, so I think it has to do with how I am stopping.
BOOL SoundDat::OGGFileCompression()
{
   ogg_stream_state oggStreamState; /* take physical pages, weld into a logical
                        stream of packets */
   ogg_page         oggPage; /* one Ogg bitstream page.  Vorbis packets are inside */
   ogg_packet       oggPacket; /* one raw packet of data for decode */

   vorbis_info      vorbisInfo; /* struct that stores all the static vorbis bitstream
                        settings */
   vorbis_comment   vorbisComment; /* struct that stores all the user comments */

   vorbis_dsp_state vorbisDspState; /* central working state for the packet->PCM decoder */
   vorbis_block     vorbisBlock; /* local working space for packet->PCM decode */

   ogg_packet header;
   ogg_packet headerComment;
   ogg_packet headerCode;

   char tempFileName[MAX_PATH];

   BOOL returnVal, bEndOfStream = FALSE;

   FILE *pFile;
   float **buffer;
   unsigned int currentPCMDataLoc = 48;  //skipping wav header

   signed char readbuffer[READ*4+44]; /* out of the data segment, not the stack */
   int iDatatSize = m_WavDataSize -  48;

   vorbis_info_init(&vorbisInfo);

   returnVal = vorbis_encode_init_vbr(&vorbisInfo, m_Channels, m_Frequency, .4f);

   if(returnVal)
      return FALSE;

   vorbis_comment_init(&vorbisComment);
   vorbis_comment_add_tag(&vorbisComment,"RSOggEcoder","RSSoundDat Ogg Encoding");

   vorbis_analysis_init(&vorbisDspState,&vorbisInfo);
   vorbis_block_init(&vorbisDspState,&vorbisBlock);

   srand(time(NULL));
   ogg_stream_init(&oggStreamState,rand());

   // Writing the header
   vorbis_analysis_headerout(&vorbisDspState,&vorbisComment,&header,&headerComment,&headerCode);

   ogg_stream_packetin(&oggStreamState,&header);
   ogg_stream_packetin(&oggStreamState,&headerComment);
   ogg_stream_packetin(&oggStreamState,&headerCode);

   GetTempFileName(".", "ogt", 0, tempFileName);

   pFile = fopen(tempFileName, "wb");
   while(true)
   {
      int result = ogg_stream_flush(&oggStreamState, &oggPage);
      if(!result)
         break;
      fwrite(oggPage.header, 1, oggPage.header_len, pFile);
      fwrite(oggPage.body, 1, oggPage.body_len, pFile);
   }

   while(!bEndOfStream)
   {
      int i;
      int bytes = READ * 4;//min(READ, iDatatSize - currentPCMDataLoc);

      if(iDatatSize - currentPCMDataLoc < READ *4)
      {
         bytes = iDatatSize - currentPCMDataLoc; //READ - (currentPCMDataLoc - (iDatatSize-(READ + 96)));
      }
      if(iDatatSize < currentPCMDataLoc)
         bytes = 0;
      else
         memcpy(readbuffer, ((char*)m_pData)+currentPCMDataLoc, bytes);

      if(!bytes)
      {
         vorbis_analysis_wrote(&vorbisDspState, 0);
      }
      else
      {
         buffer = vorbis_analysis_buffer(&vorbisDspState, READ);

         int mod = m_Channels *2;
         for(i=0; i<bytes/mod; i++)
         {
            buffer[0][i]=((readbuffer[i*mod +1]<<8)|
               (0x00ff&(int)readbuffer[i*mod ]))/32768.f;
            if(m_Channels == 2)
            {
               buffer[1][i]=((readbuffer[i*mod +3]<<8)|
                  (0x00ff&(int)readbuffer[i*mod +2]))/32768.f;
            }
         }
         vorbis_analysis_wrote(&vorbisDspState, i);
      }
      while(vorbis_analysis_blockout(&vorbisDspState, &vorbisBlock) == 1)
      {
         vorbis_analysis(&vorbisBlock,NULL);
         vorbis_bitrate_addblock(&vorbisBlock);

         while(vorbis_bitrate_flushpacket(&vorbisDspState,&oggPacket))
         {
            ogg_stream_packetin(&oggStreamState, &oggPacket);

            while(!bEndOfStream)
            {
               int result = ogg_stream_flush(&oggStreamState, &oggPage);
               if(!result)
                  break;
               fwrite(oggPage.header, 1, oggPage.header_len, pFile);
               fwrite(oggPage.body, 1, oggPage.body_len, pFile);

               if(ogg_page_eos(&oggPage))
                  bEndOfStream = TRUE;
            }
         }
      }

      currentPCMDataLoc += READ * 4;
   }

   ogg_stream_clear(&oggStreamState);
   vorbis_block_clear(&vorbisBlock);
   vorbis_dsp_clear(&vorbisDspState);
   vorbis_comment_clear(&vorbisComment);
   vorbis_info_clear(&vorbisInfo);

   // Get the data out of the temp file
   fclose(pFile);
   pFile = fopen(tempFileName, "rb");

   fseek(pFile, 0, SEEK_END);
   unsigned int length = ftell(pFile);
   fseek(pFile, 0, SEEK_SET);
   m_CompressedDataSize = length;
   m_Compressed_Data_Buffer = (unsigned char*)malloc(m_CompressedDataSize);
   fread(m_Compressed_Data_Buffer, 1, m_CompressedDataSize, pFile);

   fclose(pFile);

   return TRUE;

}
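
One thing that may be worth checking in the listing above: `vorbis_analysis_buffer` is asked for `READ` frames, but each pass copies `READ * 4` bytes. For 16-bit stereo that is exactly `READ` frames; for 16-bit mono it is twice that. The sketch below only does this arithmetic. `framesInChunk` and `kRead` are hypothetical names, the value 1024 for `READ` is assumed from the libvorbis encoder example, and nothing in this thread confirms that this mismatch is what causes the pop.

```cpp
#include <cstddef>

// Hypothetical helper: number of sample frames contained in `bytes` bytes
// of interleaved 16-bit PCM with the given channel count.
constexpr std::size_t framesInChunk(std::size_t bytes, unsigned channels)
{
    return bytes / (channels * 2u);  // 2 bytes per 16-bit sample
}

// Assumed value of READ (the libvorbis encoder example defines it as 1024).
constexpr std::size_t kRead = 1024;

// Stereo: READ * 4 bytes hold exactly READ frames.
static_assert(framesInChunk(kRead * 4, 2) == kRead, "stereo chunk");
// Mono: the same byte count holds 2 * READ frames.
static_assert(framesInChunk(kRead * 4, 1) == kRead * 2, "mono chunk");
```

Either requesting `framesInChunk(bytes, m_Channels)` frames from `vorbis_analysis_buffer`, or copying `READ * 2 * m_Channels` bytes per pass, would keep the two numbers in step.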

[Edited by - ranearal on August 22, 2006 12:14:46 PM]

Guest Anonymous Poster
Please edit your post and put source tags around the code so it's readable. Thanks.

[ source]Code here[ /source]

Just remove the space after the [ and it will work.

Guest Anonymous Poster
Are you sure there is a pop added to the end of the sound - or is the player just closing the audio output in some "abrupt" manner that might cause a pop?

Load the encoded vorbis file into an audio app (or decode it into wav first if you don't have an audio app that supports loading of ogg) and graphically inspect the spectrum to see if there is in fact a pop present at the end.

The pop might occur if your sound buffer is larger than the WAV sample in it, so the player plays whatever random bytes follow the sample. Make sure to fill those trailing bytes with silence. According to the DX docs, 8-bit sound samples use 128 for silence bytes, while 16-bit samples use 0.

Also, if you use 16-bit sound samples, make sure your sound buffer size is a multiple of 2 bytes for streaming sounds.
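
A minimal sketch of that padding step, assuming a plain byte buffer that is only partly filled with PCM data. `padWithSilence` and its parameters are hypothetical names, not part of DirectSound or any other API.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: fill buffer[usedBytes .. bufferBytes) with PCM silence.
// 8-bit PCM is unsigned, so silence is 128; 16-bit PCM is signed, so silence is 0.
void padWithSilence(std::uint8_t* buffer, std::size_t usedBytes,
                    std::size_t bufferBytes, int bitsPerSample)
{
    if (usedBytes >= bufferBytes)
        return;  // buffer is already full, nothing to pad

    const int silence = (bitsPerSample == 8) ? 128 : 0;
    std::memset(buffer + usedBytes, silence, bufferBytes - usedBytes);
}
```

For 16-bit data, `usedBytes` and `bufferBytes` should both be even so that the padding starts on a sample boundary.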

< spelling edit >

[Edited by - KreK on August 18, 2006 6:58:50 AM]

I had a similar problem when I was starting to get into ripping MP3s. My problem was caused by the very end of the track not being silent for some non-zero period of time. My guess is that the encoder doesn't know how to handle a non-zero end to the waveform. Though I never did a thorough investigation of the problem (such as looking at the decoded waveform and comparing it to the original), I thought it might be something like the encoder amplifying the end of the track, causing a tear when the waveform goes from non-zero (the last sample) to zero (the audio data following the last sample), which results in a pop.
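
A quick way to test that guess on a given file is to look at the last sample before encoding. The sketch below assumes 16-bit little-endian mono PCM already in memory; `endsNearZero` and `threshold` are hypothetical names chosen for illustration.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: true if the final 16-bit little-endian sample of a
// mono PCM buffer lies within `threshold` of zero.
bool endsNearZero(const unsigned char* pcm, std::size_t bytes, int threshold)
{
    if (bytes < 2)
        return true;  // no complete sample, so nothing to jump from

    // Assemble the last sample from its two bytes, then sign-extend.
    int sample = pcm[bytes - 2] | (pcm[bytes - 1] << 8);
    if (sample >= 32768)
        sample -= 65536;

    return std::abs(sample) <= threshold;
}
```

If this returns false for a sound that pops, the jump from the last sample down to silence is at least a candidate cause.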

Don't mind me, I'm half awake and babbling incoherently.

I have opened the files in Audacity to compare the raw wav to the ogg file. The ogg file comes out slightly longer with bad data at the end. Even if I could just make that silent it would still be an issue as I am looping most of my sounds.

Thanks about the source tags, I had not posted code here before.

I can arbitrarily stop the file early by making the iDatatSize variable smaller, and then no extra data will be in the ogg. I really need to make it match the original though.
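
Rather than shrinking the size by hand, the exact byte range of the samples can be read from the WAV file itself. A WAV header is not a fixed length, and chunks other than `data` may follow the samples, so a fixed skip of 48 bytes plus "everything up to the end of the file" can take in bytes that are not audio. Whether that is what happens with these files is not confirmed here. The sketch assumes the whole file is in memory; `PcmRange`, `readLE32`, and `findWavData` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

struct PcmRange {
    std::size_t offset = 0;  // byte offset of the first sample
    std::size_t size = 0;    // number of bytes of sample data
    bool found = false;
};

// Read a little-endian 32-bit value.
static std::uint32_t readLE32(const unsigned char* p)
{
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Hypothetical helper: walk the RIFF chunks of an in-memory WAV file and
// return the location of the "data" chunk payload.
PcmRange findWavData(const unsigned char* file, std::size_t fileSize)
{
    PcmRange r;
    if (fileSize < 12 || std::memcmp(file, "RIFF", 4) != 0 ||
        std::memcmp(file + 8, "WAVE", 4) != 0)
        return r;  // not a RIFF/WAVE file

    std::size_t pos = 12;  // first chunk follows "RIFF" <size> "WAVE"
    while (pos + 8 <= fileSize) {
        const std::size_t chunkSize = readLE32(file + pos + 4);
        const std::size_t payload = pos + 8;
        const std::size_t remaining = fileSize - payload;

        if (std::memcmp(file + pos, "data", 4) == 0) {
            r.offset = payload;
            // Clamp in case the header claims more than the buffer holds.
            r.size = (chunkSize < remaining) ? chunkSize : remaining;
            r.found = true;
            return r;
        }

        // Chunks are padded to an even number of bytes.
        const std::size_t advance = chunkSize + (chunkSize & 1u);
        if (advance > remaining)
            break;  // truncated file
        pos = payload + advance;
    }
    return r;
}
```

With this, the encode loop would start at `offset` and stop after `size` bytes, instead of using the hard-coded 48.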

Quote:
Original post by ranearal
I have opened the files in Audacity to compare the raw wav to the ogg file. The ogg file comes out slightly longer with bad data at the end. Even if I could just make that silent it would still be an issue as I am looping most of my sounds.


Now we've got something to work with. I'm not actually that familiar with the Vorbis format, but in the case of MP3s, sounds are encoded in fixed-length frames (1152 samples each for MPEG-1 Layer III, which is about 26 ms at 44.1 kHz). I'm assuming this implies that sounds end up being a whole number of frames long after encoding. Either way, it's possible that either the encoder or the decoder is inserting noise after the point where the sound is supposed to stop, rather than silence (in the case of the decoder, that would require that the Vorbis stream DOES store the true number of samples, but still packages them into fixed-length frames, padded with something or other). I suppose in the end I've done nothing but a lot of theorizing about stuff I'm too lazy to verify, but maybe you can make something of it :P
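
The arithmetic behind "a whole number of frames" can be sketched as follows. It assumes 1152-sample frames, as in MPEG-1 Layer III, and ignores any encoder delay. `paddedLength` and `paddingSamples` are hypothetical names, and none of this describes how Vorbis sizes its blocks.

```cpp
#include <cstdint>

// Hypothetical helper: length in samples after rounding up to whole frames.
constexpr std::uint64_t paddedLength(std::uint64_t samples, std::uint64_t frameSize)
{
    return (samples + frameSize - 1) / frameSize * frameSize;
}

// Hypothetical helper: extra samples appended by that rounding.
constexpr std::uint64_t paddingSamples(std::uint64_t samples, std::uint64_t frameSize)
{
    return paddedLength(samples, frameSize) - samples;
}

// One second at 44.1 kHz needs 39 frames of 1152 samples each.
static_assert(paddedLength(44100, 1152) == 44928, "rounded up to 39 frames");
// That leaves 828 samples (about 19 ms) beyond the original audio.
static_assert(paddingSamples(44100, 1152) == 828, "padding after the last sample");
```

If a decoder plays that tail without trimming it, whatever fills those extra samples is heard after the sound ends.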
