Any DSP gurus - PDM to PCM - noise/aliasing/spectral leakage problem?

Started by
6 comments, last by aregee 9 years, 7 months ago
For the non-expert:
DSP = Digital Signal Processing
PDM = Pulse Density Modulation
PCM - Pulse Code Modulation
I am struggling with what is sounding like an evenly distributed hiss on top of otherwise very good sounding audio when decoding pulse density modulated audio to pulse code modulated audio.
The bitstreams I am working with are 1 bit 2.8225 Mhz PDM, and I am "resampling" it to 16 bit 44.1 kHz PCM. That translates to about 64 one bit samples per 16 bit PCM samples at 44100, thus I am using a 64 order window function to make PCM out of PDM.
How I am processing the audio is like this:
First I am making a precalculated window function like this:


        for (int i = 0; i < 64; i++) {
            fir64Coff[i] = (double_t)(1 << 10);
            //Hanning Window (less noise than hamming?
            fir64Coff[i] *= 0.5 * (1 - cos((2 * M_PI * i)/ (64 - 1)));


            //Hamming Window
//            fir64Coff[i] *= 0.54 - 0.46 * cos((2 * M_PI * i) / (64 - 1));
            
            //Nuttall Window
//            fir64Coff[i] *= 0.355768f - 0.487396*cos((2*M_PI*i)/(64 -1)) + 0.144232*cos((4*M_PI*i)/(64-1))-0.012604*cos((6*M_PI*i)/(64-1));
  
            //Cosine Window
//            fir64Coff[i] *= sin((M_PI * i)/(64-1));


   //Blackman Harris            
//            double_t w = (2 * M_PI * i) / 64;
//            fir64Coff[i] *= 0.422323 - 0.49755 * cos(w) + 0.07922 * cos(2 * w);
        }
Hamming/Hanning sounds the best of the window types I have tried.
Here is how I generate PCM from PDM: (Be warned this is just some pilot tinkering.)


- (uint64_t)getBytes:(void *)buffer sampleCount:(long unsigned)count {


    if (mIsEndOfSong) {
        return 0;
    }


    [mutex lock];


    int16_t *sampleBuffer = (int16_t *)buffer;
    
    for (uint32_t currentSample = 0; currentSample < count; currentSample++) {
        uint64_t testLeft = 0;
        uint64_t testRight = 0;


        int64_t sumLeft = 0;
        int64_t sumRight = 0;


        uint8_t coefficientIndex = 0;
        
        for (uint8_t i = 0; i < 8; i++) {
            if (currentBufferPos >= blockSize) {
                if (![self readChunk]) {  //Here I fetch left and right channel blockSize bytes (Usually 4096)
                    [mutex unlock];
                    return currentSample;
                }
            }
            
            testLeft = bufferLeft[currentBufferPos];
            testRight = bufferRight[currentBufferPos];
            
            for (uint8_t i = 0; i < 8; i++) {  //Two nested for loops, 8 * 8 = 64 bits, lsb first per byte
                if ((testLeft & 0x01) == 0x01) {
                    sumLeft += fir64Coff[coefficientIndex];  //These are the pre-calculated window values
                }
                else {
                    sumLeft -= fir64Coff[coefficientIndex];
                }
                
                if ((testRight & 0x01) == 0x01) {
                    sumRight += fir64Coff[coefficientIndex];
                }
                else {
                    sumRight -= fir64Coff[coefficientIndex];
                }
                
                testLeft >>= 1; //Next bit
                testRight >>= 1;
                
                coefficientIndex++;
            }
            
            currentBufferPos++;  //next byte
        }


        sampleBuffer[(currentSample << 1) + 0] = (int16_t)sumLeft;
        sampleBuffer[(currentSample << 1) + 1] = (int16_t)sumRight;
    }
    
    [mutex unlock];
    return count;
}
Pretty simple, actually, and it sounds great - except for the noise. I am not sure what I am doing wrong. I am sure super audio is not supposed to have varying degree of noise in them. I have read that when audio is being sampled, a noise shaping function is being used, and noise generated during this process is pushed up into the inaudible frequency range. I am wondering if the noise I am hearing can be this high frequency noise being aliased back into the perceptible frequency area.
Does anyone have any tips for me, how to get rid of the noise? I tried to make a low pass filter too, but that made the noise even louder and with no music at all, so there is probably something I did wrong.
Here is the function I made for that, if it can be of any help:


- (void)lowPassFIR:(double_t*)firBuffer withSampleRate:(uint64_t)Fs cutOffFrequency:(uint64_t)Fc filterLength:(uint64_t)M {
    M = M -1;
    double_t Ft = (double_t)Fc / (double_t)Fs;


    double_t sum = 0.0f;
    
    for (uint64_t i = 0; i < M; i++) {
        if (i != ((double_t)M / 2)) {
            firBuffer[i] *= sin(2.0f * M_PI * Ft * (i - ((double_t)M / 2.0f))) / (M_PI * (i - ((double_t)M / 2.0f)));
        }
        else {
            firBuffer[i] *= 2.0f * Ft;
        }
        
        sum += firBuffer[i];
        printf("%f\n", firBuffer[i]);
    }
    printf("Sum: %f\n", sum);
}
Edit: it is worth mentioning that I am trying to learn DSP, but I am an utter noob in this area. I have a vague understanding, but not all the pieces are in place yet, so I think I have done pretty well, considering I have just jumped into this. This is also something I am doing just for fun and to learn, and weirdly something as recognised as VLC can't play either.
Edit2: I have a feeling the noise comes because I am not properly "moving" the window, just taking a weighted average starting at every 64 bits, but I am trying to reduce the massive amount of processing I need to do if I would do a moving average of every 64 bit bits before I have produced even one PCM sample. That would make 64 * 64 = 4096 loop iterations for each PCM sample, 4096 * 44100 = 180.633.600 iterations per second... I think my computer would have hiccups long before that.
Another theory I have, is that I am just discarding the remainder of the double after a full integer, and that is generating some dithering noise.
Edit3: Just got an idea to get completely rid of the if-sentences in the inner loop. I can just make a lookup table for that as well... That means I can get rid of the whole inner loop completely.
Advertisement

I made some changes to the above program. This is doing exactly the same as above, but is much faster. I went from about 10% CPU usage to about 4%.


        //Pre-Calculate the FIR coefficients for Hamming Window with gain
        for (int i = 0; i < 64; i++) {
            fir64Coff[i] = (double_t)((1 << 10) - 1);

            //Hanning Window
            fir64Coff[i] *= 0.5f * (1.0f - cos((2.0f * M_PI * (double_t)i) / (64.0f - 1.0f)));
        }

        //Precalculate the sums for byte-bit combinations
        precalculated = malloc(sizeof(double_t *) * 8);

        for (uint8_t i = 0; i < 8; i++) {
            precalculated[i] = malloc(sizeof(double_t) * 256);
            
            for (uint16_t j = 0; j < 256; j++) {
                uint8_t test = j;
                double_t sum = 0;

                uint8_t coefficientIndex = 8 * i;

                for (uint8_t k = 0; k < 8; k++) {
                    if ((test & 0x01) == 0x01) {
                        sum += fir64Coff[coefficientIndex];
                    }
                    else {
                        sum -= fir64Coff[coefficientIndex];
                    }
                    
                    test >>= 1;

                    coefficientIndex++;
                }
                
                precalculated[i][j] = sum;
            }
        }

And the play routine:


- (uint64_t)getBytes:(void *)buffer sampleCount:(long unsigned)count {

    if (mIsEndOfSong) {
        return 0;
    }

    [mutex lock];

    int16_t *sampleBuffer = (int16_t *)buffer;
    
    for (uint32_t currentSample = 0; currentSample < count; currentSample++) {

        int64_t sumLeft = 0;
        int64_t sumRight = 0;

        for (uint8_t i = 0; i < 8; i++) {
            if (currentBufferPos >= blockSize) {
                if (![self readChunk]) {
                    [mutex unlock];
                    return currentSample;
                }
            }

            sumLeft += precalculated[i][bufferLeft[currentBufferPos]];
            sumRight += precalculated[i][bufferRight[currentBufferPos]];
            
            currentBufferPos++;
        }
        
        sampleBuffer[(currentSample << 1) + 0] = (int16_t)sumLeft;
        sampleBuffer[(currentSample << 1) + 1] = (int16_t)sumRight;
    }
    
    [mutex unlock];
    return count;
}

You seem to be using bad and/or nonsensical filters.

Ideally, you want to run your PDM waveform through an ideal lowpass filter with cutoff rate equal to the target Nyquist frequency (for CD audio, 22050 Hz): the corresponding FIR coefficients are an infinitely long sinc waveform.

In the real world, your options include

  • exact computation for a periodic signal using a Fourier transform
  • hopefully cheap, but difficult to design, approximation by IIR filters (not recommended for a beginner, probably not recommended for anyone).
  • decent approximation with FIR filters: 64 coefficients are VERY few. The correct way to use window functions is multiplying them by a sinc function to obtain a "windowed sinc function" that sounds as good as possible for a given number of coefficients.
    Your "lowPassFIR" function appears to be a rather correct starting point, but coefficients are constant and they should be kept in an array for performance reasons (sparing sine evaluations and allowing parallel computations).

You shouldn't perform the ugly and wrong filtering by bell-shaped functions at the beginning. Conceptually, resampling from 2822400 to 44100 Hz follows lowpass filtering; you can simply compute one output in 64 of the 2822400 Hz filter from the original values, without inappropriate preprocessing.

Omae Wa Mou Shindeiru

Well I think I can see a simple error which might be the problem.

if (currentBufferPos >= blockSize)
{
       if (![self readChunk])    // load in next block
       {
             [mutex unlock];
              return currentSample;
       }
}

So say your block size is 4096, when currentBufferPos == 4096 you read the next buffer, ok

      sumLeft += precalculated[i][bufferLeft[currentBufferPos]];                 // currentBufferPos == 4096 !!!!
      sumRight += precalculated[i][bufferRight[currentBufferPos]];            // currentBufferPos == 4096 !!!!
            
      currentBufferPos++;                                                                          // currentBufferPos == 4097

It could be that currentBufferPos is reset in the code that reads in the new block, but if not it definetly looks wrong.

Well I think I can see a simple error which might be the problem.


if (currentBufferPos >= blockSize)
{
       if (![self readChunk])    // load in next block
       {
             [mutex unlock];
              return currentSample;
       }
}

So say your block size is 4096, when currentBufferPos == 4096 you read the next buffer, ok


      sumLeft += precalculated[i][bufferLeft[currentBufferPos]];                 // currentBufferPos == 4096 !!!!
      sumRight += precalculated[i][bufferRight[currentBufferPos]];            // currentBufferPos == 4096 !!!!
            
      currentBufferPos++;                                                                          // currentBufferPos == 4097

It could be that currentBufferPos is reset in the code that reads in the new block, but if not it definetly looks wrong.

Oh no, I didn't include that here, but currentBufPos is reset to 0 inside the same function that refills the buffers. I would probably have more serious errors, if I didn't reset the buffer pointer.

I am sure LorenzoGatti is closer to what my problem is, which is what I suspect in the first place.

You seem to be using bad and/or nonsensical filters.

Ideally, you want to run your PDM waveform through an ideal lowpass filter with cutoff rate equal to the target Nyquist frequency (for CD audio, 22050 Hz): the corresponding FIR coefficients are an infinitely long sinc waveform.

In the real world, your options include

  • exact computation for a periodic signal using a Fourier transform
  • hopefully cheap, but difficult to design, approximation by IIR filters (not recommended for a beginner, probably not recommended for anyone).
  • decent approximation with FIR filters: 64 coefficients are VERY few. The correct way to use window functions is multiplying them by a sinc function to obtain a "windowed sinc function" that sounds as good as possible for a given number of coefficients.
    Your "lowPassFIR" function appears to be a rather correct starting point, but coefficients are constant and they should be kept in an array for performance reasons (sparing sine evaluations and allowing parallel computations).

You shouldn't perform the ugly and wrong filtering by bell-shaped functions at the beginning. Conceptually, resampling from 2822400 to 44100 Hz follows lowpass filtering; you can simply compute one output in 64 of the 2822400 Hz filter from the original values, without inappropriate preprocessing.

I had a feeling that 64 coefficients were a bit few. I tried with a lot more earlier, but with a different approach than I did here, and the result I got when I tried was more and more smearing (=dull sound) the more coefficients I used, and more crisp sound with fewer coefficients.

I will take your advice and redesign as you are suggesting at the end. It fits with the idea that a low pass filter is all you need to decode the data, something I am obviously not doing here. Thank you for pointing me in the right direction. smile.png How many coefficients do you think would be sane?

Edit: I agree to stay away from IIR filters. I would probably design one that would blow my speakers... ;)

Edit2: Thank you for mentioning parallel computations, that didn't even occur to me! That would probably give me a little bit more room to wiggle, if I would be short on processing time.

Edit3: I know that the coefficients are constants, and I am actually treating them as such. In my thinking, since I am working on 1-bit data, I don't need to multiply any coefficients, just add them from the precompiled values.


How many coefficients do you think would be sane?

As many as possible; it depends on your budget of multiplications and additions per output sample (which is likely to be in the tens or hundreds of thousands on a modern CPU).

What is your intended application? Do you need to decode SACD audio in real time, as opposed to performing a high-quality conversion to a more usable PCM format in advance? Are you doing other tasks while converting audio?

Another issue: extending the lowpass filter introduces a delay between input and output, which can be compensated in batch processing (just offset buffers by the correct number of samples for mixing with other sources and/or cut the initial silence in output) but not always in music reproduction or similar realtime applications (you might be unable to begin playing samples in advance).

Omae Wa Mou Shindeiru


How many coefficients do you think would be sane?

As many as possible; it depends on your budget of multiplications and additions per output sample (which is likely to be in the tens or hundreds of thousands on a modern CPU).

What is your intended application? Do you need to decode SACD audio in real time, as opposed to performing a high-quality conversion to a more usable PCM format in advance? Are you doing other tasks while converting audio?

Another issue: extending the lowpass filter introduces a delay between input and output, which can be compensated in batch processing (just offset buffers by the correct number of samples for mixing with other sources and/or cut the initial silence in output) but not always in music reproduction or similar realtime applications (you might be unable to begin playing samples in advance).

I am actually trying to decode SACD audio in real time, and it is a project just for fun and learning experience. Writing everything from scratch when there are other products that does the job for you. Actually, my app is turning into a nice and light audio player that can currently play WAV, FLAC, MOD, SACD and I am also working on MP3, and it can handle a massive library. The GUI is not great, but it is working. I am planning to throw away the GUI and go for a web-view instead, but I am not completely sure yet. I have much more plans in the future for other nice features too, but the essence is that I want the application to be responding fast, although that is not always possible when you have a system that stalls while waiting for I/O all the time, but I have implemented things to make that experience better, like loading things in the background and not switching to a new song until the other song is ready (showing an indicator if the system is lagging for more than half a second so the user doesn't think the application is hanging, or you end up losing the beginning of the next song due to buffer under-run.)

regarding SACD, playback is actually sounding pretty good, if you can ignore the hissing noise. It is not the kind of unpleasant noise, more like a soft low untuned treble radio out of tune-white noise. The noise has an "analog" quality, if I can put it like that. Of course I am going to remove that, but I am surprised it sounds as good as it does with my flawed FIR filtering. My goal is to make the sound as good as it possibly can get in real time. That also means supporting 192 kHz and such in the future, but for now I am a bit scared of the aliasing artefacts that I have read can destroy audio equipment, if you are not using a proper low pass filter. I think 44.1 kHz is good for a pilot.

I am not doing other tasks while converting Audio, but the application should ideally work together with use of other application on the host OS.

If I understand you correctly, you are saying that using a FIR filter with a lot of coefficients will introduce latency in the playback of the audio stream. I am not sure if I am too concerned with that, unless the latency gets too big. I would guess 0.5-1 second would be acceptable, but that is just a number I picked arbitrarily. I know that in realtime applications, you need to be aware of causality, but is it not real time like in a live gig, it is more like files on a hard drive, so some tricks would probably be possible with a fast computer. (Thinking about the same discarding of silence that you mentioned.)

Fortunately, regarding my budget, as I mentioned, since it is a 1-bit stream, I think I can get away with multiplications using a lookup table. I would only multiply by 1 and -1 anyway, which is a waste to actually perform such multiplication.

Edit: Weirdly some songs have a lot of noise, but other songs I can barely hear the noise. This fact makes me think that the noise I am hearing is aliasing for not filtering out the high frequencies. It seems to be dependent on how the SACD is produced if I hear a lot of noise or not.

Also, I find it encouraging that playback can be this good on my turtle-computer, using only 4% CPU. FLAC decoding easily consumes more CPU than that, although implementing small and simple things like pre-buffering instead of reading directly from file, made decoding way much faster. (30% down to 6-8% CPU.) I found that to be acceptable for now, although I will try to optimise more when I feel the product is nearing completion.

To Stainless: There is one thing I haven't told you... That there actually IS a bug in the code regarding loading the buffer, although it shouldn't matter unless the bitstream is malformed. If I prematurely reach end of file, the last buffer will be discarded. (Say you only read 100 bytes instead of 2 * 4096 bytes before reaching end of file, you still have 100 bytes to process, but it will be discarded.) Edit again: Actually, if you don't get to read a full chunk, you don't have the data for both audio channels, so discarding is the only sane thing to do. Unintentionally did the right thing lol... :)

Oops... Seems I am getting sidetracked again... Fun to have something that has "flawlessly" been working for weeks crash like this:


STAAudioCoordinator: could not fill entire buffer, will clear the rest of the buffer for now. Read 512.
Block Size: 3383413356
Last Sample: 7302926928557966667
objc[6886]: Hash table corrupted. This is probably a memory error somewhere. (table at 0x7fff72991490, buckets at 0x1009ea600 (4096 bytes), 256 buckets, 31 entries, 87 tombstones, data 0x84e54b0084ce8b 0x8512d10084fc0d 0x85405f00852997 0x856df400855729)
Hiding progress bar.
objc[6886]: Hash table corrupted. This is probably a memory error somewhere. (table at 0x7fff72990e10, buckets at 0x100a03200 (4096 bytes), 256 buckets, 26 entries, 82 tombstones, data 0xa92bea00a9124f 0xa95f2700a94588 0xa9926c00a978c8 0xa9c5b700a9ac11)

Pretty obvious what happened here, but why? Time to put some sanity checking into the DSF-player at least, but those values are just insane...

Edit:

Found one error that contributed, but I don't think this is the whole story:

Before:


    if ([iStream read:bufferLeft maxLength:blockSize] == 0) {mIsEndOfSong = YES; return NO;}
    if ([iStream read:bufferRight maxLength:blockSize] == 0) {mIsEndOfSong = YES; return NO;}

After:


    if ([iStream read:bufferLeft maxLength:blockSize] != blockSize) {mIsEndOfSong = YES; return NO;}
    if ([iStream read:bufferRight maxLength:blockSize] != blockSize) {mIsEndOfSong = YES; return NO;}

Sometimes I just shake my head of the silly errors I make...

Edit 2:

Seems I located the source for the error. I have postponed refactoring the bit file reader for too long. Time to solidify and complete all the functionality I have planned for this class. Seems it get hiccups at end of file situations sometimes. Also the weird thing that it tends to sometimes cut the end of songs, not only on FLAC but also on DSF indicates that it is the bit file reader that is buggy and needs some more work.

This topic is closed to new replies.

Advertisement