You can use a bilateral blurring filter.
When taking nearby samples for blurring, you don't (just) use hard-coded weights (Gaussian, etc.). You also create a second set of weights based on how "valid" each sample is, optionally multiply these by your hard-coded Gaussian weights, then renormalize the weights for all samples so that they sum to 1.0.
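As a rough sketch of that weighting step (plain Python rather than shader code, and the function name `combine_weights` is just something I made up for illustration), multiplying the two sets of weights and renormalizing could look like this:

```python
def combine_weights(spatial, validity):
    """Multiply fixed kernel weights by per-sample validity weights,
    then renormalise so the result sums to 1.0."""
    combined = [s * v for s, v in zip(spatial, validity)]
    total = sum(combined)
    if total == 0.0:
        # Every sample was rejected; the caller should fall back to
        # the unblurred centre value in this case.
        return [0.0] * len(combined)
    return [w / total for w in combined]


gaussian = [0.25, 0.5, 0.25]  # hard-coded 3-tap kernel (assumed values)
validity = [1.0, 1.0, 0.0]    # the right-hand sample was rejected
weights = combine_weights(gaussian, validity)
```

Here the rejected sample ends up with zero weight and the remaining two are scaled up so the total is still 1.0, which is what stops the blurred result from getting darker or brighter near edges.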
To decide whether a sample is valid, you can use a colour threshold (e.g. if the centre is white, a black sample is rejected but a slightly grey sample is accepted), a depth threshold (samples whose Z differs from the centre's by too much are rejected), or a similar test.
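To make the depth-threshold version concrete, here's a minimal sketch on a single row of pixels. It's plain Python for readability (in practice this would live in a shader), it uses a hard accept/reject test, and `is_valid`, `depth_aware_blur` and `max_depth_diff` are names I've invented for the example:

```python
def is_valid(centre_depth, sample_depth, max_depth_diff):
    """Accept a sample only if its depth is close to the centre's depth."""
    return abs(sample_depth - centre_depth) <= max_depth_diff


def depth_aware_blur(values, depths, kernel, max_depth_diff):
    """Blur one row of values, skipping samples that fail the depth test."""
    radius = len(kernel) // 2
    last = len(values) - 1
    out = []
    for i in range(len(values)):
        total = 0.0
        weight_sum = 0.0
        for k, kernel_weight in enumerate(kernel):
            j = min(max(i + k - radius, 0), last)  # clamp at the row ends
            if is_valid(depths[i], depths[j], max_depth_diff):
                total += kernel_weight * values[j]
                weight_sum += kernel_weight
        # The centre sample always passes its own depth test, so weight_sum
        # is non-zero as long as the centre kernel weight is non-zero.
        out.append(total / weight_sum)  # renormalise
    return out


# A hard AO edge that coincides with a depth discontinuity.
ao = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
depth = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
blurred = depth_aware_blur(ao, depth, [0.25, 0.5, 0.25], max_depth_diff=0.5)
```

With these inputs the edge stays sharp, because samples on the far side of the depth jump are rejected. A colour-threshold version would be the same loop with `is_valid` comparing colours instead of depths.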
Sometimes you'll see a bilateral blur filter that's based on Z values called a Depth Sensitive Filter.
I've used this to soften SSAO before. I implemented both the colour-threshold and depth-threshold versions, then tweaked them both before deciding which one worked better for my game.