What do you mean by "1 ray, 16 samples"? That you shoot one ray (from eye to surface point) per pixel, reflect it about the surface normal, and approximate a roughness-based cone with 16 sample rays? Or that you shoot only one reflection ray, weight it with the pixel's roughness value, and do only 16 depth comparisons ("samples" in the sense of ray-march steps)?
Maybe I got you totally wrong, but my implementation used only one reflection ray per pixel, because more is simply a waste of GPU power in my eyes. That means you can either have only sharp reflections, or you have to blur the result afterwards. Since you don't want a blur pass, you can do what I did: mipmap your color buffer before you do the reflections and sample a mipmap level based on the pixel's roughness. This worked very well for reasonably low roughness values, depending on your mipmap algorithm. If you want to avoid even the mipmapping step, you can shoot one main ray instead of 16 and take multiple samples from your buffer around the intersection point, stretching them based on the pixel's roughness. But don't expect wonders: I'd expect this to be slower than the mipmap approach, the quality won't be great, and it will degrade with higher roughness values, since you'd have to take more samples for a rougher pixel.
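To make the mip-level trick concrete, here is a minimal CPU-side sketch in Python. It builds a box-filtered mip chain of a grayscale "color buffer" and maps roughness linearly onto the mip range. The names (`build_mips`, `reflection_color`) and the linear roughness-to-LOD mapping are my own assumptions for illustration; a real shader would do this with `textureLod` and would typically derive the LOD from cone width and ray distance rather than roughness alone.

```python
def build_mips(img):
    """Build a mip chain from a square 2D list whose side is a power of two,
    using a simple 2x2 box filter (averaging) per level."""
    mips = [img]
    while len(mips[-1]) > 1:
        prev = mips[-1]
        n = len(prev) // 2
        mips.append([[(prev[2*y][2*x] + prev[2*y][2*x+1] +
                       prev[2*y+1][2*x] + prev[2*y+1][2*x+1]) / 4.0
                      for x in range(n)] for y in range(n)])
    return mips

def sample(mips, u, v, lod):
    """Sample the mip chain at normalized coords (u, v) with a fractional
    LOD: nearest texel within a level, linear blend between levels."""
    lod = max(0.0, min(lod, len(mips) - 1))
    lo = int(lod)
    hi = min(lo + 1, len(mips) - 1)
    f = lod - lo
    def tex(level):
        n = len(mips[level])
        x = min(int(u * n), n - 1)
        y = min(int(v * n), n - 1)
        return mips[level][y][x]
    return tex(lo) * (1.0 - f) + tex(hi) * f

def reflection_color(mips, hit_u, hit_v, roughness):
    """Look up the reflected color at the ray hit point; rougher pixels
    read a higher (blurrier) mip level. Linear mapping is an assumption."""
    max_lod = len(mips) - 1
    return sample(mips, hit_u, hit_v, roughness * max_lod)
```

A smooth surface (roughness 0) reads the full-resolution buffer and gets a sharp reflection; a rough surface reads a pre-blurred level, so a single reflection ray per pixel is enough and no extra blur pass is needed.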