So I'm correct in assuming that you're using the same approach as your specular where you're sampling off of a fairly high resolution HDR cubemaps for your diffuse results?
If so, don't bother, because as you've figured out for yourself that will require a massive amount of samples to get working nicely as you're sampling the entire hemisphere rather than a focused area as you would with glossy specular materials.
What you could do here is filter and downsample a seperate HDR cubemap offline just like how you would approach filtering your specular mip levels for high roughness values, but just for your diffuse term. You can use a similar importance sampling approach to do this. This gives you a fairly tiny texture which should give very acceptable results with only a minimal amount of samples required. At this point you should consider just doing an SH representation though as that will give pretty much the exact same results (as you're dealing with very low frequency data anyway) with a lower memory footprint, but if you're not familiar with that just yet you can experiment with just the separate diffuse texture.
Whether you use an SH approach or a texture-based one will not make a huge change in the end result as we're dealing with very low frequency data here anyway. Spherical harmonics are just another way to represent your data, they're not a different way of generating that data. You'll still need to sample your diffuse data somehow before you can store it.
Additionally you won't need any type of LUT as your diffuse term is usually solely dependent on your normal vector, unless you're doing some expensive fancy diffuse model which is view-dependent or which takes surface roughness into account. Even if that would be the case, this usually results down to taking your fresnel factor into account somewhere which can be accounted for after sampling your diffuse IBL.
(PS: Dag Kevin, lang geleden!)