The GLSL port of SAO is pretty inefficient, but it's memory bound, so ALU ops don't matter much in most cases. The sin/cos calls are not the problem (they can be replaced with a 2x2 rotation matrix if needed); the integer math and integer UV sampling are.
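To illustrate the rotation matrix idea, here is a rough Python sketch (not the actual SAO shader code). It assumes the sample directions are evenly spaced in angle around a spiral, so each direction is the previous one rotated by a constant step. The function names and parameters are made up for this example.

```python
import math

def spiral_dirs_trig(num_samples, num_turns, start_angle):
    """Reference version: one sin/cos pair per sample."""
    dirs = []
    for i in range(num_samples):
        alpha = (i + 0.5) / num_samples
        angle = alpha * num_turns * 2.0 * math.pi + start_angle
        dirs.append((math.cos(angle), math.sin(angle)))
    return dirs

def spiral_dirs_rotated(num_samples, num_turns, start_angle):
    """Same directions, but only one sin/cos pair per pixel.

    The step angle is constant, so its cos/sin (c, s) could be baked
    in as constants. Each new direction is the previous one multiplied
    by the 2x2 rotation matrix [[c, -s], [s, c]].
    """
    step = num_turns * 2.0 * math.pi / num_samples
    c, s = math.cos(step), math.sin(step)
    first = start_angle + 0.5 * step
    x, y = math.cos(first), math.sin(first)
    dirs = []
    for _ in range(num_samples):
        dirs.append((x, y))
        x, y = c * x - s * y, s * x + c * y  # rotate by one step
    return dirs
```

Both functions should give the same unit vectors up to floating point error; in a shader the rotated version trades the per-sample trig for four multiplies and two adds.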
Try reducing the sample count of your full-resolution SSAO. This will increase high-frequency noise, but that is a lot easier to deal with than upscaling artifacts in high-frequency content. You can also try using a half-resolution depth buffer (with 16-bit depth) as the input while still computing the SSAO at full resolution (the current pixel still uses the full-res depth).
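A rough CPU-side sketch of the half-res depth idea, in Python for readability. The assumptions here are mine: linear depth in [0, far], a 16-bit normalized encoding, and a plain point sample of each 2x2 block. All names are hypothetical.

```python
def downsample_depth_16bit(depth, far):
    """Build a half-res buffer of 16-bit values from full-res linear depth.

    depth is a list of rows. One texel of each 2x2 block is point
    sampled (no averaging, so depth edges are not blurred).
    """
    half = []
    for y in range(0, len(depth) - 1, 2):
        row = []
        for x in range(0, len(depth[y]) - 1, 2):
            q = int(depth[y][x] / far * 65535.0 + 0.5)
            row.append(max(0, min(65535, q)))
        half.append(row)
    return half

def tap_depth(half, far, x, y):
    """Neighbour tap: full-res pixel coords in, depth read from half-res."""
    hy = min(y // 2, len(half) - 1)
    hx = min(x // 2, len(half[0]) - 1)
    return half[hy][hx] / 65535.0 * far

def center_depth(depth, x, y):
    """The current pixel keeps reading the full-res, full-precision depth."""
    return depth[y][x]
```

The point is that the many neighbour taps hit a buffer with a quarter of the texels and half the bytes per texel, while the center pixel keeps full precision.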