The short version: AO approximates the occlusion of direct lighting from a distant hemispherical light source

The long version:

AO essentially approximates the shadowing/visibility that you would use for computing diffuse lighting from a distant light source that covers the entire hemisphere that's visible for a surface (hence Hodgman's comment about sky lighting). It actually works fairly well for the case where you have an infinitely distant light source that has a constant incoming lighting for every direction on the hemisphere. To understand why, let's look at the integral for computing AO:

AO = IntegrateAboutHemisphere(Visibility(L) * (N dot L)) / Pi

where "Visibility(L)" is a visibility term that returns 0.0 if a ray intersects geometry in the direction of 'L', and 1.0 otherwise. Note that the "1 / Pi" bit is there because the integral of the cosine term (the N dot L part) comes out to Pi, and so we must divide by Pi to get a result that's of the range [0, 1].

Now let's look at the integral for computing direct diffuse lighting (no indirect bounce) from an infinitely distant light source (like the sky), assuming a diffuse albedo of 1:

Diffuse = IntegrateAboutHemisphere(Visibility(L) * Lighting(L) * (N dot L)) / Pi

where "Lighting(L)" is the sky lighting in that direction.

When we use AO for sky light, we typically do the equivalent of this:

Diffuse = AO * IntegrateAboutHemisphere(Lighting(L) * (N dot L)) / Pi,

which if we substitute in the AO equation gives us this:

Diffuse = (IntegrateAboutHemisphere(Visibility(L) * (N dot L)) / Pi) * (IntegrateAboutHemisphere(Lighting(L) * (N dot L)) / Pi)

Unfortunately this is not the same integral that we used earlier for computing direct lighting, since you generally can't just pull out terms from an integral like that and get the correct result. The only time we can do this is if the Lighting term is constant for the entire hemisphere, in which case we can pull it out like this:

Diffuse = Lighting * IntegrateAboutHemisphere(Visibility(L) * (N dot L)) / Pi

Which means that we can plug in the AO like so:

Diffuse = Lighting * AO

and it works! Unfortunately, the case we've constructed here isn't a very realistic one for two reasons:

- Very rarely is the incoming lighting constant in all directions surrounding a surface, unless perhaps if you live in The Matrix. Even for the case of a skydome you have some spots that are brighter with different hue due to sun scattering, or cloud coverage. For such cases you need to perform Visibility * Lighting inside of the integral in order to get the correct direct lighting. However, the approximation can still be fairly close to the ground truth as long as the lighting is pretty low frequency (in other words, the intensity/hue doesn't rapidly change from one direction to another). For high-frequency lighting, the approximation will be pretty poor. The absolute worse case is an infinitely small point light, which should produce perfectly hard shadows.
- In reality, indirect lighting will be bouncing off of the geometry surrounding the surface. For AO we basically assume that all of the surrounding geo is totally black and has no light reflecting off of it, which is of course never the case. When considering purely emissive light sources it's okay to have a visibility term like we had in our integral, however you then need to have a second integral that integrates over the hemisphere and computes indirect lighting from all other visible surfaces.

Reason #1 is why you see the general advice to only apply it to "ambient" terms, since games often partition lighting into low-frequency indirect lighting (often computed offline) that's combined with high-frequency direct lighting from analytical light sources. The "soft" AO occlusion simply doesn't look right when applied to small light sources, since the shadows should be "harder" for those cases. You also don't want to "double darken" your lighting for cases where the dynamic lights also have a dynamic shadowing term from shadow maps.

As for #2, that's tougher. Straightforward AO will *always* over-darken compared to ground truth, since it doesn't account for bounce lighting. It's possible to do a PRT-style computation where you try to precompute how much light bounces off of neighboring surfaces, but that can exacerbate issues caused by non-constant lighting about the hemisphere, and also requires having access to the material properties of the surfaces. It's also typically not possible to this very well for real-time AO techniques (like SSAO), and so you generally don't see anyone doing that. Instead it's more common to have hacks like only considering close-by surfaces as occluders, or choosing a fixed AO color to fake bounced lighting.