There are many issues with the way it is currently done. The main one is that, while waiting for a proper lightmapping implementation, the ambient occlusion values are stored per-vertex in a color component. And per-vertex means very tessellation-dependent. Unfortunately, the building below is on the "good" side; I've seen many buildings that didn't behave as well.
Initially, the algorithm cast 256 random rays per vertex into the hemisphere around the vertex normal. Each ray intersects the scene, and the hit distance is used to compute an "ambient" color. The results of all the rays are averaged, and the end result is a color stored per-vertex. That didn't work very well, because a vertex is often "inside" a wall even though it logically sits on the wall: the modelers preferred to build things that way to save polys and avoid T-junctions.
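A minimal sketch of this per-vertex pass, in Python. The `intersect_scene(origin, direction)` callback is a hypothetical scene query (returning the hit distance, or `None` on a miss), and the distance-to-ambient mapping is a simple linear falloff of my own choosing, not necessarily the one used here:

```python
import math
import random

def random_hemisphere_dir(normal):
    # Rejection-sample a unit vector, then flip it into the
    # hemisphere around the given (unit) normal.
    while True:
        d = (random.uniform(-1, 1), random.uniform(-1, 1), random.uniform(-1, 1))
        l2 = d[0] * d[0] + d[1] * d[1] + d[2] * d[2]
        if 1e-6 < l2 <= 1.0:
            inv = 1.0 / math.sqrt(l2)
            d = (d[0] * inv, d[1] * inv, d[2] * inv)
            if d[0] * normal[0] + d[1] * normal[1] + d[2] * normal[2] < 0.0:
                d = (-d[0], -d[1], -d[2])
            return d

def vertex_ambient(pos, normal, intersect_scene, num_rays=256, max_dist=10.0):
    # intersect_scene(origin, direction) -> hit distance, or None on a miss.
    # Rays that escape contribute full ambient; nearby hits occlude
    # proportionally to their distance (linear falloff, capped at max_dist).
    total = 0.0
    for _ in range(num_rays):
        d = random_hemisphere_dir(normal)
        t = intersect_scene(pos, d)
        total += 1.0 if t is None else min(t, max_dist) / max_dist
    return total / num_rays
```

The returned scalar in [0, 1] would then be packed into the vertex color component described above.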
I've experimented with an alternative algorithm based on casting rays per triangle instead. 256 rays are still thrown into the scene, but this time from a circle roughly centered on the polygon. The ambient color of a triangle is then added to its three vertices, and in a second pass, all the per-vertex ambient values are averaged. The result looks much better, but it's still tessellation-dependent, and I don't think there's any way to fully fix that.
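The two passes of this per-triangle variant could look like the sketch below. The `ambient_at(center, normal)` callback stands in for the 256-ray sampling around the triangle center (a hypothetical hook, not the actual implementation); the sketch only shows the accumulate-then-average bookkeeping:

```python
import math

def bake_per_triangle(vertices, triangles, ambient_at):
    # ambient_at(center, normal) -> scalar ambient for one triangle
    # (assumed to do the hemisphere ray casting internally).
    accum = [0.0] * len(vertices)
    counts = [0] * len(vertices)

    # Pass 1: compute one ambient value per triangle from its center,
    # and add it to the triangle's three vertices.
    for i0, i1, i2 in triangles:
        p0, p1, p2 = vertices[i0], vertices[i1], vertices[i2]
        center = tuple((p0[k] + p1[k] + p2[k]) / 3.0 for k in range(3))
        e1 = tuple(p1[k] - p0[k] for k in range(3))
        e2 = tuple(p2[k] - p0[k] for k in range(3))
        n = (e1[1] * e2[2] - e1[2] * e2[1],
             e1[2] * e2[0] - e1[0] * e2[2],
             e1[0] * e2[1] - e1[1] * e2[0])
        ln = math.sqrt(n[0] * n[0] + n[1] * n[1] + n[2] * n[2]) or 1.0
        n = (n[0] / ln, n[1] / ln, n[2] / ln)
        a = ambient_at(center, n)
        for vi in (i0, i1, i2):
            accum[vi] += a
            counts[vi] += 1

    # Pass 2: average each vertex over the triangles sharing it.
    return [accum[vi] / counts[vi] if counts[vi] else 1.0
            for vi in range(len(vertices))]
```

A vertex shared by a dark triangle and a bright one ends up halfway between the two, which is exactly why the result still depends on how the mesh is tessellated.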
Here's a result on a single building:
LEFT: diffuse lighting only
RIGHT: diffuse + ambient occlusion