First: Why do you need the closest point? You should be able to test for collisions without knowing it (you just need to be able to evaluate the distance function at the center of the sphere). Is it because you want to evaluate the gradient there? Have you tried just using the gradient at the sphere's center?
The gradient at the sphere's center (and the distance at the sphere's center) is often incorrect -- for example when the sphere is inside one or both of the shapes that form a union, intersection, or difference. As you pointed out these operations don't seem to be preserving the true-distance-field-ness of the distance function, just providing upper/lower bounds.
I tried adding some damping to my gradient descent, and now it does converge (barring certain degenerate configurations) on the correct closest point. A more sophisticated approach to approximating the gradient might help improve convergence (currently I'm just doing a forward difference by sampling at p, p+(1,0), p+(0,1)).
It seems like fundamentally what I want is impossible: I drew the isosurface contours of a circle+AABB intersection distance function, and it's just not correct... it's like the voronoi regions of the edges of the shape are smeared outward. AFAICT this isn't due to error in my code, rather it's fundamental to approximating e.g the intersection of A and B as max(dist_A, dist_B) -- the correct answer (i.e the true distance) is often neither the distance to the closest point on A or the distance to the closest point on B, because where the boundaries of the two shapes intersect new vertices are formed which are ignored in this approach.
What's weird is that many people seem to be using these sort of signed distance functions (and their composition via union/subtraction/intersection operators) with no problems, e.g http://persson.berke...mesh/index.html