I'm no expert, but considering the speed of sound (ca. 340 m/s) and the size of a head (ca. 0.2 m), the arrival-time difference between the two ears for a sound coming from the far left or the far right, which is pretty much the most extreme case possible, is roughly 0.6 ms. The ear is able to pick that up without any trouble (and obviously it's able to pick up much smaller differences, too: we can hear in a lot more detail than just "left" and "right").
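For anyone who wants to redo that back-of-the-envelope arithmetic, here is a minimal sketch. The 343 m/s speed of sound and the 0.2 m ear spacing are assumed round figures (not measurements), and the function name is made up for illustration. It also ignores the extra path around the head, so treat the result as an order of magnitude only.

```python
def max_interaural_delay_ms(ear_spacing_m=0.2, speed_of_sound_m_s=343.0):
    """Largest arrival-time difference between the two ears, in milliseconds.

    Assumes the sound travels in a straight line across the full ear spacing,
    i.e. the source sits at the far left or far right.
    """
    return ear_spacing_m / speed_of_sound_m_s * 1000.0


if __name__ == "__main__":
    itd_ms = max_interaural_delay_ms()
    print(f"max interaural delay: {itd_ms:.2f} ms")
    # Compare against the 10 ms overall latency under discussion.
    print(f"10 ms is about {10.0 / itd_ms:.0f} times larger")
```

Plugging in other head sizes or a different speed of sound changes the number, but it stays well under a millisecond or so for any plausible head.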
In that light, 10ms seems like... huge. I'm not convinced something that coarse can fly.
Of course we're talking about overall latency (on all channels) but the brain has to somehow integrate that with the visuals, too. And seeing how it's apparently doing that quite delicately at ultra-high resolution, I think it may not work out.
If all sounds are delayed by the same amount, I think it might work. 10 ms means the sound starts while the correct frame is still being displayed.
You usually have some delay in all sound systems between telling them to start playing and the sound actually coming out, but I don't know how long it usually is... Longer on mobile devices, at least.
As long as it's below 100ms or so, I think most people will interpret it as "instantaneous".
Phase shifts and such between the two ears for the same sound source are a different matter.
It would be pretty easy to test...
Edit: Also, to simulate audio-visual sync properly, you should add some delay on purpose. If someone drops something 3 m away, the sound should arrive roughly 9 ms late.
100 ms would be a very long delay, certainly enough to affect the continuity between what is seen and what is heard. This of course would only be an issue for audio sources less than approximately 100 feet from the player, since sound from anything farther away would naturally arrive that late anyway.
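The distance figures in this thread (3 m, and the distance that goes with 100 ms) come from the same one-line formula, so here is a small sketch of it in both directions. It assumes 343 m/s for the speed of sound, so the results will differ a little from the round numbers quoted. The helper names are hypothetical.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at room temperature
FEET_PER_METRE = 3.28084


def propagation_delay_ms(distance_m, speed_m_s=SPEED_OF_SOUND_M_S):
    """Time the sound needs to cover distance_m, in milliseconds."""
    return distance_m / speed_m_s * 1000.0


def distance_for_delay_m(delay_ms, speed_m_s=SPEED_OF_SOUND_M_S):
    """Distance at which the natural propagation delay equals delay_ms."""
    return delay_ms / 1000.0 * speed_m_s


if __name__ == "__main__":
    print(f"3 m away -> {propagation_delay_ms(3.0):.1f} ms")
    far_m = distance_for_delay_m(100.0)
    print(f"100 ms   -> {far_m:.1f} m ({far_m * FEET_PER_METRE:.0f} ft)")
```

A game could add `propagation_delay_ms(distance)` on top of whatever latency the audio pipeline already has, if it wants distant events to sound late the way they do in real life.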
As a ballpark figure, anything less than 20 ms would probably be feasible. The ear has trouble distinguishing separate sources that are delayed by less than approximately 20 ms from each other (the Haas effect), so I'm extrapolating that delays shorter than this may not be problematic (but I have nothing solid to back this claim up).
You could probably test this by knocking up a virtual piano that plays a note when the mouse is clicked. Keep pushing up the delay between the click and audio trigger until the discontinuity becomes noticeable.
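An interactive piano needs an audio library, so what follows is not that test but a rough offline stand-in using only the Python standard library. It writes WAV files containing a short click (standing in for the mouse click) followed by a note after a chosen delay. Note the swap: this lets you judge click-to-note separation by ear, which is not the same as hand-to-ear latency. The sample rate, amplitudes, note pitch and file names are all arbitrary assumptions.

```python
import math
import struct
import wave

SAMPLE_RATE = 44100  # samples per second (assumed)


def click_then_note(delay_ms, note_hz=440.0, note_ms=300.0):
    """Return 16-bit mono samples: a 1 ms click at t=0, then a sine note
    that starts delay_ms after the click."""
    delay_samples = int(SAMPLE_RATE * delay_ms / 1000.0)
    note_samples = int(SAMPLE_RATE * note_ms / 1000.0)
    click_samples = int(SAMPLE_RATE * 0.001)
    samples = [0] * (max(delay_samples, click_samples) + note_samples)
    for i in range(click_samples):
        samples[i] = 20000  # the click: a short rectangular pulse
    for i in range(note_samples):
        value = int(12000 * math.sin(2 * math.pi * note_hz * i / SAMPLE_RATE))
        samples[delay_samples + i] += value  # peak sum 32000 fits in int16
    return samples


def write_wav(path, samples):
    """Write the samples as a mono 16-bit little-endian WAV file."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(SAMPLE_RATE)
        wav_file.writeframes(struct.pack(f"<{len(samples)}h", *samples))


if __name__ == "__main__":
    # Step the delay upwards and listen for the point where click and note
    # stop sounding like a single event.
    for delay in (0, 10, 20, 50, 100):
        write_wav(f"delay_{delay}ms.wav", click_then_note(delay))
```

Play the files in order of increasing delay. For the real question (does a delay after your own action feel wrong?), you would still want the interactive version.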