I'm no expert, but taking the speed of sound (ca. 340 m/s) and the spacing between the ears (ca. 0.2 m), the difference between "sound comes from far left" and "sound comes from far right", which is pretty much the most extreme case possible, is about 0.6 ms. The ear is able to pick that up without any trouble (and obviously, it's able to pick up much smaller differences, too -- we can localize a lot more finely than just "left" and "right").
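The back-of-the-envelope math is easy to check. This is just the division above, with a room-temperature speed of sound and a ~0.2 m ear spacing as assumed values:

```python
# Rough maximum interaural time difference (ITD).
# Assumed values: speed of sound ~343 m/s, ear-to-ear spacing ~0.2 m
# (the effective acoustic path difference is a bit less than head width).
SPEED_OF_SOUND = 343.0  # m/s
EAR_SPACING = 0.2       # m

max_itd_ms = EAR_SPACING / SPEED_OF_SOUND * 1000
print(f"max ITD is roughly {max_itd_ms:.2f} ms")
```

So the largest left/right timing cue the ear ever gets is well under a millisecond.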
In that light, 10ms seems like... huge. I'm not convinced something that coarse can fly.
Of course we're talking about overall latency (the same on all channels), but the brain has to somehow integrate that with the visuals, too. And given how finely it apparently resolves timing differences, I'm not sure that works out.
If all sounds are delayed by the same amount, I think it might work. A 10 ms delay means the sound starts while the correct frame is still being displayed.
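To put a number on that: assuming a 60 Hz display (my assumption, not stated above), one frame is on screen longer than the audio delay:

```python
# Assumed: a 60 Hz display refresh rate.
REFRESH_HZ = 60
frame_ms = 1000 / REFRESH_HZ  # duration of one frame in milliseconds

audio_delay_ms = 10  # the system latency discussed above

# A 10 ms audio delay is shorter than one ~16.7 ms frame,
# so the sound still starts while the triggering frame is visible.
print(audio_delay_ms < frame_ms)
```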
You usually have some delay in every sound system between telling it to start playing and sound actually coming out, though I don't know how long that typically is -- longer on mobile devices, at least.
As long as it's below 100ms or so, I think most people will interpret it as "instantaneous".
Phase shifts and such in the same sound source reaching both ears is another thing.
It would be pretty easy to test...
Also, to simulate audio/visual sync properly, you should actually add some delay. If someone drops something 3 m away, the sound should arrive roughly 10 ms late.
I think this is good news. A minimum delay of 10 ms just means you can't accurately delay sounds closer than 3 m, but that shouldn't be much of a problem, since 3 m is close enough that you wouldn't really notice the delay in real life either.
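The idea above could be sketched like this: compute the physically correct propagation delay from distance, and let the unavoidable system latency act as a floor. The 343 m/s figure and the function name are my assumptions, not anything from an actual engine:

```python
# Distance-based sound delay with a minimum system latency floor.
SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value
MIN_LATENCY_S = 0.010   # the 10 ms system minimum discussed above

def sound_delay(distance_m: float) -> float:
    """Propagation delay for a sound source, never below the system minimum."""
    return max(distance_m / SPEED_OF_SOUND, MIN_LATENCY_S)

# A drop 3 m away needs ~8.7 ms of travel time, so the 10 ms floor dominates.
print(f"{sound_delay(3.0) * 1000:.1f} ms")
# At 30 m the physical delay (~87 ms) is well above the floor.
print(f"{sound_delay(30.0) * 1000:.1f} ms")
```

Inside ~3.4 m the floor takes over, which matches the point above: that's exactly the range where you wouldn't notice the discrepancy anyway.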