I don't see a problem with traversing a tree in parallel; updating it with new state is the issue, of course. With "lock step" threading I can see how it would work: jobs run, store their results, and all results are applied to the tree at the end. I'd be sceptical about the resulting performance improvement, though. Anyway, I'll check out the BF presentation later.
On the surface, yes, you're bang on the money. You can certainly have multiple threads traversing the 'physical' tree in memory without incident. Actually doing work on it, however, is a bit murkier.

Consider the case of arrays for a moment. You can do a blanket segregation at the beginning, partitioning the whole thing into n chunks (where n is at or around the number of physical processor cores on the target machine, for simplicity's sake) and then have all your threads go in guns blazing. If your computations are otherwise embarrassingly parallel, or you're doing some super scheduling cleverness, you can actually get away with no additional synchronization; threads can just do their computation without ever needing to interact with one another.

You can handle array modifications in a similar way: provide some upper bound on the work each thread can do, segregate output slots ahead of time, then have all threads write their (potentially variable-width) output within that set of bounds. You can then coalesce everything in later (possibly also parallel) passes over that data. This is a fairly common approach in graphics (though not often directly seen), and it actually has a name: stream compaction. There's quite a bit of research dedicated to doing this effectively and I'd suggest some Google-fu if it sounds interesting, as it's a very broad topic mostly outside the scope of this discussion.
So, bringing that massive (bi)tangent back to heel, we're back at the idea of work distribution for arrays vs. trees. This one is somewhat more subtle: the speed advantage of trees for searching comes directly from hierarchically skipping regions based on previous sorting. Since we're in a hierarchy, we don't actually know which region(s) we need to search next until we finish the current set of computations; this really, really kills the parallelism factor, since it introduces hard synchronization points, and frequently. You can run multiple queries through the structure in parallel and see some speedup there, but it's still the same sort of 'bolted-on' concurrency dead end as things like a 'physics/render thread.' It helps in edge cases, but it's very likely those edge cases won't be a limiting factor, or even present at all, for most of your processing time. As an example, parallel view queries won't help you if you only have one camera, but parallel queries in and of themselves will benefit one camera or 10,000 equally.
clb: At the end of 2012, the positions of Jupiter, Saturn, Mercury, and Deimos are aligned so as to cause a denormalized flush-to-zero bug when computing Earth's gravitational force, slinging it into the sun.