I am using an Octree in my prototype and it's working well enough. Here are some high level notes on my implementation:
* I rebuild the whole tree every frame. Yes, it's a bit more expensive than doing an in-place update, but it is a simpler solution. Good enough for a prototype!
* I store every octree region as a bounding box (in XNA, that's just two vector3's, or 24 bytes of memory)
* I only let a node exist if there is actually data inside of it, whether its objects or child nodes.
* In order to figure out what I need to draw, I intersect the camera view frustum against the octree and get a list of intersections. The intersected objects are the only objects I need to draw.
* I maintain a static list of all objects in the octree. Rather than visiting each node recursively and updating objects, I just traverse the master list and update from there (so much more simple).
* Collision detection is pretty straight forward: You only have to test for collision of an object against all objects within its containing node and any child nodes. In a perfectly balanced octree, this runs in O(log base 8 N).
Before you go rushing off to implement an Octree, you should back up a few steps and implement and profiler to measure how long each section of your code takes. What is its actual performance (measured in ticks or microseconds)? This shouldn't take more than 2-3 hours max to implement (compared to days for a good tree implementation from scratch). Once you have a good profiler to measure execution time, you can measure specific sections of your code and identify bottlenecks. It's much more scientific than guesswork. You might be optimizing a problem you don't have... ie, you're rendering hundreds of high res meshes to the screen even if they're really far from the camera, in which case you want to look at a "Level of Detail" switching scheme instead.