not personally all that familiar with NIF.
IME:
graph/tree structures seem to be best suited for a certain size-range, namely with an upper limit of "amount of stuff that may be present in the visible scene".
much past this point, a uniform grid seems like a better solution.
above this point, basically, the trees offer no obvious advantage, but add a lot more management overhead.
whereas a grid is much simpler, and allows more easily loading/unloading parts of a much bigger world.
for example, my engine uses a uniform grid for everything much past around 0.5km ("regions"), but may use BSP-like trees for organizing stuff within these regions. (note: regions also contain things like voxel terrain, which is basically organized as a uniform 3D grid of "chunks", each of which is in turn a uniform 3D grid of voxels, with each chunk possibly having associated mesh geometry).
granted, this is for a first-person vantage point, and currently the player can only see about 256 meters in any direction, meaning that usually at most 4 regions will be visible to the player at any given time. different gameplay styles may have different needs.
if I were doing it now, I might actually consider smaller regions, namely 0.25km, but this would be mostly so that regions could (cleanly) be perfect cubes (16x16x16 chunks), which would mean: a 16x16x16 grid of chunks per region, each chunk 16x16x16 voxels = 256x256x256 voxels (meters) per region. (the change itself would be trivial, but currently would break compatibility with my existing region files, essentially requiring a full reset of my test world...). (easier would be 32x32x32, but this would waste a lot more memory... which as-is is a tight resource for a currently 32-bit engine...).
then again, I have also had idle thoughts about what if there were a "metric cubit" of 0.5m, ...
otherwise:
actually, in my case, there is a client/server split, and each end has their own copy of the scene.
server-side scene:
voxel terrain (persistent / canonical);
entities (used for game logic, AI, physics, ...).
client-side scene:
more voxel terrain (currently non-persistent, streamed by server);
"client entities", which mostly just hold information about origin, rotation, model-name, ...
these are streamed by the server and represent a small subset of the entity fields.
static light sources (streamed by the server);
...
note that dynamic light sources exist, but are typically generated directly from client-entities, based mostly on effect flags (the flags indicate things like the color and intensity of an emitted dynamic light, as well as other effects, like whether it leaves a particle trail, ...).
the client code basically has its own view of the scene, but then tells the renderer about whatever seems relevant rendering-wise (managing things like light sources and particle emission, ...). there is partly a split in that the client basically has a "high-level" view of the scene (models represented via names, effects via flags, ...), whereas the renderer is more concerned with concrete representations (like being handed renderable models).
the renderer has its own view of the scene:
"modelstates", basically, representing instances of potentially visible models (as per the client code);
they may also hold some amount of "state" for the models, like their current VBOs, bone positions and calculated vertex positions, ...
position is represented via a transformation matrix, ...
light sources (static lights, dynamic lights, ...);
...
some of this is because each part of the process needs different information.
like, stuff relevant to AI or physics generally isn't nearly as relevant for rendering, ...