Why do you want one graph to rule them all?
Spatial relationships aren't hierarchical or directed. The relationships between the rooms of an office naturally form a cyclic graph.
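As a rough illustration of such a graph, here is a minimal sketch that stores room connectivity as an undirected adjacency map and walks it breadth-first. The room names, the `adjacency` map and the `rooms_within` helper are all made up for this example; they are not taken from any particular engine.

```python
from collections import deque

# Hypothetical office layout: each room lists the rooms it shares a doorway with.
# Links are undirected, and lobby -> hallway -> kitchen -> lobby forms a cycle,
# so this cannot be expressed as a tree.
adjacency = {
    "lobby": ["hallway", "kitchen"],
    "hallway": ["lobby", "kitchen", "office"],
    "kitchen": ["lobby", "hallway"],
    "office": ["hallway"],
}

def rooms_within(start, max_doors):
    """Return {room: doorways crossed} for rooms reachable within max_doors."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        room = queue.popleft()
        if seen[room] == max_doors:
            continue  # do not expand past the doorway budget
        for neighbour in adjacency[room]:
            if neighbour not in seen:  # visited check keeps cycles from looping
                seen[neighbour] = seen[room] + 1
                queue.append(neighbour)
    return seen
```

A query like `rooms_within("lobby", 1)` could serve as a crude "which rooms might be visible or audible from here" test. Note that nothing in it needs a parent/child relationship between rooms.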
Transformation hierarchies naturally form a tree structure, which doesn't require any knowledge of, or connection to, a spatial structure.
Optimal rendering order cannot be determined by traversing a scene graph -- such structures usually have to be linearized and sorted to determine rendering order.
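To make the "linearize and sort" step concrete, here is a minimal sketch. It assumes the visible objects have already been flattened into a list, and it uses one common sorting policy (opaque items grouped by material and drawn front to back, transparent items drawn back to front). The `DrawItem` fields and `build_render_order` are hypothetical names, and a real renderer would pick its own sort keys.

```python
from dataclasses import dataclass

@dataclass
class DrawItem:
    name: str
    material: str
    depth: float        # distance from the camera
    transparent: bool

def build_render_order(items):
    """Sort a flat list of visible items into one possible draw order."""
    opaque = [i for i in items if not i.transparent]
    transparent = [i for i in items if i.transparent]
    # Opaque: group by material to cut state changes, then nearest first.
    opaque.sort(key=lambda i: (i.material, i.depth))
    # Transparent: farthest first so blending composites in the right order.
    transparent.sort(key=lambda i: -i.depth)
    return opaque + transparent

items = [
    DrawItem("window", "glass", 5.0, True),
    DrawItem("crate_far", "wood", 9.0, False),
    DrawItem("wall", "brick", 4.0, False),
    DrawItem("crate_near", "wood", 2.0, False),
    DrawItem("bottle", "glass", 7.0, True),
]
order = [i.name for i in build_render_order(items)]
# order is ['wall', 'crate_near', 'crate_far', 'bottle', 'window']
```

The resulting order has nothing to do with where each object sat in any hierarchy, which is why a graph traversal alone can't produce it.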
Often different middleware components will contain their own internal representation of your scene. E.g. a physics engine like Bullet or PhysX will contain a "scene graph" and transformation hierarchy internally, which you cannot access. This isn't a problem; there's no need for your visual representation of the scene to be tightly coupled with the physical representation. All you need is a way for the updated physics state (the transforms) to be reflected in the visual structure.
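A minimal sketch of that one-way hand-off might look like the following. `FakePhysicsWorld` is a toy stand-in written for this example, not the Bullet or PhysX API, and `VisualNode`, `get_transform` and `sync_visuals` are likewise hypothetical names. The only point is that the visual side needs nothing beyond a mapping from body ids to its own nodes.

```python
from dataclasses import dataclass

@dataclass
class VisualNode:
    name: str
    position: tuple = (0.0, 0.0, 0.0)
    rotation: tuple = (0.0, 0.0, 0.0, 1.0)  # quaternion (x, y, z, w)

class FakePhysicsWorld:
    """Toy stand-in for a physics engine; its internals stay private."""
    def __init__(self):
        self._bodies = {}  # body id -> [position, velocity]

    def add_body(self, body_id, position, velocity):
        self._bodies[body_id] = [position, velocity]

    def step(self, dt):
        # Simple Euler integration, just enough to move things.
        for state in self._bodies.values():
            (x, y, z), (vx, vy, vz) = state
            state[0] = (x + vx * dt, y + vy * dt, z + vz * dt)

    def get_transform(self, body_id):
        # The toy bodies never rotate, so return the identity quaternion.
        return self._bodies[body_id][0], (0.0, 0.0, 0.0, 1.0)

def sync_visuals(world, bindings):
    """Copy each body's transform onto the visual node bound to it."""
    for body_id, node in bindings.items():
        node.position, node.rotation = world.get_transform(body_id)

world = FakePhysicsWorld()
world.add_body(1, (0.0, 10.0, 0.0), (1.0, -2.0, 0.0))
crate = VisualNode("crate")
world.step(0.5)
sync_visuals(world, {1: crate})
# crate.position is now (0.5, 9.0, 0.0)
```

The `bindings` dictionary is the only coupling between the two representations; either side can reorganize its own structure without the other noticing.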
The physical representation does not need to perform tasks like view-frustum culling, potentially-visible-set determination, or material sorting, so its structure will not be optimized for these tasks -- the visual representation of the scene, on the other hand, will be organized in a way that's conducive to them.
The optimal data structure depends entirely on the tasks that must be performed, and it's likely that each task will work best with a different data structure. So, forcing tight coupling of all data into an uber-structure is very counterproductive.