I know those talks perfectly well, but the 360/PS3 way of doing things no longer applies. Being an in-order RISC processor was a pain in the ass for code size and branching. Things that frustum culling needs all the time, like the float compare instruction, were really costly too. Add the need for a simple data layout so as not to saturate DMA communication with the SPUs, and the Battlefield choice may well have been legitimate in their context.
But today's and tomorrow's engines will again scale up the number of primitives to manage and draw by a good amount. Hierarchical layouts will easily strike back, since more modern hardware handles branching more efficiently.
And being hierarchical does not mean we have to go down to single-primitive leaves; it is all a balance between the overhead of the hierarchy and the raw test. Maybe instead of storing hundreds of primitives in each leaf, we will store thousands and dispatch each leaf on a separate thread. In the end, only profiling sessions and testing will give the answer, an answer that depends on the context of each game.
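To make the "fat leaf" idea concrete, here is a minimal sketch, not anyone's production code. It assumes bounding spheres, inward-facing frustum planes, and hypothetical names (`Leaf`, `classify`, `cullLeaf`): one coarse test on the leaf bounds, then a flat brute-force loop over its contents only when the leaf straddles the frustum.

```cpp
#include <cstddef>
#include <vector>

struct Sphere { float x, y, z, r; };
struct Plane  { float nx, ny, nz, d; };  // assumed convention: inside when n.p + d >= 0

enum class Overlap { Outside, Intersects, Inside };

// Signed distance from the sphere centre to the plane.
inline float signedDistance(const Plane& p, const Sphere& s) {
    return p.nx * s.x + p.ny * s.y + p.nz * s.z + p.d;
}

// Classify a sphere against every plane of the frustum.
Overlap classify(const std::vector<Plane>& frustum, const Sphere& s) {
    Overlap result = Overlap::Inside;
    for (const Plane& p : frustum) {
        const float dist = signedDistance(p, s);
        if (dist < -s.r) return Overlap::Outside;       // fully behind one plane
        if (dist <  s.r) result = Overlap::Intersects;  // straddles this plane
    }
    return result;
}

// A leaf holds many primitives (hundreds or thousands) in a flat array.
struct Leaf {
    Sphere bounds;
    std::vector<Sphere> items;
};

// Fills `visible` with the indices of the items that survive culling.
std::size_t cullLeaf(const std::vector<Plane>& frustum, const Leaf& leaf,
                     std::vector<std::size_t>& visible) {
    visible.clear();
    const Overlap o = classify(frustum, leaf.bounds);
    if (o == Overlap::Outside) return 0;  // one test rejects the whole leaf
    for (std::size_t i = 0; i < leaf.items.size(); ++i) {
        // A fully inside leaf skips the per-item tests entirely.
        if (o == Overlap::Inside ||
            classify(frustum, leaf.items[i]) != Overlap::Outside) {
            visible.push_back(i);
        }
    }
    return visible.size();
}
```

Each `cullLeaf` call only reads its own leaf and writes its own output list, so handing one leaf per job or thread is straightforward. Whether that beats a flat brute-force pass is exactly the kind of thing only profiling will tell.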
And strangely, on a previous game I worked on, with a lot of instances to manage, one optimisation I did was to stop the hierarchical split of the culling once a node got too small, because it was more efficient. So no, I am not pro-hierarchical, I am just pro-performance :)
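For illustration, that kind of cutoff could look like the sketch below. This is not the code from that game: the median split, the axis cycling, and the names (`Node`, `build`, `maxLeafSize`) are all assumptions made for the example. The only point is the early return that turns a small enough node into a leaf.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Point { float v[3]; };  // instance position (hypothetical layout)

struct Node {
    std::size_t first, count;  // range of items covered by this node
    int left, right;           // child indices, -1 for a leaf
};

// Builds a median-split tree over items[first, first + count) and returns
// the index of the created node. Splitting stops at maxLeafSize items.
int build(std::vector<Point>& items, std::size_t first, std::size_t count,
          std::size_t maxLeafSize, int axis, std::vector<Node>& nodes) {
    const int index = static_cast<int>(nodes.size());
    nodes.push_back(Node{first, count, -1, -1});

    // Small enough: keep it as a leaf and brute-force its contents later.
    // The clamp to 1 guarantees the recursion terminates.
    if (count <= std::max<std::size_t>(maxLeafSize, 1)) return index;

    const std::size_t half = count / 2;
    auto begin = items.begin() + static_cast<std::ptrdiff_t>(first);
    std::nth_element(begin,
                     begin + static_cast<std::ptrdiff_t>(half),
                     begin + static_cast<std::ptrdiff_t>(count),
                     [axis](const Point& a, const Point& b) {
                         return a.v[axis] < b.v[axis];
                     });

    const int next = (axis + 1) % 3;  // cycle x, y, z
    const int l = build(items, first, half, maxLeafSize, next, nodes);
    const int r = build(items, first + half, count - half, maxLeafSize, next, nodes);

    // Write through the index: `nodes` may have reallocated during recursion.
    nodes[index].left = l;
    nodes[index].right = r;
    return index;
}
```

`maxLeafSize` is the knob to tune per game: raise it until the hierarchy overhead stops paying for itself in the profiler.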