If the CPU doesn't transform ALL object vertices to world-coordinates, it doesn't know the object AABB.
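It does - you calculate an initial bbox from the untransformed vertices at load time, then at runtime you just transform the corners of that bbox and re-fit an axis-aligned box around them. That's at most 8 transforms per object (only 2, min and max, if the transform is just translation and uniform scale) versus one per vertex.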
Even Quake 2 did that back in 1997. Really, you're coming across as if you've created an artificially complex solution to a problem that has already been solved in a much simpler, cleaner and more performant manner well over a decade ago.
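For reference, here is a minimal sketch of the bbox approach described above. It is not Quake 2's actual code; the names (`Vec3`, `Aabb`, `Mat3x4`, `computeLocalAabb`, `transformAabb`) and the row-major 3x4 matrix layout are assumptions made for illustration. It transforms all 8 corners so that it stays correct under rotation; the resulting world-space box is conservative, meaning it can be somewhat larger than the tightest possible fit.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Vec3 { float x, y, z; };
struct Aabb { Vec3 min, max; };

// Hypothetical row-major affine transform: rotation/scale in columns 0..2,
// translation in column 3.
struct Mat3x4 { float m[3][4]; };

Vec3 transformPoint(const Mat3x4& t, const Vec3& p) {
    return {
        t.m[0][0] * p.x + t.m[0][1] * p.y + t.m[0][2] * p.z + t.m[0][3],
        t.m[1][0] * p.x + t.m[1][1] * p.y + t.m[1][2] * p.z + t.m[1][3],
        t.m[2][0] * p.x + t.m[2][1] * p.y + t.m[2][2] * p.z + t.m[2][3]};
}

// Grow the box so that it contains point p.
void expand(Aabb& box, const Vec3& p) {
    box.min.x = std::min(box.min.x, p.x);  box.max.x = std::max(box.max.x, p.x);
    box.min.y = std::min(box.min.y, p.y);  box.max.y = std::max(box.max.y, p.y);
    box.min.z = std::min(box.min.z, p.z);  box.max.z = std::max(box.max.z, p.z);
}

// Load time: a single pass over the untransformed (object-space) vertices.
Aabb computeLocalAabb(const std::vector<Vec3>& verts) {
    assert(!verts.empty());
    Aabb box{verts[0], verts[0]};
    for (const Vec3& v : verts) expand(box, v);
    return box;
}

// Per frame: transform the 8 corners of the local box and re-fit an
// axis-aligned box around them. No per-vertex work is done here.
Aabb transformAabb(const Aabb& local, const Mat3x4& t) {
    Vec3 first = transformPoint(t, local.min);  // corner 0 is (min, min, min)
    Aabb world{first, first};
    for (int i = 1; i < 8; ++i) {
        // Bits 0, 1, 2 of i select min or max on the x, y, z axes.
        Vec3 corner{
            (i & 1) ? local.max.x : local.min.x,
            (i & 2) ? local.max.y : local.min.y,
            (i & 4) ? local.max.z : local.min.z};
        expand(world, transformPoint(t, corner));
    }
    return world;
}
```

You would call `computeLocalAabb` once when the model is loaded and `transformAabb` each time the object's transform changes, so the per-frame cost stays constant no matter how many vertices the mesh has.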