What about adding N box geometries to the same VBO, just one after the other? For each copy, add an extra integer vertex attribute that is constant across that box (0 to N-1). Then you basically have a hand-rolled glDraw*Instanced in batches of at most N, with your integer attribute taking the role of gl_InstanceID.
You'd need to add that attribute to every vertex, not once per box, so that's a lot of extra integers (N times the box's vertex count per batch).
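A minimal sketch of the CPU-side buffer setup, written in Python for brevity. `build_batched_boxes`, `CUBE_POSITIONS` and `CUBE_INDICES` are hypothetical names, and it assumes an indexed unit cube with 8 shared corners (a real box with per-face normals would need 24 vertices). Each vertex carries (x, y, z, instance_id), and the element indices are offset per copy so each box points at its own vertices:

```python
# Hypothetical helper: replicate one box mesh N times into a single vertex
# array, tagging every vertex with the index of the copy it belongs to.
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Vertex = Tuple[float, float, float, int]  # x, y, z, instance_id


def build_batched_boxes(
    positions: Sequence[Vec3],
    indices: Sequence[int],
    n: int,
) -> Tuple[List[Vertex], List[int]]:
    """Return (vertices, indices) for n copies of one indexed box.

    instance_id is the same for all vertices of one copy and plays the
    role of gl_InstanceID in the vertex shader.
    """
    vertices: List[Vertex] = []
    out_indices: List[int] = []
    stride = len(positions)  # vertices per box
    for instance_id in range(n):
        for x, y, z in positions:
            vertices.append((x, y, z, instance_id))
        # Offset the element indices so this copy refers to its own vertices.
        out_indices.extend(i + instance_id * stride for i in indices)
    return vertices, out_indices


# Unit cube corners; corner index = 4*x + 2*y + z.
CUBE_POSITIONS: List[Vec3] = [
    (x, y, z) for x in (0.0, 1.0) for y in (0.0, 1.0) for z in (0.0, 1.0)
]

# 12 triangles, counter-clockwise when viewed from outside the cube.
CUBE_INDICES: List[int] = [
    0, 1, 3, 0, 3, 2,  # x = 0
    4, 6, 7, 4, 7, 5,  # x = 1
    0, 4, 5, 0, 5, 1,  # y = 0
    2, 3, 7, 2, 7, 6,  # y = 1
    0, 2, 6, 0, 6, 4,  # z = 0
    1, 5, 7, 1, 7, 3,  # z = 1
]

if __name__ == "__main__":
    verts, idx = build_batched_boxes(CUBE_POSITIONS, CUBE_INDICES, 3)
    print(len(verts), len(idx))  # 24 vertices, 108 indices for 3 boxes
```

The tuples would then be flattened and uploaded to the VBO. On GL 3.0+ / ES 3.0 the id can go through `glVertexAttribIPointer` as a real integer; GLSL ES 1.00 has no integer attributes, so there it would have to be a float. Either way the overhead is one id per vertex, e.g. 8 × N ids per batch for this cube.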
I guess you'd also have to put the per-box transform matrices in a texture, indexed by that attribute.
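One way that texture could be laid out, again as a hedged Python sketch with hypothetical names (`pack_matrices`, `translation`). It assumes an RGBA float texture 4 texels wide and N rows tall, where texel (c, i) holds column c of box i's matrix, so the flat list can be handed straight to the texture upload:

```python
# Hypothetical helper: lay out N 4x4 transform matrices as texel data for a
# 4-texel-wide RGBA float texture (one row per instance, one texel per column).
from typing import List, Sequence

Matrix4 = Sequence[Sequence[float]]  # indexed as m[row][col]


def pack_matrices(matrices: Sequence[Matrix4]) -> List[float]:
    """Flatten matrices so that texel (c, i) holds column c of matrix i."""
    data: List[float] = []
    for m in matrices:        # one texture row per instance
        for c in range(4):    # four RGBA texels per row, one per column
            data.extend(m[r][c] for r in range(4))
    return data


def translation(tx: float, ty: float, tz: float) -> List[List[float]]:
    """Row-major 4x4 translation matrix (translation in the last column)."""
    return [
        [1.0, 0.0, 0.0, tx],
        [0.0, 1.0, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


if __name__ == "__main__":
    texels = pack_matrices([translation(0.0, 0.0, 0.0), translation(1.0, 2.0, 3.0)])
    print(len(texels))  # 32 floats: 2 rows x 4 texels x 4 channels
```

In the vertex shader the four texels for a given id would then be fetched (e.g. with `texelFetch` in GLSL 1.30+) and passed as the columns to `mat4(...)`. This assumes float textures and vertex texture fetch are available, which isn't guaranteed on every ES 2.0 device.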
My gut says it will be slower than just transforming on the CPU, but I can't say for sure.