I'm seriously considering abandoning batching altogether at this point and just drawing each box individually, transforming it on the GPU with a uniform matrix passed in. The vertex data and index buffer stay the same for every draw call; the only thing that changes is the uniform matrix. Thoughts?
It's worth benchmarking to see how it goes. It's very simple to implement, and the extra draw calls may turn out not to be a problem at all.
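To make the trade-off concrete, here is a minimal C++ sketch of the CPU-side math for both approaches. It assumes each box is described only by a position and a uniform scale (the `Box`, `modelMatrix`, `transformPoint` and `batchVertices` names are hypothetical). With per-box drawing you build one column-major 4x4 matrix per box and hand it to `glUniformMatrix4fv` before each draw. With batching you apply that same transform to every vertex on the CPU and re-upload the whole array. The GL calls need a live context, so they appear only as comments.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

// Column-major 4x4 matrix: the layout glUniformMatrix4fv expects
// when its `transpose` argument is GL_FALSE.
using Mat4 = std::array<float, 16>;

struct Vec3 { float x, y, z; };

// Hypothetical per-box description: position plus uniform scale only.
struct Box { Vec3 position; float size; };

// Per-draw approach: one model matrix per box, uploaded as a uniform.
// Encodes translate(position) * scale(size).
Mat4 modelMatrix(const Box& b) {
    Mat4 m{};  // all zeros
    m[0] = b.size;
    m[5] = b.size;
    m[10] = b.size;
    m[12] = b.position.x;  // translation lives in column 3
    m[13] = b.position.y;
    m[14] = b.position.z;
    m[15] = 1.0f;
    return m;
}

// Equivalent of the vertex shader's `model * vec4(p, 1.0)`.
Vec3 transformPoint(const Mat4& m, Vec3 p) {
    return {
        m[0] * p.x + m[4] * p.y + m[8]  * p.z + m[12],
        m[1] * p.x + m[5] * p.y + m[9]  * p.z + m[13],
        m[2] * p.x + m[6] * p.y + m[10] * p.z + m[14],
    };
}

// Batched approach: transform every vertex of every box on the CPU
// into one big array, which must be re-uploaded whenever a box moves.
std::vector<Vec3> batchVertices(const std::vector<Box>& boxes,
                                const std::vector<Vec3>& unitBox) {
    std::vector<Vec3> out;
    out.reserve(boxes.size() * unitBox.size());
    for (const Box& b : boxes) {
        const Mat4 m = modelMatrix(b);
        for (const Vec3& v : unitBox) out.push_back(transformPoint(m, v));
    }
    return out;
}

// Per-box drawing, per frame (needs a GL context; `modelLoc` and
// `indexCount` are hypothetical):
//   for (const Box& box : boxes) {
//       Mat4 m = modelMatrix(box);
//       glUniformMatrix4fv(modelLoc, 1, GL_FALSE, m.data());
//       glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_SHORT, nullptr);
//   }
```

Both paths produce the same world-space positions. The difference is what crosses to the GPU each frame: 16 floats and one draw call per box, versus the full transformed vertex array and a single draw call. Which of those is cheaper on your hardware is exactly what a benchmark would tell you.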