That said, a solution will probably involve calling glDraw* less often than once per cube.
True, but how? Since every cube needs its own scaling, rotation and translation, how can I combine the glDrawArrays() calls?
Option 1: Software transform the vertices. That is, do the matrix multiply on the CPU, then you can send all the cubes in one go.
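To make Option 1 concrete, here is a rough sketch of the CPU-side transform. The names (Mat4, Vec3, transformPoint, buildBatch) are made up for illustration, and it assumes column-major affine model matrices and position-only vertices; treat it as a starting point rather than a finished implementation.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Column-major 4x4 matrix (the layout OpenGL uses). Hypothetical helper type.
using Mat4 = std::array<float, 16>;

struct Vec3 { float x, y, z; };

// Transform a point by m, assuming w = 1 and an affine matrix
// (bottom row 0 0 0 1), so no perspective divide is needed.
inline Vec3 transformPoint(const Mat4& m, const Vec3& v) {
    return {
        m[0] * v.x + m[4] * v.y + m[8]  * v.z + m[12],
        m[1] * v.x + m[5] * v.y + m[9]  * v.z + m[13],
        m[2] * v.x + m[6] * v.y + m[10] * v.z + m[14],
    };
}

// Emit one pre-transformed copy of the cube's vertices per model matrix,
// all packed into a single array that can be drawn with one call.
std::vector<Vec3> buildBatch(const std::vector<Vec3>& cubeVerts,
                             const std::vector<Mat4>& modelMatrices) {
    std::vector<Vec3> batch;
    batch.reserve(cubeVerts.size() * modelMatrices.size());
    for (const Mat4& m : modelMatrices) {
        for (const Vec3& v : cubeVerts) {
            batch.push_back(transformPoint(m, v));
        }
    }
    return batch;
}
```

Each frame you would rebuild the batch, upload it with glBufferData() (or glBufferSubData()) and issue a single glDrawArrays(GL_TRIANGLES, 0, vertexCount). The vertex shader then only needs the view-projection matrix. If you use lighting, the normals need to be rotated on the CPU as well.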
Option 2: 'Hardware skinning' style solution. Instead of doing one cube at a time, put 16* cubes into your VBO. Each vertex carries the index of the cube it belongs to, which the vertex shader uses to look up that cube's matrix in an array of matrix uniforms.
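A sketch of what Option 2 could look like, assuming OpenGL ES 2.0 (no integer vertex attributes, so the index is stored as a float). The struct, the function and all shader names (u_model, u_viewProj, a_position, a_matrixIndex) are hypothetical, and the array size of 16 simply mirrors the number above.

```cpp
#include <cstddef>
#include <vector>

struct BatchVertex {
    float x, y, z;      // object-space position, identical for every cube
    float matrixIndex;  // which entry of u_model[] this vertex should use
};

// Hypothetical GLSL ES 1.00 vertex shader: one model matrix per cube,
// selected by the per-vertex index.
const char* const kVertexShader = R"(
uniform mat4 u_viewProj;
uniform mat4 u_model[16];
attribute vec3 a_position;
attribute float a_matrixIndex;
void main() {
    mat4 model = u_model[int(a_matrixIndex)];
    gl_Position = u_viewProj * model * vec4(a_position, 1.0);
}
)";

// Build the static VBO contents: cubeCount copies of the same cube mesh
// (cubeXYZ holds x,y,z triples), each copy tagged with its own index.
std::vector<BatchVertex> buildIndexedBatch(const std::vector<float>& cubeXYZ,
                                           std::size_t cubeCount) {
    const std::size_t vertsPerCube = cubeXYZ.size() / 3;
    std::vector<BatchVertex> out;
    out.reserve(vertsPerCube * cubeCount);
    for (std::size_t cube = 0; cube < cubeCount; ++cube) {
        for (std::size_t v = 0; v < vertsPerCube; ++v) {
            out.push_back({cubeXYZ[3 * v], cubeXYZ[3 * v + 1], cubeXYZ[3 * v + 2],
                           static_cast<float>(cube)});
        }
    }
    return out;
}
```

The VBO is built once and never changes. Per frame you would upload the 16 current matrices in one go with glUniformMatrix4fv(location, 16, GL_FALSE, matrices) and draw the whole batch with a single glDrawArrays() call.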
From experience doing similar things on iOS, I'd expect option 1 to be the better choice.
*You'll probably want this number to be as large as possible within the constraints of the maximum amount of uniform space your GPU supports.
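As a rough way to pick that number, assuming OpenGL ES 2.0 where the limit is reported in vec4 units via glGetIntegerv(GL_MAX_VERTEX_UNIFORM_VECTORS, ...) and a mat4 takes four of them: the helper below is a hypothetical sketch, and real drivers may use up some uniform space themselves, so leave a little headroom.

```cpp
#include <cstddef>

// A mat4 uniform occupies four vec4 uniform slots.
constexpr std::size_t kVectorsPerMat4 = 4;

// maxVertexUniformVectors: value queried from GL_MAX_VERTEX_UNIFORM_VECTORS.
// reservedVectors: slots used by your other uniforms (e.g. 4 for one
// view-projection mat4). Returns how many per-cube matrices still fit.
std::size_t maxCubesPerBatch(std::size_t maxVertexUniformVectors,
                             std::size_t reservedVectors) {
    if (maxVertexUniformVectors <= reservedVectors) {
        return 0;
    }
    return (maxVertexUniformVectors - reservedVectors) / kVectorsPerMat4;
}
```

With the ES 2.0 minimum of 128 vectors and one mat4 reserved for the view-projection matrix, this gives (128 - 4) / 4 = 31 cubes per batch.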