D3D11 caps a cbuffer at 4096 shader-addressable float4s (64 KB), but in terms of performance and sizing it's not a limit I've personally ever come close to. That's beside the point for the rest of this post, though, because I'm going to suggest something completely different.
An option 4 you may be interested in is to (ab)use instancing to get behaviour similar to binding constant buffer ranges in D3D11.1. Under this setup you'd create one large vertex buffer containing all of your per-object data and bind it to an input slot whose elements are declared with D3D11_INPUT_PER_INSTANCE_DATA and a step rate of 1. You then use the Draw*Instanced calls to draw a single instance of each object, i.e. InstanceCount 1 and StartInstanceLocation set to that object's index into the buffer. The caveat is that I have no idea how this would perform compared to the other options. I suspect it may even be substantially faster than your D3D11.1 method, since you only need to bind the buffer once per frame rather than binding a new range of it per object, but I honestly don't know. Either way, it would nicely avoid the need to upload lots of dynamic data per object, and it sidesteps the max cbuffer size.