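You can store per-instance data in a second vertex buffer bound to another input slot, with the matching input layout elements marked as per-instance data. The Input Assembler then combines the per-vertex data and the per-instance data for each vertex shader invocation.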
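To make that combining step concrete, here's a rough CPU-side model of what the Input Assembler does for an instanced draw with an instance step rate of 1. It's only an illustration: `Float3`, `VertexInput` and `assembleInstanced` are hypothetical names, and the loop order is just for readability (the GPU doesn't promise any particular invocation order).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Float3 { float x, y, z; };

// What one vertex shader invocation receives in this hypothetical model.
struct VertexInput {
    Float3 position; // from the per-vertex buffer (slot 0)
    Float3 offset;   // from the per-instance buffer (slot 1)
};

// Models an instanced draw of perVertex.size() vertices, instanceCount times.
// The per-vertex element advances every vertex; the per-instance element
// advances only once per instance (step rate 1).
std::vector<VertexInput> assembleInstanced(const std::vector<Float3>& perVertex,
                                           const std::vector<Float3>& perInstance,
                                           std::size_t instanceCount) {
    assert(instanceCount <= perInstance.size());
    std::vector<VertexInput> inputs;
    inputs.reserve(perVertex.size() * instanceCount);
    for (std::size_t instance = 0; instance < instanceCount; ++instance) {
        for (std::size_t vertex = 0; vertex < perVertex.size(); ++vertex) {
            inputs.push_back({perVertex[vertex], perInstance[instance]});
        }
    }
    return inputs;
}
```

So with a 3-vertex triangle and 2 instances you get 6 invocations, and every vertex of instance 1 sees the same per-instance value.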
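There are no SV_ (system-value) semantics for a per-vertex matrix, so you have to supply it yourself. Honestly, you probably don't need a matrix per vertex unless you're doing skinning, and in that case it's better to pass the bone matrices in as a matrix array in a constant buffer and give each vertex only the indices (and weights) of the bones that affect it.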
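For the skinning case, a CPU sketch of the idea: the matrices live in one shared array (standing in for the constant buffer), and each vertex only carries bone indices and weights. The names `BonePalette`, `SkinnedVertex`, `skin` and the 64-bone limit are all made up for the example; the blend itself is plain linear blend skinning.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<Vec4, 4>; // rows; column-vector convention, translation in column 3

constexpr std::size_t kMaxBones = 64; // hypothetical size of the constant-buffer array

// Stands in for a constant buffer holding an array of float4x4 bone matrices.
struct BonePalette {
    std::array<Mat4, kMaxBones> bones;
};

// Per-vertex data: which bones affect the vertex and how much, not a whole matrix.
struct SkinnedVertex {
    Vec4 position;
    std::array<std::uint8_t, 4> boneIndices;
    std::array<float, 4> boneWeights; // expected to sum to 1
};

Vec4 transform(const Mat4& m, const Vec4& v) {
    Vec4 r{};
    for (std::size_t row = 0; row < 4; ++row)
        for (std::size_t col = 0; col < 4; ++col)
            r[row] += m[row][col] * v[col];
    return r;
}

// Linear blend skinning: weighted sum of the position transformed by each bone.
Vec4 skin(const BonePalette& palette, const SkinnedVertex& v) {
    Vec4 out{};
    for (std::size_t i = 0; i < 4; ++i) {
        assert(v.boneIndices[i] < kMaxBones);
        Vec4 p = transform(palette.bones[v.boneIndices[i]], v.position);
        for (std::size_t c = 0; c < 4; ++c) out[c] += v.boneWeights[i] * p[c];
    }
    return out;
}
```

The per-vertex cost in the buffer is 4 bytes of indices plus 4 floats of weights, instead of 16 floats for a full matrix.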
BTW: it's better to upload a single matrix to the GPU, because transforming a position by it takes just 4 DP4 instructions, whereas uploading position, rotation and scale separately would take many more instructions per vertex. Quaternions are probably faster, though.
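Here's a quick CPU illustration of why the single matrix is cheap: applying it is exactly four 4-component dot products, one per output component. For comparison, the separate-components path has to scale, rotate (here with a unit quaternion, using the usual `t = 2(u × v)`, `v' = v + w·t + u × t` form) and translate every vertex. Function names are hypothetical and this doesn't count actual shader instructions; it just shows the amount of per-vertex work each path implies.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<float, 3>;
using Vec4 = std::array<float, 4>;
using Mat4 = std::array<Vec4, 4>; // rows of the matrix, column-vector convention

float dot4(const Vec4& a, const Vec4& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]};
}

// Matrix path: each output component is one dot product (the four DP4s).
Vec4 transformByMatrix(const Mat4& m, const Vec4& p) {
    return {dot4(m[0], p), dot4(m[1], p), dot4(m[2], p), dot4(m[3], p)};
}

// Separate-components path: scale, rotate by unit quaternion q = (x, y, z, w),
// then translate. The rotation alone needs two cross products per vertex.
Vec3 transformByTRS(const Vec3& p, const Vec3& scale, const Vec4& q, const Vec3& translation) {
    Vec3 v{p[0] * scale[0], p[1] * scale[1], p[2] * scale[2]};
    Vec3 u{q[0], q[1], q[2]};
    Vec3 c = cross(u, v);
    Vec3 t{2.0f * c[0], 2.0f * c[1], 2.0f * c[2]}; // t = 2 (u x v)
    Vec3 ut = cross(u, t);
    Vec3 r;
    for (int i = 0; i < 3; ++i) {
        r[i] = v[i] + q[3] * t[i] + ut[i] + translation[i]; // v + w*t + u x t, then translate
    }
    return r;
}

bool nearlyEqual(float a, float b, float eps = 1e-5f) {
    return std::fabs(a - b) <= eps;
}
```

Both paths give the same point; the matrix just folds scale, rotation and translation together once on the CPU instead of redoing that work for every vertex.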
Also, you don't need to use the TEXCOORD# semantics anymore: since DirectX 10 you can use any semantic name you want. To upload a matrix, you simply upload four float4 values with the same semantic name but different semantic indices, e.g. WVP0, WVP1, WVP2 and WVP3 (semantic name "WVP" with indices 0 to 3 in the input layout).
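A sketch of what such an input layout could look like. To keep it compilable without `d3d11.h`, `InputElement` is a hypothetical stand-in that mirrors the fields of `D3D11_INPUT_ELEMENT_DESC` (in real code the format would be `DXGI_FORMAT_R32G32B32A32_FLOAT` and the per-instance class `D3D11_INPUT_PER_INSTANCE_DATA`). It assumes positions in slot 0 and one matrix per instance in slot 1, packed tightly.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical mirror of D3D11_INPUT_CLASSIFICATION.
enum class Classification { PerVertex, PerInstance };

// Hypothetical mirror of D3D11_INPUT_ELEMENT_DESC; 'bytes' replaces the DXGI format.
struct InputElement {
    std::string semanticName;
    unsigned semanticIndex;
    unsigned bytes;            // 12 for float3, 16 for float4
    unsigned inputSlot;
    unsigned alignedByteOffset;
    Classification slotClass;
    unsigned instanceStepRate; // 0 for per-vertex data
};

// One matrix per instance = four float4 elements sharing the name "WVP"
// with semantic indices 0..3, i.e. WVP0..WVP3 on the shader side.
std::vector<InputElement> makeInstancedLayout() {
    std::vector<InputElement> layout;
    layout.push_back({"POSITION", 0, 12, 0, 0, Classification::PerVertex, 0});
    for (unsigned i = 0; i < 4; ++i) {
        layout.push_back({"WVP", i, 16, 1, i * 16, Classification::PerInstance, 1});
    }
    return layout;
}
```

The semantic index is what distinguishes the four parts of the matrix; the byte offsets simply step by 16 because each part is a float4.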