For isolating vertex shader performance I would suggest:
- Use a large data set, stored in a vertex buffer in GPU memory, that can be submitted in a single draw call, so that per-draw CPU overhead doesn't show up in the measurement.
- Use points rather than triangles to simplify (and therefore minimize the overhead of) primitive assembly, clipping, culling, etc.
- Transform each point to a position outside the view frustum so that it's discarded by the pipeline as soon as possible after the vertex shader, and no subsequent shader stages run.
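
As a rough sketch of the third point, assuming OpenGL and GLSL 3.30 (the question doesn't name an API, and names such as `u_mvp`, `in_position`, `push_outside` and `vertices_per_second` are made up for illustration): the shader still does the real transform, so the compiler can't optimize the work away, and then forces `x > w` so the point fails the clip test. The Python helpers mirror that last step on the CPU to check the reasoning, and convert an elapsed time in nanoseconds (for example from a `GL_TIME_ELAPSED` timer query) into a throughput figure.

```python
# Hypothetical vertex shader: does the full transform, then pushes the
# result outside the clip volume so the point is discarded afterwards.
VERTEX_SHADER = """
#version 330 core
layout(location = 0) in vec3 in_position;
uniform mat4 u_mvp;
void main() {
    vec4 p = u_mvp * vec4(in_position, 1.0);
    // Every component of p is still used, but x always ends up > w.
    gl_Position = vec4(abs(p.x) + abs(p.w) + 1.0, p.y, p.z, abs(p.w));
}
"""


def push_outside(px, py, pz, pw):
    """CPU mirror of the shader's last line."""
    return (abs(px) + abs(pw) + 1.0, py, pz, abs(pw))


def outside_clip_volume(x, y, z, w):
    """True if the clip-space position fails -w <= x, y, z <= w."""
    return not (-w <= x <= w and -w <= y <= w and -w <= z <= w)


def vertices_per_second(vertex_count, elapsed_ns):
    """Throughput from a vertex count and an elapsed time in nanoseconds."""
    return vertex_count * 1e9 / elapsed_ns


if __name__ == "__main__":
    # A point that would normally be on screen is now outside the volume.
    print(outside_clip_volume(*push_outside(0.3, -0.2, 0.5, 1.0)))
    # One million points processed in 2 ms.
    print(vertices_per_second(1_000_000, 2_000_000))
```

Whether a given driver rejects such points immediately after the vertex shader is not something this sketch can guarantee; it only ensures they fail the clip test.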
That will give you a reasonably accurate number, but I'd suggest that the number you get is actually useless. It has no bearing on the kind of performance you'll get in a real-world program, and the pipeline stages you minimize or skip above hint at why: a real-world program won't be minimizing or skipping them, so it will carry extra load on both the CPU and GPU that your test doesn't measure.