The cache lines started at 16 vertices on the first GeForce generation and have been growing since then.
I think you mean 16 bytes, not 16 vertices.
I think it would be 16 indices, each 16 bits wide, queued for rasterization. If the vertex cache for such an indexed rasterizing burst were 16 bytes, wouldn't that make things quite trivial? (This is also why indices should reference vertices that lie as close together as possible: staying within the vertex cache size minimizes vertex cache reloads during such an instanced, indexed draw.)
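To make the locality point concrete, here is a rough sketch that counts misses in a simulated vertex cache. It assumes a plain FIFO cache that holds 16 vertices. That is my assumption for illustration, not a statement about how any particular GeForce works. The function names and the scattering stride of 37 are made up for this example.

```python
from collections import deque


def count_cache_misses(indices, cache_size=16):
    """Count misses in a hypothetical FIFO vertex cache of `cache_size` vertices."""
    cache = deque(maxlen=cache_size)  # a full deque drops its oldest entry on append
    misses = 0
    for idx in indices:
        if idx not in cache:
            misses += 1
            cache.append(idx)
    return misses


def strip_triangles(n_tris):
    """Triangle i uses vertices (i, i+1, i+2), like a strip written out as a list."""
    return [(i, i + 1, i + 2) for i in range(n_tris)]


def flatten(triangles):
    """Turn a list of triangles into a flat index list."""
    return [v for tri in triangles for v in tri]


def scatter(triangles, stride=37):
    """Same triangles, visited in a jumpy order (stride must be coprime to the count)."""
    n = len(triangles)
    return [triangles[(k * stride) % n] for k in range(n)]


if __name__ == "__main__":
    tris = strip_triangles(100)
    print("in order :", count_cache_misses(flatten(tris)))
    print("scattered:", count_cache_misses(flatten(scatter(tris))))
```

Both index lists draw exactly the same 100 triangles over 102 vertices. In order, each vertex misses only once (102 misses). In the scattered order, every one of the 300 index lookups misses, because neighbouring triangles are never close enough in the index stream to still be in the cache.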
16 bytes sounds more like an operand register, or whatever cache level sits closest to the execution units.