I am trying to support on my framework DX9, DX11 and DX12 for that I really need to know many things of the way things are done for each API
It's worth pointing out that before DX10 came (i.e. pre 2005), the advice of "how things should be done" in DX9 was different.
This is because:
- New techniques were discovered that ended up faster than recommended approaches
- Hardware was actually based around constant registers (current hw is based around constant buffers)
- Drivers change and they optimize for different patterns in their heuristics.
- Rendering the old "D3D9 way" and then porting straight to D3D11 resulted in horrible performance (9 performing considerably better than 11).
- Rendering the "D3D11 way" and then mimicking the same in D3D9 resulted in good performance (11 beating 9; and 9 having similar or even better performance than the 'other way' using 9).
- Having two separate paths for rendering "the D3D9 way" and "the D3D11 way" is a ton of work, and sometimes too contradictory.
Same could be said about 2008's era DX10 recommendations and 2015's era DX11 recomendations (which still apply to 10). We do things differently now (not completely different, but rather incrementally different), basically because we found new ways that turned out to be better.
Because it's an incremental evolution, the general advises from 2003 still hold true (i.e. batch batch batch presentation from NVIDIA for the GeForce 6).
However I personally consider batch submission a problem solved since D3D11 added the BaseVertexLocation and StartInstanceLocation which lets for multi-mesh rendering without having to change the vertex buffer; and the research on display lists which allowed to fill constant buffers and prepare draw commands from within multiple threads (which btw got enhanced in D3D12); which is basically impossible to do in D3D9 (yet though you can create a layer that emulates this and surprisingly sometimes works even faster due to better cache coherency during playback and being able to distribute work across cores during recording).
But if for some odd reason you're batch limited even in D3D11/12, the general thinking from the batch, batch, batch presentation still holds (i.e. batch limited = increasing vertex count becomes free). For example to combat batching issues, Just Cause 2 introduced the term merge-instancing.
If you need an historical archive of the API evolution. Go to the GDC Vault, and download the API slides in chronological order; then google for SIGGRAPH 2003-2015 slides in chronological order, Google the "Game Fest" slides from Microsoft (I think they were from 2006 through 2009) and visit the NV (2007, 2008, 2009, 2010, change the address bar for the rest of the years) and AMD sites (archive), go to the SDK/Resources section, and look for the older entries.