As previously mentioned, I'd settled on using the Effect system to make my life a bit easier, so a bit of today was taken up with reading about that and how the vertex declaration system maps into it.
With that clicked in my head the next step was to work out the order of operations.
In the original code, as seen a few entries below, we had two major loops:
- Loop one summed all the energy in a cell
- Loop two redistributed it (by gathering from surrounding cells) and built the height map
After that the driving information was updated (one cell had energy added to it).
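For reference, the CPU version's structure looks roughly like this. To be clear, this is a stand-in of my own, not the original code: the grid size, the gather rule (each cell pulling a quarter of each neighbour's summed energy) and deriving height directly from energy are all illustrative choices.

```cpp
#include <vector>
#include <cstddef>

// Illustrative stand-in for the original two-loop CPU update.
struct Grid {
    std::size_t n;
    std::vector<float> energy, sum, height;
    Grid(std::size_t size) : n(size), energy(size * size, 0.0f),
                             sum(size * size, 0.0f), height(size * size, 0.0f) {}
    float& at(std::vector<float>& v, std::size_t x, std::size_t z) { return v[z * n + x]; }
};

void step(Grid& g, std::size_t driveX, std::size_t driveZ, float driveEnergy) {
    // Loop one: sum the energy in each cell.
    for (std::size_t z = 0; z < g.n; ++z)
        for (std::size_t x = 0; x < g.n; ++x)
            g.at(g.sum, x, z) = g.at(g.energy, x, z);

    // Loop two: redistribute by *gathering* from the surrounding cells
    // (each cell pulls a share of its neighbours' summed energy),
    // and build the height map from the result.
    for (std::size_t z = 0; z < g.n; ++z)
        for (std::size_t x = 0; x < g.n; ++x) {
            float gathered = 0.0f;
            if (x > 0)       gathered += 0.25f * g.at(g.sum, x - 1, z);
            if (x + 1 < g.n) gathered += 0.25f * g.at(g.sum, x + 1, z);
            if (z > 0)       gathered += 0.25f * g.at(g.sum, x, z - 1);
            if (z + 1 < g.n) gathered += 0.25f * g.at(g.sum, x, z + 1);
            g.at(g.energy, x, z) = gathered;
            g.at(g.height, x, z) = gathered;  // height derived from energy
        }

    // Finally, update the driving information: one cell has energy added.
    g.at(g.energy, driveX, driveZ) += driveEnergy;
}
```

The gather formulation matters for the GPU port: a fragment can only write to its own location, so each cell must pull from its neighbours rather than scatter to them.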
In the original application we then have to build the normal information and render it to the screen before the cycle repeats.
With all this in mind, we need at least 3 render targets:
- height information of the vertex
- normal information for the vertex (2 of these for ping-ponging)
- energy at a given point
On top of that we are going to need source driving data (for this I've selected a single-channel 16-bit floating point texture) and a vertex buffer for X & Z positional information.
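The X & Z vertex buffer is just a static grid, since the Y value comes from the height render target in the vertex shader. A sketch of building the positional data (the spacing and layout are my own choices for illustration):

```cpp
#include <vector>
#include <cstddef>

// Build a flat n-by-n grid of X/Z pairs for the water surface.
// Y is sampled from the height target in the vertex shader, so
// this buffer never needs updating. Spacing/origin are illustrative.
std::vector<float> buildGridXZ(std::size_t n, float spacing) {
    std::vector<float> xz;
    xz.reserve(n * n * 2);
    for (std::size_t z = 0; z < n; ++z)
        for (std::size_t x = 0; x < n; ++x) {
            xz.push_back(x * spacing);  // X
            xz.push_back(z * spacing);  // Z
        }
    return xz;
}
```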
Now, with the fewest passes I've come up with so far, you can calculate and render in only 3 draw calls; however, this puts some restrictions on our data types, as MRTs must match type when rendering.
Given that, the following flow of data can be produced:
With this setup the render target types of the energy info and height info need to be the same. As I require 4 pieces of information about the energy at a point, the only way that's going to work is if a 4-component render target is used. However, the height only needs a single channel of information, which means I either need 5 render targets (impossible on my hardware) or I need to use two 4-component targets in order to complete the operation in one pass.
However, doing this is very wasteful of space, more so because of the use of floating point render targets. For each height we are wasting 48 bits (or 3/4 of the texel), and when you are dealing with large targets this soon adds up. It also means more data than we need is being shifted around all the time, which wastes bandwidth as well, in both the final draw stage and the normal generation stage.
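Just to make the waste concrete (the 1024×1024 target size in the test is an example of mine, not a decision):

```cpp
#include <cstddef>

// fp16 = 16 bits per channel. Storing a single-channel height in a
// 4-channel fp16 target wastes 3 of the 4 channels.
constexpr std::size_t kBitsPerChannel = 16;
constexpr std::size_t kChannels = 4;

// 48 bits per texel, i.e. 3/4 of it.
constexpr std::size_t wastedBitsPerTexel() {
    return (kChannels - 1) * kBitsPerChannel;
}

// Total waste for an n-by-n target, in bytes.
constexpr std::size_t wastedBytes(std::size_t n) {
    return n * n * wastedBitsPerTexel() / 8;
}
```

At 1024×1024 that's 6 MB of dead space being written, read and copied every frame.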
The solution is to add an extra pass, bringing us up to a total of 4.
The restructured pass system looks like this:
As you can see, it's all single render targets at this point, so as to get around the dependency issue. The only waste here, which was also present in the original method, is in the normal storage, where we only need a vec3 but have to write a vec4, as there are no 3-component render target types.
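Written out as data, my reading of that four-pass flow looks like the table below. The pass names and the exact read/write sets are my inference from the buffer list, not settled code:

```cpp
#include <string>
#include <vector>

// The four passes as currently planned, each with a single render
// target to sidestep the MRT type-matching restriction. The energy
// target is the ping-ponged pair: each frame reads one and writes
// the other. Names are mine.
struct Pass {
    std::string name;
    std::vector<std::string> reads;
    std::string writes;
};

const std::vector<Pass> kPasses = {
    { "energy",  { "energy (prev)", "driving texture" },           "energy (next)" },
    { "height",  { "energy (next)" },                              "height" },
    { "normals", { "height" },                                     "normals" },
    { "final",   { "height", "normals", "vertex buffer (X,Z)" },   "back buffer" },
};
```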
So, the final buffer selection is as follows:
- vertex buffer
- render target for height info (single-channel fp16)
- render target for energy info (2× 4-channel fp16)
- render target for normal info (4-channel fp16)
- driving texture (single-channel fp16)
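Per-texel cost of that selection, with fp16 at 2 bytes per channel (the totals here are just arithmetic on the list above):

```cpp
#include <cstddef>

// Bytes per texel for each target in the list (fp16 = 2 bytes/channel).
constexpr std::size_t fp16Texel(std::size_t channels) { return channels * 2; }

constexpr std::size_t kHeightTexel  = fp16Texel(1);      // 2 bytes
constexpr std::size_t kEnergyTexel  = 2 * fp16Texel(4);  // ping-pong pair: 16 bytes
constexpr std::size_t kNormalTexel  = fp16Texel(4);      // 8 bytes
constexpr std::size_t kDrivingTexel = fp16Texel(1);      // 2 bytes
```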
There is one consideration I haven't looked at yet, and maybe I will once I've got the app up and running: the possibility of producing the normal information in pass 1.
This would require a significant change in the way the shader is written and put an increased burden on the fragment processors, but it might be a workable solution to take it back down to 3 passes.
I'll keep it in mind.
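For reference, this is the kind of work that pass would have to absorb per fragment; a CPU version of the usual central-difference approach to normals from a height field (the sampling scheme and spacing parameter are my assumptions, not the actual shader):

```cpp
#include <cmath>
#include <array>

// Central-difference normal from four height samples around a point.
// hL/hR are heights one cell to the left/right (X axis), hD/hU one
// cell down/up (Z axis); spacing is the world-space cell size.
std::array<float, 3> heightNormal(float hL, float hR, float hD, float hU,
                                  float spacing) {
    // Gradient of the height field; the normal is (-dh/dx, 1, -dh/dz),
    // normalised.
    float dx = (hR - hL) / (2.0f * spacing);
    float dz = (hU - hD) / (2.0f * spacing);
    float len = std::sqrt(dx * dx + dz * dz + 1.0f);
    return { -dx / len, 1.0f / len, -dz / len };
}
```

Folding this into pass 1 means four extra height evaluations per fragment, which is where the increased fragment processor burden comes from.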
With all this planned out it's time to go and write some code.