I'm not dying from one but they tend to knock my will to do anything (yay for pressure pain around the eyes!) and generally make me 'meh' at the world. I had a cold for a while and I thought I'd shifted it... and I had... only for another stealth cold to appear and take over things *grumbles*
So, I've not been in the best 'working' mood the last week or so; however, some progress on the TLM app has been made.
As mentioned a few days back I got a simple D3D9 app up and running, mostly so I could go 'oooo stuff' at it and learn how D3D works. The last couple of days, in fits and starts, I've converted it over so that the rendering now exists in a class of its own. If a certain Jack looked at the code layout he'd probably find it familiar [grin]. Basically I liked his app layout so much that I've gone with pretty much the same thing; a few tweaks here and there as the code is different, but it's along the same lines. (Jack; expect thanks in the ack. section of my final project report [grin]).
After a bit of poking and some strangeness with my VBs/IBs I'm back to having a plane drawn on the screen... huzzah! [grin]
Next I need to work out how I'm going to do this, shader-wise. In OGL I'd just load up a bunch of shaders and bind/set up as required, and it would be easy as I know how it all works in OGL. For D3D I'm thinking I'm going to have to take the effect route, just because I suspect it'll be easier on my brain than not doing so. Again, the D3D SDK and Jack's code should act as a decent enough guide for what I'm doing.
On a related note; Sunday morning I answered a question in the OGL forum which got me thinking about ATI's CTM program again... which led to a swift docs update and 3h of reading various PDFs in order to understand things.
I think I get how it all works now, which is a bonus considering I'd like to use it for the project (this is the 'showy' bit in a way; a 'hey look at me, I can GPGPU code, gimmie a job!' type affair [grin]). I suspect it'll also be faster than the D3D version because, assuming I'm reading this correctly, you can set up a command stream so that in one operation you can effectively do everything you need without leaving the GPU.
Taking the code snippet I gave earlier, doing it in 2 passes would mean;
- bind rt1 as source
- bind rt2 as sink
- execute draw using quad and update vars
- bind rt1 as sink
- bind rt2 as source
- bind rt2 as sink (final 'wave strength' output)
- execute draw using quad
This takes a fair number of API calls, and it still lacks the 'update driving value' section of code, which could either be folded into the first pass or performed in a third (either way, a texture needs updating).
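To put a rough number on that, here is a minimal C++ sketch that just counts the calls in the list above. It is not D3D code: `MockDevice`, `bindSource`, `bindSink` and `drawQuad` are made-up names that only record what was asked for. The list names rt2 as both source and sink in the second pass, so the sketch assumes the final output goes to a separate target called `rtOut`; that is a guess on my part.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a rendering API; it only records the calls made.
struct MockDevice {
    std::vector<std::string> calls;
    void bindSource(const std::string& rt) { calls.push_back("source:" + rt); }
    void bindSink(const std::string& rt)   { calls.push_back("sink:" + rt); }
    void drawQuad()                        { calls.push_back("draw"); }
};

// One simulation step done as two passes, following the list above.
// "rtOut" is an ASSUMED extra target for the final 'wave strength' output.
void runTwoPassStep(MockDevice& dev) {
    // Pass 1: read rt1, write rt2.
    dev.bindSource("rt1");
    dev.bindSink("rt2");
    dev.drawQuad();
    // Pass 2: read rt2, write rt1 plus the output target.
    dev.bindSink("rt1");
    dev.bindSource("rt2");
    dev.bindSink("rtOut");
    dev.drawQuad();
}

// Total number of API calls issued for `steps` simulation steps.
std::size_t apiCallsFor(int steps) {
    assert(steps >= 0);
    MockDevice dev;
    for (int i = 0; i < steps; ++i) {
        runTwoPassStep(dev);
    }
    return dev.calls.size();
}
```

Calling `apiCallsFor(n)` gives the call count for `n` steps under these assumptions. The cost grows linearly with the number of steps, and that is before any texture update for the driving value.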
With the CTM method it looks like you'd do the following;
- get the memory addresses for input and output from the impl
-- this would be much like reserving VB/IB and textures
- partition this memory how you want to
-- stored as pointers in your app
- Construct a command buffer
-- this command buffer basically tells the GPU how you want it to work. It includes memory addresses for read/write, constant buffer locations, program formats, output formats etc
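The 'partition and keep pointers' step might look something like this in C++. It is only a sketch: `MemoryBlock` stands in for whatever the implementation hands back, and the three-way split into input, scratch and output regions is an assumed layout, not something taken from the CTM docs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one block of memory handed out by the implementation.
struct MemoryBlock {
    std::vector<float> storage;
    explicit MemoryBlock(std::size_t floats) : storage(floats, 0.0f) {}
};

// Regions carved out of the block; the app simply keeps the pointers around.
struct Partition {
    float* input;
    float* scratch;
    float* output;
    std::size_t elementsPerRegion;
};

// Split the block into three equal regions (an ASSUMED layout for illustration).
Partition partitionBlock(MemoryBlock& block) {
    assert(block.storage.size() % 3 == 0);
    const std::size_t n = block.storage.size() / 3;
    float* base = block.storage.data();
    return Partition{base, base + n, base + 2 * n, n};
}
```

The pointers stay valid for as long as the block is neither resized nor destroyed, which matches the idea of doing this once at startup and never touching it again.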
Then, each time we want to run a TLM step we simply kick the program off, wait for it to finish and read off the data.
As for doing more than one pass: well, the command buffer looks like it allows you to chain multiple programs together, so you effectively do this;
- global setup
- setup pass 1's memory locations
- execute pass 1's code
- wait for finish and flush
- setup pass 2's memory locations
- execute pass 2's code
- wait for finish and signal we are done
The command buffer need never change, and the GPU can read/write PCIe memory directly as well, or we can pull directly from GPU RAM back to system RAM for the results.
This is pretty much the point of CTM: remove API call overhead. Each time we want to run a step we just execute the command buffer and get the results back; no fiddling with render targets, R2VB setups or the like.
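As a way of picturing the 'build once, submit many times' idea, here is a toy C++ model of a chained command buffer. None of these names (`ToyGpu`, `Command`, `buildTwoPassBuffer`) come from CTM; the real thing is a stream of hardware commands, and this is just my mental model of it running serially on the CPU.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Toy model only: a "program" is a per-element function.
using Program = std::function<float(float)>;

struct Command {
    enum Kind { SetMemory, Execute, WaitIdle } kind;
    std::size_t src;      // source region offset (used by SetMemory)
    std::size_t dst;      // destination region offset (used by SetMemory)
    std::size_t program;  // program index (used by Execute)
};

struct ToyGpu {
    std::vector<float> memory;
    std::vector<Program> programs;
    std::size_t regionSize = 0;
    int submissions = 0;  // how many times the app had to "call the API"

    // Run a prebuilt command buffer from start to finish in one submission.
    void submit(const std::vector<Command>& buffer) {
        ++submissions;
        std::size_t src = 0, dst = 0;
        for (const Command& c : buffer) {
            switch (c.kind) {
            case Command::SetMemory:
                src = c.src;
                dst = c.dst;
                break;
            case Command::Execute:
                assert(c.program < programs.size());
                for (std::size_t i = 0; i < regionSize; ++i) {
                    memory[dst + i] = programs[c.program](memory[src + i]);
                }
                break;
            case Command::WaitIdle:
                break;  // the toy runs serially, so there is nothing to wait for
            }
        }
    }
};

// Two chained passes: region 0 -> region 1, then region 1 -> region 2.
std::vector<Command> buildTwoPassBuffer(std::size_t regionSize) {
    return {
        {Command::SetMemory, 0, regionSize, 0},
        {Command::Execute, 0, 0, 0},
        {Command::WaitIdle, 0, 0, 0},
        {Command::SetMemory, regionSize, 2 * regionSize, 0},
        {Command::Execute, 0, 0, 1},
        {Command::WaitIdle, 0, 0, 0},
    };
}
```

In use, you would fill in `memory`, `programs` and `regionSize` once, call `buildTwoPassBuffer` once, and then call `submit` with the same buffer every step. Each step costs a single submission however many passes are chained inside it, which is the contrast with the per-pass binds in the render target version.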
I'm assuming I've got this right of course, but it really does look like you can do just that; the chaining program thing looks sane as there is a 'wait until all processors are idle' command in the instruction set, which blocks the command buffer from processing until true, at which point you can carry on. If you could only execute and perform one setup per command buffer this would be somewhat pointless.
So, I really really wanna get all the D3D stuff out of the way, as I think this could be a lot more interesting in the long run and maybe more use, as I like the idea of drifting into GPGPU research, more so at this level.
Anyways, 5am... I might look at some more code before I go to sleep, just to give my overworked brain something else to deal with [grin]