The damned thing is a hybrid between the CPU doing the TLM algorithm and the GPU doing the mesh; so throw in two passes over the data to generate the mesh and then the normals and bosh! job done! and it's still a hybrid implementation [grin]
Probably what I meant in the first place, heh
While doing this code change I found something interesting, mostly while copying the texture setup code from the GPU-only method to the hybrid one;
// This can be any format (32bit or 16bit)
glGenTextures(1, &positionMap_);
glBindTexture(GL_TEXTURE_2D, positionMap_);
SetupRenderTarget(vertsperedge, internalformat_);

// This can be any format (32bit or 16bit)
glGenTextures(2, energyMap_);
glBindTexture(GL_TEXTURE_2D, energyMap_[0]);
SetupRenderTarget(vertsperedge, internalformat_);
glBindTexture(GL_TEXTURE_2D, energyMap_[1]);
SetupRenderTarget(vertsperedge, internalformat_);

// Due to vertex format limitations this is always 32bit per channel
glGenTextures(1, &normalMap_);
glBindTexture(GL_TEXTURE_2D, normalMap_);
SetupRenderTarget(vertsperedge, GL_RGBA32F_ARB);

// This can be any format (32bit or 16bit)
glGenTextures(1, &drivingMap_);
glBindTexture(GL_TEXTURE_2D, drivingMap_);
SetupRenderTarget(vertsperedge, internalformat_);
Now, one of those is a 'this must be 32bit!' texture, because it will be copied into a vertex buffer later for drawing and, last I checked, we couldn't source 16bit data from a vertex buffer.
However, I made a mistake when converting the original 'all 32bit' code to the 'pick internal format' code; positionMap is vertex data, which under my previous assumption meant it should have been 32bit all the time, yet here we are varying its internal format.
Later it is copied from that texture to the vertex buffer and sourced as vertex data; this copy doesn't touch the CPU at all.
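For anyone wondering how a copy like that stays on the card in GL, the usual trick of the era is to bind the VBO as a pixel *pack* buffer and glReadPixels the FBO attachment straight into it. This is a sketch only, not my exact code: it assumes a live GL context, the position texture already attached to an FBO colour attachment, and made-up names (`fbo_`, `vertexVBO`) for illustration.

```cpp
// Assumes: EXT_framebuffer_object + ARB_pixel_buffer_object,
// positionMap_ attached to fbo_'s GL_COLOR_ATTACHMENT0_EXT,
// and vertexVBO already created with enough space.
glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fbo_);
glReadBuffer(GL_COLOR_ATTACHMENT0_EXT);

// Bind the VBO as the pixel pack target, so glReadPixels writes into
// the buffer object instead of client memory; the data never leaves
// video memory.
glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, vertexVBO);
glReadPixels(0, 0, vertsperedge, vertsperedge, GL_RGBA, GL_FLOAT, 0);
glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, 0);

// Later, rebind the same buffer object as vertex data and draw from it.
glBindBufferARB(GL_ARRAY_BUFFER_ARB, vertexVBO);
glVertexPointer(4, GL_FLOAT, 0, 0);
```

Because the pack target is a buffer object rather than a client pointer, the driver is free to do the whole thing as a card-side blit.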
This caused an eyebrow raise, so I changed the code for the normals to vary as well; damned thing works correctly too o.O
Turns out the GPU can do a memory-to-memory copy and convert from 4-component 16bit data to 4-component 32bit data on the fly with no detectable speed loss; the 16bit version of the test runs faster than the 32bit version on the higher resolution meshes, which seems to prove that it is stored as 16bit in the texture and then converted on the fly during the copy.
I'm assuming NV's GPUs can do the same, although if they can't it might explain the crashes people have been seeing with the GPU-only methods; I'll have to rig up a test case to check once this is all over with.
Still, I can't help but wonder how it does it;
- Does the GPU just do a copy and extend it while it's in the registers?
- Does it happen purely at memory controller level?
The buffer-to-buffer copying stuff is also interesting because it's the one thing DX9 doesn't allow; DX10 lets you point buffers at anything to have the data reinterpreted, but apparently you can't do an on-card memory-to-memory copy with DX9... see, always knew OGL was better [wink]
edit:
I do also appear to have found a driver bug;
For some reason the output isn't correct for 16bit floating point targets of size 40*40, 50*50, 100*100 and 128*128. Once the render target hits 256*256 it all seems to work fine.
I can't see owt wrong in my code so this one gets filed under 'XP gfx card bug' for later talking about.