# heh, oops...


So, as I was doing my write-up something occurred to me: for the GPU/CPU hybrid method I'd said a few entries back that I couldn't compute the per-vertex normals on the GPU during deformation because I lacked the information to construct them. Of course, this is true, and it was also stupid of me.

The damned thing is a hybrid between the CPU doing the TLM algorithm and the GPU doing the mesh; so throw in two passes over the data to generate the mesh and then the normals and bosh! job done! and it's still a hybrid implementation [grin]

Probably what I meant in the first place, heh
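For what it's worth, that second normals pass boils down to something like the following. This is a plain C++ sketch so the maths is easy to check — on the GPU it would be a fragment shader sampling the position map instead. `buildNormals`, the edge clamping and the central-difference scheme are my illustration, not the actual shader:

```cpp
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

static Vec3 cross(const Vec3& a, const Vec3& b) {
    return { a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x };
}

static Vec3 normalise(const Vec3& v) {
    float len = std::sqrt(v.x*v.x + v.y*v.y + v.z*v.z);
    return { v.x/len, v.y/len, v.z/len };
}

// Second pass over the deformed data: heights is a row-major n*n grid
// (the output of the mesh pass), spacing is the grid step. Each vertex
// gets a normal from the cross product of its neighbour tangents.
std::vector<Vec3> buildNormals(const std::vector<float>& heights, int n, float spacing) {
    std::vector<Vec3> normals(n * n);
    auto h = [&](int x, int y) {
        // clamp at the edges so border vertices still get a sensible normal
        x = x < 0 ? 0 : (x >= n ? n - 1 : x);
        y = y < 0 ? 0 : (y >= n ? n - 1 : y);
        return heights[y * n + x];
    };
    for (int y = 0; y < n; ++y)
        for (int x = 0; x < n; ++x) {
            // central differences give the two surface tangents
            Vec3 tx = { 2*spacing, h(x+1, y) - h(x-1, y), 0 };
            Vec3 tz = { 0,         h(x, y+1) - h(x, y-1), 2*spacing };
            normals[y * n + x] = normalise(cross(tz, tx));
        }
    return normals;
}
```

The point being: the only inputs are the positions the first pass already wrote out, so nothing stops the second pass living on the GPU too.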

While doing this code change I found something interesting, mostly while copying the texture setup code from the GPU-only method to the hybrid one:
// This can be any format (32bit or 16bit)
glGenTextures(1, &positionMap_);
glBindTexture(GL_TEXTURE_2D, positionMap_);
SetupRenderTarget(vertsperedge, internalformat_);

// This can be any format (32bit or 16bit)
glGenTextures(2, energyMap_);
glBindTexture(GL_TEXTURE_2D, energyMap_[0]);
SetupRenderTarget(vertsperedge, internalformat_);
glBindTexture(GL_TEXTURE_2D, energyMap_[1]);
SetupRenderTarget(vertsperedge, internalformat_);

// Due to vertex format limitations this is always 32bit per channel
glGenTextures(1, &normalMap_);
glBindTexture(GL_TEXTURE_2D, normalMap_);
SetupRenderTarget(vertsperedge, GL_RGBA32F_ARB);

// This can be any format (32bit or 16bit)
glGenTextures(1, &drivingMap_);
glBindTexture(GL_TEXTURE_2D, drivingMap_);
SetupRenderTarget(vertsperedge, internalformat_);

Now, one of those is a 'this must be 32bit!' texture, because it will be copied into a vertex buffer later for drawing, and last I checked we couldn't source 16bit data from a vertex buffer.

However, I made a mistake when converting the original 'all 32bit' code to the 'pick internal format' code: positionMap is vertex data, which under my previous assumption meant it should have been 32bit all the time, yet here we are varying its internal format.

Later it is copied from that texture to the vertex buffer and sourced as vertex data; this copy doesn't touch the CPU at all.

This caused some eyebrow raising, and I changed the code for the normals to vary as well; the damned thing works correctly too o.O

Turns out the GPU can do a memory-to-memory copy and convert from 4-component 16bit data to 4-component 32bit data on the fly with no detectable speed loss; the 16bit version of the test runs faster than the 32bit version on the higher-resolution meshes, which seems to prove that it really is stored as 16bit in the texture and then converted on the fly during the copy.

I'm assuming NV's GPUs can do the same, although if they can't it might explain the crashes people have been seeing with the GPU-only methods; I'll have to rig up a test case once this is all over with.

Still, I can't help but wonder how it does it:
- Does the GPU just do a copy and extend it while it's in the registers?
- Does it happen purely at memory controller level?

The buffer-to-buffer copying stuff is also interesting because it is the one thing that DX9 doesn't allow; DX10 lets you point buffers at anything to have the data reinterpreted, but apparently you can't do an on-card memory-to-memory copy with DX9... see, always knew OGL was better [wink]

edit:
I also appear to have found a driver bug;
For some reason the output isn't correct for 16bit floating-point targets of size 40*40, 50*50, 100*100 and 128*128. Once the render target hits 256*256 it all seems to work fine.

I can't see owt wrong in my code so this one gets filed under 'XP gfx card bug' for later talking about.

Do you need anyone to proof-read your paper for typos, grammar mistakes, awkward sentences and whatnot?

I haven't been following this, but I have an 8800GTS card. If you need someone to test it on, drop me a line. I have been too busy lately to read much of anything.

Later

Quote:
 Turns out the GPU can do a memory-to-memory copy and convert from 4-component 16bit data to 4-component 32bit data on the fly with no detectable speed loss; the 16bit version of the test runs faster than the 32bit version on the higher-resolution meshes, which seems to prove that it really is stored as 16bit in the texture and then converted on the fly during the copy.
Have you looked into the finer details of your hardware's architecture?

Bit of a grey area, but you might find that there is no performance difference because they are actually the same [wink]

The retrieval and writing of FP16 will obviously have to be standard, but intermediary processing probably occurs at a higher precision (CPUs do this, don't they? Intermediaries at ~80bit?). For example, the Radeon 9x00s always used 24bit internal precision (one reason why they couldn't be SM3 cards). I think the Nvidia 5-series was the last of theirs to have separate half/single pipelines, and I'm pretty sure the 6-, 7- and 8-series are full FP32...

Quote:
 apparently you can't do an on-card memory-to-memory copy with DX9... see, always knew OGL was better
Only losers would want to do that. We never wanted it anyway... [razz]

D3D10 has GPU-accelerated resource<->resource copying though, so I guess we got there eventually...

Quote:
 For some reason the output isn't correct for 16bit floating point targets of size 40*40, 50*50, 100*100 and 128*128
The latter being 2^n probably disproves this, but does OpenGL have the concept of row padding like D3D does? That is, the actual width of one line of texture data might be greater than width*sizeof(pixel)...
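To sketch the idea (hypothetical helper; the alignment value is whatever the hardware happens to use, not something I know for your card):

```cpp
#include <cstddef>

// If the driver pads each texture row out to some alignment, the real
// pitch can exceed width * bytes-per-pixel, and code that assumes a
// tightly packed row will read the wrong texels.
std::size_t rowPitch(std::size_t width, std::size_t bytesPerPixel, std::size_t alignment) {
    std::size_t raw = width * bytesPerPixel;
    return ((raw + alignment - 1) / alignment) * alignment;
}
```

With RGBA16F at 8 bytes per pixel and, say, a 256-byte alignment, a 40-wide row would pad from 320 to 512 bytes while a 128-wide row is already 1024 and needs no padding — which is why your 128*128 failure suggests padding alone isn't the culprit.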

Cheers,
Jack