A reply to some comments...



Yes Jack, you get your own entry again [razz]

Original post by jollyjeffers
Turns out the GPU can do a memory-to-memory copy and convert from 4-component 16-bit data to 4-component 32-bit data on the fly with no detectable speed loss; the 16-bit version of the test runs faster than the 32-bit version on the higher resolution meshes, which seems to prove that it is stored as a 16-bit texture and then converted on the fly during the copy.
Have you looked into the finer details of your hardware's architecture?

Bit of a grey area, but you might find that there is no performance difference because they are actually the same [wink]

The retrieval and writing of FP16 will obviously have to be standard, but intermediary processing probably occurs at a higher precision (CPUs do this, don't they? Intermediaries at ~80 bits?). For example, the Radeon 9x00s always used 24-bit internal precision (one reason why they couldn't be SM3 cards). I think the Nvidia 5-series was the last of theirs to have separate half/single pipelines, and I'm pretty sure the 6-, 7- and 8-series are full FP32...

I pretty much know about as much as I'm going to be told about the hardware architecture; and yes, the R5x0 series is full 32-bit processing (the 24-bit precision in the previous generations wasn't the only thing which stopped them being SM3.0; the lack of branching and looping in the fragment processors was also a slight issue), so there is no change in ALU processing pressure; the 16-bit format just means less bandwidth pressure at higher mesh resolutions.
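What that on-the-fly conversion has to do per component is actually quite mechanical, which is perhaps why it's so cheap: widen the exponent field (5 → 8 bits, rebias 15 → 127) and the mantissa (10 → 23 bits). Here's a CPU-side sketch of the FP16 → FP32 expansion, purely to illustrate the operation; this is not a claim about how the hardware implements it:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Expand one IEEE 754 half-precision (FP16) value to single precision
 * (FP32). Handles normals, subnormals, signed zeros, Inf and NaN. */
static float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h >> 15) << 31;
    uint32_t exp  = (h >> 10) & 0x1F;
    uint32_t mant = h & 0x3FF;
    uint32_t bits;

    if (exp == 0) {
        if (mant == 0) {
            bits = sign;                           /* signed zero */
        } else {
            /* Subnormal half: renormalise into a normal float. */
            int e = -1;
            do { mant <<= 1; ++e; } while ((mant & 0x400) == 0);
            mant &= 0x3FF;
            bits = sign | ((uint32_t)(127 - 15 - e) << 23) | (mant << 13);
        }
    } else if (exp == 0x1F) {
        bits = sign | 0x7F800000u | (mant << 13); /* Inf or NaN */
    } else {
        /* Normal: rebias exponent (15 -> 127), widen mantissa (10 -> 23). */
        bits = sign | ((exp - 15 + 127) << 23) | (mant << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

No rounding is ever needed in this direction (every FP16 value is exactly representable in FP32), which is consistent with the copy being lossless and cheap.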

I need to check things out some more I guess; I was just interested to note that a memory-memory copy could do this conversion on the fly, which made me wonder how it does it:
- If it happens via the GPU in some manner (fragment shader read, ROP write) then it would stall the GPU a bit, but it has interesting implications as to how the GPU is arranged
- If it does it simply by getting the memory controller to do a DMA copy and expand the format, then the MC is more flexible than I expected

The memory-memory copy is more or less a readback (done via glReadPixels; I'm sure D3D has something along those lines to get the contents of the framebuffer), but a readback done via a pixel buffer, which allows it to be async and to transfer between sections of GPU memory without touching system RAM. Knowing how this readback is performed when a pixel buffer is in the mix would help narrow down how it happens; it's done via DMA to system RAM, I know that much, to allow it to happen async with the rest of the processing stream.

I know the GPU can write directly to PCIe RAM; ATI/AMD's CTM docs point this out. However, this is an MC programming thing, which just opens up more questions about the async copies, pixel buffers and on-the-fly conversion.

But, yes, I'd like to come up with a test suite at some point to see how various GPUs react to the copy-conversion process (or indeed if they copy and convert at all; the GPU test crashes with NV hardware are still a mystery to me).


apparently you can't do an on-gfx-card memory-memory copy with DX9... see, always knew OGL was better
Only losers would want to do that anyway. We never wanted it anyway... [razz]

D3D10 has GPU-accelerated resource<->resource copying though, so I guess we got there eventually...

Well, D3D was never really set up for it; the closest was ATI's 'hack' with R2VB (which I couldn't get working in my own code, but knew worked, which was just annoying).


For some reason the output isn't correct for 16-bit floating point targets of size 40*40, 50*50, 100*100 and 128*128
The latter being 2^n probably disproves this, but does OpenGL have the concept of row padding like D3D does? That is, the actual width of one line of texture data might be greater than width*sizeof(pixel)...

Yeah, it does; however, it shouldn't matter in this instance because the textures are all under the gfx card's control, so it should know the layout and format of the texture and be able to adjust its reading and writing to them correctly (and as the larger sizes show, 16-bit RTT is valid).
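For what it's worth, the padding in question is easy to reason about: with GL's pack/unpack alignment (default 4 bytes), each row is rounded up to the alignment, so the effective pitch can exceed width * bytes-per-pixel. A quick sketch of that arithmetic (`row_pitch` is just an illustrative helper, not a GL function):

```c
#include <assert.h>
#include <stddef.h>

/* Effective row pitch in bytes when each row must start on an
 * `alignment`-byte boundary (GL_PACK_ALIGNMENT / GL_UNPACK_ALIGNMENT
 * semantics; the GL default alignment is 4). */
static size_t row_pitch(size_t width, size_t bytes_per_pixel, size_t alignment)
{
    size_t row = width * bytes_per_pixel;
    return (row + alignment - 1) / alignment * alignment;
}
```

Notably, for an RGBA16F target (8 bytes per pixel) every row size is already a multiple of 4, so none of the failing widths (40, 50, 100, 128) would pick up any padding anyway; that's consistent with padding not being the culprit here.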

When I enabled the output visualisation, the two textures being used for the simulation, instead of showing nice multicoloured circles moving out from a central point, showed only a green mess, which wasn't right in the slightest at those sizes.

So, yeah, it looks like a driver bug in the setup routines for the texture/RTT state...