What do you use to load a single pixel value in DirectX 9? Can that still be used in DirectX 10? If so, is it faster than Load()?
HLSL Load() of DirectX 10 in DirectX 9
In D3D9, you have to use the sample functions. They are still available in D3D10, but no, they are not faster than Load(): a sample has to do a floating-point multiply and a cast to integer to resolve the memory address of the sampled texels, and it may fetch many more texels than one if any filtering is enabled. The upside of sampling is that simple filtering is "free" if all the involved texels are already in the local cache.
This is because older hardware did not expose integer operations in shaders (most likely because it had no programmable integer processing units).
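The address arithmetic described above can be sketched on the CPU. This is only an illustrative model, not how any particular GPU or driver implements it: `texelCenterCoord` and `pointSampleIndex` are hypothetical helper names, and the D3D9 half-pixel offset between pixel and texel centers is deliberately left out. It shows that fetching texel `i` through a point sampler means passing the normalized coordinate `(i + 0.5) / size`, which the sampler then has to multiply back and truncate to an integer.

```cpp
#include <cmath>

// Hypothetical helper: normalized coordinate that addresses the center of
// texel `i` in a texture dimension of `size` texels. This is the value a
// shader would pass to a sample function to read exactly one texel.
float texelCenterCoord(int i, int size) {
    return (static_cast<float>(i) + 0.5f) / static_cast<float>(size);
}

// Illustrative model of point-sampling address resolution: one
// floating-point multiply, a floor, a cast to integer, then a clamp
// (standing in for the clamp addressing mode).
int pointSampleIndex(float u, int size) {
    int i = static_cast<int>(std::floor(u * static_cast<float>(size)));
    if (i < 0) i = 0;
    if (i > size - 1) i = size - 1;
    return i;
}
```

With an integer Load() the index is used directly, so the round trip through `texelCenterCoord` and `pointSampleIndex` disappears; that round trip is the extra per-fetch work the answer refers to.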
Fun AMD hack: if you set filtering to point but mipmapping to anisotropic, on a texture that has no mipmaps, and sample with tex2Dlod, the driver will recognize the pattern and you should get faster sampling of single pixels. The same goes for DX10's SampleLevel.
But profile it first, just to be sure it actually pays off on your hardware.