I looked at your vertex shader code as well, and I don't think you understand what it is that you're trying to achieve.
For one thing, the DP4 (vector) style of design ended with the GeForce 7x00 series more than 8 years ago. The G80 and its successors have all been scalar designs, and the G80 was released in 2006, so all you get from trying to exploit vector instructions is (potentially) a small improvement in pipelining.
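To make the vector/scalar point concrete, here is a simplified CPU-side illustration in C (not real shader code; the `float4` type and both functions are hypothetical, for illustration only). A vector unit can issue a dot product as a single DP4, but a scalar unit effectively runs it as one MUL followed by three dependent MADs. So packing your maths into vector instructions doesn't reduce the work the hardware actually does.

```c
/* Hypothetical 4-component vector type, for illustration only. */
typedef struct { float x, y, z, w; } float4;

/* Conceptually what a vector (pre-G80) shader unit does: one DP4 instruction. */
static float dp4(float4 a, float4 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

/* Roughly what a scalar unit executes instead: one MUL followed by three
 * dependent MADs, each operating on a single float. */
static float dp4_scalar(float4 a, float4 b)
{
    float acc = a.x * b.x;     /* MUL */
    acc = a.y * b.y + acc;     /* MAD */
    acc = a.z * b.z + acc;     /* MAD */
    acc = a.w * b.w + acc;     /* MAD */
    return acc;
}
```

For example, with `a = {1, 2, 3, 4}` and `b = {0.5, 0.25, 2, 1}` both functions return 11. The result is the same; only the number of issued instructions differs.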
Also, if all you're trying to do is draw things to a screen, you're still much worse off doing it on the CPU, blitting the result to the screen with the GPU, and THEN doing the blending in a very poor way in a shader.
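For reference, the common "source over" blend is simple per-pixel arithmetic that the GPU's blend stage can already apply as it writes each pixel. In OpenGL, for example, you configure it with `glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA)`. The C sketch below only shows what that blend computes; the `rgba` type and `blend_over` function are hypothetical and assume colour channels in [0, 1].

```c
/* Hypothetical RGBA colour type with channels in [0, 1]. */
typedef struct { float r, g, b, a; } rgba;

/* "Source over" blend: out = src * src.a + dst * (1 - src.a).
 * The same factors are applied to all four channels, matching
 * glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA). */
static rgba blend_over(rgba src, rgba dst)
{
    float sa = src.a;
    rgba out;
    out.r = src.r * sa + dst.r * (1.0f - sa);
    out.g = src.g * sa + dst.g * (1.0f - sa);
    out.b = src.b * sa + dst.b * (1.0f - sa);
    out.a = src.a * sa + dst.a * (1.0f - sa);
    return out;
}
```

Blending half-transparent red `{1, 0, 0, 0.5}` over opaque blue `{0, 0, 1, 1}` gives `{0.5, 0, 0.5, 0.75}`.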