[Solved] Mapping screen space to clip space

Started by
10 comments, last by tweduk 16 years, 10 months ago
Hello all,

For a particular vertex shader that I'm writing, I need to know the precise mapping of screen space coordinates (that is, what you pass to DirectX when you give it vertices that are already transformed and lit) to clip space coordinates (that is, the output of a vertex shader). I know the approximate relationship, but that won't quite be good enough for what I want to do.

Imagine that I have a screen that is 8 pixels wide by 6 high and my viewport is the whole screen. Then, according to http://msdn2.microsoft.com/en-us/library/bb219690.aspx the screen-space coordinates are like this, where each cell in the matrix below is a pixel:

[Image: 8 by 6 screen coordinates]

The center of the top-left pixel of the screen has coordinates (0, 0), but it actually covers (-0.5, -0.5) to (0.5, 0.5). My question is, how do the positions that I have labelled A, B, C, and D map to clip-space coordinates, ignoring the Z component?

I think since clip space from (-1, 1) to (1, -1) is supposed to contain the viewing frustum, then logically the viewing frustum should map to the bounds of the screen, so that:

A = (-0.5, -0.5) -> clip (-1, 1)
D = (7.5, 5.5) -> clip (1, -1)

Is this correct? There are of course other possibilities. Apologies if this is documented somewhere but I've missed it.

thanks!
Tom

[Edited by - tweduk on June 7, 2007 7:14:17 AM]
I have a lot more experience with OpenGL, so I'm not quite familiar with the way DirectX sets up its screen spaces by default, but my advice is don't try to look into it too hard. What are the min and max of the finished T&L in DX? Can you set them yourself? If you can, or if you already know them, those are what the vertices should be transformed to. If you're trying to overlay an image perfectly on the screen, I'd recommend a min and max :p...

That is what I think you're asking, anywho...

If you're trying to FIND the min and max, my guess is -1 to 1. haha
StephenTC,

I know what the viewing frustum is in clip space coordinates, (-1, -1, -1) to (1, 1, 1). I also know what the viewport is in screen space coordinates, (-0.5, -0.5) to (w - 0.5, h - 0.5). What I'm not sure about is the precise mapping between those two coordinate systems, although I can make an educated guess based on what seems logical.

My vertex shader works in both clip coordinates and pixels, so knowing the precise mapping between the two is necessary.
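
For reference, the educated guess described above can be written down as a small function. This is only a sketch of the guess (viewport edges coincide with the edges of the clip-space rectangle), not a statement of what Direct3D actually does; `Vec2` and `guessedScreenToClip` are hypothetical names used for illustration.

```cpp
#include <cassert>

// Hypothetical helper type for illustration; not part of Direct3D.
struct Vec2 { float x, y; };

// Encodes the guess only: the viewport rectangle
// (-0.5, -0.5)..(w - 0.5, h - 0.5) is assumed to coincide with the
// clip-space rectangle (-1, +1)..(+1, -1). Y is flipped because screen Y
// grows downward while clip Y grows upward.
Vec2 guessedScreenToClip(Vec2 s, float w, float h) {
    assert(w > 0.0f && h > 0.0f);
    return { 2.0f * (s.x + 0.5f) / w - 1.0f,
             1.0f - 2.0f * (s.y + 0.5f) / h };
}
```

With the 8 by 6 example, this sends A = (-0.5, -0.5) to clip (-1, 1) and D = (7.5, 5.5) to clip (1, -1), which is exactly the hypothesis being questioned.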
Well, I don't know specifically what you're doing, but it sounds like you're trying to get your clip coordinates to cohere with the final rasterization.

This http://www.sjbrown.co.uk/?article=directx_texels might help a tiny bit.

Clip space coordinates are already perspective transformed (or I'm assuming that that is how they are passed on) and the z depth value just translates into the depth buffer so I'm fairly certain that it is a one-to-one thing for width and height. (not EXACTLY one to one, but a linear thing yes).

Hope this helps again :p
Stephen Timothy Cooney
OK, now that I've reread everything (again), I realize I'm being mildly slow. Anywho, I actually just ran into this problem again: I needed a pixel-perfect full-screen quad. That link I gave you helped, but from what you wrote, I believe that you are correct :). DX is odd, and a pixel-perfect quad needs to be shifted back half a pixel.

I again, hope this helps,
Stephen Timothy Cooney
StephenTC,

That's an interesting link, thanks - I'm still digesting what it's saying.

The problem I have with it though, is that there are a number of ways of interpreting what the article is saying, some of which seem to contradict what the official Direct3D documentation says about how to render a texture to given screen coordinates.

Oh well, good find, and I will bear it in mind, but it doesn't quite answer my question!
I will try to explain why I'm making such a big deal about this:

I'm drawing symbols to represent objects in a 3D editor that I'm developing, and those symbols are each composed of a number of line primitives.

Each object that I represent with a symbol has a world position, which I transform to clip coordinates in the vertex shader. Then, the vertex shader offsets the clip coordinates by a vector that represents a pixel offset. So the vertex shader does the following:

1. Transform world position to clip coordinates. This involves rotations, translations and a perspective projection. Nothing unusual so far. At this point I have clip-space coordinates.
2. Convert clip-space coordinates to screen-space coordinates.
3. Round screen-space coordinates to nearest pixel center using floor() intrinsic.
4. Add an (integer) pixel offset to the screen-space coordinates.
5. Convert screen-space coordinates back to clip-space and output the vertex.

So, if my conversion in step 2 or step 5 is wrong, like a scaling factor being off by 1 pixel or something being offset by 0.5 pixels, I will end up with jitter or misshapen symbols.
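
Steps 2 to 5 above can be sketched on the CPU as follows. This is a minimal sketch, not the actual shader: it assumes the Direct3D 9 viewport transform for a full-size viewport at the origin (clip x = -1 lands on screen x = 0, and integer screen coordinates are pixel centers), and `Vec4` and `snapAndOffset` are hypothetical names. If the real mapping differs, only the two conversion lines change.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper type for illustration; not part of Direct3D.
struct Vec4 { float x, y, z, w; };

// Takes a homogeneous clip-space position (the result of step 1) and returns
// a clip-space position snapped to a pixel center and shifted by a whole
// number of pixels. Assumes clip.w > 0, i.e. the point is in front of the camera.
Vec4 snapAndOffset(Vec4 clip, float width, float height,
                   int dxPixels, int dyPixels) {
    assert(width > 0.0f && height > 0.0f && clip.w > 0.0f);

    // Step 2: clip -> screen (perspective divide, then the assumed viewport mapping).
    float sx = (clip.x / clip.w + 1.0f) * 0.5f * width;
    float sy = (1.0f - clip.y / clip.w) * 0.5f * height;

    // Step 3: round to the nearest pixel center. With centers at integer
    // coordinates this is floor(s + 0.5).
    sx = std::floor(sx + 0.5f);
    sy = std::floor(sy + 0.5f);

    // Step 4: add the integer pixel offset.
    sx += static_cast<float>(dxPixels);
    sy += static_cast<float>(dyPixels);

    // Step 5: screen -> clip (inverse of step 2), restoring the original w.
    Vec4 out = clip;
    out.x = (sx / (0.5f * width) - 1.0f) * clip.w;
    out.y = (1.0f - sy / (0.5f * height)) * clip.w;
    return out;
}
```

Multiplying by `clip.w` at the end matters because the hardware divides by w again after the vertex shader; leaving z and w untouched keeps depth behaviour unchanged.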

Now, I could do the world-to-clip transformations myself in software, get screen coordinates and then just render the symbols using transformed and lit vertices. However, I'd like to try doing everything on the GPU, because (assuming the objects don't move often, which is the case) it should be much faster because I can avoid dynamically modifying the contents of vertex buffers or having to use the likes of DrawPrimitiveUP.
That link I posted converts the transformed vertex coordinates to the final coordinates that should be pixel accurate. It should address any screen space problems. I would THINK that you don't need to literally think in screen space for it to work. (unless for some reason you need to auto-adjust the quads to be the pixel size of the image passed in...)

I'm just assuming all you need to do is pass a position into the shader, pass 4 standard vertices, and transform those to the position/rotation/whatever, right? Are you doing a DX10 thing where you are using geometry shaders instead? After you transform them into clip space, use the function in the earlier link and it will transform them to the pixel-perfect position. If you are worried that you have to do a clip-space transformation that differs from the standard image ratio to get the correct image ratio, then you need not worry. Clip space is logical, so you can scale everything to how you think it needs to be scaled. After you do the clip-space transform, you can alter the output verts with (again) that function, and they should be in the expected screen-space position (which is just a nudge off clip).

If I'm wrong in interpreting you, I'll try try again :p
Stephen,

Something that I think is central to this problem is that nobody ever bothers to explicitly define what they mean when they give a screen coordinate of "(x, y)" where x and y are integers. Are they referring to the center of a pixel, or are they referring to the corner between two pixels?

This distinction doesn't matter unless you have to work in both clip-space AND screen-space AND you need pixel-perfect positioning. Most people don't need to, so nobody bothers to define what they really mean by screen coordinates.

In Direct3D, the position component of transformed and lit vertices is in screen / viewport coordinates, where integer (x, y) represents a pixel center. That is well-defined by the DirectX documentation. What is not well-defined (as far as I can tell) is whether the clip-space coordinate of (-1, 1) is a pixel corner or a pixel center.

By the way, I should clarify a couple of things: I'm not using texturing and I'm just drawing lines, so the stuff about texturing doesn't really apply here. However, I do agree that the article you linked is relevant, because it's considering screen-space (or viewport-space) coordinates. I understand what it's doing; an X-offset of -(1 / width) is a clip-space offset of 0.5 pixels to the left, and a Y-offset of (1 / height) is a clip-space offset of 0.5 pixels up.
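
The arithmetic behind those offsets is that clip space spans 2 units across the whole viewport, so one pixel is 2 / width clip units horizontally and 2 / height vertically. A minimal sketch, with hypothetical helper names:

```cpp
#include <cassert>

// Convert a horizontal distance in pixels to clip-space units.
// Positive means "to the right" in both spaces.
float pixelsToClipX(float pixels, float width) {
    assert(width > 0.0f);
    return 2.0f * pixels / width;
}

// Convert a vertical distance in pixels to clip-space units.
// Screen Y grows downward and clip Y grows upward, hence the sign flip.
float pixelsToClipY(float pixels, float height) {
    assert(height > 0.0f);
    return -2.0f * pixels / height;
}
```

For an 8 pixel wide viewport, `pixelsToClipX(-0.5f, 8.0f)` is -0.125, which is -(1 / width); for a 6 pixel high viewport, `pixelsToClipY(-0.5f, 6.0f)` is 1 / 6, which is +(1 / height).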

If I am interpreting that article correctly, it's saying that the mapping between screen space and clip space is (using my diagram):

B = screen (0.0, 0.0) <-> clip (-1, 1)
and a new point E = screen (8.0, 6.0) <-> clip (1, -1)

In other words, the viewing frustum maps to a rectangle that isn't entirely onscreen, and clip (-1, 1) and clip (1, -1) both map to a pixel center. There are two 0.5 pixel wide strips at the right and bottom that are not actually on the screen. Well, that seems rather odd, so I'd like second opinions about that. I'm not quite ready to take Simon Brown's word for it :)
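
To make that interpretation concrete, here is a small sketch of the screen-to-clip mapping it implies. It assumes a full-size viewport at the origin and integer screen coordinates at pixel centers, which matches the viewport transform as I understand the Direct3D 9 documentation; `Vec2` and `screenToClip` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical helper type for illustration; not part of Direct3D.
struct Vec2 { float x, y; };

// Mapping under the article's interpretation:
// screen (0, 0) <-> clip (-1, +1) and screen (w, h) <-> clip (+1, -1).
Vec2 screenToClip(Vec2 s, float w, float h) {
    assert(w > 0.0f && h > 0.0f);
    return { s.x / (0.5f * w) - 1.0f,
             1.0f - s.y / (0.5f * h) };
}
```

For the 8 by 6 example, B = (0, 0) goes to clip (-1, 1) and E = (8, 6) goes to clip (1, -1). The corners of the area that is actually visible, A = (-0.5, -0.5) and D = (7.5, 5.5), go to roughly (-1.125, 1.1667) and (0.875, -0.8333), so under this reading the visible rectangle is the clip rectangle shifted by (-1 / width, +1 / height).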

Sorry to drag you into this Stephen - I'm probably "doing your head in" to use a local expression :)
And here's what you were looking for...

tF32 CVertexShader::HomogeniseX (tF32 Xc)
{
	// Convert incoming screen x coordinate into -1 to +1 homogeneous space
	return ((Xc-0.5f)/((tF32)CRender::CurrentView.Width*0.5f))-1.0f;
}

tF32 CVertexShader::HomogeniseY (tF32 Yc)
{
	// Convert incoming screen y coordinate into -1 to +1 homogeneous space
	return 1.0f-((Yc-0.5f)/((tF32)CRender::CurrentView.Height*0.5f));
}
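
To try the two functions outside the original class, here is a free-function restatement of the same formulas. Passing the viewport size as a parameter is an assumption made for illustration (the posted code reads it from `CRender::CurrentView`), and `float` stands in for `tF32`.

```cpp
#include <cassert>

// Same formula as the posted HomogeniseX, with the viewport width passed in
// explicitly instead of being read from CRender::CurrentView.
float homogeniseX(float xc, float viewWidth) {
    assert(viewWidth > 0.0f);
    return ((xc - 0.5f) / (viewWidth * 0.5f)) - 1.0f;
}

// Same formula as the posted HomogeniseY; Y is flipped so that larger
// screen Y (further down) gives smaller clip Y.
float homogeniseY(float yc, float viewHeight) {
    assert(viewHeight > 0.0f);
    return 1.0f - ((yc - 0.5f) / (viewHeight * 0.5f));
}
```

For the 8 by 6 example, `homogeniseX(0.5f, 8.0f)` gives -1 and `homogeniseX(8.5f, 8.0f)` gives +1, while an input of 0 gives -1.125. In other words, the input is shifted by half a pixel before the usual x / (width / 2) - 1 scaling is applied.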

------------------------------Great Little War Game

This topic is closed to new replies.
