difficult problem for OpenGL guru

8 replies to this topic

#1maxgpgpu  Members

Posted 15 September 2013 - 11:29 AM

I have a difficult problem that I cannot figure out an efficient way to solve.  Part of my problem is, I'm not intimately familiar with every last nook and cranny of OpenGL and GPU pipelines, and this problem can obviously be solved in several ways.  I need to find the most efficient solution (fastest running), not the simplest, because millions of vertices must be processed each frame.

Here is a generic statement of what needs to happen.

#1:  The application program on the CPU contains an array of roughly 2 billion vertices in world-coordinates, each with an RGB color and possibly a couple other items of information.  The original coordinates are spherical coordinates (2 angles + distance), but separate files can be created with x,y,z cartesian coordinates to speed processing by CPU and/or GPU if appropriate.

#2:  Before the application runs, the vertex data is transferred from disk into CPU memory.  This will consume several gigabytes of RAM, but will fit into memory without swapping.

#3:  However, the entirety of vertex data will not fit into the RAM in current generation GPUs.  We can assume the application is only run on GPUs that contain gigabytes of internal RAM, with at least 1 gigabyte always allocated to this vertex data.

#4:  The data on disk is organized in a manner analogous to a cube-map to make the following processes efficient.  The data for each face of the cube-map are subdivided into a 1024x1024 array of subsections called "fields", each of which can be easily and efficiently located and accessed independently by the application program to support efficient culling.

#5:  The vertex data for all currently visible fields will presumably be held in a fixed number of dedicated VBO/VAOs in GPU memory.  For normal viewport angles, probably a few million to several million vertices will be in these VBO/VAOs, and need to be processed and displayed each frame.

#6:  When the camera/viewpoint rotates more than about 0.1 degree, some "fields" will no longer be visible in the viewport, and other "fields" will become visible.  When this happens, the application will call OpenGL functions to write the vertex data of newly visible vertices over the vertex data of no longer visible vertices, so no reallocation of VBO/VAOs is ever required.

#7:  Each frame, the entire scene except for this vertex data is first rendered into the framebuffer and depthbuffer.

#8:  The OpenGL state can be modified as necessary before rendering the vertex data.  New vertex and fragment shader programs can be enabled to implement rendering of the vertex data in the required manner.

#9:  All of this special vertex data is now rendered.
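A minimal sketch in C of the "field" lookup described in item #4, assuming each cube face is split uniformly in its projected face coordinates. The face numbering (0..5 = +x, -x, +y, -y, +z, -z), the axis pairing on each face, and the names `FieldId` and `field_from_direction` are assumptions for illustration; the post does not say how the real files are laid out.

```c
#include <assert.h>

#define FIELDS_PER_EDGE 1024

typedef struct { int face, col, row; } FieldId;

static double abs_d(double v) { return v < 0.0 ? -v : v; }

/* Map a face coordinate in [-1, 1] to a field index in 0..1023. */
static int field_index(double t)
{
    int i = (int)((t * 0.5 + 0.5) * FIELDS_PER_EDGE);
    return i >= FIELDS_PER_EDGE ? FIELDS_PER_EDGE - 1 : i;
}

/* Pick the cube face from the dominant axis, then the field on that face.
   Face order and axis pairing are assumptions, not the poster's layout. */
static FieldId field_from_direction(double x, double y, double z)
{
    double ax = abs_d(x), ay = abs_d(y), az = abs_d(z);
    FieldId id;
    assert(ax + ay + az > 0.0); /* the zero vector has no direction */
    if (ax >= ay && ax >= az) {
        id.face = x > 0.0 ? 0 : 1;
        id.col = field_index(y / ax);
        id.row = field_index(z / ax);
    } else if (ay >= az) {
        id.face = y > 0.0 ? 2 : 3;
        id.col = field_index(x / ay);
        id.row = field_index(z / ay);
    } else {
        id.face = z > 0.0 ? 4 : 5;
        id.col = field_index(x / az);
        id.row = field_index(y / az);
    }
    return id;
}
```

With a lookup like this, culling reduces to finding which (face, col, row) triples overlap the view frustum and streaming only those fields into the VBOs.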

-----

The above is just background.  The tricky question for OpenGL gurus is how to most efficiently render the vertex data in the following manner:

#1:  Each vertex is a point.  No lines or triangles are rendered.

The following requirements are what make this problem difficult...

#2:  For each vertex, find the value in the depth buffer that corresponds to where this vertex/point would be displayed.  The nearest point is what we need to find, since we must assume for this step that the actual physical size of each point is zero (infinitesimal).

#3:  If the depth buffer value indicates the depth buffer (and corresponding pixel in color-buffer) has already been written this frame, then no further action may be taken for this vertex (the color-buffer and depth-buffer are not modified).  In effect, we want to discard this vertex and not perform any subsequent processes for this vertex.  Otherwise perform the following steps.

#4:  Based upon the brightness and color of the vertex data (the RGB or ARGB values in each vertex structure), render a "blob" of the appropriate brightness and size (maximum 16x16 ~ 64x64 screen pixels), centered on the screen pixel where the depth-buffer was tested.

NOTE:  The most desirable way to render the image for this "blob" is "procedurally".  In other words, all screen pixels within 1 to 32 pixels of the computed screen pixel would be rendered by a fragment shader program that knows the vertex data (brightness and color), plus how far away each pixel is from the center of the image (dx,dy).  Based on that information, the fragment shader code would appropriately render its pixel of the blob image.

Alternatively, a "point sprite texture" could be selected from an array of tiny textures (or a computed subregion of one texture) based on the brightness and color information.  Then a point sprite of the appropriate size and brightness could be rendered, centered on the screen pixel computed for the original vertex.

In either case, the RGB of each screen pixel must be summed (framebuffer RGB = framebuffer RGB + new RGB).

The depth buffer must not be updated.
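The summing rule above can be sketched on the CPU as a per-channel saturating add. In OpenGL itself this is normally requested from the fixed-function blender with glEnable(GL_BLEND) and glBlendFunc(GL_ONE, GL_ONE), with glDepthMask(GL_FALSE) to leave the depth buffer alone. The 8-bit `Rgb8` format and the function names below are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint8_t r, g, b; } Rgb8;

/* Add two 8-bit channel values, clamping at 255 instead of wrapping. */
static uint8_t add_saturate(uint8_t dst, uint8_t src)
{
    unsigned sum = (unsigned)dst + (unsigned)src;
    return (uint8_t)(sum > 255u ? 255u : sum);
}

/* framebuffer RGB = framebuffer RGB + new RGB.
   No depth value is read or written here, matching "depth must not be updated". */
static void blend_additive(Rgb8 *dst, Rgb8 src)
{
    assert(dst != NULL);
    dst->r = add_saturate(dst->r, src.r);
    dst->g = add_saturate(dst->g, src.g);
    dst->b = add_saturate(dst->b, src.b);
}
```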

-----

The above is what needs to happen.

What makes these requirements so problematic?

#1:  We need to make the rendering of an extended region of the screen conditional upon whether a single vertex/point in the depth-buffer has been written.  I infer that the many pixels in a point-sprite rendering are independently subjected to depth tests by their fragment shaders, and therefore the entire point-sprite would not be extinguished just because the center happened to be obscured.  Similarly, I do not see a way for a vertex shader or geometry shader to discard the original vertex before it invokes a whole bunch of independent fragment shaders (either to render the pixels in the point-sprite, or to execute a procedural routine).

#2:  It appears to me that vertex-shaders and geometry-shaders cannot determine which framebuffer pixel corresponds to a vertex, and therefore cannot test the depth-buffer for that pixel (and discard the vertex and stop subsequent steps from happening).

The last couple days I've been reading the new SuperBible and the latest OpenGL specs... and they are so chock full of very cool, very flexible, very powerful capabilities and features.  At many points I thought I found a way to meet these requirements, but... always came up short.  In some cases I could see a feature won't work, but in other cases I had to make certain assumptions about subtle details of how GPU pipelines work, and the OpenGL specification.  So... I'm not convinced there isn't some way to accomplish what I need to accomplish.  In fact, I have the distinct feeling "there must be"... but I just can't find it or figure it out.

But I bet someone who has been mucking around in OpenGL, GLSL and the details of the GPU pipeline understands how everything works in sufficient detail that... they'll immediately flash on the solution!  And tell me!

I'd hate for this process to require two passes.  I suppose we could create a first pass that simply writes or clears a single bit (in a texture or ???) that corresponds to each vertex, indicating whether the screen pixel where the vertex would be displayed as a point is currently written or not (assuming the fragment shader can read the depth buffer, and assume 1.0 means "not drawn").  Then on the second pass the vertex shader could discard any vertex with bit=0 in that texture.  Oops!  Wait... can vertex shaders discard vertices?  Or maybe a whole freaking extra geometry shader would be needed for this.  Gads, I hate multiple passes, especially for something this trivial.

I'll even be happy if the solution requires a very recent version of OpenGL or GLSL.

Who has the solution?

#2ADDMX

Posted 16 September 2013 - 02:48 AM

If I understand you correctly you have depth buffer & bunch of vertexes, now you want to render pointsprites (or quads) based on the information that this particular vertex passes the depth test.

Assuming that the vertices CANNOT write anything to the depth buffer, the easiest way would be to copy the depth buffer to a texture before drawing the vertices. Then access it from the vertex shader for testing: compute the screen-space coords in the VS, look up the copied depth, and reject in the vertex shader using either gl_PointSize=0 if you are happy with point sprites, or gl_ClipDistance[0]=-1 if you need to use a geometry shader for the quad expansion (remember to glEnable(GL_CLIP_DISTANCE0); in code!). You simply disable depth testing for fragments; everything is done in the vertex shader, and fragments are invoked only for vertices that passed the test.

** If the vertices CAN contribute (write) to the depth buffer values I don't see any fast way to do this.
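A CPU-side sketch in C of the per-vertex decision described above, assuming the depth buffer was cleared to 1.0 and has already been copied somewhere readable. `DepthCopy` and `clip_distance_for_point` are hypothetical names; in a real vertex shader the lookup would be a fetch from the copied depth texture and the return value would be written to gl_ClipDistance[0]. The clip-distance route is shown rather than gl_PointSize=0 because, as far as I recall, the GL spec leaves the result undefined for point sizes of zero or less.

```c
#include <assert.h>

typedef struct {
    const float *depth; /* copied depth values, row-major, 1.0 = cleared */
    int width, height;
} DepthCopy;

/* Convert one NDC coordinate in [-1, 1] to a texel index in 0..size-1. */
static int texel_index(float ndc, int size)
{
    int i = (int)((ndc * 0.5f + 0.5f) * (float)size);
    if (i < 0) i = 0;
    if (i >= size) i = size - 1;
    return i;
}

/* Returns the value to write to gl_ClipDistance[0]:
   positive keeps the whole point, negative clips the whole point away. */
static float clip_distance_for_point(const DepthCopy *copy, float ndc_x, float ndc_y)
{
    int px, py;
    float stored;
    assert(copy && copy->depth && copy->width > 0 && copy->height > 0);
    if (ndc_x < -1.0f || ndc_x > 1.0f || ndc_y < -1.0f || ndc_y > 1.0f)
        return -1.0f; /* centre is off-screen */
    px = texel_index(ndc_x, copy->width);
    py = texel_index(ndc_y, copy->height);
    stored = copy->depth[py * copy->width + px];
    return stored >= 1.0f ? 1.0f : -1.0f; /* keep only if nothing was drawn there */
}
```

Because the decision is made once per vertex, every fragment of the sprite shares it, which is the "all or nothing" behaviour the original question asks for.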

#3pcmaster  Members

Posted 16 September 2013 - 03:57 AM

maxgpgpu:

#1: I don't quite understand, you keep mixing per-fragment and per-vertex depth-tests. Remember that vertex/geometry/hull/tessellation shaders operate with vertices, they know nothing about the final fragments that a rasteriser might generate. They can, however, project anything anywhere and sample any textures they like. Only the geometry shader has the ability of not emitting anything and effectively exiting the pipeline.

I assume you don't want to discard a whole primitive (2 triangle sprite) based on its centre. In such case, when you need a per-fragment depth-test, you'll need to do it in the fragment shader, indeed, and the above helps not.

Nevertheless, you might still do some kind of conservative geometry-shader killing, for example using some kind of conservative Hi-Z / "mip-mapped" depth texture and a conservative AABB of the final primitive, or something similar, where only a couple of texture samples would be enough to safely tell that the whole primitive is "behind".
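One way to read the Hi-Z idea, sketched in C under the assumption that larger depth means farther away: each coarser level keeps the farthest depth of a 2x2 block, so a primitive whose nearest depth is still behind that value is hidden at every pixel of the block. The function names are hypothetical, and a real implementation would build the levels on the GPU.

```c
#include <assert.h>

/* Build the next coarser Hi-Z level: each output texel is the maximum
   (farthest) depth of the corresponding 2x2 block of the source level. */
static void hiz_reduce_max(const float *src, int src_w, int src_h, float *dst)
{
    int x, y, dst_w;
    assert(src && dst && src_w > 0 && src_h > 0);
    assert(src_w % 2 == 0 && src_h % 2 == 0);
    dst_w = src_w / 2;
    for (y = 0; y < src_h / 2; ++y) {
        for (x = 0; x < dst_w; ++x) {
            const float *row0 = src + (2 * y) * src_w + 2 * x;
            const float *row1 = row0 + src_w;
            float m = row0[0];
            if (row0[1] > m) m = row0[1];
            if (row1[0] > m) m = row1[0];
            if (row1[1] > m) m = row1[1];
            dst[y * dst_w + x] = m;
        }
    }
}

/* Conservative test: hidden only if even the nearest part of the primitive
   lies behind the farthest depth stored for the covered block. */
static int safely_hidden(float primitive_near_depth, float block_max_depth)
{
    return primitive_near_depth > block_max_depth;
}
```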

Edited by pcmaster, 16 September 2013 - 03:59 AM.

#4Hodgman  Moderators

Posted 16 September 2013 - 03:59 AM

#1:  We need to make the rendering of an extended region of the screen conditional upon whether a single vertex/point in the depth-buffer has been written.  I infer that the many pixels in a point-sprite rendering are independently subjected to depth tests by their fragment shaders, and therefore the entire point-sprite would not be extinguished just because the center happened to be obscured.  Similarly, I do not see a way for a vertex shader or geometry shader to discard the original vertex before it invokes a whole bunch of independent fragment shaders (either to render the pixels in the point-sprite, or to execute a procedural routine).
As you've already discovered: Use point sprites and disable depth testing.

To selectively discard a vertex, either return the actual transformed vertex, or return an invalid/off-screen vertex for vertices to be discarded, such as vec4(0,0,0,0)

#2:  It appears to me that vertex-shaders and geometry-shaders cannot determine which framebuffer pixel corresponds to a vertex, and therefore cannot test the depth-buffer for that pixel (and discard the vertex and stop subsequent steps from happening).
Disable hardware depth testing and implement it yourself in the vertex shader. Bind a texture to the vertex shader containing the depth values, and perform the comparison yourself.

#5maxgpgpu  Members

Posted 16 September 2013 - 03:21 PM

If I understand you correctly you have depth buffer & bunch of vertexes, now you want to render pointsprites (or quads) based on the information that this particular vertex passes the depth test.

Assuming that the vertices CANNOT write anything to the depth buffer, the easiest way would be to copy the depth buffer to a texture before drawing the vertices. Then access it from the vertex shader for testing: compute the screen-space coords in the VS, look up the copied depth, and reject in the vertex shader using either gl_PointSize=0 if you are happy with point sprites, or gl_ClipDistance[0]=-1 if you need to use a geometry shader for the quad expansion (remember to glEnable(GL_CLIP_DISTANCE0); in code!). You simply disable depth testing for fragments; everything is done in the vertex shader, and fragments are invoked only for vertices that passed the test.

** If the vertices CAN contribute (write) to the depth buffer values I don't see any fast way to do this.

I have a feeling this reply is going to sound stupid, but here goes.  First let me try to clearly answer your first sentence, which is:

If I understand you correctly you have depth buffer & bunch of vertexes, now you want to render pointsprites (or quads) based on the information that this particular vertex passes the depth test.

Yes, the engine has already drawn everything (environment, objects, etc) for this frame.  The only things not yet drawn for this frame are the huge array of vertices.  One point-sprite must be drawn for each visible vertex, where visible means nothing has been drawn on the screen-pixel where the vertex will be drawn.  The peculiarity of my requirement is that we need to suppress or draw the entire point-sprite based upon the contents of the depth-buffer at the screen-pixel where the vertex would be drawn.

Here are a couple examples to clarify what this means.

peculiar case #1:  Assume the vertex would be drawn on a screen-pixel that had never been drawn during this frame, but immediately adjacent to that pixel is a wall or some other large object.  In this case, the entire 3x3 to 65x65 pixel point-sprite must be drawn, including the portion that overlaps the wall.

peculiar case #2:  Assume the vertex would be drawn one pixel closer to the wall or object described in the above case, and thus the vertex would fall on a pixel already drawn to display the wall or object.  The depth-buffer would therefore contain a value less than "infinity" (which is probably 1.0 in practice), and therefore the vertex would not be drawn (since they are all effectively at a distance of "infinity").  In this case, the entire 3x3 to 65x65 pixel point-sprite must be suppressed, and nothing be drawn as a consequence of this vertex.

As I read your sentence, you are correct.  However, unless I am mistaken about quads, the only OpenGL mechanism that works for this application is the point-sprite mechanism.  Why?  Because I need the center of the point-sprite image to be displayed at the screen-pixel where the vertex would be drawn, and from what I can tell the point-sprite does this, but there is no way to know where to draw a variable-size quad to assure the center is located on the screen-pixel where the vertex would have been drawn.

-----

I did say that these vertices cannot write the depth-buffer.  However, that is not absolutely necessary in practice, since presumably we can force the fragment shader to write "infinity" for the depth.  I'm not certain, but this might mean writing 1.0 (or any value greater than 1.0) in the fragment shader.  So if you have some reason to want to write the depth-buffer, I suppose we can do that.  However, since this process will necessarily require switching to special-purpose shaders, we also have the luxury of setting the OpenGL state any way we wish to make the process function as desired.  I looked at those states to see if I could find a way to help make this work, but didn't find any combination that works.

Like you say, if somehow I can compute in the vertex or geometry shader which screen-pixel in the framebuffer will be written by the vertex, then I could do exactly as you say.  Well, that assumes the vertex shader can read the depth-buffer (or before we execute this pass my application can copy the whole depth-buffer into a texture that is available to the vertex shader).  Is this computation possible?  I certainly don't know how to perform that computation.  Can you point me at something to show me how to do that?  I suppose in some sense I already have much of that in my fragment shader, but as far as I know the screen-pixel magically appears between the vertex-shader and fragment-shader (and furthermore, as far as I remember, the fragment shader doesn't even know which pixel in the framebuffer it will draw upon).  I'm probably missing something simple here.

Edited by maxgpgpu, 16 September 2013 - 03:22 PM.

#6maxgpgpu  Members

Posted 16 September 2013 - 03:49 PM

maxgpgpu:

#1: I don't quite understand, you keep mixing per-fragment and per-vertex depth-tests. Remember that vertex/geometry/hull/tessellation shaders operate with vertices, they know nothing about the final fragments that a rasteriser might generate. They can, however, project anything anywhere and sample any textures they like. Only the geometry shader has the ability of not emitting anything and effectively exiting the pipeline.

I assume you don't want to discard a whole primitive (2 triangle sprite) based on its centre. In such case, when you need a per-fragment depth-test, you'll need to do it in the fragment shader, indeed, and the above helps not.

Nevertheless, you might still do some kind of conservative geometry-shader killing, for example using some kind of conservative Hi-Z / "mip-mapped" depth texture and a conservative AABB of the final primitive, or something similar, where only a couple of texture samples would be enough to safely tell that the whole primitive is "behind".

You say "Any shader stage can read the depth texture".  Do you mean the vertex or geometry shader can read individual depth-values from any x,y location in the depth-buffer?  How?  What does that code look like?  Or if you only mean to say the entire depth-buffer can be copied to a "depth texture" (of the same size), what does that code look like?  I understand the general process, but never seem to understand how the default framebuffer or its depth buffers can be specified.

-----

Yes, I probably do sound like I'm "mixing per vertex and per fragment depth tests" in my discussion.  Actually, it only seems that way, and that's my problem.  What I need is for each vertex in the VBO to be depth-tested, but the entire 3x3 to 65x65 pixel point sprite must be drawn or non-drawn on the basis of that one test.  Of course the depth-test of that vertex needs to be tested against the value in the depth-buffer where that vertex would be drawn, but as far as I understand the vertex shader doesn't have a clue at that stage of the pipeline which x,y pixel on the framebuffer or depth-buffer the vertex will fall.

Though the vertex shader can't "discard" a pixel, it can change its coordinates to assure the vertex is far behind the camera/viewpoint, right?  So that may be one way of effectively performing a discard in the vertex shader (for points only, which is what we're dealing with here).  Or do you think that's a stupid idea?
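A small C sketch of that idea, assuming the usual OpenGL clip volume (-w <= x, y, z <= w): a point whose position is outside the volume is discarded before rasterisation, so replacing the output position with any outside value acts as a per-vertex discard. `cull_if_hidden` and the sentinel position (2, 2, 2, 1) are hypothetical choices for illustration.

```c
#include <assert.h>

typedef struct { float x, y, z, w; } Vec4;

/* True when a clip-space position lies inside the OpenGL clip volume. */
static int inside_clip_volume(Vec4 p)
{
    return -p.w <= p.x && p.x <= p.w &&
           -p.w <= p.y && p.y <= p.w &&
           -p.w <= p.z && p.z <= p.w;
}

/* What a vertex shader could write to gl_Position: the real position for
   visible points, or a position outside the clip volume for hidden ones. */
static Vec4 cull_if_hidden(Vec4 clip_pos, int hidden)
{
    const Vec4 outside = { 2.0f, 2.0f, 2.0f, 1.0f };
    assert(!inside_clip_volume(outside)); /* the sentinel must really be outside */
    return hidden ? outside : clip_pos;
}
```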

-----

You say, "I assume you don't want to discard a whole primitive (2 triangle sprite) based on its centre".  That is precisely what I need to do!!!!!  And that is what makes this problem difficult (for me, and maybe for anyone).  Read the example I gave in my previous reply (to ADDMX) for an example.  I need to discard (not draw) the entire point-sprite if the vertex (the center of the point-sprite) has been drawn to during the previous normal rendering processes.

This is the correct behavior of the process we're talking about here.  Consider a star for example, or a streetlight or airplane landing lights many miles away.  They are literally (for all practical purposes) "point sources" of light.  However, in our eyeballs, in camera lenses, on film, and on CCD surfaces a bright pinpoint of light blooms into a many pixel blur (or "airy disc" if the optical system is extraordinarily precise).  So, when the line-of-sight to the star or landing-lights just barely passes behind the edge of any object, even by an infinitesimal distance, the entire blur vanishes.

This is the kind of phenomenon I am dealing with, and must represent correctly.  So this is the physical reason why I must in fact do what you imagine I can't possibly want to do, namely "discard the whole primitive (a largish point-sprite) based upon its center".  And thus I do NOT want a "per fragment depth test", unless somehow we can perform a per-fragment depth test ONLY upon the vertex (the exact center of the point-sprite), then SOMEHOW stop all the other pixels of the point-sprite from being drawn.  I don't think that's possible, because all those pixels have already been created and sent to separate shader cores in parallel with the pixel at the exact center of the point-sprite.  That is, unless I don't understand something about how the pipeline works in the case of point-sprites.

I don't understand your last paragraph, but that probably doesn't matter, because it appears I am trying to do something you think I can't possibly want to do!  Hahaha.

#7maxgpgpu  Members

Posted 16 September 2013 - 04:08 PM

#1:  We need to make the rendering of an extended region of the screen conditional upon whether a single vertex/point in the depth-buffer has been written.  I infer that the many pixels in a point-sprite rendering are independently subjected to depth tests by their fragment shaders, and therefore the entire point-sprite would not be extinguished just because the center happened to be obscured.  Similarly, I do not see a way for a vertex shader or geometry shader to discard the original vertex before it invokes a whole bunch of independent fragment shaders (either to render the pixels in the point-sprite, or to execute a procedural routine).
As you've already discovered: Use point sprites and disable depth testing.

To selectively discard a vertex, either return the actual transformed vertex, or return an invalid/off-screen vertex for vertices to be discarded, such as vec4(0,0,0,0)

#2:  It appears to me that vertex-shaders and geometry-shaders cannot determine which framebuffer pixel corresponds to a vertex, and therefore cannot test the depth-buffer for that pixel (and discard the vertex and stop subsequent steps from happening).
Disable hardware depth testing and implement it yourself in the vertex shader. Bind a texture to the vertex shader containing the depth values, and perform the comparison yourself.

You say, "To selectively discard a vertex, either return the actual transformed vertex, or return an invalid/off-screen vertex for vertices to be discarded, such as vec4(0,0,0,0)".  That sounds correct to me.  What I don't understand is:

#1:  How can my vertex shader know where in the framebuffer and depthbuffer the vertex will fall.
#2:  And if you have an answer to the previous question, how can my vertex shader access that value in the depthbuffer to determine whether it has been written or not?

If you have answers to these two questions, I guess the values I will receive back from the depthbuffer will be 0.000 to 1.000 with 1.000 meaning "never written during this frame".

-----

You say, "Disable hardware depth testing and implement it yourself in the vertex shader. Bind a texture to the vertex shader containing the depth values, and perform the comparison yourself".  Okay, I take this to mean you have a valid answer to question #1 above, but not #2 above (in other words, you do not know any way for my vertex shader to read individual x,y locations in the framebuffer or depthbuffer).  And therefore you propose that after rendering the conventional geometry into the framebuffer and depthbuffer, I should then call OpenGL API functions to copy the depth-buffer to a "depth-texture" (a texture having a depthbuffer format), then draw all these VBOs full of vertices with a vertex shader that somehow computes the x,y location in the framebuffer and depthbuffer each vertex would be rendered to.  On the basis of the depth value at that location, the shader draws the point-sprite if the location was never written (depth == 1.000), and otherwise (depth < 1.000) throws the vertex to some invisible location to effectively make the vertex shader discard the entire point-sprite.

Do I have this correct?  If so, two questions:

#1:  How does the vertex shader compute the x,y location in the depth texture to access?
#2:  Is the value I get back from the depth-texture going to be a f32 value from 0.000 to 1.000?  Or a s16,u16,s24,u24,s32,u32 value with the largest positive value being equivalent to "infinity" AKA "never written during this frame"?

Thanks for helping!

#8Hodgman  Moderators

Posted 16 September 2013 - 08:50 PM

Just as an alternative -- instead of using quads to achieve this effect, you could use a bloom post-process.

Sorry, I'm not an OpenGL guru, so this is all API agnostic:

#1:  How can my vertex shader know where in the framebuffer and depthbuffer the vertex will fall.

That is the main job of every vertex shader! The output position variable is the position in the framebuffer where the vertex will be located.

However, the VS outputs values in NDC coordinates, which range from -1 to +1, whereas textures range from 0 to 1.
So:
vertexScreenUV = vertexOutPosition.xy * 0.5 + 0.5;
or depending on the API, sometimes texture coordinates are upside down, so you might need:
vertexScreenUV = vertexOutPosition.xy * vec2(0.5, -0.5) + 0.5;

Oops, I forgot about perspective division:

vertexScreenUV = vertexOutPosition.xy/vertexOutPosition.w * 0.5 + 0.5;
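The same formula as a small, checkable C function, assuming the vertex is in front of the camera (w > 0) and that texture coordinates are not flipped vertically. `Vec4`, `Vec2` and `clip_to_screen_uv` are illustrative names only.

```c
#include <assert.h>

typedef struct { float x, y, z, w; } Vec4;
typedef struct { float u, v; } Vec2;

/* Clip space -> NDC (divide by w) -> texture coordinates in [0, 1]. */
static Vec2 clip_to_screen_uv(Vec4 clip)
{
    Vec2 uv;
    assert(clip.w > 0.0f); /* points behind the camera are not handled here */
    uv.u = clip.x / clip.w * 0.5f + 0.5f;
    uv.v = clip.y / clip.w * 0.5f + 0.5f;
    return uv;
}
```

For example, a clip-space position of (2, -2, 0, 4) gives NDC (0.5, -0.5) and therefore UV (0.75, 0.25).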

#2:  And if you have an answer to the previous question, how can my vertex shader access that value in the depthbuffer to determine whether it has been written or not?

The same way that you read from a texture in the pixel shader. Create a sampler/texture in your shader, and read from it using the texture function (or a similar one).

There will be some kind of API for creating a depth-buffer (it will be separate to the regular, automatically created one that comes with the device), and there will be a way to create a special kind of resource that's both bindable as a depth-buffer, and as a texture.

Older GPUs might not allow you to create a depth buffer that is readable as a texture, but DX10-level GPUs and onwards will allow this.

--To work around this, you can use MRT (multiple render targets) to create your own depth texture. In your main rendering pass, you output your colour values to render-target #0, and manually output depth values to render-target #1.

Older GPUs also might not allow you to use textures in the vertex shader (but DX10+ ones will).

--There's a workaround for GPUs that don't support VTF (vertex-shader texture support) -- you have the vertex-shader pass the centre point to the pixel-shader as an extra varying/interpolant, and then in the pixel-shader, you fetch the depth value at that coordinate and compare it against the pixel depth.

Is the value I get back from the depth-texture going to be a f32 value from 0.000 to 1.000?  Or a s16,u16,s24,u24,s32,u32 value with the largest positive value being equivalent to "infinity" AKA "never written during this frame"?

The texture function returns a vec4 as usual, no matter what kind of texture it's reading from. The depth value will be in the r/x/[0] component, and yes 1.0 will represent the far plane.
If you've cleared the depth buffer using a value of 1.0, then yes, 1.0 will represent "never written to".
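To make the format question concrete, here is a C sketch of how an unsigned normalized 24-bit depth value maps to the float the shader sees, assuming the standard rule value / (2^24 - 1). The function names are hypothetical; the GPU does this conversion itself when the texture is sampled.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a stored 24-bit unsigned normalized depth to a float in [0, 1]. */
static float depth24_to_float(uint32_t stored)
{
    assert(stored <= 0xFFFFFFu);
    return (float)((double)stored / 16777215.0); /* 2^24 - 1 */
}

/* With a clear value of 1.0, only untouched texels read back as exactly 1.0. */
static int never_written(float depth)
{
    return depth >= 1.0f;
}
```

The largest stored integer converts to exactly 1.0, so comparing against 1.0 works for the integer depth formats as well as for float ones.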

However, unless I am mistaken about quads, the only OpenGL mechanism that works for this application is the point-sprite mechanism.  Why?  Because I need the center of the point-sprite image to be displayed at the screen-pixel where the vertex would be drawn, and from what I can tell the point-sprite does this, but there is no way to know where to draw a variable-size quad to assure the center is located on the screen-pixel where the vertex would have been drawn.

Sure there is. Say that you're drawing a quad primitive using 4 verts:
Each vert has a position and a UV. All 4 verts have the same position, but 4 different UV's, e.g.

{ 42, 64, 13, -1, -1 }
{ 42, 64, 13,  1, -1 }
{ 42, 64, 13, -1,  1 }
{ 42, 64, 13,  1,  1 }

In the vertex shader, you can transform all of these points to the same position (e.g. outPos = mul( inPos, matrix )), but then offset them using the unique UV values (e.g. outPos.xy += inUV * scale).
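A C sketch of that corner expansion. One detail here is an assumption beyond what the post states: the offset is multiplied by w so that it survives the later perspective division as a fixed offset in NDC, which keeps the quad the same size on screen regardless of distance. `expand_corner` and its parameters are illustrative names.

```c
#include <assert.h>

typedef struct { float x, y, z, w; } Vec4;

/* Move one of the four coincident vertices out to its quad corner.
   corner_u / corner_v are the per-vertex -1 or +1 values; half_w_ndc and
   half_h_ndc are half the quad size in NDC units. */
static Vec4 expand_corner(Vec4 centre_clip, float corner_u, float corner_v,
                          float half_w_ndc, float half_h_ndc)
{
    Vec4 out = centre_clip;
    assert(centre_clip.w > 0.0f);
    out.x += corner_u * half_w_ndc * centre_clip.w; /* pre-scale by w */
    out.y += corner_v * half_h_ndc * centre_clip.w;
    return out;
}
```

Since all four vertices start from the same centre, the depth lookup for the centre can be done identically in each of them and all four can be rejected together.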

Edited by Hodgman, 16 September 2013 - 10:22 PM.

#9maxgpgpu  Members

Posted 16 September 2013 - 10:10 PM

Hodgman:

Okay, rather than copy your message, which makes this message a bit difficult to parse, I'll just ask my followup questions here.

As far as I know, the conventional output of the vertex shader has not had the following performed:

1:  perspective division

2: viewport transformation

Nonetheless, I see that your answer might be correct anyway, if the code added to the vertex shader is written properly.  First of all, I've always had a suspicion that gl_Position.w is always 1.000 and therefore the perspective division doesn't change anything, and can therefore be ignored.  However, even if that is not always true (tell me), perhaps my transformed vertices always have gl_Position.z equal to 1.0000 since they are at infinity, and my model-view and projection transformation matrices don't contain anything especially wacko.

Then there's the viewport transformation, which it appears can maybe also be ignored due to the way textures are accessed.  What I mean is, I guess the normal output of the vertex shader is clip coordinates (not NDC = normalized device coordinates), BUT if we assume the output coordinates of the vertex shader in gl_Position always contains gl_Position.w == 1.0000, then "clip coordinates" may be the same as "NDC" (which would then correspond to what you said).

Then the viewport transformation scales the NDC coordinates by the width and height of the framebuffer in order to map the NDC to specific pixels in the framebuffer and depthbuffer.  However, if my vertex shader is not able to directly access the framebuffer or depthbuffer and instead has to access a texture, then there's no reason my vertex shader needs to compute the x,y pixel location in the framebuffer or depthbuffer.  Instead, it needs to compute the corresponding texture coordinates (presumably with "none" or "nearest" filtering or something like that).  And since the range of NDC and texture-coordinates are only a factor of two different, your trivial equation does the trick.
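For comparison, the viewport transformation itself is only a scale and an offset. A C sketch, assuming the viewport was set with glViewport(vp_x, vp_y, vp_w, vp_h); `ndc_to_window` is an illustrative name.

```c
#include <assert.h>

typedef struct { float x, y; } Window2;

/* NDC in [-1, 1] -> window coordinates in pixels for the given viewport. */
static Window2 ndc_to_window(float ndc_x, float ndc_y,
                             int vp_x, int vp_y, int vp_w, int vp_h)
{
    Window2 win;
    assert(vp_w > 0 && vp_h > 0);
    win.x = (ndc_x * 0.5f + 0.5f) * (float)vp_w + (float)vp_x;
    win.y = (ndc_y * 0.5f + 0.5f) * (float)vp_h + (float)vp_y;
    return win;
}
```

Dropping the multiplication by the viewport size and the origin offset leaves exactly the 0..1 texture-coordinate formula, which is why the texture lookup does not need the pixel location.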

Very cool!

I guess the only thing this depends upon is... gl_Position.w == 1.0000 (but for objects at distance infinity, I'm betting that's pretty much guaranteed).  I know I should remember, but when is the perspective division value in gl_Position.w != 1.0000?  Gads, I can't believe I forget this stuff... it's only been several years since I wrote that part of the engine - hahaha.
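On the w question: with a standard OpenGL perspective projection matrix (the form gluPerspective builds), the last row is (0, 0, -1, 0), so clip-space w equals minus the eye-space z and is generally not 1. It stays 1 only for orthographic-style matrices whose last row is (0, 0, 0, 1). A C sketch under that assumption, where `focal` stands for cot(fovy / 2) and all names are illustrative:

```c
#include <assert.h>

typedef struct { float x, y, z, w; } Vec4;

/* Multiply an eye-space position by the standard perspective matrix.
   focal = cot(fovy / 2); aspect = width / height. */
static Vec4 project_perspective(Vec4 eye, float focal, float aspect,
                                float z_near, float z_far)
{
    Vec4 clip;
    assert(aspect > 0.0f && z_near > 0.0f && z_far > z_near);
    clip.x = focal / aspect * eye.x;
    clip.y = focal * eye.y;
    clip.z = (z_far + z_near) / (z_near - z_far) * eye.z
           + (2.0f * z_far * z_near) / (z_near - z_far) * eye.w;
    clip.w = -eye.z; /* last matrix row is (0, 0, -1, 0) */
    return clip;
}
```

A point 10 units in front of the camera therefore comes out with w = 10, so the division by w in the UV formula cannot be skipped in general.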

-----

I am programming with the latest version of OpenGL and nvidia GTX680 cards (supports the latest versions of OpenGL and D3D), so fortunately I don't need to worry about compatibility with more ancient versions.  But thanks for noting that anyway.

-----

I don't entirely follow your last section, but I probably don't need to unless you tell me there is some speed or convenience advantage to displaying these star images with quads instead of point-sprites.  Is there?

Note that I much prefer to draw computed color values to the framebuffer with the pixel shader rather than just display a point-sprite texture or quad-primitive texture.  That way I can simulate optical aberrations [that are a function of the position relative to the center of the field], or even simulate atmospheric turbulence (twinkling of the stars) with procedural techniques.  At the moment I forget how to do this, so I'll have to hit the books and OpenGL specs again.  But what I need to compute the appropriate color for each pixel in the 3x3 to 65x65 region is to know the x,y offset from the center of the point-sprite.

I suppose the obvious way to do that is to fill the x,y elements in the point-sprite "image" with x,y pixel offset values instead of RG color information (and receive the RGBA color values as separate variables from the original vertex).

I sorta maybe half vaguely recall there is a gl_PointSize output from the vertex shader, which would be perfect, because then I can specify the appropriate point-sprite size (1x1, 3x3, 5x5, 7x7, 9x9... 63x63, 65x65) depending on the star brightness.
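gl_PointSize is indeed a vertex shader output (it takes effect when GL_PROGRAM_POINT_SIZE is enabled). A C sketch of choosing an odd size from 1 to 65, assuming a simple linear mapping from a brightness in [0, 1]; the mapping and the name `point_size_for_brightness` are hypothetical, and a real star renderer would more likely use a magnitude-based curve.

```c
#include <assert.h>

/* Map brightness in [0, 1] to an odd point size: 1, 3, 5, ..., 65. */
static int point_size_for_brightness(float brightness)
{
    int half;
    assert(brightness >= 0.0f);
    if (brightness > 1.0f)
        brightness = 1.0f;                  /* clamp over-bright inputs */
    half = (int)(brightness * 32.0f + 0.5f); /* rounds to 0..32 */
    return 2 * half + 1;
}
```

Odd sizes keep one pixel exactly at the centre, which matches the requirement that the blob be centred on the tested pixel.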

I sorta maybe half vaguely also recall there is a gl_PointCoord input to the pixel shader that the GPU provides to identify where in the point-sprite the current pixel is.  If so, that's perfect, because then the pixel shader can compute the appropriate brightness and color to draw each screen pixel based upon the original vertex RGBA color (which presumably is passed through and not interpolated since there is only one such value in a point) and the gl_PointCoord.xy values, plus a uniform variable that specifies the "time" to base the twinkling on.
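gl_PointCoord does exist as a fragment shader input and runs from 0 to 1 across the point. A C sketch of turning it into a pixel offset from the centre and then into a brightness; the smooth falloff used here, (1 - r^2)^2, is only a placeholder assumption and not a physically derived blur profile, and the function names are illustrative.

```c
#include <assert.h>

typedef struct { float dx, dy; } Offset;

/* gl_PointCoord-style coordinates (0..1) -> offset in pixels from the centre. */
static Offset offset_from_centre(float pc_x, float pc_y, float point_size)
{
    Offset o;
    assert(point_size > 0.0f);
    o.dx = (pc_x - 0.5f) * point_size;
    o.dy = (pc_y - 0.5f) * point_size;
    return o;
}

/* Placeholder falloff: full brightness at the centre, zero at the radius. */
static float blob_intensity(Offset o, float point_size, float brightness)
{
    float radius = 0.5f * point_size;
    float r2 = (o.dx * o.dx + o.dy * o.dy) / (radius * radius);
    float t;
    assert(point_size > 0.0f);
    if (r2 >= 1.0f)
        return 0.0f; /* outside the disc: contributes nothing */
    t = 1.0f - r2;
    return brightness * t * t;
}
```

A time uniform could be folded into the falloff to modulate it for twinkling; that part is left out here.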

Oh, and I guess I'll need to have the vertex shader output the NDC of the vertex unless the screen-pixel x,y is available to pixel shaders (which I don't think it is).  Hmmm... except I need to multiply by the number of x and y pixels in the frame buffer to make the value proportional to off-axis angle.

Getting close!