Bottom line: I'm trying to figure out the best technique to create extremely high-quality real-time star fields for simulations and games.
I compiled a catalog of the brightest 1.2 billion stars, complete to nearly magnitude 21, a brightness range of roughly 500,000,000 to 1. The data includes position (micro-arc-seconds), spectral types, brightness in multiple colors (including 3 that correspond well enough to RGB), proper motion, parallax (distance), and so forth.
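For reference, that brightness range follows directly from the standard magnitude scale, where every 5 magnitudes is a factor of exactly 100 in flux. A quick sanity-check sketch in C++ (the function name is my own):

```cpp
#include <cmath>

// Flux ratio between two stellar magnitudes:
// ratio = 10^(0.4 * (magFaint - magBright)).
double fluxRatio(double magFaint, double magBright) {
    return std::pow(10.0, 0.4 * (magFaint - magBright));
}
```

A magnitude difference of about 21.7 gives a ratio near 5 x 10^8, consistent with the 500,000,000 to 1 figure.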
My goal is to be able to display accurate, realistic star backgrounds in simulations and games. The display is not limited to what a human can see with the naked eye (without optical aid)... that would only be 10,000 stars or so. The goal is to be able to display wide fields of view (30 ~ 150 degrees), but also narrow fields as visually observed or captured with CCDs through binoculars and telescopes, including large telescopes. So the field of view can be as wide as 120 degrees or more, but also as narrow as 0.01 degrees or less. The 3D engine is already capable of unlimited zoom factors (with 90-degrees vertical designated as zoom == 1.00000).
Finally, the display of the sky should look good (in fact, "awesome"), and not take too much CPU time from the simulation or game.
I am fairly certain these goals can be met, but implementation will not be trivial. Some approaches I consider promising adopt aspects of OpenGL that I have little experience with. So I invite anyone who thinks they can give good advice to do so.
Oh, one more requirement that makes certain techniques "not work". The display of occultation of bright stars must be handled correctly. What does this mean? For practical purposes, all stars are point sources. In fact, the angular extent of most stars in the catalog is less than one-millionth of an arc-second. A small percentage of stars that are close and large enough span a few thousandths of an arc-second. Thus only a tiny sample of stars can even be resolved as larger than a literal point source with the very best modern telescopes and techniques. So, for practical purposes, the stars (99.99999% of them) can be treated as literal point sources. In fact, for the first implementation, all stars will be treated as literal point sources.
So what? As you know, when you look at the night sky from a dark location, the brighter the star, the bigger the star looks to human vision (and photographs, and CCD images, etc). This appearance exists not because brighter stars are larger, but because the light "blooms"... in the eyeball, on the film, on the CCD surface, etc. Stated otherwise, not all light is absorbed by eyes or electronic sensors. Some of the light is scattered, diffused, reflected and generally "bounces around". This is true for all stars, but the scattering near bright stars is bright enough to be seen or photographed, and this is what makes them "appear larger".
So what is the significance of this when it comes to "occultations"? An "occultation" simply means some object closer than the star passes between the star and the observer. Consider the instant just before the object covers the star, when the point image of the star almost touches the object from the point of view of the observer. In terms of pixels on the display, the star is within a fraction of a pixel of the object.
Because this bright star is a point source, the object blocks none of the star light. The image of this bright star still "blooms" on the eye, film or CCD sensor. So the image of the star may be 4, 8, 16, 32, maybe even 64 pixels in diameter on the screen, even though the star may be some infinitesimal fraction of a pixel away from the object. In fact, if we are displaying a view through a large telescope, the bloom of the star might fill the entire display!
The point is, up to half the "image" of a star can cover the image of a nearby object on the display. The object is completely opaque, but on our display, up to half of every star image can overlap the opaque object next to it.
To choose a specific case, let's say this bright star image is (blooms to) 64-pixels in diameter (as far as we can see). As the image of the star slowly moves closer and closer to the object on the display, first the leading edge of the 64-pixel diameter star image overlaps the object, then more and more of that 64-pixel star image overlaps the object... until...
At the instant when the exact center of the star image reaches the edge of the object, ALL light from the star vanishes... instantly.
That is how my engine must display such events. As long as the exact center of a star is covered by another (closer) object, no light from the star reaches our eyes or sensors, and our display shows no evidence of the star. And further, as long as the exact center of any star image is not covered by any other (closer) object, our display shows the entire circular image of the star, including 100% of the intensity of the star.
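Stated as code, the rule is a single comparison per star: sample the depth at the star's projected center, and draw the whole bloomed image only if nothing closer was drawn there. A minimal sketch (names hypothetical; how the depth actually gets sampled or read back is the open implementation question):

```cpp
// All-or-nothing occultation rule: a star (a point source) is visible
// only if nothing closer covers the exact sub-pixel position of its
// center. 'depthAtCenter' is whatever depth the engine samples at the
// star's projected center; 'farPlane' is the depth value that means
// "nothing drawn here".
bool starVisible(float depthAtCenter, float farPlane) {
    // If any object wrote a closer depth at the center, the entire
    // bloomed image (however many pixels wide) is extinguished.
    return depthAtCenter >= farPlane;
}
```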
Remember the way this works, because it means certain common and/or easy techniques "just won't work properly".
Perhaps the most obvious technique that "bytes the dust" is the classic "skybox" technique, which I assume is always implemented with a "cube map". Clearly if we display the sky with a cube map skybox, then any object that slowly moves in front of a 64-pixel diameter star image will gradually cover larger and larger portions of the 64-pixel image of the star. Which is wrong, wrong, wrong!
My first "brilliant idea" was to omit the brightest 100 or so stars when I create the skybox, so every star on the skybox is only about 1 pixel in diameter. Then I'd have the engine display the skybox first, then all objects, then individually draw the brightest 100 stars as a final "special step". This would need to be a "special step", because the engine would have to read the pixel in the framebuffer at the exact center of the star image, then either draw or fail-to-draw the entire star image depending on the depth of that pixel. If the depth was "infinity" or "frustum far plane", then the star is not covered by anything, and the engine draws the entire image. Otherwise, it skips the star and repeats this process for every bright star.
The problem with this is fairly obvious. Which stars bloom to more than 1 pixel in diameter is extremely variable! Oh, if I were only making a trivial application with a single zoom factor and "sensor sensitivity" and no ability to "observe through optical instruments", then I might get away with that approach. But clearly that technique does not support many situations, scenarios and cases my engine needs to support. A star that is too faint to even register to the naked eye might bloom to hundreds of pixels in diameter through a large telescope! And the star brightness at which blooming begins even varies as an observer cranks the sensitivity or integration knob in the application.
I tried to find tricky, then elaborate, then... well... just about psychotic ways to patch up this approach. But in the end, I realized this approach was futile.
I won't elaborate on those crazy attempts, or mention other "image based" techniques (where "image based" just means we create images of portions of the sky in advance, in various sorts and forms, then display them, and patch them up, as needed).
At this point, I'm fairly sure any successful technique will be entirely "procedural". Which means, the GPU will execute procedures that draw every star within the frustum [that is detectable under the current circumstances, and is not obscured by any object].
Of course, this whole category of approaches has its own demons.
For example, culling! For example, CPU/transfer efficiency! For example, GPU efficiency! Are we really going to keep a database of 1.2 billion stars in GPU memory? I don't think so! Especially since that database is over 150GB on my hard drive. Maybe in 10~20 years nvidia will sell GPU cards with 256GB of video memory, but until then, not gonna happen!
Finally we're getting down to the nitty gritty, where I need to tap into the experiences others have had with niche parts of OpenGL, and niche aspects of game engines --- like fancy culling, perhaps?
At the moment, it seems clear that the engine needs to have one to several big, fixed-size static buffers in GPU memory that my engine "rolls stars through". Let's say we allocate 1GB of GPU memory for stars. I think my nvidia GTX-680 card has something like 4GB of RAM, so 1GB for stars seems plausible (especially since I'm designing this engine for applications to be completed 3~5 years from now, with whatever GPUs are medium to high end then). If the data for each star consumes 64 bytes, that means we have room in GPU memory for roughly 16 million stars. Since a 2560x1600 screen has 4 million pixels, our 1GB of video memory can hold about 4 stars for each pixel. That should be sufficient! There's no point in turning the entire display white.
Of course, as the camera rotates we will be feeding new stars into video RAM, overwriting stars in "least recently accessed" regions of the sky. In fact, depending on many factors, we might need to predictively feed stars into GPU memory 2 ~ 4 frames before they appear in the frustum. So it is entirely possible only 1/4 of our 1GB of video memory is being actively displayed on the current frame. And there might even be cases where that is optimistic, but let's hope not too much so. After allowing for these memory inefficiencies, our 1GB of GPU memory still actively holds something like 1 to 2 stars for each pixel on the display. Which should still be sufficient.
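The "roll stars through" bookkeeping could be something like a least-recently-used cache of region slots. A CPU-side sketch, with names and structure purely hypothetical (the real version would also trigger the actual upload of that region's stars into its slot of the GPU buffer):

```cpp
#include <list>
#include <unordered_map>
#include <cstdint>
#include <cstddef>

// A fixed pool of GPU-buffer slots, each holding one sky region's
// stars, recycled in least-recently-used order as the camera turns.
class RegionSlotCache {
public:
    explicit RegionSlotCache(std::size_t slotCount) : capacity(slotCount) {}

    // Returns the slot index for a region, conceptually uploading into
    // the least-recently-used slot on a miss.
    std::size_t acquire(std::uint32_t regionId) {
        auto it = index.find(regionId);
        if (it != index.end()) {                  // hit: mark most recent
            order.splice(order.begin(), order, it->second);
            return it->second->slot;
        }
        std::size_t slot;
        if (order.size() < capacity) {
            slot = order.size();                  // still-unused slot
        } else {                                  // evict the LRU region
            slot = order.back().slot;
            index.erase(order.back().region);
            order.pop_back();
        }
        order.push_front({regionId, slot});       // newly uploaded region
        index[regionId] = order.begin();
        return slot;
    }

private:
    struct Entry { std::uint32_t region; std::size_t slot; };
    std::size_t capacity;
    std::list<Entry> order;                       // front = most recent
    std::unordered_map<std::uint32_t, std::list<Entry>::iterator> index;
};
```

The predictive "2 ~ 4 frames ahead" feeding would simply call acquire() for regions just outside the frustum in the direction of rotation.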
This analysis raises a few questions. Can the GPU render 4 to 16 million stars per frame [without consuming too much GPU bandwidth]? Let's see, 16-million * 60 frames per second == roughly 1-billion stars per second. Can the GPU render a billion stars per second without breaking a sweat... leaving enough GPU compute time for the rest of the simulation or game (including any GPGPU processing)?
Maybe this question is a bit more complicated. For example, if 90% of the display is covered by other objects, the GPU can and will automatically discard a huge percentage of stars on the basis of the z-buffer test at the fragment stage. Even more interesting, it is my understanding that modern GPUs know how to discard even earlier in the pipeline. Since the vast majority of stars are tiny... as in very, very tiny (1 to 3 pixels in diameter), they should be the easiest entities to discard early in the pipeline. If this is true, then the average GPU resources consumed per pixel are greatly reduced.
In the extreme opposite case, the entire display is covered by stars, with only 0% to 10% of the display covered by other objects (spacecraft and asteroids, perhaps). In this case almost all stars in the frustum will be drawn, and will consume the maximum GPU bandwidth. However, this scenario presumes that very little else is happening in the engine, or at least, that not much more than stars is being rendered by the GPU.
Which raises the question that is giving me the worst nightmares lately... culling. Or to put this another way, how does the engine decide which stars need to be moved into GPU memory, and which "old stars" in GPU memory those "new stars" should overwrite? AARG.
Though this question is driving me nuts, it may be a tiny bit less overwhelming today than last week. It seems one benefit of trying in vain to find a way to implement the sky with a skybox is that it forced me to question the original (and current) organization of my star database. Thinking like the lifelong astronomy and telescope nerd I am, I sliced and diced the sky into 5400 x 2700 "square" chunks, each of which is 4 arc-minutes square. I refer to standard astronomy "polar coordinates" thinking, with 5400 chunks in "right-ascension" (west-to-east), and 2700 chunks in "declination" (south-to-north). While this is a totally natural way for any astronomy type to think about the sky, it has some extremely annoying characteristics. For example, at the north and south poles, each "square" chunk of the sky is actually extremely tiny in terms of how much area of the celestial sphere it contains. The 5400 "square" chunks adjacent to each pole together cover only an 8 arc-minute diameter circle on the sky... while a single "square" chunk at the equator covers a full 4 arc-minute square!
This organization of the database might seem silly and inconvenient. But I originally created this database for an automated telescope, and for that application the organization is perfectly fine, natural, intuitive and plenty convenient in most ways (except chasing objects close to the polar regions, but not exactly across them).
Last night I realized a far more natural and efficient organization of this star database for the purposes of this 3D engine is a cube map. If we imagine a cube map with 1K x 1K pixels on each face, then each square pixel covers a (360-degrees / 4096) "square" region of the sky (4 faces of 1024 pixels wrap around the celestial equator). So each pixel covers roughly a 5.3 arc-minute square of the sky, and all of these regions are far closer to the same actual size on the celestial sphere than the chunks in my old configuration. So the cube map approach is not only more evenly distributed over the sky, but also contains no "problematic" pseudo-singularities like the north and south celestial poles.
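For concreteness, mapping a star's direction vector to its cube-map cell is just "divide by the largest component". A sketch, with face numbering and orientation conventions that are arbitrary here (OpenGL's official cube map conventions differ in some signs):

```cpp
#include <cmath>

// Maps a unit direction vector to a cube-map cell: which of the 6
// faces, and which cell on an N x N face grid (N = 1024 in the text).
// Face numbering is arbitrary: 0:+X 1:-X 2:+Y 3:-Y 4:+Z 5:-Z.
struct CubeCell { int face, ix, iy; };

CubeCell directionToCell(double x, double y, double z, int n) {
    double ax = std::fabs(x), ay = std::fabs(y), az = std::fabs(z);
    int face; double u, v, m;
    if (ax >= ay && ax >= az)  { m = ax; face = x > 0 ? 0 : 1; u = -z * (x > 0 ? 1 : -1); v = y; }
    else if (ay >= az)         { m = ay; face = y > 0 ? 2 : 3; u = x; v = -z * (y > 0 ? 1 : -1); }
    else                       { m = az; face = z > 0 ? 4 : 5; u = x * (z > 0 ? 1 : -1); v = y; }
    // u, v are in [-m, m] on the face; convert to a cell index in [0, n-1].
    int ix = (int)((u / m * 0.5 + 0.5) * n); if (ix == n) ix = n - 1;
    int iy = (int)((v / m * 0.5 + 0.5) * n); if (iy == n) iy = n - 1;
    return {face, ix, iy};
}
```

The database build would run every catalog star through this function once to decide which region it belongs to.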
In case the above isn't entirely clear, the organization is this. All the stars that lie within the part of the sky covered by each pixel of the 1K x 1K pixel cube map faces are stored together (one after the other) in the database. Furthermore, for reasons too complex and specific to the telescope application, individual stars within each region are stored in west-to-east order in the current organization, but can be stored brightest-to-faintest in the new configuration. This is extremely convenient and efficient for the 3D engine application, because for each region we always need to display the brightest stars down through some faintness limit determined by the sensor, zoom-level, and other factors.
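One payoff of the brightest-to-faintest ordering: selecting the displayable stars of a region is a single binary search for the limiting magnitude, not a scan. A sketch (the data layout is hypothetical):

```cpp
#include <vector>
#include <algorithm>
#include <cstddef>

// With each region's stars stored brightest-to-faintest (ascending
// magnitude), the set to draw for any limiting magnitude is a prefix,
// found with one binary search.
std::size_t starsToDraw(const std::vector<float>& mags, float limitMag) {
    // upper_bound finds the first star fainter than the limit;
    // everything before it is bright enough to display.
    return (std::size_t)(std::upper_bound(mags.begin(), mags.end(), limitMag)
                         - mags.begin());
}
```

Only that prefix of each region would ever need to be uploaded to the GPU buffer for the current sensor/zoom settings.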
Nonetheless, I still have the same problem! Which is a question for one of the math-geniuses here! That question is... how do we quickly and efficiently compute which regions/chunks/cube-map-pixels are within the current frustum (and just outside it, in any direction we might be rotating the camera towards)? If this seems like a brain-dead simple task, remember that the camera can be rotated and tipped into any orientation on the 3D axes. There is no "up" direction, for example! Oh no! Not in space there isn't (though you are free to consider the north celestial pole to be "up" if you wish). Clearly the stars and the camera are both in "world-coordinates"... or to be more precise in this case, "universe coordinates" (though claiming the natural coordinates of the universe somehow accidentally correspond to the rotational axis of earth is a delusion of grandeur to the point of being a bit wacko).
If you imagine a wide angle of the sky is currently displayed (perhaps well over 180-degrees in the horizontal or diagonal directions), the outline of the frustum might even cover some portion of all 6 faces of the cube map! It is definitely not at all obvious to me how to quickly or conveniently determine which regions of sky (which pixels on those 1K x 1K faces of the cube map) are within the frustum.
Perhaps the quick and conventional answer is... "just have the CPU test all 6 million of those cube-map pixels against the frustum, and there you have your answer". And yes, that works... after making the CPU test 6 million projected squares against the frustum every frame. Hear what we're saying here? That's "the CPU" and "6 million tests against the frustum" and "every frame". It sure would be a lot quicker (assuming someone is smart enough to figure out "how") to compute a list of "the first and final" pixel on each row of each face of the cube map that is within the frustum! Then we simply include all the pixels between the first and final on each row, without testing them against the frustum. Does someone know how to walk the boundary of the frustum across the cube map faces to compile such a list of "first and finals"? Or perhaps some other technique. But this one requires someone a whole lot smarter and more intuitive in math than me.
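Short of walking the frustum boundary across the faces, a simpler conservative test might already beat brute force when combined with a coarse-to-fine hierarchy: a cell can be culled when all four of its corner directions fall outside the same side plane of the frustum. A sketch, assuming inward-pointing plane normals through the camera; note it is conservative (it may keep some cells near frustum corners that are actually outside), and side-plane logic alone breaks down for fields of view approaching 180 degrees:

```cpp
#include <cstddef>

// Conservative cube-map-cell vs. view-frustum test. Plane normals
// point *into* the frustum and pass through the camera, so a negative
// dot product means "outside that plane".
struct Vec3 { double x, y, z; };

static double dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

bool cellMayBeVisible(const Vec3 corners[4], const Vec3 planes[],
                      std::size_t planeCount) {
    for (std::size_t p = 0; p < planeCount; ++p) {
        bool allOutside = true;
        for (int c = 0; c < 4; ++c)
            if (dot(corners[c], planes[p]) >= 0.0) { allOutside = false; break; }
        if (allOutside) return false;   // every corner behind this plane
    }
    return true;                        // possibly intersects the frustum
}
```

Run first on big blocks of cells (say 64 x 64), then only on the individual cells of blocks that survive, and the per-frame test count drops far below 6 million.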
A couple thoughts that I forgot to mention in the natural places above:
#1: Note that we must draw all objects before we draw the stars. Otherwise we waste our time drawing zillions of stars that will later be covered up by objects. But more importantly, we must draw the objects first so the bloomed star images overlap the objects. Remember, star images must overlap parts of objects, but objects must never overlap parts of star images (only extinguish them entirely).
#2: When we draw stars into the framebuffer, we must add or accumulate intensity at each framebuffer pixel. We must always fetch the current RGB intensity at that pixel in the framebuffer (whether from an object the star image overlaps, or from previously drawn stars), add the RGB intensity of the current star at that pixel, and write the sum back to the framebuffer. In this way multiple stars can contribute to each pixel. Remember, there may be zero, one, two, five, twenty, fifty or hundreds of stars within each individual pixel. When many stars are within a framebuffer pixel, the intensity of that pixel is the sum of the intensity of all star images that contribute to that pixel.
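The accumulation described in #2 is exactly additive blending; in OpenGL terms, glEnable(GL_BLEND) with glBlendFunc(GL_ONE, GL_ONE) into a floating-point framebuffer. Here is the same arithmetic spelled out on the CPU for clarity:

```cpp
// Additive accumulation: each star adds its RGB contribution to
// whatever is already in the framebuffer pixel, so many stars can
// contribute to one pixel.
struct RGB { float r, g, b; };

void accumulate(RGB& pixel, const RGB& starContribution) {
    pixel.r += starContribution.r;   // sums may exceed 1.0; mapping
    pixel.g += starContribution.g;   // down to displayable intensities
    pixel.b += starContribution.b;   // happens in a later pass
}
```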
Finally, we slide into the question "how do we draw each 1, 3, 5, 7, 9... 59, 61, 63... (??? or larger ???) pixel star image?"
For stars near the limit of being capable of detection at the current settings (sensor sensitivity, integration time, etc), we only need to add a tiny quantity of intensity to one pixel. Or is this even true? To be sure, if the star is precisely in the center of a pixel, and the star is barely bright enough to contribute, there is no need to add any intensity to any other, adjacent pixels. So we can say this is a 1-pixel star image. But even this simple case raises questions. What if a faint star falls on (or close to) the edges of two adjacent pixels... or even the corner of 4 pixels? Do we contribute light to 2 or 4 pixels?
This is not a simple question for me to answer. On the one hand, I want stars to be as tiny and sharp as possible, as they might look in a perfect optical system. I'm not particularly interested in adding coma, astigmatism and other aberrations to images, though I certainly could, given the fact that I've written optical design software. So I don't want to blur images without cause. Furthermore, consider the situation I just mentioned. We have 3 stars of exactly the same color and brightness. One falls smack in the center of a pixel, another star close-by falls on the edge of two pixels, and a third close-by falls on the corner of 4 pixels. Now, given the fact that all 3 stars are absolutely identical, should they not look identical? Should one be 1/2 as bright and smeared into a 2-by-1 pixel horizontal smear? Should another be 1/4 as bright and smeared into a 2-by-2 pixel square? Or should all 3 stars appear the same?
My intuition says "they should look the same". Which means, my intuition says "let 1-pixel stars contribute only to 1 pixel". But... what other consequences might flow from this decision? Would stars appear to "jerk from pixel to pixel"? Would we see any hint of "swimming effects" similar to what aliasing causes in extended objects? My guess is... no (or not much, especially for fainter stars). But that is only a guess. Maybe some of you know. I suspect that even if some negative effects are visible, they will not be substantial, if only because the typical framebuffer pixel will contain intensity contributions from multiple stars, and therefore the instantaneous transfer of intensity in or out of any given pixel by any one star will be less "jarring", since the pixel intensity itself is usually an average. Or put another way, sky pixels won't often change from "black" to "white" (and the adjacent pixel from "white" to "black"); they will usually change from 30% intensity reddish-grey to 50% intensity yellowish-grey. But I welcome any comments about the practical consequences of handling faint stars in various different ways.
Once the contribution of a star exceeds 1/2 or 3/4 or 7/8 of the maximum intensity a pixel can achieve, even brighter stars can only be represented by star images larger than 1 pixel in size. Here we run into a whole new series of questions similar to the questions we asked about 1-pixel faint stars, except more complex. For example, should we represent stars only as odd-pixel sizes (1, 3, 5, 7, 9... 59, 61, 63)? Clearly we want to put most of the light intensity in the center pixel of a 3-pixel by 3-pixel star image, and spread a small quantity of light into the 8 pixels surrounding the central pixel. This keeps brighter stars as bright and sharp as we can make them, but also lets them grow larger in a way that corresponds with the blooming star images in real sensors and the real world. But if the position of the star is exactly on the edge between two pixels, or the corner of 4 pixels, we are clearly "cheating" slightly on the location of the center of star images... in order to get a more realistic appearance. Or so it would seem... contributing only 1/4 as much light to each of four adjacent pixels would make the star appear different from other stars nearby... and even significantly different from how it would appear if the star moved half a pixel away, onto a pixel center. So maybe spreading light from a star into adjacent pixels causes as much or more "swimming effect" than always drawing stars on pixel centers! Hmmm. Strange!
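To make the 3x3 case concrete, here is one hypothetical splat: snap the star to the nearest pixel center, give the central pixel a fixed share of the total intensity, and divide the rest evenly among the 8 neighbors. The 75% center share is a placeholder knob, not a measured point-spread function:

```cpp
// Splats a "3-pixel" star image: most of the light in the center
// pixel, the remainder spread evenly into the surrounding ring, so
// identical stars always look identical regardless of sub-pixel
// position.
void splat3x3(float total, float weights[3][3]) {
    const float centerShare = 0.75f;               // arbitrary knob
    float ringShare = (total - total * centerShare) / 8.0f;
    for (int y = 0; y < 3; ++y)
        for (int x = 0; x < 3; ++x)
            weights[y][x] = (x == 1 && y == 1) ? total * centerShare
                                               : ringShare;
}
```

The weights sum to the star's total intensity, so the all-or-nothing occultation rule still extinguishes exactly 100% of the star's light.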
A related practical question is... how should the engine draw stars? For 1-pixel diameter stars, we have the option to draw them as literal points... no textures, just RGB intensity at a single pixel. A question... are there certain modes in which OpenGL would draw a 1-pixel diameter "simple point" over more than 1 pixel? I would guess not, but that is only a guess (having very limited experience drawing points, or with fancy OpenGL modes). Do NEAREST and LINEAR mean anything to 1-pixel points?
For larger than 1-pixel diameter stars, we can't just draw simple OpenGL points. One option that sounds attractive to me based upon reading various places is OpenGL "point sprites". I'm guessing we don't want to adopt any form of free "mipmapping" that OpenGL might offer, partly because we only want sprite textures that are an odd number of pixels on a side (so the bright center always falls in the middle of "one central pixel", not across "four almost central pixels"). So I'll ignore mipmapping unless someone drags me back to this option for some reason.
I guess one question is... do we really want separate 1, 3, 5, 7, 9... 59, 61, 63 pixel diameter textures in order to support those 32 star sizes? I don't even think OpenGL provides a way to do this (bind more than 32 texture units). And all textures in a texture array must be the same size, don't they? Maybe we can figure out a way to put all 32 (or more) sizes of star images on a single texture, and manipulate the texture-coordinates to the appropriate offset and scale based upon the star brightness.
While I'm not 100% certain how to make this happen yet, my general feeling is to "do everything procedurally" in the fragment shader (starting with "point sprites" primitives). Somehow the fragment shader needs to know at every invocation (at every pixel in the [point sprite] image) how bright is the star, and "where are we" in the [point sprite] image. As I understand it, OpenGL automatically sends texture coordinates into every invocation of a fragment shader processing a point sprite to indicate precisely where in the n-by-n pixel diameter sprite we currently are. If we also know how large the [point sprite] star image is, we should be able to generate an intensity from the texture coordinates with some simple function. For example, perhaps something like:
float ss = (s * 2.0) - 1.0;
float tt = (t * 2.0) - 1.0;
float intensity = 1.0 - ((ss * ss) + (tt * tt)); // center == 1.0 : edges == 0.0
That's not the "response function" shape I'd like to achieve; some math wizard needs to tell me a fast, efficient equation for a response function given those texture coordinate inputs. The shape I refer to is a bit like a single arch of a sine function (minimum through minimum, with the maximum in the center)... except more concentrated in the central region, falling off faster near the center, with a wide, low, gradually falling outer region. Whatever the intensity at each pixel, we then multiply this per-pixel intensity by the star RGB color to get the final pixel color.
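One cheap candidate with roughly that shape is a sum of two Gaussians in r-squared: a narrow bright core plus a wide, low skirt. A sketch with entirely arbitrary constants (all four numbers are tuning knobs, and a shader version would likely clamp or subtract the small residual at the sprite edge):

```cpp
#include <cmath>

// Candidate star "response function": concentrated central peak plus
// a broad faint halo. r2 = ss*ss + tt*tt is already computed in the
// snippet above (0 at the sprite center, 1 at the edge).
float starResponse(float ss, float tt) {
    float r2 = ss * ss + tt * tt;
    float core  = 0.85f * std::exp(-25.0f * r2);  // tight central peak
    float skirt = 0.15f * std::exp(-3.0f  * r2);  // wide, low halo
    return core + skirt;                          // 1.0 at the center
}
```

In GLSL the two exp() calls are a few cycles each, and the constants can become uniforms so the "seeing" code can breathe the profile wider and narrower over time.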
I guess we can't have "simple 1-pixel points" for faint stars, and "point sprites" for brighter stars, because all stars need to be the same primitive and processed with the same OpenGL "mode/state settings". So I guess all stars need to be "point sprites" (or something).
A procedural approach like this is very attractive to me for several reasons. First, no GPU texture memory accesses are required. With thousands of GPU cores making simultaneous texture fetches to draw millions of stars... ehhh... I'd rather avoid so many simultaneous texture fetches! Second, it is trivial to add "twinkle" in a procedural routine by simply adding a line or two to tweak the intensity with the output of a noise function. Third, it is also fairly simple to add "seeing" (atmospheric turbulence) in a procedural routine. Optimally this requires two parts... one part to tweak the size of the [point sprite] star somewhat larger and smaller with a noise function as time passes, and another part to distort the brightness at various locations within the star image with noise-like functions. It is also natural and convenient in procedural routines to support adjustable contrast and other image processing settings (though this can be done whether the rest of the routine is procedural or not).
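The "twinkle" tweak might be as little as this, with a throwaway sine-hash standing in for a real noise function (the hash constants and the 10% flicker depth are arbitrary placeholders):

```cpp
#include <cmath>

// Per-star, time-varying intensity modulation. A real shader would
// substitute a proper noise function for the cheap sine hash.
float twinkle(float baseIntensity, int starId, float timeSeconds) {
    // Deterministic pseudo-noise in [-1, 1] from star id plus time.
    float n = std::sin(timeSeconds * 12.9898f + (float)starId * 78.233f);
    return baseIntensity * (1.0f + 0.1f * n);   // +/- 10% flicker
}
```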
Well, I guess I've spent more time than I wanted describing my currently favored approach, and less time asking for opinions about other approaches that the many 3D graphics geniuses here might already know or invent on the spot. Nonetheless, hopefully the extensive description of what I'm doing also states clearly what my requirements are. That's important, because most "obvious" approaches are not capable of satisfying my requirements.
Oh, one more thing I forgot to mention. I assume the engine will compute pixel intensities in 32-bit floating point variables. Exactly how the GPU ends up converting those 32-bit values to displayable intensities is a question I'd love for someone to explain to me.
Better approaches, anyone?
Are OpenGL "point sprites" efficient for this application?
Do you know any efficient "response functions" that might work for me?
Any comments about aliasing or "swimming effect" problems I might have?
Any brilliant ways to efficiently perform the sky-region culling process that I mentioned, or a better alternative?
Any gotchas that make my approach not work, not achieve my goals and requirements, or be "just too damn slow"?