Note: video available at the end of the article.
Our galaxy, the Milky Way, contains an estimated 100 to 400 billion stars. As you can imagine, generating those in a pre-processing step is impossible. The procedural galaxy generator must be able to generate stars data in specific areas, "regions of interest", usually around the players ( or NPCs, or star systems in which events happen ).
The jumpdrive system will allow a player to select any star and attempt to jump to it. The range doesn't matter. What's important is the mass of the target and the distance to it. Let's start with a simple linear formula where the probability to successfully jump is a function of M / D ( M = target's mass and D = distance ). Of course, the "real" formula is a lot more complicated and isn't linear, but let's forget about that now.
Under that simple formula, you will have the same chance of jumping to a star that has a mass of 1.0 and that is located 10 LY ( light-years ) away than you have to jump to a star of mass 10.0 that is located 100 LY away..
The mass of stars ( for stars that are on their main sequence ) is defining their color. Stars that are many times as massive as the Sun are blue; Sun-like stars are white/yellow; low-mass stars appear redish and are often called red dwarves.
How does all of that relate to the galaxy generator ? Well, it defines a fundamental constraint to it: it must be hierarchical. In other words, massive blue stars must be generated even when they're very far away, while lighter red dwarves only need to be generated in a volume much closer to the player.
If you don't fully understand that previous sentence very well, read it again and again until you fully realize what it means, because it's really important. Red dwarves that are far away aren't generated. At all. They're not displayed, but they're not even in memory, and do not consume memory. More subtely, it is impossible to "force" them to appear, until you "physically" approach them closer. This also implies that you will not be able to search a star by its name unless it's a "special" star stored in the database.
Generating a point cloud of the galaxy
The algorithm is based on an octree. Remember that stars must be generated hierarchically. The octree is subdivided around the player recursively until the maximum depth ( 12 ) is reached. Each node in the octree has a depth level ( starting at 0 for the root node ) and increased by 1 at each recursion level ( so the maximum will be 12 ). This depth level is important because it determines the type of stars that are generated in that node.
This level is used as an index into a probability table. The table stores probabilities for various star classes at different depths. For the root node ( level #0 ) for example, there may be a 40% chance to generate an O-class ( hot blue ) star, a 40% chance to generate a B-class and a 20% chance to generate an A-class star.
That way, it's possible to drive the algorithm to generate the good proportion of star classes.
The potential number of stars per node is only a function of the depth level. At the root level, there are 50 million stars. At the deepest level ( #12 ) there are 200 stars. Note that the actual amount of stars generated will be lower than that, because stars need to pass a decimation test. That's how you shape the galaxy... with a density function.
The density function takes as input some 3D coordinates in the galaxy and returns the probability in [0-1] that a star exists for the given coordinates.
To generate the spherical halo, the distance to the galactic origin is computed and fed into an inverse exponential ( with some parameters to control the shape ).
To generate the spiral arms, the probability is looked up from a "density map" ( similar to a grayscale heightmap ). The 2D coordinates as well as the distance to the galactic plane are then used to determine a density.
To generate globular clusters, the calculation is similar to the spherical halo, except that each cluster has a non-zero origin and a radius on the order of a few dozen light-years.
The final density function is taken as the maximum of all those densities.
To generate stars for a given node, a random 3D coordinate inside the node's bounding box is generated for each potential star. The density is evaluated for this location. Then a random number is generated, and if that number is lower than the density, the star actually gets generated and stored into the node.
When the node gets recursively split into 8 children, all stars from the parent node gets distributed into the correct child ( selected based on their coordinates ).
As a note, all nodes are assigned a seed, and when a node gets subdivided, a new seed is generated for each child. That seed is used in various places when random numbers need to be generated. Therefore, if the player goes away and a node gets merged, then comes closer again and the node gets split, the exact same stars will be generated. They will have the exact same location, the same color, the same class, etc..
The drawback of procedural generation is that any change made to any parameter of the algorithm ( like the number of stars per node, or the probability tables ) will result in a completely different galaxy. None of the stars will end up at the same place ( or if they do, it's just a coincidence ). So all the probabilities and parameters better be correctly adjusted before the game gets released, because after, it will lead to the apocalypse..
The algorithm as described above suffers from performance problems. The reason is quite simple: if for a given node you have 1000 potential stars, then you need to generate 1000 coordinates and test them against the density function at each coordinate, to see if a real star has been generated.
I quickly noticed that in the terminal nodes, the densities were pretty low. Imagine a cube of 100x100x100 LY located in the halo of the galaxy, far away from the origin: the density function over this volume will be pretty regular, and low ( I'm making this up, but let's say 10% ). This means that for 1000 potential stars, the algorithm will end up generating 1000 coordinates, evaluate the density 1000 times, and 10% of the candidates will pass the test, resulting in 100 final stars. Wouldn't it be better to generate 100 candidates only ? That would be 10 times faster !
Fortunately it's possible to apply a simple trick. Let's assume that the density function is relatively uniform over the volume: 10%. It's statistically equivalent to generate 1000 stars from which 1 out of 10 will succeed, than to generate 100 stars from which 10 out of 10 will succed. In other words, when the density is uniform, you can simply reduce the amount of stars by the correct ratio ( 1 / density ), or said otherwise, multiply the number of stars by the density ! 1000 stars * 10% = 100 stars.
Most of the time, the density isn't uniform. The lower the depth level of the node is, the larger the volume is, the less chance the density will be uniform over that volume. But even when the density isn't uniform, you can still use its maximum probability to reduce the number of potential candidates to generate.
Let's take a node of 1000 candidates where you have a 1% density on one corner and 20% on another corner (the maximum in the volume). It's still statistically equivalent to a node of 200 candidates ( 1000 * 20% ) with a density of 5% on the first corner and 100% on the other corner.
As you can see, there's no way around evaluating the density function for each candidate, but the number of candidates has been reduced by a factor of 5 while at the same time, the probability of the density function has been multiplied by 5. Less stars to generate, and for each star, a higher chance to pass the test: a win-win situation !
Until now, I've explained how to generate the galaxy model and how stars are procedurally distributed on-the-fly without any pre-processing. But keep in mind that the algorithm is primarily used on the server, and that there won't be just one player, but thousands of them. How does the galaxy generation works with N viewpoints ?
To keep it short, I modified the standard octree algorithm to split nodes as soon as needed, but delayed merging nodes together until more memory is needed.
The galaxy manager works as a least-recently-used ( LRU ) cache. Stars data and nodes consume memory. When the target memory budget is reached, a "garbage collector" routine is launched. This routine checks all nodes and determines which nodes have been the least recently used ( that is: the nodes that have been generated long ago, but that aren't in use currently ). Those nodes are then merged and memory is freed.
It's a bit tricky to stress test the galaxy generator for performance and memory with multiple players, simply because it's extremely dependent on where players will be located in the galaxy. The worst case would probably be players randomly distributed in the galaxy, all far from each other. But, doing an educated guess, I don't expect this to be the norm in reality: most players will tend to concentrate around the cores, or around each other, forming small groups. But even then, can we say that 90% of the players will be at less than 1000 LY from the cores ? Is it even possible to estimate that before beta starts ?
Galactic map considerations
I've followed with interest suggestions of players in the galactic map thread, and how the galactic map should look like. After testing the galaxy generator, I arrived to the conclusion that everybody severely under-estimates the amount of stars there can be in a volume close to you. For example, in a radius of 100 LY, in the spiral arms with an average density, it's not uncommon to find 5000 stars.
Remember that the jump-drive is not limited exclusively by range. Or more exactly, while distance is a factor, there's no "maximal range". This means that it's perfectly possible to try to jump at a red dwarf that is 5000 LY away. The probability to succeed is ridiculously small ( more than winning at the lottery ), but non-zero. Of course, for the galactic map, this means that even stars that are far away should be displayed ( provided that you don't filter them out ). That's an insane number of dots that may appear on your map...
One of the more effective filters, I think, will be the jump-probability filter. That one is a given: only display stars with a minimum of 50% jump success.
In the following screenshots, you can see a blue sphere in wireframe. This defines the range in which stars are displayed. It's just an experiment to make people realize how many stars there are at certain ranges: by no means it shows how the galactic map will work ( plus, it's all on the server, remember ! ).
I can select any star by pressing a key, and it gets highlighted in pink. On the top-left, you can see some informations about the selected star: first, a unique number that defines the "address" ( in the galaxy ) of the star. On the line below, the 3 values are the X Y and Z coordinates of the star compared to the galactic origin. Then, the star class, its distance in light-years, and finally the jumping probability.
In the coming weeks, I will probably move the galaxy algorithm to the client and start to add some volumetric/particle effects on the stars/dust to "beautify" it. The reason the algorithm will also be running on the client is to avoid having to transfer a million coordinates from the server to the client each time the player opens his galactic map. That wouldn't be kind to our bandwidth...
I produced a demonstration video.