Journal of Ysaneya

Tech Demo Video 2010

Posted by Ysaneya, 04 May 2010 · 2,774 views

It's been many years since the release of the last video showcasing the seamless planetary engine, so I'm happy to release this new video. This is actually a video of the game client, but since there's little gameplay in it, I decided to label it as a "tech demo". It demonstrates an Earth-like planet with a ring, seamless transitions, a little spaceship ( the "Hornet" for those who remember ), a space station and a couple of new effects.

You can view it in the videos section of the gallery.

Making-of the video

Before I get into details of what's actually shown in the video, a few words about the making-of the video itself, which took more time than expected.

What a pain ! First of all, it took many hours to record the video, as each time I forgot to show something. In one case, the framerate was really low because of the heavy stress required to dump a 1280x720 HQ uncompressed video to the disk. The raw dataset is around 10 GB for 14 minutes of footage.

14 minutes ? Yep, that video is pretty long. Quite boring too, which is to be expected since there's no action in it. But I hope you'll still find it interesting.

Once the video was recorded, I started the compression process. My initial goal was to upload a HQ version to YouTube and a .FLV for the video player embedded on the website. The second was quite easily done, but the quality after compression was pretty low. The bitrate is capped to 3600 kbps for some reason, and I didn't find a way to increase it. I suspect it's set to this value because it's the standard with flash videos.

I also wanted to upload a HQ version to YouTube to save bandwidth on the main site, but so far it's been disappointing. I tried many times, but each time YouTube refused to recognize the codec I used for the video ( surprisingly, H264 isn't supported ). After a few attempts I finally found one that YouTube accepted, only to discover that the video was then rejected due to its length: YouTube has a policy to not accept videos that are more than 10 minutes long. What a waste of time.

So instead I uploaded it to Dailymotion, but it's very low-res and blurry, which I cannot understand since the original resolution is 1280x720; maybe it needs many hours of post-processing, I don't know. There's also now a two-part HQ video uploaded to YouTube: part 1 and part 2. If you're interested in watching it, make sure you switch to full screen :)

Content of the video

The video is basically split in 3 parts:

1. Demonstration of a space station, modelled by WhiteDwarf and using textures from SpAce and Zidane888. Also shows a cockpit made by Zidane888 ( I'll come back to that very soon ) and the Hornet ( textured by Altfuture ).

2. Planetary approach and visit of the ring. Similar to what's already been demonstrated in 2007.

3. Seamless planetary landings.


I've been very hesitant in including the cockpit in the video, simply because of the expectations it could potentially generate. So you must understand that it's an experiment, and in no way guarantees that cockpits will be present for all ships in the game at release time. It's still a very nice feature, especially with the free look around. You will notice that you can still see the hull of your ship outside the canopy, which is excellent for immersion. Note that the cockpit isn't functional, so if we do integrate it into the game one day, I would like all instruments to display functional information, buttons to light on/off, etc.



The backgrounds you see in the video ( starfield, nebula ) are dynamically generated and cached into a cube map. This means that if you were located in a different area of the galaxy, the background would be dynamically refreshed and show the galaxy from the correct point of view.

Each star/dot is a star system that will be explorable in game. In the video, as I fly to the asteroid ring, you will see that I click on a couple of stars to show their information. The spectral class is in brackets, followed by the star's name. At the moment, star names use a unique code which is based on the star's location in the galaxy. It is a triplet formed of lower/upper case characters and numbers, like q7Z-aH2-85n. This is the shortest representation that I could find that would uniquely identify a star. This name is then followed by the distance, in light-years ( "ly" ).
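
As a rough illustration of how a name of that shape could be produced, the sketch below encodes an integer identifier into three groups of three base-62 characters. The alphabet order, the fixed 9-character length and the idea that the star location is first packed into a single integer are all assumptions made for the example, not the engine's actual scheme.

```python
import string

# Hypothetical alphabet: digits, lower case, then upper case (62 symbols).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)          # 62
CODE_LENGTH = 9               # three groups of three characters


def encode_star_id(star_id: int) -> str:
    """Turn an integer in [0, 62**9) into a code shaped like 'q7Z-aH2-85n'."""
    if not 0 <= star_id < BASE ** CODE_LENGTH:
        raise ValueError("star_id out of range")
    chars = []
    for _ in range(CODE_LENGTH):
        star_id, digit = divmod(star_id, BASE)
        chars.append(ALPHABET[digit])
    code = "".join(reversed(chars))
    return "-".join(code[i:i + 3] for i in range(0, CODE_LENGTH, 3))


def decode_star_id(code: str) -> int:
    """Inverse of encode_star_id."""
    value = 0
    for ch in code.replace("-", ""):
        value = value * BASE + ALPHABET.index(ch)
    return value
```

Nine base-62 characters can distinguish 62**9 values ( about 1.35e16 ), which is comfortably more than the number of stars in a galaxy.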

I still have to post a dev-journal about the procedural rendering of the galaxy on the client side, in which I'll come back to all the problems I've had, especially performance related.



I'm not totally happy with the look of the planet, so it is likely that in the future, I will do at least one more update of the planetary engine. There are various precision artifacts at ground level, as the heightmaps are generated on the GPU in a pixel shader ( so are limited to 32 bits of floating point precision ). I've also been forced to disable the clouds, which totally sucks as it totally changes the look & feel of a planet seen from space. The reason for that is that I implemented the Z-Buffer precision enhancement trick that I described in a previous dev journal, and it doesn't totally work as expected. With clouds, the cloud surface is horribly Z-fighting with the ground surface, which wasn't acceptable for a public video. At the moment, I use a 32-bit floating point Z-Buffer, reverse the depth test and swap the near/far clipping planes, which is supposed to maximize Z precision.. but something must have gone wrong in my implementation, as I see no difference with a standard 24-bit fixed point Z-Buffer.

The terrain surface still lacks details ( vegetation, rocks, etc. ). I still have to implement a good instancing system, along with an impostor system, to get an acceptable performance while maintaining a high density of ground features.



Look & Feel

Don't think for one second that the "look & feel" of the camera and ship behavior is definitive in this video. I'm pretty happy with the internal view and the cockpit look, but the third-person camera still needs a lot of work. It theoretically uses a non-rigid system, unlike the ICP, but it still needs a lot of improvements.


As you may notice, the ship's thrusters correctly fire depending on the forces acting on the ship, and the desired accelerations. Interestingly, at any given point in time, almost all thrusters are firing, but for different reasons. First, the thrusters that are facing the planet are continuously firing to counteract gravity. It is possible to power down the ship ( as seen at the end of the video ), in which case the thrusters stop working. Secondly, many thrusters are firing to artificially simulate the drag generated by the auto-compensation of inertia. For example when you rotate your ship to the right, if you stop moving the mouse the rotation will stop after a while. This is done by firing all the thrusters that would generate a rotation to the left. Of course, some parameters must be fine-tuned.

When the ship enters the atmosphere at a high velocity, there's a friction/burning effect done in shaders. It still lacks smoke particles and trails.

This video will also give you a first idea of how long it takes to land or take off from a planet. The dimensions and scales are realistic. Speed is limited at ground level for technical reasons, as higher speeds would make the procedural algorithms lag too much behind, generating unacceptable popping. At ground level, I believe you can fly at modern airplane speeds. A consequence of this system is that if you want to fly to a far location on the planet, you first have to fly to low space orbit, then land again around your destination point.



ASEToBin 1.0 release

Posted by Ysaneya, 06 October 2009 · 966 views

Finally, the long awaited ASEToBin 1.0 has been released !

ASEToBin is a tool that is part of the I-Novae engine ( Infinity's engine ). It allows contributors and artists to export their model from 3DS Max's .ASE file format and to visualize and prepare the 3D model for integration into the game.

This new release represents more or less 200 hours of work, and is filled with tons of new features, like new shaders with environmental lighting, skyboxes, a low-to-high-poly normal mapper, automatic loading/saving of parameters, etc..

ASEToBin Version 1.0 release, 06/10/2009:


Changes from 0.9 to 1.0:

- rewrote "final" shader into GLSL; increase of 15% performance (on a Radeon 4890).
- fixed various problems with normal mapping: artifacts, symmetry, lack of coherency between bump and +Z normal maps, etc.. hopefully the last revision. Note that per-vertex interpolation of the tangent space can still lead to smoothing artifacts, but that should only happen in extreme cases (like the cube with 45° smoothed normals) that should be avoided by artists in the first place.
- removed anisotropic fx in the final shader and replaced it by a fresnel effect. Added a slider bar to control the strength of the fresnel reflection ("Fresnel").
- changed the names of the shaders in the rendering modes listbox to be more explicit on what they do.
- set the final shader (now "Full shading") to be the default shader selected when the program is launched.
- added a shader "Normal lighting" that shows the lighting coming from per-pixel bump/normal mapping.
- added support for detail texturing in "Full Shading" shader. The detail texture must be embedded in the alpha channel of the misc map.
- increased accuracy of specular lighting by using the real reflection vector instead of the old lower precision half vector.
- added support for relative paths.
- added support for paths to textures that are outside the model's directory. You can now "share" textures between different folders.
- added automatic saving and reloading of visual settings. ASEToBin stores those settings in an ascii XML file that is located next to the model's .bin file.
- ase2bin will not exit anymore when some textures could not be located on disk. Instead it will dump the name of the missing textures in the log file and use placeholders.
- fixed a crash bug when using the export option "merge all objects into a single one".
- ambient-occlusion generator now takes into account the interpolated vertex normals instead of the triangle face. This will make the AO map look better (non-facetted) on curved surfaces. Example:
Before 1.0: http://www.infinity-universe.com/Infinity/Docs/SDK/ASEToBin/ao_before.jpg
In 1.0: http://www.infinity-universe.com/Infinity/Docs/SDK/ASEToBin/ao_after.jpg
- added edge expansion to AO generator algorithm, this will help to hide dark edges on contours due to bilinear filtering of the AO map, and will also fix 1-pixel-sized black artifacts. It is *highly recommended* to re-generate all AO maps on models that were generated from previous version of ASEToBin, as the quality increase will be tremendous.
- automatic saving/loading of the camera position when loading/converting a model
- press and hold the 'X' key to zoom the camera (ICP style)
- press the 'R' key to reset the camera to the scene origin
- reduced the znear clipping plane distance. Should make it easier to check small objects.
- program now starts maximized
- added a wireframe checkbox, that can overlay wireframe in red on top of any existing shader mode.
- added a new shader "Vertex lighting" that only shows pure per-vertex lighting
- fixed a crash related to multi-threading when generating an AO map or a normal map while viewing a model at the same time.
- added a skybox dropdown that automatically lists all skyboxes existing in the ASEToBin's Data/Textures sub-directories. To create your own skyboxes, create a folder in Data/textures (name doesn't matter), create a descr.txt file that will contain a short description of the skybox, then place your 6 cube map textures in this directory. They'll be automatically loaded and listed the next time ASEToBin is launched.
- the current skybox is now saved/reloaded automatically for each model
- added a default xml settings file for default ASEToBin settings when no model is loaded yet. This file is located at Data/settings.xml
- removed the annoying dialog box that pops up when an object has more than 64K vertices
- fixed a bug for the parameter LCol that only took the blue component into account for lighting
- added support for environment cube map lighting and reflections. Added a slider bar to change the strength of the environment lighting reflections ("EnvMap"). Added a slider bar to control the strength of the environment ambient color ("EnvAmb").
- added experimental support for a greeble editor. This editor allows placing greeble meshes on top of an object. The greeble is only displayed (and so only consumes cpu/video resources) when the camera gets close to it. This may allow kilometer-sized entities to look more complex than they are in reality.
- added experimental support for joypads/joysticks. They can now be used to move the camera in the scene. Note that there's no configuration file to customize joystick controls, and the default joystick is the one used. If your joystick doesn't work as expected, please report any problem on the forums.
- added a slider bar for self-illumination intensity ("Illum")
- added a slider bar for the diffuse lighting strength ("Diffuse")
- added a Capture Screenshot button
- added a new shader: checkerboard, to review UV mapping problems (distortions, resolution incoherency, etc..)
- added the number of objects in the scene in the window's title bar
- added a button that can list video memory usage for various resources (textures, buffers, shaders) in the viewer tab
- added a Show Light checkbox in the visualization tab. This will display a yellowish sphere in the 3D viewport in the direction the sun is.
- added new shaders to display individual texture maps of a model, without any effect or lighting (Diffuse Map, Specular Map, Normal Map, Ambient Map, Self-illumination Map, Misc Map, Detail Map)
- fixed numerous memory/resources leaks
- added a button in the visualization tab to unload (reset) the scene.
- added an experimental fix for people who don't have any OpenGL hardware acceleration due to a config problem.
- added a button in the visualization tab to reset the camera to the scene origin
- added a checkbox in the visualization tab to show an overlay grid. Each gray square of the grid represents an area of 100m x 100m. Each graduation on the X and Y axes is 10m. Finally, each light gray square is 1 km.
- added a feature to generate ambient-occlusion in the alpha channel of a normal map when baking a low-poly to a high-poly mesh. Note: the settings in the "converter" tab are used, even if disabled, so be careful!

Note: Spectre's Phantom model is included as an example in the Examples/ directory !

Tip of the day: logarithmic zbuffer artifacts fix

Posted by Ysaneya, 20 August 2009 · 6,548 views

Logarithmic zbuffer artifacts fix

In cameni's Journal of Lethargic Programmers, I was very interested by his idea of using a logarithmic zbuffer.

Unfortunately, his idea comes with a couple of very annoying artifacts, due to the linear interpolation of the logarithm (non-linear) based formula. It particularly shows on thin or huge triangles where one or more vertices fall off the edges of the screen. As cameni explains himself in his journal, basically for negative Z values, the triangles tend to pop in/out randomly.

It was suggested to keep a high tessellation of the scene to avoid the problem, or to use geometry shaders to automatically tessellate the geometry.

I'm proposing a solution that is much simpler and that works on pixel shaders 2.0+: simply generate the correct Z value at the pixel shader level.

In the vertex shader, just use an interpolator to pass the vertex position in clip space (GLSL) (here I'm using tex coord interpolator #6):

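void main()
{
    // Transform the vertex to clip space once, then reuse the result.
    vec4 vertexPosClip = gl_ModelViewProjectionMatrix * gl_Vertex;
    gl_Position = vertexPosClip;
    // Pass the clip-space position to the pixel shader through interpolator #6.
    gl_TexCoord[6] = vertexPosClip;
}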

Then you override the depth value in the pixel shader:

void main()
{
    gl_FragColor = ...
    const float C = 1.0;
    const float far = 1000000000.0;
    const float offset = 1.0;
    // Write the logarithmic depth per pixel instead of interpolating it per vertex.
    gl_FragDepth = (log(C * gl_TexCoord[6].z + offset) / log(C * far + offset));
}

Note that as cameni indicated before, the 1/log(C*far+1.0) can be optimized as a constant. You're only really paying the price for a mad and a log.
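
The same arithmetic, written as a small Python sketch rather than shader code, shows the constant being folded out of the per-pixel work. The names INV_LOG_FAR and log_depth are made up for this example; the constants are the ones from the shader above.

```python
import math

C = 1.0
FAR = 1000000000.0
OFFSET = 1.0

# Constant part of the formula, computed once instead of once per pixel.
INV_LOG_FAR = 1.0 / math.log(C * FAR + OFFSET)


def log_depth(z: float) -> float:
    """Depth value written for a given clip-space z: one mad, one log, one mul."""
    return math.log(C * z + OFFSET) * INV_LOG_FAR
```

With these constants, a z of 0 maps to a depth of 0 and a z equal to the far distance maps to a depth of 1.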

Quality-wise, I've found that solution to work perfectly: no artifacts at all. In fact, I went so far as testing a city with centimeter to meter details seen from thousands of kilometers away using a very very small field-of-view to simulate zooming. I'm amazed by the quality I got. It's almost magical. ZBuffer precision problems will become a thing of the past, even when using large scales such as needed for a planetary engine.

There's a performance hit due to the fact that fast-Z is disabled, but to be honest in my tests I haven't seen a difference in the framerate. Plus, tessellating the scene more or using geometry shaders would very likely cost even more performance than that.

I've also found that to control the znear clipping and reduce/remove it, you simply have to adjust the "offset" constant in the code above. Cameni used a value of 1.0, but with a value of 2.0 in my setup scene, it moved the znear clipping to a few centimeters.


Settings of the test:
- znear = 1.0 inch
- zfar = 39370.0 * 100000.0 inches = 100K kilometers
- camera is at 205 kilometers from the scene and uses a field-of-view of 0.01°
- zbuffer = 24 bits
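
To get a feeling for why the two buffers behave so differently with these settings, here is a first-order estimate of the smallest distance step each one can resolve at the camera distance. It is only a sketch: it assumes the usual perspective depth mapping for the standard zbuffer, C = 1.0 and offset = 1.0 for the logarithmic one, and ignores everything except the 24-bit quantization. The function names are hypothetical.

```python
import math

INCHES_PER_KM = 39370.0          # conversion used in the settings above
ZNEAR = 1.0                      # inches
ZFAR = INCHES_PER_KM * 100000.0  # inches (100K kilometers)
DEPTH_STEP = 1.0 / 2 ** 24       # smallest increment of a 24-bit zbuffer


def standard_resolution(z: float) -> float:
    """Smallest distance step (in inches) a standard zbuffer resolves at z."""
    # depth(z) = f / (f - n) * (1 - n / z), so d(depth)/dz = f * n / ((f - n) * z^2)
    slope = ZFAR * ZNEAR / ((ZFAR - ZNEAR) * z * z)
    return DEPTH_STEP / slope


def logarithmic_resolution(z: float, c: float = 1.0, offset: float = 1.0) -> float:
    """Same estimate for depth(z) = log(c * z + offset) / log(c * far + offset)."""
    slope = c / ((c * z + offset) * math.log(c * ZFAR + offset))
    return DEPTH_STEP / slope


camera_distance = 205.0 * INCHES_PER_KM
standard_km = standard_resolution(camera_distance) / INCHES_PER_KM
logarithmic_m = logarithmic_resolution(camera_distance) / INCHES_PER_KM * 1000.0
```

Under those assumptions, the estimate comes out at roughly a hundred kilometers for the standard zbuffer against a few tens of centimeters for the logarithmic one.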

Normal zbuffer:


Logarithmic zbuffer:

Future work

Could that trick be used to increase precision of shadow maps ?

Seamless filtering across faces of dynamic cube map

Posted by Ysaneya, 19 August 2009 · 3,278 views

Tip of the day

Anybody who tried to render to a dynamic cube map probably has encountered the problem of filtering across the cube faces. Current hardware does not support filtering across different cube faces AFAIK, as it treats each cube face as an independent 2D texture (so when filtering pixels on an edge, it doesn't take into account the texels of the adjacent faces).

There are various solutions for pre-processing static cube maps, but I've yet to find one for dynamic (renderable) cube maps.

While experimenting, I've found a trick that has come in very handy and is very easy to implement. To render a dynamic cube map, one usually sets up a perspective camera with a field-of-view of 90 degrees and an aspect ratio of 1.0. By wisely adjusting the field-of-view angle, rendering to the cube map will duplicate the edges and ensure that the texel colors match.

The formula assumes that texture sampling is done in the center of texels (à la OpenGL) with a 0.5 offset, so this formula may not work in DirectX.

The field-of-view angle should equal:

fov = 2.0 * atan(s / (s - 0.5))

where 's' is half the resolution of the cube (ex.: for a 512x512x6 cube, s = 256).

Note that it won't solve the mipmapping case, only bilinear filtering across edges.
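
As a quick numerical check of the formula, the following Python sketch computes the adjusted angle for a given cube resolution. The function name is made up for the example.

```python
import math


def cube_face_fov_degrees(resolution: int) -> float:
    """Field of view for rendering one face of a resolution x resolution cube map."""
    s = resolution / 2.0
    return math.degrees(2.0 * math.atan(s / (s - 0.5)))
```

The correction shrinks as the resolution grows: the angle is only slightly above 90 degrees for a 512x512x6 cube, but lies between 97 and 98 degrees for an 8x8x6 cube.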

Dynamic 8x8x6 cube without the trick:

Dynamic 8x8x6 cube with the trick:

Audio engine and various updates

Posted by Ysaneya, 08 July 2009 · 936 views

In this journal, no nice pictures, sorry :) But a lot to say about various "small" tasks ( depending on your definition of small. Most of them are on the weekly scale ). Including new developments on the audio engine and particle systems.

Audio engine

As Nutritious released a new sound pack ( of an excellent quality! ) and made some sample tests, I used the real-time audio engine to perform those same tests and check if the results were comparable. They were, with a small difference: when a looping sound was starting or stopping, you heard a small crack. It seems like this artifact is generated when the sound volume goes from 100% to 0% ( or vice versa ) in a short amount of time. It isn't related to I-Novae's audio engine in particular, as I could easily replicate the problem in any audio editor ( I use Goldwave ). It also doesn't seem to be hardware specific, since I tested both on a simple AC'97 integrated board and on a dedicated Sound Blaster Audigy, and I had the crack in both cases.

A solution to that problem is to use transition phases during which the sound volume smoothly goes from 100% to 0%. It required adding new states to the state machine used in the audio engine, and caused many headaches. But it is now fixed. I've found that with a transition of 0.25s the crack has almost completely disappeared.

One problem quickly became apparent: if the framerate was too low, the sound update ( adjusting the volume during transition phases ) wasn't called often enough and the crack became noticeable again. So I moved the sound update into a separate thread ( which will be good for performance too, especially on multi-core machines ) which updates at a constant rate independently of the framerate.
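
The transition idea can be sketched in a few lines of Python. This is only an illustration: the linear ramp, the function names and the 100 Hz update rate used in the example are assumptions, and only the 0.25s duration comes from the text.

```python
FADE_DURATION = 0.25  # seconds, the transition length mentioned above


def fade_gain(elapsed: float, fading_in: bool) -> float:
    """Volume multiplier in [0, 1] after `elapsed` seconds of a transition."""
    t = min(max(elapsed / FADE_DURATION, 0.0), 1.0)
    return t if fading_in else 1.0 - t


def gains_for_fade_out(update_rate_hz: float) -> list:
    """Gains applied by a fixed-rate update thread during one fade-out."""
    steps = int(FADE_DURATION * update_rate_hz)
    return [fade_gain(i / update_rate_hz, fading_in=False) for i in range(steps + 1)]
```

At 100 updates per second the volume changes by 4% per update, whereas a renderer running at 10 frames per second would only get two or three updates during the whole transition, which is why the update rate has to be decoupled from the framerate.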

Since I was working on the audio engine, I also took some time to fix various bugs and to add support for adjusting the sound pitch dynamically. I'm not sure yet where it will be used, but it's always good to have more options to choose from.

Particle systems

In parallel I've been working on a massive update ( more technically a complete rewrite ) of the particle system. So far I was still using the one from the combat prototype ( ICP ), dating from 2006. It wasn't flexible enough: for example, it didn't support multi-texturing or normal mapping / per pixel lighting. Normal mapping particles is a very important feature, especially later to reimplement volumetric nebulae or volumetric clouds.

Particles are updated in system memory in a huge array and "compacted" at render time into a video-memory vertex buffer. I don't use geometry shaders yet, so I generate 4 vertices per particle quad, each vertex being a copy of the particle data with a displacement parameter ( -1,-1 for the bottom-left corner to +1,+1 for the top-right corner ). The vertices are displaced and rotated like a billboard in a vertex shader.
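
The expansion of particles into quads can be illustrated with the Python sketch below. It performs on the CPU what the text describes as being done in the vertex shader, leaves out the rotation, and uses a made-up particle layout ( center, half size, alive flag ), so treat it as a picture of the idea rather than the actual implementation.

```python
# Corner displacements, from bottom-left (-1, -1) to top-right (+1, +1).
CORNERS = [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0)]


def expand_particle(center, half_size, cam_right, cam_up):
    """Return the 4 world-space corners of a camera-facing quad."""
    vertices = []
    for dx, dy in CORNERS:
        vertices.append(tuple(
            c + half_size * (dx * r + dy * u)
            for c, r, u in zip(center, cam_right, cam_up)
        ))
    return vertices


def build_vertex_buffer(particles, cam_right, cam_up):
    """Compact live particles into one flat list of vertices (4 per particle)."""
    buffer = []
    for center, half_size, alive in particles:
        if alive:
            buffer.extend(expand_particle(center, half_size, cam_right, cam_up))
    return buffer
```

Because the corners are offset along the camera's right and up vectors, the quad always faces the viewer, which is what makes it a billboard.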

Performance is decent: around 65K particles at 60-70 fps on a GF 8800 GTS, but I'm a bit baffled that my new Radeon HD 4890 is getting similar framerates, as it's supposed to be much faster than an 8800 GTS. I ran a profiler and most of the time seems to be spent uploading the vertex buffer rather than updating or rendering. I don't know whether I should blame Vista or ATI/AMD...

I still have a few ideas to experiment with to better manage the vertex buffer and avoid re-filling it completely every frame, especially when some particles are static ( example: all particles in a nebula ).

Visual Studio 2008

I migrated all my projects to Visual Studio 2008. While doing so I switched the C-runtime library from dynamic to static, hopefully avoiding future problems with missing dependencies. Unfortunately, most of the external libraries I was using were compiled with the dynamic CRT, so I had to update and recompile every single one of those libraries, which took a lot of time. I also used that occasion to move the automatic linking of INovae's IKernel from the header files to the cpps.

Normal mapping

SpAce reported normal mapping problems in ASEToBin. He generated a cube in 3ds max, duplicated it, applied a UV map to one of them and used it as the "low poly" mesh, while the other version is the "high poly". Then he baked the normal maps from the hi-poly to the low-poly into a texture and loaded it in ASEToBin. The results were wrong: in 3ds max the cube render was as expected, but in ASEToBin, there were some strange smoothing/darkening artifacts.

I played with that code for days and was able to improve it, but came to the conclusion that the artifacts were caused by vertex interpolation of the tangent space. 3ds max doesn't interpolate the tangent space per vertex, but actually re-calculates the tangent space per pixel. The only way I could do that in ASEToBin ( or more generally in the game ) is to shift this calculation to the pixel shader, but for various reasons it's a bad idea: it'd hurt performance quite a bit; it'd raise the hardware requirements, etc.

So far I haven't seen any real-time engine/tool that took 3ds max's normal map and rendered the cube with good lighting, which comforts me in my conclusion that it can only be fixed if you perform the calculations per pixel.

Gathering Texture packs

In the past years, many people have made tiling texture packs. Those texture packs have variable quality; some of the textures inside the packs are excellent; others are "good enough"; others aren't so nice. Almost none of them were made with a specific faction in mind, which is partially due to us not providing clear guidelines on the visual style of faction textures. In any case, I think it's time to collect all those textures, filter them by quality, sort them by faction and re-publish them in a single massive pack everybody can use.

It will take a while to sort everything. A few devs are currently working on new textures ( especially SFC textures ), but I think it would be nice if in the coming weeks some contributors could help. We are primarily looking for generic textures, like plating for hulls, greeble, hangar/garages elements, etc.. Also, if you have work-in-progress textures sitting on your hard drive in a decent ( usable ) state, now would be a good time to submit them.

Galaxy generation

Posted by Ysaneya, 19 May 2009 · 3,561 views

In the past weeks, I've been focusing my efforts on the server side. A lot of things are going on, especially on the cluster architecture. But one particular area of interest is the procedural galaxy generator. In this journal, I will be speaking of the algorithm used to generate the stars and the various performance/memory experiments I made to stress the galaxy generator.


Note: video available at the end of the article.

Our galaxy, the Milky Way, contains an estimated 100 to 400 billion stars. As you can imagine, generating those in a pre-processing step is impossible. The procedural galaxy generator must be able to generate stars data in specific areas, "regions of interest", usually around the players ( or NPCs, or star systems in which events happen ).

The jumpdrive system will allow a player to select any star and attempt to jump to it. The range doesn't matter. What's important is the mass of the target and the distance to it. Let's start with a simple linear formula where the probability to successfully jump is a function of M / D ( M = target's mass and D = distance ). Of course, the "real" formula is a lot more complicated and isn't linear, but let's forget about that now.

Under that simple formula, you will have the same chance of jumping to a star that has a mass of 1.0 and that is located 10 LY ( light-years ) away as you have of jumping to a star of mass 10.0 that is located 100 LY away.

The mass of a star ( for stars that are on their main sequence ) defines its color. Stars that are many times as massive as the Sun are blue; Sun-like stars are white/yellow; low-mass stars appear reddish and are often called red dwarves.
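
The equivalence is easy to check numerically. The sketch below uses a hypothetical jump_score function that returns the raw M / D ratio; how that ratio is turned into an actual probability is not specified here and is left out.

```python
def jump_score(mass: float, distance_ly: float) -> float:
    """Simplified M / D score; a higher score means an easier jump."""
    if distance_ly <= 0.0:
        raise ValueError("distance must be positive")
    return mass / distance_ly


near_light_star = jump_score(mass=1.0, distance_ly=10.0)
far_heavy_star = jump_score(mass=10.0, distance_ly=100.0)
```

Both stars end up with the same score, since 1.0 / 10 and 10.0 / 100 are the same ratio.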

How does all of that relate to the galaxy generator ? Well, it defines a fundamental constraint to it: it must be hierarchical. In other words, massive blue stars must be generated even when they're very far away, while lighter red dwarves only need to be generated in a volume much closer to the player.

If you don't fully understand that previous sentence, read it again and again until you fully realize what it means, because it's really important. Red dwarves that are far away aren't generated. At all. They're not displayed, but they're not even in memory, and do not consume memory. More subtly, it is impossible to "force" them to appear until you "physically" get closer to them. This also implies that you will not be able to search for a star by its name unless it's a "special" star stored in the database.

Generating a point cloud of the galaxy

The algorithm is based on an octree. Remember that stars must be generated hierarchically. The octree is subdivided around the player recursively until the maximum depth ( 12 ) is reached. Each node in the octree has a depth level, starting at 0 for the root node and increasing by 1 at each recursion level ( so the maximum will be 12 ). This depth level is important because it determines the type of stars that are generated in that node.

This level is used as an index into a probability table. The table stores probabilities for various star classes at different depths. For the root node ( level #0 ) for example, there may be a 40% chance to generate an O-class ( hot blue ) star, a 40% chance to generate a B-class and a 20% chance to generate an A-class star.

That way, it's possible to drive the algorithm to generate the right proportions of star classes.
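
A probability table of that kind can be sketched as follows. Only the root row ( 40% O, 40% B, 20% A ) comes from the example above; the other rows, the table layout and the function name are invented for illustration.

```python
import random

# Hypothetical table: one row per octree depth, (star class, probability) pairs.
# Only the root row comes from the example in the text; the others are made up.
CLASS_TABLE = {
    0: [("O", 0.4), ("B", 0.4), ("A", 0.2)],
    1: [("B", 0.5), ("A", 0.5)],
    12: [("K", 0.3), ("M", 0.7)],
}


def pick_star_class(depth: int, rng: random.Random) -> str:
    """Draw a star class according to the probabilities of the given depth."""
    roll = rng.random()
    cumulative = 0.0
    for star_class, probability in CLASS_TABLE[depth]:
        cumulative += probability
        if roll < cumulative:
            return star_class
    return CLASS_TABLE[depth][-1][0]  # guard against rounding error
```

Drawing many stars at depth 0 with this table yields roughly two O-class and two B-class stars for every A-class star, and never a red dwarf.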

The potential number of stars per node is only a function of the depth level. At the root level, there are 50 million stars. At the deepest level ( #12 ) there are 200 stars. Note that the actual amount of stars generated will be lower than that, because stars need to pass a decimation test. That's how you shape the galaxy... with a density function.

The density function takes as input some 3D coordinates in the galaxy and returns the probability in [0-1] that a star exists for the given coordinates.

To generate the spherical halo, the distance to the galactic origin is computed and fed into an inverse exponential ( with some parameters to control the shape ).

To generate the spiral arms, the probability is looked up from a "density map" ( similar to a grayscale heightmap ). The 2D coordinates as well as the distance to the galactic plane are then used to determine a density.

To generate globular clusters, the calculation is similar to the spherical halo, except that each cluster has a non-zero origin and a radius on the order of a few dozen light-years.

The final density function is taken as the maximum of all those densities.

To generate stars for a given node, a random 3D coordinate inside the node's bounding box is generated for each potential star. The density is evaluated for this location. Then a random number is generated, and if that number is lower than the density, the star actually gets generated and stored into the node.
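
The decimation test described above is a form of rejection sampling. The following Python sketch shows the principle with a halo and a single globular cluster; the spiral arm lookup is left out, and all scale values, the cluster position and the function names are assumptions chosen only to make the example run.

```python
import math
import random


def halo_density(x, y, z, scale=20000.0):
    """Inverse exponential of the distance to the galactic origin (assumed shape)."""
    return math.exp(-math.sqrt(x * x + y * y + z * z) / scale)


def cluster_density(x, y, z, center=(30000.0, 0.0, 5000.0), radius=50.0):
    """Same falloff, but around a globular cluster with its own origin."""
    d = math.dist((x, y, z), center)
    return math.exp(-d / radius)


def density(x, y, z):
    """Final density: the maximum of the individual contributions."""
    return max(halo_density(x, y, z), cluster_density(x, y, z))


def generate_stars(box_min, box_max, potential_stars, rng):
    """Keep each random candidate with a probability equal to the density."""
    stars = []
    for _ in range(potential_stars):
        p = tuple(rng.uniform(lo, hi) for lo, hi in zip(box_min, box_max))
        if rng.random() < density(*p):
            stars.append(p)
    return stars
```

A node close to the galactic origin keeps almost all of its candidates, while a node far out in the halo keeps almost none, even though both start from the same number of potential stars.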

When the node gets recursively split into 8 children, all stars from the parent node get distributed into the correct child ( selected based on their coordinates ).
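
Selecting the child from the coordinates can be done with one comparison per axis. In this sketch the bit layout ( bit 0 for X, bit 1 for Y, bit 2 for Z ) and the function names are arbitrary choices made for the example.

```python
def child_index(position, node_center):
    """Index (0-7) of the child octant containing `position`."""
    index = 0
    for axis in range(3):
        if position[axis] >= node_center[axis]:
            index |= 1 << axis
    return index


def distribute_to_children(stars, node_center):
    """Split the parent's stars into 8 lists, one per child."""
    children = [[] for _ in range(8)]
    for star in stars:
        children[child_index(star, node_center)].append(star)
    return children
```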

As a note, all nodes are assigned a seed, and when a node gets subdivided, a new seed is generated for each child. That seed is used in various places when random numbers need to be generated. Therefore, if the player goes away and a node gets merged, then comes closer again and the node gets split, the exact same stars will be generated. They will have the exact same location, the same color, the same class, etc..
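
The determinism argument can be demonstrated with a few lines of Python. The way the child seed is derived here ( a SHA-256 hash of the parent seed and the octant index ) is purely an assumption for the example; any deterministic function of the two would serve the same purpose.

```python
import hashlib
import random


def child_seed(parent_seed: int, octant: int) -> int:
    """Derive a child's seed from its parent's seed and its octant index."""
    data = f"{parent_seed}:{octant}".encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def node_stars(seed: int, count: int):
    """Generate `count` star positions in the unit cube from a node seed."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random(), rng.random()) for _ in range(count)]
```

Calling node_stars twice with the seed of the same child returns the same list, so a node that was merged and later split again gets back exactly the stars it had before.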

The drawback of procedural generation is that any change made to any parameter of the algorithm ( like the number of stars per node, or the probability tables ) will result in a completely different galaxy. None of the stars will end up at the same place ( or if they do, it's just a coincidence ). So all the probabilities and parameters better be correctly adjusted before the game gets released, because after, it will lead to the apocalypse..

Performance considerations

The algorithm as described above suffers from performance problems. The reason is quite simple: if for a given node you have 1000 potential stars, then you need to generate 1000 coordinates and test them against the density function at each coordinate, to see if a real star has been generated.

I quickly noticed that in the terminal nodes, the densities were pretty low. Imagine a cube of 100x100x100 LY located in the halo of the galaxy, far away from the origin: the density function over this volume will be pretty regular, and low ( I'm making this up, but let's say 10% ). This means that for 1000 potential stars, the algorithm will end up generating 1000 coordinates, evaluate the density 1000 times, and 10% of the candidates will pass the test, resulting in 100 final stars. Wouldn't it be better to generate 100 candidates only ? That would be 10 times faster !

Fortunately it's possible to apply a simple trick. Let's assume that the density function is relatively uniform over the volume: 10%. Generating 1000 stars of which 1 out of 10 will succeed is statistically equivalent to generating 100 stars of which 10 out of 10 will succeed. In other words, when the density is uniform, you can simply divide the number of stars by the correct ratio ( 1 / density ), or said otherwise, multiply the number of stars by the density ! 1000 stars * 10% = 100 stars.

Most of the time, the density isn't uniform. The lower the depth level of the node, the larger its volume, and the lower the chance that the density is uniform over that volume. But even when the density isn't uniform, you can still use its maximum value over the volume to reduce the number of potential candidates to generate.

Let's take a node of 1000 candidates where you have a 1% density on one corner and 20% on another corner (the maximum in the volume). It's still statistically equivalent to a node of 200 candidates ( 1000 * 20% ) with a density of 5% on the first corner and 100% on the other corner.

As you can see, there's no way around evaluating the density function for each candidate, but the number of candidates has been reduced by a factor of 5 while at the same time, the value of the density function has been multiplied by 5. Fewer stars to generate, and for each star, a higher chance to pass the test: a win-win situation !
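The arithmetic of the example above can be written down in a few lines. This is only a sketch: `ReducedNode` and `reduceCandidates` are hypothetical names, and the maximum density over the node is assumed to be known ( or at least bounded ) beforehand.

```cpp
#include <cassert>
#include <cmath>

// Result of the reduction: how many candidates to generate, and the factor
// by which every evaluated density must be multiplied.
struct ReducedNode {
    int    candidates;
    double densityScale;
};

// Scales the candidate count down by the maximum density found in the node,
// and the density values up by the inverse factor.
ReducedNode reduceCandidates(int candidates, double maxDensity)
{
    assert(maxDensity > 0.0 && maxDensity <= 1.0);
    ReducedNode reduced;
    reduced.candidates   = static_cast<int>(std::lround(candidates * maxDensity));
    reduced.densityScale = 1.0 / maxDensity;
    return reduced;
}
```

For the node above, `reduceCandidates(1000, 0.20)` gives 200 candidates and a scale of 5, so the 1% corner is tested at 5% and the 20% corner at 100%.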

Memory considerations

Until now, I've explained how to generate the galaxy model and how stars are procedurally distributed on-the-fly without any pre-processing. But keep in mind that the algorithm is primarily used on the server, and that there won't be just one player, but thousands of them. How does the galaxy generation work with N viewpoints ?

To keep it short, I modified the standard octree algorithm to split nodes as soon as needed, but delayed merging nodes together until more memory is needed.

The galaxy manager works as a least-recently-used ( LRU ) cache. Star data and nodes consume memory. When the target memory budget is reached, a "garbage collector" routine is launched. This routine checks all nodes and determines which nodes have been the least recently used ( that is: the nodes that have been generated long ago, but that aren't in use currently ). Those nodes are then merged and memory is freed.
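A bare-bones version of such a cache could look like the sketch below. It uses the classic list + hash map LRU layout, which is an assumption and not necessarily how the galaxy manager is implemented; `NodeCache`, `touch` and `collect` are hypothetical names, and the check for nodes that are still in use is left out to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>
#include <vector>

class NodeCache {
public:
    explicit NodeCache(std::size_t budgetBytes) : m_budget(budgetBytes) {}

    // Marks a node as used right now, inserting it if it is not cached yet.
    void touch(std::uint64_t id, std::size_t bytes)
    {
        assert(bytes > 0);
        auto it = m_index.find(id);
        if (it != m_index.end()) {
            m_used -= it->second->bytes;
            m_order.erase(it->second);
        }
        m_order.push_front(Entry{id, bytes});
        m_index[id] = m_order.begin();
        m_used += bytes;
    }

    // "Garbage collector": while over budget, evicts the least recently used
    // nodes and returns their ids so that the caller can merge them.
    std::vector<std::uint64_t> collect()
    {
        std::vector<std::uint64_t> evicted;
        while (m_used > m_budget && !m_order.empty()) {
            const Entry& last = m_order.back();
            m_used -= last.bytes;
            m_index.erase(last.id);
            evicted.push_back(last.id);
            m_order.pop_back();
        }
        return evicted;
    }

    std::size_t usedBytes() const { return m_used; }

private:
    struct Entry { std::uint64_t id; std::size_t bytes; };

    std::list<Entry> m_order;  // front = most recently used
    std::unordered_map<std::uint64_t, std::list<Entry>::iterator> m_index;
    std::size_t m_budget;
    std::size_t m_used = 0;
};
```

Nothing is freed in `touch`: merging only happens when `collect` is called, which matches the "split as soon as needed, merge as late as possible" behaviour described above.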

It's a bit tricky to stress test the galaxy generator for performance and memory with multiple players, simply because it's extremely dependent on where players will be located in the galaxy. The worst case would probably be players randomly distributed in the galaxy, all far from each other. But, making an educated guess, I don't expect this to be the norm in reality: most players will tend to concentrate around the cores, or around each other, forming small groups. But even then, can we say that 90% of the players will be at less than 1000 LY from the cores ? Is it even possible to estimate that before beta starts ?

Galactic map considerations

I've followed with interest the players' suggestions in the galactic map thread about what the galactic map should look like. After testing the galaxy generator, I came to the conclusion that everybody severely under-estimates the number of stars there can be in a volume close to you. For example, in a radius of 100 LY, in the spiral arms with an average density, it's not uncommon to find 5000 stars.

Remember that the jump-drive is not limited exclusively by range. Or more exactly, while distance is a factor, there's no "maximal range". This means that it's perfectly possible to try to jump to a red dwarf that is 5000 LY away. The probability of success is ridiculously small ( lower than your odds of winning the lottery ), but non-zero. Of course, for the galactic map, this means that even stars that are far away should be displayed ( provided that you don't filter them out ). That's an insane number of dots that may appear on your map...

One of the more effective filters, I think, will be the jump-probability filter. That one is a given: only display stars with a minimum of 50% jump success.

In the following screenshots, you can see a blue sphere in wireframe. This defines the range in which stars are displayed. It's just an experiment to make people realize how many stars there are at certain ranges: by no means does it show how the galactic map will work ( plus, it's all on the server, remember ! ).

I can select any star by pressing a key, and it gets highlighted in pink. On the top-left, you can see some information about the selected star: first, a unique number that defines the "address" ( in the galaxy ) of the star. On the line below, the 3 values are the X, Y and Z coordinates of the star relative to the galactic origin. Then, the star class, its distance in light-years, and finally the jumping probability.

In the coming weeks, I will probably move the galaxy algorithm to the client and start to add some volumetric/particle effects on the stars/dust to "beautify" it. The reason the algorithm will also be running on the client is to avoid having to transfer a million coordinates from the server to the client each time the player opens his galactic map. That wouldn't be kind to our bandwidth...


I produced a demonstration video. Watch it on YouTube in HD ! ( I will also upload it to the website later, once I've converted it to .flv ).

Deferred lighting and instant radiosity

Posted by Ysaneya, 03 April 2009 · 2,500 views

In the past months, I've been wondering how to approach the problem of lighting inside hangars and on ship hulls. So far, I had only been using a single directional light: the sun. The majority of older games precompute lighting into textures ( called lightmaps ) but clearly this couldn't work well in the context of a procedural game, where content is generated dynamically at run time. Plus, even if it did, imagine the amount of texture memory needed to store all the lighting information coming from the surfaces of a kilometers-long battleship !

Fortunately, there's a solution to the problem: enter the fantastic universe of deferred lighting !

Deferred lighting

Traditionally, it is possible to implement dynamic lighting without any precomputations via forward lighting. The algorithm is surprisingly simple: in a first pass, the scene is rendered to the depth buffer and to the color buffer using a constant ambient color. Then, for each light you render the geometry that is affected by this light only, with additive blending. This light pass can include many effects, such as normal mapping/per pixel lighting, shadowing, etc.

This technique, used in games similar to Doom 3, does work well, but is very dependent on the granularity of the geometry. Let's take an object of 5K triangles that is partially affected by 4 lights. This means that to light this object, you will need to render 25K triangles over 5 passes total ( ambient pass + 4 light passes, each 5K ). An obvious optimization is, given one light and one object, to only render the triangles of the object that are affected by the light, but this would require some precomputations that a game such as Infinity cannot afford, due to its dynamic and procedural nature.

Now let's imagine the following situation: you've got a battleship made of a dozen objects of 5K to 10K triangles each, and you want to place a hundred lights on its hull. How many triangles do you need to render to achieve this effect with forward lighting ? Answer: a lot. Really, a lot. Too many.

Another technique that is getting more and more often used in modern games is deferred lighting. It was a bit impractical before shader model 3.0 video cards, as it required many passes to render the geometry too. But using multiple render targets, it is possible to render all the geometry once, and exactly once, independently of the number of lights in the scene ! One light or a hundred lights: you don't need to re-render all the objects affected by the lights. Sounds magical, doesn't it ?

The idea with deferred lighting is that, in a forward pass, geometric information is rendered to a set of buffers, usually called "geometry buffers" ( abbrev: G-buffers ). This information usually includes the diffuse color ( albedo ), the normal of the surface, the depth or linear distance between the pixel and the camera, the specular intensity, self-illumination, etc. Note that no lighting is calculated yet at this stage.

Once this is done, for each light, a bounding volume ( which can be as simple as a 12-triangle box for a point light ) is rendered with additive blending. In the pixel shader, the G-buffers are accessed to reconstruct the pixel position from the current ray and depth. This position is then used to compute the light color and attenuation, do normal mapping or shadowing, etc.



There are a few tricks and specificities in Infinity. Let's have a quick look at them. First of all, the G-buffers.

I use 4 RGBAF16 buffers. They store the following data:

Buffer   | R        | G        | B          | A
Buffer 1 | FL       | FL       | FL         | Depth
Buffer 2 | Diffuse  | Diffuse  | Diffuse    | Self-illum
Buffer 3 | Normal   | Normal   | Normal     | Specular
Buffer 4 | Velocity | Velocity | Extinction | MatID

'FL' = Forward lighting. That's one of the specificities of Infinity. I still do one forward lighting pass, for the sun and ambient lighting ( with full per-pixel lighting, normal mapping and shadowing ) and store the result in the RGB channels of the first buffer. I could defer it too, but then I'd have a problem related to atmospheric scattering. At pixel level, the scattering equation is very simple: it's simply modulating an extinction color ( Fex ) and adding an in-scattering color ( Lin ):

Final = Color * Fex + Lin

Fex and Lin are computed per vertex, and require some heavy calculations. Moving those calculations per pixel would kill the framerate.

If I didn't have a forward lighting pass, I'd have to store the scattering values in the G-buffers. This would require 6 channels ( 3 for Fex and 3 for Lin ). Here, I can get away with only 4 and use a grayscale 'Extinction' for the deferred lights ( while sun light really needs an RGB color extinction ).
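Written as code, the two cases could look like the sketch below. The first function is just the equation above, applied per channel. The second one is an assumption about how the grayscale extinction would be applied to a deferred light ( modulation only, since the in-scattering term has already been added by the forward pass ). `Color`, `applyScattering` and `applyGrayExtinction` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical RGB color type, for illustration only.
struct Color { float r, g, b; };

// Final = Color * Fex + Lin, per channel ( sun and ambient lighting ).
Color applyScattering(const Color& color, const Color& fex, const Color& lin)
{
    return { color.r * fex.r + lin.r,
             color.g * fex.g + lin.g,
             color.b * fex.b + lin.b };
}

// Deferred lights: a single grayscale extinction factor read from the
// G-buffer modulates the light contribution ( assumed behaviour ).
Color applyGrayExtinction(const Color& light, float extinction)
{
    assert(extinction >= 0.0f && extinction <= 1.0f);
    return { light.r * extinction, light.g * extinction, light.b * extinction };
}
```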

'Velocity' is the view-space velocity vector used for motion blur ( computed by taking the differences of positions of the pixel between the current frame and the last frame ).

'Normal' is stored in 3 channels. I have plans to store it in 2 channels only and recompute the 3rd in the shader. However this will require encoding the sign bit in one of the two channels, so I haven't implemented it yet. Normals ( and lighting in general ) are computed in view space.
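For illustration, here is one way the 2-channel storage could work. It is purely hypothetical, since this isn't implemented: the sign of Z is folded into the first channel by adding an offset of 3 when Z is negative ( X stays in [-1, 1], so any value above 1.5 means "negative Z" ). The offset costs some precision in an F16 channel, which is one of the reasons it is only a sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical types, for illustration only.
struct Vec3    { float x, y, z; };
struct Encoded { float a, b; };

// Packs a unit normal into two channels. The sign of Z is stored by shifting
// X from [-1, 1] to [2, 4] when Z is negative.
Encoded encodeNormal(const Vec3& n)
{
    assert(std::fabs(n.x * n.x + n.y * n.y + n.z * n.z - 1.0f) < 1e-3f);
    Encoded e;
    e.a = (n.z < 0.0f) ? n.x + 3.0f : n.x;
    e.b = n.y;
    return e;
}

// Recomputes Z from X and Y, then restores its sign.
Vec3 decodeNormal(const Encoded& e)
{
    const bool  negative = e.a > 1.5f;
    const float x  = negative ? e.a - 3.0f : e.a;
    const float zz = std::max(0.0f, 1.0f - x * x - e.b * e.b);
    const float z  = std::sqrt(zz);
    return { x, e.b, negative ? -z : z };
}
```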

'MatID' is an ID that can be used in the light shader to perform material-dependent calculations.

As you can see, there's no easy way to escape using 4 G-buffers.

As for the format, I use F16. It is necessary both for storing the depth and for encoding values in HDR.


At first, I was a bit disappointed by the performance hit / overhead caused by G-buffers. There are 4 buffers after all, in F16: that requires a lot of bandwidth. On an ATI X1950 XT, simply setting up the G-buffers and clearing them to a constant color resulted in a framerate of 130 fps at 1280x1024. That's before even sending a single triangle. As expected, changing the screen resolution dramatically changed the framerate, but I found this overhead to be linear with the screen resolution.

I also found yet-another-bug-in-the-ATI-OpenGL-drivers. The performance of clearing the Z-buffer only was dependent on the number of color attachments. Clearing the Z-buffer when 4 color buffers are attached ( even when color writes are disabled ) took 4 times longer than clearing the Z-buffer when only 1 color buffer was attached. As a "fix", I simply detach all color buffers when I need to clear the Z-buffer alone.

Light pass

Once the forward lighting pass is done and all this data is available in the G-buffers, I perform frustum culling on the CPU to find all the lights that are visible in the current camera's frustum. Those lights are then sorted by type: point lights, spot lights, directional lights and ambient point lights ( more on that last category later ).

The forward lighting ( 'FL' ) color is copied to an accumulation buffer. This is the buffer in which all lights will get accumulated. The depth buffer used in the forward lighting pass is also bound to the deferred lighting pass.

For each light, a "pass" is done. The following states are used:

* depth testing is enabled ( that's why the forward lighting's depth buffer is reattached )
* depth writing is disabled
* culling is enabled
* additive blending is enabled
* if the camera is inside the light volume, the depth test function is set to GREATER, else it uses LESS

A threshold is used to determine if the camera is inside the light volume. The value of this threshold is chosen to be at least equal to the znear value of the camera. Bigger values can even be used, to reduce the overdraw a bit. For example, for a point light, a bounding box is used and the test looks like this:

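// World-space bounding box of the point light.
const SBox3DD& bbox = pointLight->getBBoxWorld();
SBox3DD bbox2 = bbox;
// Grow the box by the threshold: twice the camera's znear...
bbox2.m_min -= SVec3DD(m_camera->getZNear() * 2.0f);
bbox2.m_max += SVec3DD(m_camera->getZNear() * 2.0f);
// ...then by the light radius, as an extra margin.
bbox2.m_min -= SVec3DD(pointLight->getRadius());
bbox2.m_max += SVec3DD(pointLight->getRadius());
// Camera inside the grown box: test with GREATER, else with LESS.
TBool isInBox = bbox2.isIn(m_camera->getPositionWorld());
m_renderer->setDepthTesting(true, isInBox ? C_COMP_GREATER : C_COMP_LESS);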

Inverting the depth test to GREATER as the camera enters the volume allows pixels in the background / skybox to be discarded very quickly.

I have experimented with a bounding sphere for point lights too, but found that the reduced overdraw was cancelled out by the larger polycount ( a hundred polygons, against 12 triangles for the box ).

I haven't implemented spot lights yet, but I'll probably use a pyramid or a conic shape as their bounding volume.

As an optimization, all lights of the same type are rendered with the same shader and textures. This means fewer state changes, as I don't have to change the shader or textures between two lights.

Light shader

For each light, a Z range is determined on the CPU. For point lights, it is simply the distance between the camera and the light center, plus or minus the light radius. When the depth is sampled in the shader, the pixel is discarded if the depth is outside this Z range. This is the very first operation done by the shader. Here's a snippet:

vec4 ColDist = texture2DRect(ColDistTex, gl_FragCoord.xy);
// Discard the pixel if its depth is outside the light's Z range.
if (ColDist.w < LightRange.x || ColDist.w > LightRange.y)
    discard;
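On the CPU side, computing that range for a point light is only a couple of lines. The sketch below uses hypothetical names ( `DepthRange`, `pointLightRange` ), and clamping the near bound to zero when the camera is inside the light is my own addition:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical types, for illustration only.
struct Vec3 { double x, y, z; };
struct DepthRange { double zMin, zMax; };

// Z range of a point light: distance from the camera to the light center,
// minus and plus the light radius.
DepthRange pointLightRange(const Vec3& camera, const Vec3& light, double radius)
{
    assert(radius >= 0.0);
    const double dx = light.x - camera.x;
    const double dy = light.y - camera.y;
    const double dz = light.z - camera.z;
    const double distance = std::sqrt(dx * dx + dy * dy + dz * dz);
    // The near bound is clamped to zero when the camera is inside the light.
    return { std::max(0.0, distance - radius), distance + radius };
}
```

The two values would then be uploaded as the `LightRange` constant tested by the shader.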

There isn't much to say about the rest of the shader. A ray is generated from the camera's origin / right / up vectors and current pixel position. This ray is multiplied by the depth value, which gives a position in view space. The light position is uploaded to the shader as a constant in view space; the normal, already stored in view space, is sampled from the G-buffers. It is very easy to implement a lighting equation after that. Don't forget the attenuation ( color should go to black at the light radius ), else you'll get seams in the lighting.
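The two key steps, position reconstruction and attenuation, are sketched below in C++ rather than GLSL. The falloff curve is just one possible choice and not necessarily the one used in the engine; any curve works as long as it reaches exactly zero at the light radius. `reconstructPosition` and `attenuation` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical minimal vector type, for illustration only.
struct Vec3 { float x, y, z; };

// View-space position of the pixel: the per-pixel ray scaled by the depth
// sampled from the G-buffer.
Vec3 reconstructPosition(const Vec3& ray, float depth)
{
    return { ray.x * depth, ray.y * depth, ray.z * depth };
}

// Smooth falloff that is 1 at the light center and exactly 0 at the radius,
// so that no seam appears on the border of the light volume.
float attenuation(float distance, float radius)
{
    assert(radius > 0.0f);
    const float t = std::min(std::max(distance / radius, 0.0f), 1.0f);
    const float falloff = 1.0f - t * t;
    return falloff * falloff;
}
```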


In a final pass, a shader applies antialiasing to the lighting accumulation buffer. Nothing particularly innovative here: I used the technique presented in GPU Gems 3 for Tabula Rasa. An edge filter is used to find edges either in the depth or the normals from the G-buffers, and "blur" pixels in those edges. The parameters had to be adjusted a bit, but overall I got it working in less than an hour. The quality isn't as good as true antialiasing ( which cannot be done by the hardware in a deferred lighting engine ), but it is acceptable, and the performance is excellent ( 5-10% hit from what I measured ). Here's a picture showing the edges on which pixels are blurred for antialiasing:

Instant radiosity

Once I got my deferred lighting working, I was surprised to see how well it scaled with the number of lights. In fact, the thing that matters is pixel overdraw, which is of course logical and expected given the nature of deferred lighting, but still I found it amazing that as long as overdraw remained constant, I could spawn a hundred lights and have less than a 10% framerate hit.

This led me to think about using the power of deferred lighting to add indirect lighting via instant radiosity.

The algorithm is relatively simple: each light is set up and casts N photon rays in random directions. At each intersection of the ray with the scene, a photon is generated and stored in a list. The ray is then killed ( Russian roulette ) or bounces recursively in a new random direction. The photon color at each hit is the original light color multiplied by the surface color recursively at each bounce. I sample the diffuse texture with the current hit's barycentric coordinates to get the surface color.

In my tests, I use N = 2048, which results in a few thousand photons in the final list. This step takes around 150 ms. I have found that I could generate around 20000 photons per second in a moderately complex scene ( 100K triangles ), and it's not even optimized to use many CPU cores.

In a second step, a regular grid is created and photons that share the same cell get merged ( their color is simply averaged ). Ambient point lights are then generated for each cell with at least one photon. Depending on N and the granularity of the grid, it can result in a few dozen ambient point lights, up to thousands. This step is very fast: around one millisecond per thousand photons to process.
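The merging step can be sketched as follows. This is not the engine code: `Photon`, `AmbientLight` and `mergePhotons` are hypothetical names, and placing each light at the center of its cell is an assumption ( averaging the photon positions would work too ).

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical types, for illustration only.
struct Photon       { double x, y, z; double r, g, b; };
struct AmbientLight { double x, y, z; double r, g, b; int photons; };

// Buckets photons into a regular grid and emits one ambient point light per
// non-empty cell, with the average color of the photons in that cell.
std::vector<AmbientLight> mergePhotons(const std::vector<Photon>& photons,
                                       double cellSize)
{
    assert(cellSize > 0.0);
    typedef std::tuple<long, long, long> Cell;
    struct Sum { double r, g, b; int count; };
    std::map<Cell, Sum> cells;

    for (const Photon& p : photons) {
        const Cell cell(static_cast<long>(std::floor(p.x / cellSize)),
                        static_cast<long>(std::floor(p.y / cellSize)),
                        static_cast<long>(std::floor(p.z / cellSize)));
        Sum& sum = cells[cell];  // zero-initialized on first access
        sum.r += p.r;
        sum.g += p.g;
        sum.b += p.b;
        ++sum.count;
    }

    std::vector<AmbientLight> lights;
    for (const auto& entry : cells) {
        const Cell& cell = entry.first;
        const Sum&  sum  = entry.second;
        AmbientLight light;
        // Assumption: the light sits at the center of its grid cell.
        light.x = (std::get<0>(cell) + 0.5) * cellSize;
        light.y = (std::get<1>(cell) + 0.5) * cellSize;
        light.z = (std::get<2>(cell) + 0.5) * cellSize;
        light.r = sum.r / sum.count;
        light.g = sum.g / sum.count;
        light.b = sum.b / sum.count;
        light.photons = sum.count;
        lights.push_back(light);
    }
    return lights;
}
```

A smaller `cellSize` produces more cells, hence more ambient point lights, each covering a smaller area.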

You can see indirect lighting in the following screenshot. Note how the red wall leaks light on the floor and ceiling. Same for the small green box. Also note that no shadows are used for the main light ( located in the center of the room, near the ceiling ), so some light leaks on the left wall and floor. Finally, note the ambient occlusion that isn't fake: no SSAO or precomputations! There's one direct point light and around 500 ambient point lights in this picture. Around 44 fps on an NVidia 8800 GTX in 1280x1024 with antialiasing.


I have applied deferred lighting and instant radiosity to Wargrim's hangar. I took an hour to texture this hangar with SpAce's texture pack. I applied a yellow color to the diffuse texture of some of the side walls you'll see in those screenshots: note how light bounces off them, and creates yellow-ish ambient lighting around that area.

There are 25 direct point lights in the hangar. Different settings are used for the instant radiosity, and as the number of ambient point lights increases, their effective radius decreases. Here are the results for different grid sizes on an 8800 GTX in 1280x1024:

Cell size | # ambient point lights | Framerate ( fps )
0.2       | 69                     | 91
0.1       | 195                    | 87
0.05      | 1496                   | 46
0.03      | 5144                   | 30
0.02      | 10605                  | 17
0.01      | 24159                  | 8

I think this table is particularly good at illustrating the power of deferred lighting. Five thousand lights running at 30 fps ! And they're all dynamic ( although in this case they're used for ambient lighting, so there would be no point in that ): you can delete them or move every single one of them in real time without affecting the framerate !

In the following screenshots, a few hundred ambient point lights were used ( sorry, I don't remember the settings exactly ). You'll see some green dots/spheres in some pics: those highlight the position of ambient lights.

Full lighting: direct lighting + ambient lighting

Direct lighting only

Ambient ( indirect ) lighting only

Detail textures

Posted by Ysaneya, 18 March 2009 · 990 views

Many people have been worried by the lack of updates recently. No, we haven't got lazy, in fact quite the contrary :) We've been awfully busy at work.

In this journal I'm going to review some of the recent work, without going too far into details. In a future dedicated update I'll come back more extensively on the recent server side development.

Detail textures

I've added support for detail textures to the game. It was in fact quite easy to implement ( two hours of work ) but that feature was requested by many artists, so I took a bit of time to experiment with it. The test scene is Kickman's shipyard as seen in this screenshot:

Now, this station is huge. Really huge. More than 8 kilometers in height. It was textured using spAce's generic texture pack, but despite using tiling textures, the texture still looks blurry when you move the camera close to the hull:

And now here's the difference using a detail texture:

Since I didn't have any detail texture ready, I simply used mtl_5_d.tga ( from spAce's texture pack ), increased the contrast and converted it to grayscale. I then placed this texture into the alpha channel of the misc map ( mtl_5_m.tga ).

Here, I said it: surprise, surprise ! Detail textures make use of the ( so far ) unused channel of the misc map. The nice "side effect" is that, like the other kinds of maps, you can assign a detail texture for each sub material, which means that a single object can use different detail textures in different areas of the object.

The detail texture does not use a secondary UV map though: the shader takes the diffuse UV map and applies a scale factor ( x8 in this picture ) to increase the frequency. The result is that you see "sub-plating" inside the big plates of the texture.

So what does the shader do exactly ? It acts as a modifier to the shading; please remember that the detail texture is a grayscale image.

1. It is added ( with a weight ) to the diffuse color. Note that the intensity 128 / 256 is interpreted as the neutral value: all intensities lower than 128 will subtract, while all intensities over 128 will add. The formula is COL = COL + ( DETAIL - 0.5 ) * weight

2. It is added ( with a weight ) to the specular value. The formula is the same as above, with a different weight.

3. It is interpreted as a heightmap and converted to a normal ( by computing the gradient ) on the fly. This normal is then used to displace the original normal coming from the normal map, before the lighting / shading computations are done.
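The three steps can be sketched on the CPU as follows ( the real thing lives in a GLSL shader ). `applyDetail` implements the formula of steps 1 and 2. `detailNormal` shows one common way to turn a heightmap into a normal, using the heights of the four neighbouring texels: that part, along with the `strength` parameter and the function names, is an assumption and not the shader's actual code.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical minimal vector type, for illustration only.
struct Vec3 { float x, y, z; };

// Steps 1 and 2: COL = COL + ( DETAIL - 0.5 ) * weight, with `detail` in
// [0, 1] and 0.5 acting as the neutral value.
float applyDetail(float value, float detail, float weight)
{
    assert(detail >= 0.0f && detail <= 1.0f);
    return value + (detail - 0.5f) * weight;
}

// Step 3: builds a tangent-space normal from the gradient of the detail
// heightmap, given the heights of the left/right/down/up neighbours.
Vec3 detailNormal(float left, float right, float down, float up, float strength)
{
    const float dx = (right - left) * strength;
    const float dy = (up - down) * strength;
    const float length = std::sqrt(dx * dx + dy * dy + 1.0f);
    return { -dx / length, -dy / length, 1.0f / length };
}
```

A flat area of the detail texture gives the normal ( 0, 0, 1 ), which leaves the normal coming from the normal map unchanged.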

If you're going to update existing textures / packs, you should probably think of the detail texture as a detail heightmap added at a higher frequency on top of the surface.

Instead of interpreting the same texture in two different ways ( additively for the diffuse / spec, and as a heightmap for the normal map ), I could have used a new RGBA map storing the detail normal in RGB and the detail color in alpha, and this would have saved the detail normal computation in the shader. However, this would have required one more texture, wasting precious video memory.

It is unlikely that I'll update ASEToBin to support detail textures anytime soon. ASEToBin uses obsolete assembly shaders, so I'd have to port all those shaders to GLSL, which is many days of work.

Recent work

In the past 2 months, I've been working on various tasks. Some of them are secret, so I can't say much about them. They include work on Earth ( which I can't show for obvious reasons ), and work on networking security ( which I can't explain either, to not give precious hints to potential hackers ). I've also done a couple of experiments with spAce, which will hopefully go public once we get nice results.

What I can say, however, is that I've made some nice progress on the infamous terrain engine. No, it's still not finished, but it's getting closer every day. At the moment it's on hold, as I wanted to progress a bit on the gameplay and server side. I'll probably come back to it with a dedicated journal and more details once it's complete.

I've implemented new automatic geometry optimization algorithms into the engine, which are automatically used by ASEToBin. Those involve re-ordering vertices and indices for maximum vertex cache efficiency. For interested programmers, I've been using Tom Forsyth's Linear-Speed Vertex Cache Optimization. It increases the framerate by 10-20% in scenes with simple shading. When shading is more of a bottleneck, like on the planetary surfaces, it doesn't help at all, but the good news is that it doesn't hurt the framerate either.

I added a couple of performance / memory usage fixes to the terrain engine. Some index buffers are now shared in system memory; the maximum depth level in the terrain quadtree is limited, saving texture memory on the last depth levels. I'm storing the normals of each terrain patch in an LA ( luminance/alpha ) normal map texture, in two 8-bit channels, and recompute the missing component in a shader. Unfortunately, texture compression cannot be used, since the textures are rendered dynamically. I've also introduced new types of noise to give more variety to the types of terrain that can be procedurally generated.

I added support for renderable cube maps, and I have some ideas to improve the space backgrounds and nebulae, which aren't in HDR yet.

I've also made some serious progress on the server side. The global architecture ( meta-server, SQL server, cluster server and node servers ) is set up. The various network protocols are on their way. I'm now working on dynamic load balancing, so that star systems ( managed by a node ) can migrate to another node when the CPU is too busy. I'll probably come back to the architecture in detail in a future update.

Darkfall Launch

Darkfall Online ( a fantasy MMO ) has launched. Why do I mention it ? Well, it's a very interesting case study for us. Like Infinity, it is produced by an independent company ( although they do have millions of dollars of funding ). Like Infinity, it went for a niche market ( twitch-based combat and full PvP ) which isn't "casual". And like Infinity, it took forever to launch and has been labelled as "vaporware" for years ( although we still have some margin compared to them ).

So, what are the lessons learned from Darkfall's launch ? What can we do to prevent the same problems from happening ?

Unfortunately, I'm a bit pessimistic in that area. Of course that doesn't mean that we won't do our best to have a good launch. But, realistically, we won't have the resources to open more than one server, even if we need a lot more to support all the players trying to connect. This means waiting queues. A lot of Darkfall players are, understandably, angry: they paid the full price for the client ( $50, if not more ? ) but can't get into the game, due to waiting queues that are many hours long. The good news is that, for Infinity, the initial download will probably be free ( but don't quote me on that, nothing is 100% set in stone yet ).

Will the server be stable ? Will it crash ? Will it lag ? Nobody can say for sure. As I see it, it depends on three factors:

- the number of players that try to connect, and more accurately, how much stress they cause on the server ( 1000 players connecting within 1 hour cause less stress than 1000 players trying to connect every minute ).

- the server ( physical machine ) performance, network quality and bandwidth available.

- the client / server code performance and quality: hopefully, not too many bugs.

Of those three factors, the only one we can directly control is the third one. The machine's performance is mostly a financial problem, and as independent developers, we definitely won't be able to afford a large cluster that can handle tens of thousands of players at the beginning. Finally, how many players try to connect is a double-edged sword: more players means more income, but also means more stress on the server, maintenance, support problems, etc.

The last lesson learned from Darkfall, IMO, is to communicate with your player base, especially at launch. I can understand the frustration of many players when the game has launched, but the most recent news on the website is a month old or more. Of course, I can also understand the developers who are busy trying to fix their code, but it only takes a few minutes...