• entries
232
1462
• views
959065

# Streaming, debugging and cleaning..

454 views

As you have noticed, I haven't posted many updates recently.. I haven't been less active than usually, but I just don't really have any nice picture to show. Also note that this update will probably be a bit more technical/programmer oriented than usual. Code incoming..!

Minas Tirith v12

... is pretty much finished. The physics problems have been solved. One of those problems came from the fact that the city 3D meshes contained some degenerated triangles. To reject those degenerated triangles when building the ODE data structures, I used the area of the triangle to determine if the triangle is degenerated or not. I used the following code:

SVec3DF ab = m_v0.cross(m_v1);SVec3DF bc = m_v1.cross(m_v2);SVec3DF ca = m_v2.cross(m_v0);SVec3DF sum = ab + bc + ca;return(0.5f * sum.getLength());

Unfortunately this formula didn't give correct results. For example, for a perfect degenerated triangle, it returned a very small epsilon value ( above my threshold to reject triangles: that was the problem ) while it should have returned 0.

I ended up using this code to get the area of a triangle:

SVec3DF vu = m_v1 - m_v0;SVec3DF vv = m_v2 - m_v0;SVec3DF n = vu.cross(vv);return(0.5f * sqrtf(n.dot(n)));

It has a much higher accuracy, and correctly returns 0 for degenerated triangles. Thanks to that, crashes disappeared in ODE.

The next problem related to physics is the "tunneling effect". When falling down, due to gravity, your velocity increases exponentially. After some time, your velocity is so high than it travels in one physics step a larger distance than the size of the capsule ( used to simulate the character ). That means that if at the previous time step you're in front of a wall, and at the next time step you're behind the wall ( never interescting it ), no collision is detected. Nothing surprising here, all physics engines that are discrete-steps based ( ie. pretty much all of them ) exibit the same behavior. To avoid this, I simply limited the maximum velocity the character ( and the balls that you can launch with the left mouse click ). Now the Denethor jump works and doesn't crash :)

So what's left to do on Minas Tirith ? Not a lot. I have already re-exported the meshes, added new options in the setup dialog.. I still have another option to add ( for Qwerty vs Azerty keyboards ), test and re-test, then pack and upload everything.

Update of .BIN file format

... fortunately backwards compatible, as I'm using a version ID in each file, so you won't have to re-export your .BIN ( if you had any ).

Normals are no longer saved to the file.

At this point you're probably wondering, okay, but how do you perform lighting and shading without normals ? Well here is the trick: I *do* use them, they're now simply calculated at load time. I see your second question coming: if they're calculated at load time, smoothing groups are lost, aren't they ? Wrong! Because the mesh topology ( and in particular how the vertex and index arrays were set up ) was still done taking the normals into account. They're simply not saved. Imagine a cube: if it has 8 vertices and you generate normals at load-time, you'll end up with the normals of a sphere. If it already has 24 different vertices, and each triangle correctly references the vertices, when building the normals you'll end up with the normals of a cube again. Not saving the normals can save up to 25% of the file size, and generating them at load time is lightning-fast. All benefit.

Textures streaming

My main work in the past 2 weeks has been to implement texture streaming. To do that, I introduced a new file format: .TXB files. TXB litteraly means "TeXture Binary". In a .TXB file is stored the texture in DXT compressed ( or uncompressed ) format, and all its mipmaps.

I'm using the squish library to perform texture compression. It's extremely easy to use, and features many encoding algorithms, of variable performance/quality. When a texture is stored compressed on disk, the compression ratio is around 1:4 to 1:6, and can be uploaded directly to the video card without any trick. It's an order of magnitude faster than the old way ( which was to upload the uncompressed texture, and do the texture compression on the fly via the driver ).

Small story: when I first benchmarked the upload of textures in compressed form, I was disapointed by the performance. It was slower than uploading an uncompressed texture and compressing it through the driver. After some debugging, I realized that I forgot to upload the mipmaps, and had let the driver do the mipmaps generation. I bet that internally, it had to decompress my newly-uploaded compressed texture, generate the mipmaps, and for each of those do the compression again :) After I submitted the mipmaps in compressed form too, performance went as expected :) End of small story.

The main interest of .TXB textures is that they support streaming. A 64x64 version of the .TXB texture is first read and uploaded to the video card, while the upper mipmaps get submitted into a queue for a thread to load from disk. When the thread has done its work and when all the higher-resolution mipmaps are available, the final texture is getting uploaded.

Benchmarks have shown that it takes around 200 milliseconds to "get the hand back" before the rendering can continue, after you load a ship such as the Kasparov, that contains tens of 1024x1024 textures. It takes many seconds before the high-res textures become available, and so you can see those textures "pop" in real-time.. but now it doesn't freeze for tens of seconds like in the ICP anymore. Of course, in game, I'm expecting this popping to be invisible, since the ship/objects will be quite far from the camera when they enter in view and starting to stream their textures.

Memory leaks

Every 6 months or so, I enable a special mode in the engine to set the memory allocator to track memory allocations/deallocations, to detect memory leaks. I was surprised to notice a lot of leaks coming from my CString class, and more particularly when I had an array of strings.

After a lot of debugging, I came to the conclusion that my CArray class was fundamentally flawed, which is quite ironic considering that so far I've been using it for 3 years without any serious problem..

I should have known better..

Programmers have a big ego. I'm no exception to the rule. I only trust my code. And so, my CArray class is a class that I have coded myself over the years, with its own interface, and that I thought was pretty much bug-free. I was always sceptikal about using the C++ standard library's classes, as I did not know what they were doing internally, and prefered to trust my own classes for performance. Over the years, I regularly so experienced programmers on sites such as gamedev.net claim that the std library was very well done and bug free. Despite this, I continued to use my own classes. Why take the trouble of switching to the std lib when my classes were doing their job fine, and had ideal performance ?

Well, I was wrong.

First lesson learnt: my CArray class works well as long as I have integral types ( integer, floats, etc.. ) or pointers to objects. It also works fine when I store simple structures. It doesn't work when I store structures that contain pointers to other objects, or complex objects. The reason ? Constructors/destructors.

Now it sounds obvious, but when I allocated memory in my CArray class, I used malloc/realloc/free, and there's no object constructor/destruction with those. I didn't have much trouble so far because as soon as my classes start to become complex, I prefer to store pointers to them. But every once in a while, I store an object itself, and that's where leaks and bugs appear.

Second lesson learnt: the std::vector class is not slow. It is bug-free ( unlike mine ), and performs pretty much as fast. By debugging the std::vector code, I noticed it used an allocator that grew the number of elements by 2 each time. My own class doing the same, it's quite ironic that I've been using my own class for so long just because I prefered to trust it as far as performance goes, while the std::vector class used the same allocation scheme and was as fast.

After carefully studing the way std::vector worked, I decided to give another go at using my own CArray class. No, I'm no masochist :) But I saw some potential for optimization. The first one being: using malloc/realloc avoids the copy/fill mechanism used in std::vector, meaning no construction/destructions needed when the array grows. A first good point for me. Of course, this time I had to take care of correctly constructing/destructing objects, meaning playing with "placement new" and other goodies.. but no big deal.

Another good point: because I knew what is the most frequent usage I'm doing in my engine of my CArray class ( namely, appending data ), I could optimize the case and even create a special interface to append many elements to the array. I called this the "beginAppend/doAppend/endAppend" mechanism.

After this, I benchmarked my classes: the old ( buggy ) one, std::vector, and my new CArray class. Verdict: std::vector and my old class are on par as far as performance goes, but my new CArray class, in the context of appending millions of elements, is in average 2 to 3.5 times faster ! So I haven't wasted my time, finally.

After I updated my CArray class in the engine core, I tracked memory leaks again and was happy to find none.

While I was fixing the CArray class, I also reviewed the manager/managed objects mechanisms ( reference-count based ) since it was, in rare occasions, corrupting memory. The problem was coming from a CMesh object being referenced by a CObject; the CMesh was deleted but the CObject still had a pointer to it, and when trying to delete the CObject, it tried to release() ( decrease the reference count ) to it, which was corrupting memory.

Next week, I'm planning to resume work on the terrain engine.. and to play with texturing again :)

*goes into screenshot-withdrawl and starts twitching*

Must...have...screenshots...*gah*

*passes out*

Quote:
 Over the years, I regularly so experienced programmers on sites such as gamedev.net claim that the std library was very well done and bug free. Despite this, I continued to use my own classes. Why take the trouble of switching to the std lib when my classes were doing their job fine, and had ideal performance ?
I know exactly where you're coming from. I did the same thing but once I started using the std objects, my coding sessions improved, my self esteem went up, I got laid more, people bought me free beers, and I was elected to be the first president of the moon. So the short story is, sacrifice a little ego and get international recognition in return. Put that in your vector and smoke it. I did. [smile]

Ok. I made all of that up. Except vextors are better.

Quote:
 Another good point: because I knew what is the most frequent usage I'm doing in my engine of my CArray class ( namely, appending data ), I could optimize the case and even create a special interface to append many elements to the array. I called this the "beginAppend/doAppend/endAppend" mechanism.

Is that really any faster than this:
std::vector<T> v;
// ...
v.reserve(v.size() + 2);

v.push_back(a1);
v.push_back(a2);



Maybe some syntax improvement is in order, but can the performance really be improved that much?

Quote:
 The first one being: using malloc/realloc avoids the copy/fill mechanism used in std::vector, meaning no construction/destructions needed when the array grow
This is not true. The C++ specifications explicitly state that vector<T> should NOT copy/fill if there is space available, pretty much the same way realloc works. That's why malloc/realloc have been deprecated.
Quote:
 could optimize the case and even create a special interface to append many elements to the array
This is not necessary. When you properly use the stl algorithms (i.e. copy()), the needed space is automatically pre-allocated, without the need of adding specific class-level algorithms. That's the beauty of (most of) the stl (forget about strings) and of boost.
Quote:
 but my new CArray class, in the context of appending millions of elements, is in average 2 to 3.5 times faster
This really surprises me, unless you're preallocating the data and then using memcpy (which can be done with vector too). How did you append your data? Using a naive .push_back() for every element?

Quote:
 Original post by nova77This is not true. The C++ specifications explicitly state that vector should NOT copy/fill if there is space available, pretty much the same way realloc works. That's why malloc/realloc have been deprecated.

When i wrote "when the array grows", I don't mean when appending an element, but when the pre-allocated maximum size is full. The array then grows by a factor of 2. To perform that operation, std::vector performs a new(), a copy of the data, and a delete() on the old array. Using realloc is potentially faster when the chunk of memory can "grow over itself", and you don't have to call the copy constructor.

Quote:
 Original post by Ysaneya When i wrote "when the array grows", I don't mean when appending an element, but when the pre-allocated maximum size is full. The array then grows by a factor of 2. To perform that operation, std::vector performs a new(), a copy of the data, and a delete() on the old array. Using realloc is potentially faster when the chunk of memory can "grow over itself", and you don't have to call the copy constructor.
And this is the case I was talking about. If a free bloc of memory is available right after the allocated one, then according to the standards vector does *not* performs a new() and then a copy(), but it just allocates the growing amount of data (by the given factor) right after the previous bloc. If there is no contiguous space, then it will have to new(), copy() and finally delete the old data, but that's *exactly* what realloc() does!

Besides, if you are really concerned about memory management, you can specify your own memory manager for the container. It is pretty trivial to write, and after that you don't have to deal with all the rest of the stuff. ;)

Quote:
 Original post by nova77And this is the case I was talking about. If a free bloc of memory is available right after the allocated one, then according to the standards vector does *not* performs a new() and then a copy(), but it just allocates the growing amount of data (by the given factor) right after the previous bloc.

And how can it do that ?

int *x = new int[10];
fill_array_with_random_values(x, 10);


How do you grow this array to, say 20 elements, without a copy/fill ?

The only thing I can think of is doing:

int *newX = new int[20];
memcpy(newX, x, sizeof(int) * 10);