• Create Account

## Using 3rd party libraries or code your own?

Old topic!

Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

22 replies to this topic

### #1rAm_y_  Members

686
Like
0Likes
Like

Posted 24 June 2014 - 04:36 AM

So I was looking through the Unreal 4 source and I see that they have everything as you would imagine, a whole string library pretty much the same as Boost. Now I guess they are so used to always writing their own as libraries as libs like boost haven't been around for long but I am guessing they could knock maybe up to a third of their code off from using libs like Boost, and various other highly optimised libs.

So what are your thoughts here, write your own or use third party?

### #2SmkViper  Members

5364
Like
8Likes
Like

Posted 24 June 2014 - 07:42 AM

POPULAR

It's going to depend.

For my own personal projects I tend to write my own because I enjoy learning new things by writing "from scratch" as it were. Though for things I've already written before and/or understand I'll use a library (i.e. C++ STL library) to save time. In these cases execution speed isn't a huge deal because I'm not writing huge AAA games with the latest tech at home.

Professionally it's also going to be a mix. You'll find a lot of companies using middleware like Havok or Unreal, texture loaders like DirectXTex, XML parsing libraries, etc. Not aware of many using boost - it tends to very quickly bloat compilation times and has the same problem STL has in that it's optimized for the general case.

There is, however, a bit of a stigma against the C++ STL library due to studios being burned by badly written implementations in the past - and in some cases, companies will write their own implementation of things like arrays or strings because they can actually write them faster or more memory efficient then the "standard" version for their particular use case.

A good example is the strings you brought up. A common thing to do is to write a string class that stores string data in a large table, and then if someone makes another string with the same data it points at the table instead of allocating more memory. Not only does this save on memory over the STL implementation (which has to allocate memory for every string copy), but you can get lightning-fast string comparisons because you only have to compare pointers and not whole strings. The downside is that these kinds of strings are immutable - but for a game, that is rarely a problem.

Edit: Most of the above should probably be prefixed with "In my experience" - though that may be assumed

Edited by SmkViper, 24 June 2014 - 07:50 AM.

### #3frob  Moderators

41243
Like
10Likes
Like

Posted 24 June 2014 - 08:01 AM

POPULAR

a whole string library pretty much the same as Boost.

When it comes to the C++ standard library, and also tasks like string manipulation, there is a lot of subtlety involved beyond just the public-facing API.

Carefully read this report given to the C++ standards committee. You will probably find it enlightening, as it describes many of the tiny reasons the standard library doesn't precisely fit our industry's needs. Some of the reasons are common, even after 20 years there is no really good solution to the problem of allocators inside template collections as they relate to code boundaries. Others are less common, like the desire to bulk-destruct objects without calling their destructors.

Check out my book, Game Development with Unity, aimed at beginners who want to build fun games fast.

Also check out my personal website at bryanwagstaff.com, where I occasionally write about assorted stuff.

### #4Stainless  Members

1875
Like
3Likes
Like

Posted 24 June 2014 - 03:44 PM

The decision to use a third party library is never an easy one.

I personally don't believe in re-inventing the wheel, if someone has already done a really good job of doing something, why should I invest the time doing the same ?

However there are times when "the best out there" isn't good enough. Then you need to get involved and do some serious coding.

It's a cost versus advantage equation.

If using an external library saves you 3 man months of coding at the cost of 1ms per loop, it's probably an easy decision ... until you really need that millisecond

We use third party code, and pay a lot for the privilege. It's not always a good call. I've just saved 100's of milliseconds in a game because I noticed a problem in a third party library.

Now this library is very good, don't get me wrong, at times it's brilliant. However it was designed for a small number of animation tracks. We use thousands of animation tracks. Huge numbers of the damn things. So the code was doing something it wasn't designed to do, and hence it was slow.

I added three lines of code and increased the speed by several thousand percent. The code was simple, finding the problem was a bitch. I'm talking a 20 year old bitch with massive PMT.

Would we have had that problem if we wrote it ourselves? Hell no!

Would we get the game out on time if we had wrote it ourselves...... probably not.

### #5L. Spiro  Members

24826
Like
8Likes
Like

Posted 24 June 2014 - 06:05 PM

POPULAR

As frob mentioned you will find in certain cases that the motivation behind rewriting certain things (such as the STL or standard library) is related to needs specific to our industry.
High-level companies and individuals do it because they can write code that is faster, more suited to our needs, or both.
Electronic Arts explains the motivations behind EASTL here.

I myself use my own CVector and CVectorPoD, which is optimized for plain-ol’-data objects that can be moved in large chunks at a time (in memory resizes etc.) instead of one-by-one.
I also use my own MemCmp, which behaves the same as std::memcmp (and performs the same), but I also have MemCmpF which saves cycles by returning only true or false (as apposed to -1, 0, or 1, which requires a per-byte scan once a discrepancy between the inputs is found).

So when you are rewriting specific functions or classes, it typically boils down to:
• Am I satisfying a specific need while retaining similar performance or better, or;
• Am I simply improving performance significantly?
Boost is something you definitely do not want to use in your game projects.

In regards to actual libraries, things get more complex.
I originally wrote all my image-loading routines for .BMP, .TGA, .GIF, .PNG, and .DDS, which was a good experience and fun to do, but finally recently decided to add support for every file format ever to exist by just using FreeImage.
I don’t regret taking the time to reinvent a few wheels at least temporarily.  Reinventing wheels is a great way to get a hands-on understanding of how things work, and if everyone never reinvented wheels then technology would improve at a snail’s pace.

Additionally, the whole reason I started my engine was to make a physics engine.  I did make a miniature one and it was used in a 爆丸バトルブローラーズ (Bakugan Battle Brawlers) game for Activision, which was a nice experience, but it was very basic and I will be using Bullet and PhysX this time around.

Ultimately, you need to decide if using a 3rd-party library is worth it for you based on what it brings and how much you care about your own personal balance on time, but nothing stops you from trying to make the same thing if you are interested in trying to learn that type of technology first-hand.

Libraries help you get things done faster but simple things such as string-compares are good exercises and larger things such as physics engines really help you understand how complex systems work under-the-hood.

L. Spiro

Edited by L. Spiro, 26 June 2014 - 01:17 AM.

### #6ChaosEngine  Members

4745
Like
2Likes
Like

Posted 24 June 2014 - 08:40 PM

The main advantage of a 3rd party library is not that the code is written for you. The main advantage is that the code has been tested and debugged for you.

Do not underestimate that. You might look at a library and be horrified at the bloat and hacks and god knows what else, but most of the time, that boat and those hacks represent hours/day/weeks of work tracking down obscure bugs in strange environments.

Always be aware of that if you decide to write your own code instead of using a library.

if you think programming is like sex, you probably haven't done much of either.-------------- - capn_midnight

### #7SeraphLance  Members

2403
Like
0Likes
Like

Posted 24 June 2014 - 10:33 PM

The main issue with pre-existing libraries is that they rarely do exactly what you want, and it's frequently the case that said library needs a lot of wrapping before you can even use its unwieldly API.  You end up doing things like writing C interface layers to poorly documented image processing libraries that think they're cute because they use longjmp.

That said, as terrible as most libraries tend to be, there tends to be at least one that's a very close fit to your requirements.  The hard part is finding it, and the harder part is testing it, because those libraries aren't typically the massively battle-tested ones.

Edited by SeraphLance, 24 June 2014 - 10:33 PM.

### #8Buster2000  Members

4055
Like
0Likes
Like

Posted 25 June 2014 - 05:34 AM

The other thing to consider as well as the very valid points mentioned above is that the Unreal 4 code base is based on code that has been around for a long time.  Sure Epic may claim that it has been rewritten form scratch but I highly doubt that they wrote every single line.  If they have a string or a collection library that they have been using and maintaining since they release Unreal in 1998 then I really doubt they are going to replace it.

### #9Kian  Members

243
Like
2Likes
Like

Posted 25 June 2014 - 08:00 AM

but I also have MemCmpF which saves cycles by returning only true or false (as apposed to -1, 0, or 1, which requires a byte-per-byte scan once a discrepancy between the inputs is found).

You'd be at most saving one comparison actually. As soon as you find a byte that's different (not sure if it compares byte by byte or word by word or whatever, but the case is the same regardless of how it does it), you check if the byte is bigger and return based on that.

If you have four bytes 0x01 0x02 0x03 0x04 and compare them to 0x01 0x03 0x03 0x04, as soon as you check the second byte you know the second string is bigger, so you return 1. If the second string was 0x01 0x00 0x03 0x04 you'd return -1 after checking the second byte. The first byte that differs is the most significant byte that differs, so that's all you need.

As for the actual topic, there's always a desire to roll you own when you realize you need something. You always need to be aware of the 80/20 rule, though: 80% of the work will be done in 20% of the time. The last 20% will take 80% of the time. That is to say, you can get most of the way in a short time, which may make you think that you saved a lot of time, but actually taking care of all the edge cases, testing, debugging and the like that would get you all the functionality you need will take you the most time. Deciding whether to roll your own or use 3rd party libraries needs to be carefully considered, there's no simple answer.

The first thing you should do is understand what your requirements are, exactly. The better you can define your requirements, the easier it is to make a decision. Then look at the documentation for the library. Does it match your requirements? How much work would you need to do to get it all the way there? Look around, too, don't just settle for the first library you find. After this you should have a list of libraries that match your requirements. Compare those to determine which one is the best. Does it have an active community of users (good for getting help if you're stuck, and probably means it is in active development), does it have good documentation, does it have a bug tracker, are there any disqualifying bugs still around, is the interface reasonable and easy to use, is the build system simple, etc. Once you have the best of these, compare what it does to how much effort it would take you to match the parts that you actually need.

If you are convinced you can do it better in less time that it would take you to learn the interface, go for it. Otherwise, use the library.

Of course, writing libraries is great for learning too. If time isn't an issue (meaning, generally, a hobby project), do it yourself if you want to. What I generally do is write from scratch things that I think are interesting, and use libraries for things that are boring.

### #10cozzie  Members

4623
Like
0Likes
Like

Posted 25 June 2014 - 10:36 AM

For me it depends, when it comes to the standard libraries like vector, map, strings etc., I believe I couldn't do it better/ more efficient. When it comes to for example graphics handling I tend to just use the API and create libraries/ code around it myself. Not that I can do that better maybe then other available libraries, but it helps me in learning a lot.

Crealysm game & engine development: http://www.crealysm.com

Looking for a passionate, disciplined and structured producer? PM me

### #11frob  Moderators

41243
Like
1Likes
Like

Posted 25 June 2014 - 12:27 PM

but I also have MemCmpF which saves cycles by returning only true or false (as apposed to -1, 0, or 1, which requires a byte-per-byte scan once a discrepancy between the inputs is found).

You'd be at most saving one comparison actually. ... As for the actual topic, there's always a desire to roll you own when you realize you need something. ... The better you can define your requirements, the easier it is to make a decision.
Looks like you might also benefit from taking time to understand the requirements.

While it is true that the return value is a tiny difference, the actual comparisons can be radically different. Parallel search for mismatch is an algorithm family that can have superlinear speedup. So if you split it up into 4 chunks of work you can do the job much faster than 1/4 the time on average; sometimes dividing into 4 parts and not caring abot exact mismatch position or direction could give 8x speedup rather than a linear 4x speedup, for example. If you are not that particular about the exact nature of the mismatch and are willing to do some algorithmic transformations you can get much faster results than a standard library's memory compare in many situations.

Check out my book, Game Development with Unity, aimed at beginners who want to build fun games fast.

Also check out my personal website at bryanwagstaff.com, where I occasionally write about assorted stuff.

### #12Kian  Members

243
Like
0Likes
Like

Posted 25 June 2014 - 07:38 PM

Well, if we're getting right down to the stated requirements, L. Spiro said the goal of his memcmp was to save cycles, not time.

The parallel search will have strictly worse cost in cycles spent, just to get the basic boolean result. Trying to get the +1 or -1 result would actually require a significant increase in complexity (compared to checking the first byte that differed).

This is because the single threaded compare only needs to search until it finds a discrepancy, and it can quit early. Parallel compare, on the other hand, has to split the memory into chunks and start every chunk at the same time. Even if you allow for threads to quit as soon as they find a discrepancy, the threads that didn't find one will continue until they are done.

Worse case for the library memcmp is if both strings are equal, since it has to compare it all to return 0. This is the same for the parallel compare, since all the threads will have to do the same number of comparisons. In every other case, every chunk that compares equal has to be run to completion in either parallel or single threaded compare, but the single threaded model won't compare any chunks after the first difference, while the parallel compare will have to compare at the very least the first byte of each chunk. A consequence of this, in fact, is that as you increase the number of chunks, your algorithm becomes faster but also more wasteful. Take the best case scenario, for example, where every byte is different. memcmp will check the first byte and return. Parallel compare will check n bytes and return, where n is the number of chunks you have.

And I'm not even comparing the cost of launching all the threads. Even for perfect threads that cost nothing to create, the algorithm is strictly worse for every case in terms of cycles spent.

### #13L. Spiro  Members

24826
Like
2Likes
Like

Posted 26 June 2014 - 01:14 AM

The parallel search will have strictly worse cost in cycles spent, just to get the basic boolean result. Trying to get the +1 or -1 result would actually require a significant increase in complexity (compared to checking the first byte that differed).

Putting an on-topic spin to this, this is one of the reasons reinventing wheels can be a good thing.

Kian thinks std::memcmp() is so basic, “It’s just comparing bytes until it finds a discrepancy,” but if he had taken the time to write his own routine he would have found it to be at least 4 times slower than std::memcmp().

Even if you don’t think you can write a faster routine, nor think reinventing wheels is a good idea, you should still rewrite some of the STD routines as a learning exercise. You really lose nothing. If yours is faster, then you’ve just made a faster version (just for bugs). If yours is substantially slower, you will be inclined to ask why that is, investigate, and learn.

So, first-off, Kian, it is not implemented as a byte-to-byte compare.
Look at these 2 arrays of 8 bytes:

const u8 pu8One[] = { 0, 1, 2, 3, 4, 5, 6, 7 };
const u8 pu8Two[] = { 0, 1, 2, 3, 4, 5, 7, 6 };

You could do this:

for ( u32 I = 0; I < 8; ++I ) {
if ( (*pu8One++) != (*pu8Two++) ) { /* Found a mismatch. */ }
}

…which takes 7 iterations to discover a mismatch.

Or you could compare either 4 or 8 bytes at a time (depending on the architecture):

const UINT_PTR * puptrOne = reinterpret_cast<const UINT_PTR *>(pu8One);
const UINT_PTR * puptrTwo = reinterpret_cast<const UINT_PTR *>(pu8Two);
for ( UINT_PTR I = 0; I < 8 / sizeof( UINT_PTR ); ++I ) {
if ( (*puptrOne++) != (*puptrTwo++) ) { /* Found a mismatch. */ }
}

…which finds a mismatch in 2 iterations on x86 and 1 iteration on x64.

This is just basic but is already a speed-up of 4 or 8 times. When you don’t need to return -1 or 1 on a mismatch, you are done. Since you aren’t done if you do, by definition returning -1 or 1 is slower than just returning true or false.

Can you think of a way to make it even faster?

L. Spiro

Edited by L. Spiro, 26 June 2014 - 07:48 AM.

### #14belfegor  Members

2833
Like
1Likes
Like

Posted 26 June 2014 - 02:01 AM

^ Nice, but what if pu8One/pu8Two have odd and non-POT number of elements?

### #15Olof Hedman  Members

5699
Like
2Likes
Like

Posted 26 June 2014 - 02:18 AM

The main advantage of a 3rd party library is not that the code is written for you. The main advantage is that the code has been tested and debugged for you.

Although this is certainly true, it's also dangerous to assume it is extremely well tested for your exact cases.

Taking in 3rd party code is also a risk.

It could speed up development a lot, but it can also trip you on the finish line when some obscure bug in it appears, and you have to scramble to either have it fixed or work around it.

If you have the source code to it, you are a bit safer, but understanding a 3rd party library enough to safely make changes to it also takes some time.

Just wanted to underline that the choice isn't straight forward...

Edited by Olof Hedman, 26 June 2014 - 02:19 AM.

### #16Bacterius  Members

13100
Like
1Likes
Like

Posted 26 June 2014 - 02:46 AM

On big-endian architectures you can actually just do the final comparison for 4/8 byte chunks just as you would for a byte-by-byte approach (of course, most consumer hardware is little-endian). I would also imagine that unless your buffers are in cache to begin with, your implementation is likely going to be memory-bound. In short, memcmp is not the first function I would look at when micro-optimizing, compared to, say, memory allocation patterns, excessive use of virtual functions, poor cache usage, etc... but, yes, it will probably make you a better programmer to really understand how it is typically implemented on a variety of architectures.

“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”

### #17Stainless  Members

1875
Like
0Likes
Like

Posted 26 June 2014 - 04:25 AM

Just wanted to underline that the choice isn't straight forward...

Agree one hundred percent.

When people write a library, they write it with a set of assumptions in mind. Those assumptions may or may not be true for your use case.

I've just had to write a work around for a very well known and respected library. They made an assumption that an animation rig would only have a very small number of float tracks attached to it. This assumption is very wrong in our case. Simply getting an integer from the system using a string as a key was taking several milliseconds because they just did a linear walk looking for the string in the array.

So every time you used this function you were doing a hell of a lot of string compares.

I simply used a hashmap to replace the linear walk and saved 100's of milliseconds per frame. A massive speed up for very little code.

Did take a hell of a long time to find the problem though :>

Third party libraries are great as long as you know there limitations

### #18Kian  Members

243
Like
0Likes
Like

Posted 26 June 2014 - 07:14 AM

Kian thinks std::memcmp() is so basic, “It’s just comparing bytes until it finds a discrepancy,” but if he had taken the time to write his own routine he would have found it to be at least 4 times slower than std::memcmp().

Actually, on my first reply to you I clarified:

(not sure if it compares byte by byte or word by word or whatever, but the case is the same regardless of how it does it)

I'm using byte to stand in for whatever your largest integral type is, since regardless of the size of your comparisons, the algorithm is the same. I didn't want to get into architectural details like how many bytes you can check per instruction. The effect is that you are checking for the first bit that differs. A byte just lets you check 8 bits at a time, a 32 bit pointer 32 bits at a time and a 64 bit pointer 64 bits at a time.

I did agree that returning true or false was faster, I just didn't perhaps entirely understand what you meant by needing to check byte by byte to get -1 or +1. With your example:

const UINT_PTR * puptrOne = reinterpret_cast<const UINT_PTR *>(pu8One);
const UINT_PTR * puptrTwo = reinterpret_cast<const UINT_PTR *>(pu8Two);
for ( UINT_PTR I = 0; I < 8 / sizeof( UINT_PTR ); ++I ) {
if ( (*puptrOne++) != (*puptrTwo++) ) {
/* Found a mismatch. */
if ( *--puptrOne > *--puptrTwo ){
return +1;
}
else {
return -1;
}
return 0;
}
There's no byte-by-byte checking required. As soon as you find a mismatch it takes a single comparison, regardless of how many bits you're checking at a time.

### #19SmkViper  Members

5364
Like
0Likes
Like

Posted 26 June 2014 - 07:53 AM

Kian thinks std::memcmp() is so basic, “It’s just comparing bytes until it finds a discrepancy,” but if he had taken the time to write his own routine he would have found it to be at least 4 times slower than std::memcmp().

Actually, on my first reply to you I clarified:

(not sure if it compares byte by byte or word by word or whatever, but the case is the same regardless of how it does it)

I'm using byte to stand in for whatever your largest integral type is, since regardless of the size of your comparisons, the algorithm is the same. I didn't want to get into architectural details like how many bytes you can check per instruction. The effect is that you are checking for the first bit that differs. A byte just lets you check 8 bits at a time, a 32 bit pointer 32 bits at a time and a 64 bit pointer 64 bits at a time.

I did agree that returning true or false was faster, I just didn't perhaps entirely understand what you meant by needing to check byte by byte to get -1 or +1. With your example:

const UINT_PTR * puptrOne = reinterpret_cast<const UINT_PTR *>(pu8One);
const UINT_PTR * puptrTwo = reinterpret_cast<const UINT_PTR *>(pu8Two);
for ( UINT_PTR I = 0; I < 8 / sizeof( UINT_PTR ); ++I ) {
if ( (*puptrOne++) != (*puptrTwo++) ) {
/* Found a mismatch. */
if ( *--puptrOne > *--puptrTwo ){
return +1;
}
else {
return -1;
}
return 0;
}
There's no byte-by-byte checking required. As soon as you find a mismatch it takes a single comparison, regardless of how many bits you're checking at a time.

That's actually more checking then you need. You'll note that memcmp (and several other C comparison functions) does NOT return -1, 0, and 1. They return <0, 0, or >0. This is because they are most likely doing the following:

auto result = value2 - value1;
return result;

No conditionals. If you're having to iterate, then you only need to do a single compare of the result against 0 and return if the result is non-0. That's going to be generally faster then two conditionals and exact -1,0,1 return values.

### #20L. Spiro  Members

24826
Like
0Likes
Like

Posted 26 June 2014 - 08:02 AM

I did agree that returning true or false was faster, I just didn't perhaps entirely understand what you meant by needing to check byte by byte to get -1 or +1. With your example:

It returns -1 or 1 on the first byte that mismatches.

A value greater than zero indicates that the first byte that does not match in both memory blocks has a greater value in ptr1 than in ptr2 as if evaluated as unsigned char values; And a value less than zero indicates the opposite.

As Bacterius correctly mentions, you can compare all 4 or 8 bytes at a time on big-endian machines and return the correct value, but I chose the 2 arrays I did intentionally to illustrate the problem you have with little-endian machines (Windows, iOS, Android, Mac OS X…).

If you cast pu8One and pu8Two to DWORD’s for the compare you would find a difference between them on reinterpret_cast<DWORD *>(&pu8One[4]) and reinterpret_cast<DWORD *>(&pu8Two[4]) (let’s call those addresses pdwOne and pdwTwo).
(*pdwOne) == 117835012 (0x07060504).
(*pdwTwo) == 101123332 (0x06070504).
This means you would return 1, when in fact you should return -1 (on a per-byte compare, the first non-matching byte is greater in pu8Two).

I am not going to just give you the solution.  Coming up with the correct solutions on your own is good for training and is exactly the purpose in challenging yourself to rewrite seemingly mundane things such a memory compares, string compares, string lengths, etc.

This doesn’t directly answer the original poster’s question as to why big names rewrite things, but it illustrates that reinventing wheels, even when not necessarily providing a real-world application, is a great way to practice critical thinking and learn computer-specific tricks that will help you forever into the future (or at least until you stop programming).
You quickly brushed off memcmp as trivial, but if you had ever taken the time to try it for yourself you would find its hidden complexities lie in its subtleties.

L. Spiro

Edited by L. Spiro, 26 June 2014 - 08:36 AM.

Old topic!

Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.