# Using 3rd party libraries or code your own?

## 22 posts in this topic

So I was looking through the Unreal 4 source, and as you would imagine, they have written everything themselves, including a whole string library that is pretty much the same as Boost's. I guess they are so used to writing their own because libraries like Boost haven't been around that long, but I'd guess they could cut maybe up to a third of their code by using Boost and various other highly optimised libraries.

So what are your thoughts here, write your own or use third party?

##### Share on other sites

The main advantage of a 3rd party library is not that the code is written for you. The main advantage is that the code has been tested and debugged for you.

Do not underestimate that. You might look at a library and be horrified at the bloat and hacks and god knows what else, but most of the time, that bloat and those hacks represent hours/days/weeks of work tracking down obscure bugs in strange environments.

Always be aware of that if you decide to write your own code instead of using a library.

##### Share on other sites

The main issue with pre-existing libraries is that they rarely do exactly what you want, and it's frequently the case that said library needs a lot of wrapping before you can even use its unwieldy API.  You end up doing things like writing C interface layers to poorly documented image processing libraries that think they're cute because they use longjmp.

That said, as terrible as most libraries tend to be, there tends to be at least one that's a very close fit to your requirements.  The hard part is finding it, and the harder part is testing it, because those libraries aren't typically the massively battle-tested ones.

Edited by SeraphLance
##### Share on other sites

The other thing to consider, besides the very valid points mentioned above, is that the Unreal 4 code base is based on code that has been around for a long time.  Sure, Epic may claim that it has been rewritten from scratch, but I highly doubt that they wrote every single line.  If they have a string or collection library that they have been using and maintaining since they released Unreal in 1998, then I really doubt they are going to replace it.

##### Share on other sites

but I also have MemCmpF which saves cycles by returning only true or false (as opposed to -1, 0, or 1, which requires a byte-by-byte scan once a discrepancy between the inputs is found).

You'd be saving at most one comparison, actually. As soon as you find a byte that's different (not sure if it compares byte by byte or word by word or whatever, but the case is the same regardless of how it does it), you check whether that byte is bigger and return based on that.

If you have four bytes 0x01 0x02 0x03 0x04 and compare them to 0x01 0x03 0x03 0x04, as soon as you check the second byte you know the second string is bigger, so you return 1. If the second string was 0x01 0x00 0x03 0x04 you'd return -1 after checking the second byte. The first byte that differs is the most significant byte that differs, so that's all you need.
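That behaviour is easy to confirm against the standard routine itself; a quick sketch using the same byte patterns as above:

```cpp
#include <cassert>
#include <cstring>

// The sign of std::memcmp's result is decided entirely by the first
// differing byte; the bytes after it never matter.
void DemoMemcmpSign() {
    const unsigned char a[] = { 0x01, 0x02, 0x03, 0x04 };
    const unsigned char b[] = { 0x01, 0x03, 0x03, 0x04 };
    const unsigned char c[] = { 0x01, 0x00, 0x03, 0x04 };
    assert(std::memcmp(a, b, sizeof a) < 0);   // a[1] = 0x02 < b[1] = 0x03
    assert(std::memcmp(a, c, sizeof a) > 0);   // a[1] = 0x02 > c[1] = 0x00
    assert(std::memcmp(a, a, sizeof a) == 0);  // identical blocks compare equal
}
```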

As for the actual topic, there's always a desire to roll your own when you realize you need something. You always need to be aware of the 80/20 rule, though: 80% of the work will be done in 20% of the time, and the last 20% will take 80% of the time. That is to say, you can get most of the way in a short time, which may make you think that you saved a lot of time, but taking care of all the edge cases, testing, debugging and the like that would get you all the functionality you need will take the most time. Deciding whether to roll your own or use 3rd party libraries needs to be carefully considered; there's no simple answer.

The first thing you should do is understand what your requirements are, exactly. The better you can define your requirements, the easier it is to make a decision. Then look at the documentation for the library. Does it match your requirements? How much work would you need to do to get it all the way there? Look around, too, don't just settle for the first library you find. After this you should have a list of libraries that match your requirements. Compare those to determine which one is the best. Does it have an active community of users (good for getting help if you're stuck, and probably means it is in active development), does it have good documentation, does it have a bug tracker, are there any disqualifying bugs still around, is the interface reasonable and easy to use, is the build system simple, etc. Once you have the best of these, compare what it does to how much effort it would take you to match the parts that you actually need.

If you are convinced you can do it better in less time that it would take you to learn the interface, go for it. Otherwise, use the library.

Of course, writing libraries is great for learning too. If time isn't an issue (meaning, generally, a hobby project), do it yourself if you want to. What I generally do is write from scratch things that I think are interesting, and use libraries for things that are boring.
##### Share on other sites

For me it depends, when it comes to the standard libraries like vector, map, strings etc., I believe I couldn't do it better/ more efficient. When it comes to for example graphics handling I tend to just use the API and create libraries/ code around it myself. Not that I can do that better maybe then other available libraries, but it helps me in learning a lot.

##### Share on other sites

but I also have MemCmpF which saves cycles by returning only true or false (as opposed to -1, 0, or 1, which requires a byte-by-byte scan once a discrepancy between the inputs is found).

You'd be saving at most one comparison, actually. ... As for the actual topic, there's always a desire to roll your own when you realize you need something. ... The better you can define your requirements, the easier it is to make a decision.
Looks like you might also benefit from taking time to understand the requirements.

While it is true that the return value is a tiny difference, the actual comparisons can be radically different. Parallel search for mismatch is an algorithm family that can have superlinear speedup. So if you split the job into 4 chunks of work you can finish in much less than 1/4 of the time on average; sometimes dividing into 4 parts and not caring about the exact mismatch position or direction could give an 8x speedup rather than a linear 4x speedup, for example. If you are not that particular about the exact nature of the mismatch and are willing to do some algorithmic transformations, you can get much faster results than a standard library's memory compare in many situations.
##### Share on other sites

Well, if we're getting right down to the stated requirements, L. Spiro said the goal of his memcmp was to save cycles, not time.

The parallel search will have strictly worse cost in cycles spent, just to get the basic boolean result. Trying to get the +1 or -1 result would actually require a significant increase in complexity (compared to checking the first byte that differed).

This is because the single threaded compare only needs to search until it finds a discrepancy, and it can quit early. Parallel compare, on the other hand, has to split the memory into chunks and start every chunk at the same time. Even if you allow for threads to quit as soon as they find a discrepancy, the threads that didn't find one will continue until they are done.

Worst case for the library memcmp is when both strings are equal, since it has to compare everything to return 0. This is the same for the parallel compare, since all the threads will have to do the same number of comparisons. In every other case, every chunk that compares equal has to be run to completion in either parallel or single-threaded compare, but the single-threaded model won't compare any chunks after the first difference, while the parallel compare will have to compare at the very least the first byte of each chunk. A consequence of this, in fact, is that as you increase the number of chunks, your algorithm becomes faster but also more wasteful. Take the best-case scenario, for example, where every byte is different: memcmp will check the first byte and return, while the parallel compare will check n bytes, where n is the number of chunks you have.

And I'm not even comparing the cost of launching all the threads. Even for perfect threads that cost nothing to create, the algorithm is strictly worse for every case in terms of cycles spent.

##### Share on other sites

The parallel search will have strictly worse cost in cycles spent, just to get the basic boolean result. Trying to get the +1 or -1 result would actually require a significant increase in complexity (compared to checking the first byte that differed).

Putting an on-topic spin to this, this is one of the reasons reinventing wheels can be a good thing.

Kian thinks std::memcmp() is so basic, “It’s just comparing bytes until it finds a discrepancy,” but if he had taken the time to write his own routine he would have found it to be at least 4 times slower than std::memcmp().

Even if you don't think you can write a faster routine, or don't think reinventing wheels is a good idea, you should still rewrite some of the standard routines as a learning exercise. You really lose nothing. If yours is faster, then you've just made a faster version (just test it for bugs). If yours is substantially slower, you will be inclined to ask why that is, investigate, and learn.

So, first off, Kian, it is not implemented as a byte-by-byte compare.
Look at these 2 arrays of 8 bytes:

const u8 pu8One[] = { 0, 1, 2, 3, 4, 5, 6, 7 };
const u8 pu8Two[] = { 0, 1, 2, 3, 4, 5, 7, 6 };

You could do this:

for ( u32 I = 0; I < 8; ++I ) {
	if ( pu8One[I] != pu8Two[I] ) { /* Found a mismatch. */ }
}

…which takes 7 iterations to discover a mismatch.

Or you could compare either 4 or 8 bytes at a time (depending on the architecture):

const UINT_PTR * puptrOne = reinterpret_cast<const UINT_PTR *>(pu8One);
const UINT_PTR * puptrTwo = reinterpret_cast<const UINT_PTR *>(pu8Two);
for ( UINT_PTR I = 0; I < 8 / sizeof( UINT_PTR ); ++I ) {
if ( (*puptrOne++) != (*puptrTwo++) ) { /* Found a mismatch. */ }
}

…which finds a mismatch in 2 iterations on x86 and 1 iteration on x64.

This is just basic but is already a speed-up of 4 or 8 times. When you don’t need to return -1 or 1 on a mismatch, you are done. Since you aren’t done if you do, by definition returning -1 or 1 is slower than just returning true or false.

Can you think of a way to make it even faster?

L. Spiro

Edited by L. Spiro
##### Share on other sites

^ Nice, but what if pu8One/pu8Two have an odd or non-power-of-two number of elements?
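A common answer (a sketch only, assuming nothing about L. Spiro's actual MemCmpF): compare word-sized chunks while a full word remains, then finish any odd tail byte by byte. std::memcpy is used for the loads to sidestep alignment and strict-aliasing problems; compilers lower it to a plain register load.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Boolean memory compare: word-at-a-time body, byte-wise tail.
bool MemCmpF(const void* pOne, const void* pTwo, std::size_t len) {
    const unsigned char* a = static_cast<const unsigned char*>(pOne);
    const unsigned char* b = static_cast<const unsigned char*>(pTwo);
    // Compare a full machine word at a time while one remains.
    while (len >= sizeof(std::uintptr_t)) {
        std::uintptr_t wa, wb;
        std::memcpy(&wa, a, sizeof wa);  // avoids unaligned/aliasing UB;
        std::memcpy(&wb, b, sizeof wb);  // compiles down to a plain load
        if (wa != wb) { return false; }
        a += sizeof wa;
        b += sizeof wb;
        len -= sizeof wa;
    }
    // Finish the odd tail byte by byte.
    while (len--) {
        if (*a++ != *b++) { return false; }
    }
    return true;
}
```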

##### Share on other sites

The main advantage of a 3rd party library is not that the code is written for you. The main advantage is that the code has been tested and debugged for you.

Although this is certainly true, it's also dangerous to assume it is extremely well tested for your exact cases.

Taking in 3rd party code is also a risk.

It could speed up development a lot, but it can also trip you on the finish line when some obscure bug in it appears, and you have to scramble to either have it fixed or work around it.

If you have the source code to it, you are a bit safer, but understanding a 3rd party library enough to safely make changes to it also takes some time.

Just wanted to underline that the choice isn't straightforward...

Edited by Olof Hedman
##### Share on other sites

On big-endian architectures you can actually just do the final comparison for 4/8 byte chunks just as you would for a byte-by-byte approach (of course, most consumer hardware is little-endian). I would also imagine that unless your buffers are in cache to begin with, your implementation is likely going to be memory-bound. In short, memcmp is not the first function I would look at when micro-optimizing, compared to, say, memory allocation patterns, excessive use of virtual functions, poor cache usage, etc... but, yes, it will probably make you a better programmer to really understand how it is typically implemented on a variety of architectures.
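For little-endian machines, one well-known way to keep the word-at-a-time speed and still get memcmp's ordering is to byte-swap each word before comparing. A sketch (assuming a little-endian host, as in the discussion above; Swap32 is a hand-rolled swap that compilers typically recognize and lower to a single bswap/rev instruction):

```cpp
#include <cstdint>
#include <cstring>

// Portable 32-bit byte swap.
std::uint32_t Swap32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Compare one 4-byte chunk with memcmp semantics on a little-endian
// host: swapping restores "first byte is most significant" ordering.
int CmpWord32(const unsigned char* a, const unsigned char* b) {
    std::uint32_t wa, wb;
    std::memcpy(&wa, a, 4);  // memcpy sidesteps alignment/aliasing issues
    std::memcpy(&wb, b, 4);
    wa = Swap32(wa);
    wb = Swap32(wb);
    if (wa == wb) { return 0; }
    return (wa < wb) ? -1 : 1;
}
```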

##### Share on other sites

Just wanted to underline that the choice isn't straightforward...

Agree one hundred percent.

When people write a library, they write it with a set of assumptions in mind. Those assumptions may or may not be true for your use case.

I've just had to write a workaround for a very well known and respected library. They made an assumption that an animation rig would only have a very small number of float tracks attached to it. This assumption is very wrong in our case. Simply getting an integer from the system using a string as a key was taking several milliseconds, because they just did a linear walk looking for the string in the array.

So every time you used this function you were doing a hell of a lot of string compares.

I simply used a hashmap to replace the linear walk and saved hundreds of milliseconds per frame. A massive speed-up for very little code.

Did take a hell of a long time to find the problem though :>

Third party libraries are great as long as you know their limitations.
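The fix described above boils down to something like this sketch (Track, FindLinear, and BuildIndex are hypothetical stand-ins, not the actual library's API):

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the library's float-track records.
struct Track {
    std::string name;
    int index;
};

// O(n) per lookup: the linear walk the library did on every query.
int FindLinear(const std::vector<Track>& tracks, const std::string& key) {
    for (const Track& t : tracks) {
        if (t.name == key) { return t.index; }
    }
    return -1;
}

// Build the hashmap once; each later lookup is then O(1) on average.
std::unordered_map<std::string, int> BuildIndex(const std::vector<Track>& tracks) {
    std::unordered_map<std::string, int> map;
    for (const Track& t : tracks) {
        map.emplace(t.name, t.index);
    }
    return map;
}
```

With hundreds of tracks and many lookups per frame, paying the one-time build cost and querying the map is the whole speed-up.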

##### Share on other sites

Kian thinks std::memcmp() is so basic, “It’s just comparing bytes until it finds a discrepancy,” but if he had taken the time to write his own routine he would have found it to be at least 4 times slower than std::memcmp().

Actually, on my first reply to you I clarified:

(not sure if it compares byte by byte or word by word or whatever, but the case is the same regardless of how it does it)

I'm using byte to stand in for whatever your largest integral type is, since regardless of the size of your comparisons, the algorithm is the same. I didn't want to get into architectural details like how many bytes you can check per instruction. The effect is that you are checking for the first bit that differs. A byte just lets you check 8 bits at a time, a 32 bit pointer 32 bits at a time and a 64 bit pointer 64 bits at a time.

I did agree that returning true or false was faster, I just didn't perhaps entirely understand what you meant by needing to check byte by byte to get -1 or +1. With your example:

const UINT_PTR * puptrOne = reinterpret_cast<const UINT_PTR *>(pu8One);
const UINT_PTR * puptrTwo = reinterpret_cast<const UINT_PTR *>(pu8Two);
for ( UINT_PTR I = 0; I < 8 / sizeof( UINT_PTR ); ++I ) {
	if ( (*puptrOne++) != (*puptrTwo++) ) {
		/* Found a mismatch. */
		if ( *--puptrOne > *--puptrTwo ) {
			return 1;
		}
		return -1;
	}
}
return 0;
There's no byte-by-byte checking required. As soon as you find a mismatch it takes a single comparison, regardless of how many bits you're checking at a time.
##### Share on other sites

There's no byte-by-byte checking required. As soon as you find a mismatch it takes a single comparison, regardless of how many bits you're checking at a time.

That's actually more checking than you need. You'll note that memcmp (and several other C comparison functions) does NOT return -1, 0, and 1. It returns <0, 0, or >0. This is because it is most likely doing the following:

auto result = value1 - value2;
return result;

No conditionals. If you're having to iterate, then you only need to do a single compare of the result against 0 and return the result if it is non-zero. That's going to be generally faster than two conditionals and exact -1, 0, 1 return values.
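Rolled into a full routine, that idea looks something like this (a sketch of the classic byte-wise approach, not any particular library's implementation):

```cpp
#include <cstddef>

// memcmp-style compare returning <0, 0, or >0. Each unsigned char is
// promoted to int before the subtraction, so the difference itself
// carries the sign of the first mismatching byte; no extra branches
// are needed to build the return value.
int MemCmp(const void* pOne, const void* pTwo, std::size_t len) {
    const unsigned char* a = static_cast<const unsigned char*>(pOne);
    const unsigned char* b = static_cast<const unsigned char*>(pTwo);
    for (std::size_t i = 0; i < len; ++i) {
        int diff = a[i] - b[i];
        if (diff != 0) { return diff; }  // first difference decides everything
    }
    return 0;
}
```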
##### Share on other sites

I did agree that returning true or false was faster, I just didn't perhaps entirely understand what you meant by needing to check byte by byte to get -1 or +1. With your example:

And herein lies your problem.
Think about what std::memcmp() returns.
It returns a negative or positive value based on the first byte that mismatches.

A value greater than zero indicates that the first byte that does not match in both memory blocks has a greater value in ptr1 than in ptr2 as if evaluated as unsigned char values; And a value less than zero indicates the opposite.

As Bacterius correctly mentions, you can compare all 4 or 8 bytes at a time on big-endian machines and return the correct value, but I chose the 2 arrays I did intentionally to illustrate the problem you have with little-endian machines (Windows, iOS, Android, Mac OS X…).

If you cast pu8One and pu8Two to DWORD’s for the compare you would find a difference between them on reinterpret_cast<DWORD *>(&pu8One[4]) and reinterpret_cast<DWORD *>(&pu8Two[4]) (let’s call those addresses pdwOne and pdwTwo).
(*pdwOne) == 117835012 (0x07060504).
(*pdwTwo) == 101123332 (0x06070504).
This means you would return 1, when in fact you should return -1 (on a per-byte compare, the first non-matching byte is greater in pu8Two).

I am not going to just give you the solution.  Coming up with the correct solutions on your own is good for training and is exactly the purpose in challenging yourself to rewrite seemingly mundane things such as memory compares, string compares, string lengths, etc.

This doesn’t directly answer the original poster’s question as to why big names rewrite things, but it illustrates that reinventing wheels, even when not necessarily providing a real-world application, is a great way to practice critical thinking and learn computer-specific tricks that will help you forever into the future (or at least until you stop programming).
You quickly brushed off memcmp as trivial, but if you had ever taken the time to try it for yourself you would find its hidden complexities lie in its subtleties.

L. Spiro

Edited by L. Spiro
##### Share on other sites

No conditionals. If you're having to iterate, then you only need to do a single compare of the result against 0 and return the result if it is non-zero. That's going to be generally faster than two conditionals and exact -1, 0, 1 return values.

Ah, you're right. I didn't go back to check the spec for memcmp.

You quickly brushed off memcmp as trivial, but if you had ever taken the time to try it for yourself you would find its hidden complexities lie in its subtleties.

I didn't brush it off as trivial. I was confused by you saying that you would need to check byte-by-byte after finding a mismatch. Yes, little endian vs big endian is something I don't often think about, since I'm not generally working at that level.

If we have the pattern 0x01 0x02 0x03 0x04, in a little endian architecture, and compare against 0x02 0x02 0x03 0x03, we would want this to find the second pattern to be larger. But when we read it, they get interpreted as 0x04 0x03 0x02 0x01 and 0x03 0x03 0x02 0x02. So the processor will think the second pattern is smaller.

I suppose that I would then do
const UINT_PTR * puptrOne = reinterpret_cast<const UINT_PTR *>(pu8One);
const UINT_PTR * puptrTwo = reinterpret_cast<const UINT_PTR *>(pu8Two);
for ( UINT_PTR I = 0; I < 8 / sizeof( UINT_PTR ); ++I ) {
	if ( (*puptrOne++) != (*puptrTwo++) ) {
		/* Found a mismatch. */
		auto rPtrOne = reinterpret_cast<const unsigned char*>( puptrOne );
		auto rPtrTwo = reinterpret_cast<const unsigned char*>( puptrTwo );
		// For a UINT_PTR of size 4. I could do a bit of template or macro magic to have it choose at compile time
		// something appropriate for a size-8 UINT_PTR.
		UINT_PTR reverseValueOne = UINT_PTR( rPtrOne[-4] ) << 24 | UINT_PTR( rPtrOne[-3] ) << 16 | UINT_PTR( rPtrOne[-2] ) << 8 | UINT_PTR( rPtrOne[-1] );
		UINT_PTR reverseValueTwo = UINT_PTR( rPtrTwo[-4] ) << 24 | UINT_PTR( rPtrTwo[-3] ) << 16 | UINT_PTR( rPtrTwo[-2] ) << 8 | UINT_PTR( rPtrTwo[-1] );
		return ( reverseValueOne < reverseValueTwo ) ? -1 : 1;
	}
}
return 0;

I suppose checking byte by byte also works. I'd need to check how the ifs compare, though I generally believe branching logic is much more expensive than following a single path.
##### Share on other sites
I agree. This is especially true for your language's libraries. Whoever implemented it for your system will have more information about the environment than you do. They don't need to worry about portability. If you try to roll your own, however, you need to make sure it can run on every platform you might want to support. It can be a good learning experience, but I wouldn't use it in production code.