Using 3rd party libraries or code your own?

21 comments, last by Kian 9 years, 10 months ago

No conditionals. If you're having to iterate, then you only need to do a single compare of the result against 0 and return if the result is nonzero. That's generally going to be faster than two conditionals and exact -1, 0, 1 return values.

Ah, you're right. I didn't go back to check the spec for memcmp.

You quickly brushed off memcmp as trivial, but if you had ever taken the time to try it for yourself, you would find that its complexity lies in hidden subtleties.

I didn't brush it off as trivial. I was confused by you saying that you would need to check byte-by-byte after finding a mismatch. Yes, little endian vs big endian is something I don't often think about, since I'm not generally working at that level.

If we have the pattern 0x01 0x02 0x03 0x04 on a little-endian architecture and compare it against 0x02 0x02 0x03 0x03, we would want the comparison to find the second pattern larger. But when we read them as words, they get interpreted as 0x04 0x03 0x02 0x01 and 0x03 0x03 0x02 0x02. So the processor will think the second pattern is smaller.
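A minimal sketch of that disagreement, using the two byte patterns above (the helper name is mine, not anything from a real library; it assumes a little-endian machine such as x86):

```cpp
#include <cstdint>
#include <cstring>

// Returns true when a naive 32-bit word comparison of two 4-byte buffers
// agrees with memcmp about which one is "less". On a little-endian machine
// this returns false for the patterns discussed above.
bool word_compare_matches_memcmp(const unsigned char a[4],
                                 const unsigned char b[4]) {
    std::uint32_t wa, wb;
    std::memcpy(&wa, a, 4);  // memcpy sidesteps strict-aliasing problems
    std::memcpy(&wb, b, 4);
    bool memcmp_says_a_less = std::memcmp(a, b, 4) < 0;
    bool word_says_a_less = wa < wb;
    return memcmp_says_a_less == word_says_a_less;
}
```

With a = 0x01 0x02 0x03 0x04 and b = 0x02 0x02 0x03 0x03, memcmp reports a < b (first differing byte, 0x01 < 0x02), while the little-endian words 0x04030201 and 0x03030202 compare the other way.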

I suppose that I would then do

const UINT_PTR * puptrOne = reinterpret_cast<const UINT_PTR *>(pu8One);
const UINT_PTR * puptrTwo = reinterpret_cast<const UINT_PTR *>(pu8Two);
for ( UINT_PTR I = 0; I < 8 / sizeof( UINT_PTR ); ++I ) {
    if ( (*puptrOne++) != (*puptrTwo++) ) { 
        /* Found a mismatch. */ 
        auto rPtrOne = reinterpret_cast<const unsigned char*>( puptrOne );
        auto rPtrTwo = reinterpret_cast<const unsigned char*>( puptrTwo );
        // For a UINT_PTR of size 4. I could do a bit of template or macro magic to have it choose at compile time 
        // something appropriate for size 8 UINT_PTR
        UINT_PTR reverseValueOne = UINT_PTR( rPtrOne[-4] ) << 24 | UINT_PTR( rPtrOne[-3] ) << 16 | UINT_PTR( rPtrOne[-2] ) << 8 | UINT_PTR( rPtrOne[-1] );
        UINT_PTR reverseValueTwo = UINT_PTR( rPtrTwo[-4] ) << 24 | UINT_PTR( rPtrTwo[-3] ) << 16 | UINT_PTR( rPtrTwo[-2] ) << 8 | UINT_PTR( rPtrTwo[-1] );
        // Compare instead of subtracting; unsigned subtraction can wrap.
        return reverseValueOne < reverseValueTwo ? -1 : 1;
    }
}
I suppose checking byte by byte also works. I'd need to check how the ifs compare, though I generally believe branching logic is much more expensive than following a single path.
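The byte-by-byte fallback avoids the byte-swap entirely, since comparing bytes in memory order is exactly memcmp's semantics regardless of endianness. A sketch (the function name is hypothetical):

```cpp
#include <cstddef>

// After a word-sized mismatch is found, re-scan just that word's bytes in
// memory order. The first differing byte decides the result, which matches
// memcmp semantics on both little- and big-endian machines.
int compare_bytes(const unsigned char* a, const unsigned char* b,
                  std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] != b[i]) {
            return a[i] < b[i] ? -1 : 1;
        }
    }
    return 0;
}
```

Since the word comparison already established that a mismatch exists within these n bytes, the loop body runs at most sizeof(UINT_PTR) times, so the extra branches are bounded.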

Since we've derailed to the point of discussing memory comparison...

What about alignment? Imagine I give you starting pointers that are not properly aligned for the architecture; let's say the first is aligned 3 bytes above a 16-byte boundary, and the other is 9 bytes above one. Now by blindly treating them as larger values you are taking a nasty penalty. Wrong alignment could destroy your performance far worse than a few additional comparisons.
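The usual remedy is to peel off leading bytes one at a time until a pointer reaches a word boundary, then switch to word-sized reads. A sketch of that alignment step (helper name is mine; note it only fully helps when both inputs share the same misalignment, as in the 3-vs-9 example above where both can't be word-aligned simultaneously):

```cpp
#include <cstddef>
#include <cstdint>

// Returns the first address at or after p that is a multiple of `alignment`
// (alignment must be a power of two in practice; this version uses % so it
// works for any nonzero value). Bytes before that address would be compared
// individually before the word-sized loop takes over.
const unsigned char* align_up(const unsigned char* p, std::size_t alignment) {
    auto addr = reinterpret_cast<std::uintptr_t>(p);
    std::uintptr_t misalign = addr % alignment;
    return misalign ? p + (alignment - misalign) : p;
}
```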

What about the locations they are reading from? What are the distances from each other? If by some horrible luck your buffers are located at LLC_size/cache_page_size distances from each other, at offsets that don't work with the chip's cache associativity, suddenly the chip's cache is completely useless. Every comparison evicts a line the other buffer still needs. Cache thrashing will far outweigh just about any other performance problem.

How long are the chunks of memory to be compared? Once the length is sufficiently large, the overhead of bigger parallelization, either across a single chip or across multiple chips, will eventually cross the threshold and become the faster solution. ... but detecting that situation costs several cycles which are unnecessary for very small memory sizes.

It is really hard to write code that works well in all cases. If you have some specialized knowledge you can usually write some limited algorithm that works much faster than a general purpose algorithm.

You might be able to craft an algorithm that works best because you know the memory does not overlap, is located exactly in 64-byte alignment, and is an exact 64-byte incremental length, and has a total length of less than or exactly 4096 bytes. With all those guarantees you can blow away the performance of a general purpose memory comparison function.
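As a toy illustration of exploiting such guarantees (everything here is hypothetical, not any real library's routine): when the buffers are non-overlapping, 64-byte aligned, and a multiple of the word size in length, and you only need equality rather than ordering, endianness stops mattering and the whole comparison collapses to a tight word loop with no prologue or epilogue.

```cpp
#include <cstddef>
#include <cstdint>

// Equality-only comparison under strong caller guarantees: both pointers
// 64-byte aligned, buffers non-overlapping, len a multiple of 8. Because we
// never ask which buffer is "larger", byte order inside each word is
// irrelevant, so no byte-swapping or byte-wise fallback is needed.
bool equal_aligned(const void* a, const void* b, std::size_t len) {
    auto pa = static_cast<const std::uint64_t*>(a);
    auto pb = static_cast<const std::uint64_t*>(b);
    for (std::size_t i = 0; i < len / 8; ++i) {
        if (pa[i] != pb[i]) return false;
    }
    return true;
}
```

A general-purpose memcmp must instead handle misalignment, tail bytes, overlap-agnostic ordering, and endianness, which is exactly the overhead the guarantees let you skip.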

Generally it is faster and easier for programmers to rely on other people's code.

You do it all the time.

You use the Windows libraries to handle your disk loads rather than writing huge amounts of code to directly handle every possible type of disk driver, attached through any type of connection from SCSI to IDE or SATA to PATA to a bunch of chained USB devices connecting to an arbitrary storage device. You just use FileOpen and let the third party code do the work.

You use Direct3D or OpenGL libraries to handle all the graphics work, rather than writing huge amounts of code for every kind of card and chipset that will map memory windows and transfer card-specific data for everything.

I remember back in the bad old days having to do much of that myself as a hobby developer in the late 1980s. Detecting an official SoundBlaster 16 Pro card usually worked, unless the user had one of a long list of inferior cards, where detecting it incorrectly could freeze the computer. Or maybe they had one of the Turtle Beach cards, so you needed to account for those nicer cards either directly or through their emulation layers. Then for graphics you needed to code directly to various EGA banks using a standard set of commands ... except for a short list of incompatible cards that didn't precisely implement the standard.

Use third party libraries, unless you have a good reason to use your own. Educational reasons are a good reason to use your own. Documented performance reasons are a good reason to use your own. Knowledge of the underlying systems is a good reason to use your own. Your ability to make guarantees is a good reason to use your own.

The original question was about Unreal using a collection of their own libraries that were similar to (but different from) several existing implementations. The reason is that Epic, for whatever reason, decided to implement their own. Usually game studios don't rewrite standard libraries unless they have some specific reason to do so. We can speculate about their reasons, or you can look up the exact implementation, see if they left any comments describing the motivation, and maybe even ask on their support forums if you still don't see it.

I agree. This is especially true for your language's libraries. Whoever implemented them for your system had more information about the environment than you do. They don't need to worry about portability. If you try to roll your own, however, you need to make sure it runs on every platform you might want to support. It can be a good learning experience, but I wouldn't use it in production code.

This topic is closed to new replies.
