Ah, you're right. I didn't go back to check the spec for memcmp.
No conditionals. If you're having to iterate, then you only need a single compare of the result against 0, returning if the result is non-zero. That's generally going to be faster than two conditionals and exact -1, 0, 1 return values.
I didn't brush it off as trivial. I was confused by you saying that you would need to check byte-by-byte after finding a mismatch. Yes, little endian vs big endian is something I don't often think about, since I'm not generally working at that level.
You quickly brushed off memcmp as trivial, but if you had ever taken the time to try it for yourself you would find its complexities lie in its hidden subtleties.
Take the byte pattern 0x01 0x02 0x03 0x04 on a little-endian architecture and compare it against 0x02 0x02 0x03 0x03. We want the second pattern to come out larger, since its first byte is larger. But when we read each pattern as a 32-bit word, they get interpreted as 0x04030201 and 0x03030202, so the processor will think the second pattern is smaller.
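To make that concrete, here is a small sketch that builds both interpretations of the same four bytes by hand, so it behaves the same on any host. The helper names `loadLittleEndian32` and `loadBigEndian32` are made up for illustration and are not from any library.

```cpp
#include <cstdint>

// Hypothetical helper: the value a little-endian CPU sees when it loads
// four consecutive bytes as one 32-bit word (first byte is least significant).
inline std::uint32_t loadLittleEndian32(const unsigned char* p) {
    return  std::uint32_t(p[0])        | (std::uint32_t(p[1]) << 8)
         | (std::uint32_t(p[2]) << 16) | (std::uint32_t(p[3]) << 24);
}

// Hypothetical helper: the same four bytes with the first byte treated as
// most significant, which matches the byte-by-byte ordering memcmp uses.
inline std::uint32_t loadBigEndian32(const unsigned char* p) {
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16)
         | (std::uint32_t(p[2]) << 8)  |  std::uint32_t(p[3]);
}
```

With the two patterns above, `loadLittleEndian32` ranks the second pattern below the first, while `loadBigEndian32` ranks it above, which is the order a byte-wise comparison would give.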
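I suppose that I would then do:
const UINT_PTR * puptrOne = reinterpret_cast<const UINT_PTR *>(pu8One);
const UINT_PTR * puptrTwo = reinterpret_cast<const UINT_PTR *>(pu8Two);
for ( UINT_PTR I = 0; I < 8 / sizeof( UINT_PTR ); ++I ) {
    if ( (*puptrOne++) != (*puptrTwo++) ) {
        /* Found a mismatch. Both pointers have already advanced past the
           mismatching word, hence the negative indices below. */
        auto rPtrOne = reinterpret_cast<const unsigned char*>( puptrOne );
        auto rPtrTwo = reinterpret_cast<const unsigned char*>( puptrTwo );
        // For a UINT_PTR of size 4. I could do a bit of template or macro magic to have it choose at compile time
        // something appropriate for size 8 UINT_PTR.
        // Shift left (not right) so the first byte in memory becomes the most significant.
        UINT_PTR reverseValueOne = UINT_PTR( rPtrOne[-4] ) << 24 | UINT_PTR( rPtrOne[-3] ) << 16 | UINT_PTR( rPtrOne[-2] ) << 8 | UINT_PTR( rPtrOne[-1] );
        UINT_PTR reverseValueTwo = UINT_PTR( rPtrTwo[-4] ) << 24 | UINT_PTR( rPtrTwo[-3] ) << 16 | UINT_PTR( rPtrTwo[-2] ) << 8 | UINT_PTR( rPtrTwo[-1] );
        // An unsigned difference can come out with the wrong sign once converted to int,
        // so combine two comparison results instead of subtracting.
        return ( reverseValueOne > reverseValueTwo ) - ( reverseValueOne < reverseValueTwo );
    }
}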
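For anyone who wants to try this out, here is a self-contained sketch of the same idea. It is only one way to do it, under some assumptions of mine: it uses `std::uint32_t` instead of `UINT_PTR` so it builds outside Windows, loads words with `std::memcpy` so unaligned buffers are fine, and handles leftover bytes one at a time. The names `wordwiseCompare` and `assembleInMemoryOrder` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: combine four bytes so that the first byte in memory
// is the most significant, matching memcmp's byte-by-byte ordering.
static std::uint32_t assembleInMemoryOrder(const unsigned char* p) {
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16)
         | (std::uint32_t(p[2]) << 8)  |  std::uint32_t(p[3]);
}

// Hypothetical memcmp-style comparison: negative, zero, or positive result.
int wordwiseCompare(const void* a, const void* b, std::size_t n) {
    const unsigned char* pa = static_cast<const unsigned char*>(a);
    const unsigned char* pb = static_cast<const unsigned char*>(b);
    std::size_t i = 0;
    // Fast path: compare four bytes at a time for equality only.
    for (; i + 4 <= n; i += 4) {
        std::uint32_t wa, wb;
        std::memcpy(&wa, pa + i, sizeof wa);
        std::memcpy(&wb, pb + i, sizeof wb);
        if (wa != wb) {
            // Mismatch: rebuild both words in memory order so the result
            // does not depend on the host's endianness.
            const std::uint32_t ra = assembleInMemoryOrder(pa + i);
            const std::uint32_t rb = assembleInMemoryOrder(pb + i);
            return (ra > rb) - (ra < rb);
        }
    }
    // Tail: fewer than four bytes remain.
    for (; i < n; ++i) {
        if (pa[i] != pb[i]) {
            return (pa[i] > pb[i]) - (pa[i] < pb[i]);
        }
    }
    return 0;
}
```

The `(x > y) - (x < y)` form gives -1, 0, or 1 without subtracting the unsigned values, so a large difference cannot flip sign when it is returned as an `int`.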
I suppose checking byte by byte also works. I'd need to check how the ifs compare, though I generally believe branching logic is much more expensive than following a single path.