memcpy versus for-loop

8 comments, last by Zahlman 18 years, 8 months ago
Anyone know which is faster, and if one is faster, by how much? Doing a memcpy or going through a for-loop and copying. Example:

short dst[256];
short src[256];

// memcpy
memcpy(dst, src, sizeof(short) * 256);

// for-loop
for (int i = 0; i < 256; i++)
   dst[i] = src[i];
Or are these more or less equivalent?
Depends on whether the memcpy ends up doing it byte by byte or can copy the memory in larger chunks. Profile it or look at the disassembly; it's probably implementation dependent.
Given the platform I'm working on, it's somewhat hard to profile it. If we assume we copy 2 bytes at a time (the same amount as the for-loop), would one be faster than the other or no?
They are probably pretty equivalent.
The memcpy version will, at the very least, move your memory in 4-byte chunks (if you're working on a 32-bit system). Also, I think that if you use memcpy, the compiler can optimize it to use lower-level batch memory copies, which could potentially be much faster. I'm not positive about that. Anyway, just use memcpy; that's what it's there for.

Also, if your program has access to the system's time, you could always run a ghetto profiler. Record the starting time in seconds, do the memory copy a few hundred million times, and record the elapsed time.
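A rough sketch of that kind of timing harness, assuming the platform has a working std::chrono clock. The names time_copy, copy_with_memcpy, copy_with_loop, g_src and g_dst are made up for illustration. An optimizing compiler may hoist or drop repeated identical copies, so treat the numbers as a hint and check the disassembly if they look suspiciously small.

```cpp
#include <chrono>
#include <cstddef>
#include <cstring>

const std::size_t kCount = 256;

// Hypothetical buffers matching the sizes in the original question.
short g_src[kCount];
short g_dst[kCount];

// Copy using the library routine.
void copy_with_memcpy() {
    std::memcpy(g_dst, g_src, sizeof(g_src));
}

// Copy one element at a time.
void copy_with_loop() {
    for (std::size_t i = 0; i < kCount; ++i) {
        g_dst[i] = g_src[i];
    }
}

// Hypothetical helper: runs `copy` `iterations` times and returns the
// elapsed wall-clock time in seconds.
template <typename CopyFn>
double time_copy(CopyFn copy, std::size_t iterations) {
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t n = 0; n < iterations; ++n) {
        copy();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Call time_copy(copy_with_memcpy, N) and time_copy(copy_with_loop, N) with the same large N and compare the two results; run each a few times, since a single measurement can be noisy.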
I think it is highly unlikely that doing memcpy is slower.

However, this post is asking if one thing is faster than another or not. This means two things: a) you are optimising code and b) you have not profiled the code. This combination breaks the first, second and third rules of optimisation :)

Do the simplest: memcpy. If you are unable to tell which is fastest, you don't care, and I very much doubt the compiler writers would have written a memcpy which is slower than yours anyway.
Here is what memcpy.c contains for msvc++ 7.1

void * __cdecl memcpy (
        void * dst,
        const void * src,
        size_t count
        )
{
        void * ret = dst;

#if defined (_M_MRX000) || defined (_M_ALPHA) || defined (_M_PPC) || defined (_M_IA64)
        {
        extern void RtlMoveMemory( void *, const void *, size_t count );

        RtlMoveMemory( dst, src, count );
        }
#else  /* defined (_M_MRX000) || defined (_M_ALPHA) || defined (_M_PPC) || defined (_M_IA64) */
        /*
         * copy from lower addresses to higher addresses
         */
        while (count--) {
                *(char *)dst = *(char *)src;
                dst = (char *)dst + 1;
                src = (char *)src + 1;
        }
#endif  /* defined (_M_MRX000) || defined (_M_ALPHA) || defined (_M_PPC) || defined (_M_IA64) */

        return(ret);
}

Since Win32 doesn't meet any of these defines (does it?), it appears to me that it does a byte copy. Who says it is guaranteed to copy using the machine's word size? Additionally, I've read in many different places that memcpy can be sped up significantly with custom versions that do use the machine's word size, since clearly, from this result, memcpy doesn't do so in some cases. Even with that said, I would still not jump to any conclusions about the OP's question until profiling it, but a wild guess after looking at the memcpy implementation would have me thinking the custom loop could fare better.
Under MSVC, memcpy is one of the functions classified as a compiler intrinsic, which means that calls to the function may be replaced inline by machine code that performs the copy instead of calling the function. For example:
void my_memcpy(void * dst, void * src, size_t count) {
  memcpy(dst, src, count);
}

Produces this assembly:
; 4    :   memcpy(dst, src, count);
	mov	ecx, DWORD PTR _count$[esp-4]
	push	esi
	mov	esi, DWORD PTR _src$[esp]
	mov	eax, ecx
	push	edi
	mov	edi, DWORD PTR _dst$[esp+4]
	shr	ecx, 2
	rep movsd
	mov	ecx, eax
	and	ecx, 3
	rep movsb
	pop	edi
	pop	esi

when compiled with the appropriate release mode switches.

Most likely in an actual usage context, rather than this simple wrapper function, the compiler could use the alignment of data types to simplify the assembly produced.

edit: spelling good
Ahh, ok good to know. Thanks
Uh, isn't anyone else going to promote std::copy? :(

std::copy(src, src+256, dst); // I'm pretty sure that's right.
// Has the same syntactical "feel" as memcpy, but doesn't require you to remember
// to account for the datatype size. Also, will do compile-time dispatching to
// automatically do this in the fastest way that your implementation knows how to,
// for any given type. Oh, and it's idiomatic C++ :D

(But if you're working in plain C, then yeah, just go with the memcpy for primitive types. For structs containing pointers, you'd better be prepared to think about it a bit more. :) )
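To make the pointer caveat concrete, here is a small sketch. Sprite, copy_shorts and shallow_copy are hypothetical names invented for this example. A byte-wise copy of a struct duplicates the pointer value, not the data it points to, so both copies end up sharing one buffer.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// Hypothetical struct for illustration: it holds a raw pointer to pixel data.
struct Sprite {
    int    width;
    short* pixels;
};

// Copies `count` shorts with std::copy; no sizeof arithmetic is needed.
void copy_shorts(const short* src, short* dst, std::size_t count) {
    std::copy(src, src + count, dst);
}

// Byte-wise copy of the struct. The `pixels` pointer is copied as a plain
// address, so the result refers to the same buffer as the original.
Sprite shallow_copy(const Sprite& original) {
    Sprite result;
    std::memcpy(&result, &original, sizeof(Sprite));
    return result;
}
```

After shallow_copy, writing through the copy's pixels pointer also changes what the original sees. If each Sprite is supposed to own its pixels, the buffer has to be allocated and copied separately.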

