Mantear

memcpy versus for-loop


Anyone know which is faster, and if one is faster, by how much? A memcpy, or going through a for-loop and copying element by element? Example:
short dst[256];
short src[256];

// memcpy
memcpy(dst, src, sizeof(short) * 256);

// for-loop
for (int i = 0; i < 256; i++)
   dst[i] = src[i];
Or are these more or less equivalent?

Depends whether the memcpy ends up doing it byte by byte or can optimally copy the memory in larger chunks. Profile it or look at the disassembly; it's probably implementation dependent.

Given the platform I'm working on, it's somewhat hard to profile. If we assume we copy 2 bytes at a time (the same amount as the for-loop), would one be faster than the other or not?

The memcpy version will, at the very least, move your memory in 4-byte chunks (if you're working on a 32-bit system). Also, I think that if you use memcpy, the compiler can optimize it into lower-level batch memory copies, which could potentially be much faster. I'm not positive about that. Anyway, just use memcpy; that's what it's there for.

Also, if your program has access to the system's time, you could always run a ghetto profiler. Record the starting time in seconds, do the memory copy a few hundred million times, and record the elapsed time.
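Something along these lines would do it (just a rough sketch of that ghetto profiler using clock(); the iteration count is arbitrary, and the volatile read at the end is only there to discourage the compiler from throwing the copies away):

#include <cstdio>
#include <cstring>
#include <ctime>

int main()
{
    static short src[256];
    static short dst[256];

    const long iterations = 100000000L;   // "a few hundred million times"

    std::clock_t start = std::clock();
    for (long i = 0; i < iterations; ++i)
        std::memcpy(dst, src, sizeof(short) * 256);
    std::clock_t stop = std::clock();

    volatile short sink = dst[0];          // use the result so the copies can't be discarded outright
    (void)sink;

    std::printf("%.3f seconds\n", (double)(stop - start) / CLOCKS_PER_SEC);
    return 0;
}

Swap the std::memcpy line for the for-loop version and compare the two timings. Be aware that an aggressive optimizer can still hoist or merge the copies, so take the numbers with a grain of salt.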

I think it is highly unlikely that doing memcpy is slower.

However, this post is asking if one thing is faster than another or not. This means two things: a) you are optimising code and b) you have not profiled the code. This combination breaks the first, second and third rules of optimisation :)

Do the simplest thing: memcpy. If you can't tell which is faster, the difference is too small to care about, and I very much doubt the compiler writers would have written a memcpy that is slower than yours anyway.

Here is what memcpy.c contains for MSVC++ 7.1:


void * __cdecl memcpy (
        void * dst,
        const void * src,
        size_t count
        )
{
        void * ret = dst;

#if defined (_M_MRX000) || defined (_M_ALPHA) || defined (_M_PPC) || defined (_M_IA64)
        {
                extern void RtlMoveMemory( void *, const void *, size_t count );

                RtlMoveMemory( dst, src, count );
        }
#else  /* defined (_M_MRX000) || defined (_M_ALPHA) || defined (_M_PPC) || defined (_M_IA64) */
        /*
         * copy from lower addresses to higher addresses
         */
        while (count--) {
                *(char *)dst = *(char *)src;
                dst = (char *)dst + 1;
                src = (char *)src + 1;
        }
#endif  /* defined (_M_MRX000) || defined (_M_ALPHA) || defined (_M_PPC) || defined (_M_IA64) */

        return(ret);
}

Since Win32 doesn't meet any of these defines (does it?), it appears to me that it does a byte copy. Who says it is guaranteed to do a copy using the machine's word size? Additionally, I've read in many different places that memcpy can be sped up significantly with custom versions that do use the machine's word size, since clearly, from this result, memcpy doesn't do it in some cases. Even with that said, I would still not jump to any conclusions about the OP's question until profiling it, but a wild guess after looking at the memcpy implementation would have me thinking the custom loop could fare better.
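For what it's worth, here is a minimal sketch of the kind of word-sized copy being described (a hypothetical word_memcpy, not anything from the CRT; it assumes both pointers are already aligned to the word size and leaves out the unaligned-head handling a production version would need):

#include <cstddef>   // size_t

void word_memcpy(void * dst, const void * src, size_t count)
{
    // Copy whole machine words first...
    unsigned long * d = (unsigned long *)dst;
    const unsigned long * s = (const unsigned long *)src;
    size_t words = count / sizeof(unsigned long);
    size_t tail  = count % sizeof(unsigned long);

    while (words--)
        *d++ = *s++;

    // ...then mop up any leftover bytes one at a time.
    char * dc = (char *)d;
    const char * sc = (const char *)s;
    while (tail--)
        *dc++ = *sc++;
}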

Under MSVC, memcpy is one of the functions classified as a compiler intrinsic, which means that calls to it may be replaced inline by machine code that performs the copy instead of an actual function call. For example:

void my_memcpy(void * dst, void * src, size_t count) {
    memcpy(dst, src, count);
}

Produces this assembly:

; 4 : memcpy(dst, src, count);

mov ecx, DWORD PTR _count$[esp-4]
push esi
mov esi, DWORD PTR _src$[esp]
mov eax, ecx
push edi
mov edi, DWORD PTR _dst$[esp+4]
shr ecx, 2
rep movsd
mov ecx, eax
and ecx, 3
rep movsb
pop edi
pop esi

when compiled with the appropriate release mode switches.
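(For reference, a command line along these lines is one way to get such a listing out of the Visual C++ compiler, assuming the wrapper above lives in a file called my_memcpy.cpp: /O2 turns on optimization, /Oi enables intrinsic expansion, and /FAs writes an assembly listing next to the object file.

cl /c /O2 /Oi /FAs my_memcpy.cpp

The listing then ends up in my_memcpy.asm.)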

Most likely, in an actual usage context rather than this simple wrapper function, the compiler could use the alignment of the data types to simplify the assembly produced.

edit: spelling good

Uh, isn't anyone else going to promote std::copy? :(


std::copy(src, src+256, dst); // I'm pretty sure that's right.
// Has the same syntactical "feel" as memcpy, but doesn't require you to remember
// to account for the datatype size. Also, will do compile-time dispatching to
// automatically do this in the fastest way that your implementation knows how to,
// for any given type. Oh, and it's idiomatic C++ :D


(But if you're working in plain C, then yeah, just go with the memcpy for primitive types. For structs containing pointers, you'd better be prepared to think about it a bit more. :) )
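If it helps, here is a small self-contained sketch of that (the Record type is just an invented example): for plain arrays like the OP's, std::copy typically dispatches to a memcpy/memmove-style bulk copy, while for an element type that owns resources it falls back to the type's assignment operator, which memcpy cannot do safely.

#include <algorithm>   // std::copy
#include <string>

struct Record
{
    std::string name;   // owns heap memory, so memcpy'ing a Record would be undefined behaviour
    int score;
};

int main()
{
    short src[256] = {};
    short dst[256];
    std::copy(src, src + 256, dst);   // equivalent to the memcpy above for plain old data

    Record a[4], b[4];
    std::copy(a, a + 4, b);           // fine: uses Record's copy assignment
    // memcpy(b, a, sizeof(a));       // not safe for a type like Record
    return 0;
}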
