How fast is a memcpy() ?

I am considering a system for storing client movements that will require copying a hunk of memory. In general it will be about a meg in size. If I have to do that 10 times a second, should I be concerned about its speed, or is a memcpy of a meg fast enough for me not to worry too much? Many thanks

Share on other sites
I believe that most x86 implementations of memcpy() take about 3 clock cycles per dword moved. But really, that's something you can find out by profiling a sample program.
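A sample program for that kind of profiling could look roughly like the sketch below. The helper name seconds_per_copy is made up for illustration, and clock() is a coarse timer, so treat the result as a ballpark figure rather than a precise measurement.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper (not from the thread): returns the average number of
   seconds one memcpy of `bytes` bytes takes, or -1.0 if allocation fails. */
static double seconds_per_copy(size_t bytes, int repeats)
{
    unsigned char *src, *dst;
    volatile unsigned char sink = 0;
    clock_t start, end;
    int i;

    assert(bytes > 0 && repeats > 0);
    src = malloc(bytes);
    dst = malloc(bytes);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    /* Touch both buffers first so page faults are not counted in the timing. */
    memset(src, 0xAB, bytes);
    memset(dst, 0, bytes);

    start = clock();
    for (i = 0; i < repeats; i++) {
        src[0] = (unsigned char)i;   /* vary the input between copies */
        memcpy(dst, src, bytes);
        sink ^= dst[bytes - 1];      /* read the result so the copy is not optimised away */
    }
    end = clock();

    (void)sink;
    free(src);
    free(dst);
    return (double)(end - start) / CLOCKS_PER_SEC / repeats;
}
```

Calling seconds_per_copy(1u << 20, 1000) gives the average time for a one-meg copy; multiplying that by 10 gives the time spent per second at the rate asked about.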

Share on other sites
I believe memcpy is fast enough for that operation 10x per sec if that's all you're doing. It's relatively fast, but people claim to have written even faster versions in assembly.
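As a rough back-of-the-envelope check, the cost can be estimated from the 3-cycles-per-dword guess quoted earlier in the thread. The function name cpu_fraction and the 500 MHz clock used in the note below are assumptions for illustration, not measurements.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical estimate: fraction of one CPU consumed by repeated copies,
   given a guessed cost per dword (4 bytes) and a clock rate in Hz. */
static double cpu_fraction(size_t bytes_per_copy, double copies_per_second,
                           double cycles_per_dword, double clock_hz)
{
    double dwords;
    double cycles_per_second;

    assert(clock_hz > 0.0);
    dwords = (double)bytes_per_copy / 4.0;
    cycles_per_second = dwords * cycles_per_dword * copies_per_second;
    return cycles_per_second / clock_hz;
}
```

With a one-meg copy (262,144 dwords), 10 copies a second, 3 cycles per dword, and an assumed 500 MHz clock, cpu_fraction(1048576, 10.0, 3.0, 500e6) comes to about 0.016, or roughly 1.6% of the CPU. Real numbers depend heavily on cache and memory bandwidth, so profiling is still the better answer.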

Share on other sites
Thanks guys... You're right of course, I SHOULD have profiled it myself... I just knew, though, that it would be easier to ask than to spend an afternoon creating an example that makes a giant array, fills it, then moves it... (I'm not the fastest coder around)

thanks again...

Share on other sites
A reply to the "faster version in assembly": for certain specialised tasks, you could probably speed it up. But in any recent, half-decent compiler, memcpy will utilise the system to its maximum performance and you wouldn't be able to go faster.

#pragma DWIM // Do What I Mean!
**I use Software Mode**

Share on other sites
If you go look at the Quake source code you'll see that Id had their own version of memcpy(). I can't find it myself right now, but if I remember correctly, it either did the copy itself or called the C memcpy(). It did the copy itself when certain conditions were met (which I think were the alignment of the data and/or the size of the data being a whole number of words). The point being that you could use a general-purpose byte-by-byte copy (memcpy()) or you could copy words at a time.

My guess is that nowadays this type of optimization is being done by the compiler, but you never know unless you go look at an assembler listing generated by a properly formulated piece of test code.

An exercise I will leave for the reader.

Mike Roberts
aka milo
mlbobs@telocity.com

Share on other sites
quote:
Original post by milo

If you go look at the Quake source code you'll see that Id had their own version of memcpy

I seem to recall Abrash saying that the implementation they were using only copied a byte at a time, so it was necessary to rewrite the function for speed.

Of course, it's been a while since I read that, so don't quote me on it.

Share on other sites
Well, let's see if I can do this code thing correctly. This is from COMMON.C, part of the Quake source code. It is #if'd out, but I don't know if that is how it was when this code was new (a long, long time ago). As you can see, it does either a byte-by-byte copy (which is what memcpy() would have been doing at the time, in some compilers at least) or an int-by-int copy. So who's going to go check what VC++ does?

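void Q_memcpy (void *dest, void *src, int count)
{
  int i;

  /* OR the two addresses and the count together: if the low two bits
     of the result are clear, all three are multiples of 4 */
  if ((((long)dest | (long)src | count) & 3) == 0)
  {
    count >>= 2;
    for (i = 0; i < count; i++)
    {
      ((int *)dest)[i] = ((int *)src)[i];
    }
  }
  else
  {
    for (i = 0; i < count; i++)
    {
      ((byte *)dest)[i] = ((byte *)src)[i];
    }
  }
}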
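The alignment test in that function can be tried out on its own. The sketch below is not the Quake code: the names all_multiple_of_4 and sketch_copy are made up, uintptr_t stands in for the (long) casts, and each word is moved through a small memcpy so the example stays within the C aliasing rules (compilers typically turn that into a single load and store).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 when dest, src and count are all multiples of 4.
   ORing the three values merges their low bits, so one mask checks all of them. */
static int all_multiple_of_4(const void *dest, const void *src, size_t count)
{
    return (((uintptr_t)dest | (uintptr_t)src | (uintptr_t)count) & 3u) == 0;
}

/* Copies 4 bytes at a time when everything is 4-byte aligned,
   otherwise falls back to one byte at a time. */
static void sketch_copy(void *dest, const void *src, size_t count)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    size_t i;

    assert(dest != NULL && src != NULL);
    if (all_multiple_of_4(dest, src, count)) {
        for (i = 0; i < count; i += 4) {
            uint32_t word;
            memcpy(&word, s + i, sizeof word);
            memcpy(d + i, &word, sizeof word);
        }
    } else {
        for (i = 0; i < count; i++) {
            d[i] = s[i];
        }
    }
}
```

Note that a single odd address or an odd count sends the whole copy down the byte path, which is one reason library versions usually copy a few leading bytes to reach alignment and then switch to words.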

Mike Roberts
aka milo
mlbobs@telocity.com

Share on other sites
// iSize must also be "DOUBLE DWORD ALIGNED" (a multiple of 8)
// --> (iSize + 7) & 0xFFFFFFF8
// --> or in asm, after adding 7 --> __asm and ecx, 0xFFFFFFF8

__asm mov ecx, iSize
__asm mov esi, pSource
__asm mov edi, pDest

mainloop:
__asm mov eax, [esi+ecx-4]
__asm mov ebx, [esi+ecx-8]
__asm mov [edi+ecx-4], eax
__asm mov [edi+ecx-8], ebx
__asm sub ecx, 8
__asm ja SHORT mainloop // loop while bytes remain; jnc would run one extra pass with ecx == 0

Please correct me if I'm wrong! (test can be faster if I'm not mistaken?)

Takes about 4 cycles per 8 byte copy.
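For anyone who does not read x86, the loop above corresponds roughly to the C sketch below. The names round_up_8 and copy_backward_8 are made up for illustration, and the sketch assumes the size has already been rounded to a multiple of 8 and that both buffers are at least that large.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round a byte count up to the next multiple of 8. */
static size_t round_up_8(size_t n)
{
    return (n + 7u) & ~(size_t)7u;
}

/* Copy `size` bytes, two dwords per pass, walking from the end of the
   buffers towards the start as the asm loop does. */
static void copy_backward_8(void *dest, const void *src, size_t size)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    assert((size & 7u) == 0);
    while (size != 0) {
        uint32_t hi, lo;
        memcpy(&hi, s + size - 4, sizeof hi);   /* mov eax, [esi+ecx-4] */
        memcpy(&lo, s + size - 8, sizeof lo);   /* mov ebx, [esi+ecx-8] */
        memcpy(d + size - 4, &hi, sizeof hi);   /* mov [edi+ecx-4], eax */
        memcpy(d + size - 8, &lo, sizeof lo);   /* mov [edi+ecx-8], ebx */
        size -= 8;                              /* sub ecx, 8 */
    }
}
```

Testing the size before each pass means a size of 0 copies nothing, which the asm version does not guarantee.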

Edited by - BasKuenen on May 16, 2000 7:25:43 PM


Share on other sites
I think it depends on the implementation. Under Visual C++ 6, I found the Win32 API's CopyMemory function to be nearly twice as fast as memcpy.
