Archived

This topic is now archived and is closed to further replies.

memcpy performance

This topic is 5839 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

Hey there! I heard somewhere that memcpy only copies one byte at a time. If that is the case, then it would be very slow. Does anybody know if this is true and if so, what can you use instead? Thanks /Fredrik Olsson

Not true.

To the vast majority of mankind, nothing is more agreeable than to escape the need for mental exertion... To most people, nothing is more troublesome than the effort of thinking.

Ah, he meant it like that. Some newbies tend to think it copies everything in one lump in some magic way, but I guess he didn't mean that.

The answer is yes, no, and maybe.

It all depends on optimization level, compiler used and platform. Any compiler is free to implement memcpy() however they want as long as it works according to the standard definition.

I've found memcpy and memset to be bloody fast, so don't worry about it.

-----------------------------
Damnit Dave, I would have a link to Graphics Wars here, but you haven't put it up on GDNet yet!

The sad thing about artificial intelligence is that it lacks artifice and therefore intelligence.

Democracy is where you say what you want and do what you're told.

Sorry for the bad explanation.

I meant in MSVC6.0 with full optimization on.

The reason I asked is that I read a post on the OpenGL forum where somebody said something like:
"I'll never use memcpy again, it only copies one byte at a time"

If that were the case, the function would not take full advantage of the memory bandwidth.

Thanks
/Fredrik Olsson

It varies, but a good compiler should be able to unroll the loop and copy through registers if the size is small enough. The worst case is a byte-by-byte copy, and the slightly better one is the "rep movsd" style, which copies four bytes at a time.
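As a rough illustration of the "copy through registers" case: when the size is a small compile-time constant, optimizing compilers commonly replace the memcpy call with a plain register load and store. The helper name load_u32 is made up for this sketch, and whether the call really gets inlined depends on the compiler and the optimization level.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit value from a possibly unaligned buffer.
   The size is a compile-time constant (4), so an optimizing compiler is free
   to turn this memcpy into a single 32-bit load instead of a library call. */
static uint32_t load_u32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Looking at the generated assembly for a function like this is the easiest way to check what a particular compiler actually does.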

In the crtdll code memcpy does proceed one byte at a time. It's still fast. Assembly language aside, I'm curious how a void pointer could be copied any other way?

    
    /*
     * copy from lower addresses to higher addresses
     */

    while (count--) {
        *(char *)dst = *(char *)src;
        dst = (char *)dst + 1;
        src = (char *)src + 1;
    }



Edited by - lessbread on February 23, 2002 7:38:32 PM

quote:
Original post by fettodingo
Ah, he meant it like that. Some newbies tend to think it copies everything in one lump in some magic way, but I guess he didn't mean that.


That would actually be nice. Maybe with quantum computers...

quote:
Original post by LessBread
In the crtdll code memcpy does proceed one byte at a time. It's still fast. Assembly language aside, I'm curious how a void pointer could be copied any other way?



Maybe by casting to DWORD (unsigned long)?



DworD

That would cause problems for 3-byte objects (e.g. char trio[3]) and for objects whose sizes are not divisible by 4 (e.g. 6, 10, 13, or 17 bytes).

This page, Creating Small Win32 Executables, has replacement functions for buffer manipulation functions (at the bottom). They all use bytes for the actual transfer.

For a generalized function, byte by byte might be the only way, with the optimizer supplying the speed.

This one (VS.NET) does byte copies until it hits a 4-byte boundary, then copies 4 bytes at a time until it nears the end, where it goes back to bytes.

There are two versions -- one unoptimized that does a simple byte-by-byte copy, and this optimized one in asm.
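For anyone curious what that head/body/tail scheme looks like in plain C, here is a minimal sketch. It is not the VS.NET source: copy_aligned4 is a made-up name, and aligning on the destination pointer (rather than the source) is an assumption on my part.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch of a head/body/tail copy. copy_aligned4 is a made-up name,
   not the actual CRT routine. The buffers must not overlap. */
static void *copy_aligned4(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Head: single bytes until the destination is 4-byte aligned. */
    while (n > 0 && ((uintptr_t)d & 3u) != 0) {
        *d++ = *s++;
        n--;
    }

    /* Body: four bytes per iteration. The fixed-size memcpy keeps this
       legal C even when the source is unaligned; compilers usually
       lower it to one 32-bit load and one 32-bit store. */
    while (n >= 4) {
        uint32_t w;
        memcpy(&w, s, 4);
        memcpy(d, &w, 4);
        d += 4;
        s += 4;
        n -= 4;
    }

    /* Tail: the remaining 0 to 3 bytes. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

The tail loop is what makes sizes such as 3, 6, or 13 bytes work: the body handles whole 4-byte groups and the leftover bytes are copied one at a time.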

"That would actually be nice. Maybe with quantum computers..."

You can't copy anything in a quantum computer: the no-cloning theorem forbids it.

IIRC, due to the caching on current Intel CPUs, sequential memory access and copying was essentially the same speed regardless of the size of the individual blocks transferred.

This is how you'd copy something four bytes at a time:

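inline void memcpy32 ( LPVOID Dest, LPVOID Source, UINT Size )
{
    // Note: Size is assumed to be a multiple of four; any trailing
    // 1 to 3 bytes are not copied.
    _asm
    {
        mov edi, Dest      ; destination pointer
        mov esi, Source    ; source pointer
        mov ecx, Size      ; byte count
        shr ecx, 2         ; divide by four to get the DWORD count
        cld                ; clear direction flag: copy forward
        rep movsd          ; copy ECX DWORDs from [ESI] to [EDI]
    }
}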


~CGameProgrammer( );

quote:
Original post by TerranFury
IIRC, due to the caching on current Intel CPUs, sequential memory access and copying was essentially the same speed regardless of the size of the individual blocks transferred.

Simple benchtesting suggests that this is very much not the case.
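A simple bench test of this kind could be sketched as follows. This is not the test referred to above, only one possible setup: byte_copy, lib_copy, and time_copy are names invented for the sketch, and clock() measures CPU time at fairly coarse resolution, so large buffers and many repetitions are needed for meaningful numbers.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Naive byte-at-a-time copy. The volatile destination forces each byte
   store to be emitted individually, so the compiler cannot replace the
   loop with memcpy or vectorize it. */
static void byte_copy(void *dst, const void *src, size_t n)
{
    volatile unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--) {
        *d++ = *s++;
    }
}

/* Adapter so memcpy has the same signature as byte_copy. */
static void lib_copy(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}

/* CPU seconds spent running `copy` on n bytes, `reps` times. */
static double time_copy(void (*copy)(void *, const void *, size_t),
                        void *dst, const void *src, size_t n, int reps)
{
    clock_t start = clock();
    for (int i = 0; i < reps; i++) {
        copy(dst, src, n);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_copy once with byte_copy and once with lib_copy on the same buffers gives a rough comparison. The figures depend heavily on the compiler, the optimization flags, the buffer size, and whether the data fits in cache, so they are indicative at best.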
