memcpy performance

18 comments, last by FXO 22 years, 1 month ago
In the crtdll code memcpy does proceed one byte at a time. It's still fast. Assembly language aside, I'm curious how a void pointer could be copied any other way?

    /*
     * copy from lower addresses to higher addresses
     */
    while (count--) {
        *(char *)dst = *(char *)src;
        dst = (char *)dst + 1;
        src = (char *)src + 1;
    }



Edited by - lessbread on February 23, 2002 7:38:32 PM
"I thought what I'd do was, I'd pretend I was one of those deaf-mutes." - the Laughing Man
quote:Original post by fettodingo
Ah.. he meant it like that. Some newbies tend to think it copies everything in one lump in some magic way, but I guess he didn't mean that.


That would actually be nice. Maybe with quantum computers...
quote:Original post by LessBread
In the crtdll code memcpy does proceed one byte at a time. It's still fast. Assembly language aside, I'm curious how a void pointer could be copied any other way?


Maybe by casting to DWORD (unsigned long)?



DworD
That would cause problems for 3-byte objects (e.g. char trio[3]) and for objects whose sizes are not divisible by 4 (e.g. 6, 10, 13, or 17 bytes).

This page, Creating Small Win32 Executables, has replacement functions for buffer manipulation functions (at the bottom). They all use bytes for the actual transfer.

For a generalized function, byte-by-byte copying might be the only way, with the optimizer left to supply the speed.
This one (VS.NET) copies single bytes until it hits a 4-byte boundary, then copies 4 bytes at a time until fewer than 4 remain, and finishes with single bytes.

There are two versions -- one unoptimized that does a simple byte-by-byte copy, and this optimized one in asm.
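For anyone who wants to see that strategy in C rather than asm, here is a minimal sketch. It is not the VS.NET source: the name copy_aligned_words is made up, and aligning on the destination pointer (rather than the source) is an assumption. The fixed-size memcpy calls in the middle loop are only there to move 4 bytes without breaking alignment or aliasing rules; compilers typically turn each one into a single load or store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: head bytes, then 4-byte words, then tail bytes. */
void *copy_aligned_words(void *dst, const void *src, size_t count)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(d != NULL && s != NULL);

    /* Head: single bytes until the destination sits on a 4-byte boundary
       (assumption: the destination is the pointer being aligned). */
    while (count > 0 && ((uintptr_t)d & 3u) != 0) {
        *d++ = *s++;
        --count;
    }

    /* Body: 4 bytes per iteration while at least 4 remain. */
    while (count >= 4) {
        uint32_t word;
        memcpy(&word, s, sizeof word);   /* load 4 bytes  */
        memcpy(d, &word, sizeof word);   /* store 4 bytes */
        d += 4;
        s += 4;
        count -= 4;
    }

    /* Tail: the 0 to 3 bytes left over. */
    while (count > 0) {
        *d++ = *s++;
        --count;
    }

    return dst;
}
```

This also covers the odd sizes mentioned earlier in the thread: a 3-byte object never enters the word loop, and a 13-byte object gets its leftover bytes from the tail loop.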
char a[99999],*p=a;int main(int c,char**V){char*v=c>0?1[V]:(char*)V;if(c>=0)for(;*v&&93!=*v;){62==*v&&++p||60==*v&&--p||43==*v&&++*p||45==*v&&--*p||44==*v&&(*p=getchar())||46==*v&&putchar(*p)||91==*v&&(*p&&main(0,(char**)(--v+2))||(v=(char*)main(-1,(char**)++v)-1));++v;}else for(c=1;c;c+=(91==*v)-(93==*v),++v);return(int)v;}  /*** drpizza@battleaxe.net ***/
Doh! It was looking me right in the face. Thanks.
That would actually be nice. Maybe with quantum computers...

You can't copy an arbitrary quantum state in a quantum computer: that's the no-cloning theorem.
IIRC, due to the caching on current Intel CPUs, sequential memory access and copying was essentially the same speed regardless of the size of the individual blocks transferred.
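This is how you'd copy something four bytes at a time (any leftover 1 to 3 bytes are copied singly at the end):
inline void memcpy32 ( LPVOID Dest, LPVOID Source, UINT Size )
{
    _asm
    {
        mov edi, Dest
        mov esi, Source
        mov ecx, Size
        shr ecx, 2      ; Divide by four to get the DWORD count
        cld             ; Copy forward (low to high addresses)
        rep movsd       ; Copy ecx DWORDs
        mov ecx, Size
        and ecx, 3      ; Bytes left over when Size is not a multiple of 4
        rep movsb       ; Copy them one at a time (no-op when ecx is 0)
    }
}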


~CGameProgrammer( );

quote:Original post by TerranFury
IIRC, due to the caching on current Intel CPUs, sequential memory access and copying was essentially the same speed regardless of the size of the individual blocks transferred.

Simple benchtesting suggests that this is very much not the case.
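A rough harness for trying this yourself might look like the sketch below. None of it comes from the posts above: copy_bytes, copy_words and time_copy are made-up names, and clock() is a coarse timer, so use large buffers and many repetitions. Keep in mind that an optimizing compiler may vectorize both loops or swap them for a library call, which can hide the difference entirely, so check the generated assembly or build with optimization turned down before reading much into the numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

typedef void (*copy_fn)(unsigned char *, const unsigned char *, size_t);

/* One byte per iteration. */
void copy_bytes(unsigned char *d, const unsigned char *s, size_t n)
{
    while (n--) {
        *d++ = *s++;
    }
}

/* Four bytes per iteration, then the leftover 0 to 3 bytes singly. */
void copy_words(unsigned char *d, const unsigned char *s, size_t n)
{
    for (; n >= 4; n -= 4, d += 4, s += 4) {
        uint32_t w;
        memcpy(&w, s, sizeof w);
        memcpy(d, &w, sizeof w);
    }
    while (n--) {
        *d++ = *s++;
    }
}

/* Runs fn over n bytes reps times and returns the CPU seconds used. */
double time_copy(copy_fn fn, unsigned char *d, const unsigned char *s,
                 size_t n, int reps)
{
    clock_t start = clock();
    for (int i = 0; i < reps; ++i) {
        fn(d, s, n);
    }
    clock_t end = clock();

    /* Sanity check: the copy must actually have happened. */
    assert(memcmp(d, s, n) == 0);

    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Call time_copy once with copy_bytes and once with copy_words on the same source buffer, using a few megabytes and a few hundred repetitions, and compare the two results.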

