
memcpy performance

FXO · 2002-02-25T02:01:37

Hey there! I heard somewhere that memcpy only copies one byte at a time. If that is the case, then it would be very slow. Does anybody know if this is true and if so, what can you use instead? Thanks /Fredrik Olsson


Started by FXO February 23, 2002 10:46 AM

18 comments, last by FXO 22 years, 1 month ago

LessBread

1,415

February 23, 2002 06:37 PM

In the crtdll code memcpy does proceed one byte at a time. It's still fast. Assembly language aside, I'm curious how a void pointer could be copied any other way?

    /*
     * copy from lower addresses to higher addresses
     */
    while (count--) {
        *(char *)dst = *(char *)src;
        dst = (char *)dst + 1;
        src = (char *)src + 1;
    }

Edited by - lessbread on February 23, 2002 7:38:32 PM

"I thought what I'd do was, I'd pretend I was one of those deaf-mutes." - the Laughing Man

Gorg

248

February 23, 2002 11:31 PM

quote:Original post by fettodingo
Ah.. he meant it like that. Some newbies tend to think it copies everything in one lump in some magic way, but I guess he didn't mean that.

That would actually be nice

Maybe with quantum computers...

dword2002

122

February 24, 2002 04:04 AM

quote:Original post by LessBread
In the crtdll code memcpy does proceed one byte at a time. It's still fast. Assembly language aside, I'm curious how a void pointer could be copied any other way?

Maybe by casting to a DWORD (unsigned long) pointer?

DworD


LessBread

1,415

February 24, 2002 04:32 AM

That would cause problems for 3-byte objects (e.g. char trio[3]) and for objects whose sizes are not divisible by 4 (e.g. 6, 10, 13, or 17 bytes).

This page, Creating Small Win32 Executables, has replacement functions for buffer manipulation functions (at the bottom). They all use bytes for the actual transfer.

For a generalized function, byte-by-byte copying might be the only way, with the optimizer left to supply the speed.
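The size objection can be worked around by moving whole 4-byte chunks first and finishing the remainder one byte at a time. The following is only a minimal sketch of that idea: the name copy_by_dwords is hypothetical, the buffers are assumed not to overlap, and the fixed-size memcpy calls stand in for "move one 32-bit word" because that is the portable way to express it in C (compilers typically reduce them to a single load and store).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not taken from any CRT source.
 * Copies n bytes in 4-byte chunks, then finishes byte by byte.
 * Assumes dst and src do not overlap. */
static void *copy_by_dwords(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (d != NULL && s != NULL));

    /* Whole 4-byte chunks. */
    while (n >= 4) {
        uint32_t word;
        memcpy(&word, s, sizeof word);   /* load one 32-bit word  */
        memcpy(d, &word, sizeof word);   /* store one 32-bit word */
        d += 4;
        s += 4;
        n -= 4;
    }

    /* Remainder: 0 to 3 bytes, so a 3-byte or 13-byte object still works. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

With this shape, copying char trio[3] simply skips the chunk loop and copies three single bytes, while a 13-byte object takes three chunks plus one trailing byte.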

"I thought what I'd do was, I'd pretend I was one of those deaf-mutes." - the Laughing Man

DrPizza

160

February 24, 2002 05:05 AM

This one (VS.NET) does byte copies until it hits a 4-byte boundary, then copies 4 bytes at a time until it nears the end, where it does bytes again.

There are two versions -- one unoptimized that does a simple byte-by-byte copy, and this optimized one in asm.
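The head, body, and tail structure described here can be sketched in C. This is an illustration only, not the VS.NET assembly: the name copy_aligned is hypothetical, aligning on the destination pointer (rather than the source) is an assumption, and the buffers are assumed not to overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of an alignment-aware copy.
 * Assumes dst and src do not overlap. */
static void *copy_aligned(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (d != NULL && s != NULL));

    /* Head: single bytes until the destination sits on a 4-byte boundary
     * (assumption: the destination is the pointer being aligned). */
    while (n > 0 && ((uintptr_t)d & 3u) != 0) {
        *d++ = *s++;
        n--;
    }

    /* Body: 4 bytes at a time. The source may still be unaligned, so the
     * word is moved through fixed-size memcpy calls, which is portable. */
    while (n >= 4) {
        uint32_t word;
        memcpy(&word, s, sizeof word);
        memcpy(d, &word, sizeof word);
        d += 4;
        s += 4;
        n -= 4;
    }

    /* Tail: the last 0 to 3 bytes. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

The head loop runs at most three times and the tail loop at most three times, so for large buffers nearly all of the work happens in the 4-byte body.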

char a[99999],*p=a;int main(int c,char**V){char*v=c>0?1[V]:(char*)V;if(c>=0)for(;*v&&93!=*v;){62==*v&&++p||60==*v&&--p||43==*v&&++*p||45==*v&&--*p||44==*v&&(*p=getchar())||46==*v&&putchar(*p)||91==*v&&(*p&&main(0,(char**)(--v+2))||(v=(char*)main(-1,(char**)++v)-1));++v;}else for(c=1;c;c+=(91==*v)-(93==*v),++v);return(int)v;}  /*** drpizza@battleaxe.net ***/

LessBread

1,415

February 24, 2002 05:47 AM

Doh! It was looking me right in the face. Thanks.

"I thought what I'd do was, I'd pretend I was one of those deaf-mutes." - the Laughing Man

sQuid

149

February 24, 2002 07:33 PM

That would actually be nice Maybe with quantum computers

You can't copy anything in a quantum computer - no-cloning theorem

TerranFury

142

February 24, 2002 07:54 PM

IIRC, due to the caching on current Intel CPUs, sequential memory access and copying is essentially the same speed regardless of the size of the individual blocks transferred.

CGameProgrammer

640

February 24, 2002 07:59 PM

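This is how you'd copy something four bytes at a time:

    inline void memcpy32 ( LPVOID Dest, LPVOID Source, UINT Size )
    {
        _asm
        {
            mov edi, Dest
            mov esi, Source
            mov ecx, Size
            shr ecx, 2      ; Divide by four: number of whole dwords
            cld             ; Copy forward
            rep movsd       ; Copy ecx dwords from [esi] to [edi]
            mov ecx, Size
            and ecx, 3      ; Leftover bytes when Size is not a multiple of four
            rep movsb       ; Copy them one byte at a time
        }
    }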

~CGameProgrammer( );


DrPizza

160

February 25, 2002 02:01 AM

quote:Original post by TerranFury
IIRC, due to the caching on current Intel CPUs, sequential memory access and copying is essentially the same speed regardless of the size of the individual blocks transferred.

Simple benchtesting suggests that this is very much not the case.
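A bench test of this kind could look roughly like the sketch below. It is not the test that was actually run: the names byte_copy and time_copy are hypothetical, and clock() measures CPU time at a fairly coarse resolution, so a large repetition count is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Signature shared by memcpy and the hand-written copy below. */
typedef void *(*copy_fn)(void *, const void *, size_t);

/* Hypothetical one-byte-at-a-time copy, as in the crtdll loop quoted earlier. */
static void *byte_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--) {
        *d++ = *s++;
    }
    return dst;
}

/* Returns the CPU seconds spent performing `reps` copies of n bytes with fn. */
static double time_copy(copy_fn fn, void *dst, const void *src,
                        size_t n, int reps)
{
    assert(reps > 0);
    clock_t start = clock();
    for (int i = 0; i < reps; i++) {
        fn(dst, src, n);
    }
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_copy(byte_copy, dst, src, n, reps) and time_copy(memcpy, dst, src, n, reps) on the same buffers gives two numbers to compare. The outcome depends heavily on compiler flags, since an optimizing compiler may rewrite the byte loop into wider moves or a memcpy call, so it is worth timing both unoptimized and optimized builds.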

char a[99999],*p=a;int main(int c,char**V){char*v=c>0?1[V]:(char*)V;if(c>=0)for(;*v&&93!=*v;){62==*v&&++p||60==*v&&--p||43==*v&&++*p||45==*v&&--*p||44==*v&&(*p=getchar())||46==*v&&putchar(*p)||91==*v&&(*p&&main(0,(char**)(--v+2))||(v=(char*)main(-1,(char**)++v)-1));++v;}else for(c=1;c;c+=(91==*v)-(93==*v),++v);return(int)v;}  /*** drpizza@battleaxe.net ***/
