fast memcpy

Started by
12 comments, last by jeffakew 23 years, 10 months ago
Hello. Could anyone give me some help with improving the performance of the standard memcpy() function, please? My game runs about 25% faster without it. I know that memcpy works with bytes only, but I don't know asm, so I can't write a function to copy qwords. What can I do? Any help would be much appreciated, thanks.
You can copy words at a time, but sorry, I don't know how.
What OS are you programming for, and what are you trying to copy memory to? If it's in DOS I can help.
I don't know if this helps, but in C you could do this:

        /* Copies nlongs long words from 'from' to 'to'.
           The loop body is unrolled 100 times, so loop maintenance
           happens only once per 100 long words. */
        void copy (long *from, long *to, int nlongs)
        {
          int blocks = nlongs / 100;
          int remainder = nlongs % 100;

          while (blocks--) {
            *to++ = *from++;  *to++ = *from++;  *to++ = *from++;  *to++ = *from++;  *to++ = *from++;
            *to++ = *from++;  *to++ = *from++;  *to++ = *from++;  *to++ = *from++;  *to++ = *from++;
            *to++ = *from++;  *to++ = *from++;  *to++ = *from++;  *to++ = *from++;  *to++ = *from++;
            *to++ = *from++;  *to++ = *from++;  *to++ = *from++;  *to++ = *from++;  *to++ = *from++;
            *to++ = *from++;  *to++ = *from++;  *to++ = *from++;  *to++ = *from++;  *to++ = *from++;
            *to++ = *from++;  *to++ = *from++;  *to++ = *from++;  *to++ = *from++;  *to++ = *from++;
            *to++ = *from++;  *to++ = *from++;  *to++ = *from++;  *to++ = *from++;  *to++ = *from++;
            *to++ = *from++;  *to++ = *from++;  *to++ = *from++;  *to++ = *from++;  *to++ = *from++;
            *to++ = *from++;  *to++ = *from++;  *to++ = *from++;  *to++ = *from++;  *to++ = *from++;
            *to++ = *from++;  *to++ = *from++;  *to++ = *from++;  *to++ = *from++;  *to++ = *from++;
            *to++ = *from++;  *to++ = *from++;  *to++ = *from++;  *to++ = *from++;  *to++ = *from++;
            *to++ = *from++;  *to++ = *from++;  *to++ = *from++;  *to++ = *from++;  *to++ = *from++;
            *to++ = *from++;  *to++ = *from++;  *to++ = *from++;  *to++ = *from++;  *to++ = *from++;
            *to++ = *from++;  *to++ = *from++;  *to++ = *from++;  *to++ = *from++;  *to++ = *from++;
            *to++ = *from++;  *to++ = *from++;  *to++ = *from++;  *to++ = *from++;  *to++ = *from++;
            *to++ = *from++;  *to++ = *from++;  *to++ = *from++;  *to++ = *from++;  *to++ = *from++;
            *to++ = *from++;  *to++ = *from++;  *to++ = *from++;  *to++ = *from++;  *to++ = *from++;
            *to++ = *from++;  *to++ = *from++;  *to++ = *from++;  *to++ = *from++;  *to++ = *from++;
            *to++ = *from++;  *to++ = *from++;  *to++ = *from++;  *to++ = *from++;  *to++ = *from++;
            *to++ = *from++;  *to++ = *from++;  *to++ = *from++;  *to++ = *from++;  *to++ = *from++;
          }

          /* do the rest */
          while (remainder--)
            *to++ = *from++;
        }


Edited by - bishop_pass on June 19, 2000 2:03:53 AM
_______________________________
"To understand the horse you'll find that you're going to be working on yourself. The horse will give you the answers and he will question you to see if you are sure or not."
- Ray Hunt, in Think Harmony With Horses
ALU - SHRDLU - WORDNET - CYC - SWALE - AM - CD - J.M. - K.S. | CAA - BCHA - AQHA - APHA - R.H. - T.D. | 395 - SPS - GORDIE - SCMA - R.M. - G.R. - V.C. - C.F.
Hi
Thanks a lot. I'm at work right now, but when I get home I'll try that function. I'm programming for Win32 and I'm trying to copy a system memory buffer to VRAM just by locking the VRAM. The pitch of the VRAM is the same as that of the system memory buffer. Surely once the VRAM is locked, writing to it is just the same as in DOS? Thanks again for your help.
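For readers following along, here is a minimal sketch of what such a buffer-to-surface copy can look like in plain C. It assumes the locked surface is just a writable pointer plus a pitch (bytes between row starts). The function name copy_rows and all parameter names are hypothetical, not part of any graphics API. When both pitches match and the rows are contiguous, as described in the post, the whole frame can go in one call. Otherwise each row is copied separately.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy `rows` rows of `row_bytes` bytes each from src to dst.
   dst_pitch and src_pitch are the byte distances between the starts of
   consecutive rows. All names here are hypothetical. */
static void copy_rows(unsigned char *dst, size_t dst_pitch,
                      const unsigned char *src, size_t src_pitch,
                      size_t row_bytes, size_t rows)
{
    size_t y;

    /* A row must fit inside one pitch in both buffers. */
    assert(row_bytes <= dst_pitch && row_bytes <= src_pitch);

    if (dst_pitch == src_pitch && row_bytes == src_pitch) {
        /* Rows are contiguous in both buffers: one large copy is enough. */
        memcpy(dst, src, row_bytes * rows);
        return;
    }

    /* Otherwise skip the padding at the end of every row. */
    for (y = 0; y < rows; ++y) {
        memcpy(dst + y * dst_pitch, src + y * src_pitch, row_bytes);
    }
}
```

Any of the copy routines discussed in this thread could stand in for memcpy inside the loop.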

Hi,

Don't use the function above... it's slow! He's just copying one byte after another...

to copy fast (paste this stuff):

----- cut here ------

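void mov2scr_32(unsigned char *source,unsigned char *dest,unsigned long count)
{
__asm
{
mov esi,source
mov edi,dest
mov ebx,count
mov edx,edi
and edx,11b          ; misalignment of dest within a dword (0..3)
jz m2s_memaligned
mov ecx,4
sub ecx,edx          ; bytes needed to reach a dword boundary
cmp ecx,ebx
jbe m2s_head
mov ecx,ebx          ; fewer bytes than that requested: copy only count
m2s_head:
sub ebx,ecx          ; adjust the count first: rep movsb leaves ecx at 0
rep movsb            ; copy the leading bytes

m2s_memaligned:
mov edx,ebx
and edx,11b          ; trailing bytes left after the dword copy
mov ecx,ebx
shr ecx,2            ; number of whole dwords
rep movsd
mov ecx,edx
rep movsb            ; copy the trailing bytes
}
}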

------ cut here -----

Just pass the memory pointers and the number of bytes to be copied to the function... that's all.
The function works out how many dwords to copy and how many bytes remain, so it uses the fast dword copy wherever possible.
Hi, thanks, that's great. It's just what I was looking for. When I get in I'll convert my code to use it and let you know what happens. Once again, thanks a lot.
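To make the arithmetic concrete, here is a rough C sketch of how a byte count can be split into leading bytes (up to the next dword boundary of the destination), whole dwords, and trailing bytes. The struct and function names (copy_split, split_copy) are made up for illustration, and the sketch assumes 4-byte dwords as in the assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* How a copy of `count` bytes to `dest` breaks down (hypothetical type). */
struct copy_split {
    size_t head;    /* bytes copied singly until dest is dword-aligned */
    size_t dwords;  /* whole 4-byte units copied after that */
    size_t tail;    /* bytes left over at the end (0..3) */
};

static struct copy_split split_copy(const void *dest, size_t count)
{
    struct copy_split s;
    size_t total = count;
    size_t misalign = (size_t)((uintptr_t)dest & 3u);

    s.head = misalign ? 4u - misalign : 0u;
    if (s.head > count) {
        s.head = count;         /* short copies may never reach alignment */
    }
    count -= s.head;
    s.dwords = count >> 2;      /* same as count / 4 */
    s.tail = count & 3u;        /* same as count % 4 */

    /* The three parts must always add up to the requested size. */
    assert(s.head + 4u * s.dwords + s.tail == total);
    return s;
}
```

For example, copying 10 bytes to an address that is 1 past a dword boundary gives 3 leading bytes, 1 dword and 3 trailing bytes.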
My function is slow?

I ran some tests on both functions and copied more than 100 billion bytes in the tests to ensure fairness.

My function was 6% faster on an AMD 350. It is also machine independent.
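The test code itself was not posted, so the following is only a guess at the shape of such a comparison, not the harness that produced the 6% figure. It times repeated calls through a function pointer using the standard clock() function. The names copy_fn, copy_with_memcpy and time_copy are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Common signature so different copy routines can be timed the same way. */
typedef void (*copy_fn)(void *dst, const void *src, size_t bytes);

/* Baseline: the library memcpy behind the common signature. */
static void copy_with_memcpy(void *dst, const void *src, size_t bytes)
{
    memcpy(dst, src, bytes);
}

/* Returns the CPU time, in seconds, spent on `passes` calls of fn. */
static double time_copy(copy_fn fn, void *dst, const void *src,
                        size_t bytes, unsigned long passes)
{
    clock_t start;
    unsigned long i;

    assert(fn != NULL);

    start = clock();
    for (i = 0; i < passes; ++i) {
        fn(dst, src, bytes);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Results from a loop like this depend heavily on buffer size, cache behaviour, compiler settings and whether the destination is system memory or video memory, so a figure measured on one machine may not carry over to another.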
I should note that my function does not do the initial alignment check that the assembly version does, but this could be added. My function takes the number of long words to copy; it does not copy a byte at a time.

Also, Jacen/SE, you should note that my function incurs loop maintenance only every 400 bytes. It appears yours does maintenance every 4 bytes.

The fastest function would be a hybrid of the two.
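Nobody in the thread posted such a hybrid, so here is one possible sketch in portable C, with no claim that it is the fastest. It aligns the destination with single-byte copies as the assembly version does, then moves 32-bit words in an unrolled loop, then finishes the leftover bytes. The unroll factor of 4 and the names hybrid_copy and copy_word are arbitrary choices for illustration. Each word goes through a fixed-size memcpy because that is the safe way in standard C to read and write a word through byte pointers. Compilers commonly reduce it to a single load and store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Move one 32-bit word without alignment or aliasing problems. */
static void copy_word(unsigned char *dst, const unsigned char *src)
{
    uint32_t w;
    memcpy(&w, src, sizeof w);
    memcpy(dst, &w, sizeof w);
}

/* Hypothetical hybrid: alignment prologue plus unrolled word loop.
   The buffers must not overlap. */
static void hybrid_copy(void *dest, const void *source, size_t count)
{
    unsigned char *d = dest;
    const unsigned char *s = source;
    /* Bytes needed to bring d up to a 4-byte boundary (0..3). */
    size_t head = (size_t)(-(uintptr_t)d & 3u);

    assert(count == 0 || (d != NULL && s != NULL));

    if (head > count) {
        head = count;
    }
    count -= head;
    while (head--) {
        *d++ = *s++;
    }

    /* Unrolled body: loop maintenance once per 16 bytes. */
    while (count >= 16) {
        copy_word(d, s);
        copy_word(d + 4, s + 4);
        copy_word(d + 8, s + 8);
        copy_word(d + 12, s + 12);
        d += 16;
        s += 16;
        count -= 16;
    }

    /* Remaining whole words, then the trailing bytes. */
    while (count >= 4) {
        copy_word(d, s);
        d += 4;
        s += 4;
        count -= 4;
    }
    while (count--) {
        *d++ = *s++;
    }
}
```

Whether this beats the library memcpy on a given machine is something only a measurement can settle.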
memcpy only takes a byte count AS A PARAMETER. That doesn't mean it copies the data byte by byte internally. It most likely tries to move it as fast as possible (i.e. in DWORDs).
I know it only takes bytes as a parameter. But the point is:

My function was bashed as being slow!

It is in fact faster.

This topic is closed to new replies.
