Copying Double Words

Started by
6 comments, last by Helicon56 21 years, 10 months ago
Hi, Just wondering if there's a function that copies memory 32 bits at a time? memcpy works in bytes; I want one that works in double words. I used to use dosmemputl in djgpp... is there any equivalent? Thanks a lot, --Helicon56
Look up MOVSD (it's an x86 assembler instruction). It's the same as MOVSB, but moves data in 32-bit chunks rather than 8-bit chunks.
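For readers who want the same effect without inline assembly, here is a minimal C sketch of what a REP MOVSD copy does with the direction flag clear: it moves a count of 32-bit words from source to destination, in ascending order. The name copy_dwords is hypothetical, not a library function, and the sketch assumes both buffers really hold uint32_t data and do not overlap.

```c
#include <stddef.h>
#include <stdint.h>

/* copy_dwords is a hypothetical helper, not part of any library.
   Copies `count` 32-bit words from src to dst, lowest address first,
   which mirrors REP MOVSD with ECX = count, ESI = src, EDI = dst.
   Assumption: the two regions do not overlap. */
void copy_dwords(uint32_t *dst, const uint32_t *src, size_t count)
{
    for (size_t i = 0; i < count; i++)
        dst[i] = src[i];
}
```

Note that the count is in double words, not bytes, so a 100-byte buffer would be copied with a count of 25.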
You can also use the FPU (coprocessor) to copy double words in memory.
VC++'s memcpy(), like that of most other compilers, will move memory in 32-bit chunks. The data that is not aligned at the beginning and end will be moved one byte at a time (max 3 bytes at each end). And it's faster than anything you'll ever be able to write or find unless you use MMX, but that's only marginally faster and a waste of registers. memcpy() is written specifically for the Pentium's pipeline. Not sure about djgpp's memcpy, though.

DO NOT USE MOVSD!!! It's been slower than a loop since the 486.
Additionally, MSVC can inline memcpy calls (replace them with assembly code), but it's the same thing as doing REP MOVSD anyway.
---visit #directxdev on afternet <- not just for directx, despite the name
quote:Original post by Helicon56
Hi,
Just wondering if there's a function that copies memory 32 bits at a time? memcpy works in bytes; I want one that works in double words.

On Win32, memcpy is optimized to copy up to 3 bytes at the beginning and end of any memory block, and 4-byte DWORDs in between. The only way to make it faster is to guarantee that the data you want to copy is DWORD aligned, skip the check, and rep-move the DWORDs.
Rumors have floated around that you can make a faster memcpy using MMX, but I suspect it's due to the alignment assumption/requirement.
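The head/middle/tail scheme described above can be sketched in portable C roughly as follows. This is only an illustration of the idea, not the actual Win32 or MSVC memcpy source; the function name copy_head_dwords_tail is hypothetical, and the sketch assumes the two regions do not overlap.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper illustrating the scheme; not a real library routine.
   Assumption: dst and src do not overlap. */
void *copy_head_dwords_tail(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Head: single bytes until the destination is 4-byte aligned (at most 3). */
    while (n > 0 && ((uintptr_t)d & 3u) != 0) {
        *d++ = *s++;
        n--;
    }

    /* Middle: 4 bytes per step. The fixed-size memcpy calls sidestep
       aliasing and source-alignment problems; optimizing compilers
       typically reduce each pair to one 32-bit load and one 32-bit store. */
    while (n >= 4) {
        uint32_t word;
        memcpy(&word, s, sizeof word);
        memcpy(d, &word, sizeof word);
        d += 4;
        s += 4;
        n -= 4;
    }

    /* Tail: whatever is left over (at most 3 bytes). */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

If the caller can guarantee DWORD alignment and a length that is a multiple of 4, the head and tail loops never run, which is the check-skipping shortcut mentioned above.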


quote:
You can also use the FPU (coprocessor) to copy double words in memory.

Good lord, do you know what happens when you do that?!
- The trade-off between price and quality does not exist in Japan. Rather, the idea that high quality brings on cost reduction is widely accepted.-- Tajima & Matsubara
quote:Original post by Magmai Kai Holmlor
On Win32, memcpy is optimized to copy up to 3 bytes at the beginning and end of any memory block, and 4-byte DWORDs in between. The only way to make it faster is to guarantee that the data you want to copy is DWORD aligned, skip the check, and rep-move the DWORDs.
Rumors have floated around that you can make a faster memcpy using MMX, but I suspect it's due to the alignment assumption/requirement.


It's no rumor. I actually coded one up, but it's only slightly faster, and only for large amounts of data, because the setup cost kills it on small copies. It's not worth the effort because the speed gain is negligible. The up to 7 bytes before and after the 8-byte-aligned region still have to be copied the normal way.

And IndirectX, I hope you're not suggesting that REP MOVSD is the same as the memcpy() function. memcpy() will run circles around REP MOVSD any day.
quote:Original post by Vorlath
And IndirectX, I hope you're not suggesting that REP MOVSD is the same as the memcpy() function. memcpy() will run circles around REP MOVSD any day.

Care to explain how?

What does this line from memcpy do:
        rep     movsd           ;N - move all of our dwords  


And in a speed-optimized release build:
        char x[100];
        char y[100];
        memcpy(x, y, 100);
00401060   mov         ecx,19h
00401065   lea         esi,[y]
00401068   lea         edi,[x]
0040106E   rep movs    dword ptr [edi],dword ptr [esi]


I just thought it funny that the size-optimized release build produced the following code:
        char x[100];
        char y[100];
        memcpy(x, y, 100);
0040104D   push        64h
0040104F   lea         eax,[y]
00401052   push        eax
00401053   lea         eax,[x]
00401059   push        eax
0040105A   call        _memcpy (004010a0)
0040105F   add         esp,0Ch

I don't see how this is size optimized. With #pragma intrinsic, I get rep movs back.
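As a rough sketch of the #pragma intrinsic route mentioned above: MSVC accepts #pragma intrinsic(memcpy) to request the inline expansion of memcpy. The wrapper name copy_100 below is hypothetical, the pragma is guarded so other compilers skip it, and whether the call is really expanded into rep movs still depends on the compiler version and optimization settings.

```c
#include <string.h>

#ifdef _MSC_VER
/* MSVC-specific: ask for the inline (intrinsic) form of memcpy
   instead of a call into the runtime library. */
#pragma intrinsic(memcpy)
#endif

/* copy_100 is a hypothetical wrapper around the 100-byte copy from the
   listings above. 100 bytes is 25 DWORDs, which matches the
   `mov ecx,19h` in the speed-optimized listing (19h = 25). */
void copy_100(char *dst, const char *src)
{
    memcpy(dst, src, 100);
}
```

Checking the generated assembly, as done in the listings above, is the only way to confirm which form a given build actually produced.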

[edited by - IndirectX on June 27, 2002 2:31:10 AM]
---visit #directxdev on afternet <- not just for directx, despite the name

This topic is closed to new replies.
