Copying Double Words

Hi, Just wondering if there is a function that copies memory in 32 bits? memcpy works in bytes; I want one that works in double words. I used to use dosmemputl in djgpp... is there any equivalent? Thanks a lot, --Helicon56

The memcpy() in VC++ and most other compilers will move memory in 32-bit chunks. The data that is not aligned at the beginning and end will be moved one byte at a time (at most 3 bytes at each end). And it's faster than anything you'll ever be able to write or find unless you use MMX, but that's only marginally faster and a waste of registers. memcpy() is written specifically for the Pentium's pipeline. Not sure about djgpp's memcpy though.
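
The head/body/tail scheme described above can be sketched in portable C. This is a minimal illustration, not the VC++ or djgpp runtime code: the name copy_bytes_dwords is hypothetical, and it only takes the 4-byte path when source and destination share the same alignment modulo 4 (a real CRT memcpy also handles mismatched alignment). Each 32-bit move is written as a fixed-size 4-byte memcpy, which is the standard aliasing-safe way to express one word move in C; optimizing compilers normally emit a single load and store for it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy n bytes, using 32-bit moves for the aligned middle. */
void *copy_bytes_dwords(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Word moves only help if both pointers reach 4-byte alignment together. */
    if (((uintptr_t)d & 3u) == ((uintptr_t)s & 3u)) {
        /* Head: at most 3 single-byte moves until d is 4-byte aligned. */
        while (n > 0 && ((uintptr_t)d & 3u) != 0) {
            *d++ = *s++;
            n--;
        }
        /* Body: one 32-bit move per iteration. */
        while (n >= 4) {
            memcpy(d, s, 4); /* fixed-size copy, usually a single mov */
            d += 4;
            s += 4;
            n -= 4;
        }
    }
    /* Tail (at most 3 bytes), or the whole block if the alignments differ. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```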

DO NOT USE MOVSD!!! It's been slower than a loop since the 486.

quote:
Original post by Helicon56
Hi,
Just wondering if there is a function that copies memory in 32 bits? memcpy works in bytes; I want one that works in double words.


On Win32, memcpy is optimized to copy up to 3 bytes one at a time at the beginning and end of any memory block, and 4-byte DWORDs in between. The only way to make it faster is to guarantee that the data you want to copy is DWORD-aligned, skip the check, and REP MOVSD the DWORDs.
Rumors have floated around that you can make a faster memcpy using MMX, but I suspect it's due to the alignment assumption/requirement.
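
If the caller can guarantee DWORD alignment, as suggested above, the checks disappear and the copy reduces to a single loop of 32-bit moves. A minimal sketch, assuming the caller passes uint32_t buffers and a count in DWORDs rather than bytes; the name copy_dwords is hypothetical, and the assert only documents the alignment promise in debug builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical aligned-only copy: count is in 32-bit units, not bytes. */
void copy_dwords(uint32_t *dst, const uint32_t *src, size_t count)
{
    /* Precondition the caller promises; no head/tail handling is done. */
    assert(((uintptr_t)dst & 3u) == 0 && ((uintptr_t)src & 3u) == 0);

    for (size_t i = 0; i < count; i++)
        dst[i] = src[i]; /* one 32-bit load and store per element */
}
```

Whether a compiler turns this loop into REP MOVSD, unrolled moves, or a call to memcpy depends on the compiler and its optimization settings.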


quote:

you can also use the FPU (coprocessor) to copy double words in memory


Good lord, do you know what happens when you do that?!

quote:
Original post by Magmai Kai Holmlor
On Win32, memcpy is optimized to copy up to 3 bytes one at a time at the beginning and end of any memory block, and 4-byte DWORDs in between. The only way to make it faster is to guarantee that the data you want to copy is DWORD-aligned, skip the check, and REP MOVSD the DWORDs.
Rumors have floated around that you can make a faster memcpy using MMX, but I suspect it's due to the alignment assumption/requirement.



It's no rumor. I actually coded one up, but it's only slightly faster, and only for large amounts of data, because the setup cost kills it otherwise. It's not worth the effort because the speed gain is negligible. The up to 7 bytes before and after the aligned region still have to be copied the normal way.
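
The bookkeeping behind "the 7 bytes before and after" can be written out explicitly. A small sketch, assuming 8-byte chunks (the width of an MMX register) and alignment measured on the destination address; split_copy and struct copy_split are hypothetical names. For any power-of-two chunk size, the head and the tail are each at most chunk - 1 bytes, which is part of the fixed setup cost mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of how a copy splits around aligned chunks. */
struct copy_split {
    size_t head;   /* bytes copied one at a time before the first aligned chunk */
    size_t chunks; /* number of full chunk-sized moves */
    size_t tail;   /* bytes left over after the last chunk */
};

/* chunk must be a power of two (8 for MMX-sized moves, 4 for DWORDs). */
struct copy_split split_copy(uintptr_t dst_addr, size_t n, size_t chunk)
{
    struct copy_split s;
    size_t misalign = (size_t)(dst_addr & (chunk - 1));

    s.head = misalign ? chunk - misalign : 0;
    if (s.head > n)
        s.head = n; /* block ends before reaching alignment */
    s.chunks = (n - s.head) / chunk;
    s.tail = (n - s.head) % chunk;
    return s;
}
```

For example, split_copy(0x1003, 100, 8) yields a 5-byte head, 11 eight-byte chunks and a 7-byte tail (5 + 88 + 7 = 100).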

And IndirectX, I hope you're not suggesting that REP MOVSD is the same as the memcpy() function. memcpy() will run circles around REP MOVSD any day.

quote:
Original post by Vorlath
And IndicectX, I hope you're not suggesting that REP MOVSD is the same as the memcpy() function. memcpy() will run circles around REP MOVSD any day.


Care to explain how?

What does this line from memcpy do:

rep movsd ;N - move all of our dwords


And in a speed-optimized release build:

char x[100];
char y[100];
memcpy(x, y, 100);
00401060 mov ecx,19h
00401065 lea esi,[y]
00401068 lea edi,[x]
0040106E rep movs dword ptr [edi],dword ptr [esi]


I just thought it funny that the size-optimized release build produced the following code:

char x[100];
char y[100];
memcpy(x, y, 100);
0040104D push 64h
0040104F lea eax,[y]
00401052 push eax
00401053 lea eax,[x]
00401059 push eax
0040105A call _memcpy (004010a0)
0040105F add esp,0Ch

I don't see how this is size-optimized. With #pragma intrinsic, I get rep movs back.
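
For reference, a minimal sketch of requesting the inline form in Visual C++: #pragma intrinsic(memcpy) asks the compiler to expand memcpy inline instead of calling the library routine, and the /Oi option does the same globally. Microsoft's documentation for current compilers lists /Oi as part of /O2 but not of /O1, which may explain why the size-optimized build above calls _memcpy; whether the 2002 compiler behaved the same way is an assumption. The pragma is guarded with _MSC_VER so other compilers skip it, and the function name copy_hundred is hypothetical.

```c
#include <string.h>

#ifdef _MSC_VER
#pragma intrinsic(memcpy) /* MSVC: expand memcpy inline instead of calling _memcpy */
#endif

/* Hypothetical example mirroring the listing: copy a 100-byte array. */
void copy_hundred(char x[100], const char y[100])
{
    /* 100 bytes = 100 / 4 = 25 = 0x19 DWORDs, the count loaded into ecx. */
    memcpy(x, y, 100);
}
```

Whether the inline expansion actually becomes rep movs depends on the compiler version, the options, and what it knows about the alignment of the arguments.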

[edited by - IndirectX on June 27, 2002 2:31:10 AM]
