Fast memory copying

18 comments, last by Jan Wassenberg 18 years ago
Hello everyone. I need to copy a lot of memory fast. Often there will be large segments that are identical in the source and the destination. Say I have 1024 bytes in two buffers and I need to copy from buffer1 to buffer2; sometimes 128 bytes will already be equal in both buffers. So, should I:

1. Just use memcpy() and copy everything every time (how optimized is memcpy?)
2. Compare the buffers, say 64 bytes at a time, and if a change is detected, copy the remaining bytes of that chunk (see the sketch below)?
3. Compare all the way through and only write the differences?

We are talking several MB at a time here, so it is important that it's done as fast as possible. On some systems the read speed is 3-5x faster than the write speed; on mine it's about 3200 MB/s vs 1300 MB/s. So, what do you suggest?
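For concreteness, here is a minimal sketch of option 2, assuming 64-byte chunks and plain memcmp/memcpy; the function name and the way the tail is ignored are placeholders, not something I've benchmarked:

#include <cstddef>
#include <cstring>

// Option 2, roughly: compare chunk by chunk, copy only chunks that differ.
// Assumes size is a multiple of CHUNK; handling the tail is left out.
void copy_changed_chunks(char* dst, const char* src, std::size_t size)
{
    const std::size_t CHUNK = 64;
    for (std::size_t i = 0; i < size; i += CHUNK)
    {
        if (std::memcmp(dst + i, src + i, CHUNK) != 0)
            std::memcpy(dst + i, src + i, CHUNK);
    }
}

Whether this beats a single memcpy() depends entirely on how much of the data actually changes and on what the comparison itself costs in memory traffic.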
Interesting.

Where does the source buffer get its data?

Kuphryn
Any sort of comparison will slow you down.
It's generated by an independent module (sort of). In other words, it's CPU-generated and it's only RAM; no device I/O is involved. It's not very predictable, and parts of it are generated from scripts and plugin DLLs, which cannot interact with the copying code (for example, they can't report which areas have been modified).
I agree with the AP. You'd probably be best off just making one call to memcpy(). How many MB are we talking here? If it's less than 10 or 20, then the speed isn't going to be that bad.
Quote:Original post by Anonymous Poster
Any sort of comparison will slow you down.


He could pre-calculate some sort of checksum/modification flag for the blocks of memory and use that for the comparison. If the data is likely to remain unchanged, or to be modified only a little, not copying several megabytes in vain could be worth considering.
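For illustration, something along these lines; the 64-byte block size and the FNV-1a hash are arbitrary choices of mine, and a hash can in principle miss a change (collision), so treat it as a sketch of the idea rather than a drop-in solution:

#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Example hash (FNV-1a); any cheap per-block checksum would do.
static std::uint32_t fnv1a(const unsigned char* p, std::size_t n)
{
    std::uint32_t h = 2166136261u;
    for (std::size_t i = 0; i < n; ++i)
        h = (h ^ p[i]) * 16777619u;
    return h;
}

// Keep one checksum per 64-byte block; re-hash the source block and
// copy (and update the stored checksum) only when it has changed.
void copy_if_checksum_changed(unsigned char* dst, const unsigned char* src,
                              std::size_t size, std::vector<std::uint32_t>& sums)
{
    const std::size_t BLOCK = 64;
    sums.resize(size / BLOCK, 0u);
    for (std::size_t b = 0; b < size / BLOCK; ++b)
    {
        const std::uint32_t h = fnv1a(src + b * BLOCK, BLOCK);
        if (h != sums[b])
        {
            std::memcpy(dst + b * BLOCK, src + b * BLOCK, BLOCK);
            sums[b] = h;
        }
    }
}

Note that hashing still reads all of the source, so this mainly saves the destination reads and the writes for unchanged blocks; it pays off when most blocks don't change.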
[size="2"]I like the Walrus best.
Quote:We are talking several MB at a time here, so it is important that it's done as fast as possible. On some systems the read speed is 3-5x faster than the write speed; on mine it's about 3200 MB/s vs 1300 MB/s.

How are you doing the copying?
Techniques outlined in the technical report on speeding up memcpy might help.
Here's the deal: it's a pool, 20-64 MB in size, with a 6.25% memory overhead. The pool is divided into 64-byte chunks, and one allocation can use one or more of those chunks. There's no point in copying unused chunks. Some of the used chunks might not change (the first 4 MB or so will change very little, for example, and many of those chunks will never change). The remaining chunks are less likely to be in use at all (chunk #65537 will almost always be used, while #80000 is much less likely to be used).

Ideas?
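For illustration, a rough sketch of skipping unused chunks, assuming you keep (or can derive from the pool's metadata) one "in use" bit per 64-byte chunk; the bitmap layout here is invented for the example:

#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy only the 64-byte chunks marked as allocated.
// 'used_bits' is a hypothetical bitmap, one bit per chunk, LSB first.
void copy_used_chunks(unsigned char* dst, const unsigned char* src,
                      std::size_t pool_size, const std::uint8_t* used_bits)
{
    const std::size_t CHUNK = 64;
    const std::size_t chunks = pool_size / CHUNK;
    for (std::size_t c = 0; c < chunks; ++c)
    {
        if (used_bits[c / 8] & (1u << (c % 8)))   // chunk allocated?
            std::memcpy(dst + c * CHUNK, src + c * CHUNK, CHUNK);
    }
}

Coalescing runs of consecutive used chunks into one larger memcpy() call would almost certainly beat one call per 64-byte chunk, but the structure is the same.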
Can anyone help me build the code from the article? I'm not really into ASM, so I don't understand any of that code.

I get linker errors like this:
1>Memory.obj : error LNK2001: unresolved external symbol "void __cdecl ia32_asm_init(void)" (?ia32_asm_init@@YAXXZ)
1>Memory.obj : error LNK2001: unresolved external symbol "bool __cdecl ia32_cpuid(unsigned int,unsigned int *)" (?ia32_cpuid@@YA_NIPAI@Z)


Any ideas on how to fix?

I tried just creating a .asm file in my project, plus a .h and a .cpp, adding the relevant code to each of them, and then adding the custom build step for NASM. What else do I have to do?
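One likely cause, going by the mangled names in those errors (?ia32_asm_init@@YAXXZ is C++ name mangling): the prototypes are being compiled as C++ functions, while NASM emits plain C symbols. Wrapping the declarations in extern "C" usually fixes exactly this kind of mismatch; something along these lines, with the signatures taken from your error messages and the rest an assumption about how the header is set up:

// In the header that declares the assembly routines:
#ifdef __cplusplus
extern "C" {
#endif

void ia32_asm_init(void);
bool ia32_cpuid(unsigned int func, unsigned int* regs);

#ifdef __cplusplus
}
#endif

Also check that the NASM build step emits an object format your linker understands (e.g. -f win32 for 32-bit MSVC builds) and that the resulting .obj is actually being passed to the linker.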
Do you know memcpy() is a bottleneck? If not, use memcpy(). Anything else is premature optimisation.

Note that if I were a vaguely sane C library writer, and if writing only the data that differs were faster than always writing, my implementation of memcpy() would do that.

Some attempts at logic

Let's say writing is precisely three times slower than reading.

To do the comparison, you need to read from both the source and the destination. Let's say you can load a 128-byte block into a few SIMD registers and compare the bytes in parallel with a handful of instructions; I don't know if that's true. I'm guessing that the time taken to do the comparison is vanishingly small compared to the memory access time; I don't know if that's true either. If the comparison reveals that the block has changed, then we must also do the write.
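For what it's worth, comparing 16 bytes per instruction pair is straightforward with SSE2; a minimal sketch of a 128-byte block comparison (the function name, the fixed block size, and the 16-byte alignment requirement are my assumptions):

#include <emmintrin.h>   // SSE2 intrinsics

// Returns true if two 128-byte, 16-byte-aligned blocks are identical.
static bool blocks_equal_128(const void* a, const void* b)
{
    const __m128i* pa = static_cast<const __m128i*>(a);
    const __m128i* pb = static_cast<const __m128i*>(b);
    for (int i = 0; i < 8; ++i)                    // 8 x 16 bytes = 128 bytes
    {
        const __m128i eq = _mm_cmpeq_epi8(_mm_load_si128(pa + i),
                                          _mm_load_si128(pb + i));
        if (_mm_movemask_epi8(eq) != 0xFFFF)       // some byte differs
            return false;
    }
    return true;
}

The instructions themselves are cheap; as guessed above, the cost is dominated by pulling both blocks through the memory system.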

If all the data has changed, and the above paragraph is a realistic description of the timing, then compare-and-copy takes about 25% longer than a plain copy (because we've introduced an extra read of the destination that wasn't there before). If only half the data has changed it's already slightly faster, and roughly speaking you only win when less than about two thirds of the data has changed.
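Spelling that arithmetic out (my numbers, under the post's own read/write assumption), with a read costing 1 unit and a write 3 units per block, and f the fraction of blocks that changed:

cost(always copy)        = read(src) + write(dst)          = 1 + 3  = 4
cost(compare, then copy) = read(src) + read(dst) + f * 3   = 2 + 3f

break-even: 2 + 3f = 4  =>  f = 2/3

Under these assumptions the comparison loses at most 25% when everything has changed and wins whenever less than about two thirds has. Caches, write-allocate behaviour and prefetching will shift those numbers in practice, so measuring remains the only real answer.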

