DvDmanDT

Fast memory copying


Hello everyone. I need to copy a lot of memory fast. Often, large segments will be identical in the source and the destination. Say I have 1024 bytes in each of two buffers and need to copy from buffer1 to buffer2; sometimes 128 of those bytes are already equal in both. So, should I:

1. Just use memcpy() and copy everything every time (how optimized is memcpy?)
2. Compare the buffers, say 64 bytes at a time, and if a change is detected, copy the remaining bytes of the chunk
3. Compare all the way through and only write the differences?

We are talking several MB at a time here, so it is important that it's done the fastest way possible. On some systems, the read speed is 3-5x faster than the write speed. On my system, it's 3200 MB/s vs 1300 MB/s. So, what do you suggest?
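For comparison with a plain memcpy(), here is a minimal sketch of the compare-then-copy idea from options 2 and 3. The function name `copy_if_different` and the default 64-byte chunk size are made up for illustration, and whether this beats memcpy() on a given machine is something only a measurement can tell.

```cpp
#include <cstddef>
#include <cstring>

// Copies src into dst chunk by chunk, skipping chunks that already match.
// Returns the number of bytes actually written to dst.
// The name and the default chunk size are illustrative assumptions.
std::size_t copy_if_different(unsigned char* dst, const unsigned char* src,
                              std::size_t size, std::size_t chunk = 64) {
    if (chunk == 0) chunk = size;  // avoid an endless loop on a zero chunk size
    std::size_t written = 0;
    for (std::size_t off = 0; off < size; off += chunk) {
        // The last chunk may be shorter than the others.
        const std::size_t n = (size - off < chunk) ? size - off : chunk;
        if (std::memcmp(dst + off, src + off, n) != 0) {
            std::memcpy(dst + off, src + off, n);
            written += n;
        }
    }
    return written;
}
```

Calling `copy_if_different(buffer2, buffer1, 1024)` leaves buffer2 equal to buffer1, exactly as memcpy() would, but it reads both buffers in full to do so.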

Guest Anonymous Poster
Any sort of comparison will slow you down.

It's generated by an independent module (sort of). In other words, it's CPU-generated, and it's only RAM; no device I/O is involved. It's not very predictable, and parts of it are generated from scripts and plugin DLLs, which cannot interact with the copying code (for example, they can't say which areas were modified or the like).

I agree with the AP. You'd probably be best just making one call to memcpy(). How many MB are we talking here? If it's less than 10 or 20, then the speed isn't going to be that bad.

Quote:
Original post by Anonymous Poster
Any sort of comparison will slow you down.


He could pre-calculate some sort of checksum or modification flag for each block of memory and use that for the comparison. If the data is likely to remain unchanged, or to be modified only a little, not copying several megabytes in vain could be worth considering.
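A rough sketch of the checksum variant, assuming the destination is only ever written through this copier. `ChecksumCopier` and `fnv1a` are hypothetical names, and FNV-1a is just one convenient hash swapped in for illustration. Two caveats: a hash collision would make a changed block look unchanged, and hashing every source byte may well cost more than the writes it saves.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// 64-bit FNV-1a hash of n bytes (chosen for brevity, not speed).
std::uint64_t fnv1a(const unsigned char* p, std::size_t n) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (std::size_t i = 0; i < n; ++i) {
        h ^= p[i];
        h *= 1099511628211ULL;                  // FNV prime
    }
    return h;
}

// Remembers one checksum per block of the data last copied, so the
// destination never has to be read back. Hypothetical helper class.
class ChecksumCopier {
public:
    explicit ChecksumCopier(std::size_t block_size)
        : block_(block_size ? block_size : 64) {}

    // Copies only blocks whose checksum changed; returns bytes written.
    std::size_t copy(unsigned char* dst, const unsigned char* src,
                     std::size_t size) {
        const std::size_t blocks = (size + block_ - 1) / block_;
        const bool first = sums_.size() != blocks;  // no usable history yet
        if (first) sums_.assign(blocks, 0);
        std::size_t written = 0;
        for (std::size_t i = 0; i < blocks; ++i) {
            const std::size_t off = i * block_;
            const std::size_t n = (size - off < block_) ? size - off : block_;
            const std::uint64_t h = fnv1a(src + off, n);
            if (first || h != sums_[i]) {
                std::memcpy(dst + off, src + off, n);
                sums_[i] = h;
                written += n;
            }
        }
        return written;
    }

private:
    std::size_t block_;
    std::vector<std::uint64_t> sums_;
};
```

The first call copies everything; later calls read only the source, which is the point of keeping checksums instead of comparing against the destination.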

Here's the deal: it's a pool. It's 20-64 MB in size, and it has a 6.25% memory overhead. The pool will be divided into 64-byte chunks, and one allocation can use one or more of those chunks. There's no point in copying unused chunks. Some of the used chunks might not change (the first 4 MB or so will change very little, for example, and many of those chunks will never change). The rest of the chunks are less likely to be used (chunk #65537 will almost always be used, while #80000 will be far less likely to be used).
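One way to skip the unused chunks, sketched under the assumption that the pool can expose a per-chunk "used" flag. The `used` vector and the name `copy_used_chunks` are hypothetical; the post does not say how the pool tracks its allocations. Adjacent used chunks are merged into a single memcpy() call so that long runs are still copied in bulk.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

constexpr std::size_t kChunkSize = 64;  // chunk size taken from the post

// Copies only chunks marked as used, one memcpy() per contiguous run.
// used[i] describes chunk i; both buffers must hold used.size() chunks.
// Returns the number of bytes copied.
std::size_t copy_used_chunks(unsigned char* dst, const unsigned char* src,
                             const std::vector<bool>& used) {
    std::size_t copied = 0;
    std::size_t i = 0;
    const std::size_t n = used.size();
    while (i < n) {
        if (!used[i]) { ++i; continue; }
        std::size_t j = i;
        while (j < n && used[j]) ++j;  // [i, j) is a run of used chunks
        const std::size_t bytes = (j - i) * kChunkSize;
        std::memcpy(dst + i * kChunkSize, src + i * kChunkSize, bytes);
        copied += bytes;
        i = j;
    }
    return copied;
}
```

If the first few megabytes are densely used, this degenerates into a handful of large memcpy() calls, which is about as good as a single full copy for that region.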

Ideas?

Can anyone help me build the code from the article? I'm not really into ASM, so I don't understand any of that code.

I get linker errors like this:
1>Memory.obj : error LNK2001: unresolved external symbol "void __cdecl ia32_asm_init(void)" (?ia32_asm_init@@YAXXZ)
1>Memory.obj : error LNK2001: unresolved external symbol "bool __cdecl ia32_cpuid(unsigned int,unsigned int *)" (?ia32_cpuid@@YA_NIPAI@Z)


Any ideas on how to fix?

I tried to just create a .asm file in my project, plus a .h and a .cpp, and add the relevant code to each of them, and then I added the custom build step for nasm. What else do I have to do?

Do you know memcpy() is a bottleneck? If not, use memcpy(). Anything else is premature optimisation.

Note that if I were a vaguely sane C library writer, and if writing only the data that differs were faster than always writing, my implementation of memcpy() would do that.

Some attempts at logic

Let's say writing is precisely three times slower than reading.

To do the comparison, you need to read from both the source and the destination. Let's say you can load 128 bytes into some SIMD registers and compare the bytes in parallel with a single instruction. I don't know if that's true. I'm guessing that the time taken to do the comparison is vanishingly small compared to memory access time. I don't know if that's true either. If the comparison reveals that the block has changed, then we must do the write operation.

If all the data has changed, and the above paragraph is a realistic description of the timing requirements, then the whole copy will take about 67% longer than just doing the write, because we've introduced read operations that weren't there before: two reads at one unit each plus a three-unit write, against three units for the write alone. If only half the data has changed, it's about 17% longer. You only win when less than a third of the data has changed.
