Fast memory copying

Started by
18 comments, last by Jan Wassenberg 18 years ago
Quote:Original post by owl
Quote:Original post by Anonymous Poster
Any sort of comparison will slow you down.


He could pre-calculate some sort of checksum/modification flag for the blocks of memory and use them for the comparison. If the data is likely to remain unchanged, or to be modified only a little, not copying several megabytes in vain could be worth taking into consideration.


A clarification: When I said blocks of memory I was talking about partitioning the buffer in chunks (dunno, some size) and copying only those that were modified.
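The idea could be sketched like this (a hypothetical sketch, not anyone's actual code: the chunk size and the simple multiplicative hash are arbitrary choices, and as noted below a matching checksum can't prove a chunk is unchanged):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Simple per-chunk checksum; a matching value does NOT guarantee the
// chunk is unchanged (collisions are possible), only a mismatch proves
// that it changed.
static uint32_t chunk_checksum(const uint8_t* p, size_t n) {
    uint32_t sum = 0;
    for (size_t i = 0; i < n; ++i)
        sum = sum * 31 + p[i];
    return sum;
}

// Copies only chunks whose checksum differs from the previous pass.
// 'sums' holds one checksum per chunk from the last call. Returns how
// many chunks were actually copied.
size_t copy_dirty_chunks(uint8_t* dst, const uint8_t* src, size_t size,
                         std::vector<uint32_t>& sums,
                         size_t chunk = 64 * 1024) {
    size_t copied = 0;
    size_t nchunks = (size + chunk - 1) / chunk;
    sums.resize(nchunks, 0);
    for (size_t i = 0; i < nchunks; ++i) {
        size_t off = i * chunk;
        size_t len = (off + chunk <= size) ? chunk : size - off;
        uint32_t c = chunk_checksum(src + off, len);
        if (c != sums[i]) {                 // (probably) modified: copy it
            std::memcpy(dst + off, src + off, len);
            sums[i] = c;
            ++copied;
        }
    }
    return copied;
}
```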
I like the Walrus best.
Quote:Original post by owl
Quote:Original post by owl
Quote:Original post by Anonymous Poster
Any sort of comparison will slow you down.


He could pre-calculate some sort of checksum/modification flag for the blocks of memory and use them for the comparison. If the data is likely to remain unchanged, or to be modified only a little, not copying several megabytes in vain could be worth taking into consideration.


A clarification: When I said blocks of memory I was talking about partitioning the buffer in chunks (dunno, some size) and copying only those that were modified.

The problem with this is that a checksum can't be used to guarantee that a chunk has not changed, only that it has. If writing to memory (rather than reading it) is the performance problem, then this technique would be useful. Presumably, to be efficient the checksum couldn't cover every byte of the chunk, so some bytes wouldn't contribute to the checksum anyway. Even if you do use every byte, the pigeonhole principle guarantees that N distinct chunk contents share the same checksum, where N = 2^(length of chunk in bits - length of checksum in bits); for a 64 KiB chunk with a 32-bit checksum, that's 2^524256.
Quote:Original post by Nathan Baum

The problem with this is that a checksum can't be used to guarantee that a chunk has not changed, only that it has. If writing to memory (rather than reading it) is the performance problem, then this technique would be useful. Presumably, to be efficient the checksum couldn't cover every byte of the chunk, so some bytes wouldn't contribute to the checksum anyway. Even if you do use every byte, the pigeonhole principle guarantees that N distinct chunk contents share the same checksum, where N = 2^(length of chunk in bits - length of checksum in bits); for a 64 KiB chunk with a 32-bit checksum, that's 2^524256.


You could manage the access to the blocks of memory and keep track, in an array of bools, of which ones were modified.

Note: I'm just brainstorming here :)
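Brainstorming along the same lines, one hypothetical shape this could take (all names invented here; it assumes every write really does go through the wrapper, which is exactly the access DvDmanDT says he doesn't have):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// If every write goes through write(), an array of bools records exactly
// which chunks were touched, so the copy pass can skip clean ones without
// any checksumming.
struct TrackedBuffer {
    std::vector<uint8_t> data;
    std::vector<bool> dirty;      // one flag per chunk
    size_t chunk;

    TrackedBuffer(size_t size, size_t chunk_size = 64 * 1024)
        : data(size),
          dirty((size + chunk_size - 1) / chunk_size, true),
          chunk(chunk_size) {}

    void write(size_t off, const void* src, size_t len) {
        std::memcpy(&data[off], src, len);
        // mark every chunk the write touched
        for (size_t i = off / chunk; i <= (off + len - 1) / chunk; ++i)
            dirty[i] = true;
    }

    // Copy only dirty chunks into dst, then clear the flags.
    size_t flush_to(uint8_t* dst) {
        size_t copied = 0;
        for (size_t i = 0; i < dirty.size(); ++i) {
            if (!dirty[i]) continue;
            size_t off = i * chunk;
            size_t len = (off + chunk <= data.size()) ? chunk
                                                      : data.size() - off;
            std::memcpy(dst + off, &data[off], len);
            dirty[i] = false;
            ++copied;
        }
        return copied;
    }
};
```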
Yes, but DvDmanDT states that he doesn't have access to that kind of functionality; the function which does the modification "cannot interact with the copying code (for example, it can't say which areas are modified or the like)."
Quote:Original post by Nathan Baum
Yes, but DvDmanDT states that he doesn't have access to that kind of functionality; the function which does the modification "cannot interact with the copying code (for example, it can't say which areas are modified or the like)."


That's true, I didn't read that part.

Well, in that case, the only way to avoid reading and copying everything all the time is to read everything all the time and keep track of the changes. That would require a copy of the buffer in question, and a process that traverses the source buffer once in a while, copies only those parts that were modified, and at the same time updates the buffer used to look for changes. Since it was stated that reading is faster than writing, reading everything all the time has to be faster than reading and writing everything all the time.

The only (possibly detrimental) issue is that you need to duplicate the memory used by the data, you have to traverse two buffers, and the writing part (which could be very big) has to be done twice.
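A minimal sketch of that shadow-buffer idea (hypothetical code; the chunk granularity for comparing is an arbitrary choice, and note the double write the paragraph above warns about):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// 'shadow' holds the data as of the last pass. Each pass reads src once;
// only regions that differ from the shadow are written, to both dst and
// the shadow, so unchanged data is read but never written. Returns the
// number of bytes written to dst.
size_t sync_with_shadow(uint8_t* dst, uint8_t* shadow,
                        const uint8_t* src, size_t size,
                        size_t chunk = 64 * 1024) {
    size_t copied = 0;
    for (size_t off = 0; off < size; off += chunk) {
        size_t len = (off + chunk <= size) ? chunk : size - off;
        if (std::memcmp(shadow + off, src + off, len) != 0) {
            std::memcpy(shadow + off, src + off, len);  // first write
            std::memcpy(dst + off, src + off, len);     // second write
            copied += len;
        }
    }
    return copied;
}
```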
Maybe you can use threading - i.e. one thread to check a 64kb block, while another one is copying a 'dirty' one? Probably won't bring much on single-core, but it might help on dual(/quad)-core.

You'd have to use some locking mechanism, though, to protect against reading and writing the same block (but since you have lots of 64k blocks, that shouldn't be so problematic, I guess).
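One lock-free variant of this idea, sketched hypothetically: instead of a checker thread and a copier thread sharing a work queue (which needs the locking mentioned above), give each thread a disjoint, chunk-aligned half of the buffer to compare and copy. Disjoint ranges need no synchronization beyond the final join:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <thread>
#include <vector>

// Compare-and-copy the chunks in [begin, end); only chunks that differ
// get written.
void compare_copy_range(uint8_t* dst, const uint8_t* src,
                        size_t begin, size_t end, size_t chunk) {
    for (size_t off = begin; off < end; off += chunk) {
        size_t len = (off + chunk <= end) ? chunk : end - off;
        if (std::memcmp(dst + off, src + off, len) != 0)
            std::memcpy(dst + off, src + off, len);
    }
}

void parallel_compare_copy(uint8_t* dst, const uint8_t* src, size_t size) {
    const size_t chunk = 64 * 1024;
    size_t half = (size / 2 / chunk) * chunk;   // chunk-aligned split point
    std::thread t(compare_copy_range, dst, src, size_t(0), half, chunk);
    compare_copy_range(dst, src, half, size, chunk);  // this thread: rest
    t.join();
}
```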
visit my website at www.kalmiya.com
This'll be pretty theoretical, but oh well. On your machine you can read 3200mb in the same time as you can write 1300mb, right? So our relative times are:

read = 1.00
write = 2.46

So, assuming the compare is negligible (it won't be your bottleneck):

non-checked copy = 3.46 (read everything + write everything)
checked copy = 2.00 - 4.46 (read both buffers, plus writes for whatever changed)

But that's assuming a fairly naive copy. So it might be worth checking if generally less than half your data is modified. What I'd probably do is pick a chunk size, then go through comparing. The moment you find something modified, you just copy the rest of the chunk without checking. That'd reduce your worst case a bit. I dunno, just a thought. If this is important to you, try several things and get some good performance measurements on them.
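The "stop comparing once a chunk is known dirty" strategy could look something like this (a hypothetical sketch; a real version would compare word-sized units rather than single bytes):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Scan each chunk byte by byte; at the first difference, copy the
// remainder of the chunk unconditionally. Worst case per chunk is then
// roughly one compare pass plus one copy, instead of a full compare
// followed by a full copy.
void checked_copy(uint8_t* dst, const uint8_t* src, size_t size,
                  size_t chunk = 64 * 1024) {
    for (size_t off = 0; off < size; off += chunk) {
        size_t len = (off + chunk <= size) ? chunk : size - off;
        for (size_t i = 0; i < len; ++i) {
            if (dst[off + i] != src[off + i]) {
                // chunk is dirty: stop checking, copy the rest of it
                std::memcpy(dst + off + i, src + off + i, len - i);
                break;
            }
        }
    }
}
```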
David Olsen. If I've helped you, please vote for PigeonGrape!
Quote:I tried to just create a .asm file in my project, and a .h and a .cpp, and add the relevant code to each of them, and then I added the custom build step for nasm.. What else do I have to do?

That's all you ought to have to do.
These linker errors indicate the C++ compiler is mangling names, but the asm code expects C linkage. Apparently in my project the relevant declarations are wrapped in extern "C" {}. Please try changing all extern declarations in the C++ code/header portions to extern "C".
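For illustration, the difference looks something like this (memcpy_amd is a stand-in name; the C-linkage definition here only stands in for the symbol the NASM file would export with "global memcpy_amd"):

```cpp
#include <cstddef>
#include <cstring>

// Wrong: with plain C++ linkage the compiler emits a mangled name
// (with GCC, something like _Z10memcpy_amdPvPKvm), so the linker can't
// match it against the unmangled symbol in the .asm file:
//   extern void memcpy_amd(void* dst, const void* src, size_t len);

// Right: C linkage makes the C++ side look for the plain symbol name.
extern "C" void memcpy_amd(void* dst, const void* src, size_t len);

// Stand-in definition so this sketch links; in the real project this
// lives in the .asm file.
extern "C" void memcpy_amd(void* dst, const void* src, size_t len) {
    std::memcpy(dst, src, len);
}
```

Several declarations can also share one extern "C" { ... } block, which is apparently what the article's project does.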
Yes, sorry.. When I looked at your code, I saw extern void funcname, so I thought it was in order.. Worked great once I changed that to extern "C" void funcname.. And it did speed things up quite a bit.. I don't have numbers, but previously a copy of 32mb was noticeable; now I can easily get away with 64mb..

I keep track of which chunks are used (it's a pool as I said).. I only need to copy the used ones.. It's a separate list of ints (one int per chunk), with one bit set if it's used, and the rest used for the size of the data in the chunk (actually, how many chunks belong to this block).. I have to copy that list, but after that I could walk through it and see which chunks I need to copy.. In a worst case I would lose quite some performance by doing that, but on the other hand I could get rid of lots of chunks.. Another thing I can do is keep track of the last chunk used.. That should be a fairly easy and fast thing to do..
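That per-chunk int could be packed and walked like this (a hypothetical sketch of the scheme described above, not DvDmanDT's actual code; the bit layout is one guess at what he means):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One int per chunk: low bit = "used", remaining bits = how many chunks
// belong to the block starting at this chunk.
inline uint32_t pack_entry(bool used, uint32_t nchunks) {
    return (nchunks << 1) | (used ? 1u : 0u);
}
inline bool entry_used(uint32_t e)      { return e & 1u; }
inline uint32_t entry_count(uint32_t e) { return e >> 1; }

// Walk the list and count how many chunks actually need copying,
// skipping over whole blocks at a time.
size_t chunks_to_copy(const std::vector<uint32_t>& entries) {
    size_t used = 0;
    for (size_t i = 0; i < entries.size(); ) {
        if (entry_used(entries[i])) {
            used += entry_count(entries[i]);
            i += entry_count(entries[i]);   // jump past the whole block
        } else {
            ++i;
        }
    }
    return used;
}
```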
Quote:Yes, sorry.. When I looked at your code, I saw extern void funcname, so I thought it was in order.. Worked great once I changed that to extern "C" void funcname..

Indeed, that's what it should have been in the article. I've updated it; thanks!

Quote:And it did speed up things quite a bit.. I don't have numbers, but previously a copy of 32mb was noticeable, now I can easily get away with 64mb..

Excellent! BTW, be sure to call ia32_init - otherwise the codepaths that provide really big gains will remain disabled.

This topic is closed to new replies.
