New Mini-Article: Speeding up memcpy

General and Gameplay Programming

Started by Jan Wassenberg November 29, 2005 06:56 AM

44 comments, last by Jan Wassenberg 18 years, 4 months ago

Jan Wassenberg

1,000

Author

November 29, 2005 06:56 AM

Howdy. I've just written a Technical Report on speeding up memcpy. It presents source code and techniques behind an implementation that beats VC7.1's memcpy() by 7 to 300%, depending on transfer size. There are no special CPU requirements (it runs on 'all' CPUs from the original Pentium MMX on) and it can easily be dropped into other projects. Hope it helps someone! Feedback is welcome. [Edited by - Jan Wassenberg on December 8, 2005 5:22:05 AM]

E8 17 00 42 CE DC D2 DC E4 EA C4 40 CA DA C2 D8 CC 40 CA D0 E8 40E0 CA CA 96 5B B0 16 50 D7 D4 02 B2 02 86 E2 CD 21 58 48 79 F2 C3

_winterdyne_

530

November 29, 2005 07:16 AM

That is sweeeeeet.

I rely heavily on memcpy for cloning assets around so this will give a real performance boost in my project. Many thanks!

Winterdyne Solutions Ltd is recruiting - this thread for details!

Emmanuel Deloget

1,382

November 29, 2005 07:45 AM

The whole article is very interesting - a good read. The only negative point is that the code can't be used in anything else but a GPL'ed project.

Quote:
Source Code:
This is licensed under the GPL.

As a consequence, it will be harder to use. At least, I can't use it :S

But it is still good work, and an impressive achievement! [smile]

-- Emmanuel D. [blog, in French] [blog, very bad googlized translation]

DJHoy

378

November 29, 2005 07:47 AM

Looks quite interesting - what CPU types has it been tested on? It'd be interesting to see what it would do on a Celeron-type with a smaller on-board cache...

_winterdyne_: Not to get off topic here, but can I ask (without knowing anything about your project) why you need to clone assets rather than re-use pointers to the same asset? (Just curious.)

_winterdyne_

530

November 29, 2005 08:09 AM

I'm developing a flexible MMO infrastructure - part of the mandate is to allow heavy asset reuse where possible but also to allow live editing. Portions of the descriptors for areas that are being edited are cloned before alteration if they're already in use elsewhere, and the new, altered asset is compared against the old to generate a delta patch. Whether this will be done on the server or on a trusted superclient is still up in the air.

The heavy asset reuse and checking should allow download size to be minimised, and those delta packages can be sent to clients rather than distributing an entirely new asset. That's the plan anyway. :-)
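The clone-then-diff idea described above can be sketched roughly as follows. This is only an illustration: the names `Blob`, `DeltaRun`, `clone_asset`, `make_delta` and `apply_delta` are hypothetical and not taken from the project, and the sketch assumes the edited asset keeps the same size as the original.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical types for illustration; nothing here comes from the poster's project.
using Blob = std::vector<std::uint8_t>;

struct DeltaRun {
    std::size_t offset;  // where the changed bytes start in the asset
    Blob bytes;          // replacement bytes for that range
};

// Clone a descriptor before it is edited, so other users keep the original.
Blob clone_asset(const Blob& original) {
    Blob copy(original.size());
    if (!original.empty())
        std::memcpy(copy.data(), original.data(), original.size());
    return copy;
}

// Compare the edited copy against the original and collect runs of changed bytes.
// Assumption: both versions have the same size (no insertions or deletions).
std::vector<DeltaRun> make_delta(const Blob& before, const Blob& after) {
    assert(before.size() == after.size());
    std::vector<DeltaRun> delta;
    std::size_t i = 0;
    while (i < before.size()) {
        if (before[i] == after[i]) { ++i; continue; }
        const std::size_t start = i;
        while (i < before.size() && before[i] != after[i]) ++i;
        delta.push_back(DeltaRun{start, Blob(after.begin() + start, after.begin() + i)});
    }
    return delta;
}

// Apply a delta to a client's copy of the old asset.
void apply_delta(Blob& asset, const std::vector<DeltaRun>& delta) {
    for (const DeltaRun& run : delta) {
        assert(run.offset + run.bytes.size() <= asset.size());
        std::memcpy(asset.data() + run.offset, run.bytes.data(), run.bytes.size());
    }
}
```

A client holding the old asset would only need the list of `DeltaRun` entries to reach the edited version, which is where the download saving would come from.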

Winterdyne Solutions Ltd is recruiting - this thread for details!

Genjix

100

November 29, 2005 08:10 AM

This looks like really nice work.

Nitage

1,107

November 29, 2005 08:40 AM

To add this to a C++ project, would I have to write a header declaring the function like this?

void* __declspec(naked) ia32_memcpy(void* dst, const void* src, size_t nbytes);
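For comparison, a header declaration might look more like the sketch below. It assumes the routine is built with C linkage and the default cdecl calling convention, which the thread does not confirm. `__declspec(naked)` is left off because MSVC accepts it only on a function definition, not on a prototype. The forwarding definition is a stand-in so the sketch links on its own; in a real project the article's assembly implementation would be linked instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Declaration as it might appear in a header.
// Assumptions: C linkage and cdecl; no __declspec(naked) on the prototype.
extern "C" void* ia32_memcpy(void* dst, const void* src, std::size_t nbytes);

// Stand-in definition for this sketch only. It simply forwards to
// std::memcpy and is NOT the optimised routine from the article.
extern "C" void* ia32_memcpy(void* dst, const void* src, std::size_t nbytes) {
    return std::memcpy(dst, src, nbytes);
}
```

Callers then use it exactly like `memcpy`, for example `ia32_memcpy(dst, src, sizeof src);`.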

CTar

1,134

November 29, 2005 09:19 AM

Very (very) nice article, I can probably use that knowledge for lots of stuff. I never thought of speeding up memcpy, well I did once when reading Andre LaMothe's books, but then I realized lots of his optimization tricks were horrible (today at least) and from that point I never thought of implementing my own memcpy.

[Edited by - CTar on November 29, 2005 9:19:10 AM]

chollida1

532

November 29, 2005 09:24 AM

Cool, it looks like you borrowed a bunch of tricks from an old AMD paper on increasing memcpy throughput :) Damn, where is that paper? :)

Thanks for sharing.

Cheers
Chris

_winterdyne_

530

November 29, 2005 10:26 AM

Just a quick thought, but does memset() suffer the same failings as memcpy()?

An adapted version of this could help optimise initialisation of large data structures, rather than using a for loop over smaller elements.

Edit: Or worse, several for loops on differing element types.
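To make the comparison concrete, here is a small sketch of the two initialisation styles. `Particle` is a hypothetical record type used only for illustration, and the block fill assumes an all-zero byte pattern is a valid zero value for every member (true for integers and for IEEE 754 floats). Whether the block fill is actually faster depends on the compiler and CPU and is not measured here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical record type, used only for illustration.
struct Particle {
    float x, y, z;
    int   life;
    short flags;
};

// Element-by-element initialisation: several small stores per element.
void clear_with_loop(Particle* p, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        p[i].x = p[i].y = p[i].z = 0.0f;
        p[i].life = 0;
        p[i].flags = 0;
    }
}

// One block fill over the whole array, padding bytes included.
// Assumption: all-zero bytes represent zero for every member.
void clear_with_memset(Particle* p, std::size_t count) {
    static_assert(std::is_trivially_copyable<Particle>::value,
                  "memset is only safe on trivially copyable types");
    std::memset(p, 0, count * sizeof(Particle));
}
```

The single-call form is the one a tuned fill routine would replace; the loop form is what the post is hoping to avoid.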

Winterdyne Solutions Ltd is recruiting - this thread for details!
