Jan Wassenberg

New Mini-Article: Speeding up memcpy

Howdy. I've just written a Technical Report on speeding up memcpy. It presents the source code and techniques behind an implementation that beats VC7.1's memcpy() by 7 to 300%, depending on transfer size. There are no special CPU requirements (it runs on 'all' CPUs from the original Pentium MMX on), and it can easily be dropped into other projects. Hope it helps someone! Feedback is welcome. [Edited by - Jan Wassenberg on December 8, 2005 5:22:05 AM]

The whole article is very interesting - a good read. The only negative point is that the code can't be used in anything else but a GPL'ed project.
Quote:

Source Code:
This is licensed under the GPL.

As a consequence, it will be harder to use. At least, I can't use it :S

But it is still good work, and an impressive achievement! [smile]

Looks quite interesting - what CPU types has it been tested on? It'd be interesting to see what it would do on a Celeron-type with a smaller on-board cache...

_winterdyne_: Not to get off topic here, but can I ask (without knowing anything about your project) why you need to clone assets rather than re-use pointers to the same asset? (Just curious.)

I'm developing a flexible MMO infrastructure - part of the mandate is to allow heavy asset reuse where possible but also to allow live editing. Portions of the descriptors for areas that are being edited are cloned before alteration if they're already in use elsewhere, and the new, altered asset is compared against the old one to generate a delta patch. Whether this will be done on the server or on a trusted superclient is still up in the air.

The heavy asset reuse and checking should allow download size to be minimised, and those delta packages can be sent to clients rather than distributing an entirely new asset. That's the plan anyway. :-)
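
The clone-before-alteration and delta steps described above could be sketched roughly as follows. Everything here is hypothetical and only for illustration: the Asset type (a plain byte blob), DeltaEntry, and the functions make_editable, make_delta and apply_delta are not from any actual project, and the byte-wise delta assumes the edit does not change the asset's size.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical asset descriptor: just a blob of bytes for this sketch.
using Asset = std::vector<std::uint8_t>;

// One changed byte: where it is and what it becomes.
struct DeltaEntry {
    std::size_t  offset;
    std::uint8_t value;
};

// Clone before alteration: if the asset is referenced elsewhere, hand the
// editor a private copy; otherwise let it edit the existing one in place.
std::shared_ptr<Asset> make_editable(const std::shared_ptr<Asset>& asset)
{
    if (asset.use_count() > 1)
        return std::make_shared<Asset>(*asset);  // deep copy of the bytes
    return asset;
}

// Naive byte-wise delta between two versions of an asset.
// Assumption: both versions have the same size; a real patch format would
// also have to describe insertions and removals.
std::vector<DeltaEntry> make_delta(const Asset& before, const Asset& after)
{
    std::vector<DeltaEntry> delta;
    for (std::size_t i = 0; i < before.size() && i < after.size(); ++i)
        if (before[i] != after[i])
            delta.push_back({i, after[i]});
    return delta;
}

// Client side: bring a copy of the old asset up to date using the delta.
void apply_delta(Asset& asset, const std::vector<DeltaEntry>& delta)
{
    for (const DeltaEntry& e : delta)
        if (e.offset < asset.size())
            asset[e.offset] = e.value;
}
```

The point of the sketch is only that the client receives the (small) delta instead of the whole altered asset; the deep copy in make_editable is where a fast memcpy would matter.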

To add this to a C++ project, would I have to write a header declaring the function like this?

void* __declspec(naked) ia32_memcpy(void* dst, const void* src, size_t nbytes);
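
For what it's worth, a minimal sketch of what such a header might look like. It assumes the routine is built in its own source file; the file name ia32_memcpy.h, the extern "C" linkage, and the stand-in definition that simply forwards to std::memcpy are assumptions for illustration, not the article's code. One thing that is certain: MSVC accepts __declspec(naked) only on a function definition, not on a prototype, so the declaration leaves it out.

```cpp
#include <cstddef>
#include <cstring>

// Header part (hypothetical file name: ia32_memcpy.h).
// Plain prototype without __declspec(naked): MSVC permits that attribute
// only on the function definition, and callers do not need to know about it.
// extern "C" is an assumption; it keeps the symbol name unmangled, which
// matters if the routine lives in a separately assembled file.
extern "C" void* ia32_memcpy(void* dst, const void* src, std::size_t nbytes);

// Stand-in definition so that this sketch links on its own.
// In a real project, the article's routine would take its place.
extern "C" void* ia32_memcpy(void* dst, const void* src, std::size_t nbytes)
{
    return std::memcpy(dst, src, nbytes);
}
```

Callers would then include the header and call ia32_memcpy(dst, src, nbytes) exactly as they would call memcpy.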

Very (very) nice article; I can probably use that knowledge for lots of stuff. I never thought of speeding up memcpy. Well, I did once when reading Andre LaMothe's books, but then I realized lots of his optimization tricks were horrible (today at least), and from that point on I never thought of implementing my own memcpy.

[Edited by - CTar on November 29, 2005 9:19:10 AM]

Cool, it looks like you borrowed a bunch of tricks from an old AMD paper on increasing memcpy throughput :) Damn, where is that paper :)

Thanks for sharing.

Cheers
Chris

Just a quick thought, but does memset() suffer the same failings as memcpy()?

An adapted version of this could help optimise initialisation of large data structures, rather than using a for loop over small elements.

Edit: Or worse, several for loops on differing element types.
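
To make the comparison concrete, here is a sketch of the two initialisation styles. The Particles structure and both function names are made up for illustration. The block fill is only valid because the type is trivially copyable and an all-zero bit pattern means zero for every member (this assumes IEEE 754 floats); whether it is actually faster would need measuring.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical structure with arrays of differing element types.
struct Particles {
    float        x[1024];
    std::int32_t id[1024];
    std::uint8_t flags[1024];
};

// Element-by-element initialisation: one for loop per member.
void clear_with_loops(Particles& p)
{
    for (float& v : p.x) v = 0.0f;
    for (std::int32_t& v : p.id) v = 0;
    for (std::uint8_t& v : p.flags) v = 0;
}

// A single block fill over the whole object.
void clear_with_memset(Particles& p)
{
    static_assert(std::is_trivially_copyable<Particles>::value,
                  "memset is only safe on trivially copyable types");
    std::memset(&p, 0, sizeof p);
}
```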
