Why is std::copy faster than std::memcpy?

Started by
11 comments, last by RobTheBloke 8 years, 7 months ago

Hi,

Why is std::copy faster than std::memcpy?

Possible implementation of std::copy :


template<class InputIt, class OutputIt>
OutputIt copy(InputIt first, InputIt last, OutputIt d_first)
{
  while (first != last) {
    *d_first++ = *first++;
  }
  return d_first;
}

Thanks


First things first: That's not how most std::copy implementations work.

If you actually look at one, you will see that most of them use compile-time type deduction to reduce the amount of work they need to do. Furthermore, as they're templates and fairly small, they have a high chance of being inlined. In addition, because the compiler has exact information about the size of the objects, it can explicitly optimize the copy for your type. Lastly, in a lot of cases, if copy can deduce that the iterator is a pointer (to a trivially copyable type), it can simply delegate to memmove.

memcpy tends to be located in the msvcrt DLL, and as such will typically not be inlined (LTCG can do this, though). It is also quite a bit more complex, as it does not know at compile time what you're going to pass into it, and thus has to do more work to move things around efficiently.

All of that being said, in general the two should perform roughly the same.

In time the project grows, the ignorance of its devs it shows, with many a convoluted function, it plunges into deep compunction, the price of failure is high, Washu's mirth is nigh.


That's not how most std::copy implementations work. If you actually look at it you will see...

That is actually a fun exercise.

Seriously, open up your implementation's version of std::copy. Find all the variations, since there are likely several with subtle differences in types.

Then look over how the different forms of copy's template parameter types are themselves template types with their own subtle variations and internal implementations. So a small number of std::copy() templates can be implemented in a large number of implementation details. Some implementation variations for random access iterators, for pointers to scalars, for forward iterators, for input iterators, for arbitrary other iterators, and so on.

Note that memcpy is (usually) forwarded to a compiler intrinsic (as in https://msdn.microsoft.com/en-us/library/tzkfha43.aspx and https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html).

Also see https://stackoverflow.com/questions/4707012/c-memcpy-vs-stdcopy, especially the "explanation of results" part. Fancy overloads of std::copy aren't (generally) relevant to the case where std::copy is super efficient.

The relevant bit is that the optimizer can see into the instantiation of templates and apply type-aware optimizations that C implementations of memcpy (and apparently most intrinsic versions) just can't.

This is akin to the same reason that C++ std::sort is faster than C's qsort, even if they implement the same sort algorithm internally. Or why C++ hashing functions can be faster than C versions. etc. Assuming you're okay with the increased compile times, C++ enables the compiler to work magic in a way that C doesn't.

Sean Middleditch – Game Systems Engineer – Join my team!

I wouldn't worry about differences in speed between two copy methods, but instead try to avoid having to copy things.

Seriously, if such speed differences make a significant difference, there is either something very wrong in the design, or you're working at the very edge of what the application or the system can handle, which means that if you make things a tad bigger, it dies anyway.


This is akin to the same reason that C++ std::sort is faster than C's qsort

Well, no.

std::sort is first and foremost faster because qsort is not only a non-inlineable library function, but one that calls back a user-supplied function (which needs to cast from void* and do whatever is needed as comparison, and for which the compiler cannot assume strict aliasing rules). That callback cannot possibly be inlined, nor can the compiler optimize across it. So assuming a pretty good sorting algorithm that needs exactly N comparisons, you have already added N non-inlined function calls.

Now of course std::sort has a comparison functor, too. So technically you have just as many function callbacks. But these can in practically every case be inlined, and the compiler is able to further optimize the whole "unit" of sort+functor, since it can see all the source.

Also, the comparison for qsort returns -1, 0, or 1 depending on the result, whereas comparators for std::sort return bool. This allows much simpler logic for std::sort (of course, on many architectures, the more complex logic can be optimized into one compare and three flag-dependent conditional jumps on the library side, but that is not guaranteed, and the added complexity needed to produce a tri-state on the user side remains).

Well, no.

std::sort is first and foremost faster because qsort is not only a non-inlineable library function, but one that that calls back a user-supplied function


Well, yes.

The function being inline-able into the algorithm is a consequence of how "the optimizer can see into the instantiation of templates", as I said. :)

Sean Middleditch – Game Systems Engineer – Join my team!

Why std::copy is faster than std::memcpy ?

Could you post your profiling scenario that made you observe this? We might speculate further then.

I read that on Stack Overflow, but I also tested it myself: copying a 4x4 identity matrix 1,000,000 times with memcpy vs. std::copy. The time difference is there:


memcpy = 0ms
copy = 9ms

The test was made in release mode.

Those numbers show memcpy being faster, not the other way around. They are also strongly suggestive of basic mistakes in setting up a microbenchmark.

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

This topic is closed to new replies.
