
Why is std::copy faster than std::memcpy?


Hi,

Why is std::copy faster than std::memcpy?

Possible implementation of std::copy:

template<class InputIt, class OutputIt>
OutputIt copy(InputIt first, InputIt last, OutputIt d_first)
{
  // Element-by-element copy from [first, last) to the range starting at d_first.
  while (first != last) {
    *d_first++ = *first++;
  }
  return d_first;
}

Thanks



That's not how most std::copy implementations work. If you actually look at it you will see...

That is actually a fun exercise.

 

Seriously, open up your implementation's version of std::copy. Find all the variations, since there are likely several with subtle differences in types.

 

Then look at how the different forms of copy's template parameter types are themselves template types with their own subtle variations and internal implementations. A small number of std::copy() templates ends up backed by a large number of distinct implementations: variations for random access iterators, for pointers to scalars, for forward iterators, for input iterators, for arbitrary other iterators, and so on.
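Not taken from any real standard library, but a minimal sketch (assuming C++11) of that kind of dispatch; the names sketch::copy and copy_impl are invented for illustration, and actual implementations handle many more cases:

#include <cstring>
#include <iterator>
#include <type_traits>

namespace sketch {

// Generic fallback: element-by-element copy, works for any input iterator.
template<class InputIt, class OutputIt>
OutputIt copy_impl(InputIt first, InputIt last, OutputIt d_first, std::false_type)
{
  while (first != last) {
    *d_first++ = *first++;
  }
  return d_first;
}

// Fast path: contiguous ranges of a trivially copyable type collapse to memmove.
template<class T>
T* copy_impl(const T* first, const T* last, T* d_first, std::true_type)
{
  const std::size_t n = static_cast<std::size_t>(last - first);
  if (n != 0) {
    std::memmove(d_first, first, n * sizeof(T));
  }
  return d_first + n;
}

template<class InputIt, class OutputIt>
OutputIt copy(InputIt first, InputIt last, OutputIt d_first)
{
  using T = typename std::iterator_traits<InputIt>::value_type;
  using U = typename std::iterator_traits<OutputIt>::value_type;
  // Take the fast path only when both ranges are raw pointers to the same
  // trivially copyable type.
  constexpr bool use_memmove = std::is_pointer<InputIt>::value &&
                               std::is_pointer<OutputIt>::value &&
                               std::is_same<T, U>::value &&
                               std::is_trivially_copyable<T>::value;
  return copy_impl(first, last, d_first,
                   std::integral_constant<bool, use_memmove>{});
}

} // namespace sketch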


I wouldn't worry about differences in speed between two copy methods, but instead try to avoid having to copy things.

 

Seriously, if such speed differences matter, there is either something very wrong in the design, or you're working at the very edge of what the application or the system can handle, which means that if you make things a tad bigger, it dies anyway.


This is akin to the same reason that C++ std::sort is faster than C's qsort

Well, no.

 

std::sort is first and foremost faster because qsort is not only a non-inlineable library function, but one that calls back a user-supplied function (which needs to cast from void* and do whatever is needed as a comparison, and for which the compiler cannot assume strict aliasing rules). That callback cannot possibly be inlined, nor can the compiler optimize across it. So assuming a pretty good sorting algorithm that needs exactly N comparisons, you have already added N non-inlined function calls.

 

Now of course std::sort has a comparison functor, too. So technically you have just as many function callbacks. But these can in practically every case be inlined, and the compiler is able to further optimize the whole "unit" of sort+functor, since it can see all the source.

 

Also, the comparison function for qsort returns a negative value, zero, or a positive value depending on the result, whereas comparators for std::sort return bool. This allows much simpler logic for std::sort (of course, on many architectures, the more complex logic can be optimized into one compare and three flag-dependent conditional jumps on the library side, but that is not guaranteed, and the added complexity needed to produce a tri-state on the user side remains).
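A minimal illustration of the two calling conventions; the names compare_ints and sort_both are made up for this example:

#include <algorithm>
#include <cstddef>
#include <cstdlib>

// qsort needs a tri-state comparison through a void* callback that the
// compiler generally cannot inline into the library's sort routine.
static int compare_ints(const void* a, const void* b)
{
  const int lhs = *static_cast<const int*>(a);
  const int rhs = *static_cast<const int*>(b);
  return (lhs < rhs) ? -1 : (lhs > rhs) ? 1 : 0;
}

void sort_both(int* data, std::size_t n)
{
  // C: one indirect, non-inlinable call per comparison.
  std::qsort(data, n, sizeof(int), compare_ints);

  // C++: the comparator is part of the instantiated template, so the
  // compiler can see it and inline it at every comparison site.
  std::sort(data, data + n, [](int lhs, int rhs) { return lhs < rhs; });
}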



Well, no.
 
std::sort is first and foremost faster because qsort is not only a non-inlineable library function, but one that calls back a user-supplied function


Well, yes.

The function being inlinable into the algorithm is a consequence of how "the optimizer can see into the instantiation of templates", as I said.


I read that on Stack Overflow, but I ran the test myself, memcpy vs std::copy, copying a 4x4 identity matrix 1,000,000 times, and the time difference is there:

memcpy = 0ms
copy = 9ms

The test was run in release mode.
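The actual test code isn't shown; a minimal sketch of that kind of measurement with std::chrono, assuming a plain Matrix4 of 16 floats (the type name and layout are assumptions), might look like this:

#include <algorithm>
#include <chrono>
#include <cstdio>
#include <cstring>

// Assumed stand-in for the matrix type used in the test.
struct Matrix4 { float m[16]; };

int main()
{
  Matrix4 src = {{1,0,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1}};
  Matrix4 dst;
  constexpr int iterations = 1000000;

  using clock = std::chrono::steady_clock;

  auto t0 = clock::now();
  for (int i = 0; i < iterations; ++i)
    std::memcpy(&dst, &src, sizeof(Matrix4));
  auto t1 = clock::now();
  for (int i = 0; i < iterations; ++i)
    std::copy(src.m, src.m + 16, dst.m);
  auto t2 = clock::now();

  // Note: printing dst keeps it "used"; otherwise the optimizer may remove
  // either loop entirely, which is one way a benchmark ends up reporting 0 ms.
  std::printf("memcpy: %lld us, copy: %lld us (dst[0]=%f)\n",
      (long long)std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count(),
      (long long)std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count(),
      dst.m[0]);
  return 0;
}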



Nice to know, but since the compiler optimizes for memcpy, is it safer to always use memcpy?

About the timer, good point, I should stop using the SDL timer and implement a custom one with better precision.
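A minimal sketch of such a timer built on std::chrono::steady_clock; the class name HighResTimer is invented for illustration:

#include <chrono>

// Sketch of a higher-resolution replacement for SDL_GetTicks-style timing.
class HighResTimer {
public:
  HighResTimer() : start_(std::chrono::steady_clock::now()) {}

  void reset() { start_ = std::chrono::steady_clock::now(); }

  // Elapsed time in microseconds since construction or the last reset().
  long long elapsed_us() const {
    return std::chrono::duration_cast<std::chrono::microseconds>(
        std::chrono::steady_clock::now() - start_).count();
  }

private:
  std::chrono::steady_clock::time_point start_;
};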


Nice to know, but since the compiler optimizes for memcpy, is it safer to always use memcpy? About the timer, good point, I should stop using the SDL timer and implement a custom one with better precision.

If you are in C++, you should use std::copy. memcpy is dangerous: it doesn't invoke copy or move constructors, and it completely ignores type safety.

For any relatively trivial POD type, the compiler will probably just replace std::copy, or a simple for-loop copy, with an intrinsic memcpy.
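To make the danger concrete, here is a small sketch with a hypothetical Item type whose std::string member makes it non-trivially copyable:

#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical non-trivial type: std::string manages heap memory, so a valid
// copy must run the copy constructor / copy assignment, not a byte copy.
struct Item {
  int id;
  std::string name;
};

void copy_items(const Item* src, Item* dst, std::size_t n)
{
  std::copy(src, src + n, dst);  // OK: invokes Item's copy assignment.

  // std::memcpy(dst, src, n * sizeof(Item));
  // Undefined behavior for Item: the bytes of std::string get duplicated and
  // no constructor or assignment operator ever runs.
}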


For any relatively trivial POD type, the compiler will probably just replace std::copy, or a simple for-loop copy, with an intrinsic memcpy.
That is of course absolutely true, but since there exist trivial POD types and not-so-trivial non-POD types, and std::copy performs optimally for both, you should still prefer std::copy over memcpy.

 

Even if there is no immediately obvious advantage/disadvantage in some particular, isolated case... be consistent.

 

Better to always use the same thing for every type than to use memcpy for one half "because it works, too" and std::copy for the other half, only to discover half a year later that you wasted two work days hunting down a bug where move constructors weren't called because you accidentally used memcpy when you shouldn't have.
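If raw memcpy is used somewhere anyway, one possible safety net is a compile-time guard; a sketch, where the helper name raw_copy is invented for illustration:

#include <cstddef>
#include <cstring>
#include <type_traits>

// Turns "this type is no longer trivially copyable" into a compile error
// instead of a silent runtime bug.
template<class T>
void raw_copy(T* dst, const T* src, std::size_t n)
{
  static_assert(std::is_trivially_copyable<T>::value,
                "raw_copy requires a trivially copyable type; use std::copy instead");
  std::memcpy(dst, src, n * sizeof(T));
}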

