# Why is std::copy faster than std::memcpy?


Hi,

Why is std::copy faster than std::memcpy?

Possible implementation of std::copy :

```cpp
template<class InputIt, class OutputIt>
OutputIt copy(InputIt first, InputIt last, OutputIt d_first)
{
    while (first != last) {
        *d_first++ = *first++;
    }
    return d_first;
}
```

Thanks

Edited by Alundra

##### Share on other sites

That's not how most std::copy implementations work. If you actually look at it you will see...

That is actually a fun exercise.

Seriously, open up your implementation's version of std::copy. Find all the variations, since there are likely several with subtle differences in types.

Then look at how the iterator types passed to copy are themselves templates with their own subtle variations and internal implementations. A small number of std::copy() templates therefore expands into a large number of concrete implementations: variations for random access iterators, for pointers to scalars, for forward iterators, for input iterators, for arbitrary other iterators, and so on.
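To make the idea of type-based variations concrete, here is a minimal sketch of how a copy over raw pointers could pick a bulk path for trivially copyable types and fall back to the element-by-element loop otherwise. The names `copy_sketch` and `copy_impl` are hypothetical and only for illustration; real standard library implementations differ in detail and handle many more iterator categories.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical helper: bulk path, chosen when T is trivially copyable.
// memmove rather than memcpy, because std::copy allows some overlapping ranges.
template <class T>
T* copy_impl(const T* first, const T* last, T* d_first, std::true_type)
{
    const std::size_t count = static_cast<std::size_t>(last - first);
    if (count != 0) {
        std::memmove(d_first, first, count * sizeof(T));
    }
    return d_first + count;
}

// Hypothetical helper: generic path, assigns one element at a time.
template <class T>
T* copy_impl(const T* first, const T* last, T* d_first, std::false_type)
{
    while (first != last) {
        *d_first++ = *first++;
    }
    return d_first;
}

// Entry point of the sketch: dispatches at compile time on a type trait.
template <class T>
T* copy_sketch(const T* first, const T* last, T* d_first)
{
    assert(first <= last);  // precondition: a valid range within one array
    return copy_impl(first, last, d_first, std::is_trivially_copyable<T>{});
}
```

Because the choice is made at compile time, a call on an array of `int` or `float` can end up as a single `memmove`, while a call on an array of objects with a user-defined assignment operator still runs the loop.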

##### Share on other sites

I wouldn't worry about differences in speed between two copy methods, but instead try to avoid having to copy things.

Seriously, if such speed differences make a significant difference, there is either something very wrong with the design, or you're working at the very edge of what the application or the system can handle, which means that if you make things a tad bigger, it dies anyway.

##### Share on other sites

This is akin to the same reason that C++ std::sort is faster than C's qsort

Well, no.

std::sort is first and foremost faster because qsort is not only a non-inlineable library function, but one that calls back a user-supplied function (which needs to cast from void* and do whatever the comparison requires, and for which the compiler cannot assume strict aliasing rules). That callback cannot possibly be inlined, nor can the compiler optimize across it. So assuming a pretty good sorting algorithm that needs exactly N comparisons, you have already added N non-inlined function calls.

Now of course std::sort has a comparison functor, too. So technically you have just as many function callbacks. But these can in practically every case be inlined, and the compiler is able to further optimize the whole "unit" of sort+functor, since it can see all the source.

Also, the comparison function for qsort returns a negative value, zero, or a positive value depending on the result, whereas comparators for std::sort return bool. This leads to much simpler logic for std::sort (of course, on many architectures the more complex logic can be optimized into one compare and three flag-dependent conditional jumps on the library side, but that is not guaranteed, and the added complexity needed to produce a tri-state on the user side remains).
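The two comparator shapes can be put side by side in a short sketch. The function names `compare_ints`, `sort_with_qsort`, and `sort_with_std_sort` are made up for illustration; only `std::qsort` and `std::sort` are real library calls.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// qsort-style comparator: called through a function pointer, receives void*,
// and must produce a tri-state result (negative, zero, positive).
int compare_ints(const void* a, const void* b)
{
    const int lhs = *static_cast<const int*>(a);
    const int rhs = *static_cast<const int*>(b);
    return (lhs > rhs) - (lhs < rhs);  // avoids the overflow risk of lhs - rhs
}

void sort_with_qsort(std::vector<int>& v)
{
    if (!v.empty()) {
        std::qsort(v.data(), v.size(), sizeof(int), compare_ints);
    }
    assert(std::is_sorted(v.begin(), v.end()));
}

// std::sort-style comparator: a typed callable returning bool. Its body is
// visible at the call site, so the compiler is free to inline it.
void sort_with_std_sort(std::vector<int>& v)
{
    std::sort(v.begin(), v.end(), [](int a, int b) { return a < b; });
    assert(std::is_sorted(v.begin(), v.end()));
}
```

Both functions produce the same ordering; the difference discussed above lies in what the compiler can see and optimize, not in the result.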

Edited by samoth

##### Share on other sites

Well, no.

std::sort is first and foremost faster because qsort is not only a non-inlineable library function, but one that calls back a user-supplied function

Well, yes.

The function being inline-able into the algorithm is a consequence of how "the optimizer can see into the instantiation of templates" as I said.

##### Share on other sites

Why is std::copy faster than std::memcpy?

Could you post your profiling scenario that made you observe this? We might speculate further then.

##### Share on other sites

I read that on Stack Overflow, but I also ran the test myself, memcpy vs. copy, copying a 4x4 identity matrix 1000000 times, and there is a time difference:

memcpy = 0ms
copy = 9ms

The test was made in release mode.
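For reference, a test along these lines might look like the sketch below. It is not the code that produced the numbers above; `Matrix4`, `TimingResult`, `time_copies`, and the two copy helpers are hypothetical names. The loop changes the source on every iteration and accumulates a value from the destination, which is one way to make it harder for an optimizing compiler to drop the copies altogether.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstring>

// Hypothetical 4x4 float matrix, assumed to match the test described above.
struct Matrix4 { float m[16]; };

struct TimingResult {
    long long microseconds;  // elapsed wall-clock time
    double checksum;         // value derived from the copies, so they are used
};

inline void copy_with_memcpy(Matrix4& dst, const Matrix4& src)
{
    std::memcpy(&dst, &src, sizeof(Matrix4));
}

inline void copy_with_std_copy(Matrix4& dst, const Matrix4& src)
{
    std::copy(src.m, src.m + 16, dst.m);
}

template <class CopyFn>
TimingResult time_copies(int iterations, CopyFn copy_fn)
{
    assert(iterations >= 0);
    Matrix4 src{};
    for (int i = 0; i < 4; ++i) {
        src.m[i * 5] = 1.0f;  // diagonal entries of the identity matrix
    }
    Matrix4 dst{};
    double checksum = 0.0;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        src.m[0] = static_cast<float>(i);  // make each iteration's input differ
        copy_fn(dst, src);
        checksum += dst.m[0];              // consume the result of the copy
    }
    const auto stop = std::chrono::steady_clock::now();
    const auto us =
        std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
    return TimingResult{static_cast<long long>(us), checksum};
}
```

Calling `time_copies(1000000, copy_with_memcpy)` and `time_copies(1000000, copy_with_std_copy)` and printing both fields gives a comparison. Even so, the compiler may still simplify a loop this small, so timings from a micro-benchmark like this should be treated with caution.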

Edited by Alundra
