This is akin to the same reason that C++ std::sort is faster than C's qsort
Well, no.
std::sort is first and foremost faster because qsort is not only a non-inlineable library function, but one that that calls back a user-supplied function (which needs to cast from void* and do whatever is needed as comparison, and for which the compiler cannot assume strict aliasing rules). That callback cannot possibly be inlined, nor can the compiler optimize across it. So assuming an pretty good sorting algorithm that needs exactly N comparisons, you already have added N non-inlined function calls.
Now of course std::sort has a comparison functor, too. So technically you have just as many function callbacks. But these can in practically every case be inlined, and the compiler is able to further optimize the whole "unit" of sort+functor, since it can see all the source.
Also, the comparison for qsort returns -1, 0, or 1 depending on the result whereas comparators for std::sort return bool. This lends to a much simpler logic for std::sort (of course, on many architectures, the more complex logic can be optimized into one compare and 3 flag-dependent conditional jumps on the library side, but that is not guaranteed, and the added complexity needed to produce a tri-state at the user side remains).