C++11 template question

7 comments, last by SmkViper 8 years, 4 months ago

template<typename T, std::size_t N>
constexpr std::size_t arraySize(T (&)[N]) noexcept 
{ 
    return N;
}

(from the book Effective Modern C++)

arraySize returns the number of elements inside an array.

The parameter to the actual function is a reference to an array of type T with N elements in it.

But I don't understand how N is deduced, and why it even appears in the template parameter list.

arraySize(myArray);

How is N deduced from the above call?

It is information the compiler has on hand because the type is a reference to an array. As the book points out, this is one of those interesting little quirks of the language.


Remember that template code is not actual C++ code; it is a template. The compiler effectively turns the template into a custom invisible cpp file with all the elements filled in, and that invisible cpp file is then compiled.

Under the old C rules, passing an array to a function is equivalent to passing a pointer to its first element. So, as the book points out, with a by-value parameter the compiler's invisible custom cpp file would only have a pointer to the first element: even if the argument were an array (like: int [10]) it would still be converted to a pointer (like: int*). That is why the template needs to use a reference (T&, or here T (&)[N]) rather than a plain type (T) or a pointer (T*).

Since it is a reference to the array, and because the C++ rules on references are slightly different than the old C rules for passing arrays, the compiler has the data to figure out the exact type of the array. This is information the compiler knows at the time of compilation.
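The difference between the two deduction rules can be checked directly. This is only a minimal sketch, not code from the book: Deduced, byValue and byReference are hypothetical helper names introduced here just to expose what T is deduced as.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical helper: carries the deduced type so it can be inspected.
template<typename T>
struct Deduced { using type = T; };

// By value: an array argument decays to a pointer before deduction.
template<typename T>
Deduced<T> byValue(T) { return {}; }

// By reference: the array type, including its bound, is preserved.
template<typename T>
Deduced<T> byReference(T&) { return {}; }

int foo[10];

static_assert(std::is_same<decltype(byValue(foo))::type, int*>::value,
              "by value: T is deduced as int*");
static_assert(std::is_same<decltype(byReference(foo))::type, int[10]>::value,
              "by reference: T is deduced as int[10]");
```

Both static_asserts are checked while compiling, so nothing here runs at run time.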




So when the compiler generates the custom invisible cpp file from the template, the magic happens as the full type information is used to substitute T and N in the template.

The compiler takes the definition of the type that it knows: int foo[10];

and applies it when it sees: arraySize(foo);

then it generates the invisible file with this function, filling in the blanks T and N in the template with the information it already knows:

constexpr std::size_t arraySize( int (&)[10]) noexcept { return 10; }
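To make the "filling in the blanks" step concrete, here is a small sketch that compares the deduced call with one where T and N are written out by hand. The arrays foo and bar are just illustrative names.

```cpp
#include <cstddef>

template<typename T, std::size_t N>
constexpr std::size_t arraySize(T (&)[N]) noexcept
{
    return N;
}

int foo[10];
double bar[3];

// The compiler deduces T = int and N = 10 from the type of foo.
static_assert(arraySize(foo) == 10, "N deduced from int[10]");

// Spelling the template arguments out by hand names the same instantiation.
static_assert(arraySize<int, 10>(foo) == 10, "T and N written explicitly");

// A different element type or bound produces a different instantiation.
static_assert(arraySize(bar) == 3, "T = double, N = 3");
```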

Again, this only works if you've got a reference (the & in the middle), because a reference to an array keeps the full type information. If the parameter were not a reference, an array parameter such as T[N] would be adjusted to a pointer (T*) and the argument would decay to int*. T could still be deduced as int, but there would be nothing left to deduce N from, so deduction would fail and the compiler would not consider the template a viable candidate for the call.

The parameter of the function says that you are passing in a reference to an unnamed array (the (&) part), whose elements are of type T (the T part) and which has N elements (the [N] part). Since this information must be known at compile time, it only works for explicitly declared arrays whose size never changes, i.e. something like int temp[50]. If you pass the array in as a pointer instead, you lose the size information and the code won't even compile. In summary, the value of N is derived from the size used in the array declaration.

This example shows the case where it works because the size (which is 50) of the array is known at compile time:


#include <cstddef>
#include <iostream>

template<typename T, std::size_t N>
constexpr std::size_t arraySize(T (&)[N]) noexcept
{
    return N;
}

int main()
{
    int temp[50];

    std::cout << "Array Size: " << arraySize(temp) << std::endl;

    return 0;
}

This shows a broken example where the code won't compile because the size information has been lost: the pointer only stores the memory location of the first element in the array and nothing about the size of the array.


#include <cstddef>
#include <iostream>

template<typename T, std::size_t N>
constexpr std::size_t arraySize(T (&)[N]) noexcept
{
    return N;
}

int main()
{
    int temp[50];
    int * temp1 = temp;

    // Error: temp1 is an int*, not an array, so T (&)[N] cannot be matched.
    std::cout << "Array Size: " << arraySize(temp1) << std::endl;

    return 0;
}
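Since the broken example cannot be compiled, the same rejection can also be observed without a compile error. The sketch below uses a hypothetical trait, AcceptsArraySize, which is not from the book; it relies on ordinary C++11 SFINAE to ask whether a call to arraySize would be valid for a given argument type.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

template<typename T, std::size_t N>
constexpr std::size_t arraySize(T (&)[N]) noexcept
{
    return N;
}

// Hypothetical trait: false unless the specialisation below applies.
template<typename Arg, typename = void>
struct AcceptsArraySize : std::false_type {};

// Chosen only when arraySize(arg) is a well-formed call for an Arg.
template<typename Arg>
struct AcceptsArraySize<Arg, decltype(void(arraySize(std::declval<Arg>())))>
    : std::true_type {};

static_assert(AcceptsArraySize<int (&)[50]>::value,
              "an lvalue of type int[50] is accepted");
static_assert(!AcceptsArraySize<int*&>::value,
              "an lvalue of type int* is rejected: there is no N to deduce");
```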

The function parameter doesn't have to be unnamed; it can be given a name, though there is no point since it's not referenced inside the function. For example:


template<typename T, std::size_t N>
constexpr std::size_t arraySize(T (&array)[N]) noexcept
{
    return N;
}
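A name does become necessary as soon as the body uses the array itself. As an illustration only, arrayEnd below is a hypothetical helper (not from the book) that returns a pointer one past the last element; the array overloads of std::begin and std::end take the same kind of parameter.

```cpp
#include <cstddef>

// Hypothetical helper: the parameter is named because the body refers to it.
template<typename T, std::size_t N>
constexpr T* arrayEnd(T (&array)[N]) noexcept
{
    return array + N; // one past the last element
}

int values[4] = {1, 2, 3, 4};

static_assert(arrayEnd(values) - values == 4,
              "the end pointer is N elements past the start");
```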

And for good measure, since I rarely see arrays being passed in by reference, here is a short example of passing arrays by reference to functions.


#include <cstddef>
#include <iostream>

// The size of the array.
#define ARRAY_SIZE  50

// Return the size of the array.
template<typename T, std::size_t N>
constexpr std::size_t gArraySize(T (&)[N]) noexcept
{
    return N;
}

// Print the content of an array.
void gPrintArray(int (&temp)[ARRAY_SIZE])
{
    // Print all the values.
    for(auto i : temp)
    {
        std::cout << i << ",";
    }

    // Output a new line.
    std::cout << std::endl;
}

int main()
{
    // The temporary test array.
    int temp[ARRAY_SIZE];

    // Initialise the array.
    for(int i = 0; i < ARRAY_SIZE; i++)
    {
        // Set the array value to its index in the array.
        temp[i] = i;
    }

    // Print the size of the array.
    std::cout << "Array Size: " << gArraySize(temp) << std::endl;

    // Print the array.
    gPrintArray(temp);

    return 0;
}
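gPrintArray above only accepts arrays of exactly ARRAY_SIZE ints. The same deduction trick lets one function accept any element type and any size. This is a sketch under that assumption; joinArray, kSample and demo are hypothetical names, and the function builds a string rather than printing directly so the result is easy to check.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical generalisation of gPrintArray: T and N are both deduced.
template<typename T, std::size_t N>
std::string joinArray(const T (&values)[N])
{
    std::ostringstream text;
    for (const T& value : values)
    {
        text << value << ",";
    }
    return text.str();
}

const int kSample[3] = {7, 8, 9};

void demo()
{
    double wide[2] = {0.5, 1.5};

    // One template, two different instantiations.
    assert(joinArray(kSample) == "7,8,9,");
    assert(joinArray(wide) == "0.5,1.5,");

    std::cout << joinArray(kSample) << std::endl;
}
```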

Of course, those of us who write firmware will not go through all this trouble just to get the size, we just straight use the ARRAY_SIZE definition to save precious clock cycles ;)


Sounds like you don't realise that constexpr methods are evaluated at compile time and as such have no run-time penalty whatsoever. And if that's not enough, the int template argument is also determined at compile time.
Identical assembly is generated to the code with macros, but without the headache macros bring.
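Strictly speaking, a constexpr function is only guaranteed to be evaluated at compile time when its result is used where a constant expression is required. A hedged sketch of such uses follows; the names keyVals, mappedVals and mappedArray are illustrative.

```cpp
#include <array>
#include <cstddef>

template<typename T, std::size_t N>
constexpr std::size_t arraySize(T (&)[N]) noexcept
{
    return N;
}

int keyVals[] = {1, 3, 7, 9, 11, 22, 35};

// An array bound and a std::array size must both be constant expressions,
// so these declarations only compile if the compiler evaluates the call itself.
int mappedVals[arraySize(keyVals)];
std::array<int, arraySize(keyVals)> mappedArray;

static_assert(arraySize(keyVals) == 7, "evaluated during compilation");
```

In an ordinary run-time expression such as a std::cout statement, the compiler is allowed, but not required, to fold the call into a constant.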

When people writing firmware do this sort of thing it's actually typically because they're stuck using an ancient compiler written at a time when templates were either unheard of, or were still in their infancy.

"In order to understand recursion, you must first understand recursion."


Sounds like you don't realise that constexpr methods are evaluated at compile time and as such have no run-time penalty whatsoever. And if that's not enough, the int template argument is also determined at compile time.

With a caveat. There is no run time penalty when compiled in release mode. It will be different in debug mode because of how the code is generated. In a test of the performance difference, the template was 10.3 times (1031%) slower than the #define. When compiled with optimisation, the template's speed increased by 7450% and the #define's by 724%. (This test was performed with 10 000 000 000 lookups in 100 batches.)


Identical assembly is generated to the code with macros, but without the headache macros bring.

Well no. In debug builds I would expect it to be treated as a function and in release build as a constant. This has different effects on your stack, status registers, etc. The template is fundamentally different depending on the compile mode.
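If the worry is that an unoptimised build emits a real function call, one option is to store the result in a constexpr variable. The sketch below assumes a global table; gTable, kTableSize and sumTable are hypothetical names. It only shows that the size itself must be computed by the compiler, and makes no claim about the timing figures quoted above or about how a debug build stores or reloads the constant.

```cpp
#include <cstddef>

template<typename T, std::size_t N>
constexpr std::size_t arraySize(T (&)[N]) noexcept
{
    return N;
}

int gTable[4] = {10, 20, 30, 40};

// A constexpr variable must be initialised by a constant expression, so the
// compiler has to work this value out itself in every build configuration.
constexpr std::size_t kTableSize = arraySize(gTable);

// Hypothetical helper: uses the stored constant instead of calling
// arraySize() in the loop condition.
int sumTable()
{
    int total = 0;
    for (std::size_t i = 0; i < kTableSize; ++i)
    {
        total += gTable[i];
    }
    return total;
}
```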


When people writing firmware do this sort of thing it's actually typically because they're stuck using an ancient compiler written at a time when templates were either unheard of, or were still in their infancy.

Yes, there is that as well, but they do it because it works: what you see is what you get, and it is reliable. The instant you let the compiler decide, you have to hope for the best (especially if you are moving across platforms). You have to admit, the dirty little #define is a lot more consistent.

With a caveat. There is no run time penalty when compiled in release mode. It will be different in debug mode because of how the code is generated. In a test of the performance difference, the template was 10.3 times (1031%) slower than the #define. When compiled with optimisation, the template's speed increased by 7450% and the #define's by 724%. (This test was performed with 10 000 000 000 lookups in 100 batches.)


Speed in debug is irrelevant compared to code safety. You don't ship debug code, you ship release.

Well no. In debug builds I would expect it to be treated as a function and in release build as a constant. This has different effects on your stack, status registers, etc. The template is fundamentally different depending on the compile mode.


See above - you don't ship debug code. Your debug code is already different because the compiler will do less inlining, will preserve stack variables that it could otherwise re-use or keep in registers, and will avoid things like frame pointer optimization that make debugging harder.

Running a function at runtime instead of at compile time is small potatoes in comparison.

Yes, there is that as well, but they do it because it works: what you see is what you get, and it is reliable. The instant you let the compiler decide, you have to hope for the best (especially if you are moving across platforms). You have to admit, the dirty little #define is a lot more consistent.


Templates and constexpr are far more reliable than macros. The compiler is generally far smarter than you about what is fast and what isn't. In most cases when people try "clever little tricks" in C++ they end up making their code slower because they're inhibiting other optimizations that the compiler is doing.

There is a reason several language features, like "inline", are left to the discretion of the compiler.

I can debate you on all these points but after trying to insert an image to prove my A* claim below, I accidentally lost everything I've typed :S

Please don't think that I'm just arguing for the sake of arguing or intentionally trying to rub people the wrong way. I have to talk about my experiences because I would just be lying talking about anything else. The reality is I disagree with you on almost every point. My background is in firmware development and consequently I have done everything from writing compilers and compiler patches (where gcc generated incorrect assembly for an MSP4530) to full pre-emptive operating systems, custom schedulers, etc. I think you put too much faith in your compiler. Compilers are written by very clever people, but they are only as smart as the people who write them.


Speed in debug is irrelevant compared to code safety. You don't ship debug code, you ship release.

You want your debug code to be similar enough to your release code that you can debug with confidence, knowing that an issue seen in release mode can be debugged. The reality in embedded development is that when code paths change you can experience everything from stack overflows to silicon errors (which required particular asm sequences that the compiler optimised out), etc. If your relative timing changes, you may or may not be able to debug timing issues on your interfaces, and you will see the same with multi-threaded applications, where race conditions exist in one mode but not the other. The reality is, the closer your debug version is to your release version, the better your life is regarding debugging. I have seen cases where problems have been so severe that the debug version was actually the version that was shipped. Debugging programs takes time, like everything else on the project; it's not adequate to disregard it as having no impact on the final product.


See above - you don't ship debug code. Your debug code is already different because the compiler will do less inlining, will preserve stack variables that it could otherwise re-use or keep in registers, and will avoid things like frame pointer optimization that make debugging harder.

Running a function at runtime instead of at compile time is small potatoes in comparison.

Your function call overhead in debug mode could very well be the difference between a race condition appearing or disappearing. Runtime vs compile time penalties are massively important. About 6 months ago, my friend and I wrote two separate implementations of the A* algorithm, using the pseudo code published on Wikipedia. Like you, he believes very much in templates and in not using "clever little tricks", and, well, I obviously don't. He used all the respective STL and C++ functions he thought appropriate, and I implemented my own containers using core C++ with general firmware best practices. In the end, for the same path using the same heuristic, mine was on average 1000x faster. Using a slightly modified heuristic, mine became 2000x faster. (Respective calc times: 0.393397 seconds vs 975.823250 seconds.) I was going to write it up as an article to post here, but I have to admit, I'm a bit worried now. Both bits of code have been built with purely macro optimisations (i.e. which data structure is the best for the job, etc.; no ASM, SSE, etc.) and whatever default optimisations were provided for release mode builds.

You may not agree with me and I respect that. But for me game engines are closer to firmware (realtime) than a desktop processing application (non-realtime). As such, I don't want surprises and I certainly want performance. Any time that my engine wastes is time that the game developer cannot use adding extra awesome.

This conversation is starting to sound exactly like the same flame wars that rage on the kernel forums any time someone mentions C vs C++ for kernels, etc. As such, I will respectfully opt out of any further discussion on this thread.

This conversation is starting to sound exactly like the same flame wars that rage on the kernel forums any time someone mentions C vs C++ for kernels, etc.

Don't try to convince people that your view is the one and only correct one, just informing them about your different view is enough, let them make up their own mind about it.

Everybody has a valid view, even if they fully disagree on each and every point.

Instead of convincing, assume the other person is as sane as you and try to understand why the other person has a different view. It makes you a richer person.

I was going to write it up as an article to post here, but I have to admit, I'm a bit worried now.

I would be interested in such an article, especially any analysis of where the factor of 1000 is coming from.

I can debate you on all these points but after trying to insert an image to prove my A* claim below, I accidentally lost everything I've typed :S

Please don't think that I'm just arguing for the sake of arguing or intentionally trying to rub people the wrong way. I have to talk about my experiences because I would just be lying talking about anything else. The reality is I disagree with you on almost every point. My background is in firmware development and consequently I have done everything from writing compilers and compiler patches (where gcc generated incorrect assembly for an MSP4530) to full pre-emptive operating systems, custom schedulers, etc. I think you put too much faith in your compiler. Compilers are written by very clever people, but they are only as smart as the people who write them.


Well that's a salient point :)

My experience is in desktop/console development where the compilers are generally extremely well-tested, and I have certainly heard horror stories of compilers for embedded systems.

As such I only have a few small things in response:

For games you don't ship debug because debug performance is horrible. Heck, our studio has a "release beta" build which is basically "debug + optimizations" so the game runs fast enough to be playable, and you have enough debug information for debugging to be tolerable.

"Race conditions" in games are indications of broken code. Properly working code will never have race conditions because you've properly locked all your shared memory, or are operating on independent copies with proper merges/message passing/whatever. Yes, debug/release will change where race conditions occur. And will change whether they even happen. But so does moving a single line of code up or down in a supposedly "unrelated" function. Therefore, focus on the version we ship (i.e. release).

And finally, I have never personally seen a case where templates slowed down code with a good compiler (emphasis on "good compiler" - something like a modern Clang, or MS compiler). In fact, in almost all cases, templated code resulted in a speed increase over non-templated code, and especially over an equivalent "pure C" algorithm. Why? Because the compiler has a lot more information to work with and perform good optimizations.

So.

We can agree to disagree :)

In my case - working with good compilers on well-supported hardware, I believe my points to be valid (and for most people on this site, they will generally be working on well-supported hardware by making programs for their local Windows/Linux/OSX machines).

However if you want to write an article about how your experience and coding practices may provide different or better solutions to things, I would certainly read it.

This topic is closed to new replies.
