Quote: Original post by Zahlman
Quote: Original post by Spoonbender
you might as well make it easy for the compiler by making it a template in the first place, rather than hoping that it will templatize your code behind your back by compiling a special optimized version for each value of size.
I'd like to point out:
- Loop unrolling doesn't require a separate version of the code for each iteration-count.
- Trying to find the correct separate-version at runtime could well (in at least some cases) outweigh the advantages of the optimized code in the first place. Except that it can't even really be done, unless code exists for every possible runtime value (basically impossible for an int parameter).
There is no runtime cost for GCP. If the compiler determines that size is a single constant, it will transform
uint32 add(uint32* out, const uint32* a, const uint32* b, int size)
into
uint32 add(uint32* out, const uint32* a, const uint32* b)
and simply use the value of size literally (or refer to its address if it was complex). If you call add() with multiple size values, even if they are constants or literals, you will get an ordinary parameter pass, as expected. As such, this isn't a replacement for templates (nor did I ever suggest it was); it's just a nice way to maintain generality and get the benefits of constants with respect to loop unrolling and branch collapsing without having to roll a template for a single instance.
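To make the transformation concrete, here is a rough hand-written sketch of the before and after. The body of add() is my assumption (a carry-propagating multi-word add that returns the final carry), and add_size4 is a hypothetical name for what the compiler would effectively emit if every call in the program passed 4; a real compiler would do this internally and keep the original name.

```cpp
#include <cstdint>

using uint32 = std::uint32_t;

// General version: size arrives as a runtime parameter.
// Assumption: add() sums two multi-word integers and returns the final carry.
uint32 add(uint32* out, const uint32* a, const uint32* b, int size)
{
    uint32 carry = 0;
    for (int i = 0; i < size; ++i) {
        std::uint64_t sum = std::uint64_t(a[i]) + b[i] + carry;
        out[i] = static_cast<uint32>(sum);        // low 32 bits
        carry  = static_cast<uint32>(sum >> 32);  // carry into the next word
    }
    return carry;
}

// Hypothetical equivalent after constant propagation, assuming every caller
// passes size == 4: the parameter is gone and the loop bound is a literal,
// so the optimizer is free to unroll the loop completely.
uint32 add_size4(uint32* out, const uint32* a, const uint32* b)
{
    uint32 carry = 0;
    for (int i = 0; i < 4; ++i) {
        std::uint64_t sum = std::uint64_t(a[i]) + b[i] + carry;
        out[i] = static_cast<uint32>(sum);
        carry  = static_cast<uint32>(sum >> 32);
    }
    return carry;
}
```

Both functions compute the same result for four-word inputs; the only difference is that the second has nothing left to pass or test at run time.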
As an aside, a compiler could generate custom variants based on values of size, but there is still no search or runtime cost involved, because the choice is made at each call site during compilation: where it sees add(..,..,..,4) it calls add_4(), where it sees add(..,..,..,SIZE8) it calls add_8(), and where it sees add(..,..,..,var) it calls the general version. I've personally never seen a compiler do this, but I've never looked for it either. I wouldn't be surprised if some compilers did, however.
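Written out by hand, the per-value variants might look something like the sketch below. add_4 and add_8 are the names used above; add_general, add_loop and the carry-propagating body are assumptions of mine for illustration. The variants are thin wrappers around one inlined loop with a literal bound, which is roughly the input an optimizer would be working from.

```cpp
#include <cstdint>

using uint32 = std::uint32_t;

// Shared loop body (hypothetical helper). Assumption: a carry-propagating
// multi-word add that returns the final carry.
static inline uint32 add_loop(uint32* out, const uint32* a, const uint32* b, int size)
{
    uint32 carry = 0;
    for (int i = 0; i < size; ++i) {
        std::uint64_t sum = std::uint64_t(a[i]) + b[i] + carry;
        out[i] = static_cast<uint32>(sum);
        carry  = static_cast<uint32>(sum >> 32);
    }
    return carry;
}

// General version: chosen when size is only known at run time.
uint32 add_general(uint32* out, const uint32* a, const uint32* b, int size)
{
    return add_loop(out, a, b, size);
}

// Variants for constant sizes: once add_loop is inlined, the bound is a literal.
uint32 add_4(uint32* out, const uint32* a, const uint32* b) { return add_loop(out, a, b, 4); }
uint32 add_8(uint32* out, const uint32* a, const uint32* b) { return add_loop(out, a, b, 8); }

// The selection happens per call site at compile time, not by a runtime search:
//   add(out, a, b, 4)     -> add_4(out, a, b)
//   add(out, a, b, SIZE8) -> add_8(out, a, b)
//   add(out, a, b, var)   -> add_general(out, a, b, var)
```

Since each call site is bound to exactly one function when the program is compiled, there is no lookup table and no dispatch branch in the generated code.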