MSVC 2005 expression template optimization

5 comments, last by Ra 16 years, 6 months ago
So I'm toying around with expression templates a bit, and while setting them up I've come across this problem: MSVC doesn't want to do any compile-time folding on them at all. As you can see in the assembly listing at the bottom of this code, it's adding 0.0f and 0.0f at runtime. I have the compiler set up with all optimizations on, favoring speed, with SSE2 enabled and /fp:fast. Anyone know how I might fix this? Source:

#include <boost/typeof/typeof.hpp>
 
template<typename ArgT1, typename ArgT2>
struct OpAdd
{
private:
        typedef BOOST_TYPEOF_TPL(*reinterpret_cast<ArgT1 *>(0) + *reinterpret_cast<ArgT2 *>(0)) return_type;
 
public:
        static return_type apply(ArgT1 a1, ArgT2 a2)
        {
                return a1 + a2;
        }
};
 
template<typename ExprT1, typename ExprT2, typename Op>
struct BinaryExpr
{
private:
        ExprT1 e1;
        ExprT2 e2;
 
public:
        BinaryExpr(ExprT1 e1, ExprT2 e2) : e1(e1), e2(e2)
        {
        }
 
        float /*XXX*/ eval() const
        {
                return Op::apply(this->e1.eval(), this->e2.eval());
        }
};
 
template<typename C>
struct ConstantExpr
{
private:
        C c;
 
public:
        ConstantExpr(C c) : c(c)
        {
        }
 
        C eval() const
        {
                return this->c;
        }
};
 
#include <iostream>
 
int main(void)
{
00401000 51               push        ecx  
        BinaryExpr
        <
                ConstantExpr<float>,
                ConstantExpr<float>,
                OpAdd
                <
                        float,
                        float
                >
        >
        expr
        (
                ConstantExpr<float>(0.0f),
                ConstantExpr<float>(0.0f)
        );
00401001 0F 57 C0         xorps       xmm0,xmm0 
00401004 F3 0F 11 04 24   movss       dword ptr [esp],xmm0 
 
        std::cout << expr.eval();
00401009 8B 04 24         mov         eax,dword ptr [esp] 
0040100C 51               push        ecx  
0040100D 8B 0D 38 20 40 00 mov         ecx,dword ptr [__imp_std::cout (402038h)] 
00401013 89 44 24 04      mov         dword ptr [esp+4],eax 
00401017 F3 0F 58 44 24 04 addss       xmm0,dword ptr [esp+4] 
0040101D F3 0F 11 04 24   movss       dword ptr [esp],xmm0 
00401022 FF 15 40 20 40 00 call        dword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (402040h)] 
        std::cin.get();
00401028 8B 0D 44 20 40 00 mov         ecx,dword ptr [__imp_std::cin (402044h)] 
0040102E FF 15 3C 20 40 00 call        dword ptr [__imp_std::basic_istream<char,std::char_traits<char> >::get (40203Ch)] 
}
00401034 33 C0            xor         eax,eax 
00401036 59               pop         ecx  
00401037 C3               ret

Ra
Doing the intended operation at compile-time is not the purpose of expression templates. Rather they are usually used to reduce copying overhead and delay the computation until a point where there is additional context that might help to perform the computation with a more efficient algorithm.

If you have both operands at compile time, you can easily construct the result of the computation yourself (with an appropriate comment to decrypt the "magic constant").

I don't think one should count on or expect this optimization from any compiler. Besides, in a different context, when the compiler thinks it's a good idea, it might be able to perform the optimization you're after.
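
To make the "reduce copying overhead and delay the computation" point concrete, here is a minimal sketch of lazy vector addition. The names Vec and VecSum are hypothetical and appear nowhere in the thread; this is just one common way to arrange it, not the original poster's design.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical names throughout (Vec, VecSum); none come from the thread.

// Node representing "lhs + rhs" without computing anything yet.
template<typename L, typename R>
struct VecSum
{
    const L &lhs;
    const R &rhs;

    VecSum(const L &l, const R &r) : lhs(l), rhs(r) {}

    // Element i of the sum is computed only when someone asks for it.
    float operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

struct Vec
{
    std::vector<float> data;

    explicit Vec(std::size_t n, float value = 0.0f) : data(n, value) {}

    // Assigning from an expression runs one fused loop and creates no temporary Vec.
    template<typename L, typename R>
    Vec &operator=(const VecSum<L, R> &expr)
    {
        for (std::size_t i = 0; i < data.size(); ++i)
            data[i] = expr[i];
        return *this;
    }

    float operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }
};

// Vec + Vec builds a node instead of a new Vec.
inline VecSum<Vec, Vec> operator+(const Vec &a, const Vec &b)
{
    return VecSum<Vec, Vec>(a, b);
}

// (expression) + Vec nests the existing node inside a new one.
template<typename L, typename R>
VecSum<VecSum<L, R>, Vec> operator+(const VecSum<L, R> &a, const Vec &b)
{
    return VecSum<VecSum<L, R>, Vec>(a, b);
}
```

With this in place, something like `out = a + b + c;` walks the three inputs once inside `operator=`. The nodes only hold references, so they have to be consumed within the same statement that builds them.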

Edd
Quote:Original post by the_edd
Doing the intended operation at compile-time is not the purpose of expression templates. Rather they are usually used to reduce copying overhead and delay the computation until a point where there is additional context that might help to perform the computation with a more efficient algorithm.

I know that.

Quote:Original post by the_edd
If you have both operands at compile time, you can easily construct the result of the computation yourself (with an appropriate comment to decrypt the "magic constant").

This can be annoying and hard to maintain. Why bother when the compiler can already do this automatically?

Quote:Original post by the_edd
I don't think one should count on or expect this optimization from any compiler.

I do. The compiler does it for straight literals, functions that return constant values, and many other things that are completely known at compile-time. This is no different.

Quote:Original post by the_edd
Besides, in a different context, when the compiler thinks it's a good idea, it might be able to perform the optimization you're after.

People give too much credit to the compiler. While this may be the case in some situations with inlining and whatnot, I have never found this to be true with anything trivial (such as this case). If it doesn't want to optimize it properly in the extremely simple case, it's not going to do it in a complicated usage scenario.

My real question is this: Is there anything in my code that's preventing the compiler from folding these constants?

I find it particularly odd that if, for example, I do something like 1.0f + 2.0f + 3.0f using the above expression templates it'll compile to 1.0f + 5.0f.
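
For reference, a stripped-down sketch of how that three-term case might be built. Constant, Sum and make_sum are hypothetical stand-ins for ConstantExpr, BinaryExpr and OpAdd (no Boost needed here), and the grouping 1.0f + (2.0f + 3.0f) is an assumption, since the post doesn't say how the tree was nested.

```cpp
// Sketch only: Constant, Sum and make_sum are hypothetical stand-ins
// for ConstantExpr, BinaryExpr and OpAdd from the first post.
template<typename C>
struct Constant
{
    C c;
    explicit Constant(C c_) : c(c_) {}
    C eval() const { return c; }
};

template<typename E1, typename E2>
struct Sum
{
    E1 e1;  // sub-expressions are stored by value, as in the original code
    E2 e2;
    Sum(E1 a, E2 b) : e1(a), e2(b) {}
    float eval() const { return e1.eval() + e2.eval(); }
};

// Helper so the nested type does not have to be spelled out by hand.
template<typename E1, typename E2>
Sum<E1, E2> make_sum(E1 a, E2 b)
{
    return Sum<E1, E2>(a, b);
}

// Builds the tree for 1.0f + (2.0f + 3.0f) and evaluates it.
inline float nested_sum()
{
    return make_sum(Constant<float>(1.0f),
                    make_sum(Constant<float>(2.0f),
                             Constant<float>(3.0f))).eval();
}
```

Looking at the disassembly of a function like `nested_sum()` would show whether the inner node is folded while the outer addition is still done at runtime.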
Ra
Forgive me, but I still don't understand the need. If you really do have constants, then you'll only need to do the calculation once and store it somewhere. Surely this can't be a bottleneck!?

Is time not better spent elsewhere?
I generally just assume that the compiler does zero optimization of floating-point arithmetic in any case. Compilers are generally extremely conservative about that. (I don't know how much /fp:fast helps, but probably not as much as you'd expect.)

What happens if you just type out the same without the expression templates?
Something like

float a = 0.0f, b = 0.0f;
float f = a + b;

Does it perform the addition at runtime in that case too with your compiler settings?
Quote:Original post by Spoonbender
What happens if you just type out the same without the expression templates?
Something like

float a = 0.0f, b = 0.0f;
float f = a + b;

Does it perform the addition at runtime in that case too with your compiler settings?

Nope. It'll fold constants like that all day. It's certainly better at integer math, but it's not exactly bad with floats either.

On the other hand, GCC 4.0.1 folds the code using the expression templates without complaint. It generates code equivalent to std::cout << 0.0f;. Unfortunately, I think this may just be a shortcoming of MSVC 8.0.
Ra
Update: If I make ConstantExpr accept a const C & in the constructor and store that reference internally, it completely optimizes it out as expected. I'm not 100% sure if that invokes undefined behavior, though.
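
On the undefined-behavior question, a minimal sketch of the reference-storing variant may help. RefConstant and sum_of_named are hypothetical names. As far as I can tell, the risk is lifetime: a temporary float created from a literal such as 0.0f and bound to the constructor's reference parameter only lives until the end of that full expression, so calling eval() in a later statement would read a dangling reference. The sketch avoids this by referring only to named objects that outlive the nodes.

```cpp
// RefConstant is a hypothetical name for the reference-storing variant.
template<typename C>
struct RefConstant
{
    const C &c;  // refers to the caller's object; nothing is copied

    explicit RefConstant(const C &c_) : c(c_) {}

    C eval() const { return c; }
};

// Safe use: x and y are named objects that outlive both expression nodes.
inline float sum_of_named(float x, float y)
{
    RefConstant<float> a(x), b(y);
    return a.eval() + b.eval();
}
```

By contrast, writing `RefConstant<float> r(0.0f);` and calling `r.eval()` on the next line would leave `r.c` referring to a float that no longer exists, even if the optimized build happens to print the expected value.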
Ra

