GlassBil

Construction of my math vectors is really slow


Hello,

I've recently developed my own maths library for computer graphics. I have a very CPU-intensive program in which I used to do calculations component-wise, e.g.:

GLdouble rx = particle[j].position[0] - particle[i].position[0];
GLdouble ry = particle[j].position[1] - particle[i].position[1];
GLdouble rz = particle[j].position[2] - particle[i].position[2];



However, now that I've started using my own maths library, my frame rate drops from around 30 FPS to around 11 FPS when I change the code above to:

Vector3d distance = particle[j].position - particle[i].position;



This makes me think that there's something wrong with my maths library. (The code above gets executed 1000 * 1000 times every iteration.)

My vector constructors are:

// Constructs and sets the vector to (0, 0, 0)
Vector3() : x(0), y(0), z(0) { }

// Constructs and sets the vector to (vx, vy, vz)
Vector3(T vx, T vy, T vz) : x(vx), y(vy), z(vz) { }

// Constructs with data from another vector
Vector3(const Vector3<T>& v) : x(v.x), y(v.y), z(v.z) { }



The subtraction and assignment operators are:

Vector3<T> operator-(const Vector3<T>& v) const
{
    return Vector3<T>(x - v.x, y - v.y, z - v.z);
}

const Vector3<T>& operator=(const Vector3<T>& v)
{
    x = v.x;
    y = v.y;
    z = v.z;
    return *this;
}
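For anyone who wants to reproduce the problem, here is a minimal sketch that assembles the fragments above into one compilable class template. The public `x`, `y`, `z` members, the `Vector3d` typedef and the `squaredDistance` helper are assumptions made for illustration, not the poster's actual library. The hand-written copy constructor and `operator=` are left out on purpose, because the compiler-generated versions perform the same member-wise copy.

```cpp
// Minimal sketch of the class described in the post (layout is assumed).
template <typename T>
class Vector3
{
public:
    T x, y, z;

    // Constructs and sets the vector to (0, 0, 0)
    Vector3() : x(0), y(0), z(0) { }

    // Constructs and sets the vector to (vx, vy, vz)
    Vector3(T vx, T vy, T vz) : x(vx), y(vy), z(vz) { }

    // No user-written copy constructor or operator=: the implicitly
    // generated ones copy x, y and z member by member.

    // Component-wise subtraction, returning a new vector
    Vector3<T> operator-(const Vector3<T>& v) const
    {
        return Vector3<T>(x - v.x, y - v.y, z - v.z);
    }
};

typedef Vector3<double> Vector3d;  // assumed to match the post's Vector3d

// Hypothetical helper: squared length of the vector from a to b.
inline double squaredDistance(const Vector3d& a, const Vector3d& b)
{
    const Vector3d r = b - a;
    return r.x * r.x + r.y * r.y + r.z * r.z;
}
```

Calling `squaredDistance(Vector3d(1, 2, 3), Vector3d(4, 6, 3))` subtracts the two positions with `operator-` and returns the squared length of the result.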



What can I do to fix my horrible performance dip?

Regards

Quote:
Original post by GlassBil
What can I do to fix my horrible performance dip?

What compiler are you using? Do you have optimizations enabled? If you are using Visual C++ 2010, then in addition to the regular optimizations, you may be able to enable SSE/SSE2 code generation for your program.

Is this dip measured in release or debug builds? For a debug build, it makes perfect sense that you would see a performance drop, as you are creating a lot of temporary objects with the class-based operators. In release mode, the compiler should optimize out the temporaries for you, yielding code more or less equivalent to your original approach.

If you are in release mode, try taking a look at the assembler output and seeing what the compiler is doing under the hood.
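One way to make that comparison concrete is to put both styles into two small functions and compare their assembler listings (for example via `/FA` in Visual C++ or `-S` in g++). The sketch below is only an illustration: `Vec3` is a hypothetical stand-in for the poster's `Vector3<double>`, and both function names are made up.

```cpp
#include <cstddef>

struct Vec3 { double x, y, z; };  // hypothetical stand-in for Vector3<double>

inline Vec3 operator-(const Vec3& a, const Vec3& b)
{
    Vec3 r = { a.x - b.x, a.y - b.y, a.z - b.z };
    return r;
}

// Component-wise version, in the style of the original code.
double sumSquaredComponentWise(const Vec3* p, std::size_t n)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            const double rx = p[j].x - p[i].x;
            const double ry = p[j].y - p[i].y;
            const double rz = p[j].z - p[i].z;
            sum += rx * rx + ry * ry + rz * rz;
        }
    return sum;
}

// Same loop, but the difference is computed with operator-.
double sumSquaredWithOperator(const Vec3* p, std::size_t n)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            const Vec3 r = p[j] - p[i];
            sum += r.x * r.x + r.y * r.y + r.z * r.z;
        }
    return sum;
}
```

With optimizations enabled, the inner loops of the two functions would be expected to look very similar; without them, the second one typically contains a real call and a stack temporary per iteration.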

Hi,

Another thing you can easily do to find out what the compiler is doing even in a
release optimized build is to have your ctor/dtor/assignment op. print some small
msg to a log file or the console.

Although this won't be good in terms of measuring the performance of the code, it
can tell you what the compiler is trying to do, even in release.

Now, without any kind of optimizations, the change of code is, roughly, a change
from subtracting a set of 3 numbers and storing their result in 3 local variables, as opposed to the following operations with the vector class:

1. Creating a temp object to hold the result of the subtraction
2. calling operator -
3. constructing a new object to hold the result, on the stack
4. calling operator =
5. destroying the temp object
6. once the result vector goes out of scope, it too will be destroyed.

Quite a bit for just one subtract operation, right? And we don't know anything
about how you called the code. If you're not careful there could be aliasing
issues and more...

A good compiler will optimize most of these problems, but apparently not all.
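The logging idea from this post can be sketched with counters instead of console output. Everything here (`TracedVec3`, `Counters`, `traceOneSubtraction`) is a hypothetical illustration rather than the poster's class. One detail worth knowing when reading the numbers: `TracedVec3 d = b - a;` is an initialization, so `operator=` is not called for it, and how many copy constructions appear depends on the compiler and language standard.

```cpp
#include <cstdio>

// Hypothetical call counters for the special member functions.
struct Counters { int valueCtor, copyCtor, assign, dtor; };
static Counters g_counts = { 0, 0, 0, 0 };

struct TracedVec3
{
    double x, y, z;

    TracedVec3(double vx, double vy, double vz) : x(vx), y(vy), z(vz)
    { ++g_counts.valueCtor; }

    TracedVec3(const TracedVec3& v) : x(v.x), y(v.y), z(v.z)
    { ++g_counts.copyCtor; }

    TracedVec3& operator=(const TracedVec3& v)
    {
        x = v.x; y = v.y; z = v.z;
        ++g_counts.assign;
        return *this;
    }

    ~TracedVec3() { ++g_counts.dtor; }

    TracedVec3 operator-(const TracedVec3& v) const
    {
        return TracedVec3(x - v.x, y - v.y, z - v.z);
    }
};

// Performs one subtraction and reports which members were called.
Counters traceOneSubtraction()
{
    g_counts = Counters();  // reset all counters to zero
    {
        TracedVec3 a(1, 2, 3), b(4, 6, 3);
        TracedVec3 d = b - a;  // initialization, not assignment
        (void)d;
    }  // a, b and d are destroyed here
    std::printf("value ctors: %d, copy ctors: %d, assignments: %d, dtors: %d\n",
                g_counts.valueCtor, g_counts.copyCtor,
                g_counts.assign, g_counts.dtor);
    return g_counts;
}
```

Running `traceOneSubtraction()` in both a debug and an optimized build shows how many of the temporaries the compiler actually keeps.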

Thanks for your responses. I've run it on Windows 7 with VC++ 2010 (debug mode) and with g++ on CentOS, with similar results.

I will try turning on optimization later today and get back to you with the results.

Couldn't the initializer list in the default constructor be the problem? Turning optimizations on, however, even with the initializer list, should almost eliminate or even completely remove the performance gap...

Using classes for such purposes is very nice and clean, but for time-critical situations I always use inline functions or even macros. As the others suggested, the point is to avoid creating temporary objects for such simple operations. It's true that the compiler would optimize the code if told to, but I suggest you avoid the practice "do whatever you can imagine, the compiler should figure out what you meant".

Quote:
Original post by inprazia
Using classes for such purposes is very nice and clean, but for time-critical situations I always use inline functions or even macros. As the others suggested, the point is to avoid creating temporary objects for such simple operations. It's true that the compiler would optimize the code if told to, but I suggest you avoid the practice "do whatever you can imagine, the compiler should figure out what you meant".


Have you actually tested that?

I've gone through large amounts of code replacing things like A = B * a + C with inline functions... and found that the result (when compiler optimisations are enabled) is actually slower than the original. So you may have just been obfuscating your code with no benefit...
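To make the two styles being compared concrete, here is a small sketch of `A = B * a + C` written once with operators and once as a hand-written inline function. The `Vec3` struct and the `mulAdd` name are hypothetical; the sketch only shows that both forms compute the same thing, not which one is faster on a given compiler.

```cpp
struct Vec3 { double x, y, z; };  // hypothetical minimal vector type

inline Vec3 operator*(const Vec3& v, double s)
{
    Vec3 r = { v.x * s, v.y * s, v.z * s };
    return r;
}

inline Vec3 operator+(const Vec3& a, const Vec3& b)
{
    Vec3 r = { a.x + b.x, a.y + b.y, a.z + b.z };
    return r;
}

// Operator style: reads like the maths, creates temporaries on paper.
inline Vec3 mulAddWithOperators(const Vec3& b, double a, const Vec3& c)
{
    return b * a + c;
}

// Hand-written style: writes straight into an output parameter.
inline void mulAdd(Vec3& out, const Vec3& b, double a, const Vec3& c)
{
    out.x = b.x * a + c.x;
    out.y = b.y * a + c.y;
    out.z = b.z * a + c.z;
}
```

Whether the second form is any faster is exactly what has to be measured in an optimized build.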

Hello!

I've now tried to run it with /O2 in VC++ 2010 in Release mode. This gives me 60 FPS and the old code (not using my maths library) gives me around 55 FPS. So it's actually faster now.

However my problem is that I can't get the optimization to work in Debug mode. When compiling I get the following error:

1>------ Build started: Project: Fluid, Configuration: Debug Win32 ------
1>cl : Command line error D8016: '/ZI' and '/O2' command-line options are incompatible
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========




The default for "Debug Information Format" in Release mode is "Program Database (/Zi)", but when using that in debug mode I get:

1>------ Build started: Project: Fluid, Configuration: Debug Win32 ------
1>cl : Command line error D8016: '/O2' and '/RTC1' command-line options are incompatible
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========




Is it even possible to do the optimization in Debug mode?

When it comes to compiling with g++ I guess I should use -O2 or similar flags?

Thanks again for all your help.

Quote:
Original post by GlassBil
Command line error D8016: '/O2' and '/RTC1' command-line options are incompatible


That is saying that you cannot combine optimization (/O2) with the run-time error checks (/RTC1) that Debug configurations enable by default. gcc doesn't have an error like this, but debugging optimized code is one of the best ways I know of going crazy.

Quote:
Original post by GlassBil
Is it even possible to do the optimization in Debug mode?


Yes, it is, but you generally don't want to. Debug builds exist only as a tool for the developer; they are not meant for public release. So unless performance is so bad that you cannot test your code properly (for example, getting 1/2 FPS in a game), it's generally not worth worrying about performance in debug builds.

Quote:
Original post by MrRowl
Quote:
Original post by inprazia
Using classes for such purposes is very nice and clean, but for time-critical situations I always use inline functions or even macros. As the others suggested, the point is to avoid creating temporary objects for such simple operations. It's true that the compiler would optimize the code if told to, but I suggest you avoid the practice "do whatever you can imagine, the compiler should figure out what you meant".


Have you actually tested that?

I've gone through large amounts of code replacing things like A = B * a + C with inline functions... and found that the result (when compiler optimisations are enabled) is actually slower than the original. So you may have just been obfuscating your code with no benefit...


You're right, in this case the class implementation is faster when optimized. But this is not generally true. If T is float, not double, the inline function is faster on my PC. Moreover, once you have some dynamic memory allocation in the class constructor, the inline function would be a much better solution.

Quote:
Original post by inprazia
[...] I suggest you avoid the practice "do whatever you can imagine, the compiler should figure out what you meant".


The practice I recommend is "make your code as clear as possible, and only complicate things to gain performance if you have a profiler run that justifies it."
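Short of a full profiler run, a coarse timing harness can already tell you whether a change is worth looking at. The sketch below uses `std::clock` from `<ctime>`, which measures CPU time at fairly low resolution; `secondsTaken`, `sampleWork` and `g_sink` are hypothetical names, and the workload is just a placeholder for the code you actually want to measure.

```cpp
#include <ctime>

// Volatile sink so the optimizer cannot discard the measured work.
static volatile double g_sink = 0.0;

// Placeholder workload; replace with the code under test.
void sampleWork()
{
    double s = 0.0;
    for (int i = 0; i < 1000; ++i)
        s += i * 0.5;
    g_sink = s;
}

// Runs work() the given number of times and returns the elapsed
// CPU time in seconds, as reported by std::clock.
template <typename F>
double secondsTaken(F work, int repetitions)
{
    const std::clock_t start = std::clock();
    for (int i = 0; i < repetitions; ++i)
        work();
    return double(std::clock() - start) / CLOCKS_PER_SEC;
}
```

For example, `secondsTaken(sampleWork, 100000)` times one hundred thousand runs of the placeholder. Always take such measurements in an optimized build, and repeat them a few times, since single runs are noisy.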

Quote:
Original post by alvaro
The practice I recommend is "make your code as clear as possible, and only complicate things to gain performance if you have a profiler run that justifies it."


The wisest words I have ever read on this site :o)

I'm assuming "Just be lazy and use what somebody else already wrote" isn't a good answer? You don't want to just use Eigen?

They do everything under the hood with expression templates; it's all template metaprogramming.
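For readers who have not met expression templates, here is a minimal sketch of the idea. It is not Eigen's actual implementation; `VecExpr`, `VecDiff` and this `Vec3` are hypothetical names. The point is that `a - b - c` builds a small expression object instead of intermediate vectors, and the arithmetic only runs, element by element, when the result is stored in a `Vec3`.

```cpp
#include <cstddef>

// CRTP base: lets operators accept any expression type E.
template <typename E>
struct VecExpr
{
    const E& self() const { return static_cast<const E&>(*this); }
};

struct Vec3 : VecExpr<Vec3>
{
    double v[3];

    Vec3(double x, double y, double z) { v[0] = x; v[1] = y; v[2] = z; }

    // Evaluates any expression in a single pass over the elements.
    template <typename E>
    Vec3(const VecExpr<E>& e)
    {
        for (std::size_t i = 0; i < 3; ++i)
            v[i] = e.self()[i];
    }

    double operator[](std::size_t i) const { return v[i]; }
};

// Represents lhs - rhs without computing anything yet.
template <typename L, typename R>
struct VecDiff : VecExpr<VecDiff<L, R> >
{
    const L& lhs;
    const R& rhs;

    VecDiff(const L& l, const R& r) : lhs(l), rhs(r) { }

    // The subtraction happens only when an element is requested.
    double operator[](std::size_t i) const { return lhs[i] - rhs[i]; }
};

template <typename L, typename R>
VecDiff<L, R> operator-(const VecExpr<L>& l, const VecExpr<R>& r)
{
    return VecDiff<L, R>(l.self(), r.self());
}
```

With this in place, `Vec3 d = a - b - c;` evaluates each component as `(a[i] - b[i]) - c[i]` in one loop. The expression objects hold references, so they should be consumed within the same statement rather than stored for later.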

I also totally agree with alvaro. Clean code saves time (while developing and while maintaining) which can be spent in something that actually is worth it (for example in optimizing code that NEEDS to be well optimized). Plus, clean code is just nice :)

Quote:
Original post by Emergent
I'm assuming "Just be lazy and use what somebody else already wrote" isn't a good answer? You don't want to just use Eigen?

They do everything under the hood with expression templates; it's all template metaprogramming.


I could. But I thought that it would be a nice learning experience to write my own and get experience using templates, unions, operator overloading etc.

I like to write things from scratch; it's a little more satisfying :). I guess I lose a little performance since my code probably isn't as good as the others', but I learn lots from it.

GlassBil,

For performance testing you could just disable program databases, although it is quite possible to have PDBs that reflect your release build. We do it. Not only that, we deploy a debugger package built around WinDbg. It catches our application when it crashes (which sadly still happens in such a complex app :) and provides call stacks even when the code is fully optimized.

If it's important to anyone, I can check the actual settings of our projects for how to do this.

So the short answer is yes, it's possible to have debug info in release. But no, you cannot have run-time checks (RTC, any of them really) in release.

BTW, run-time checks are among the worst performance hogs for libraries such as the C++ standard library (i.e. any STL stuff really :), at least in debug builds.

STD/STL is great, but FYI, something to keep in mind...
