ATC

Math API performance: saving CPU cycles?


I'm trying not to go overboard with micro-optimizations for my engine's math API, but I am trying to put some consideration into performance. For example, it's my understanding that multiplication is slightly faster than division, so it saves a few CPU cycles here and there, and this can add up when high-frequency code is executing over and over in a game loop. So I have done things like this example from my Matrix structure:

[source lang="csharp"]
public static Matrix operator /(Matrix mat, float div) {
#if PERFORM_CHECKS
    if (div == 0)
        throw new MathematicalException(
            "Divisor is zero.", new DivideByZeroException());
#endif

    // One division up front; each element below then costs only a multiply.
    // Note: x * (1 / d) can differ from x / d in the last bit of precision.
    float num = 1f / div;

    // Every element is overwritten below, so start from a zeroed matrix
    // rather than copying Identity first.
    var result = new Matrix();
    result.M11 = mat.M11 * num;
    result.M12 = mat.M12 * num;
    result.M13 = mat.M13 * num;
    result.M14 = mat.M14 * num;
    result.M21 = mat.M21 * num;
    result.M22 = mat.M22 * num;
    result.M23 = mat.M23 * num;
    result.M24 = mat.M24 * num;
    result.M31 = mat.M31 * num;
    result.M32 = mat.M32 * num;
    result.M33 = mat.M33 * num;
    result.M34 = mat.M34 * num;
    result.M41 = mat.M41 * num;
    result.M42 = mat.M42 * num;
    result.M43 = mat.M43 * num;
    result.M44 = mat.M44 * num;

    return result;
}[/source]
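To the "is this correct?" part: the reciprocal trick does replace sixteen divisions with one division and sixteen multiplications, but `x * (1f / d)` is not always bit-identical to `x / d`, because the reciprocal is itself rounded before the multiply. The sketch below (hypothetical names, with plain `float[]` arrays standing in for the `Matrix` struct) compares the two approaches and reports the largest relative difference between them:

```csharp
using System;

public static class ReciprocalCheck
{
    // One division per element: the quotient is rounded once.
    public static float[] DivideEach(float[] values, float div)
    {
        var result = new float[values.Length];
        for (int i = 0; i < values.Length; i++)
            result[i] = values[i] / div;
        return result;
    }

    // One division in total, then one multiply per element (rounded twice).
    public static float[] MultiplyByReciprocal(float[] values, float div)
    {
        float inv = 1f / div;
        var result = new float[values.Length];
        for (int i = 0; i < values.Length; i++)
            result[i] = values[i] * inv;
        return result;
    }

    // Largest relative difference between two equally sized arrays.
    public static float MaxRelativeError(float[] expected, float[] actual)
    {
        if (expected.Length != actual.Length)
            throw new ArgumentException("Arrays must have the same length.");

        float max = 0f;
        for (int i = 0; i < expected.Length; i++)
        {
            if (expected[i] == 0f) continue; // relative error is undefined at zero
            float rel = Math.Abs((expected[i] - actual[i]) / expected[i]);
            if (rel > max) max = rel;
        }
        return max;
    }
}
```

For ordinary inputs the gap should be on the order of one unit in the last place, which is harmless for rendering math but means any tests of the operator should compare with a tolerance rather than `==`.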

Is this correct/true, and should I be doing it this way? And what other optimizations might I use in general to make my math code blazing fast and efficient?

Might I even consider doing something like this:

[source lang="csharp"]
#if !PERFORM_CHECKS
unchecked {
#endif

    // math code here...

#if !PERFORM_CHECKS
}
#endif
[/source]
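Regarding the `unchecked` idea: in C#, `checked` and `unchecked` only control overflow checking for integral arithmetic and conversions. Floating-point operations never throw on overflow (they produce infinity), and code compiles in an unchecked context by default unless overflow checking is enabled with the `/checked` compiler option (`CheckForOverflowUnderflow` in the project file). So wrapping `float` math in `unchecked` is unlikely to change its speed at all. A small sketch showing the difference (the class and method names are hypothetical):

```csharp
using System;

public static class OverflowDemo
{
    // Integer addition that wraps around on overflow (the usual default).
    public static int WrappingAdd(int a, int b) => unchecked(a + b);

    // Integer addition that throws OverflowException instead of wrapping.
    public static int CheckedAdd(int a, int b) => checked(a + b);

    // checked has no effect on float math: overflow yields Infinity, not an exception.
    public static float CheckedFloatScale(float a, float s) => checked(a * s);
}
```

Integer-heavy code (hashing, index arithmetic) is where the checked/unchecked distinction actually matters.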


That looks like something you should use SIMD for tbh.


Well, the problem with that is, I think, that SIMD is CPU-specific; is it not? And I'm writing a C# engine. So dealing with that fact would make this difficult...

Furthermore, there are only two ways I know of to execute arbitrary machine code from C#. The first is the "standard" way, using the interop layer, but that incurs a per-call performance penalty that may defeat the whole purpose. The only other way I know to execute "pure" machine code is a bit complicated and may not speed things up enough to be worth it. But essentially, this is how it's done:

First you create a pool of unmanaged memory and "trick" the CLR into believing it is executable code. One way is to use the Reflection API and treat it as a module. Then you obtain an address to some place in that memory that is safe to write to. You then emit the machine code instructions (as raw bytes) and copy them into memory. Then you use the Marshal class to obtain a delegate wrapping the function pointer. Then you can call the code, and it will indeed work. I did this before just for giggles, and I actually have the project saved somewhere on my old HDD. But doing all this seems like overkill, and I doubt it would be worth it. And I'd have to figure out a way to emit the correct machine code for every single processor architecture this engine could conceivably be used on. As this is a platform-agnostic engine, that is no small undertaking.

What about interoperating my engine with Intel's Math Kernel Library (MKL)? Would the performance gain be worth it? And would it be portable across not only computer platforms but consoles and mobile devices as well?


[quote]Is this correct/true, and should I be doing it this way?[/quote]

That's probably correct/true. However, the only way to know for sure is to test it in a program. If you are writing a Math library without a project that uses it, I would say you are doing it wrong. In the same manner as game engines are usually created by extracting and polishing the parts of a game that can be reused in other games, a Math library should be created by extracting and polishing the Math-related code that can be reused in other projects. But if you don't have a project to start, how are you going to test your library and how are you going to know what would be useful or not?

[quote]And what other optimizations might I use in general to make my math code blazing fast and efficient?[/quote]
Don't use C#. :)

Edited by alvaro


[quote]That's probably correct/true. However, the only way to know for sure is to test it in a program. If you are writing a Math library without a project that uses it, I would say you are doing it wrong. In the same manner as game engines are usually created by extracting and polishing the parts of a game that can be reused in other games, a Math library should be created by extracting and polishing the Math-related code that can be reused in other projects. But if you don't have a project to start, how are you going to test your library and how are you going to know what would be useful or not?[/quote]


You're right, and I am using a test project. I'm not merely writing a math library, I'm writing an engine, and it's a pretty large project. It's a major pain to go in and perform micro-testing on every little algorithm, so I try to do things right from the start and then go in and actually do all those micro-tests and micro-optimizations every other week; it usually takes a day or a few days dedicated to that alone.


[quote]Don't use C#. :)[/quote]


Lol, c'mon... XD

IIRC, the last time I did a head-to-head "race" of C# vs. C math code, the difference was negligible (often at or near a tie) with CLR checks turned off. After all, once the code has been run through the JIT, it is native code. That's why, when I performance-test C# code, I run one iteration of the test first and throw away the results, to "pre-JIT" the code: the first run incurs a compilation overhead that subsequent runs do not. I was recently talking about that in another thread. C# is by no means slow or "less powerful". While I'm losing a little speed in some areas of the engine, I'm going to win in the overall picture. The memory efficiency, stability and reduced complexity of the engine internals make this thing perform at a rather breathtaking speed.
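The warm-up practice described above can be captured in a small helper. This is a sketch built on `System.Diagnostics.Stopwatch`; `MicroBenchmark.Measure` is a hypothetical name, and the timing includes the delegate-call overhead, so it is better suited to comparing two variants against each other than to measuring absolute cost:

```csharp
using System;
using System.Diagnostics;

public static class MicroBenchmark
{
    // Runs the action once untimed so the JIT compiles it, then times
    // `iterations` further runs and returns the average in milliseconds.
    public static double Measure(Action action, int iterations)
    {
        if (iterations <= 0)
            throw new ArgumentOutOfRangeException(nameof(iterations));

        action(); // warm-up: pays the one-time JIT cost; its timing is discarded

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            action();
        sw.Stop();

        return sw.Elapsed.TotalMilliseconds / iterations;
    }
}
```

Timing a divide-per-element loop and a reciprocal-multiply loop with the same iteration count on the target machine gives a like-for-like answer to the original question.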

For example, a few years ago I ran a test of a prototype of this engine, which wasn't nearly as good/optimized as this commercial WIP version. I brute-force rendered a terrain made up of several million tris with complex multi-texture blending shaders, normal mapping and lighting... It was running at about 7500 fps despite the scene being so "heavy". :)

Edited by ATC


[quote name='SimonForsman' timestamp='1348593671' post='4983649']
That looks like something you should use SIMD for tbh.


Well, the problem with that is, I think, that SIMD is CPU-specific; is it not? And I'm writing a C# engine. So dealing with that fact would make this difficult...

Furthermore, there are only two ways I know of to execute arbitrary machine code from C#. The first is the "standard" way, using the interop layer, but that incurs a per-call performance penalty that may defeat the whole purpose. The only other way I know to execute "pure" machine code is a bit complicated and may not speed things up enough to be worth it. But essentially, this is how it's done:
[/quote]


You could just use Mono.SIMD; it works with .NET as well (although .NET users don't get actual SIMD support, AFAIK) and should work with most modern x86 CPUs.

Edited by SimonForsman
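For reference, later versions of .NET added vector types in `System.Numerics` (`Vector4`, `Matrix4x4`, `Vector<T>`) that the JIT can map to SIMD instructions where the CPU supports them, with no hand-emitted machine code. This is a different library from the Mono.SIMD one suggested above. A sketch of the division operator rewritten over four `Vector4` rows; the `RowMatrix4` type is hypothetical, and `System.Numerics.Matrix4x4` already provides a `Matrix4x4 * float` operator that does the same job:

```csharp
using System.Numerics;

// Hypothetical row-major 4x4 matrix stored as four 4-wide vectors.
public struct RowMatrix4
{
    public Vector4 Row1, Row2, Row3, Row4;

    public RowMatrix4(Vector4 r1, Vector4 r2, Vector4 r3, Vector4 r4)
    {
        Row1 = r1; Row2 = r2; Row3 = r3; Row4 = r4;
    }

    // One reciprocal, then four 4-wide multiplies instead of sixteen scalar ones.
    public static RowMatrix4 operator /(RowMatrix4 m, float div)
    {
        float inv = 1f / div;
        return new RowMatrix4(m.Row1 * inv, m.Row2 * inv, m.Row3 * inv, m.Row4 * inv);
    }

    // True when the runtime maps System.Numerics vector types to SIMD instructions.
    public static bool SimdAvailable => Vector.IsHardwareAccelerated;
}
```

On hardware or runtimes without acceleration the same code still runs, just as ordinary scalar math, which keeps it portable.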


[quote]You could just use Mono.SIMD; it works with .NET as well (although .NET users don't get actual SIMD support, AFAIK) and should work with most modern x86 CPUs.[/quote]


Interesting. I'll look into this.

But does anyone know anything about Intel's MKL? It sounds like it can be pretty darn fast.


[quote]It's a major pain to go in and perform micro-testing on every little algorithm, so I try to do things right from the start and then go in and actually do all those micro-tests and micro-optimizations every other week; it usually takes a day or a few days dedicated to that alone.[/quote]

You need to come up with a better test framework. It shouldn't be necessary to spend days running tests and profiling.

You should absolutely have unit tests that can be run after every build. These are to check the correctness of each of your math functions.
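A framework-free sketch of such a per-build check; in practice a unit-test framework would host these. `Normalize` is a hypothetical stand-in for an engine function, and the tolerance helper reflects that float results rarely match a hand-computed value exactly:

```csharp
using System;

public static class MathTests
{
    // Tolerance comparison scaled by magnitude; exact == is too strict for floats.
    public static bool ApproxEqual(float a, float b, float tol = 1e-5f) =>
        Math.Abs(a - b) <= tol * Math.Max(1f, Math.Max(Math.Abs(a), Math.Abs(b)));

    // Hypothetical function under test: normalize a 3D vector.
    public static (float X, float Y, float Z) Normalize(float x, float y, float z)
    {
        float len = (float)Math.Sqrt(x * x + y * y + z * z);
        if (len == 0f)
            throw new DivideByZeroException("Cannot normalize a zero-length vector.");
        float inv = 1f / len;
        return (x * inv, y * inv, z * inv);
    }

    // Returns the number of failed checks, so a build step can fail on nonzero.
    public static int RunAll()
    {
        int failures = 0;

        var (x, y, z) = Normalize(3f, 0f, 4f);
        if (!ApproxEqual(x, 0.6f) || !ApproxEqual(y, 0f) || !ApproxEqual(z, 0.8f))
            failures++;

        var n = Normalize(-2f, 5f, 1.5f);
        float len = (float)Math.Sqrt(n.X * n.X + n.Y * n.Y + n.Z * n.Z);
        if (!ApproxEqual(len, 1f))
            failures++;

        try { Normalize(0f, 0f, 0f); failures++; } // should have thrown
        catch (DivideByZeroException) { }

        return failures;
    }
}
```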

But you probably also want to consider integrating profiling with your build. Flip a switch, and it will compile an executable with profiling built-in. Then you can run real-world tests with your actual application, to ensure that you are hitting your performance goals.
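One way to get that switch in C# is `System.Diagnostics.ConditionalAttribute`: calls to a method marked `[Conditional("PROFILE")]` are removed by the compiler unless the `PROFILE` symbol is defined, for example via `DefineConstants` in a dedicated build configuration. A minimal sketch; the `Profiler` class and the `PROFILE` symbol name are hypothetical:

```csharp
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical minimal profiler. Call sites to Begin/End are compiled away
// unless the PROFILE symbol is defined for the build.
public static class Profiler
{
    private static readonly Dictionary<string, Stopwatch> Timers =
        new Dictionary<string, Stopwatch>();

    // Starts (or resumes) timing the named section.
    [Conditional("PROFILE")]
    public static void Begin(string name)
    {
        if (!Timers.TryGetValue(name, out Stopwatch sw))
        {
            sw = new Stopwatch();
            Timers[name] = sw;
        }
        sw.Start();
    }

    // Pauses the named section; elapsed time accumulates across Begin/End pairs.
    [Conditional("PROFILE")]
    public static void End(string name)
    {
        if (Timers.TryGetValue(name, out Stopwatch sw))
            sw.Stop();
    }

    // Total accumulated milliseconds for a section, or 0 if it was never timed.
    public static double TotalMs(string name) =>
        Timers.TryGetValue(name, out Stopwatch sw) ? sw.Elapsed.TotalMilliseconds : 0.0;
}
```

Wrapping a hot section in `Profiler.Begin("physics")` and `Profiler.End("physics")` then costs nothing in normal builds, while the profiling configuration records real timings from the actual application.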


[quote]You need to come up with a better test framework. It shouldn't be necessary to spend days running tests and profiling.

You should absolutely have unit tests that can be run after every build. These are to check the correctness of each of your math functions.

But you probably also want to consider integrating profiling with your build. Flip a switch, and it will compile an executable with profiling built-in. Then you can run real-world tests with your actual application, to ensure that you are hitting your performance goals.[/quote]


You're absolutely right. I just haven't had a chance to write a good test framework yet. Most of the math code is coming from a past prototype which was already tested thoroughly, so I know the output for everything is correct in its current state. However, I DO need to create a new test framework to do it all over again. But I decided to make this thread first to get ideas on how to optimize things before I start changing or rewriting anything. BTW, when I was talking about taking "several days" I wasn't talking about simply testing things; that's pretty quick. I was talking about all the work of using the test results to do micro-optimizations, rewrite things and make significant changes... then testing again to make sure the changes actually worked, didn't break anything and actually resulted in a net performance gain.

Explain to me how you think I should do my profiling builds in as much detail as you're willing or have time to go into. It's an area I'm definitely no expert in.

BTW, do you know anything about Intel's MKL and how it might be beneficial to my engine project?
