maths Library

14 comments, last by ArnoAtWork 20 years, 2 months ago
What I do is provide multiple binaries for different platforms. Just wrap the relevant parts of the code in #ifdef / #endif. Do make sure, however, that you still check for SSE/SSE2 support before you run it. People could very easily download the wrong version, of course. If they did download the wrong version, quit gracefully with an error message explaining where to get the right one.

It's a bit more work and you will have to deal with multiple binary versions, but you'll get the fastest code path.
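A minimal sketch of that approach, assuming GCC or Clang on x86. The macro `MATHLIB_USE_SSE` and all function names here are made up for illustration, and `__builtin_cpu_supports` is a compiler builtin rather than anything from the library discussed in this thread:

```cpp
#include <cstdio>
#include <cstdlib>

#if defined(MATHLIB_USE_SSE)
#include <xmmintrin.h>
#endif

struct Vec4 { float x, y, z, w; };

// One function body per build; the hypothetical MATHLIB_USE_SSE switch
// is set on the compiler command line for the SSE binary only.
inline Vec4 add(const Vec4& a, const Vec4& b) {
#if defined(MATHLIB_USE_SSE)
    Vec4 r;
    _mm_storeu_ps(&r.x, _mm_add_ps(_mm_loadu_ps(&a.x), _mm_loadu_ps(&b.x)));
    return r;
#else
    return Vec4{a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w};
#endif
}

// True if this binary's code path is supported by the CPU it runs on.
inline bool cpu_matches_build() {
#if defined(MATHLIB_USE_SSE)
    return __builtin_cpu_supports("sse") != 0;  // GCC/Clang builtin, x86 only
#else
    return true;                                // the generic build runs anywhere
#endif
}

// Call once at startup, before any SSE instruction can execute.
inline void require_matching_cpu() {
    if (!cpu_matches_build()) {
        std::fprintf(stderr,
            "This build requires SSE. Please download the non-SSE version.\n");
        std::exit(EXIT_FAILURE);
    }
}
```

The runtime check stays in even though the code path is fixed at compile time, because it is what turns an illegal-instruction crash into a readable error message.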

Sander Maréchal
[Lone Wolves Game Development][RoboBlast][Articles][GD Emporium][Webdesign][E-mail]


Put the platform specific code in its own functions
Write a generic version of the functions too.
Build your class using templates
Choose the different platform functions depending upon the template specialisation
typedef the specialisations to vector3D in separate platform specific headers
Change the include paths for different builds for different platforms
No need for #ifdefs
Nice and neat
Sorry, not my usual detailed reply. Stuff to do.

Pete
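The steps in Pete's list could be sketched roughly as follows. This is only one reading of them: the tag types, `VecOps`, and `Vector3DT` are hypothetical names, and the two "headers" are collapsed into a single file so the sketch compiles on its own (which is the only reason a preprocessor guard appears at all).

```cpp
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Tags naming each code path (hypothetical).
struct GenericTag {};
struct SseTag {};

// Platform-specific code lives in its own functions, one set per tag.
template <typename Platform> struct VecOps;

// The generic version, always available.
template <> struct VecOps<GenericTag> {
    static void add(const float* a, const float* b, float* out) {
        for (int i = 0; i < 3; ++i) out[i] = a[i] + b[i];
    }
};

#if defined(__SSE__)
// In the real layout this specialisation would sit in an SSE-only header.
template <> struct VecOps<SseTag> {
    static void add(const float* a, const float* b, float* out) {
        // Storage is padded to four floats, so a full-width load is safe.
        _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
    }
};
#endif

// The class itself is written once and picks its functions via the template.
template <typename Platform>
class Vector3DT {
public:
    Vector3DT(float x = 0.0f, float y = 0.0f, float z = 0.0f)
        : v_{x, y, z, 0.0f} {}

    Vector3DT operator+(const Vector3DT& o) const {
        Vector3DT r;
        VecOps<Platform>::add(v_, o.v_, r.v_);
        return r;
    }

    float x() const { return v_[0]; }
    float y() const { return v_[1]; }
    float z() const { return v_[2]; }

private:
    float v_[4];  // fourth lane is padding
};

// Each platform header would contain exactly one typedef like this, e.g.
// generic/vector3d.h here and sse/vector3d.h with Vector3DT<SseTag>.
// The build's include path decides which header client code sees.
typedef Vector3DT<GenericTag> vector3D;
```

Client code only ever names `vector3D`, so switching platforms means changing the include path rather than touching any source file.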
I've written a math library and I'm pretty pleased that everything works correctly. However, I have some similar optimization questions:

SSE
Jan and Sander, it sounds like you guys have working SSE code in your math libs. How much speed improvement have you noticed? In which areas are SSE optimizations most crucial?

Inlining
Which functions did you choose to inline (everything, nothing, something in between)? How much of a speed improvement did you notice?



Thanks,
Matt
Sander: hehe, didn't consider that, because of the trouble for the user - many people don't know what SSE is, or at least that you need a PIII or Athlon XP ("what's that?") to run it. I guess it's workable with a 'you installed the wrong version' check, but that's still a hassle.

I actually don't think any of these suggestions are worth the trouble, unless you find that your math code is demonstrably too slow, and further, that it would be improved by SSE. mishikel, I don't SSE-optimize stuff unless it really, really matters (see CLOD terrain engine on my page for one example), and math lib isn't one of them, IMO. For a few odd matrix ops, SSE doesn't make a difference at all. If you do enough that it would, I'd write the whole thing in asm, doing register alloc myself. It's kind of silly to load stuff from memory, do a few SSE ops on it, and write it back out to memory.
That said, if you have lots of fsqrt(), you still win by replacing fsqrt with rsqrtss & mulss, even with parameter passing overhead.
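For illustration, the rsqrtss and mulss replacement can be written with compiler intrinsics instead of asm. This is a sketch, not Jan's code: `fast_sqrt` is a made-up name, the `__SSE__` macro is a GCC/Clang convention, and the zero check is an addition needed because 0 times the infinite reciprocal root would give NaN.

```cpp
#include <cmath>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Approximate sqrt(x) as x * rsqrt(x).
// rsqrtss is only good to roughly 12 bits, so this trades precision for speed.
inline float fast_sqrt(float x) {
#if defined(__SSE__)
    if (x == 0.0f) return 0.0f;            // rsqrt(0) is +inf, and 0 * inf is NaN
    __m128 v = _mm_set_ss(x);              // x in the low lane
    __m128 r = _mm_rsqrt_ss(v);            // rsqrtss: approximate 1 / sqrt(x)
    return _mm_cvtss_f32(_mm_mul_ss(v, r)); // mulss: x * (1 / sqrt(x))
#else
    return std::sqrt(x);                   // fallback so the sketch builds off x86
#endif
}
```

Whether the reduced precision is acceptable depends on the caller; normalising vectors for lighting usually tolerates it, while physics code may not.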
E8 17 00 42 CE DC D2 DC E4 EA C4 40 CA DA C2 D8 CC 40 CA D0 E8 40E0 CA CA 96 5B B0 16 50 D7 D4 02 B2 02 86 E2 CD 21 58 48 79 F2 C3
mishikel:
I can't tell you the speed improvement from inlining, since we have always inlined the functions we use. That is the entire reason we went for a multi-binary approach. Inlining + SSE(2) = max speed.

The SSE functions take approximately 20%-25% of the time of the normal C functions (when operating on arrays of 4D vectors). SSE2 has yet to be profiled properly. If you use other vectors (like 3D ones), the SSE speed improvement is smaller.
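To make the "arrays of 4D vectors" case concrete, here is a sketch of the kind of loop where SSE fits well. The operation (`scale_add`) and all names are chosen for illustration and do not come from the library being described; unaligned loads are used so the sketch makes no assumption about alignment.

```cpp
#include <cstddef>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

struct Vec4 { float x, y, z, w; };

// out[i] = a[i] * s + b[i] for n 4D vectors.
inline void scale_add(const Vec4* a, const Vec4* b, float s,
                      Vec4* out, std::size_t n) {
#if defined(__SSE__)
    const __m128 vs = _mm_set1_ps(s);          // broadcast s to all four lanes
    for (std::size_t i = 0; i < n; ++i) {
        __m128 va = _mm_loadu_ps(&a[i].x);     // one load covers x, y, z, w
        __m128 vb = _mm_loadu_ps(&b[i].x);
        _mm_storeu_ps(&out[i].x, _mm_add_ps(_mm_mul_ps(va, vs), vb));
    }
#else
    for (std::size_t i = 0; i < n; ++i) {      // scalar fallback, same result
        out[i].x = a[i].x * s + b[i].x;
        out[i].y = a[i].y * s + b[i].y;
        out[i].z = a[i].z * s + b[i].z;
        out[i].w = a[i].w * s + b[i].w;
    }
#endif
}
```

A 4D vector fills all four lanes of an SSE register, whereas a 3D vector leaves one lane idle or needs padding, which is one plausible reason the gain shrinks for 3D data.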

Jan:
We are eliminating the binary hassle via an installer/launcher. Our game will be an online, multiplayer-only game, so the latest binaries are always available via the internet. At startup, the launcher checks for SSE(2) support (or Altivec on Macintosh), and if the installed version is not the optimal one, the user is prompted to download the optimal version. Zero hassle for the user.
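The launcher's decision could look something like the sketch below. Everything here is hypothetical: the `CpuFeatures` struct is assumed to be filled in by whatever detection the launcher uses (CPUID on x86, for example), and Altivec is left out to keep it short.

```cpp
#include <string>

// Hypothetical feature flags; how they are detected is outside this sketch.
struct CpuFeatures {
    bool sse;
    bool sse2;
};

enum class Build { Generic, Sse, Sse2 };

// Pick the fastest build the CPU can run.
inline Build optimal_build(const CpuFeatures& f) {
    if (f.sse2) return Build::Sse2;
    if (f.sse)  return Build::Sse;
    return Build::Generic;
}

// True when the launcher should offer a download instead of starting the game.
inline bool should_prompt_download(Build installed, const CpuFeatures& f) {
    return installed != optimal_build(f);
}

// Name used in the prompt shown to the user.
inline std::string build_name(Build b) {
    switch (b) {
        case Build::Sse2: return "SSE2";
        case Build::Sse:  return "SSE";
        default:          return "generic";
    }
}
```

Note that this prompts in both directions: a generic build on an SSE2 machine is merely slow, while an SSE2 build on an older CPU would not run at all, so a real launcher would likely treat the second case as mandatory.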



Sander Maréchal
[Lone Wolves Game Development][RoboBlast][Articles][GD Emporium][Webdesign][E-mail]


ah, ok, cool.
How do the SSE and FPU versions compare with 'regular' math lib usage (a few matrices here and there), as opposed to large batches, where SSE is obviously faster?
E8 17 00 42 CE DC D2 DC E4 EA C4 40 CA DA C2 D8 CC 40 CA D0 E8 40E0 CA CA 96 5B B0 16 50 D7 D4 02 B2 02 86 E2 CD 21 58 48 79 F2 C3

This topic is closed to new replies.
