Finding that balance between optimization and legibility.
There are obvious benefits to what you say; this I'm not denying. And it is a strong argument that this particular aspect of my code should be changed. But I refuse to set out designing code for the sake of honoring legibility and best practices. This process should manifest itself as the situation arises.
I wrote a flight simulator in C++ using quaternions. It worked, but it was inherently non-intuitive, and my experience with C++ and its boilerplate drudgery was awful. Shortly afterwards I wrote a rigid body simulator in C, after intuitively deriving the Rodrigues rotation on my own (I would find out much later that's what it was called) and extending it to orthonormal axes, and the experience was amazing. (It turned out my dependency on matrices was minimal outside of the graphics pipeline, and in the few instances where I did need them, I reconstructed them from the axes.) I understood each and every aspect of my code inside and out, and my write-then-refactor-as-needed approach allowed the design and coding structure to manifest naturally.
Not to mention, it allowed me to discover problems and solutions such as LCP/PGS on my own, before even knowing they had a name. I look back at that physics engine now, and it is still as legible, understandable, and easy to jump into as when I first wrote it. I've since lost the code for my flight simulator, and I don't even bat an eye.
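For readers who haven't met it, the Rodrigues formula mentioned above rotates a vector v around a unit axis k by an angle theta. A minimal sketch (the Vec3 type and helpers here are hypothetical, not the poster's actual code):

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

static Vec3 add(Vec3 a, Vec3 b)   { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3 mul(Vec3 a, double s) { return {a.x * s, a.y * s, a.z * s}; }
static double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 cross(Vec3 a, Vec3 b)
{
    return {a.y * b.z - a.z * b.y,
            a.z * b.x - a.x * b.z,
            a.x * b.y - a.y * b.x};
}

// Rodrigues: v*cos(t) + (k x v)*sin(t) + k*(k . v)*(1 - cos(t)),
// where k must be unit length.
Vec3 rotate(Vec3 v, Vec3 k, double theta)
{
    double c = std::cos(theta), s = std::sin(theta);
    return add(add(mul(v, c), mul(cross(k, v), s)),
               mul(k, dot(k, v) * (1.0 - c)));
}
```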
After crunching some numbers, I calculated that my cache hit rate was approximately 70%, so I'm thinking about keeping this example in my code for now (if not for the performance gain, then at the very least as a reminder).
Given your usage pattern, and considering the size of the data cache on today's CPUs, this may work out well. If you are computing the trig functions on a small set of common angles, and doing it often enough that the cached values stay local to the CPU instead of getting evicted from the cache, it will absolutely help.
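For context, here's a minimal sketch of the kind of caching being discussed. The single-entry design and the names are assumptions, since the original code isn't reproduced in this thread:

```cpp
#include <cmath>

// Cache the last angle's sin/cos pair; repeated lookups on the same
// angle skip the recomputation. Note the static state: as pointed out
// later in the thread, this is not thread safe.
void sincos_cached(double angle, double* s, double* c)
{
    static double last_angle = 0.0;
    static double last_sin   = 0.0;
    static double last_cos   = 1.0;

    if (angle != last_angle) {   // miss: recompute and remember
        last_angle = angle;
        last_sin   = std::sin(angle);
        last_cos   = std::cos(angle);
    }
    *s = last_sin;               // hit: just copy out the cached pair
    *c = last_cos;
}
```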
But as others point out, keep measuring and monitoring.
It looks like for now you are only using it where it makes sense. But add some debug blocks to your code to notify you when the situation changes.
If you reach the point where your comparisons are no longer in the cache, or if you are multithreaded and your writes need to be propagated (slowly) to all the other processors, then you'll need to revisit your design. If you start doing more operations on a wider variety of values, or on values that are not quite identical, those could also trip your routine up.
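One way to read the "debug blocks" suggestion, as a hedged sketch (the names and the 70% threshold are assumptions, the latter based on the hit rate quoted earlier):

```cpp
#include <cstdio>

#ifndef NDEBUG
static long g_hits = 0, g_misses = 0;

// Call this from the cache lookup; it warns when the hit rate sinks
// below ~70%, checking once every 100000 lookups.
static void note_lookup(bool hit)
{
    (hit ? g_hits : g_misses)++;
    long total = g_hits + g_misses;
    if (total % 100000 == 0 && g_hits * 10 < total * 7)
        std::fprintf(stderr, "trig cache hit rate fell below 70%% (%ld/%ld)\n",
                     g_hits, total);
}
#endif
```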
http://number-none.com/blow/john_carmack_on_inlined_code.html
I wonder what Michael Abrash would say...
Quoting Carmack: "In almost all cases, code duplication is a greater evil than whatever second order problems arise from functions being called in different circumstances, so I would rarely advocate duplicating code to avoid a function"
There should be a balance. I'm not dismissing convention here, but I'm not dismissing the opposite either. I'm saying don't assume that a hard-line approach to following said conventions yields the greatest benefits. Sometimes it will work, sometimes it won't.
I have the luxury and flexibility to do what I want and to try to challenge conventional thinking. In so doing I've discovered pros and cons of going against the norm as well, and I will take the route of least resistance (whether it is a conventional approach or something to the contrary).
For me personally (and in my own subjectivity), I've found working within the confines of C++ and OOP in general to be quite problematic.
I have the luxury and flexibility to do what I want and to try to challenge conventional thinking. In so doing I've discovered pros and cons of going against the norm as well, and I will take the route of least resistance (whether it is a conventional approach or something to the contrary).
Nothing wrong with that. Just be aware that there are often very good reasons for conventions. They frequently encode a lot of lessons learned the hard way.
By all means, challenge them, but do so understanding why they are conventions in the first place.
For me personally (and in my own subjectivity), I've found working within the confines of C++ and OOP in general to be quite problematic.
And on the flip side to what I just said... no-one is forcing you to write C++ in an OOP manner.
If you're writing something and you think it should be a free function/struct/other "non OO" construct, write it as such.
It is one of the strengths of the language that it doesn't force you into OOP, unlike Java or C#. (I love C#, but I find it stupid that I have to create a static class to write a simple function).
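As a tiny illustration of that point (the names here are hypothetical), a plain struct and a free function in C++, no class hierarchy required:

```cpp
struct Vec2 { float x, y; };

// A free function: no class, no static wrapper, just a function.
float dot(const Vec2& a, const Vec2& b)
{
    return a.x * b.x + a.y * b.y;
}
```

In C#, by contrast, that dot would have to live inside a class.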
For me personally (and in my own subjectivity), I've found working within the confines of C++ and OOP in general to be quite problematic.
You're not transcending the boundaries of C++ and OOP. You're just achieving your goal of caching these numbers with the smallest, easiest edit possible. There's no balance here. There's no actual concern for legibility that I can see.
You're modifying your code in the same way practically every modification is made to so-called OOP C++ programs out there - in whatever way lets the programmer move on to the next task and keep collecting paychecks. Please do not act like this is something special. You are not challenging conventional thinking. You are demonstrating it.
There is no possible way that code speeds up the calculation of sin and cos values for vectors, and it introduces a problem with reentrancy (it's not thread safe).
When optimizing this sort of code there are three things you must do to achieve state-of-the-art performance.
1) Ensure you are using the greatest known mathematical reduction of the algorithm
2) Eliminate all branches (even if it means more calculations; see the sketch after this list)
3) Use vectorizing operations, e.g. SIMD, NEON, AVX, et. al.
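A small sketch of point 2 (a hypothetical function, not from the thread): computing a float's absolute value by clearing the sign bit instead of testing for negativity.

```cpp
#include <cstdint>
#include <cstring>

// Branchless fabs: mask off the IEEE-754 sign bit rather than
// branching on x < 0.
float fabs_branchless(float x)
{
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);   // type-pun without UB
    bits &= 0x7FFFFFFFu;                   // drop the sign bit
    std::memcpy(&x, &bits, sizeof bits);
    return x;
}
```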
Optimizing the code for vectoring operations can be very annoying.
Algorithms tend to favor separate arrays for each element/dimension, as opposed to the interleaved arrays which are more convenient to deal with. This cuts down on the loading and packing time of the SIMD registers, and that can be critical to utilizing all available computation units.
Doing the above and eliminating any IEEE-754 or C-standard overhead (e.g. if the unit's rounding rules differ from the standard's, it has to perform a conversion when storing) is how you make it fast.
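To make the layout point concrete, a minimal SSE sketch (a hypothetical function; assumes 16-byte-aligned arrays whose length is a multiple of 4). With separate x/y/z arrays, each load fills four lanes directly, with none of the shuffling an interleaved xyzxyz... layout would need:

```cpp
#include <emmintrin.h>  // SSE2

// Scale n points stored as separate (structure-of-arrays) coordinates.
void scale_points(float* x, float* y, float* z, float s, int n)
{
    __m128 vs = _mm_set1_ps(s);   // broadcast s into all 4 lanes
    for (int i = 0; i < n; i += 4) {
        _mm_store_ps(x + i, _mm_mul_ps(_mm_load_ps(x + i), vs));
        _mm_store_ps(y + i, _mm_mul_ps(_mm_load_ps(y + i), vs));
        _mm_store_ps(z + i, _mm_mul_ps(_mm_load_ps(z + i), vs));
    }
}
```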
The old fsincos instruction got it done in about 137 clock cycles; SSE2 and newer should have faster or more vectorized options.
If you can sacrifice accuracy, then you can use an estimation of the sin and cos values; those algorithms are generally just multiplies and accumulates, and you can get it done in a lot less than 100 clock cycles.
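As an example of that multiply-and-accumulate style, here's a well-known low-degree polynomial approximation of sin(x) on [-pi, pi] (a sketch, not from this thread: a parabola fitted to sin, plus one refinement pass):

```cpp
#include <cmath>

// Fast approximate sin for x in [-pi, pi]. Max error is roughly 0.001
// after the refinement step -- fine for graphics, not for science.
float fast_sin(float x)
{
    const float B = 4.0f / 3.14159265f;                    //  4/pi
    const float C = -4.0f / (3.14159265f * 3.14159265f);   // -4/pi^2
    float y = B * x + C * x * std::fabs(x);                // first guess
    const float P = 0.225f;
    return P * (y * std::fabs(y) - y) + y;                 // refine
}
```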
2) Eliminate all branches (even if it means more calculations)
Except when the calculations actually outweigh the cost of a mispredicted branch... right? I'm not sure of the details, but shouldn't the misprediction cost be something like ~100 cycles on modern desktop CPUs? So if you can skip calculations that take significantly longer than that, a branch is the better choice.
Also, on desktop, branches that are easy to predict have very little cost. Take something that checks for a memory allocation error and terminates the program when it fires: the branch will essentially always be false, so the only cost should be the branch instruction itself. That's different on consoles where there is no branch prediction (I don't think the current generation added it, did they?), but I haven't programmed on consoles myself so far, so I can't say much about it.
They did add branch prediction.
That's different on consoles where there is no branch prediction (I don't think the current generation added it, did they?)
The most recent consoles from MS and Sony are both x64-based and come with all the features of a modern CPU: AVX, SSE, branch prediction, out-of-order execution, and the like.
They are basically (low-powered) PCs in a box.