donny dont

Math library development (Advice Needed)

So I decided I wanted to clean up my math library and add support for SSE intrinsics. I've thought of two designs to achieve this: one relies on function pointers and the other relies on partial template specialization. Before I get too far down one path, I wanted to see if anyone had any advice as to which route might be the better choice.

A number of the mathematical structures useful for 3D math have very similar operations (adding two 3D vectors together can be done in the same fashion as adding two quaternions together), so a base class n_tuple is used and the mathematical constructs derive from it. Using function pointers, the base class looks like this:
template <typename Real, std::size_t N>
class n_tuple
{
public:
	/*--------- Additional Methods cut for brevity ---------*/

	/**
	 * Addition of two n_tuples.
	 *
	 * \param rhs The rhs of the addition.
	 * \returns The sum of the two n_tuples.
	 */
	inline n_tuple operator + (const n_tuple& rhs) const
	{
		// Create a temporary copy
		n_tuple copy(*this);

		add(copy, rhs);

		return copy;
	}

	/**
	 * Addition assignment of two n_tuples.
	 *
	 * \param rhs The rhs of the addition.
	 * \returns The sum of two n_tuples.
	 */
	inline n_tuple& operator += (const n_tuple& rhs)
	{
		add(*this, rhs);

		return *this;
	}

	/**
	 * Provides initialization of SIMD intrinsics.
	 *
	 * After run-time determination of the capabilities of the CPU
	 * applicable SIMD instructions can be used during computation.
	 *
	 * \param type The SIMD type to use.
	 */
	static void initialize(simd_type type = NONE);

protected:
	/** Function pointer to the addition function */
	static boost::function<void (n_tuple&, const n_tuple&)> add;

	/**
	 * Addition using STL libraries.
	 *
	 * \param lhs Left hand side of the addition.
	 * \param rhs Right hand side of the addition.
	 */
	static void addition(n_tuple& lhs, const n_tuple& rhs);

	/**
	 * Addition using SSE intrinsics.
	 *
	 * \param lhs Left hand side of the addition.
	 * \param rhs Right hand side of the addition.
	 */
	static void addition_sse(n_tuple& lhs, const n_tuple& rhs);	
} ;

Then, to create the SSE static method, template specialization is used:
// Initialize static member variable
template <typename type, std::size_t size>
boost::function<void (n_tuple<type, size>&, const n_tuple<type, size>&)>
	n_tuple<type, size>::add = &n_tuple<type, size>::addition;

template <> 
void n_tuple<float, 3>::addition_sse(n_tuple& lhs, const n_tuple& rhs)
{
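	// _mm_load_ps/_mm_store_ps move four floats at a time and require
	// 16-byte alignment, so values must be aligned and padded to four elements.
	_mm_store_ps(lhs.values, _mm_add_ps(_mm_load_ps(lhs.values), _mm_load_ps(rhs.values))); 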
}
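
To make the run-time dispatch idea easier to reason about, here is a stripped-down sketch of it. It is not the library code: `float4`, `add_scalar`, `add_sse`, `sum`, and the `TUPLE_HAS_SSE` macro are hypothetical names made up for illustration, a plain function pointer stands in for boost::function, and `alignas` assumes a C++11 compiler. The CPU capability check itself is left out; `initialize` simply trusts whatever the caller passes in.

```cpp
#include <cstddef>

// Assumption: SSE is only used when the compiler reports support for it.
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define TUPLE_HAS_SSE 1
#else
#define TUPLE_HAS_SSE 0
#endif

// Hypothetical stand-in for n_tuple<float, 4>: four floats, 16-byte aligned
// so that _mm_load_ps and _mm_store_ps are legal on it.
struct alignas(16) float4
{
	float values[4];
};

enum simd_type { NONE, SSE };

// Portable fallback: element-wise addition.
inline void add_scalar(float4& lhs, const float4& rhs)
{
	for (std::size_t i = 0; i < 4; ++i)
		lhs.values[i] += rhs.values[i];
}

#if TUPLE_HAS_SSE
// SSE path: one packed addition covers all four elements.
inline void add_sse(float4& lhs, const float4& rhs)
{
	_mm_store_ps(lhs.values,
		_mm_add_ps(_mm_load_ps(lhs.values), _mm_load_ps(rhs.values)));
}
#endif

// The dispatch pointer; starts out on the portable implementation.
typedef void (*add_fn)(float4&, const float4&);
static add_fn add = &add_scalar;

// Selects the implementation once, e.g. at start-up.
inline void initialize(simd_type type = NONE)
{
#if TUPLE_HAS_SSE
	if (type == SSE)
	{
		add = &add_sse;
		return;
	}
#endif
	(void)type; // Unused when SSE is not compiled in
	add = &add_scalar;
}

// Every addition pays for one indirect call through the pointer.
inline float4 sum(float4 lhs, const float4& rhs)
{
	add(lhs, rhs);
	return lhs;
}
```

Calling `initialize(SSE)` once and then `sum(a, b)` takes the SSE path where it was compiled in, and the scalar loop otherwise.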

The plus side of the function-pointer approach seems to be the ability to change the SSE type at run time, though I'm not sure how much of a plus that actually is. There's also bound to be some performance hit, since every operation goes through an indirect call that the compiler generally can't inline.

Using partial template specialization, it would look something like this:
namespace meta
{
	struct NO_SSE { };
	struct SSE_1  { };
	struct SSE_2  { };
}

template <typename Real, std::size_t N, typename SSE = meta::NO_SSE>
class n_tuple
{
public:

	/*--------- Additional Methods cut for brevity ---------*/

	/**
	 * Addition of two n_tuples.
	 *
	 * \param rhs The rhs of the addition.
	 * \returns The sum of the two n_tuples.
	 */
	inline n_tuple operator + (const n_tuple& rhs) const
	{
		// Create a temporary copy
		n_tuple copy(*this);

		for (unsigned int i = 0; i < N; i++)
			copy.values[i] += rhs.values[i];

		return copy;
	}

	/**
	 * Addition assignment of two n_tuples.
	 *
	 * \param rhs The rhs of the addition.
	 * \returns The sum of two n_tuples.
	 */
	inline n_tuple& operator += (const n_tuple& rhs)
	{
		for (unsigned int i = 0; i < N; i++)
			values[i] += rhs.values[i];

		return *this;
	}	
} ;

And the partial specialization for floats...
template <std::size_t N>
class n_tuple<float, N, meta::SSE_1>
{
public:
	/*--------- Additional Methods cut for brevity ---------*/

	/**
	 * Addition of two n_tuples.
	 *
	 * \param rhs The rhs of the addition.
	 * \returns The sum of the two n_tuples.
	 */
	inline n_tuple operator + (const n_tuple& rhs) const
	{
		// Create a temporary copy
		n_tuple copy(*this);

		_mm_store_ps(copy.values, _mm_add_ps(_mm_load_ps(values), _mm_load_ps(rhs.values)));

		return copy;
	}

	/**
	 * Addition assignment of two n_tuples.
	 *
	 * \param rhs The rhs of the addition.
	 * \returns The sum of two n_tuples.
	 */
	inline n_tuple& operator += (const n_tuple& rhs)
	{
		_mm_store_ps(values, _mm_add_ps(_mm_load_ps(values), _mm_load_ps(rhs.values)));

		return *this;
	}
} ;
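
As a self-contained illustration of the compile-time selection, here is a minimal sketch of a variation on the code above. It is not the library's actual layout. It assumes a hypothetical helper, `tuple_ops`, that holds only the addition kernel, so the SSE specialization covers one function rather than the whole class. The public, 16-byte aligned `values` member and the use of `alignas` (C++11) are also assumptions.

```cpp
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#endif

namespace meta
{
	struct NO_SSE { };
	struct SSE_1  { };
}

// Hypothetical helper holding only the kernel; generic element-wise version.
template <typename Real, std::size_t N, typename SSE>
struct tuple_ops
{
	static void add(Real* lhs, const Real* rhs)
	{
		for (std::size_t i = 0; i < N; ++i)
			lhs[i] += rhs[i];
	}
};

#if defined(__SSE__) || defined(_M_X64)
// Specialization for exactly four floats; both pointers must be
// 16-byte aligned for _mm_load_ps and _mm_store_ps.
template <>
struct tuple_ops<float, 4, meta::SSE_1>
{
	static void add(float* lhs, const float* rhs)
	{
		_mm_store_ps(lhs, _mm_add_ps(_mm_load_ps(lhs), _mm_load_ps(rhs)));
	}
};
#endif

// The interface is written once; only the kernel varies with the tag.
template <typename Real, std::size_t N, typename SSE = meta::NO_SSE>
class n_tuple
{
public:
	n_tuple& operator += (const n_tuple& rhs)
	{
		tuple_ops<Real, N, SSE>::add(values, rhs.values);
		return *this;
	}

	n_tuple operator + (const n_tuple& rhs) const
	{
		n_tuple copy(*this);
		copy += rhs;
		return copy;
	}

	// Public and aligned purely to keep the sketch short.
	alignas(16) Real values[N];
};
```

With this arrangement the implementation is fixed by the type, for example `typedef n_tuple<float, 4, meta::SSE_1> vector4;`, and the call to `tuple_ops::add` is an ordinary static call that the compiler is free to inline.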

The plus side of the template approach looks like it will be the speed gained from not going through function pointers. The potential downside is the amount of code duplication required to get partial specialization working, i.e. a partial specialization doesn't inherit the interface of the primary template, so every method has to be written out again. Any advice would be appreciated. Thanks in advance.
