Rattrap

Karatsuba algorithm

12 posts in this topic

I've been implementing my own big integer class in C++.  I've been trying to implement the Karatsuba algorithm as an alternate method of doing multiplication once the bit length reaches a certain threshold.  My current implementation seems to get the first 3 or 4 elements correct, but after that the values are incorrect.

 

I've tried reproducing the algorithm using C# and .NET's BigInteger, and I've been getting the same results. So I believe that my actual math functions are correct (Multiply, Add, Subtract, LeftShift); I'm guessing I'm doing something incorrectly in the algorithm itself. I'm hoping some other sets of eyes might see something I missed.

 

My big integer class stores binary values in vectors of unsigned integers, with the lowest-order bits at the beginning of the array. All values are stored as positive magnitudes, along with a sign bit. Since the signs don't actually affect multiplication, the function is not concerned with them.
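To illustrate the layout described above, here is a minimal hypothetical helper (`ToLimbs` is not part of my actual class, just a sketch of the representation): the limbs live in a `vector<uint32_t>` with the lowest-order bits first.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper illustrating the storage layout:
// 32-bit limbs in a vector, lowest-order limb at index 0.
std::vector<std::uint32_t> ToLimbs(std::uint64_t Value)
{
	std::vector<std::uint32_t> Limbs;
	while (Value != 0)
	{
		Limbs.push_back(static_cast<std::uint32_t>(Value & 0xFFFFFFFFu));
		Value >>= 32;
	}
	return Limbs;
}
```

So `ToLimbs(0x00000001FFFFFFFF)` produces `{0xFFFFFFFF, 0x00000001}`.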

 

[source lang="c++"]
typedef std::vector<std::uint32_t> storage_t;

inline static void AlternateMultiplication(const storage_t& Lhs, const storage_t& Rhs, storage_t& Product)
{
	/* Get the size of each input */
	const auto LhsSize(Lhs.size());
	const auto RhsSize(Rhs.size());
	/* If one of the inputs is a single element long, use regular multiplication */
	if((LhsSize <= 1) || (RhsSize <= 1))
	{
		/* Multiplication ignoring the signs. First two parameters are inputs by const reference, 3rd parameter is the output by reference */
		AbsoluteMultiplication(Lhs, Rhs, Product);
		return;
	}

	/* Find the length of the longest input */
	const auto LongestSize(std::max(LhsSize, RhsSize));
	/* Find half of the longest input. The +1 makes it round up */
	const auto HalfLongestSize((LongestSize + 1) / 2);

	/* Iterators marking where the splits will occur */
	const auto LhsBegin(std::begin(Lhs));
	const auto LhsEnd(std::end(Lhs));
	const auto LhsMid(LhsSize >= HalfLongestSize ? LhsBegin + HalfLongestSize : LhsEnd);
	const auto RhsBegin(std::begin(Rhs));
	const auto RhsEnd(std::end(Rhs));
	const auto RhsMid(RhsSize >= HalfLongestSize ? RhsBegin + HalfLongestSize : RhsEnd);

	/* Split the inputs */
	storage_t LhsLow(LhsBegin, LhsMid);
	Truncate(LhsLow);
	storage_t LhsHigh(LhsMid, LhsEnd);
	Truncate(LhsHigh);
	storage_t RhsLow(RhsBegin, RhsMid);
	Truncate(RhsLow);
	storage_t RhsHigh(RhsMid, RhsEnd);
	Truncate(RhsHigh);

	storage_t Z0;
	AlternateMultiplication(LhsLow, RhsLow, Z0);

	storage_t Z2;
	AlternateMultiplication(LhsHigh, RhsHigh, Z2);

	storage_t Z1;
	AbsoluteAddition(LhsHigh, LhsLow, LhsHigh);
	AbsoluteAddition(RhsHigh, RhsLow, RhsHigh);
	AlternateMultiplication(LhsHigh, RhsHigh, Z1);

	/* Just to be safe, tracking the sign of Z1, since subtraction is involved */
	bool Z1Sign(true);
	Subtraction(Z1Sign, Z1, true, Z2, Z1Sign, Z1);
	Subtraction(Z1Sign, Z1, true, Z0, Z1Sign, Z1);
	/* Left shift doesn't affect the sign. Convert element length to the number of bits to shift */
	AbsoluteLeftShift(Z1, (HalfLongestSize * (sizeof(T) * CHAR_BIT)), Z1);

	bool Z2Sign(true);
	AbsoluteLeftShift(Z2, (LongestSize * (sizeof(T) * CHAR_BIT)), Z2);
	Addition(Z2Sign, Z2, Z1Sign, Z1, Z2Sign, Z2);
	bool ProductSign(true);
	Addition(Z2Sign, Z2, true, Z0, ProductSign, Product);
}
[/source]

 

In the pseudocode on Wikipedia, the example is in base 10, whereas my code is in binary. If I had to guess, the issue is in the left shifting, or that I have to keep breaking a single element down into single bytes or smaller, since this is base 2 math.

 

[source lang="cpp"]
storage_t Input = {0x2fad3fd6, 0xb14e9f81, 0x00000006};
storage_t Output;

AlternateMultiplication(Input, Input, Output);
[/source]

 

Output - 0xeb2706e4,0x1d2f3c5b,0x7627584b,0xca7d4ae8,0x00000008

Correct - 0xeb2706e4,0x1d2f3c5b,0x7627584b,0xca7d4ac4,0x0000002c

 

As you can see, the first 3 elements are correct, as well as all but the highest-order byte in the 4th. The behavior is similar if the inputs get bigger. If the inputs are smaller, the answer comes out correct.

 

I'm trying to avoid posting the entire big integer class, since it is massive.  Any insight or suggestions would be helpful.

Hey,

could it be that you overwrite memory? You could try
changing func(A, B, A) to func(A, B, C) and see what happens then.

A note on bignums: you mentioned that you use base 2. I think the typical
choice is base 2^n, where n is the number of bits in an int value.
This simplifies things, as you can add two digits using the ordinary + operator,
etc.
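For instance, adding two base-2^32 "digits" really is just the ordinary + operator, with the carry recovered from the unsigned wrap-around (a sketch with assumed names, not the poster's code):

```cpp
#include <cstdint>

// Add two base-2^32 digits. Unsigned overflow wraps modulo 2^32,
// so a sum smaller than either operand signals a carry out.
std::uint32_t AddDigits(std::uint32_t a, std::uint32_t b, bool& carry)
{
	const std::uint32_t sum = a + b;
	carry = sum < a; // wrap-around occurred => carry into the next digit
	return sum;
}
```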


A note on bignums: you mentioned that you use base 2. I think the typical
choice is base 2^n, where n is the number of bits in an int value.
This simplifies things, as you can add two digits using the ordinary + operator,
etc.

That is how it works. Sorry, I meant base 2 as in I wasn't storing the number in base 10 form (like I've seen some big int libraries do), but as a binary number that uses the entire element. The actual class is template-based, allowing any unsigned integer type to be used as the vector element.

 


could it be that you overwrite memory? You could try

changing func(A, B, A) to func(A, B, C) and see what happens then.

The places where it is overwriting old values are where those values aren't used again (which is why I do Z0 and Z2 first), but I'll give it a shot.


There was a link at the bottom of the Wikipedia page to a C++ example that worked very similarly to mine. After rigging up some debugging code, I found that the issue involved the splitting and the shifting. I added 1 before dividing the size by 2 in order to make the lower half bigger. I assumed this would be the place to split, and trailing zeros in the upper half would "balance" it out. Based on the other code, it looks like you want the higher-order half to be the larger of the two if it doesn't split perfectly. This then caused me to use too large a shift for Z2, since it should have been shifting by twice the "half". I assumed that twice the half would always be the longest length, which isn't true when the length is odd.
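The corrected recombination is easy to sanity-check on a single machine word. This sketch (my own illustration, not the thread's class) splits 32-bit operands into 16-bit halves; note the shifts are one and two times the half size, never the full operand length:

```cpp
#include <cstdint>

// One Karatsuba step with base B = 2^16: x = xh*B + xl, y = yh*B + yl.
// Recombination: x*y = z2*B^2 + z1*B + z0, so the shifts are 32 and 16
// bits, i.e. twice the half size and the half size.
std::uint64_t Karatsuba32(std::uint32_t x, std::uint32_t y)
{
	const std::uint32_t xl = x & 0xFFFFu, xh = x >> 16;
	const std::uint32_t yl = y & 0xFFFFu, yh = y >> 16;
	const std::uint64_t z0 = static_cast<std::uint64_t>(xl) * yl;
	const std::uint64_t z2 = static_cast<std::uint64_t>(xh) * yh;
	// (xh + xl) and (yh + yl) fit in 17 bits, so the product fits in 64 bits
	const std::uint64_t z1 =
		static_cast<std::uint64_t>(xh + xl) * (yh + yl) - z2 - z0;
	return (z2 << 32) + (z1 << 16) + z0;
}
```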


Thanks.  This has been one of those pet projects of mine that has been written and rewritten several times (bigint, not Karatsuba).  I've been playing with implementing different cryptographic functions and needed it to get RSA to work.

 

Based on Wikipedia, the 300-600 bit range should be where Karatsuba beats the long multiplication method. My default multiplication uses the shift-and-add method instead of long multiplication, which tends to be faster than long multiplication as well, so my baseline comparison is a little skewed to begin with.
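For reference, the shift-and-add method I mean looks like this on native words (a sketch with assumed names, not my bigint class, where the shifted copies are whole limb vectors):

```cpp
#include <cstdint>

// Shift-and-add multiplication: for each set bit of b, add the
// correspondingly shifted copy of a into the running product.
std::uint64_t ShiftAndAddMultiply(std::uint32_t a, std::uint32_t b)
{
	std::uint64_t product = 0;
	std::uint64_t shifted = a;
	while (b != 0)
	{
		if (b & 1u)
		{
			product += shifted;
		}
		shifted <<= 1; // next bit of b weighs twice as much
		b >>= 1;
	}
	return product;
}
```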

 

I've been comparing runtimes between my Karatsuba and my shift-and-add based multiplication, and currently Karatsuba is way slower, using QueryPerformanceCounter as the metric. Doing a test of 50 multiplies for each algorithm, even at the 25000-bit range (I was really looking for the point where Karatsuba would overtake the other), it was still twice as slow. The shift-and-add version only needs two temporary vectors. My Karatsuba implementation currently has to allocate 4 temporary vectors for each level of recursion, so I think the memory allocations are killing it. The two vectors in shift-and-add grow, so there are multiple allocations there too, but nowhere near as bad. Plus the needed space can be pre-allocated. I'm going to try allocating a large pool before starting the recursion and passing iterators through to see if that will help speed it up.



I should note though that if you are doing this specifically for RSA, the main bottleneck in RSA is not multiplication, but modular multiplication/exponentiation. There are faster ways to do this, such as Montgomery reduction, and furthermore decryption can be made faster by using number-theoretic algorithms (but those are probably beyond the scope of what you're trying to achieve). Anyway I thought I'd mention it, many bigint libraries do provide very optimized algorithms for "a^b mod m" type calculations, but a Karatsuba multiplication algorithm is always useful for the general case.

 

RSA was just one of the motivations I had for implementing a big int library. I've just recently been researching faster multiplication methods on big integers. As far as modular exponentiation goes, I'm currently using the right-to-left binary method described in the Wikipedia entry, adapted to my big int class.
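The right-to-left binary method reads like this on native 64-bit words (a sketch assuming the modulus fits in 32 bits so the products can't overflow; the bigint version follows the same loop with bigint operations):

```cpp
#include <cstdint>

// Right-to-left binary modular exponentiation. Each exponent bit costs
// one squaring; set bits also multiply into the result.
// Assumes mod < 2^32 so the 64-bit products cannot overflow.
std::uint64_t ModPow(std::uint64_t base, std::uint64_t exponent, std::uint64_t mod)
{
	std::uint64_t result = 1 % mod;
	base %= mod;
	while (exponent > 0)
	{
		if (exponent & 1u)
		{
			result = (result * base) % mod;
		}
		base = (base * base) % mod;
		exponent >>= 1;
	}
	return result;
}
```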

 

This has been a real iterative process. I've been trying to just get things working first, then work on making them faster. My goal has been to keep it C++11 compliant, not dependent on platform optimizations. I was just ecstatic when my Karatsuba implementation worked, so I felt like comparing my two implementations to see how they stacked up at the moment. I was hoping to build a crossover point into the multiplication function, but until I further optimize the Karatsuba method, it can't surpass the shift-and-add version.


Always cool to hear of other people messing with this stuff. It sounds like we have fairly similar goals.

The three main bigint classes I have are:

One for fixed-size (not dynamically allocated) integers. The goal of this one is to behave as closely to the real thing as possible. In fact, to that end, it did end up getting used in a commercial compiler.

One for variable-sized (dynamically allocated) integers.

One for string-based integers.

My Karatsuba step is in the second one. See uvarbigint.cpp, in uvarbigint::Multiply_Unsigned. It doesn't do brilliantly in terms of reducing allocations itself, but hopefully it shows something of value. I'd be interested to hear if you'd like to work together on combining our efforts.


Going to have to really stare at your code a while, since it is structured very differently from mine. For example, yours uses raw pointers and a lot of for loops.

 

I had originally started writing both an unsigned and a signed version of the code, but once subtraction came into play with the unsigned big int, I just dropped it. I didn't think it was really feasible once I tried to define 0 - 1. It is well defined in C++, since the size of the native type doesn't grow and modulo wrap-around comes into play. I couldn't wrap my head around what the result would be, since there was no maximum value to use as a modulus.

 

Just as an example, here is my signed addition code (minus a couple of functions). I included the shared functions that are called to perform the two's complement on the fly.

[source lang="cpp"]
// T is an unsigned integer and a template parameter for the class.
typedef std::vector<T> storage_t;

/*
 * Helper function for returning an element's two's complement value.
 * Requires a persistent boolean value to track the carry/overflow
 * from element to element.
 */
inline static const T GetSignedValue(bool& Carry, bool Sign, T Value)
{
	/* Sign test */
	if(Sign == false)
	{
		/* Perform two's complement to get the negative value */
		const T OnesComplement(~Value);

		if(Carry == true)
		{
			/* Handle two's complement +1/addition carry */
			const T TwosComplement(OnesComplement + 1);

			/*
			 * If the two's complement value is zero, the Carry will always
			 * be false throughout the rest of the elements.
			 */
			Carry = TwosComplement == 0;
			return TwosComplement;
		}
		return OnesComplement;
	}
	return Value;
}

/*
 * Perform an operation on a set of input values resulting in the output.
 * Handles conversion of inputs and result to and from two's complement.
 */
template <typename BinaryOperation>
inline static const T SignedBinaryOperation(bool& LhsCarry, bool LhsSign, T LhsValue, bool& RhsCarry, bool RhsSign, T RhsValue, bool& ResultCarry, bool ResultSign, BinaryOperation binary_op)
{
	/* Perform two's complement on inputs, if needed */
	const T SignedLhsValue(GetSignedValue(LhsCarry, LhsSign, LhsValue));
	const T SignedRhsValue(GetSignedValue(RhsCarry, RhsSign, RhsValue));

	/* Perform operation */
	const T SignedResultValue(binary_op(SignedLhsValue, SignedRhsValue));

	/* Perform two's complement on result, if needed */
	const T ResultValue(GetSignedValue(ResultCarry, ResultSign, SignedResultValue));
	return ResultValue;
}

inline static const T ElementAddition(T LhsValue, T RhsValue, bool& AdditionCarry)
{
	const T ResultValue(LhsValue + RhsValue);
	if(AdditionCarry == true)
	{
		/* Add the carry */
		const T ResultWithCarry(ResultValue + 1);
		/*
		 * Test for overflow.
		 * Because of the carry, we have to test for wrap-around and equality.
		 * Worst case:
		 *  FF + FF = FE
		 *  FF + FF + 1 = FF
		 */
		AdditionCarry = ResultWithCarry <= LhsValue;
		return ResultWithCarry;
	}
	else
	{
		/*
		 * Test for overflow
		 * FF + FF = FE
		 */
		AdditionCarry = ResultValue < LhsValue;
		return ResultValue;
	}
}

/* Addition - Signed */
inline static void Addition(bool LhsSign, const storage_t& Lhs, bool RhsSign, const storage_t& Rhs, bool& ResultSign, storage_t& Result)
{
	/*
	 * Determine Sign.
	 * This must be done first, since Lhs or Rhs could change if &Lhs or &Rhs == &Result.
	 */
	if(LhsSign == RhsSign)
	{
		/* Signs Match */
		ResultSign = LhsSign;

		/*
		 * Since the signs match, we can use AbsoluteAddition
		 * |Lhs| + |Rhs| = |-Lhs| + |-Rhs|
		 */
		AbsoluteAddition(Lhs, Rhs, Result);
		return;
	}

	/* Compare absolute values of Lhs and Rhs to determine which is bigger */
	const int AbsoluteCompareResults(AbsoluteCompare(Lhs, Rhs));
	switch(AbsoluteCompareResults)
	{
	case -1:
		{
			/* |Lhs| < |Rhs| */
			ResultSign = RhsSign;
			break;
		}

	case 0:
		{
			/*
			 * |Lhs| = |Rhs|
			 * Lhs and Rhs signs do not match, so the values will cancel out
			 * -Lhs + Rhs = Lhs + -Rhs = 0
			 */
			ResultSign = true;
			Result.clear();
			return;
		}

	case 1:
		{
			/* |Lhs| > |Rhs| */
			ResultSign = LhsSign;
			break;
		}

	default:
		{
			throw std::exception();
		}
	}

	/*
	 * Get the original sizes of the inputs.
	 * The resizing of Result could change the input sizes if the smaller of
	 * the inputs is also the output.
	 */
	const auto LhsSize(Lhs.size());
	const auto RhsSize(Rhs.size());

	/* Mark which of the inputs is greater */
	const bool LhsSizeGreaterThanRhsSize(LhsSize > RhsSize);

	/* Resize the output to match the largest input */
	Result.resize(LhsSizeGreaterThanRhsSize == true ? LhsSize : RhsSize);

	/* Create aliases to the shortest of the inputs */
	const auto& Shortest(LhsSizeGreaterThanRhsSize == true ? Rhs : Lhs);
	const auto& ShortestSign(LhsSizeGreaterThanRhsSize == true ? RhsSign : LhsSign);

	/*
	 * Because of the possibility that Shortest == Result, it is necessary to
	 * keep track of the original size, so the original end element can be located.
	 */
	const auto& ShortestSize(LhsSizeGreaterThanRhsSize == true ? RhsSize : LhsSize);

	/* Create aliases to the largest of the inputs */
	const auto& Longest(LhsSizeGreaterThanRhsSize == true ? Lhs : Rhs);
	const auto& LongestSign(LhsSizeGreaterThanRhsSize == true ? LhsSign : RhsSign);

	/*
	 * Assists with two's complement calculations.
	 * Adds the initial +1 as part of the two's complement conversion.
	 */
	bool ShortestCarry(true);
	bool LongestCarry(true);
	bool ResultCarry(true);
	/* Tracks the addition carry between elements */
	bool AdditionCarry(false);

	/* Perform the addition operation on the inputs, until the end of the shortest range is encountered */
	auto ResultIterator(std::transform(std::begin(Shortest), (std::begin(Shortest) + ShortestSize), std::begin(Longest), std::begin(Result), [&ShortestCarry, &LongestCarry, &ResultCarry, &AdditionCarry, ShortestSign, LongestSign, ResultSign](T ShortestValue, T LongestValue) -> T
	{
		const T ResultValue(SignedBinaryOperation(ShortestCarry, ShortestSign, ShortestValue, LongestCarry, LongestSign, LongestValue, ResultCarry, ResultSign, [&AdditionCarry](T LhsValue, T RhsValue) -> T
		{
			const T ResultValue(ElementAddition(LhsValue, RhsValue, AdditionCarry));
			return ResultValue;
		}));
		return ResultValue;
	}));

	/* Perform the addition operation on the longest input, treating the shortest as 0 or its negative equivalent */
	ResultIterator = std::transform(std::begin(Longest) + ShortestSize, std::end(Longest), ResultIterator, [&ShortestCarry, &LongestCarry, &ResultCarry, &AdditionCarry, ShortestSign, LongestSign, ResultSign](T LongestValue) -> T
	{
		const T ResultValue(SignedBinaryOperation(ShortestCarry, ShortestSign, 0, LongestCarry, LongestSign, LongestValue, ResultCarry, ResultSign, [&AdditionCarry](T LhsValue, T RhsValue) -> T
		{
			const T ResultValue(ElementAddition(LhsValue, RhsValue, AdditionCarry));
			return ResultValue;
		}));
		return ResultValue;
	});
	assert(ResultIterator == std::end(Result));

	/* Handle the remaining carry, if one exists */
	if(true == AdditionCarry)
	{
		const T ResultValue(SignedBinaryOperation(ShortestCarry, ShortestSign, 0, LongestCarry, LongestSign, 0, ResultCarry, ResultSign, [&AdditionCarry](T LhsValue, T RhsValue) -> const T
		{
			const T ResultValue(ElementAddition(LhsValue, RhsValue, AdditionCarry));
			return ResultValue;
		}));

		/* Append the result, if one exists */
		if(ResultValue != 0)
		{
			Result.push_back(ResultValue);
		}
	}

	/* Remove any trailing zeros */
	Truncate(Result);
}
[/source]


Yeah I really should change mine to just use a vector internally. I had been trying to be clever and save space.

Modulo wrap around is very easy for fixed-sized values. It just means you only keep the n lowest bits of the result.
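As a one-line illustration of that (hypothetical snippet): for an n-bit unsigned type, 0 - 1 is well defined and keeps the n lowest bits, i.e. yields 2^n - 1.

```cpp
#include <cstdint>

// Fixed-width unsigned subtraction wraps modulo 2^32:
// 0 - 1 keeps the 32 lowest bits of -1, i.e. 0xFFFFFFFF.
std::uint32_t WrappingDecrement(std::uint32_t v)
{
	return v - 1u;
}
```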

But certainly for a variable-sized bigint, unsigned doesn't really make sense.

 

Your code seems to be very high level and well commented. Mine is designed more for high performance, with less need to understand how it works, hence less commenting.

In fact your code reminds me of this: http://ericlippert.com/2013/09/16/math-from-scratch-part-one/


Mine is designed more for high performance, with less need to understand how it works, hence less commenting.

I think for the most part, mine should still have good performance. At this point, I put a lot of trust in the compiler to optimize things like the intermediate variables, lambdas, and the algorithm-based loops. They do make debugging a little easier. Plus, there have been several iterations of this code; I figure the more I document it, the more likely I'll remember what the heck I was thinking at the time, and be less inclined to start over :).
