Assembly vs. C++ vs. C# finding primes

Started by
41 comments, last by raccoonone 17 years, 11 months ago
I was interested in seeing what the performance different between C++, C# and ASM with a C++ wrapper would be. So far the results have surprised me since C# is the fastest, this must be because my C++/ASM isn't good enough so maybe you guys can point out where I'm going wrong. The program I'm testing with is quite simple all it does is: a) get a number from the user b) calculate all the primes between 1 and the number c) output the number of primes found and the time taken to find them. All 3 programs use the same algorithm: if x is 1, 2, or 3 return true if x devided by 2 has a remainder of zero return false otherwise devide by all odd numbers starting with 3 until the square root of x is reached or the remainder is zero (indicating a composite number) I compiled all of the programs with full optimizations and ran them on my computer (AthlonXP 2800+ with 1GB of RAM). The C++ version and my ASM version both took 19.5-20secs to find all the primes between 1 and 10,000,000. The C# version took 20-20.5secs when I was using ints, but after I changed it to use doubles to reduce the ammount of type conversions, it does it in 16-17secs! When I looked at the disassembly for the C# version I found that it's using almost all floating point processer instructions, but then I calls some functions that I couldn't step into (well I'm sure I could if I spent some time figuring out how). Here's a short part of the disassembly. I haven't used floating point instructions much and I don't really understand what it's doing in the beginning there. It looks to me like it's just moving data through the ALU by pushing and then poping it somewhere else.

                if (TestNumber % x == 0)
0000008c  fld         qword ptr [ebp+8] 
0000008f  sub         esp,8 
00000092  fstp        qword ptr [esp] 
00000095  fld         qword ptr [esp+10h] 
00000099  sub         esp,8 
0000009c  fstp        qword ptr [esp] 
0000009f  call        79235954 
000000a4  fldz             
000000a6  fcomip      st,st(1) 
000000a8  fstp        st(0) 
000000aa  jp          000000B9 
000000ac  jne         000000B9 

Here's the source of my C# version

using System;
using System.Collections.Generic;
using System.Text;

namespace Csharp_Test_console_application
{
    class Program
    {
        static void Main(string[] args)
        {
            double x;
            int PrimesFound = 0;
            DateTime start, end;
            Console.WriteLine("Find primes up to 'x'. Please enter x:");
            x = Convert.ToDouble(Console.ReadLine());
            start = System.DateTime.Now;
            for (double y = 1; y <= x; y++)
            {
                if (IsPrime(y))
                {
                    PrimesFound++;
                }
            }
            end = System.DateTime.Now;
            Console.WriteLine("Found {0} primes in {1}", PrimesFound, end - start);
            Console.Read();
        }

        static bool IsPrime(double TestNumber)
        {
            double limit = System.Math.Sqrt(TestNumber) + 1;
            if (TestNumber == 2)
                return true;
            if(TestNumber % 2 == 0) //Check if TestNumber is even
                return false;
            double x = 3;
            //Devide until the square root is reached
            for (; x < limit; x += 2)
            {
                if (TestNumber % x == 0)
                    return false;
            }
            return true;
        }
    }
}

Source of my C++ version

// Cpp primes.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"
using std::cin;
using std::cout;
bool IsPrime(int TestNumber);

int _tmain(int argc, _TCHAR* argv[])
{
	int x, PrimesFound = 0;
    DWORD start, end;
    cout << "Find primes up to 'x'. Please enter x:";
    cin >> x;
    start = GetTickCount();
    for (int y = 1; y <= x; y++)
    {
        if (IsPrime(y))
        {
            PrimesFound++;
        }
    }
    end = GetTickCount();
    cout << "Found " << PrimesFound << " primes in " << end - start;
    while(!_kbhit());
	return 0;
}

_inline bool IsPrime(int TestNumber)
{
    int limit = (int)sqrt((double)TestNumber) + 1;
    if (TestNumber == 2)
        return true;
    if(TestNumber % 2 == 0) //Check if TestNumber is even
        return false;
    int x = 3;
    //Devide until the square root is reached
    for (; x < limit; x += 2)
    {
        if (TestNumber % x == 0)
            return false;
    }
    return true;
}

Assembly with C++ for the input and output

// asm primes.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"

using std::cout;
using std::cin;

int _tmain(int argc, _TCHAR* argv[])
{
	int x, PrimesFound = 0;
    DWORD start, end;
    cout << "Find primes up to 'x'. Please enter x:";
    cin >> x;
    start = GetTickCount();
	int limit, temp;
	_asm 
	{
		pusha					//Save registers
		mov	esi, 1				//Start checking numbers from 1
		mov	edi, x				//Loop until X is reached
Forloop: cmp	esi, edi
		ja		Done			//Exit if X has been reached
	
		//Check if this number is one, two, or three
		cmp		esi, 3
		ja		Next
		inc		PrimesFound
		jmp		CheckDone
		//Check if divisible by two
Next:	test	esi, 1						//If ESI is odd it's not divisible by 2
		jnz		Next2						// so continuing checking for primality
		jmp		CheckDone

Next2:	mov		temp, esi
		fild	temp						//Convert
		fsqrt								// to square
		fistp	limit						//  root

		mov		ebx, 3						//Set EBX to 3

		mov		ecx, limit					//Set up the loop counter
		shr		ecx, 1						//  devide by 2 since we'll be incrementing by 2

Check:	mov		eax, esi					//Restore test number
		xor		edx, edx					// and clear remainder

		div		ebx							//Divide and
		or		edx, edx					// test for
		jz		CheckDone					//  zero

		inc		ebx							//increment divisor
		inc		ebx							// ""
		loop	Check						//Continue until the square root is reached

		//Number is prime
		inc		PrimesFound
CheckDone: inc	esi							//Increment ESI so the next number will be checked
		jmp	Forloop							// for primality

Done: popa									//Restore registers
	}
    end = GetTickCount();
    cout << "Found " << PrimesFound << " primes in " << end - start;
    while(!_kbhit());
	return 0;
}

stdafx.h for C++ versions

// stdafx.h : include file for standard system include files,
// or project specific include files that are used frequently, but
// are changed infrequently
//

#pragma once


#define WIN32_LEAN_AND_MEAN		// Exclude rarely-used stuff from Windows headers
#include <stdio.h>
#include <tchar.h>
#include <math.h>
#include <windows.h>
#include <conio.h>
#include <iostream>



// TODO: reference additional headers your program requires here


Advertisement
None of your methods are particularily optimized at all. From number theory, you can show that you do not need to test against every number below the sqrt(n), but just the primes below the sqrt(n) to see if n is prime. As such, I would recommend changing your algorithm to remember each prime it finds.
Quote:All 3 programs use the same algorithm:
if x is 1, 2, or 3 return true

1 is not prime.
A simple sieve i wrote a bit back for a similar post.
#include <iostream>#include <deque>#include <cmath>void FindPrimes(std::deque<std::size_t>& primes, std::size_t upperBound) {	std::size_t current = 15;	while(current < upperBound) {		current += 2;		bool isPrime = true;		std::size_t mod6 = current % 6;		//Every prime number p > 3 has the form 6q + 1 or 6q + 5.		if(!(mod6 == 1 || mod6 == 5)) {			continue;		}		std::size_t maxVal = static_cast<std::size_t>(std::ceil(std::sqrt(static_cast<double>(current))));		for(int i = 0; primes <= maxVal; ++i) {			if(current % primes == 0) {				isPrime = false;				break;			}		}		if(isPrime) {			primes.push_back(current);		}	}}int main() {	std::size_t knownPrimes[6] = {2, 3, 5, 7, 11, 13}; //seed the list.	std::deque<std::size_t> primes(&knownPrimes[0], &knownPrimes[6]);	std::size_t upperBound;	std::cout<<"Please enter the upper bound: ";	std::cin>> upperBound;	FindPrimes(primes, upperBound);	std::copy(primes.begin(), primes.end(), std::ostream_iterator<std::size_t>(std::cout, "\n"));}

In time the project grows, the ignorance of its devs it shows, with many a convoluted function, it plunges into deep compunction, the price of failure is high, Washu's mirth is nigh.

Ya, I realize that only the primes below sqrt(n) need to be checked. But to make the assembly version simpler I decided not to keep track of the primes that have been found. I'm more interested in a comparison of the languages than making it as fast as possible. Since the C# version uses the same algorithm I don't understand how it can be faster.

Oh ya, I forgot 1 isn't prime.
Quote:Original post by Procyon Lotor
Ya, I realize that only the primes below sqrt(n) need to be checked. But to make the assembly version simpler I decided not to keep track of the primes that have been found. I'm more interested in a comparison of the languages than making it as fast as possible. Since the C# version uses the same algorithm I don't understand how it can be faster.

Oh ya, I forgot 1 isn't prime.


The c# code will be compiled and optimised on the fly. The actual code that will be executed will likely not look like yours.

Langugage comparisons like this ( i.e. a very narrow comparision ) are meaningless, they don't accurately represent how the languages will compete in a complex system.
0) 1 is not normally counted as a prime. I will not count it below.

1) Speed improvements are more likely to be found by improving the algorithm. In your case you can get rid of all the FP operations by doing it the other way around: gradually increment a limit value. You could also try making it only use primes that have been found so far. I will implement that below, but no guarantees that it will actually help (while you test against fewer primes, you add the overhead of tracking which primes to test against).

I will just show a function to test for primes up to a limit; I/O and timing should be kept separate and naturally aren't intended to be part of what is benchmarked.

#include <vector>using std::vector;int primesUpTo(int max) {  if (max < 2) return 0;  // Find primes < max (not <=)  vector<int> primes;  int num_primes_in_use = 1;  primes.push_back(2);  int limit = 4;  for (int testvalue = 3; testvalue < max; testvalue += 2) {    if (testvalue > limit) {      // Ran out of numbers that can be tested with the current set; add another      int last_prime_in_use = primes[num_primes_in_use++];      limit = last_prime_in_use * last_prime_in_use;    }    bool isPrime = true;    for (int testprime = 0; testprime < num_primes_in_use; ++testprime) {      if (!(testvalue % primes[testprime])) { isPrime = false; break; }    }    if (isPrime) { primes.push_back(testvalue); }  }  return primes.length();}
Of course, if an exact answer isn't needed...
#include <iostream>#include <cmath>int main() {    std::size_t upper_bound;    std::cout<<"Enter the upper bound: ";    std::cin>>upperBound;    std::cout<<"There are approximately "<<upperBound / std::log(static_cast<double>(upperBound))<<" primes below "<<upperBound<<std::endl;}

In time the project grows, the ignorance of its devs it shows, with many a convoluted function, it plunges into deep compunction, the price of failure is high, Washu's mirth is nigh.

Sorry I should have been more clear. I wasn't looking for help on a better algorithm. I was wondering why the C# version is faster than the other two, and if there was some specific C++/ASM optimizations that I missed (maybe floating point instructions are faster than the integer instructions?)
Quote:Original post by Procyon Lotor
Ya, I realize that only the primes below sqrt(n) need to be checked. But to make the assembly version simpler I decided not to keep track of the primes that have been found. I'm more interested in a comparison of the languages than making it as fast as possible. Since the C# version uses the same algorithm I don't understand how it can be faster.


Please, don't waste your time. You can't compare the "speed" of languages.
They all compile to ASM, so it depends entirely on how much the compiler can optimize the code *in your specific case*.
For any other code, you might get different results. Or when a new version of the compiler is released. It just doesn't make sense to test.

But I can easily tell you why your ASM version is slow. Because you're not a compiler. This goes back to my first point. The compiler can optimize. It's pretty hard for a human to keep track of the complexities of a modern CPU. A compiler can do it easily. That is why it pays to use high level languages. Low level is not "fast". it just means it's easier to screw up and slow things down. And that is why it doesn't make sense to talk about the "speed" of a language.

So basically, use whatever language you're most comfortable with, and use the best algorithm you can find. And don't bother searching for the "fastest" language. Please. [wink]

Oh yeah, and if you really want to compare, you should at least use the same algorithm in all three, shouldn't you? So either all versions should use floats/doubles, or none of them should.
Try printing out the primes to a file and see if the C# output is equivalent to the C++ and assembly output. It seems that in this case you may be safe, but using the modulus operator with doubles is a recipe for disaster, as is testing whether doubles are exact values (e.g. == 2).
h20, member of WFG 0 A.D.
Quote:Original post by Procyon Lotor
I was interested in seeing what the performance different between C++, C# and ASM with a C++ wrapper would be. So far the results have surprised me since C# is the fastest, this must be because my C++/ASM isn't good enough so maybe you guys can point out where I'm going wrong.
...
When I looked at the disassembly for the C# version I found that it's using almost all floating point processer instructions, but then I calls some functions that I couldn't step into (well I'm sure I could if I spent some time figuring out how).


Maybe C# generates SSE instruction for floating point computations ? Could explain why its faster than your asm version.

BTW if you are comparing the speed of compiled programs, you should also mention the flags you used for compilation and the compiler you used (i.e. gcc has an option -mfpmath=sse besides -O3).

This topic is closed to new replies.

Advertisement