My implementation of Project Euler 10 takes almost 25 minutes to complete. It’s a very naive attempt, and there is a great deal of optimization that needs to happen – particularly in the function I use to determine if a number is prime.
I decided to ignore an elegant optimization and see if I could decrease the execution time by solving the problem in several concurrent threads.
Project Euler 10 asks you to find the sum of all prime numbers that are under 2,000,000. It’s easy to see how to parallelize this – calculate (and add together) the primes from 0 to 1,000,000 in on thread, and the primes from 1,000,001 to 2,000,000 in another thread. Add the results from the two threads together and you have the solution. I decided to do it in four threads:
Thread 1: add primes from 0 to 500,000
Thread 2: add primes from 500,001 to 1,000,000
Thread 3: add primes from 1,000,001 to 1,500,000
Thread 4: add primes from 1,500,001 to 2,000,000
I’ve never written any threaded code before, so I hit up Google and found this page. I’m surprised (but quite relieved!) that it’s actually very easy to create and synchronize threads.
Each thread is summing up primes under it’s respective range, and when the thread is finished it adds the calculated total to a variable that all threads can see. I use a mutex to prevent the threads from potentially trying to write to the variable all at the same time, but in my application this really isn’t a risk since each thread takes a considerably different time to complete.
I don’t feel like fighting the code formatting in WordPress, so here’s a Gist of my solution: https://gist.github.com/anonymous/5130987
It’s pretty straightforward and follows much of the conventions in the tutorial I found, so I won’t do a line-by-line breakdown here (unless someone asks).
The end result? Execution time is just over 8 minutes (which is how long it takes for the 4th thread to finish – the other three finish well before then).
Two quick notes. The GCC command I needed to use was:
$gcc -pthread pthreadPE10.c -o pthreadPE10
In the tutorial (and elsewhere online) you see “-lpthread”, but that didn’t work for me – “-pthread” did. This SO question addresses this (see the response by “Employed Russian”).
Also, I wrote this program on my web server, which is a Linux server instance hosted on Digital Ocean. I wasn’t seeing any improvement in execution and after some pondering I realized that the package I’m using is a single-core server, so multi-threading doesn’t have any speed benefit there. Funny though – I’m glad I didn’t get frustrated and decide I didn’t know what I was doing! (EDIT: I should note that the 25 minutes to 8 minutes benchmark was achieved on my 8-core i7 laptop.)
dude, do you realize that if a number is not divisible by two, you don’t need to test if is divisible by 4,6,8,10….? that’ll cut your time in half
Yes, checking if it’s a multiple of two is a good first step towards a non-naive implementation of a prime test. The next is only testing divisors up to the square root of the number you’re checking.