Sunday, July 20, 2008

Rutgers Graduate Student Finds New Prime-Generating Formula

Studying prime numbers is like playing the guitar. No, really, let me explain.

The guitar is a simple instrument: six strings, some frets, a sound hole. You strum with the right hand, and form chords with the left. What could be simpler? Any reasonably coordinated person can learn to play a simple song, such as "Heart of Gold", passably in a few hours.

In the same way, the prime numbers have a simple definition: the integers greater than 1 that are divisible only by themselves and 1. Any reasonably intelligent person can learn to test a small number for primality, or understand Euclid's proof that there are infinitely many prime numbers, in a short amount of time.

Yet the guitar is also fiendishly difficult. Those studying classical guitar know well how some pieces take hundreds of hours to master. Techniques such as tremolo might take years, especially if you start learning as an adult.

In the same way, the prime numbers contain within them enough subtlety that many problems remain unsolved after hundreds of years. Goldbach conjectured in 1742 that every even number greater than 2 is the sum of two primes, and today this conjecture is still unsolved. (It is known to hold for every even number less than 10^18.) And a proof of the Riemann hypothesis, which would have extremely important consequences for the distribution of primes, will net you a million dollars from the Clay Mathematics Institute -- probably more than you'll get from appearing on American Idol.

For a long time mathematicians have sought a simple formula that would generate all the prime numbers, or even infinitely many distinct prime numbers. Some have even gone so far as to claim that no such formula exists -- a statement of very questionable veracity that depends entirely on one's definition of "formula". If you define formula to mean "polynomial with integer coefficients", then it's not hard (and I leave it as a challenge to the reader) to prove that no such polynomial can generate only primes, other than the trivial example of a constant polynomial. Euler's polynomial x^2 + x + 41 comes close: it generates primes for x = 0, 1, 2, ..., 39, but fails at x = 40.
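
Here is a quick way to check that failure, as a small Python sketch (the trial-division is_prime helper is mine, and the range of x is only for illustration):

# Euler's polynomial x^2 + x + 41: prime for x = 0..39, composite at x = 40.
def is_prime(m):
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

for x in range(41):
    value = x * x + x + 41
    if not is_prime(value):
        print(x, value)   # prints 40 1681, and 1681 = 41 * 41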

A slight variation, though, leads to a genuine prime-generating polynomial. It is a consequence of the Davis-Matiyasevich-Putnam-Robinson work on Hilbert's 10th problem that there exists a multivariate polynomial with integer coefficients that takes on only negative and prime values when nonnegative integers are substituted for the variables, and every prime is generated by some choice of the variables. In 1976, Jones, Sato, Wada, and Wiens actually wrote down such a polynomial. It has 26 variables.

Another prime-generating formula comes from a 1947 paper of W. H. Mills. Mills proved that there exists a real number A such that [ A^(3^n) ] is a prime number for all integers n ≥ 1. Here [ x ] is the greatest integer function, the greatest integer ≤ x. Unfortunately, nobody knows a good way to calculate A other than testing the numbers the formula is supposed to generate for primality, and then constructing A by working backwards.
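
To make the "working backwards" idea concrete, here is a small Python sketch. It is not Mills' argument, just one standard way such an A can be built, by repeatedly taking the least prime above the cube of the previous term (the is_prime helper is mine and only suitable for small numbers):

def is_prime(m):
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

# Start with 2 and repeatedly take the least prime exceeding the cube.
terms = [2]
for _ in range(3):
    q = terms[-1] ** 3 + 1
    while not is_prime(q):
        q += 1
    terms.append(q)

print(terms)                     # [2, 11, 1361, 2521008887]
print(terms[-1] ** (1.0 / 81))   # about 1.30637..., an approximation of A, since terms[-1] is close to A^(3^4)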

So many people have worked on the prime numbers that it seems unlikely that there could be a simple prime-generating function that has been overlooked until now.

Rutgers graduate student Eric Rowland has defied the odds, however, and has found a new one. In a paper just published in a journal I edit, the Journal of Integer Sequences, Rowland defines his formula and proves it generates only 1's and primes. (1 is generally not accepted as a prime number, for a variety of reasons. For one thing, if 1 were a prime, then positive integers would not have a unique factorization into primes.) To be precise, I should say that the unusual property of the formula was originally conjectured by a team led by Matt Frank at a mathematics summer school in 2003 that Rowland was attending, but it was not proved until now.

Here is Rowland's formula. We define a(1) = 7, and for n ≥ 2 we set

a(n) = a(n-1) + gcd(n,a(n-1)).

Here "gcd" means the greatest common divisor. So, for example, we find a(2) = a(1) + gcd(2,7) = 8. The prime generator is then a(n) - a(n-1), the so-called first differences of the original sequence.

For example, here are the first 23 values of the a sequence:
7, 8, 9, 10, 15, 18, 19, 20, 21, 22, 33, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 69

and here are the first differences of these values:
1, 1, 1, 5, 3, 1, 1, 1, 1, 11, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 23

If we ignore the 1's, then, the Rowland formula starts by generating the primes 5, 3, 11, 3 (again), and 23. The reader can easily program up the formula and find lots more primes. Removing duplicates, the first few are
5, 3, 11, 23, 47, 101, 7, 13, 233, 467, 941, 1889, 3779, 7559, 15131, 53, 30323, ...
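
If you would like to play with it, here is one possible Python version (just a sketch: the number of terms is arbitrary, and I simply collect each prime the first time it appears):

from math import gcd

# a(1) = 7; a(n) = a(n-1) + gcd(n, a(n-1)); the differences are 1's and primes.
a = 7
primes_in_order = []
for n in range(2, 10000):
    next_a = a + gcd(n, a)
    d = next_a - a
    if d != 1 and d not in primes_in_order:
        primes_in_order.append(d)
    a = next_a

print(primes_in_order)   # starts 5, 3, 11, 23, 47, 101, 7, 13, 233, ...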

Why does it work? The proof is too involved to give here, but it is not that difficult. The interested reader can go to Rowland's paper for the details.

Rowland has been involved with mathematics for some time. He attended UC Santa Cruz and graduated with highest honors in math in 2003. Since then he has been a graduate student at Rutgers University, studying with Doron Zeilberger. Rowland describes himself as an "experimental mathematician", and uses the computer algebra system Mathematica for his experiments. Rowland tells me that he tried to prove the formula from time to time over a four-year period, but once the crucial insight was found, "I had an outline of the proof within a few days and all the details within a few weeks."

Are there any other formulas like Rowland's? Apparently yes. Benoit Cloitre, a French mathematician, recently proved that if you set b(1) = 1 and

b(n) = b(n-1) + lcm(n,b(n-1)) for n ≥ 2,

then b(n)/b(n-1)-1 is either 1 or prime for all n ≥ 2.
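
Here is a quick empirical check of Cloitre's variant, again only a sketch (it tests a few hundred terms with a naive primality test, nothing more):

from math import gcd

def is_prime(m):
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

# b(1) = 1; b(n) = b(n-1) + lcm(n, b(n-1)).  Since lcm(n, b) is a multiple
# of b, the quotient b(n)/b(n-1) - 1 equals the integer n / gcd(n, b(n-1)).
b = 1
for n in range(2, 300):
    next_b = b + (n * b) // gcd(n, b)
    q = next_b // b - 1
    if q != 1 and not is_prime(q):
        print("unexpected composite value", q, "at n =", n)
    b = next_b
print("checked n up to 299")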

Will Rowland's formula lead to more efficient ways to generate large primes? If so, cryptographers would love it. But it seems unlikely. As Rowland explains in his paper, his formula only produces the prime p after first generating (p-3)/2 1's, so it takes a really long time to generate a large prime. He has a method for skipping over those useless 1's, but doing so essentially requires an independent test for primality.
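
To get a feeling for how slowly the big primes arrive, one can count the run of 1's that piles up before each non-1 difference (a sketch; the cutoff is arbitrary, and for the larger primes the run length works out to roughly (p-3)/2, as noted above):

from math import gcd

a, run_of_ones = 7, 0
for n in range(2, 200000):
    next_a = a + gcd(n, a)
    d = next_a - a
    if d == 1:
        run_of_ones += 1
    else:
        print("prime", d, "arrived after", run_of_ones, "consecutive 1's")
        run_of_ones = 0
    a = next_a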

Are there still unsolved properties of Rowland's prime generator? Yes. For example, is there anything special about the choice a(1) = 7? Some other choices, such as a(1) = 8, also generate only primes and 1's, but others, such as a(1) = 532, do not. (That choice produces a 9 after fewer than 20 steps.) However, Rowland conjectures that for each starting value of a(1), there exists a point after which the first differences are always either 1 or prime. Rowland also doesn't know if his formula eventually generates all odd primes, although he believes it probably does.
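
Here is a small sketch for experimenting with other starting values; first_composite_difference is my own throwaway helper, and the 100-step cutoff is arbitrary:

from math import gcd

def is_prime(m):
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

def first_composite_difference(start, steps=100):
    # Run a(1) = start and return (n, difference) for the first difference
    # that is neither 1 nor prime, or None if no such difference shows up.
    a = start
    for n in range(2, steps):
        next_a = a + gcd(n, a)
        d = next_a - a
        if d != 1 and not is_prime(d):
            return (n, d)
        a = next_a
    return None

print(first_composite_difference(7))     # None
print(first_composite_difference(8))     # None
print(first_composite_difference(532))   # (18, 9): the composite difference 9 appears at n = 18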

Rowland has a number of other projects in the works. He told me, "I'm working on several things, mostly trying to finish up a backlog of papers. But one newer project is putting bounds on the frequency of 1's in the Kolakoski word. Another is something I'm not ready to fully divulge, but it has to do with values of the p-adic logarithm. A longer term project of mine is extending what is known about the arithmetic of Pascal's triangle modulo m and, generally, additive cellular automata."

What problem would Rowland most like to solve? "I'd really like to solve the 3n+1 problem, because I think it would tell us something very interesting about representations of integers. Dividing by 2 in base 2 just means dropping the last 0, and mapping n -> 3n+1 in base 3 just means appending 1. The problem is that we don't know how to get these two bases to talk to each other -- and of course perhaps there isn't a way -- but a solution to the 3n+1 problem might show us how to do this."
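
Rowland's base-2/base-3 observation is easy to see in a couple of lines (a tiny sketch; the to_base helper is mine):

def to_base(n, b):
    digits = ""
    while n > 0:
        digits = str(n % b) + digits
        n //= b
    return digits

n = 13
print(to_base(n, 3), "->", to_base(3 * n + 1, 3))   # 111 -> 1111: n -> 3n+1 appends a 1 in base 3
m = 26
print(to_base(m, 2), "->", to_base(m // 2, 2))      # 11010 -> 1101: halving drops the trailing 0 in base 2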

Solving the 3n+1 problem would indeed be a great achievement. In the meantime, however, he can take pleasure in his prime formula. Blending simplicity and mystery, Eric Rowland's formula is a delightful composition in the music of the primes, one everyone can enjoy.

Update, July 31 2008: Rowland has his own post describing his discovery.

26 comments:

Unknown said...

I wrote a quick Haskell program to generate this function:

import List

a 1 = 7
a n = (a n1) + (gcd n (a n1)) where n1 = n - 1

diffs [] = []
diffs (x:[]) = []
diffs (x:y:xs) = (y-x) : diffs (y:xs)

primes = filter (>1) $ diffs $ map a [1..]

Maybe someone can optimize this....

Anonymous said...

Does it make sense not to be an "experimental mathematician" nowadays?

If so, why?

Anonymous said...

There are properties of interest that depend on something being infinite in a way that can't be approximated with finite objects, or that is too complex to be computed within reasonable time.

If there is a way to test one's hypotheses, it can make sense to 'experiment'. But often something works for 'generic' objects and not for all of them. For example, almost all square matrices are invertible, so if you test this hypothesis by generating square matrices with uniformly random doubles in [-1,1], you will (with very high probability) always get an invertible matrix, so the experiment would not help much. Since we already know the answer, it's obvious how to describe an experiment showing that some square matrices are not invertible, but if your problem is at the frontier of mathematical knowledge it can be less obvious.

For some questions about integer sequences, experimenting is very useful; for the Poincaré conjecture, not so much.

The Professor said...

I've been playing around with the problem in C (for performance) and my program can calculate the differences of the first 10.000.000 numbers in the sequence in about 6.6 seconds on my amd64(2GHz) - best and worst run:

gcc -O3 -Wall -o primes primes.c && time ./primes >> /dev/null

real 0m6.321s
user 0m6.268s
sys 0m0.048s

real 0m6.698s
user 0m6.628s
sys 0m0.056s

Out of curiosity, I modified Shawn's Haskell program (and assuming I didn't break something in the meantime - this is the first time I've seen Haskell), these are its benchmarks - again, best and worst:

ghc -O -o shawn primes.hs && time ./shawn >> /dev/null

real 0m6.169s
user 0m6.096s
sys 0m0.008s

real 0m6.394s
user 0m6.352s
sys 0m0.032s

BUT the Haskell version only generates the first 25 numbers in the sequence, as opposed to the C version's 10.000.000 (it might use less memory, though)

Here is the haskell I used:
primes.hs

and here is the C-version:
primes.c

Anyone up for optimizing a bit more? (I suggest looking into a more efficient gcd, perhaps with caching)

The Professor said...

By the way - I've also noticed that every new prime p that is discovered that is higher than a(1) seems to satisfy:
p = a(p) - a(p-1)

Dunno if that was an obvious result, but if true, it would mean that the function cannot possibly return all primes - at least not for a single value of a(1)

Anonymous said...

Freakcers, the cache and the recursive method seemed superfluous, as one only needs two consecutive numbers in the sequence to create the next one. Also, when doing it that way, one is not limited by memory.

I changed the program not to have the aforementioned things, and to skip all 1's, and also to put the two values to register.

Although it's not so much faster than your program, it doesn't need a lot of memory.

If someone wants to play with it, grab it here.

Ryan Graham said...

freakcers, slinky, http://gist.github.com/1263

Seems like the perfect scenario for testing out the shiny new gist.

Anonymous said...

I made a version which uses the GNU MP multiple precision arithmetic library, just in case someone wants to run the program until "hell freezes over" and see what kind of prime pops out...

You need libgmp.
You can grab the updated program here.

Have fun!

Anonymous said...

Ok, this is not music everyone can enjoy. I'm hopelessly lost. But I'm trying, if that counts for anything!

The Professor said...

Hah, I guess I should have replied again sooner...

The optimisation slinky mentions is actually in my current implementation (though I have to admit, it was introduced by a friend of mine when I mentioned this problem to him)

I actually also considered gmp to allow for really large numbers, but taking into account the amount of time it takes to calculate the sequence just up to 100.000.000 (well below any number that would require gmp) I figured it would just slow down the whole process for any reasonable application...

What really needs improvement is the gcd algorithm - I've been reading up on "accelerated integer gcd", and more specifically "sequential accelerated integer gcd" from the paper "Improvements on the accelerated integer GCD algorithm", as it seems that either that or the parallel version from the same paper is more or less the best implementation around.

I guess if anyone's got a cluster lying around, the parallel version along with gmp might come in handy, but until that, I recommend avoiding gmp, sticking to at most unsigned long long int in C, and just optimizing gcd, which is clearly the bottleneck at this point...

Anonymous said...

Not sure about this... I'm new to this stuff... but would the gcd function be faster as an implementation of Euclid's algorithm?

Anonymous said...

unsigned int gcd(unsigned int u, unsigned int v) {
    unsigned int t;
    while (v != 0) {
        t = v;
        v = u % v;
        u = t;
    }
    return u;   /* when the loop ends, u holds the gcd */
}

Seems to be slightly faster though not significantly.

$ gcc -O3 -Wall -o primes primes.c && time ./primes >> /dev/null

real 0m13.672s
user 0m13.561s
sys 0m0.030s

$ gcc -O3 -Wall -o primes_euclid primes_euclid.c && time ./primes_euclid >> /de
v/null

real 0m12.224s
user 0m12.186s
sys 0m0.062s

Erdos56 said...

Hmmm, parallelizing is the next tempting possibility...

If you read the paper, an interesting point is that Stephen Wolfram was actually leading the summer school and we see the convergence with the notion of "experimental mathematics".

So can anyone implement this using CAs?

The Professor said...

n00b, it depends on your platform, really... in theory, the binary one I'm using should be faster, but on some platforms Euclid's algorithm outperforms it marginally (looks like yours is one such case)

But as I mentioned a couple of posts up, Sequential (or Parallel) Accelerated Integer GCD is (probably) what we should be using - though I don't know of any implementation of it

I've not been able to find even an implementation of the "normal" Accelerated Integer GCD, though if anyone cares to read about it, here's the paper: The accelerated integer GCD algorithm

Ryan Graham said...

According to some info on the GMP site, (http://gmplib.org/devel/), there's a very fast GCD implementation planned for a future release. Should just be a drop-in replacement for the one in the current release, so maybe someone should give it a try. According to that page, it is significantly faster than the current one.

The Professor said...

Good catch Ryan, the algorithm they've implemented is actually *exactly* the "Accelerated Integer GCD Algorithm" I was talking about - I knew Kenneth Weber had contributed some code to gmp, but from the paper it seemed it was only part of the algorithm - iirc, a bmod implementation... (I guess I was wrong)

I've been doing some number-crunching, and along with the realization that the accelerated algorithm doesn't really offer a speed-up before there's a 5-bit difference in the size of n and a(n-1), I think gmp will definitely be needed well before the new algorithm saves us time - I'm not actually too sure it will ever happen - on average this seems to hold: a(n) ~= 2.5*n, any clever arguments as to why this would or would not ever cause a 5 (or more) bit difference in sizes?

Pseudonym said...

Shawn:

rowlandSequence = filter (/= 1) $ zipWith (-) (tail as) as
  where
    as = as' 7 2
    as' an1 n = an1 : as' (an1 + gcd n an1) (n+1)

The two recursive calls are the main source of inefficiency in your code.

Anonymous said...

I agree with Pseudonym, it can be calculated without doing any recursion at all in C. The gcd function isn't a big problem; the Euclidean algorithm is fast enough for me. But the main code would look like:

#include <stdio.h>

/* gcd(): Euclid's algorithm, as given earlier in this thread */

int main(void) {
    unsigned int a, b, n;
    a = 7;
    for (n = 2; n != 10000000; n++) {
        b = a + gcd(n, a);
        printf("%u %u\n", n - 1, b - a);
        a = b;
    }
    return 0;
}

which makes it twice as fast as the old one.

Anonymous said...

Congrats to Eric on the paper!

Here is the formula in Mathematica, which Eric uses:

a[1]=7;

a[n_] := a[n-1] + GCD[n, a[n-1]]

If you want to remember (memoize) the values to speed it up:

a[n_] := a[n] = a[n-1] + GCD[n,a[n-1]]

In[4]:= Array[a,23]
Out[4]= {7,8,9,10,15,18,19,20,21,22,33,36,
37,38,39,40,41,42,43,44,45,46,69}

In[5]:= Differences[%]
Out[5]= {1,1,1,5,3,1,1,1,1,11,3,1,1,1,1,1,
1,1,1,1,1,23}

In[6]:= DeleteCases[%,1]
Out[6]= {5,3,11,3,23}

In[7]:= PrimeQ/@%
Out[7]= {True,True,True,True,True}

Here are the primes generated:

In[8]:= First/@Tally[DeleteCases[
Differences[Array[a,100000]],1]]

Out[8]= {5,3,11,23,47,101,7,13,233,467,941,
1889,3779,7559,15131,53,
30323,60647}

and in sorted order:

In[9]:= Union[DeleteCases[Differences[
Array[a,100000]],1]]

Out[9]= {3,5,7,11,13,23,47,53,101,233,467,
941,1889,3779,7559,15131,30323,
60647}

The formula can be seen as a "simple program" that generates an output known to be in some sense complicated, a point made very generally in Stephen Wolfram's 2002 book "A New Kind of Science".

Richard P.

Anonymous said...

Through a unique summation sieving algorithm it is possible to sequentially produce all odd primes up to any stopping point n without using any previous primes. Likewise, it is also possible to start with a large prime and continue with a sequential list of large primes up to any stopping point n, again without using any previous primes in the process of generating them.

Here it is in Python code for all odd primes up to n = 20000. I am not sure if it will indent properly in this post, because in preview it shows no indents.


n= -2;n1= -4;n2= -1;n4=4;n5=1;b1=0;j=0
a=[0]*20001
data=[9,3,2,1,3,2]
for i in range(len(data)):
    a1=data[i]
    a1=a1+b1;a[a1]=a1;b1=a1
a1=b1+n;a[a1]=a1;b1=a1
n=n+n1
for x in range(20000):
    i=n4
    while i>=2:
        a1=a1+i
        if a1<=20000: a[a1]=a1
        i=i-1
    a1=a1+n
    if a1<=20000: a[a1]=a1
    n4=n4+n5;n=n-n4
for ii in range(10000):
    i=2*ii+1
    if a[i]==0 and i>1: print i,;j=j+1
print 'Total odd primes =',j
---------------------------------------------------
Prime print output = 3, 5, 7, 11, ..., 19997, which is the last prime less than n = 20000, giving a total of 2261 odd primes.


So the odd primes do have a pattern, complex as it may be, built from summations and negations starting with 9, where the summations and negations themselves never land on an odd prime.


It is very slow, but it shows this complex pattern of the odd primes. After the first summation of these six integers, no odd primes are produced at any point of the summation:


+9+3+2+1+3+2


Then continue with the summations and negations below, where again no odd primes are produced as this pattern emerges.


-2
+4+3+2
-6
+5+4+3+2
-11
+6+5+4+3+2
-17
+7+6+5+4+3+2
-24
+8+7+6+5+4+3+2
-32
+9+8+7+6+5+4+3+2
-41
+10+..
etc.
The pattern for the negations: the difference of +4 and -2 is 6, so the next negation is -6. The difference of -6 and +5 is 11, so the next negation is -11, and so on. The groups of summations have an obvious pattern, with the starting term incrementing by one from the previous group.


All the odd integers produced by these summations and negations are composite, and all the odd integers not produced are prime, except 1.


The problem is, many composites repeat one or more
times which adds to the processing time.


Dan

Anonymous said...

Look in Google Groups sci.math, July 24-25, for "A prime Generator that needs no previous primes"; it gives the correct indentation for the Python program I posted here.

Dan

Anonymous said...

This prime chain of 3237 terms, 1671 of which are distinct, was found by means of a prime-producing generator closely akin to Eric Rowland's formula for an infinite chain: a(1) = 7, and for n > 1, a(n) = a(n-1) + gcd(n, a(n-1)).

It could be argued that neither one is a true prime chain, since there is a vast sea of 1's in between each prime, and that mine is not really even a "formula" since it does not achieve an infinite consecutive list, but I see these objections as being more semantic than substantial in nature. I also believe that the entire set of primes occurring in the sequence represented by x^2 - x + 1 will eventually appear in the formula below (about 1/2 of all primes), but this is yet to be proved, and it is unclear, due to equipment limitations, whether the extreme density of primes at the beginning of the sequence continues indefinitely or breaks down at some point.

Let f(x) := x^2 - x + 41, x = 1, k = 2, a = 3 .
Repeat indefinitely a two-step process:
x := x + 1,
If GCD( x^2 - x + 41, (x-1)^2 - (x - 1) + 41 - a* k ) >1,
then k := k + 1;

The first 20 values of the sequence that do not equal 1 are:
47, 227, 71, 359, 113, 563, 173, 839, 251, 1187, 347, 1607, 461, 2099, 593, 2663, 743, 3299, 911, 4007

Many other values for the 'a' variable above, perhaps infinitely many (but not all), may be substituted in the formula and produce long initial prime chains.
For a = 3 : 3237 consecutive primes,
a = 2 : 736
a = 4 : 817
a = 5 : 858
a = 6 : 161
a = 7 : 1159
a = 8 : 221
a = 9 : 284
a = 10 : 1276 etc.

Many other equations, perhaps infinitely many (but not all), may be substituted for x^2 - x + 1 in the given formula, or a more complicated development of that formula, with good results. Example: f(x) := 5*x^2 + 5*x + 1, x = 1, k = 2, a = 10 : 553 consecutive primes, 276 distinct primes.

Aldrich Stevens

Anonymous said...

Just found this site and am trying to convert the code into "J".

In "J" (not java) you would write the following for GCD:

x +. y

So as an example:

x: 147573952589676412927 +. 761838257287

will return

761838257287

By the way:

2^67 - 1 is 147573952589676412927

and it factors into the following numbers:

193707721 761838257287

Anonymous said...

You said Benoit Cloitre proved his prime-generating formula. Did he actually prove it? Eric Rowland says he did not, on the last page of this presentation: http://thales.math.uqam.ca/~rowland/talks/Formulas_for_primes.pdf

Jeffrey Shallit said...

Why don't you just ask him, instead of me?

Unknown said...

Define the following sequence (given using Haskell syntax).

r 2 = 1
r n = (n-1 - (n-2)*(signum ((gcd x (n-1)) - 1))) * x
  where x = r (n-1)

Then (r (n+1)) / (r n) is always 1 or a prime for n >= 2. Furthermore the sequence contains all the prime numbers in increasing order. Is this related in some way?