A few weeks ago, I stumbled upon an algorithmic problem that I was able to reduce to finding the order of a specific member of the multiplicative group modulo N. In this post I'll talk about three different ways of doing that: the naive way, the fairly obvious way, and finally the less obvious way, which is also the fastest of the three. I'm writing about this even though I'm not a mathematician because I've been unable to find the proof for the fastest algorithm by searching online, even though it's quite elementary. I'll write about the original problem itself in a separate post because it is interesting in its own right. You can find the code that I used to benchmark these algorithms here, and read the whole story in the rest of this (long) post, if you're interested.
So, the problem we're addressing is the following: given numbers K and N, compute the order of K in the multiplicative group modulo N, i.e. the smallest positive integer e such that K^e = 1 (mod N). This is supposed to work quickly, say in a few hundred milliseconds, for N as big as one billion, i.e. 10^9. It's often instructive to think about these boundaries when looking for an algorithm, because we want the simplest algorithm that will be fast enough. With N in the billions, we're usually looking for something that's O(sqrt(N)), possibly with a logarithmic factor. This restriction immediately eliminates the naive algorithm, which would just repeatedly multiply by K in the group until the order was found, and is therefore O(N).
To start off, it's instructive to think about some meaningful restrictions on K. Obviously, K must itself be a member of this group to have an order in it, which means it has to be relatively prime to N (and obviously smaller than N as well). Furthermore, all the members of this group (call it G_N) have this property and there are phi(N) of them, where phi is Euler's totient function. Not to stray too far off topic, I'll just say that phi(N) can easily be computed in O(sqrt(N)) time by decomposing N into prime factors, so from now on we can assume that we know n = phi(N), the cardinality of our group G_N (and we're still on target in terms of complexity). Here is some code that computes n.
int mod_group_order(int N) {
    int d = 2;
    int n = 1; // this will be the order of the group
    while (d <= N/d) {
        if (N%d == 0) { // d is a prime p
            int num = 1; // accumulate p^k
            do {
                N /= d;
                num *= d;
            } while (N%d == 0);
            n *= num - num/d; // totient(p^k) = p^k - p^(k-1)
        }
        ++d;
    }
    if (N > 1) { // what remains is a prime, totient(p) = p - 1
        n *= N - 1;
    }
    return n;
}
Having n, we can now use Lagrange's theorem to get a much faster algorithm for computing the order of an element than the naive algorithm mentioned at the beginning. Lagrange's theorem states that the cardinality of any subgroup of a finite group (such as our G_N) must divide the cardinality of the group. Now, the order of an element K is actually the cardinality of the cyclic group generated by K, which is obviously a subgroup of G_N. Therefore, the order of K must be a divisor of n. Since n = phi(N) is O(N) and we can find all the divisors of n in O(sqrt(n)) by trial division, we have an algorithm that might hit our target. Also note that any multiple of the order of an element also takes that element to one, i.e. K^n will always be 1.
int order_divisors(int N, int K, int n) {
    if (K % N == 1) {
        return 1; // handle order 1 explicitly, since the loop starts at d = 2
    }
    int up_order = n; // this works for sure
    for (int d = 2; d <= n/d; ++d) {
        if (n%d == 0) {
            if (fastexp_mod(K, d, N) == 1) {
                return d; // found it!
            }
            if (d*d < n && fastexp_mod(K, n/d, N) == 1) {
                // can't return here, a smaller exponent may still work
                up_order = n/d;
            }
        }
    }
    return up_order;
}
The order of K can either be less than or equal to sqrt(n), or it can be greater. In the former case, we can just return the smallest exponent that takes K to one (the early return in the loop), but in the latter case we can't yet be sure that there is no smaller exponent that does the same, so we record n/d and keep going. The fastexp_mod function raises its first argument to the power of the second argument, modulo the third argument, in logarithmic time. Therefore, the complexity of this algorithm is O(sqrt(n)*log(n)). More precisely, it is O(sqrt(n) + d(n)*log(n)), where d(n) is the number of divisors of n. Obviously, when the order of an element is very small this algorithm will be very fast, but when the order is larger than sqrt(n) it will actually be Theta(sqrt(n) + d(n)*log(n)).
The third algorithm to consider is based on the prime decomposition of n, i.e. of the order of the group. Assume that n = p_0^q_0 * p_1^q_1 * ... * p_m^q_m is the prime decomposition of n. Then the order of K is the product of p_i^(q_i - f(K, p_i, n)) over all i, where f(K, p_i, n) is the largest exponent e_i no greater than q_i such that K^(n/(p_i^e_i)) = 1 (mod N). We will prove this in the remainder of this post and then see how this algorithm stacks up against the divisor-based one in terms of efficiency. Before that, here is the code that implements it (see the whole program for comments).
int f(int p, int lim, int num, int n, int K, int N) {
    // largest e <= lim such that K^(n/p^e) = 1 (mod N); num starts at p^lim
    while (lim > 0) {
        if (fastexp_mod(K, n/num, N) == 1) {
            return lim;
        }
        --lim;
        num /= p;
    }
    return 0;
}

int order_primes_iter(int N, int K, int n) {
    int sol = 1;
    int t = n;
    int d = 2;
    while (d <= t/d) {
        if (t%d == 0) { // d is a prime factor of n
            int cnt = 0;
            int num = 1;
            do {
                num *= d;
                t /= d;
                ++cnt;
            } while (t%d == 0);
            sol *= fastexp(d, cnt - f(d, cnt, num, n, K, N));
        }
        ++d;
    }
    if (t > 1) { // the remaining t is a prime factor of n
        sol *= fastexp(t, 1 - f(t, 1, t, n, K, N));
    }
    return sol;
}
So, having introduced e_i as the value returned by the function f, we can rewrite the formula for the order of K as the product of p_i^(q_i - e_i) over all i, which is just n divided by the product of the p_i^e_i. We need to prove two things: that this number actually takes K to one in the group, and that no smaller number does so.
Theorem: If K^(n/a) = 1 (mod N) and K^(n/b) = 1 (mod N), and a and b are relatively prime, then K^(n/ab) = 1 (mod N).
This theorem lets us compose what we know about the individual e_i numbers into the formula we're trying to prove. Let u = K^(n/ab). Then both u^a = 1 (mod N) and u^b = 1 (mod N) by assumption, so the order of u divides both a and b. This can only happen if u = 1 or if a and b share a common factor larger than one (namely, a multiple of the order of u). Since a and b are assumed relatively prime, it must be that u = 1, which completes the proof.
We know by the definition of e_i that n/(p_i^e_i) takes K to one for all i. Using the above theorem, we can compose all the p_i^e_i factors and easily prove by induction that the suggested formula indeed takes K to one. It remains to be proven that no smaller number does so. Assume that there is such a number, and call it L; the smallest such number is the order of K, and the order divides every exponent that takes K to one. So L must be a divisor of our n/(\prod p_i^e_i), which means L = n/(\prod p_i^g_i) where at least one of the exponents g_i is larger than the corresponding e_i. Choose one such i, so g_i > e_i. Since every multiple of L also takes K to one, consider n/(p_i^g_i). This is obviously a multiple of L and therefore K^(n/(p_i^g_i)) = 1 (mod N). However, e_i was defined as the largest exponent with this property, so it must hold that g_i <= e_i, a contradiction. We conclude that n/(\prod p_i^e_i) is indeed the smallest number that takes K to one, i.e. the order of K.
So what is the complexity of this algorithm? It is easy to come up with a loose upper bound, but it would be fairly difficult to compare it with the complexity of the divisor-based algorithm, since the time required to execute either algorithm depends heavily on the properties of N and of the order we're looking for. Instead, I wrote a program to measure the running time cumulatively for the 200k values of N above 99800000 and random K. This algorithm outperformed the divisor-based one by about a factor of 19 on my computer (it ran for about 1s versus about 19s for the divisor-based algorithm). I also tried some optimizations in the implementation, namely special treatment of two (which allows the main loop to skip even numbers) and precomputing a list of primes; both of these yield about a 10% improvement. Check out the code if you're interested.
I'd be very interested to know if someone can find a nice way to asymptotically quantify this difference between the algorithms. The runtime of the faster algorithm depends on both the number of prime factors of a number and the exponents of those factors. Therefore, I don't expect it is possible to relate that back to n in a tight bound, but it might be meaningful to compare the amortized cost of these algorithms over some distribution of N and K.
Thanks if you read this far, and sorry about not using LaTeX :).