Showing posts with label code. Show all posts

Thursday, March 22, 2012

Fun with bugs

I encountered a fairly interesting bug a few days ago. I won't post the minimal code snippet that reproduces the bug, but rather something similar to the actual code I had the bug in, so that you might try to find it for fun (preferably just by looking at the code :). Note, however, that the bug I'm going to write about is not really a logic error, but more of a "language issue" (I don't want to be too specific yet because a hint would quickly ruin the fun :).

First, a bit of introduction about what the code does, or is supposed to do. I was implementing an interval tree that would efficiently find the maximal value in an interval of keys. When creating an empty tree (with no values yet inserted), you'd want to initialize all the values to some very small value that I'll call -INF. Basically, the intent of initialization is to set up the M vector that represents the intervals that the tree will be storing directly. So here is the initialization code.

class IntervalTree {
  public:
  IntervalTree(int maxval): n(maxval), M(2*maxval+1) {
    init(0, 0, n);
  }

  private:
  int n;
  std::vector<int> M;
  
  int init(int node, int l, int r) {
    while (node >= (int)M.size()) {
      M.push_back(0);
    }
    if (l == r) {
      return (M[node] = -INF);
    } else {
      int mid = l + (r-l)/2;
      return (M[node] = std::max(init(2*node + 1, l, mid),
                                 init(2*node + 2, mid+1, r)));
    }
  }
};

The constructor is pretty minimal. It stores the rightmost interval boundary (the leftmost one is implicitly zero) and tries to initialize the vector that holds the tree nodes. Now, unfortunately, the maximum index of a node in the tree might be bigger than 2n if the tree is not complete, i.e. the number of keys is not a power of two. An example of this case is n=5 when the maximum index of a node is 12 (the node representing the interval [4,4]).

One way to fix this problem would be to use the first larger power of two, but that can potentially waste space. So what I did here was, in the recursive init phase, add on additional nodes as necessary. The remaining code is fairly straightforward: if the interval contains only one key, then initialize that leaf node (here you might use A[l] if you were initializing the tree with an initial array of values); otherwise, recursively initialize the left and right halves of the interval and assign this node the maximum of those. Again, in this case it would be OK to just assign -INF to M[node] directly, but if we were using an initial value array, this is pretty much how you'd do it.

So, where's the bug? :) This code actually has undefined behavior and is fairly likely to fail spectacularly (with a segfault). If you don't want to try to find the bug, then read on for the answer. Before that, here's a hint - the bug is related to std::vector. Can you spot it now? :)

Here's the explanation. The bug is actually in the line that recursively calls init and assigns the max of the two calls to the tree node. As you might have guessed by now, there are two issues in interplay here. First of all, when you push_back to a std::vector, you will get a reallocation if the previous size of the vector equaled the capacity (i.e., all the preallocated memory was exhausted). When that happens, all pointers and references to the elements of the vector are invalidated, for obvious reasons (meaning you must not use them). This is why you should always be wary of taking pointers (or references) to vector elements, i.e. carefully consider if it is logically possible for the vector to grow which might cause it to get reallocated.

However, one might wonder, where are the pointers or references in this code? The answer is, fairly obviously, vector's operator[]. The thing is, the compiler is free to evaluate M[node] before, between or after the two init calls, i.e. it is free to evaluate the left side of the assignment before the right. Since init uses push_back on the same vector it's assigning to, it might happen that the reference returned by operator[] in M[node] gets invalidated before you assign through it, breaking the program.
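A contrived sketch of the same hazard (not the original code; grow_and_return and safe_store are made-up names for illustration):

```cpp
#include <vector>

// Grows the vector enough that a reallocation is practically guaranteed,
// then returns a value to be stored.
int grow_and_return(std::vector<int>& v) {
    for (int i = 0; i < 1000; ++i) {
        v.push_back(0); // may reallocate and invalidate element references
    }
    return 7;
}

int safe_store() {
    std::vector<int> v(1);
    // Hazardous form (undefined behavior): the reference produced by v[0]
    // may be obtained before grow_and_return reallocates the buffer:
    //
    //     v[0] = grow_and_return(v);
    //
    // Safe form: sequence the call before the element access.
    int tmp = grow_and_return(v);
    v[0] = tmp;
    return v[0];
}
```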

The manifestation of this bug was actually quite entertaining. I got a segfault for many executions, but sometimes I got a std::bad_alloc (for minute trees of a dozen or so nodes :). Quite spectacularly, it actually caused my antivirus to flag the executable as a trojan on one run (though that is, I suspect, not really closely related to this particular bug).

It is fairly easy to fix this problem in this case... just use a temporary to store the max of the recursive calls and then assign it to the appropriate location in the vector. Another solution would be to determine how many nodes there will be in the constructor and avoid having to extend the vector. Both of these solutions are either uglier or more complex than this one, but they are correct while this one is not :). However, I think this bug is generally fairly subtle and I wouldn't be surprised if it "leaked out" (in the abstraction sense) of some libraries that use std::vector internally.
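Here is a sketch of the temporary-based fix, under the same class layout as above (the concrete value of INF and the root() accessor are my additions for demonstration; the original leaves -INF abstract):

```cpp
#include <algorithm>
#include <vector>

const int INF = 1 << 30; // assumed sentinel; any "very small" -INF works

class IntervalTree {
  public:
    explicit IntervalTree(int maxval) : n(maxval), M(2 * maxval + 1) {
        init(0, 0, n);
    }
    int root() const { return M[0]; } // added here just for demonstration

  private:
    int n;
    std::vector<int> M;

    int init(int node, int l, int r) {
        while (node >= (int)M.size()) {
            M.push_back(0);
        }
        if (l == r) {
            return (M[node] = -INF); // safe: no push_back between [] and =
        }
        int mid = l + (r - l) / 2;
        // Finish both recursive calls (and any push_backs they trigger)
        // before touching M[node].
        int best = std::max(init(2 * node + 1, l, mid),
                            init(2 * node + 2, mid + 1, r));
        return (M[node] = best);
    }
};
```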

Tuesday, February 7, 2012

Efficiently computing the order of an element in a multiplicative group

A few weeks ago, I stumbled upon an algorithmic problem that I was able to reduce to finding the order of a specific member of the multiplicative group modulo N. In this post I'll talk about three different ways of doing that, from the naive way, through the fairly obvious one, to the less obvious way, which is also the fastest of the three. I'm writing about this even though I'm not a mathematician because I've been unable to find the proof for the fastest algorithm by searching online, even though it's quite elementary. I'll write about the original problem itself in a separate post because it is interesting in its own right. You can find the code that I used to benchmark these algorithms here, and read the whole story in the rest of this (long) post, if you're interested.

So, the problem we're addressing is the following: given numbers K and N, compute the order of K in the multiplicative group modulo N, i.e. the smallest positive integer e such that K^e = 1 (mod N). This is supposed to work quickly, say in a few hundred milliseconds, for N as big as one billion, i.e. 10^9. It's often instructive to think about these boundaries when looking for an algorithm because we want the simplest algorithm that will be fast enough. With N in the billions, we're usually looking for something that's O(sqrt(N)), possibly with a logarithmic factor. This restriction immediately eliminates the naive algorithm, which would just repeatedly multiply by K in the group until the order was found, and is therefore O(N).
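For contrast, the naive algorithm is only a few lines (a sketch of my own, assuming N > 1 and gcd(K, N) = 1 so that the loop terminates):

```cpp
// Naive order computation: keep multiplying by K modulo N until we hit 1.
// This takes O(order) steps, i.e. O(N) in the worst case, which is far too
// slow for N around 10^9. long long avoids overflow in the product.
int order_naive(int K, int N) {
    long long cur = K % N;
    int e = 1;
    while (cur != 1) {
        cur = cur * K % N;
        ++e;
    }
    return e;
}
```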

To start off, it's instructive to think about some meaningful restrictions on K. Obviously, K must itself be a member of this group to have an order in it, and that means it has to be relatively prime with N (and obviously smaller than N as well). Furthermore, all the members of this group (call it G_N) have this property and there are phi(N) of them, where phi is Euler's totient function. Not to stray too much off topic, I'll just say that phi(N) can easily be computed in O(sqrt(N)) time by decomposing N into prime factors, so from now on we can assume that we know n = phi(N), the cardinality of our group G_N (and we're still on target in terms of complexity). Here is some code that computes n.

int mod_group_order(int N) {
    int d = 2;
    int n = 1; // this will be the order of the group
    while (d <= N/d) {
        if (N%d == 0) { // d is a prime p
            int num = 1; // accumulate p^k
            do {
                N /= d;
                num *= d;
            } while (N%d == 0);
            n *= (num - num/d); // totient(p^k) = p^k-p^(k-1)
        }
        ++d;
    }
    if (N > 1) { // this is a prime, totient(p) = p - 1
        n *= N - 1;
    }
    return n;
}

Having n, we can now use Lagrange's theorem to get a much faster algorithm for computing the order of an element than the naive algorithm mentioned at the beginning. Lagrange's theorem states that the cardinality of any subgroup of a finite group (such as our G_N) must divide the cardinality of the group. Now, the order of an element K is actually the cardinality of the cyclic group generated by K, which is obviously a subgroup of G_N. Therefore, the order of K must be a divisor of n. Since n = phi(N) is O(N) and we can find all the divisors of n in O(sqrt(n)) by trial division, we have an algorithm that might hit our target. Also note that any multiple of the order of an element also takes that element to one, i.e. K^n will always be 1.

int order_divisors(int N, int K, int n) {
    int up_order = n; // this works for sure
    for (int d=2; d<=n/d; ++d) {
        if (n%d == 0) {
            if (fastexp_mod(K, d, N) == 1) {
                return d; // found it!
            }
            if (d*d < n && fastexp_mod(K, n/d, N) == 1) {
                // can't return yet: a later d gives a smaller n/d
                up_order = n/d;
            }
            }
        }
    }
    return up_order;
}

The order of K can either be less than or equal to sqrt(n), or greater. In the former case, we can just return the smallest exponent that takes K to one (the early return of d), but in the latter we can't yet be sure that there are no smaller exponents that do the same thing, so we must go on (the assignment to up_order). The fastexp_mod function raises its first argument to the power of its second argument, modulo the third argument, in logarithmic time. Therefore, the complexity of this algorithm is O(sqrt(n)*log(n)). More precisely, this algorithm is O(sqrt(n) + d(n)*log(n)), where d(n) is the number of divisors of n. Obviously, when the order of an element is very small this algorithm will be very fast, but when it is larger than sqrt(n) it will actually be Theta(sqrt(n) + d(n)*log(n)).
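fastexp_mod itself isn't shown in the post; a typical square-and-multiply version (a sketch, not necessarily the exact one used in the benchmark code) would be:

```cpp
// Computes base^e mod m in O(log e) multiplications. long long
// intermediates keep the products from overflowing for m up to ~2*10^9.
int fastexp_mod(int base, int e, int m) {
    long long result = 1 % m;
    long long b = base % m;
    while (e > 0) {
        if (e & 1) {
            result = result * b % m; // multiply in the current bit's power
        }
        b = b * b % m; // square for the next bit of the exponent
        e >>= 1;
    }
    return (int)result;
}
```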

The third algorithm to consider is based on the prime decomposition of n, i.e. the order of the group. Assume that n = p_0^q_0 * p_1^q_1 * ... * p_m^q_m is the prime decomposition of n. Then the order of K is the product of p_i^(q_i - f(K, p_i, n)) for all i, where f(K, p_i, n) is the largest exponent e_i no greater than q_i such that K^(n/(p_i^e_i)) = 1 (mod N). We will prove this is true in the remainder of this post and then see how this algorithm stacks up against the divisor-based one in terms of efficiency. Before that, here is the code that implements this algorithm (see the whole program for comments).

int f(int p, int lim, int num, int n, int K, int N) { 
    while (lim > 0) {
        if (fastexp_mod(K, n/num, N) == 1) {
            return lim;
        }
        --lim;
        num /= p;
    }
    return 0;
}

int order_primes_iter(int N, int K, int n) {
    int sol = 1;
    int t = n;
    int d = 2;
    while (d <= t/d) {
        if (t%d == 0) {
            int cnt = 0;
            int num = 1;
            do {
                num *= d;
                t /= d;
                ++cnt;
            } while (t%d == 0);
            sol *= fastexp(d, cnt-f(d, cnt, num, n, K, N));
        }
        ++d;
    }
    if (t > 1) {
        sol *= fastexp(t, 1 - f(t, 1, t, n, K, N));
    }
    return sol;
}

So, after having introduced e_i as the value returned by the function f, we can rewrite the formula for the order of K as the product of p_i^(q_i - e_i) for all i, and this is actually n divided by the product of p_i^e_i. We need to prove two things: that this number actually takes K to one in the group and that there is no smaller number that does so.

Theorem: If K^(n/a) = 1 (mod N) and K^(n/b) = 1 (mod N) and a and b are relatively prime, then K^(n/ab) = 1 (mod N).

This theorem would allow us to compose what we know about the individual e_i numbers into the formula we're trying to prove. Let u = K^(n/ab). Then both u^a = 1 (mod N) and u^b = 1 (mod N) by assumption. This can only be true if u = 1 or if a and b share a common factor larger than one, i.e. are both multiples of the order of u. By the assumption that a and b are relatively prime, it must be that u = 1 which completes the proof.

We know by the definition of e_i that n/(p_i^e_i) takes K to one for all i. Using the above theorem, we can compose all the p_i^e_i factors and easily prove by induction that the suggested formula indeed takes K to one. It remains to be proven that no smaller number does so. Assume that there is such a number, and call it L. Since the order divides every exponent that takes K to one, L must be a divisor of our n/(\prod p_i^e_i), which means that at least one of the exponents in its denominator is larger; choose any one of those, say g_i > e_i. Since every multiple of L also takes K to one, consider n/(p_i^g_i). This is obviously a multiple of L and therefore K^(n/(p_i^g_i)) = 1 (mod N). But then g_i qualifies in the definition of e_i, so it would have to hold that g_i <= e_i, a contradiction. We conclude that n/(\prod p_i^e_i) is indeed the smallest number that takes K to one, i.e. the order of K.
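As a small worked example (my own, not from the benchmark code): take N = 11, so n = phi(11) = 10 = 2 * 5, and K = 3. Then 3^(10/2) = 3^5 = 1 (mod 11), so e for p = 2 is 1, while 3^(10/5) = 3^2 = 9 != 1 (mod 11), so e for p = 5 is 0, giving an order of 10/2^1 = 5. A quick check (pw is a throwaway modular exponentiation helper):

```cpp
// Throwaway modular exponentiation by repeated squaring.
long long pw(long long b, long long e, long long m) {
    long long r = 1 % m;
    b %= m;
    while (e > 0) {
        if (e & 1) r = r * b % m;
        b = b * b % m;
        e >>= 1;
    }
    return r;
}
// pw(3, 10/2, 11) == 1  -> e for p = 2 is 1
// pw(3, 10/5, 11) == 9  -> e for p = 5 is 0
// pw(3, 5, 11)    == 1  -> the order of 3 mod 11 is indeed 10 / 2^1 = 5
```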

So what is the complexity of this algorithm? It is easy to come up with a loose upper bound, but it would be fairly difficult to compare this with the complexity of the divisor-based algorithm since the time required to execute either algorithm depends heavily on the properties of N and the order we're looking for. Instead, I wrote a program to measure the running time cumulatively for the 200k values of N above 99800000 and random K. This algorithm outperformed the divisor-based one by about a factor of 19 on my computer (it ran for about 1s versus about 19s for the divisor-based algorithm). I also tried some optimizations in the implementation of the algorithm, namely special treatment of two (which allows the main loop to skip even numbers) and precomputing a list of primes, and both of these yielded about a 10% improvement. Check out the code if you're interested.

I'd be very interested to know if someone can find a nice way to asymptotically quantify the difference between these algorithms. The runtime of the faster algorithm depends on both the number of prime factors of a number and the exponents of those factors. Therefore, I don't expect it is possible to relate that back to n in a tight bound, but it might be meaningful to compare the amortized cost of these algorithms over some distribution of N and K.

Thanks if you read this far, and sorry about not using LaTeX :).

Friday, April 10, 2009

Standard C function for reading arbitrarily long lines from files

This is a function I wrote back when I was learning C (and understanding why standards matter, thanks mostly to the wonderful Usenet group comp.lang.c). I translated the variable names and comments from Croatian to English and thought I'd post it here in case someone might find it useful. What it does basically is read a file one character at a time attempting to allocate more memory as needed. It's written in standards compliant C, and packed in a small test driver program. I'm not including a header file for it since it's only one function, but you can write one easily yourself.

You can compile it with something like

gcc read_line.c -W -Wall -ansi -pedantic

and should get no warnings. If you want to test it, you should probably reduce ALLOC_INCREMENT to something like 4 or so.

Most of the code is self explanatory and has detailed comments. There are a few design decisions I'd like to talk about. First off, the function returns the number of characters read as an int, which is not really in line with the standard library, which uses size_t for buffer sizes. The simple reason is that I personally don't like working with unsigned values, except when doing modulo 2^n arithmetic and bitmask handling. It's also a nice way to report an error by returning -1 (although you could return (size_t)-1, but that is less nice). Also, in contexts where I use this function, int is more than enough to represent all the possible line sizes.

Second, the final amount of allocated memory might be higher than the number of bytes read. You can solve this easily by creating a wrapper function, similar to this one

int read_line_with_realloc(char **s, FILE *fp) {
    char *p = NULL;
    int len = read_line(s, fp);
    if (len <= 0) {
        return len; /* nothing was read, or read_line itself failed */
    } else {
        p = realloc(*s, len + 1); /* room for '\0' */
        if (p == NULL) {
            free(*s);
            return -1;
        }
        *s = p;
        return len;
    }
}


Finally, there's no way to reuse the memory already allocated in a buffer, that is, it is assumed that the passed buffer is initially empty. This is basically to keep the interface clean, but can also be easily fixed with a wrapper.

You can get the code here.

I've used this function many times, but as always (even with trivial code), there might be some errors in there. If you find any, please let me know.

Tuesday, April 7, 2009

Vim C++ macro-like expansion script

Hi again!

Today I'll release a Vim script I wrote a few weeks ago (and have been planning to write for a long time). What it does is enable fast typing of idiomatic things like for loops, STL container declarations etc. The philosophy behind using a script like this is in line with the general philosophy of Vim, which is to make editing as fast as possible (fewest keystrokes to achieve a result). Also, writing idiomatic code is known to be error prone (ever typed ++i in an inner loop instead of ++j?) and is definitely tedious.

Another reason for writing this script are TopCoder competitions. Unlike most algorithm competitions, at TopCoder speed is a big factor. You only have 75 minutes to solve 3 problems (of increasing difficulty) and your submissions are timed from the moment you open the problem statement. Also, there is no partial credit for problems. After the coding phase, there's a short intermission and then a 15 minute challenge phase where other coders get a chance to view your source code and construct legal inputs that will break it (thus getting additional points for themselves and bringing your point total for that problem to 0). Finally, if your code survives the challenge phase, it has to pass all the automated system tests (some of which are "random", and some of which are specifically constructed to trip up wrong approaches). Some people at TopCoder use macros to achieve pretty much the same effect as this script. The main problem I have with this option is that it makes the code look ugly (sort of obfuscated). This makes it unnecessarily hard for other people to challenge (although that's probably not the main intention) and, more importantly, it's bad style. Of course, using this script (or the macro method) probably won't increase your TopCoder rating; TopCoder algorithm competitions are primarily about quickly inventing (or applying) the correct algorithm, and then implementing it correctly and with reasonable efficiency. Still, being able to type code faster is always a good thing :).

So let me finally give you a short tour of what exactly the script does (for the full list, see the comment at the top of the file). The current functionality can be separated into two groups: loop/block expansions, and container and type expansions.

Very often, you'll want to loop through an STL container, or repeat something n times. To do this, you'll write something like this:

for (int i=0; i<(int)my_vec.size(); ++i) {
    // per-item code goes here
}

Note that in "real" code you might use size_t or, better yet, vector<int>::size_type. Also notice the int cast on the container size. Comparing a signed to an unsigned value is not entirely safe in C++ (The int gets converted to an unsigned (assuming vector<int>::size_type is size_t and size_t is a typedef for unsigned, as is common on PCs) in what is called the usual arithmetic conversions. This can cause problems if the value of the int is negative, in which case, on a 2s complement machine, becomes a positive number and you get the "wrong answer".). Still, in idiomatic code like this, using a cast is an OK option IMO.
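The conversion pitfall can be seen directly in a contrived sketch (raw_compare and cast_compare are made-up names; the behavior assumes the common setup where size_type is an unsigned type at least as wide as int):

```cpp
#include <vector>

// The raw comparison: i is converted to an unsigned type, so a negative i
// becomes a huge value and the "obvious" bound check gives the wrong answer.
bool raw_compare(int i, const std::vector<int>& v) {
    return i < v.size();
}

// The cast used in the idiomatic loop above: both sides are signed.
bool cast_compare(int i, const std::vector<int>& v) {
    return i < (int)v.size();
}
```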

Anyway, to write the above loop with this script, you'd write:

@f i #my_vec

and press <F2> (I'll talk about the keyboard mappings in a sec). Vim will then expand the line to the above code (without the // per-item code goes here comment, of course :) and (if you were in insert mode) position the cursor at the "right" place (one indent level inside the block) so you can continue hacking.

If you want to loop to a value (or variable), and not through a container, you'd write something like this:

@f i n

and get

for (int i=0; i<n; ++i) {

}

You probably guessed it, but the general syntax is:

@f index_var_name upper_limit

Where the upper limit can be preceded by a # to produce the size of a container.

If you want the loop condition to be less than or equal, prefix the upper limit with an equals sign like:

@f index_var_name =upper_limit


There's also a version that lets you specify a non-zero lower limit

@f index_var_name lower_limit upper_limit


and versions for looping downward

@fd index_var_name [lower_limit] upper_limit

(the square brackets indicate that the lower_limit is optional).

Finally, there are similar expressions for while and do-while loops, as well as for if branches:


@w expression
@d expression
@i expression


There is also support for for loops with iterators through containers. The syntax for this is (I'll explain what <container_expression> and <expanded_container_expression> are in the next paragraph):

@iter iter_name <container_expression> container_name

which generates

for (<expanded_container_expression>::iterator iter_name=container_name.begin(); iter_name!=container_name.end(); ++iter_name) {

}

There's also a @iterc version that generates a ::const_iterator iteration.

The second part of the functionality is container and type expansions. Basically, for:

@vi

(and <F2>) you get:

vector<int>

This works in all parts of the line (except inside double quotes and comments), unlike the loop/block expansions, which only work at the start of lines (modulo leading whitespace). I won't list all the supported expansions, but all the vector, set, map and pair combinations that I tend to use frequently are supported. A few examples:

@vvi
@sd
@msi
@pllll

expands to:

vector< vector<int> >
set<double>
map<string, int>
pair<long long, long long>

Keyboard mappings are defined at the end of the script (so you can easily change them, if you like).

nmap <F2> :call BudaCppExpand()<Enter>
imap <F2> <Esc>:call BudaCppExpand()<Enter>A


I have some further ideas that I'm going to add in the short term. Since this post has gotten very long, I won't discuss them here except to say that you can read about them in the comment on top of the script :).

One important thing in Vim is making a habit out of things (so you don't have to think about whether or not you're going to press 'i' or 'a' or 'A' etc.). Since this script is pretty new I still haven't really gotten into the habit of using it as much as I'd like, but I'm trying :).

Without further ado, you can get the script here. To use it, you can put it into the ftplugins Vim directory.

If you find the script useful, have any suggestions about improvements and possible further additions or find a bug, please let me know!


Edit:
While writing this post, I noticed that it might be useful (and in line with the style of the script) to be able to expand things after } else to form if/else blocks properly which wasn't possible with the version I posted. I fixed that now so you should be able to do things like


if (a > b) {
    // do something
} else @i a+b+c > 42

and get it expanded into

if (a > b) {
    // do something
} else if (a+b+c > 42) {

}