Thursday, March 22, 2012

Fun with bugs

I encountered a fairly interesting bug a few days ago. I will not post the minimal code snippet that reproduces the bug, but rather something similar to the actual code that I had the bug in so that you might try to find the bug for fun (preferably just by looking at the code :). Note however, that the bug I'm going to write about is not really a logic error, but more of a "language issue" (I don't want to be too specific yet because a hint will quickly ruin the fun :).

First, a bit of introduction about what the code does, or is supposed to do. I was implementing an interval tree that would efficiently find the maximal value in an interval of keys. When creating an empty tree (with no values yet inserted), you'd want to initialize all the values to some very small value that I'll call -INF. Basically, the intent of initialization is the set up the M vector that represents the intervals that the tree will be storing directly. So here is the initialization code.

class IntervalTree {
  public:
  IntervalTree(int maxval): n(maxval), M(2*maxval+1) {
    init(0, 0, n);
  }

  private:
  int n;
  std::vector M;
  
  int init(int node, int l, int r) {
    while (node >= (int)M.size()) {
      M.push_back(0);
    }
    if (l == r) {
      return (M[node] = -INF);
    } else {
      int mid = l + (r-l)/2;
      return (M[node] = max(init(2*node + 1, l, mid),
                            init(2*node + 2, mid+1, r)));
    }
  }
};

The constructor is pretty minimal. It stores the rightmost interval boundary (the leftmost one is implicitly zero) and tries to initialize the vector that holds the tree nodes. Now, unfortunately, the maximum index of a node in the tree might be bigger than 2n if the tree is not complete, i.e. the number of keys is not a power of two. An example of this case is n=5 when the maximum index of a node is 12 (the node representing the interval [4,4]).

One way to fix this problem would be to use the first larger power of two, but that can potentially waste space. So what I did here was in the recursive init phase add on additional nodes as necessary. The remaining code is fairly straightforward: if the interval contains only one key then initialize that leaf node (here you might use A[l] if you were initializing the tree with an initial array of values); otherwise, recursively initialize the left and the right halved subintervals and assign this node the maximum of those. Again, in this case it would be OK to just assign M[node] to be -1, but if we were using an initial value array, this is pretty much how you do it.

So, where's the bug? :) This code actually has undefined behavior and is fairly likely to fail spectacularly (with a segfault). If you don't want to try to find the bug, then read on for the answer. Before that, here's a hint - the bug is related to std::vector. Can you spot it now? :)

Here's the explanation. The bug is actually in the line that recursively calls init and assigns the max of the two calls to the tree node. As you might have guessed by now, there are two issues in interplay here. First of all, when you push_back to a std::vector, you will get a reallocation if the previous size of the vector equaled the capacity (i.e., all the preallocated memory was exhausted). When that happens, all pointers and references to the elements of the vector are invalidated, for obvious reasons (meaning you must not use them). This is why you should always be wary of taking pointers (or references) to vector elements, i.e. carefully consider if it is logically possible for the vector to grow which might cause it to get reallocated.

However, one might wonder, where are the pointers or references in this code? The answer is, fairly obviously, vector's operator[]. The thing is, the compiler is free to evaluate M[node] before, between or after both init calls are made, i.e. it is free to evaluate the left side of the assignment before the right. Since init uses push back on the same vector it's assigning to, it might happen that the reference returned by operator[] in M[node] gets invalidated and you assign to it, breaking the program.

The manifestation of this bug was actually quite entertaining. I got a segfault for many executions, but sometimes I got a std::bad_alloc (for minute trees of a dozen or so nodes :). Quite spectacularly, it actually caused my AV to flag the executable as a trojan on one run (though that is, I suspect, not really closely related to this particular bug).

It is fairly easy to fix this problem in this case... just use a temporary to store the max of the recursive calls and then assign it to the appropriate location in the vector. Another solution would be to determine how many nodes there will be in the constructor and avoid having to extend the vector. Both of these solutions are either uglier or more complex than this one, but they are correct while this one is not :). However, I think this bug is generally fairly subtle and I wouldn't be surprised if it "leaked out" (in the abstraction sense) of some libraries that use std::vector internally.

0 comments:

Post a Comment