Computer Science and Programming Haven: Using vector and deque "the right way"

If you're using the STL, chances are you're using vector and deque a lot. They are two of the three sequences defined in section 21.1.1 of the C++ standard (the third one being list which I won't talk about in this post).

The nice thing about dealing with these sequences is that they manage their memory automatically when you're using push_back and pop_back and you don't really have to think about it. Naturally, I did a search to see if this has been beaten dead, and sure enough, I found a great article about it by Andrew Koenig and Barbara Moo of Accelerated C++: Practical Programming by Example (C++ In-Depth Series) fame (also see Koenig lookup).

Instead of repeating everything that's in there, I'll just recap the main point. The C++ standard (in section 23.2.4.1) guarantees amortized constant time insert and erase at the end for vector. This basically means that you can grow a vector from empty to size O(n) in O(n) time using push_back, although some individual push_back calls might take linear time (the ones that cause reallocation, which consists of allocating a new block of memory, copying the old objects, destroying them and deallocating the old memory block). One subtle thing to note here is that reallocation (which should mostly be transparent and not something you worry about) invalidates pointers and iterators into the sequence.

The vector class offers some facilities that let you take a more active role in this memory management process. These first function is void reserve(size_type n) which makes the vector allocate enough memory for at least n items, effectively making all push_back calls up to n items take constant time. The other function is size_type capacity() const that lets you know the currently allocated memory for the vector.

The standard doesn't define how this is to be implemented, but by reading around and looking and source code, it seems most implementations grow the vector exponentially, with factor between 1.5 and 2 (although there might be others). Fortunately, even if you can't access the source of your implementation, you can easily test it out with a program like this:


#include <cstdio>
#include <vector>

int main() {
    std::vector<int> v;

    std::vector<int>::size_type old_capacity = v.capacity();
    for (int i=0; i<1000; ++i) {
        v.push_back(i);
        const std::vector<int>::size_type new_capacity = v.capacity();
        if (old_capacity != new_capacity) {
            std::printf("%4u %4u\n", (unsigned)v.size(), (unsigned)v.capacity());
            old_capacity = new_capacity;
        }
    }
    std::puts("growth done");
    for (int i=0; i<1000; ++i) {
        v.pop_back();
        const std::vector<int>::size_type new_capacity = v.capacity();
        if (old_capacity != new_capacity) {
            std::printf("%4u %4u\n", (unsigned)v.size(), (unsigned)v.capacity());
            old_capacity = new_capacity;
        }
    }
    std::puts("shrink done");

    return 0;
}

On my machine, this outputs


   1    1
   2    2
   3    4
   5    8
   9   16
  17   32
  33   64
  65  128
 129  256
 257  512
 513 1024
growth done
shrink done

As you can see, the factor here is obviously two. Two other points are of interest. First, the vector starts out with no allocated memory. Secondly, the memory is not released when the size shrinks. This means that the memory will be allocated until the object is destroyed, unless you explicitly call clear or resize. Also, if you use reserve with some amount just before the loop, you'll basically seed the sequence with something other then one. The point here is that you get a lot of small reallocations at the start, which does seem like a waste.

The standard actually recommends using vector "by default" (section 23.1.1.2). Browsing around Herb Sutter's Guru of the Week I stumbled upon an entry where he recommends using deque as the default. So, for a bit of background, a deque is a sequence that supports constant time insert and erase operations on both the front and the end (via push/pop front/back). It also supports random access iterators and random access via operator[] and at(), but random access is not constant time since a deque is basically a linked list of fixed size memory chunks. This works quite well for memory management as the memory chunks are small and no reallocations are required when the sequence needs to grow. Still, the point Herb is making is kind of defeated by the timing data he includes in the post since vector outperforms deque in pretty much everything (I'd say, as expected). Still, deque is a great sequence and is actually used by both stack and queue by default (for good reason) and is a great structure for implementing 0-1 BFS.

Going back to vector, I'm wondering how people use reserve in "real life". Often, I'll have a pretty good idea of how big a vector will end up after a loop, and using reserve in those cases seems like the right thing to do.


std::vector<int> v;
v.reserve(n);
for (int i=0; i<n; ++i) {
  v.push_back(i);
}

Should be better then


std::vector<int> v(n);
for (int i=0; i<n; ++i) {
  v[i] = i;
}

because the latter actually initializes n ints to zero in the constructor (and the benefit would increase when the items in the vector are objects with more state and possibly elaborate default constructors).

Now the strange bit. Even though I've been programming in C++ for about five years now and I feel I know it really well and consider it my "main language", I've never used it on a multi-person project. In fact, I've never done a project over a few thousand lines in it, although doing a quick grep on the code I have on my home machine shows about 95k lines of C++ code I wrote. Due to strange circumstance, I've only had a chance to work on smallish sized projects (~5k lines) in C#, JavaScript, Python, PHP and Java, and on a medium (~30k lines) project in Java. So, I'm asking you, C++ developers, what are your experiences with using vector and deque in real projects? Is reserve used and how do you estimate the right size to use? Thanks!

Computer Science and Programming Haven

Friday, April 17, 2009

Using vector and deque "the right way"

1 comments:

Post a Comment

Labels

Blog Archive

About Me