vector
and deque
a lot. They are two of the three sequences defined in section 21.1.1 of the C++ standard (the third one being list
which I won't talk about in this post).The nice thing about dealing with these sequences is that they manage their memory automatically when you're using
push_back
and pop_back
and you don't really have to think about it. Naturally, I did a search to see if this has been beaten dead, and sure enough, I found a great article about it by Andrew Koenig and Barbara Moo of Accelerated C++: Practical Programming by Example (C++ In-Depth Series) fame (also see Koenig lookup).Instead of repeating everything that's in there, I'll just recap the main point. The C++ standard (in section 23.2.4.1) guarantees amortized constant time insert and erase at the end for vector. This basically means that you can grow a
vector
from empty to size O(n)
in O(n)
time using push_back
, although some individual push_back
calls might take linear time (the ones that cause reallocation, which consists of allocating a new block of memory, copying the old objects, destroying them and deallocating the old memory block). One subtle thing to note here is that reallocation (which should mostly be transparent and not something you worry about) invalidates pointers and iterators into the sequence.The
vector
class offers some facilities that let you take a more active role in this memory management process. These first function is void reserve(size_type n)
which makes the vector
allocate enough memory for at least n
items, effectively making all push_back
calls up to n
items take constant time. The other function is size_type capacity() const
that lets you know the currently allocated memory for the vector
.The standard doesn't define how this is to be implemented, but by reading around and looking and source code, it seems most implementations grow the vector exponentially, with factor between 1.5 and 2 (although there might be others). Fortunately, even if you can't access the source of your implementation, you can easily test it out with a program like this:
#include <cstdio>
#include <vector>
int main() {
std::vector<int> v;
std::vector<int>::size_type old_capacity = v.capacity();
for (int i=0; i<1000; ++i) {
v.push_back(i);
const std::vector<int>::size_type new_capacity = v.capacity();
if (old_capacity != new_capacity) {
std::printf("%4u %4u\n", (unsigned)v.size(), (unsigned)v.capacity());
old_capacity = new_capacity;
}
}
std::puts("growth done");
for (int i=0; i<1000; ++i) {
v.pop_back();
const std::vector<int>::size_type new_capacity = v.capacity();
if (old_capacity != new_capacity) {
std::printf("%4u %4u\n", (unsigned)v.size(), (unsigned)v.capacity());
old_capacity = new_capacity;
}
}
std::puts("shrink done");
return 0;
}
On my machine, this outputs
1 1
2 2
3 4
5 8
9 16
17 32
33 64
65 128
129 256
257 512
513 1024
growth done
shrink done
As you can see, the factor here is obviously two. Two other points are of interest. First, the
vector
starts out with no allocated memory. Secondly, the memory is not released when the size shrinks. This means that the memory will be allocated until the object is destroyed, unless you explicitly call clear
or resize
. Also, if you use reserve
with some amount just before the loop, you'll basically seed the sequence with something other then one. The point here is that you get a lot of small reallocations at the start, which does seem like a waste.The standard actually recommends using vector "by default" (section 23.1.1.2). Browsing around Herb Sutter's Guru of the Week I stumbled upon an entry where he recommends using
deque
as the default. So, for a bit of background, a deque
is a sequence that supports constant time insert and erase operations on both the front and the end (via push
/pop
front
/back
). It also supports random access iterators and random access via operator[]
and at()
, but random access is not constant time since a deque
is basically a linked list of fixed size memory chunks. This works quite well for memory management as the memory chunks are small and no reallocations are required when the sequence needs to grow. Still, the point Herb is making is kind of defeated by the timing data he includes in the post since vector
outperforms deque
in pretty much everything (I'd say, as expected). Still, deque
is a great sequence and is actually used by both stack
and queue
by default (for good reason) and is a great structure for implementing 0-1 BFS.Going back to
vector
, I'm wondering how people use reserve
in "real life". Often, I'll have a pretty good idea of how big a vector
will end up after a loop, and using reserve
in those cases seems like the right thing to do.
std::vector<int> v;
v.reserve(n);
for (int i=0; i<n; ++i) {
v.push_back(i);
}
Should be better then
std::vector<int> v(n);
for (int i=0; i<n; ++i) {
v[i] = i;
}
because the latter actually initializes n ints to zero in the constructor (and the benefit would increase when the items in the vector are objects with more state and possibly elaborate default constructors).
Now the strange bit. Even though I've been programming in C++ for about five years now and I feel I know it really well and consider it my "main language", I've never used it on a multi-person project. In fact, I've never done a project over a few thousand lines in it, although doing a quick grep on the code I have on my home machine shows about 95k lines of C++ code I wrote. Due to strange circumstance, I've only had a chance to work on smallish sized projects (~5k lines) in C#, JavaScript, Python, PHP and Java, and on a medium (~30k lines) project in Java. So, I'm asking you, C++ developers, what are your experiences with using
vector
and deque
in real projects? Is reserve
used and how do you estimate the right size to use? Thanks!
1 comments:
Very significant Information for us, I have think the representation of this Information is actually superb one. This is my first visit to your site. Top Restaurant Classifieds Ads Africa
Post a Comment