Wednesday, April 22, 2009

Book Review: M. Garey, D. Johnson: Computers and Intractability: A Guide to the Theory of NP-Completeness

Today I'll review the wonderful Computers and Intractability: A Guide to the Theory of NP-Completeness. I first heard about this book three or four years ago while watching Steven Skiena's online video lectures (which I blogged about recently), where he recommends it as the best book on the topic (and actually uses it in class). I've since read it several times and used it as a reference many more, and before going into the details, I'll just say I think it's awesome.

And now for the details :). This is not your standard "read before sleep" book. It's more like a math book (and math books are read with pen and paper in hand :). When I first tried reading it (back when I had just a basic grasp of the topic), I found it pretty hard going, and had to restart a few times. This is not to say it's not well written; on the contrary, the writing is great. It just took me a while to get into the right mindset for it. If you're a fan of formalisms and Turing machines, you'll have no problems whatsoever. If you're not, but still want to understand this stuff, you'll have to make more of an effort, but I think it's well worth it. The book doesn't force formalisms down your throat for no reason; it develops the theory very elegantly, moving from the formal to the more informal and intuitive (but still correct).

The book is logically divided into three parts. In the first part, you'll learn why complexity matters and some basic terminology regarding Turing machines, languages, encoding schemes, decision problems and "real problems" (and how they all relate to each other), as well as the all-important polynomial transformation between problems. Then you'll read about what it means for a problem to be in P and NP. After this introduction, the authors present Cook's theorem, which is basically the origin of the whole story: in it, the famous satisfiability problem is defined and proved to be NP-complete. What follows are proofs (via polynomial transformation) that the "6 basic" NP-complete problems are indeed NP-complete. These are 3-satisfiability, 3-dimensional matching, vertex cover, clique, Hamiltonian circuit and partition. These transformations are a bit more involved than most (since you don't have many problems to choose from when you start building up the NP-complete set), as is the proof of Cook's theorem, but I find them mathematically beautiful (and in general, constructing NP-completeness proofs is a nice mind "game", not to mention useful in many cases :). However, on your first reading, you'll probably want to skim these, get the intuition behind them and understand "the big picture".

After that, the authors take a step back and try to define three general techniques for constructing NP-completeness proofs (restriction, local replacement and component design) and show a few examples of each. You are then presented with fourteen exercise problems on which to try the techniques out. This ends the first part of the book (about 65 pages).

The second part of the book starts by exploring how the NP-completeness of a problem changes when you constrain or modify it. Then it explains what it means for a problem to be NP-complete in the strong sense, and what Turing reducibility and NP-hardness are. The next chapter gives some basic information about dealing with NP-complete problems through approximation algorithms, but, more interestingly in the context of this book, it shows how the theory of NP-completeness can be used to prove that some problems can't even be approximated well (for certain definitions of "a good approximation" which I won't go into here) in polynomial time (unless P = NP). The second part ends by expanding the complexity hierarchy with some more advanced concepts like co-NP, gamma-reductions and gamma-completeness, #P-completeness, PSPACE, DLOGSPACE, EXPTIME etc. (these are all covered with examples and provide solid intuition on a first reading, but need to be studied a bit more carefully for full understanding).

Finally, the third part of the book is a catalog of about 300 NP-complete problems (sorted into several categories). This is a great reference! If you have an NP-complete problem on your hands, chances are it has a name and it's on this list :) Given the age of this book, I'd imagine the current list of known NP-complete problems is somewhat bigger, but this one is comprehensive enough that you can probably at least find a problem that is close enough to your problem that you'll be able to construct a proof yourself.

So to summarize, this is an awesome book. NP-complete problems pop up just about everywhere, and it's important to be able to recognize them (and not waste too much time trying to solve them to optimality quickly). My advice for a reading algorithm: read the first part, skimming over the proofs for intuition, then reread it a month later going through the proofs with pen and paper. Then read the second part (it's not as important so you can skim it or read it for full comprehension, depending on how much you want to get out of it) and skim the catalog to get a feel for which problems are there (so you can use it as a reference). You'll recognize quite a few of them, probably some you've met at work :). Also, even if you're not in Computer Science, you might like this book from an epistemological perspective ;)

Sunday, April 19, 2009

Book Review: Andrew Tanenbaum: Computer Networks (4th edition)

Today I'm going to review the classic Computer Networks (4th Edition) by Andrew Tanenbaum.

I first read a significant portion of this book a few years ago when I took a class in computer networks at the University (a 4th year graduate course, considered by most to be one of the hardest in the program) that was mostly based on it. Since I'm a big fan of reading books cover to cover (which is, BTW, the ideal way to read this particular book), I took some time over the last few weeks and did just that. Having read most of it before, and dealing with networks quite a lot in my work and spare time, I read it mostly as a review.

As the title suggests, this is a very general book, and as such holds real value. After all, in today's world, who can ignore networking? If anything, networks will become more and more important, and understanding them is already pretty much mandatory for anyone in ICT. This book will give you a great framework for dealing with all sorts of networks you might come across in the wild. For the most part, it explains how networks are built in layers, with each layer offering services to the layer above it and consuming the services of the layer below it. Two major reference models are covered: the OSI/ISO one and the TCP/IP one. You will get a solid understanding of just how much politics influences standards bodies, and a better idea of why some things are not as good as they could be :).

For the most part, the book then goes through the layers of what I've heard called the "Tanenbaum hybrid model", which is basically what is used in practice (in the Internet, for example). You will learn some basic things about the physical layer: the kinds of cables in use and their characteristics, fiber, radio, microwave and satellite transmission, and how bits are encoded as voltages. After the physical layer, there's a high-level explanation of how the telephone system works. Then he goes on a detailed tour of the data link layer (which basically provides a direct, "wire-like" connection between two machines), talking about error-correcting and error-detecting codes (Hamming codes, CRC), framing and flow control. Here he also discusses Bluetooth, Ethernet (in some detail), WLAN etc. In covering the network layer, the main focus is on routing, congestion control and internetworking. There's also a relatively detailed discussion of the network layer in the Internet, where IPv4 and IPv6 are covered (among other things). Then he goes on to the transport layer, which mainly deals with establishing connections, retransmissions and congestion control. Here he talks a bit about UDP and TCP, the "main" transport layer protocols in the Internet. Finally, there's a chapter about the application layer that features pretty much everything you've heard of, from DNS and e-mail to the World Wide Web and its protocols, to multimedia.

The book finishes with about 200 pages on network security, which covers your basic cryptography and different protocols in the different layers that implement various measures. There is also a great section on further reading per topic.

One final note. Although this is the 4th edition of the book, it was released in 2002, which is (in computer terms) ancient. To put things in perspective, Wikipedia was started in 2001, YouTube in 2005, social networks didn't really exist in 2002, etc. Still, the things covered in this book have not changed that much, so this should not worry you too much.

So to summarize, this is an awesome book that I think everyone in ICT should read. It's OK as a reference, although you'll probably want to consult specialized literature for specific topics (like the relevant RFCs for Internet related stuff). It's pretty expensive, so you might want to get a used copy, but it's worth owning!

Friday, April 17, 2009

Using vector and deque "the right way"

If you're using the STL, chances are you're using vector and deque a lot. They are two of the three sequences defined in section 23.1.1 of the C++ standard (the third one being list, which I won't talk about in this post).

The nice thing about dealing with these sequences is that they manage their memory automatically when you're using push_back and pop_back, so you don't really have to think about it. Naturally, I did a search to see if this topic has been beaten to death already, and sure enough, I found a great article about it by Andrew Koenig and Barbara Moo of Accelerated C++: Practical Programming by Example (C++ In-Depth Series) fame (also see Koenig lookup).

Instead of repeating everything that's in there, I'll just recap the main point. The C++ standard (in section 23.2.4.1) guarantees amortized constant time insert and erase at the end for vector. This basically means that you can grow a vector from empty to size n in O(n) total time using push_back, although some individual push_back calls might take linear time (the ones that cause reallocation, which consists of allocating a new block of memory, copying the old objects over, destroying them and deallocating the old memory block). One subtle thing to note here is that reallocation (which should mostly be transparent and not something you worry about) invalidates pointers and iterators into the sequence.
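
To make the invalidation point concrete, here's a minimal sketch (the names are just for illustration) of the kind of code that reallocation bites; the same caveat applies to iterators obtained before the growth:

#include <vector>

int main() {
    std::vector<int> v;
    v.push_back(42);
    int *p = &v[0];        // pointer into the vector's current storage
    for (int i = 0; i < 1000; ++i) {
        v.push_back(i);    // any of these calls may reallocate and move the elements
    }
    // Using *p here would be undefined behavior if a reallocation happened;
    // re-fetch the address (or just use the index) after the vector has grown.
    return v[0] == 42 ? 0 : 1;
}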

The vector class offers some facilities that let you take a more active role in this memory management process. The first is void reserve(size_type n), which makes the vector allocate enough memory for at least n items, effectively making all push_back calls up to n items take constant time (and guaranteeing no reallocation, and hence no invalidation, until the size exceeds n). The other is size_type capacity() const, which tells you how many items the vector can hold in its currently allocated storage.

The standard doesn't define how this is to be implemented, but by reading around and looking at source code, it seems most implementations grow the vector exponentially, with a factor between 1.5 and 2 (although there might be others). Fortunately, even if you can't access the source of your implementation, you can easily test it with a program like this:

#include <cstdio>
#include <vector>

int main() {
    std::vector<int> v;

    std::vector<int>::size_type old_capacity = v.capacity();
    for (int i=0; i<1000; ++i) {
        v.push_back(i);
        const std::vector<int>::size_type new_capacity = v.capacity();
        if (old_capacity != new_capacity) {
            std::printf("%4u %4u\n", (unsigned)v.size(), (unsigned)v.capacity());
            old_capacity = new_capacity;
        }
    }
    std::puts("growth done");
    for (int i=0; i<1000; ++i) {
        v.pop_back();
        const std::vector<int>::size_type new_capacity = v.capacity();
        if (old_capacity != new_capacity) {
            std::printf("%4u %4u\n", (unsigned)v.size(), (unsigned)v.capacity());
            old_capacity = new_capacity;
        }
    }
    std::puts("shrink done");

    return 0;
}


On my machine, this outputs

1 1
2 2
3 4
5 8
9 16
17 32
33 64
65 128
129 256
257 512
513 1024
growth done
shrink done


As you can see, the factor here is obviously two. Two other points are of interest. First, the vector starts out with no allocated memory. Second, the memory is not released when the size shrinks; neither clear nor resize is guaranteed to reduce the capacity either, so the memory stays allocated until the object is destroyed (or until you use the swap trick shown below). Also, if you call reserve with some amount just before the loop, you'll basically seed the capacity sequence with something other than one, avoiding the many small reallocations at the start (which do seem like a waste).
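
If you do want to hand the excess memory back, the usual idiom is the "swap trick": copy-construct a temporary from the vector (the copy only allocates roughly what it needs) and swap the two, so the temporary walks away with the oversized buffer and frees it when it's destroyed. A minimal sketch; note that the standard doesn't promise the resulting capacity equals the size, it just tends to work out that way in practice:

#include <cstdio>
#include <vector>

int main() {
    std::vector<int> v;
    for (int i=0; i<1000; ++i) {
        v.push_back(i);
    }
    v.resize(10);                  // size drops to 10, capacity stays large
    std::vector<int>(v).swap(v);   // the temporary copy is "right-sized";
                                   // swapping hands its storage to v, and the
                                   // old large buffer dies with the temporary
    std::printf("%u %u\n", (unsigned)v.size(), (unsigned)v.capacity());
    return 0;
}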

The standard actually recommends using vector "by default" (section 23.1.1, paragraph 2). Browsing around Herb Sutter's Guru of the Week, I stumbled upon an entry where he recommends using deque as the default. So, for a bit of background: a deque is a sequence that supports constant time insert and erase operations at both the front and the end (via push/pop front/back). It also supports random access iterators and random access via operator[] and at(); random access is still constant time, but with a larger constant factor than vector's, since a deque is typically implemented as a collection of fixed-size memory chunks tracked by an index structure. This works quite well for memory management, as the chunks are small and existing elements never need to be copied when the sequence grows. Still, the point Herb is making is kind of defeated by the timing data he includes in the post, since vector outperforms deque in pretty much everything (as expected, I'd say). Nevertheless, deque is a great sequence: it's what both stack and queue use by default (for good reason), and it's a great structure for implementing 0-1 BFS (see the sketch below).
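
Since I mentioned 0-1 BFS: the idea is that when all edge weights are 0 or 1, a deque can play the role of Dijkstra's priority queue, with vertices reached over 0-edges pushed to the front and those reached over 1-edges pushed to the back, so vertices come out in nondecreasing distance order. A rough sketch, where the function name and the adjacency-list representation are my own for the example:

#include <climits>
#include <deque>
#include <utility>
#include <vector>

// adj[u] holds (neighbor, weight) pairs, where every weight is 0 or 1.
std::vector<int> zero_one_bfs(const std::vector<std::vector<std::pair<int, int> > > &adj, int source) {
    std::vector<int> dist(adj.size(), INT_MAX);
    std::deque<int> dq;
    dist[source] = 0;
    dq.push_back(source);
    while (!dq.empty()) {
        int u = dq.front();
        dq.pop_front();
        for (std::vector<std::pair<int, int> >::size_type i = 0; i < adj[u].size(); ++i) {
            int v = adj[u][i].first;
            int w = adj[u][i].second;
            if (dist[u] + w < dist[v]) {
                dist[v] = dist[u] + w;
                // A 0-edge keeps the vertex at the same distance "level", so it
                // goes to the front; a 1-edge goes to the back.
                if (w == 0) dq.push_front(v);
                else dq.push_back(v);
            }
        }
    }
    return dist;
}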

Going back to vector, I'm wondering how people use reserve in "real life". Often, I'll have a pretty good idea of how big a vector will end up after a loop, and using reserve in those cases seems like the right thing to do.

std::vector<int> v;
v.reserve(n);
for (int i=0; i<n; ++i) {
    v.push_back(i);
}

Should be better than

std::vector<int> v(n);
for (int i=0; i<n; ++i) {
    v[i] = i;
}

because the latter actually initializes n ints to zero in the constructor (and the benefit would increase when the items in the vector are objects with more state and possibly elaborate default constructors).

Now the strange bit. Even though I've been programming in C++ for about five years now, feel I know it really well and consider it my "main language", I've never used it on a multi-person project. In fact, I've never done a project over a few thousand lines in it, although a quick grep over the code on my home machine shows about 95k lines of C++ I've written. Due to strange circumstances, I've only had a chance to work on smallish projects (~5k lines) in C#, JavaScript, Python, PHP and Java, and on a medium-sized (~30k lines) project in Java. So I'm asking you, C++ developers: what are your experiences with using vector and deque in real projects? Is reserve used, and how do you estimate the right size to use? Thanks!

Wednesday, April 15, 2009

Book Review: Matt Weisfeld: The Object-Oriented Thought Process

Today I'll write a quick review of the ambitiously titled The Object-Oriented Thought Process (3rd Edition) (Developer's Library).

Just by browsing through the table of contents (which is always a nice way to start thinking about buying or reading a book), you'll see that this book attempts to cover a lot of ground. In fact, on the second edition cover, the subtitle said "An introduction to object-oriented concepts for programmers looking to master modern application development tools, including Java and .NET" (my emphasis), but thankfully, they decided to remove that :). In reality, this book barely scratches the surface of many facets of "modern application development" and lacks the depth that could lead you to mastering anything. What you get is basically a lesson in object-oriented terminology. You'll hear about classes, objects, encapsulation, inheritance, composition and all the usual suspects, and then see a few (standard and very artificial) examples modeled with CRC cards and UML.

Unfortunately, the book then gets sidetracked into relational databases, XML, HTML, CSS, JavaScript, JavaBeans, CORBA and Web Services (you get about two to three pages about each). Finally, there's a twenty page chapter on design patterns that tells you what they are and lists a few. You can skip this whole part.

To its credit, the book is short, well written and (for the most part) correct. If you've just heard of object-oriented programming, this book will give you a nice overview of what it's about, and possibly keep you from forming some misconceptions. On the other hand, if you're already familiar with OOP, you won't really benefit much. I also think this might be a good book for people who aren't really planning on being developers but will be talking to developers in their work (like managers and marketers), as it is lightweight enough and doesn't really require previous knowledge (there's some Java and C# code in there, but it's very basic and not very important either). Finally, since this book doesn't hold any value as a reference and reading it more than once doesn't make much sense, if you want to read it, I recommend you borrow it from a library and save some money. :)

Saturday, April 11, 2009

Book Review: M. Feathers: Working Effectively With Legacy Code

First of all, happy Easter!

For my first book review, I decided to write about a wonderful book I discovered about a year ago that I think not too many people know about, and that is Working Effectively with Legacy Code (Robert C. Martin Series).

I got this book in my first week as a Google intern. Since I had some spare time in the evenings and on the weekends, I read it right away and found it absolutely amazing. It basically talks about how to deal with the reality of changing requirements. In the early days of OOP, there was a tendency to over-engineer a project (trying to predict every possible change ahead of time, overusing inheritance etc.). People soon realized that, while proper design is essential, you just can't predict every possible change, and sometimes you'll have to change your design to accommodate the new requirements. This is where the idea of refactoring comes in: a process in which you change your code while preserving behavior, making it "nicer", in an attempt to make the behavior easier to change in a later step.

The main problem here is preserving behavior while changing things. This can lead to all sorts of headaches and problems, but it can be made a lot easier if your code has tests. This is also the main premise of this book; the author loosely defines legacy code as code without tests (and also code that's older than three months or so, which is probably a lot shorter than most people would think). There was some controversy over tests a few months ago in the "blogosphere", mainly in posts and podcasts by Joel Spolsky and Jeff Atwood on one side, and Robert Martin on the other, all of whom are well respected (for good reason) in the community. Basically, Joel and Jeff argue that Test Driven Development goes too far, and that tests can make your code harder to change (because you have to change your tests as well). While this might be true sometimes, I don't think it's a valid point if you design your tests right. You should be testing your APIs and not your implementations (which is certainly not always possible, but should be a goal). Sure, writing a piece of code (a class, for example) will take a bit longer initially because you'll need to write tests for it, but you get several benefits from that. First, you write code that uses your class, thereby exposing possible weak spots in your API. Second, once you're done, you can be reasonably sure that your class does what you want. Also, when it comes time to change the implementation (or the API, which is not too big a deal if it's a small internal API - and someone else might be making the change, or it might be a year since you wrote the initial code), you'll have a safety net, since you'll be able to tell early if you broke something (there might of course be things you miss, but a lot fewer of them). On the other hand, you just can't unit test everything, but that's OK too. You can do some tests manually, or use specialized tools to automate that as much as possible. I feel that unit tests also provide great documentation of what an API does and how to use it (still, you should write more formal docs for external APIs :).

One area where I think unit tests are especially important is scripting languages. Many people used to think (or still do) that programs in scripting languages tend to be harder to change or more prone to "silly" bugs caused by typos that no compiler gets to catch. While this can be somewhat mitigated by using a good IDE, having solid unit tests will pretty much eliminate these kinds of bugs, and you'll be dealing with much the same bugs you were dealing with in compiled languages.

So back to the book... Through a series of realistic examples in Java, C++ and C (although most of the techniques presented apply to most languages), this book will show you a wealth of options for creating this testing safety net where there isn't one. It starts off by talking about breaking dependencies as the crucial step in getting (parts of) legacy code under test. BTW, there is a lot of great material on dependencies (and dependency injection), and on how testing can make your code less coupled in the first place, on the Google Testing Blog. It's well worth reading, and the YouTube videos are worth watching.

Then it introduces fakes and mocks, which let you replace complex objects and interactions in your tests with simple ones, making tests faster (connecting to a fake in-process database instead of a real one over the network, for example), more focused and less flaky (they won't fail if the network fails or the database dies). Feathers then talks about seams, which are places where you can change the behavior of some code without editing it in that place. He introduces a lot of techniques, ranging from the fairly obscure (like preprocessing and link seams) to the more common object seams.
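
To make the idea of an object seam and a fake concrete, here's a tiny sketch (not from the book; all the names are made up for illustration): the production code depends only on an interface, so a test can slide a fake in at that seam without editing the production code. A mock would go one step further and also assert on how it was called.

#include <iostream>
#include <string>

// The interface is the seam: production code only sees MailSender.
class MailSender {
public:
    virtual ~MailSender() {}
    virtual void send(const std::string &to, const std::string &body) = 0;
};

// A fake that records calls instead of actually sending mail.
class FakeMailSender : public MailSender {
public:
    int sent_count;
    FakeMailSender() : sent_count(0) {}
    virtual void send(const std::string &, const std::string &) {
        ++sent_count;
    }
};

// The code under test; it has no idea whether the sender is real or fake.
void notify_user(MailSender &sender, const std::string &user) {
    sender.send(user, "hello");
}

int main() {
    FakeMailSender fake;
    notify_user(fake, "someone@example.com");
    std::cout << (fake.sent_count == 1 ? "PASS" : "FAIL") << std::endl;
    return 0;
}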

In the next third of the book, he discusses how to deal with things like big classes that do a ton of things, how to go about understanding how objects interact in a complicated system and things like that. This part is mostly about creating the right mindset for dealing with legacy code, although most of the techniques presented are quite practical.

Finally, the third part of the book contains the dependency breaking techniques the first two parts have been building up to in a really nice catalog that can be a great reference. You should read this book once from cover to cover to get it, and then skim it a few times in the following year to index the techniques better into your brain (and of course use the book as a reference).

In conclusion, I think this book is definitely worth reading (if not owning). Even if you don't deal with legacy code on a day to day basis (which is probably rare), it will change the way you look at the new code you're writing, and hopefully convince you that you should be writing unit tests. Try it! :) (if you can get someone to review your code as well, you'll sleep even better, but that's another topic :).

Friday, April 10, 2009

Standard C function for reading arbitrarily long lines from files

This is a function I wrote back when I was learning C (and understanding why standards matter, thanks mostly to the wonderful Usenet group comp.lang.c). I translated the variable names and comments from Croatian to English and thought I'd post it here in case someone finds it useful. What it does, basically, is read a line from a file one character at a time, allocating more memory as needed. It's written in standards-compliant C and packed in a small test driver program. I'm not including a header file for it since it's only one function, but you can write one easily yourself.

You can compile it with something like

gcc read_line.c -W -Wall -ansi -pedantic

and should get no warnings. If you want to test it, you should probably reduce ALLOC_INCREMENT to something like 4 or so.

Most of the code is self-explanatory and has detailed comments. There are a few design decisions I'd like to talk about. First off, the function returns the number of characters read as an int, which is not really in line with the standard library, which uses size_t for buffer sizes. The simple reason is that I personally don't like working with unsigned values, except when doing modulo 2^n arithmetic or bitmask handling. It's also a nice way to report an error by returning -1 (although you could return (size_t)-1, but that is less nice). Also, in the contexts where I use this function, int is more than enough to represent all the possible line sizes.

Second, the final amount of allocated memory might be higher than the number of bytes read. You can solve this easily by creating a wrapper function, similar to this one:

#include <stdio.h>
#include <stdlib.h>

/* read_line is the function from the linked file: it reads a line into *s
   and returns its length, or -1 on error. */
int read_line(char **s, FILE *fp);

int read_line_with_realloc(char **s, FILE *fp) {
    char *p = NULL;
    int len = read_line(s, fp);
    if (len <= 0) {
        return len; /* nothing was read (or an error occurred): nothing to trim */
    } else {
        p = realloc(*s, len + 1); /* room for '\0' */
        if (p == NULL) {
            free(*s);
            return -1;
        }
        *s = p;
        return len;
    }
}


Finally, there's no way to reuse the memory already allocated in a buffer, that is, it is assumed that the passed buffer is initially empty. This is basically to keep the interface clean, but can also be easily fixed with a wrapper.

You can get the code here.

I've used this function many times, but as always (even with trivial code), there might be some errors in there. If you find any, please let me know.

Thursday, April 9, 2009

Computer Science video lectures worth watching

In this post I'm going to list a few great courses from the world's top CS Universities that have video lectures of the whole course.

There are three excellent algorithm courses of this kind I'm aware of. First, there's the MIT Introduction to Algorithms course, taught by Charles Leiserson and Erik Demaine. Both of these professors are great lecturers, with Erik Demaine being one of the best I've seen (it's really unfortunate that his other courses, like Advanced Data Structures, don't have video lectures). The course is based on the Introduction to Algorithms book, commonly known as CLRS (the initials of the authors). The L in CLRS is Charles Leiserson. As an aside, the R is Ron Rivest, who is also the R in RSA. I'll probably do a review of this book some time, but it is considered by many to be the best introductory algorithms book out there (and in this sense, introductory basically means general algorithm design and analysis). It is mathematically pretty rigorous, very well written with lots of great exercises and problems, and very comprehensive (for example, it describes pretty much all the "textbook algorithms" you'll ever need at TopCoder). The course doesn't cover the whole book, but gives a great introduction to asymptotic analysis, sorting, randomized algorithms, hashing, self-balancing tree structures, greedy algorithms, dynamic programming and shortest path algorithms. There's also a nice introduction to amortized analysis (which is sort of an advanced topic). The course ends with a series of lectures about "hot topics" in algorithms today, namely caching and parallelism. Only the 2005 course has video lectures, which is unfortunate since they change the topics of these last lectures every time (you can still find the lecture notes, though). The recitations, which cover some interesting material, are also not available on video (they used to be, but the files were removed a while ago).

As motivation for the course, Prof. Leiserson mentions that it is believed you can become a great programmer by programming for 10 years, or by programming for 2 years and taking an algorithms course. While this comment is a half-joke, there's no question that understanding algorithms makes you a better programmer. As for the 10 years of programming, check out the classic post by Peter Norvig called Teach Yourself Programming in Ten Years (if you've managed to miss it until now :). The number is pretty arbitrary, but it's probably not off by much.

The second, lesser-known course on algorithms is Data Structures and Algorithms from IIT Delhi (the link is to the first lecture). The lectures are in clear English (except for the occasional burst of the local language :) and also follow CLRS. The course consists of 36 lectures and covers things from basic data structures like stacks and queues to hashing, sorting, self-balancing tree structures and graph algorithms. The lectures are for the most part pretty independent, and they cover some things that are not covered in the MIT course or in CLRS (for example AVL trees, which the book only covers through exercises).

The third algorithms course is the one by Steven Skiena from Stony Brook. This one is quite different from the previous two in the approach it takes (although the recommended course textbook is also CLRS). It is far less mathematically rigorous and doesn't try so hard to prove everything. This might sound like a bad thing, but the emphasis of the course is on understanding how algorithms work and on general techniques for algorithm design. This is in line with Skiena's book The Algorithm Design Manual. It's not really an introductory book, as it glosses over a lot of details you might want to know, but if you're already familiar with algorithms it's a great read. It also contains a great algorithm catalog with enough information to recognize the problem you're facing and get an idea of what to do (it used to be available online, but I can't find it right now). The course ends with a great introduction to complexity theory and to identifying and tackling intractable problems, which is Skiena's primary research area.

Finally, I'll mention one non-algorithms course, namely CS 61C Machine Structures from UC Berkeley. They post video lectures for this course every semester, but the one I'm linking to is taught by Dan Garcia, who is the best lecturer I've ever seen. Apart from that, the course is so fundamental to Computer Science that I think it's the one course everyone should take. In my University program, we had about three or four courses that cover the individual parts of this course in greater depth, but I feel adding this kind of all-in-one course would actually be great. This is an awesome course for learning about some of the key abstractions in computers. It starts with an introduction to C and basic programming, then moves on to MIPS and some low-level programming. Then there's a basic lecture about compiling, where people get a feeling for how the things they write in C get converted to MIPS. After that there's a short and elementary lecture on basic electronics (basically how you move from the abstraction level of electrons and protons to the level of transistors and electronic circuits) and several lectures on digital logic elements. Here they explain how you make things like adders and registers from gates. Then they use these elements to create a functional, pipelined CPU, explain caching and virtual memory, and connect the hardware with the MIPS code through the powerful idea of the ISA (Instruction Set Architecture). After listening to this course, it's really easy to understand how computers work "from the ground up" at an elementary level. There are a few lectures in the course that are sort of out of place, like the ones on networking (which isn't very good either) and optimization, but you can skip those.

If you know a good course that I missed, let me know! :)

Tuesday, April 7, 2009

Vim C++ macro-like expansion script

Hi again!

Today I'll release a Vim script I wrote a few weeks ago (and had been planning to write for a long time). What it does is enable fast typing of idiomatic things like for loops, STL container declarations etc. The philosophy behind using a script like this is in line with the general philosophy of Vim, which is to make editing as fast as possible (fewest keystrokes to achieve a result). Also, writing idiomatic code is known to be error-prone (ever typed ++i in an inner loop instead of ++j?) and is definitely tedious.

Another reason for writing this script is TopCoder competitions. Unlike most algorithm competitions, at TopCoder speed is a big factor. You only have 75 minutes to solve 3 problems (of increasing difficulty), and your submissions are timed from the moment you open the problem statement. Also, there is no partial credit for problems. After the coding phase, there's a short intermission and then a 15 minute challenge phase where other coders get a chance to view your source code and construct legal inputs that will break it (thus getting additional points for themselves and bringing your point total for that problem to 0). Finally, if your code survives the challenge phase, it has to pass all the automated system tests (some of which are "random", and some of which are specifically constructed to trip up wrong approaches). Some people at TopCoder use macros to achieve pretty much the same effect as this script. The main problem I have with that option is that it makes the code look ugly (sort of obfuscated). This makes it unnecessarily hard for other people to challenge (although that's probably not the main intention) and, more importantly, it's bad style. Of course, using this script (or the macro method) probably won't increase your TopCoder rating; TopCoder algorithm competitions are primarily about quickly inventing (or applying) the correct algorithm, and then implementing it correctly and with reasonable efficiency. Still, being able to type code faster is always a good thing :).

So let me finally give you a short tour of what exactly the script does (for the full list, see the comment at the top of the file). The current functionality can be separated into two groups: loop/block expansions, and container and type expansions.

Very often, you'll want to loop through an STL container, or repeat something n times. To do this, you'll write something like this:

for (int i=0; i<(int)my_vec.size(); ++i) {
    // per-item code goes here
}

Note that in "real" code you might use size_t or, better yet, vector<int>::size_type. Also notice the int cast on the container size. Comparing a signed to an unsigned value is not entirely safe in C++ (The int gets converted to an unsigned (assuming vector<int>::size_type is size_t and size_t is a typedef for unsigned, as is common on PCs) in what is called the usual arithmetic conversions. This can cause problems if the value of the int is negative, in which case, on a 2s complement machine, becomes a positive number and you get the "wrong answer".). Still, in idiomatic code like this, using a cast is an OK option IMO.

Anyway, to write the above loop with this script, you'd write:

@f i #my_vec

and press <F2> (I'll talk about the keyboard mappings in a sec). Vim will then expand the line to the above code (without the // per-item code goes here comment, of course :) and (if you were in insert mode) position the cursor at the "right" place (one indent level inside the block) so you can continue hacking.

If you want to loop up to a value (or variable), and not through a container, you'd write something like this:

@f i n

and get

for (int i=0; i<n; ++i) {

}

You probably guessed it, but the general syntax is:

@f index_var_name upper_limit

Where the upper limit can be preceded by a # to produce the size of a container.

If you want the loop condition to be less than or equal, prefix the upper limit with an equals sign like:

@f index_var_name =upper_limit


There's also a version that lets you specify a non-zero lower limit

@f index_var_name lower_limit upper_limit


and versions for looping downward

@fd index_var_name [lower_limit] upper_limit

(the square brackets indicate that the lower_limit is optional).

Finally, there are similar expressions for while and do-while loops, as well as for if branches:


@w expression
@d expression
@i expression


There is also support for for loops that iterate over a container with iterators. The syntax for this is (I'll explain what <container_expression> and <expanded_container_expression> are in the next paragraph):

@iter iter_name <container_expression> container_name

which generates

for (<expanded_container_expression>::iterator iter_name=container_name.begin(); iter_name!=container_name.end(); ++iter_name) {

}

There's also a @iterc version that generates a ::const_iterator iteration.

The second part of the functionality is container and type expansions. Basically, for:

@vi

(and <F2>) you get:

vector<int>

This works in all parts of the line (except inside double quotes and comments), unlike the loop/block expansions, which only work at the start of lines (ignoring leading whitespace). I won't list all the supported expansions, but all the vector, set, map and pair combinations that I tend to use frequently are supported. A few examples:

@vvi
@sd
@msi
@pllll

expands to:

vector< vector<int> >
set<double>
map<string, int>
pair<long long, long long>

Keyboard mappings are defined at the end of the script (so you can easily change them if you like).

nmap <F2> :call BudaCppExpand()<Enter>
imap <F2> <Esc>:call BudaCppExpand()<Enter>A


I have some further ideas that I'm going to add in the short term. Since this post has gotten very long, I won't discuss them here except to say that you can read about them in the comment on top of the script :).

One important thing in Vim is making a habit out of things (so you don't have to think about whether or not you're going to press 'i' or 'a' or 'A' etc.). Since this script is pretty new I still haven't really gotten into the habit of using it as much as I'd like, but I'm trying :).

Without further ado, you can get the script here. To use it, put it into your Vim ftplugin directory.

If you find the script useful, have any suggestions about improvements and possible further additions or find a bug, please let me know!


Edit:
While writing this post, I noticed that it might be useful (and in line with the style of the script) to be able to expand things after } else to form if/else blocks properly, which wasn't possible with the version I posted. I've fixed that now, so you should be able to do things like


if (a > b) {
// do something
} else @i a+b+c > 42

and get it expanded into

if (a > b) {
// do something
} else if (a+b+c > 42) {

}

Monday, April 6, 2009

Getting started

Hi there!

This is my first blog post ever, and I'll use it to introduce myself and talk a bit about what I plan to write about and why. Unfortunately, it will probably be a bit dry, but I have to start somewhere :)

I'm a 24-year-old Research Assistant and PhD student in the Computer Science department at the School of Electrical Engineering and Computer Science, University of Zagreb, Croatia. I graduated about a year ago and shared the "student of the generation" award with another colleague in the CS department. During college, I took part in several software projects, most notably in the design and implementation of a SmartHome system for Microsoft's Imagine Cup 2007, and in the Ericsson Nikola Tesla (the Croatian branch of Ericsson) Summer Camp 2007, where I helped implement a two-way SIP/SOAP gateway for general-purpose SIP/SOAP integration (primarily to enable SIP-aware mobile devices to consume Web Services on the Internet).

Just before graduation, I took part in a 3-month internship at Google HQ in Mountain View, CA where I worked in the OpenSocial group. If you're not familiar with OpenSocial, it's basically a social networking platform backed and developed by Google, Yahoo!, MySpace, hi5 and a bunch of other social network sites (pretty much anyone that's not Facebook). My time at Google was probably one of the best times in my life. The work experience there is really nothing like anything I've had a chance to experience before (or since).

I try to keep my interests broad, but if I had to choose just a few (professional ones), they would have to be distributed systems, Web technologies, algorithm design and analysis, discrete math, programming languages and compilers, and complexity theory (still pretty broad :). Most of these coincide really well with the University classes I'm a TA for. I currently work on the Geppeto project in the Consumer Computing Lab at my University. Geppeto is basically a Gadget composition tool designed for consumers (i.e. people with no specialist knowledge of computer programming and Web technologies). If you're using iGoogle and Firefox 3, you can install the current version of Geppeto and see what it's all about.

When I'm not working, I enjoy solving algorithmic problems on sites like TopCoder. Unfortunately, I rarely get a chance to compete in online events, but the site gives access to all the problems and automatic testing of solutions after each contest. The community is also really awesome, and all the problems get editorials that discuss how the problem can be solved, which can be helpful if you get stuck.

So, after covering the boring bits, I'll set out the plan for this blog. Basically, I'll try to release code snippets for doing various useful stuff (Vim scripts, algorithm/data structure implementations, WoW addons etc.). I also plan to do fairly regular reviews of the relevant CS/programming/math books I've read. Finding good reviews online is hard, and I feel they can really be useful, since buying and reading a bad book is a big waste of time. Finally, I'll write about all sorts of stuff I stumble upon that might be interesting.

That's it for my first post! Take care :)