Friday, April 10, 2009

Standard C function for reading arbitrarily long lines from files

This is a function I wrote back when I was learning C (and understanding why standards matter, thanks mostly to the wonderful Usenet group comp.lang.c). I translated the variable names and comments from Croatian to English and thought I'd post it here in case someone finds it useful. It reads a file one character at a time, allocating more memory as needed. It's written in standards-compliant C and packaged with a small test driver program. I'm not including a header file for it since it's only one function, but you can easily write one yourself.

You can compile it with something like

gcc read_line.c -W -Wall -ansi -pedantic

and you should get no warnings. If you want to test it, you should probably reduce ALLOC_INCREMENT to something like 4, so that even short lines force the buffer to grow.
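For readers who just want the gist of the approach, here is a minimal sketch of reading one character at a time and growing the buffer in ALLOC_INCREMENT steps. It is not the linked code. The value 128, the choice to keep the trailing '\n' (as fgets does), and the convention of returning 0 with *s set to NULL at end of file are all assumptions made for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

#define ALLOC_INCREMENT 128 /* assumed value; must be at least 2 */

/*
 * Hypothetical sketch, not the function linked from this post.
 * Reads one line from fp, including the trailing '\n' if present, into a
 * newly allocated, '\0'-terminated buffer stored in *s.
 * Returns the number of characters stored (terminator not counted),
 * 0 with *s == NULL at end of file, or -1 with *s == NULL on a read or
 * allocation failure.
 */
int read_line(char **s, FILE *fp)
{
    char *buf = NULL;
    char *tmp;
    int cap = 0; /* bytes currently allocated */
    int len = 0; /* characters stored so far */
    int c;

    assert(s != NULL && fp != NULL);
    *s = NULL;

    while ((c = getc(fp)) != EOF) {
        if (len + 2 > cap) { /* need room for this char and the '\0' */
            if (cap > INT_MAX - ALLOC_INCREMENT) {
                free(buf); /* line too long to count in an int */
                return -1;
            }
            tmp = realloc(buf, (size_t)cap + ALLOC_INCREMENT);
            if (tmp == NULL) {
                free(buf);
                return -1;
            }
            buf = tmp;
            cap += ALLOC_INCREMENT;
        }
        buf[len++] = (char)c;
        if (c == '\n') {
            break;
        }
    }
    if (ferror(fp)) {
        free(buf);
        return -1;
    }
    if (len == 0) {
        return 0; /* end of file, nothing read */
    }
    buf[len] = '\0';
    *s = buf;
    return len;
}
```

The caller owns the returned buffer and must free() it. Growing by a fixed increment keeps the sketch simple; doubling the capacity instead would cut the number of realloc calls for very long lines.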

Most of the code is self-explanatory and has detailed comments. There are a few design decisions I'd like to talk about. First off, the function returns the number of characters read as an int, which is not really in line with the standard library, which uses size_t for buffer sizes. The simple reason is that I personally don't like working with unsigned values, except for modulo 2^n arithmetic and bitmask handling. An int also gives a nice way to report an error by returning -1 (you could return (size_t)-1 instead, but that is less nice). Also, in the contexts where I use this function, int is more than enough to represent all the possible line sizes.

Second, the final amount of allocated memory might be higher than the number of bytes read. You can solve this easily by creating a wrapper function, similar to this one:

int read_line_with_realloc(char **s, FILE *fp) {
    char *p = NULL;
    int len = read_line(s, fp);

    if (len <= 0) {
        return len; /* nothing read, or read_line reported an error */
    }
    p = realloc(*s, (size_t)len + 1); /* room for '\0' */
    if (p == NULL) {
        free(*s);
        *s = NULL; /* don't leave a dangling pointer behind */
        return -1;
    }
    *s = p;
    return len;
}


Finally, there's no way to reuse the memory already allocated in a buffer; the function assumes that the passed buffer is initially empty. This is basically to keep the interface clean, and it can also be easily fixed with a wrapper.
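One way such a wrapper might look is sketched below. The name read_line_reuse and its cap parameter are hypothetical, and the read_line included here is only a stand-in so the example compiles on its own. It assumes the real function returns a '\0'-terminated buffer in *s, returns 0 at end of file, and returns -1 on error.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the post's read_line (assumed semantics, see above). */
static int read_line(char **s, FILE *fp)
{
    char *buf = NULL, *tmp;
    int cap = 0, len = 0, c;

    *s = NULL;
    while ((c = getc(fp)) != EOF) {
        if (len + 2 > cap) { /* room for this char and the '\0' */
            tmp = realloc(buf, (size_t)cap + 64);
            if (tmp == NULL) { free(buf); return -1; }
            buf = tmp;
            cap += 64;
        }
        buf[len++] = (char)c;
        if (c == '\n') break;
    }
    if (ferror(fp)) { free(buf); return -1; }
    if (len == 0) return 0;
    buf[len] = '\0';
    *s = buf;
    return len;
}

/*
 * Hypothetical wrapper: keeps using the caller's buffer *buf (capacity
 * *cap bytes) when the new line fits, and otherwise replaces it with the
 * buffer read_line allocated. Returns whatever read_line returned; on 0
 * or -1 the caller's buffer is left untouched.
 */
int read_line_reuse(char **buf, int *cap, FILE *fp)
{
    char *line = NULL;
    int len;

    assert(buf != NULL && cap != NULL && fp != NULL);
    len = read_line(&line, fp);
    if (len <= 0) {
        return len;
    }
    if (*buf != NULL && len < *cap) {
        memcpy(*buf, line, (size_t)len + 1); /* copy including '\0' */
        free(line);
    } else {
        free(*buf); /* free(NULL) is a no-op */
        *buf = line;
        *cap = len + 1; /* conservative: the real allocation may be larger */
    }
    return len;
}
```

Start with a NULL buffer and a capacity of 0, call the wrapper in a loop, and free the buffer once at the end. Note that this only keeps the caller's pointer stable across calls; each call still allocates temporarily inside read_line, so avoiding that entirely would mean changing read_line itself.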

You can get the code here.

I've used this function many times, but as always (even with trivial code), there might be some errors in there. If you find any, please let me know.
