Need help: string manipulation in C

Started by
4 comments, last by waterwalk 18 years, 8 months ago
Hello, I'm learning to program in C (not C++), but I often get frustrated with its string manipulation. Often I'm not sure about the following:

1. When should I use a fixed-length char buffer, and when should I use dynamic memory allocation? For example, I want to read from a file one line at a time. Should I assume a maximum line length and use "char buf[MAX_LENGTH]", or dynamically allocate a buffer and reallocate it when a line is larger than the buffer? In VB I never have to worry about this. I can simply write:

Line Input #fileno, curLine

and everything works.

2. If I write

char str1[] = "abc"

then the contents of str1 can be changed. If I write

char * str2 = "abc"

then the string that str2 points to can't be changed. I wonder when to use a char array and when to use a char pointer; I think it sometimes causes confusion. Should I always put a string constant in a char buffer?

I have read several textbooks, but none of them demonstrates the idioms for using strings properly in C (at least I think so). I also searched the internet but didn't find anything. So please give me some clues or resources.
1) In most cases, people assume the line has a fixed maximum length. Why do they do that? Because it is simpler than having a resizable buffer: juggling input reading and buffer resizing can be a complex affair. The biggest problem is that you have to make sure the input will fit into that space, otherwise you get a buffer overflow. Buffer overflows on user input are one of the most common sources of security holes.

2) When you use the array, the string literal you used to initialize the array is actually copied into the array. When you use a pointer, you set it to the address of the string literal itself, which lives in the program's static data (often in read-only memory), so trying to modify it is undefined behavior. Strictly speaking, it should be a const char*, but well, this is C... If you don't need to change the contents of the string, then you can use a pointer; if you need to manipulate it, you'll have to go with the array.
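
To make point 2 concrete, here is a minimal sketch. The function names to_upper_in_place and count_char are made up for this illustration; the first one writes to its argument and so needs an array copy, the second one only reads and is happy with a pointer to a literal.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Upper-cases a string in place, so the caller must pass writable
   storage, e.g. an array declared as: char str1[] = "abc"; */
void to_upper_in_place(char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; ++s)
        *s = (char)toupper((unsigned char)*s);
}

/* Only reads the string, so a pointer to a literal is fine,
   e.g. const char *str2 = "abc"; */
size_t count_char(const char *s, char c)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; ++s)
        if (*s == c)
            ++n;
    return n;
}
```

With char str1[] = "abc", calling to_upper_in_place(str1) leaves "ABC" in the array. With const char *str2 = "abc", calling count_char(str2, 'b') works, while passing str2 to to_upper_in_place should draw a compiler diagnostic about discarding const, which is exactly the protection the const qualifier buys you.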
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." — Brian W. Kernighan
Sounds like you've got the right ideas.

I would advise never using char *p = "literal"; since, as you pointed out yourself, string literals must be treated as const. If the string needs to change, put it in a buffer directly, like char sz[] = "literal"; and if you feel the urge to put it into a pointer, be sure to const-qualify it.

And I would say start out with a good guesstimate of how big your buffer needs to be, then reallocate when it turns out to be too small (remember to add space for the trailing null). Most often you want to overallocate quite a bit (growing by 1.5 to 2 times is a good ratio), because if you had to grow the buffer once, chances are you'll have to do it again soon if you only create a "tight fit".
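
A rough sketch of that guess-then-double strategy, built on fgets and realloc. The name read_line_grow and the starting size of 64 are arbitrary choices for this illustration, and the newline is stripped from the returned line.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads one line of any length. Returns a malloc'd string without the
   trailing newline (caller frees), or NULL at end of file, on a read
   error, or when memory runs out. */
char *read_line_grow(FILE *fp)
{
    size_t cap = 64;            /* initial guess, doubled when too small */
    size_t len = 0;
    char *buf;

    assert(fp != NULL);
    buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    while (fgets(buf + len, (int)(cap - len), fp) != NULL) {
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n') {
            buf[len - 1] = '\0';        /* complete line: strip newline */
            return buf;
        }
        if (len + 1 == cap) {           /* buffer full, line not finished */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }

    if (len == 0 || ferror(fp)) {       /* nothing read, or a read error */
        free(buf);
        return NULL;
    }
    return buf;                         /* last line had no newline */
}
```

Typical use would be a loop that calls read_line_grow(fp) until it returns NULL and frees each line after processing it. Doubling keeps the number of realloc calls logarithmic in the line length, at the cost of wasting up to half the buffer.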

If you plan on doing heavy string work, I would advise you to wrap your strings up in some sort of structure and create your own utility functions (probably built on the str/mem functions).
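
Something along these lines is what I mean by a wrapper. It is only a sketch: the StrBuf type and the strbuf_* names are invented here, and a real one would want more functions (insert, compare, formatted append and so on).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string: data is always null-terminated. */
typedef struct {
    char  *data;
    size_t len;     /* characters stored, not counting the null */
    size_t cap;     /* bytes allocated for data */
} StrBuf;

/* Returns 0 on success, -1 if memory could not be allocated. */
int strbuf_init(StrBuf *sb)
{
    assert(sb != NULL);
    sb->cap = 16;
    sb->len = 0;
    sb->data = malloc(sb->cap);
    if (sb->data == NULL)
        return -1;
    sb->data[0] = '\0';
    return 0;
}

/* Appends text, doubling the capacity until it fits.
   Returns 0 on success, -1 on failure (buffer left unchanged). */
int strbuf_append(StrBuf *sb, const char *text)
{
    size_t add, need, newcap;

    assert(sb != NULL && sb->data != NULL && text != NULL);
    add = strlen(text);
    need = sb->len + add + 1;           /* +1 for the trailing null */
    if (need > sb->cap) {
        char *tmp;
        newcap = sb->cap;
        while (newcap < need)
            newcap *= 2;
        tmp = realloc(sb->data, newcap);
        if (tmp == NULL)
            return -1;
        sb->data = tmp;
        sb->cap = newcap;
    }
    memcpy(sb->data + sb->len, text, add + 1);  /* copies the null too */
    sb->len += add;
    return 0;
}

void strbuf_free(StrBuf *sb)
{
    assert(sb != NULL);
    free(sb->data);
    sb->data = NULL;
    sb->len = sb->cap = 0;
}
```

The point of the structure is that length and capacity travel with the pointer, so callers never have to count bytes or remember the trailing null themselves.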
HardDrop - hard link shell extension."Tread softly because you tread on my dreams" - Yeats
Here's an example of line-wise reading (skipping empty lines) from a file. It should work (it passes my tests, but I could have missed some corner case). As you can see, you need to manually free the returned buffer afterwards.
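#include <stdio.h>
#include <stdlib.h>	/* malloc, realloc, free */

/* Skips any run of line-break characters; returns nonzero while input remains. */
int EatEmptyLines(FILE *fp)
{
	fscanf( fp, "%*[\r\n]");
	return !feof( fp);
}

#define READLINE_BLOCK 16
#define STR(x) #x
#define MAKESTR(x) STR(x)
/* Expands to "%16[^\r\n]%n%1[\r\n]": up to one block of text, the count read, then one line break. */
#define READLINE_FORMAT "%" MAKESTR(READLINE_BLOCK) "[^\r\n]%n%1[\r\n]"

/* Returns the next non-empty line in a malloc'd buffer (caller frees), or 0 at end of input. */
char *ReadLine(FILE *fp)
{
	char *p = 0;
	int size = READLINE_BLOCK, cap = READLINE_BLOCK;
	int c = 0, n = 0;
	char peek[2] = "";	/* %1[ stores one character plus a terminating null */

	if( feof( fp))
		return 0;
	p = malloc( READLINE_BLOCK + 1);
	if( !p)
		return 0;
	while( EatEmptyLines( fp) && fscanf( fp, READLINE_FORMAT, p + c, &n, peek) != EOF && !peek[0])
	{
		cap -= n;
		c += n;
		if( cap < READLINE_BLOCK)
		{
			/* need to grow: double the capacity */
			char *tmp = realloc( p, size + size + 1);
			if( !tmp)
				return p;	/* out of memory: return what was read so far */
			p = tmp;
			cap += size;
			size += size;
		}
		peek[0] = 0;
	}
	if( !c && !n)
		return (free( p), 0);
	return p;
}

int main(void)
{
	char *sz = 0;
	while( (sz = ReadLine( stdin)) != 0)
	{
		printf( "%s\n", sz);
		free( sz);
	}
	return 0;
}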


HtH
Quote: Original post by waterwalk
For example, I want to read from a file one line at a time. Should I assume a maximum line length and use "char buf[MAX_LENGTH]", or dynamically allocate a buffer and reallocate it when a line is larger than the buffer?


When reading from a file, the easiest and most efficient thing to do is simply set aside a large chunk of memory to store each line in, far larger than you expect to encounter. This waste doesn't matter because you will free it when you finish reading the file, assuming you're just processing and not storing each line. Just be sure to check that you don't read in more data than you can store each time.
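
For what it's worth, here is one way the "big buffer plus a check" approach might look with fgets. The function name read_fixed_line and its return codes are invented for this sketch; the caller supplies the buffer, for example a char buf[4096].

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Reads one line into buf (size bytes), stripping the newline.
   Returns 1 if the whole line fit, -1 if it was too long (buf holds
   the first size-1 characters and the rest is discarded), and 0 at
   end of file or on a read error. */
int read_fixed_line(FILE *fp, char *buf, size_t size)
{
    size_t len;
    int ch;
    int truncated = 0;

    assert(fp != NULL && buf != NULL && size >= 2);
    if (fgets(buf, (int)size, fp) == NULL)
        return 0;

    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
        return 1;
    }

    /* No newline stored: either the file ended, or the line did not
       fit. Consume the rest of the line so the next call starts clean. */
    while ((ch = getc(fp)) != EOF && ch != '\n')
        truncated = 1;
    return truncated ? -1 : 1;
}
```

Because fgets never writes more than size bytes, the buffer cannot overflow; the -1 return simply tells you that your "far larger than expected" guess was wrong for that line.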
Thanks! I'll try it out.

