Garbage being copied in between strings during concatenation

Started by
5 comments, last by Zahlman 17 years, 1 month ago

char* process() {
	FILE* fp;
	char* str = NULL, *newstr = NULL;
	fp = fopen("file.txt", "r");

	while( (str = getaline(fp)) != NULL ) {
		if( str[0] == '>' ) {

		}
		else {
			if( newstr == NULL ) {
				newstr = (char*)realloc(newstr, strlen(str)*sizeof(char));
				strcpy(newstr, str);
			}
			else {
				newstr = (char*)realloc(newstr, strlen(str)*sizeof(char));
				strcat(newstr, str);
			}
			free(str);
		}
	}

	fclose(fp);
	return newstr;
}
I have a file where entries span multiple lines and are marked with a starting character. My problem is when I try concatenating the strings as I read them in, they get corrupted along the way. Ex. File: >blah abcde abcde When I run this function the string I'll get back is something like: abcde@#$*@#abcde
//------------------------------------------------------------------------------------------------------The great logician Bertrand Russell once claimed that he could prove anything if given that 1+1=1. So one day, some fool asked him, "Ok. Prove that you're the Pope." He thought for a while and proclaimed, "I am one. The Pope is one. Therefore, the Pope and I are one."
Advertisement
You're not providing space for the terminator.

Also, moved to For Beginners.
^^ what he said ;)
Also:
-- sizeof(char) is always 1, so you can remove that bit of code.
-- When (newstr==NULL) you should be using malloc, not realloc
-- When (newstr!=NULL) you should realloc to strlen(newstr)+strlen(str) (plus 1 for the terminator)
			if( newstr == NULL ) {				newstr = (char*)malloc(strlen(str)+1);				strcpy(newstr, str);			}			else {				newstr = (char*)realloc(newstr, strlen(newstr)+strlen(str)+1);				strcat(newstr, str);			}
Ah, the problem was the first realloc. I though if the buffer you passed in was null, it would just act as a malloc.
//------------------------------------------------------------------------------------------------------The great logician Bertrand Russell once claimed that he could prove anything if given that 1+1=1. So one day, some fool asked him, "Ok. Prove that you're the Pope." He thought for a while and proclaimed, "I am one. The Pope is one. Therefore, the Pope and I are one."
Quote:Original post by Sfpiano
Ah, the problem was the first realloc. I though if the buffer you passed in was null, it would just act as a malloc.


well, it might on some compilers, but its not good style :)

I'd be more worried about using "alloc(strlen(x))" instead of "alloc(strlen(x)+1)"
Quote:Original post by Sfpiano
...I though if the buffer you passed in was null, it would just act as a malloc.


Yes, you can use realloc in place of both malloc and free.
John BoltonLocomotive Games (THQ)Current Project: Destroy All Humans (Wii). IN STORES NOW!
0) The obligatory "why aren't you doing this in C++ where it's infinitely easier and the overhead is provably negligible or even negative (since a smarter algorithm is implemented than what you were thinking of), look, I'll show you" rant goes here :) But even in C we can learn tips and tricks that are applicable to programming in general, such as (1) below:

1) You don't need a special case at all. Handling NULL as a special case smells; consider "null objects" instead. Granted, you don't really have "objects" in C, but the natural "null object" for a C-string is... an empty string. If we just initialize 'newstr' as "" instead of NULL, then we can realloc the first time around just fine, as the strlen() of the "existing" string is 0. But it does seem that we need to allocate this string so that realloc() doesn't try to free() a literal :s

2) As noted, you need space for the null terminator. Also, are you sure your getaline() function does null termination?

3) I assume you are going to have multiple entries in the file eventually. You probably will need more robust handling of the '>' characters in that case. Right now, with the empty if clause, the code will concatenate everything it encounters until the end of the file, simply skipping over the '>' lines.

I assume that the text on these lines is intended to "label" the entries. You should consider if you need support for a multi-line label ;) But otherwise, the process you really want is something like:

- read lines until we find a label.- Starting with the line after that, loop until a peek fails:  - If the peeked character is a '>', immediately return what we have so far.  - Otherwise, read the next line and concatenate it.


4) Just because C requires you to declare variables at top of scope doesn't mean you shouldn't still initialize instead of assigning as much as possible.

int peek(FILE* f) {	int result = getc(fp);	ungetc(c, fp); /* doesnt matter if this is EOF. */	return result;}/* The loop body seems useful enough in general to extract: */char* append_reallocated(char* base, char* suffix) {	/* You shouldn't cast malloc() results in modern C, and I	   assume that applies to realloc() too. It hides the error of	   failing to #include <stdlib.h>, IIRC. */	base = realloc(base, strlen(suffix) + 1);	strcat(base, suffix);	return base; // for chaining, I guess... seems consistent with how	// the standard library behaves.}/* Since this function just reads one item of many, it shouldn't do the file   opening/closing now. */char* process(FILE* fp) {	char c;	char* line;	char* result = malloc(1);	assert(result); *result = '\0';	/* Skip to the next item. */	while ((c = peek(fp)) != EOF && c != '>') {		free(getaline(fp));	}	/* Skip the label (safe if we're at EOF; we just free NULL) */	free(getaline(fp));	/* Read the item (returns "" if we're at EOF OR an item is empty...	   hmm, if you need empty items then you'll need to fix this ;) */	while ((c = peek(fp)) != EOF && c != '>') {		append_reallocated(result, (str = getaline(fp)));		free(str);	}	return result;}int main() {	FILE* fp = fopen("file.txt", "r");	char* str;	/* Check for the "null object", instead of a null pointer... */	while((str = process(fp))[0]) {		printf("Read an item: %s", str);	}	fclose(fp);}

This topic is closed to new replies.

Advertisement