[solved] Find the memory leak (C code)

Started by
18 comments, last by Zahlman 16 years, 6 months ago
Quote:Original post by legalize
I've noticed that far too often, posters on gdnet just dump there code into the forums and expect other people to find all their problems and post the corrected code. Remember that people here are not paid to be your slave; we answer other people's questions out of our own unpaid time.

Yet, nobody is forced to even read the code, let alone finding the bug. As long as the code is reasonably formatted, source-tags are used and the only the relevant portion is posted, I don't mind that.

Cheers,
Pat.


Advertisement
Quote:Original post by darookie
Quote:Original post by legalize
I've noticed that far too often, posters on gdnet just dump there code into the forums and expect other people to find all their problems and post the corrected code. Remember that people here are not paid to be your slave; we answer other people's questions out of our own unpaid time.

Yet, nobody is forced to even read the code, let alone finding the bug. As long as the code is reasonably formatted, source-tags are used and the only the relevant portion is posted, I don't mind that.

Cheers,
Pat.


I didn't say anyone was forced to read the code or debug it, but the behavior is just plain rude.

Its like me coming over to a party at your place and taking a big shit on the living room carpet and asking if you wouldn't mind cleaning that up for me.

My free book on Direct3D: "The Direct3D Graphics Pipeline"
My blog on programming, vintage computing, music, politics, etc.: Legalize Adulthood!

Quote:Original post by legalize
I didn't say anyone was forced to read the code or debug it, but the behavior is just plain rude.

Its like me coming over to a party at your place and taking a big shit on the living room carpet and asking if you wouldn't mind cleaning that up for me.


I think its reasonable to ask, because it is explained:
Quote:
well, the daemon is running since june and it's vmSize grew from 20Mb to 80Mb in 5 months.
That's a memory leak, but a very small one (a few bytes each call to this function) which accumulates in time.
Funny enough, valgrind didn't report memory leak in june when I tested the daemon.

edit:

must be in this function, because it's the only function which allocates small ammounts of memory (a few bytes for a path) each call.


The OP states clearly that the code is extensively tested. I think all the OP wants is a second opinion that he isn't missing something minor.
Quote:Original post by rip-off
The OP states clearly that the code is extensively tested. I think all the OP wants is a second opinion that he isn't missing something minor.


Only after being asked more questions; the first post in the thread is a drive-by "debug my code for me, here it is".

My free book on Direct3D: "The Direct3D Graphics Pipeline"
My blog on programming, vintage computing, music, politics, etc.: Legalize Adulthood!

"char *tok" shouldn't have to be freed.... but at this point try it anyway.
Besides that, are you sure the input parameter "char *path" gets freed after the call??
I'm almost sure it isn't being called with a const string or with a stack-allocated string, which means there are other pieces of code to analyze.
The code looks fine. If it's an error there, it is very small.
Try the following isolation:

int main()
{
for( i=0; i<100000000000; i++ )
{
utils_mkdir( "MyWorking dir" );
}
}

And start debugging :)
Memory should rise if it's there.
Also remove the last pieces of code and start adding it until memory starts growing.
I can't help you any further

Hope this helps
Dark Sylinc

PS: maaaaaaaaaaaaaaayyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy be it's a bug inside the C functions you use
Quote:Original post by Matias Goldberg
PS: maaaaaaaaaaaaaaayyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy be it's a bug inside the C functions you use


As I've already said, if you just run the daemon in question under a tool like Purify (which supported *nix stuff since it first came out, over 15+ years ago), then you will find out whether or not you are abusing memory. Purify is exhaustive and reliable, unlike human eyeballs which stare at broken code repeatedly not seeing the problem.

Use the right tools for the job.

My free book on Direct3D: "The Direct3D Graphics Pipeline"
My blog on programming, vintage computing, music, politics, etc.: Legalize Adulthood!

... Noone reviewing my second offering at all? If you don't allocate, you don't risk leaking an allocation...
The assumption that this function "must" cause the leak isn't necessarily true. Memory allocations can come from all kinds of places, including library calls.
enum Bool { True, False, FileNotFound };
ok, enough 'storming' about the code. I hope someone can close the topic.

I managed to find the leak in the meanwhile (and yes, I posted the code here because I needed a SECOND opinion).

first of all, some answers:
legalize:

thanks for the tip, I used valgrind (and reported leaks in libpthread which actually aren't leaks, but that's all it reported), next time I'll have a look at purify


Zahlman:
Quote:Just because you "have to" (???) use C is no excuse for not organizing your code better to avoid all the trickiness in the allocation code. And even in C, you can initialize variables at the point of declaration, and don't have to put them all at the top of the function... braces still introduce new scopes.
(Oh, and what exactly is recursive about this?)


1. Maybe the comment about what the function isn't very well formulated. The function is not recursive, but what the function does for path='/a/b/c' is mkdir('/a'); mkdir('/a/b'); mkdir('/a/b/c') and that "looked" like recursive. sorry, my mistake.

2. your code doesn't compile (unless you change FILE into XFILE) :D

3. yes, I provided an extra space for the null terminator :D

4. your second example (without any allocations) segfaults nicely when called like this: utils_mkdir("/a/b/c") (and not utils_mkdir(some_buf)) because you cannot modify the supplied string.

5. I didn't say I "have to" use C, I LIKE using C and sorry, but your 'organised' code looks like C code from a C++ programmer. I try code like that in C++ and C#, but NOT in C/linux. Do an apt-get source any_daemon_you_like and look at how the code is written. Maybe my personal preference isn't the best, but I like to continue writing it like that.

6. casting return from malloc is in general "good C practice" (similar to "using static_cast is in general good C++ practice", each world with his 'habbits'). I'm not a beginner so you don't have to tell me "did you know you don't have to cast the result from malloc()?"
Quote:
You can store the result of malloc into any pointer variable without a cast, because ISO C automatically converts the type void * to another type of pointer when necessary. But the cast is necessary in contexts other than assignment operators or if you might want your code to run in traditional C.

this is taken from http://www.gnu.org/software/libc/manual/html_node/Basic-Allocation.html.. what can I say, call me a 'traditionalist' ...



Now, for the FINAL answer to the original post:
I managed to solve the problem by using dmalloc (a very nice library, available at http://dmalloc.com/). It reported 2 free (NULL), one was in libpthread (I guess they know what they're doing) and one in this function.

Took me one day to find the problem: strsep() was causing it.
I misread the man page for it (no, I don't like remembering by heart all the functions in a library, that's why they invented documentation) and only saw the part which says "it modifies the supplied buffer and replaces token with \0" (that's why I use strdup on the original path).
What I didn't read was "*stringp is updated to point past the token", which in my case changed the buf pointer so the final free (buf) ended up in free(NULL).

corrected function:
int utils_mkdir (char *path){  char *buf;  char *bp; // instead of char **bp...  ...  bp = buf;  while ((tok = strsep(&bp, "/"))) {    ...


Sorry if I offended anyone with the original post, but as someone said: you don't have to read the code, NOR answer the post.
I was hoping that someone with a strong C and linux background might point what I'm doing wrong.

And some more infos about the daemon: it is used in production as image server (serves images to a bunch of web servers, from a storage server). We have around 13 million images stored on our servers and 6 million pageviews each day and I couldn't afford to have the storage server get all the 'hits' for images (and client INSISTS on having all the files on one server, they feel "safer" like that - whatever).
So, the system works like this:
- storage server runs this daemon (uses pthread to handle incoming connections)
- web servers run apache and php, whenever a request for 'image x' is received then apache tries to serve the file, if file not found, it runs php script which gets the file from the image server, stores it locally (future requests for the file will be satisfied directly by apache, until file is deleted because the frontend's cache was cleared).
Of course, users can also upload pics and in that case the daemon is used to safely store them on the storage server (after being processed on a web server, resize, etc.)

Maybe not the 'most elegant' way to do it (also, not the 'ugliest' way to do it, the person who handled this before me used NFS and had problems with 10 webservers crashing because one of the two storage server wasn't responding) but it runs very well (since january until now it served millions of request, I said the daemon runs since june because I restarted it then and started wondering about the vmSize of the process).

Thanks for all your replies.
Q: How many programmers does it take to write a nice piece of software?A: MORE.
Quote:Original post by cobru
1. Maybe the comment about what the function isn't very well formulated. The function is not recursive, but what the function does for path='/a/b/c' is mkdir('/a'); mkdir('/a/b'); mkdir('/a/b/c') and that "looked" like recursive. sorry, my mistake.


It's common to label things like that, but it's also common for people reviewing code to see the word "recursively" and expect an actual recursive call.

I think the word you are looking for is iteratively.

Quote:2. your code doesn't compile (unless you change FILE into XFILE) :D


X_X You are correct, of course.

Quote:4. your second example (without any allocations) segfaults nicely when called like this: utils_mkdir("/a/b/c") (and not utils_mkdir(some_buf)) because you cannot modify the supplied string.


Well, yes, but you're supposed to be able to assume that you can modify something that isn't marked as const. Such a disgusting wart on the language.

Anyway, you can still strdup() that if you must; one allocation is better than two.

Quote:5. I didn't say I "have to" use C, I LIKE using C


I suspect that there is a strong correlation between the ability to maintain this attitude and the fact that C is still used for anything at all these days. :x

Anyway, I didn't mean to imply that you said any such thing; I assumed that you wouldn't be using C if you had a choice. Oh Well.

Quote:and sorry, but your 'organised' code looks like C code from a C++ programmer.


Well, to be honest, it is C code from a C++ programmer. Your attempt to make this into a derogatory comment is somewhat amusing to me, though.

I like to think of the techniques shown here as just plain good factoring, which applies in any language (at least, any language sophisticated enough to support functions properly). A *real* C++ approach would be much more cavalier about allocation (because of being accustomed to RAII and the standard library doing the hard work automatically) :D After all, optimizing away the memory work can hardly be a major concern when you're interacting with the file system, hmm?

In the C code, though, it makes sense to hack around with NULL-dropping and avoiding the allocations because, well, allocations are a lot harder to manage without C++'s tools. (But I could, if pressed, emulate the "allocation-free" approach in modern C++ with std::string, too. And it'd avoid that nastiness with string literals, for free - at least in terms of coding effort.)

Quote:I try code like that in C++ and C#, but NOT in C/linux. Do an apt-get source any_daemon_you_like and look at how the code is written. Maybe my personal preference isn't the best, but I like to continue writing it like that.


I have heard of people who like to continue believing that the sun and planets orbit the Earth, too. And you know, it actually is a defensible way of looking at things, according to my understanding of relativity. But it sure makes the calculations harder - and for what benefit?

Quote:6. casting return from malloc is in general "good C practice" (similar to "using static_cast is in general good C++ practice", each world with his 'habbits'). I'm not a beginner so you don't have to tell me "did you know you don't have to cast the result from malloc()?"


You may think that non-beginners are supposed to know this stuff, but based on my experience, many of them don't. For that matter, you may expect any given non-beginner to, well, have learned things about programming, but again, you'd be amazed x.x In general, I don't assume people know things when they write code that demonstrates an ignorance of those things, deliberate or otherwise.

Although I can't imagine why you'd take a *further* step back from the decision to use C, by writing stuff carefully to baby-sit older (and I mean practically paleolithic, in computing terms) compilers.

Quote:What I didn't read was "*stringp is updated to point past the token"


And it didn't crash right away? "Lucky" you.
Anyway, I suppose the fact that you're expected to pass a ** should have been a clue? :) (TBH I'd never even *heard* of strsep(), but now I'm thinking I should have picked up on that too.)

This topic is closed to new replies.

Advertisement