Sign in to follow this  
cobru

[solved] Find the memory leak (C code)

Recommended Posts

cobru    193
written for some linux daemon...
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <errno.h>

/* 
   recursive create all directories for path (like mkdir -p command)
 */

int utils_mkdir (char *path)
{
  char *buf;
  char **bp;
  char *tok;
  char *workpath;
  struct stat statbuf;
  int len, worklen, toklen;

  len = strlen (path) + 1;
  buf = strdup (path);

  if (!buf)
    return 0;

  workpath = (char *) malloc (len + 1);

  if (!workpath) {
    free (buf);
    return 0;
  }

  workpath[0] = 0;
  worklen = 0;

  bp = &buf;
  while ((tok = strsep (bp, "/"))) {
    toklen = strlen (tok);
    if (toklen == 0)		// "path like // doesn't matter"
      continue;

    // append tok to the full path
    if (worklen != 0) {
      workpath[worklen] = '/';
      worklen++;
    }

    memcpy ((workpath + worklen), tok, toklen + 1);
    worklen += toklen;

    if (stat (workpath, &statbuf) < 0) { // stat failed
      if (errno == ENOENT)	// dir not found, ok to mkdir it!
      {
	if (mkdir (workpath, S_IRWXU) < 0)
	  if (errno != EEXIST) // someone else could have made it , check errno
           { 
              free (buf);
              free (workpath);
	      return 0; // can't mkdir path (check perms)
           }
      }
      else { // some other errno
        free (buf);
        free (workpath);
	return 0; // can't stat path! (check perms)
      }
    }
    else if (!S_ISDIR (statbuf.st_mode)) {
      errno = EEXIST;
      free (buf);
      free (workpath);
      return 0; // path contains an existing file, can't mkdir a file!
    }

  }

  free (buf);
  free (workpath);

  return 1; // mkdir -p worked :D
}


[Edited by - cobru on October 11, 2007 1:51:56 AM]

Share this post


Link to post
Share on other sites
Jonny_S    149
I'm not the best C coder in the world (considering I've never used it), but I've looked over twice and couldn't see that would cause a memory leak...

Share this post


Link to post
Share on other sites
cobru    193
well, the daemon is running since june and it's vmSize grew from 20Mb to 80Mb in 5 months.
That's a memory leak, but a very small one (a few bytes each call to this function) which accumulates in time.
Funny enough, valgrind didn't report memory leak in june when I tested the daemon.

edit:

must be in this function, because it's the only function which allocates small ammounts of memory (a few bytes for a path) each call.

Share this post


Link to post
Share on other sites
cobru    193
Quote:
Original post by ToohrVyk
Have you considered heap fragmentation? This can increase the memory usage of a process without any actual memory leaks.


yes, I did... but the paths supplied to the function (given the way I'm using it currently) are pretty much the same length (+- a few bytes) libc should be able to re-use the freed blocks.
All I malloc() are these paths and some small control structures, the big chunks of memory I use are obtained via mmap() and munmap(), that makes the memory available immediately after releasing it and doesn't cause heap fragmentation.

Share this post


Link to post
Share on other sites
Evil Steve    2017
What about using alloca() instead of malloc here? That allocates memory on the stack, and will be fine for small chunks of data like this. Or just use static arrays.

Share this post


Link to post
Share on other sites
Zahlman    1682
Just because you "have to" (???) use C is no excuse for not organizing your code better to avoid all the trickiness in the allocation code. And even in C, you can initialize variables at the point of declaration, and don't have to put them all at the top of the function... braces still introduce new scopes.

(Oh, and what exactly is recursive about this?)


#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <errno.h>

enum { FILE, FOLDER, DNE, ERROR } fs_entry_type;

int determine_type(char* path) {
struct stat statbuf;
if (stat(path, &statbuf) < 0) {
return (errno == ENOENT) ? DNE : ERROR;
}
return S_ISDIR(statbuf.st_mode) ? FOLDER : FILE;
}

int utils_mkdir_helper(char *path, char *buf, char *workpath) {
char *tok;
int worklen = 0;

// Put in a null terminator in case of an empty path... ?
workpath[worklen] = 0;

while ((tok = strsep(&buf, "/"))) {
int toklen = strlen(tok);
if (toklen == 0) { continue; }

// append tok to the full path
if (worklen != 0) {
workpath[worklen] = '/';
worklen++;
}
// We copy the null terminator each time and overlap writes on top of
// the previous null terminators; that way the string is properly terminated
// at the end without having to do it manually... ?
memcpy ((workpath + worklen), tok, toklen + 1);
worklen += toklen;

switch (determine_type(workpath)) {
case FILE:
errno = EEXIST;
return 0;

case DNE:
if (mkdir(workpath, S_IRWXU) < 0) {
if (errno != EEXIST) { return 0; } // propagate error
}
// AND FALL THROUGH TO
case FOLDER:
break; // folder exists or created successfully (whether by us or between
// the stat() check and mkdir() call); try the next path component.

default:
return 0;
}
}
return 1; // all path components succeeded.
}


int utils_mkdir (char *path) {
int result = 0;
char* buf = strdup(path);
/* Don't cast the result from malloc()! It is supposed to work without a
cast as long as you are properly including stdlib.h.
Also, it looks like you provided space for a null terminator twice ;)
I have reproduced that here anyway... */

char* workpath = malloc(strlen(path) + 2);
if (buf && workpath) {
result = utils_mkdir_helper(path, buf, workpath);
}
if (buf) { free(buf); }
if (workpath) { free(workpath); }
return result;
}





Although I don't think you need to do all that tricky string copying. Just *find* the slashes, and temporarily replace them with nulls (switching them back at the end of the iteration).

EDIT: Actually, I don't think you have to allocate anything at all.


#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <errno.h>

enum { FILE, FOLDER, DNE, ERROR } fs_entry_type;

int determine_type(char* path) {
struct stat statbuf;
if (stat(path, &statbuf) < 0) {
return (errno == ENOENT) ? DNE : ERROR;
}
return S_ISDIR(statbuf.st_mode) ? FOLDER : FILE;
}

int utils_mkdir_helper(char *path) {
switch (determine_type(path)) {
case FILE:
errno = EEXIST;
return 0;

case DNE:
if (mkdir(path, S_IRWXU) < 0) {
if (errno != EEXIST) { return 0; } // propagate error
}
// AND FALL THROUGH TO
case FOLDER:
break; // folder exists or created successfully (whether by us or between
// the stat() check and mkdir() call); try the next path component.

default:
return 0;
}
}
return 1; // all path components succeeded.
}

int utils_mkdir(char *path) {
char *tok;
int result = 1;
while ((tok = strchr(path, '/')) && result) {
*tok = '\0';
result = utils_mkdir_helper(path);
*tok = '/';
}
return result;
}



[Edited by - Zahlman on October 10, 2007 1:30:37 PM]

Share this post


Link to post
Share on other sites
legalize    116
Its a pet peeve of mine when people dump their code into a forum and say "here, find my bug!".

Learn how to use a tool like Purify or Bounds Checker. Purify has been around for 15 years, its not exactly a new tool, and it is guaranteed to find every single memory abuse bug in your code. You can even get a limited time free trial and I believe BC is the same way.

I've noticed that far too often, posters on gdnet just dump there code into the forums and expect other people to find all their problems and post the corrected code. Remember that people here are not paid to be your slave; we answer other people's questions out of our own unpaid time.

Share this post


Link to post
Share on other sites
darookie    1441
Quote:
Original post by legalize
I've noticed that far too often, posters on gdnet just dump there code into the forums and expect other people to find all their problems and post the corrected code. Remember that people here are not paid to be your slave; we answer other people's questions out of our own unpaid time.

Yet, nobody is forced to even read the code, let alone finding the bug. As long as the code is reasonably formatted, source-tags are used and the only the relevant portion is posted, I don't mind that.

Cheers,
Pat.


Share this post


Link to post
Share on other sites
legalize    116
Quote:
Original post by darookie
Quote:
Original post by legalize
I've noticed that far too often, posters on gdnet just dump there code into the forums and expect other people to find all their problems and post the corrected code. Remember that people here are not paid to be your slave; we answer other people's questions out of our own unpaid time.

Yet, nobody is forced to even read the code, let alone finding the bug. As long as the code is reasonably formatted, source-tags are used and the only the relevant portion is posted, I don't mind that.

Cheers,
Pat.


I didn't say anyone was forced to read the code or debug it, but the behavior is just plain rude.

Its like me coming over to a party at your place and taking a big shit on the living room carpet and asking if you wouldn't mind cleaning that up for me.

Share this post


Link to post
Share on other sites
rip-off    10976
Quote:
Original post by legalize
I didn't say anyone was forced to read the code or debug it, but the behavior is just plain rude.

Its like me coming over to a party at your place and taking a big shit on the living room carpet and asking if you wouldn't mind cleaning that up for me.


I think its reasonable to ask, because it is explained:
Quote:

well, the daemon is running since june and it's vmSize grew from 20Mb to 80Mb in 5 months.
That's a memory leak, but a very small one (a few bytes each call to this function) which accumulates in time.
Funny enough, valgrind didn't report memory leak in june when I tested the daemon.

edit:

must be in this function, because it's the only function which allocates small ammounts of memory (a few bytes for a path) each call.


The OP states clearly that the code is extensively tested. I think all the OP wants is a second opinion that he isn't missing something minor.

Share this post


Link to post
Share on other sites
legalize    116
Quote:
Original post by rip-off
The OP states clearly that the code is extensively tested. I think all the OP wants is a second opinion that he isn't missing something minor.


Only after being asked more questions; the first post in the thread is a drive-by "debug my code for me, here it is".

Share this post


Link to post
Share on other sites
Matias Goldberg    9577
"char *tok" shouldn't have to be freed.... but at this point try it anyway.
Besides that, are you sure the input parameter "char *path" gets freed after the call??
I'm almost sure it isn't being called with a const string or with a stack-allocated string, which means there are other pieces of code to analyze.
The code looks fine. If it's an error there, it is very small.
Try the following isolation:

int main()
{
for( i=0; i<100000000000; i++ )
{
utils_mkdir( "MyWorking dir" );
}
}

And start debugging :)
Memory should rise if it's there.
Also remove the last pieces of code and start adding it until memory starts growing.
I can't help you any further

Hope this helps
Dark Sylinc

PS: maaaaaaaaaaaaaaayyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy be it's a bug inside the C functions you use

Share this post


Link to post
Share on other sites
legalize    116
Quote:
Original post by Matias Goldberg
PS: maaaaaaaaaaaaaaayyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy be it's a bug inside the C functions you use


As I've already said, if you just run the daemon in question under a tool like Purify (which supported *nix stuff since it first came out, over 15+ years ago), then you will find out whether or not you are abusing memory. Purify is exhaustive and reliable, unlike human eyeballs which stare at broken code repeatedly not seeing the problem.

Use the right tools for the job.

Share this post


Link to post
Share on other sites
cobru    193
ok, enough 'storming' about the code. I hope someone can close the topic.

I managed to find the leak in the meanwhile (and yes, I posted the code here because I needed a SECOND opinion).

first of all, some answers:
legalize:

thanks for the tip, I used valgrind (and reported leaks in libpthread which actually aren't leaks, but that's all it reported), next time I'll have a look at purify


Zahlman:
Quote:
Just because you "have to" (???) use C is no excuse for not organizing your code better to avoid all the trickiness in the allocation code. And even in C, you can initialize variables at the point of declaration, and don't have to put them all at the top of the function... braces still introduce new scopes.
(Oh, and what exactly is recursive about this?)


1. Maybe the comment about what the function isn't very well formulated. The function is not recursive, but what the function does for path='/a/b/c' is mkdir('/a'); mkdir('/a/b'); mkdir('/a/b/c') and that "looked" like recursive. sorry, my mistake.

2. your code doesn't compile (unless you change FILE into XFILE) :D

3. yes, I provided an extra space for the null terminator :D

4. your second example (without any allocations) segfaults nicely when called like this: utils_mkdir("/a/b/c") (and not utils_mkdir(some_buf)) because you cannot modify the supplied string.

5. I didn't say I "have to" use C, I LIKE using C and sorry, but your 'organised' code looks like C code from a C++ programmer. I try code like that in C++ and C#, but NOT in C/linux. Do an apt-get source any_daemon_you_like and look at how the code is written. Maybe my personal preference isn't the best, but I like to continue writing it like that.

6. casting return from malloc is in general "good C practice" (similar to "using static_cast is in general good C++ practice", each world with his 'habbits'). I'm not a beginner so you don't have to tell me "did you know you don't have to cast the result from malloc()?"
Quote:

You can store the result of malloc into any pointer variable without a cast, because ISO C automatically converts the type void * to another type of pointer when necessary. But the cast is necessary in contexts other than assignment operators or if you might want your code to run in traditional C.

this is taken from http://www.gnu.org/software/libc/manual/html_node/Basic-Allocation.html.. what can I say, call me a 'traditionalist' ...



Now, for the FINAL answer to the original post:
I managed to solve the problem by using dmalloc (a very nice library, available at http://dmalloc.com/). It reported 2 free (NULL), one was in libpthread (I guess they know what they're doing) and one in this function.

Took me one day to find the problem: strsep() was causing it.
I misread the man page for it (no, I don't like remembering by heart all the functions in a library, that's why they invented documentation) and only saw the part which says "it modifies the supplied buffer and replaces token with \0" (that's why I use strdup on the original path).
What I didn't read was "*stringp is updated to point past the token", which in my case changed the buf pointer so the final free (buf) ended up in free(NULL).

corrected function:

int utils_mkdir (char *path)
{
char *buf;
char *bp; // instead of char **bp...

...

bp = buf;
while ((tok = strsep(&bp, "/"))) {
...



Sorry if I offended anyone with the original post, but as someone said: you don't have to read the code, NOR answer the post.
I was hoping that someone with a strong C and linux background might point what I'm doing wrong.

And some more infos about the daemon: it is used in production as image server (serves images to a bunch of web servers, from a storage server). We have around 13 million images stored on our servers and 6 million pageviews each day and I couldn't afford to have the storage server get all the 'hits' for images (and client INSISTS on having all the files on one server, they feel "safer" like that - whatever).
So, the system works like this:
- storage server runs this daemon (uses pthread to handle incoming connections)
- web servers run apache and php, whenever a request for 'image x' is received then apache tries to serve the file, if file not found, it runs php script which gets the file from the image server, stores it locally (future requests for the file will be satisfied directly by apache, until file is deleted because the frontend's cache was cleared).
Of course, users can also upload pics and in that case the daemon is used to safely store them on the storage server (after being processed on a web server, resize, etc.)

Maybe not the 'most elegant' way to do it (also, not the 'ugliest' way to do it, the person who handled this before me used NFS and had problems with 10 webservers crashing because one of the two storage server wasn't responding) but it runs very well (since january until now it served millions of request, I said the daemon runs since june because I restarted it then and started wondering about the vmSize of the process).

Thanks for all your replies.

Share this post


Link to post
Share on other sites
Zahlman    1682
Quote:
Original post by cobru
1. Maybe the comment about what the function isn't very well formulated. The function is not recursive, but what the function does for path='/a/b/c' is mkdir('/a'); mkdir('/a/b'); mkdir('/a/b/c') and that "looked" like recursive. sorry, my mistake.


It's common to label things like that, but it's also common for people reviewing code to see the word "recursively" and expect an actual recursive call.

I think the word you are looking for is iteratively.

Quote:
2. your code doesn't compile (unless you change FILE into XFILE) :D


X_X You are correct, of course.

Quote:
4. your second example (without any allocations) segfaults nicely when called like this: utils_mkdir("/a/b/c") (and not utils_mkdir(some_buf)) because you cannot modify the supplied string.


Well, yes, but you're supposed to be able to assume that you can modify something that isn't marked as const. Such a disgusting wart on the language.

Anyway, you can still strdup() that if you must; one allocation is better than two.

Quote:
5. I didn't say I "have to" use C, I LIKE using C


I suspect that there is a strong correlation between the ability to maintain this attitude and the fact that C is still used for anything at all these days. :x

Anyway, I didn't mean to imply that you said any such thing; I assumed that you wouldn't be using C if you had a choice. Oh Well.

Quote:
and sorry, but your 'organised' code looks like C code from a C++ programmer.


Well, to be honest, it is C code from a C++ programmer. Your attempt to make this into a derogatory comment is somewhat amusing to me, though.

I like to think of the techniques shown here as just plain good factoring, which applies in any language (at least, any language sophisticated enough to support functions properly). A *real* C++ approach would be much more cavalier about allocation (because of being accustomed to RAII and the standard library doing the hard work automatically) :D After all, optimizing away the memory work can hardly be a major concern when you're interacting with the file system, hmm?

In the C code, though, it makes sense to hack around with NULL-dropping and avoiding the allocations because, well, allocations are a lot harder to manage without C++'s tools. (But I could, if pressed, emulate the "allocation-free" approach in modern C++ with std::string, too. And it'd avoid that nastiness with string literals, for free - at least in terms of coding effort.)

Quote:
I try code like that in C++ and C#, but NOT in C/linux. Do an apt-get source any_daemon_you_like and look at how the code is written. Maybe my personal preference isn't the best, but I like to continue writing it like that.


I have heard of people who like to continue believing that the sun and planets orbit the Earth, too. And you know, it actually is a defensible way of looking at things, according to my understanding of relativity. But it sure makes the calculations harder - and for what benefit?

Quote:
6. casting return from malloc is in general "good C practice" (similar to "using static_cast is in general good C++ practice", each world with his 'habbits'). I'm not a beginner so you don't have to tell me "did you know you don't have to cast the result from malloc()?"


You may think that non-beginners are supposed to know this stuff, but based on my experience, many of them don't. For that matter, you may expect any given non-beginner to, well, have learned things about programming, but again, you'd be amazed x.x In general, I don't assume people know things when they write code that demonstrates an ignorance of those things, deliberate or otherwise.

Although I can't imagine why you'd take a *further* step back from the decision to use C, by writing stuff carefully to baby-sit older (and I mean practically paleolithic, in computing terms) compilers.

Quote:
What I didn't read was "*stringp is updated to point past the token"


And it didn't crash right away? "Lucky" you.
Anyway, I suppose the fact that you're expected to pass a ** should have been a clue? :) (TBH I'd never even *heard* of strsep(), but now I'm thinking I should have picked up on that too.)

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this