Recursion and Memory Problems

Started by
10 comments, last by iMalc 19 years, 8 months ago
I couldn't sleep last night, so I had the bright idea of using the Win32 API to make a little index of files (from which I could later make comparisons via checksums and such). Getting it implemented was simple enough: I use FindFirstFile() and FindNextFile() along with SetCurrentDirectory() to move through the directory tree. There is a single function that looks through the contents of a directory. If it finds a file, it adds one to the file counter (I'll add more later); if it finds a directory, it calls itself, passing the directory name. The function sets the current directory to the one that was passed to it and does all its work on that, and so on.

The problem isn't that it doesn't work; it works flawlessly, and it was much more elegant than I figured it would be. The problem is that I run out of memory before it completes. My function defines an int, a WIN32_FIND_DATA, and a HANDLE at the beginning of the recursive function. The way the code is written, I can't easily think of a way to make those variables global (well, I can, but that would require more variables and no lessening of memory costs). I tried dynamically allocating the WIN32_FIND_DATA and the HANDLE in hopes that I might slow down the memory monster, but the problem of limited memory still remains.

The best solution my sleepless mind has come up with would be to pre-define all the variables in a global array. The function then attempts to requisition these variables; if there aren't enough of them, the function won't go any farther and will return. This way, all resources will be reusable and I will be able to directly control where the traversal goes. This shouldn't be too difficult, but if there is a better, simpler, and more elegant way to handle running out of memory, I'd love to hear about it. Again, perhaps I can't think because I was coding all night.. dunno. Thanks for your input.
Disclaimer: "I am in no way qualified to present advice on any topic concerning anything and cannot be held responsible for any damages that my advice may incur (due to neither my negligence nor yours)"
What I suggest is to turn your recursive solution into an iterative one.

I assume your function looks something like this:

function( argument )
    data processing
    recursive calls( argument )


Create a stack that holds objects of the same type as the "argument". Whenever you would call the recursive function with an argument, push that argument onto the stack instead.

Whenever the function returns, pop the topmost argument from the argument stack, and call the function on it. If the stack is empty, you're done going through the directories.

To initialize the process, simply call the function once (this will fill the stack with whatever is necessary).

So in the end your function will look like:

apply_to_all( first argument )
    function( first argument )
    while( stack not empty )
        function( pop stack )

function( argument )
    data processing
    push data onto stack

Mm.. I see. Somewhat similar to what I was suggesting, but much, much simpler than the way I was going to go about it. Thanks.
Disclaimer: "I am in no way qualified to present advice on any topic concerning anything and cannot be held responsible for any damages that my advice may incur (due to neither my negligence nor yours)"

Actually, I don't see how you're running out of stack space (or any form of memory)... Your recursion will only require memory proportional to the depth of the deepest directory, and the amounts being pushed onto the stack are really quite small.

Say your deepest directory was 20 levels deep (which is quite a lot): you'd only need 20 * (stack usage per function call). Even if that worked out to as much as 1K per call, that's still only 20K of memory - much less than the default Win32 stack size (1 MB).

Are you sure your recursion termination conditions are correct, and that you're not attempting to recurse into special directories like '.' and '..'? Alternatively, there's another memory-consuming item you've not told us about.

Jans.



If he is allocating strings on the stack, there is a risk of overflowing the stack. And by the sound of it, he's passing the directory name to the recursive calls...
There's also the stack frame for every function call (not sure how big it is, but it adds a few bytes). I think Jansic is probably on the right track: there could be a special directory that's being recursed into. The special directories "." and ".." would both cause this kind of trouble. When debugging, you can set a conditional breakpoint, or just look at your call stack in the debugger when you get the stack overflow.

All that said, I generally don't like recursive functions for pretty much this reason. ToohrVyk's solution is simpler to maintain. Anything that can be done recursively can be done iteratively. Remember that writing a recursive function just means that you're using the computer's stack instead of making your own.
If possible, place the recursive function call at the very last line of the function. Assuming the compiler performs tail-call optimization, it'll reuse the stack frame for every recursive call, essentially behaving like the iterative version instead of eating up stack memory for each level of recursion.
(http://en.wikipedia.org/wiki/Tail_recursion)
Spoonster: he might be making more than one recursive call per function (one per subdirectory), so tail recursion is not really an option here. If the call stack can't handle it, he'll have to do without the call stack.
Rather than passing a completely new string to each recursive call, simply append the subdirectory name onto the existing string and pass that; then for the next subdirectory, just overwrite the last name you added and pass the same string, and so on.

You should only need one char array of 256 bytes or so (MAX_PATH on Win32 is 260) for the whole thing.

Also, definitely check that you aren't going into special directories such as "." and "..", and beware of some oddities in the "Documents and Settings" dir. That dir can sometimes contain mappings to directories on another machine.
"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms
Alright, I finally fixed the problem. I implemented the global memory control as I had planned, but I hadn't thought of doing it with a stack (thanks a shitload, ToohrVyk; I was making it more complex than it needed to be), which saved me a lot of time. I removed all variable definitions and arguments from the recursive function (made them global; with a little extra tweaking, it fit). I kept the function recursive, though.

The memory didn't build up as fast as before, but it still did. It soon hit me that the Win32 file search was most likely using more memory than was immediately obvious. Luckily it didn't take me too long to realize this, and I immediately looked to see if there was a way to close the search; indeed there is (FindClose()). Now, with a combination of controlled global variables and correct closing of the searches, the memory usage remains constant. So far it can scan through a directory or an entire drive and accurately tell how many files are contained within.

My next step is to create an index of the searched files from which I can match file sizes. I'll then run a quick 128-bit hash on files of equal size; using this method, I'll be able to find duplicate files. I'll eventually add a simple GUI and some search filtering. Would any of you use a program like this?
Disclaimer: "I am in no way qualified to present advice on any topic concerning anything and cannot be held responsible for any damages that my advice may incur (due to neither my negligence nor yours)"

