Sign in to follow this  
falkone

Recursion and Memory Problems

Recommended Posts

I couldn't sleep last night, so I had the bright idea of using the win32 api to make a little index of files (from which I could later make comparison's via checksums and such..). Getting it implemented was simple enough.. I use FindFirstFile() and FindNextFile() along with SetCurrentDirectory() to move through the directory tree. There is a single function that looks through the contents of a directory.. if it finds a file it adds one to the file counter (i'll add more later).. if it finds a directory, it calls itself, passing the directory name. The function sets the current directory as the one that was passed to it and does all it's stuff on that, and so on. The problem isn't that it doesnt' work.. it works flawlessly. It was much more elegant than I figured it would be. The problem is that I run out of memory before it completes. My functions define an int, a WIN32_FIND_DATA, and a HANDLE at the beginning of the recursive function. The way the code is written, I cannot easily think of a way to make those functions global (well, I can.. but that would require more variables and no lessening of memory costs). Perhaps it's that I didn't sleep last night.. but I don't know quite what to do. I tried dynamically allocating the win32_find_data and the handle in hopes that i might slow down the memory monster, but the problem of limited memory still remains. The best solution my sleepless mind has come up with would be to pre-define all the variables in a global array. The function then attempts to requisition these variables.. if there isn't enough of them, the function won't go any farther and will return. This way, all resources will be re-useable and I will be able to directly control where it's going. This shouldn't be too difficult.. but if there is a better, simpler, and more elegant way to handle me running out of memory.. I'd love to hear about it. Again.. perhaps i can't think because I was coding all night.. dunno. Thanks for your input.

Share this post


Link to post
Share on other sites
What I suggest is to turn your recursive solution into an imperative one.

I assume your function looks something like this:


function( )
data processing
recursive calls( argument )


Create a stack that holds objects of the same type as the "argument". Whenever you would call the recursive function with an argument, push that argument onto the stack instead.

Whenever the function returns, pop the topmost argument from the argument stack, and call the function on it. If the stack is empty, you're done going through the directories.

To initialize the process, simply call the function once (this will fill the stack with whatever is necessary.

So in the end your function will look like:


apply_to_all( first argument )
function( first argument )
while( stack not empty )
function( pop stack )

function( argument )
data processing
push data onto stack

Share this post


Link to post
Share on other sites

Actually, I don't see how you're running out of stack space (or any form of memory)... Your recursion will only require as much memory as a multiple of the deepest directory; and the amounts that are being pushed onto the stack are really quite small.

Say your deepest directory was 20 deep, (which is quite large) you'll only need 20 * (obvious stack usage per function), even if this worked out as much as 1K per call, that's still only 20k or memory - much less than the default win32 stack size (1Mb).

Are you sure your recursion termination conditions are correct and you're not attempting to recurse into special directories like '.' and '..'. Alternatively there's another memory-consuming item you've not told us about.

Jans.



Share this post


Link to post
Share on other sites
There's also the stack frame for every function call (not sure how big it is but it adds a few bytes). I think Jansic is probably on the right track, there could be a special directory that's being recursed into. The special directories "." or ".." would both cause this kind of trouble. When debugging it, you can set a breakpoint with a condition or just look at your call stack in the debugger when you get the stack overflow.

All that said, I generally don't like recursive functions for pretty much this reason. ToohrVyk's solution is simpler to maintain. Anything that can be done recursively can be done iteratively (or imperative, whatever it's called). Remember that writing a recursive function just means that you're using the computer's stack instead of making your own.

Share this post


Link to post
Share on other sites
If possible, place the recursive function call at the very last line of the function. Assuming the compiler does a minimum of optimization, it'll reuse the stack frame for every recursive call, and essentially behave like the iterative version, and not eat up stack memory for each recursion.
(http://en.wikipedia.org/wiki/Tail_recursion)

Share this post


Link to post
Share on other sites
Spoonster: he might be having more than one recursive call per function (one per subdirectory), so tail recursion is not really an option here. If the stack doesn't support it, he'll have to do without the stack.

Share this post


Link to post
Share on other sites
rather than passing a completely new string to each recursive call, simply append onto the existing string for each dir and pass that, then for the next subdir just overwrite the last subdir name you just added and pass the same string etc.

You should only need 1 char array of 256 bytes or so, for the whole thing.

Also definately check that you aren't going into special directories such as "." and "..", and beware of some oddities in the "documents and settings" dir. That dir can seem to have mappings to dirs which are on another machine sometimes.

Share this post


Link to post
Share on other sites
Alright, I finally fixed the problem. I implemented the global memory control as I had thought, but I hadn't thought of doing it with a stack (Thanks a shitload, ToohrVyk.. I was making it more complex than it needed to be).. saved me a lot of time. I removed all variable definitions and arguments from the recursive function (made them global.. with a little extra tweaking it fit). I kept the function recursive though.

The memory didn't build up as fast as before.. but it still did. It soon hit me that the win32 file search was most likely using more memory than was immediately obvious... luckily it didn't take me too long to realize this and I immediately looked to see if there was a way to close the search.. and indeed there was. Now, with a combination of controlled global variables and correct closing of the searches, the memory usage remains constant. So far it can scan through a directory or entire drive and accurately tell how many files are contained within.

My next step is to create an index of the searched files from which I can match their file size. I'll then run a quick 128-bit hash on files with similar size.. using this method I'll be able to find doubles of files. I'll eventually add a simple gui and some search filtering. Would any of you use a program like this?

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this