My memory is broken. I didn't do it!!!

Started by
11 comments, last by Zahlman 16 years, 9 months ago
I made this application today to count how many words everyone said on AIM in logs, and which words were everyone's favorite. So the program has to 1) see all the .log files in a directory, 2) open all the logs 3) count all the words in logs Sounds simple, right? Well, it is. But my program somehow manages to crash randomly, and my debugger flipped out on me. Can someone please run this through a debugger (or take a look at the code?) and tell me whats wrong with it? (C++/WinApi/made with Dev-C++,mingwgcc) http://pastebin.com/m22a347b9 << easier to see. Someone please help me with this... I am just at a loss...

#include &lt;cstdlib&gt;
#include &lt;iostream&gt;
#include &lt;fstream&gt;
#include &lt;cstring&gt;
 #include &lt;sys/stat.h&gt;


#define null NULL

//,log
using namespace std;


struct wordstr{
       char*word;
       int len;
       int count;
}*word;
int numberWord=0;
void separateWords(char * buffer,int size) // size is in bytes
{

int letterCount=0;

numberWord=0;


for(int i=0;i&lt;size;i++)
{
       
letterCount++;
if(numberWord &gt; 1020)
return;

if(buffer&lt;64 || buffer&gt;122)
 {
 if(letterCount&gt;2) {
                   
                   for(int l=0;l&lt;numberWord;l++) // Make sure its not a duplicate
                   if(0==memcmp(&buffer+i-letterCount,word[l].word,letterCount))
                   {
                   word[l].count++;
                   letterCount=0;
                   break;
                   }
                   else{
                   word[numberWord].len=letterCount;
                   word[numberWord].word=(char*) malloc(sizeof(char)*letterCount+1);
                   for(int o=0;o&lt;letterCount;o++)
                   word[numberWord].word[o]=buffer[i-o];
                   word[numberWord].word[letterCount+1]='\n';
                   letterCount=0;
                   word[numberWord].count++;
                   numberWord++;}
                   }
  letterCount=0;
 
 }

}




}

int main(int argc, char *argv[])
{
    system (" dir /b  &gt; temp.txt" );
    ifstream myFile;
    myFile.open("temp.txt");
    string filename[50];   /// Slower, and nastier, but easier to debug.
    int files=0; 
    char *temp = (char*) malloc(128*sizeof(char));
    while(myFile.eof()==0)
    {myFile.getline(temp,80);
    if(strstr(temp,".log")!=null)
    {files++;filename[files]=temp;}
    }
    myFile.close();
    free(temp);
    struct stat results;
    
     word= (wordstr*)malloc(1024*sizeof(wordstr));
               
               
               for(;files&gt;0;files--)
{
       memset(&results,0,sizeof(results));
       myFile.clear();
       myFile.open(filename[files].c_str(),ios::binary);
       stat(filename[files].c_str(), &results);
       temp=(char*)malloc(results.st_size);
       myFile.seekg (0, ios::beg);
       myFile.read(temp,results.st_size);
       
       
       
       separateWords(temp,results.st_size);
       
       
       free(temp);
       myFile.close();
}

 ofstream myFile2;
 myFile2.open("Results.txt");
 myFile2&lt;&lt; "Total words:" &lt;&lt;numberWord &lt;&lt;endl;
 
 for(;numberWord&gt;0;numberWord--)
 myFile2&lt;&lt;(char*)word[numberWord].word &lt;&lt;" with number of used"&lt;&lt; word[numberWord].count&lt;&lt; endl;
 
 myFile2.close();
 myFile.close();
    free(word);
    system("del temp.txt");
    system("PAUSE");
    return EXIT_SUCCESS;
}
Advertisement
Are you sure it's nothing like number of files being greater than 50 or number of different words being greater than 1024 or something like that then?

I don't have AIM so I can't test it.

Edit: Should you really be resetting numberWord to 0 all the time?
Quote:Original post by DvDmanDT
Are you sure it's nothing like number of files being greater than 50 or something like that then?

I don't have AIM so I can't test it.


I have trillian, and it stores all logs. But to test this I only put one log into the application folder. It opens it perfectly tho :/
1. Null terminate your strings. (otherwise it'll crash when you try to output them to your result file)
2. Something is wierd in your code.

You loop through all known words, to see if the word already exists. If it's not the first word, you'll add it, then you'll check if it's the second word, otherwise add it, then you'll check if it's the third word, otherwise add it and so on. At least I think so. There's an if with an else, and that if shouldn't have that else I think. Instead use { and } on your for construct.

Edit: Consider using std::string instead of char*, it may very well save you tons of headache. You can convert a std::string to a char* using .c_str() anyway.

Edit2: Resetting numberWord to 0 each time you call separateWords is also kinda strange. Shouldn't matter as long as you only use one file, but once you use more files, something will become wierd.
You're obviously using C++ (if you're including iostream), why aren't you using the rest of the SC++L (eg. std::vector)?
The comment "/// Slower, and nastier, but easier to debug." appears on the one line of the program where you actually attempt to handle text data in a *sane* manner, i.e. by using std::string.

I'm crying here.

I couldn't even figure out where to begin cleaning this up, so I just started from scratch.

#include <fstream>#include <string>#include <map>#include <cstdlib>#include <cctype>int main() {	std::map<std::string, int> wordTally;	system("dir /b > temp.txt");	std::ifstream catalog("temp.txt");	std::string filename;	while (std::getline(catalog, filename)) {		if (filename.substr(filename.size() - 4) == ".log") {			std::ifstream logfile(filename.c_str());			std::string word;			// Normally, checking for EOF manually like this is a bad idea, but			// for this algorithm, we actually do want to pick up a -1 value once.			while (!logfile.eof()) {				int c = logfile.get();				if (std::isalpha(c)) {					word += static_cast<char>(c);				}				else if (word != "") {					wordTally[word] += 1; // automatically inserts zeros where needed.					word = "";				}			}		}	}	std::ofstream results("Results.txt");	results << "Total words: " << wordTally.size() << std::endl;	for (std::map<std::string, int>::iterator it = wordTally.begin();	     it != wordTally.end(); ++it) {	results << it->first << " was used " << it->second << " time(s)."	        << std::endl;	}	system("del temp.txt"); }


I hope it's not a major problem for you that this will implicitly sort the output words in alphabetical order. Or that it will, in addition to not having any arbitrary limits on "vocabulary" size, quite possibly run faster. Or that it correctly handles the punctuation characters between 'Z' and 'a' in ASCII.
Oh, at least one of the actual problems is as follows:

memcmp(&buffer+i-letterCount, ...


'buffer' is already a pointer, so '&buffer' is not what you want.

Quote:Original post by DvDmanDT
You loop through all known words, to see if the word already exists. If it's not the first word, you'll add it, then you'll check if it's the second word, otherwise add it, then you'll check if it's the third word, otherwise add it and so on. At least I think so. There's an if with an else, and that if shouldn't have that else I think. Instead use { and } on your for construct.


You missed his 'break' statement, which breaks out of the enclosing for-loop (and thus all the way to the end of the function). I agree though that the formatting is horrible (also in the pastebin version). Also, the separation into functions is poorly thought out; the function is called "separateWords", but it doesn't actually separate the words - it actually updates a global tally.

Quote:Edit2: Resetting numberWord to 0 each time you call separateWords is also kinda strange. Shouldn't matter as long as you only use one file, but once you use more files, something will become wierd.


Yes, this is a problem, as far as I can tell.
Zahlman, I didn't miss the break. If it's not the first word, it'll add it before looking at the second (notice how the if and the else is in the for loop). So for each different word it finds before the current word, it'll add the current word. Or am I missing something? Edit: The break will break out of the inner for, not the outer, right?

I guess it never gets that far because of the &buffer thingy.
Thank you so much Zahlman! I compiled it and it works perfectly! :D

Now I realize that I don't know enough C++ (std::map, our teacher tried to teach us that in Java...never knew C++ had it), and so I tried to substitute C in its place.

Now back to the books for std::map.
I think I got the hang of the map, and the iterator. But can somebody tell me how it is actually made? Like, how fast is it compared to if you wrote your own map class?

This topic is closed to new replies.

Advertisement