Sign in to follow this  
ShadowPhoenix

My memory is broken. I didn't do it!!!

Recommended Posts

ShadowPhoenix    109
I made this application today to count how many words everyone said on AIM in logs, and which words were everyone's favorite. So the program has to 1) see all the .log files in a directory, 2) open all the logs 3) count all the words in logs Sounds simple, right? Well, it is. But my program somehow manages to crash randomly, and my debugger flipped out on me. Can someone please run this through a debugger (or take a look at the code?) and tell me whats wrong with it? (C++/WinApi/made with Dev-C++,mingwgcc) http://pastebin.com/m22a347b9 << easier to see. Someone please help me with this... I am just at a loss...
#include &lt;cstdlib&gt;
#include &lt;iostream&gt;
#include &lt;fstream&gt;
#include &lt;cstring&gt;
 #include &lt;sys/stat.h&gt;


#define null NULL

//,log
using namespace std;


struct wordstr{
       char*word;
       int len;
       int count;
}*word;
int numberWord=0;
void separateWords(char * buffer,int size) // size is in bytes
{

int letterCount=0;

numberWord=0;


for(int i=0;i&lt;size;i++)
{
       
letterCount++;
if(numberWord &gt; 1020)
return;

if(buffer[i]&lt;64 || buffer[i]&gt;122)
 {
 if(letterCount&gt;2) {
                   
                   for(int l=0;l&lt;numberWord;l++) // Make sure its not a duplicate
                   if(0==memcmp(&buffer+i-letterCount,word[l].word,letterCount))
                   {
                   word[l].count++;
                   letterCount=0;
                   break;
                   }
                   else{
                   word[numberWord].len=letterCount;
                   word[numberWord].word=(char*) malloc(sizeof(char)*letterCount+1);
                   for(int o=0;o&lt;letterCount;o++)
                   word[numberWord].word[o]=buffer[i-o];
                   word[numberWord].word[letterCount+1]='\n';
                   letterCount=0;
                   word[numberWord].count++;
                   numberWord++;}
                   }
  letterCount=0;
 
 }

}




}

int main(int argc, char *argv[])
{
    system (" dir /b  &gt; temp.txt" );
    ifstream myFile;
    myFile.open("temp.txt");
    string filename[50];   /// Slower, and nastier, but easier to debug.
    int files=0; 
    char *temp = (char*) malloc(128*sizeof(char));
    while(myFile.eof()==0)
    {myFile.getline(temp,80);
    if(strstr(temp,".log")!=null)
    {files++;filename[files]=temp;}
    }
    myFile.close();
    free(temp);
    struct stat results;
    
     word= (wordstr*)malloc(1024*sizeof(wordstr));
               
               
               for(;files&gt;0;files--)
{
       memset(&results,0,sizeof(results));
       myFile.clear();
       myFile.open(filename[files].c_str(),ios::binary);
       stat(filename[files].c_str(), &results);
       temp=(char*)malloc(results.st_size);
       myFile.seekg (0, ios::beg);
       myFile.read(temp,results.st_size);
       
       
       
       separateWords(temp,results.st_size);
       
       
       free(temp);
       myFile.close();
}

 ofstream myFile2;
 myFile2.open("Results.txt");
 myFile2&lt;&lt; "Total words:" &lt;&lt;numberWord &lt;&lt;endl;
 
 for(;numberWord&gt;0;numberWord--)
 myFile2&lt;&lt;(char*)word[numberWord].word &lt;&lt;" with number of used"&lt;&lt; word[numberWord].count&lt;&lt; endl;
 
 myFile2.close();
 myFile.close();
    free(word);
    system("del temp.txt");
    system("PAUSE");
    return EXIT_SUCCESS;
}
[/code]

Share this post


Link to post
Share on other sites
DvDmanDT    1941
Are you sure it's nothing like number of files being greater than 50 or number of different words being greater than 1024 or something like that then?

I don't have AIM so I can't test it.

Edit: Should you really be resetting numberWord to 0 all the time?

Share this post


Link to post
Share on other sites
ShadowPhoenix    109
Quote:
Original post by DvDmanDT
Are you sure it's nothing like number of files being greater than 50 or something like that then?

I don't have AIM so I can't test it.


I have trillian, and it stores all logs. But to test this I only put one log into the application folder. It opens it perfectly tho :/

Share this post


Link to post
Share on other sites
DvDmanDT    1941
1. Null terminate your strings. (otherwise it'll crash when you try to output them to your result file)
2. Something is wierd in your code.

You loop through all known words, to see if the word already exists. If it's not the first word, you'll add it, then you'll check if it's the second word, otherwise add it, then you'll check if it's the third word, otherwise add it and so on. At least I think so. There's an if with an else, and that if shouldn't have that else I think. Instead use { and } on your for construct.

Edit: Consider using std::string instead of char*, it may very well save you tons of headache. You can convert a std::string to a char* using .c_str() anyway.

Edit2: Resetting numberWord to 0 each time you call separateWords is also kinda strange. Shouldn't matter as long as you only use one file, but once you use more files, something will become wierd.

Share this post


Link to post
Share on other sites
Zahlman    1682
The comment "/// Slower, and nastier, but easier to debug." appears on the one line of the program where you actually attempt to handle text data in a *sane* manner, i.e. by using std::string.

I'm crying here.

I couldn't even figure out where to begin cleaning this up, so I just started from scratch.


#include <fstream>
#include <string>
#include <map>
#include <cstdlib>
#include <cctype>

int main() {
std::map<std::string, int> wordTally;
system("dir /b > temp.txt");
std::ifstream catalog("temp.txt");
std::string filename;
while (std::getline(catalog, filename)) {
if (filename.substr(filename.size() - 4) == ".log") {
std::ifstream logfile(filename.c_str());
std::string word;
// Normally, checking for EOF manually like this is a bad idea, but
// for this algorithm, we actually do want to pick up a -1 value once.
while (!logfile.eof()) {
int c = logfile.get();
if (std::isalpha(c)) {
word += static_cast<char>(c);
}
else if (word != "") {
wordTally[word] += 1; // automatically inserts zeros where needed.
word = "";
}
}
}
}
std::ofstream results("Results.txt");
results << "Total words: " << wordTally.size() << std::endl;
for (std::map<std::string, int>::iterator it = wordTally.begin();
it != wordTally.end(); ++it) {
results << it->first << " was used " << it->second << " time(s)."
<< std::endl;
}
system("del temp.txt");
}



I hope it's not a major problem for you that this will implicitly sort the output words in alphabetical order. Or that it will, in addition to not having any arbitrary limits on "vocabulary" size, quite possibly run faster. Or that it correctly handles the punctuation characters between 'Z' and 'a' in ASCII.

Share this post


Link to post
Share on other sites
Zahlman    1682
Oh, at least one of the actual problems is as follows:


memcmp(&buffer+i-letterCount, ...


'buffer' is already a pointer, so '&buffer' is not what you want.

Quote:
Original post by DvDmanDT
You loop through all known words, to see if the word already exists. If it's not the first word, you'll add it, then you'll check if it's the second word, otherwise add it, then you'll check if it's the third word, otherwise add it and so on. At least I think so. There's an if with an else, and that if shouldn't have that else I think. Instead use { and } on your for construct.


You missed his 'break' statement, which breaks out of the enclosing for-loop (and thus all the way to the end of the function). I agree though that the formatting is horrible (also in the pastebin version). Also, the separation into functions is poorly thought out; the function is called "separateWords", but it doesn't actually separate the words - it actually updates a global tally.

Quote:
Edit2: Resetting numberWord to 0 each time you call separateWords is also kinda strange. Shouldn't matter as long as you only use one file, but once you use more files, something will become wierd.


Yes, this is a problem, as far as I can tell.

Share this post


Link to post
Share on other sites
DvDmanDT    1941
Zahlman, I didn't miss the break. If it's not the first word, it'll add it before looking at the second (notice how the if and the else is in the for loop). So for each different word it finds before the current word, it'll add the current word. Or am I missing something? Edit: The break will break out of the inner for, not the outer, right?

I guess it never gets that far because of the &buffer thingy.

Share this post


Link to post
Share on other sites
ShadowPhoenix    109
Thank you so much Zahlman! I compiled it and it works perfectly! :D

Now I realize that I don't know enough C++ (std::map, our teacher tried to teach us that in Java...never knew C++ had it), and so I tried to substitute C in its place.

Now back to the books for std::map.

Share this post


Link to post
Share on other sites
Palidine    1315
Quote:
Original post by ShadowPhoenix
Like, how fast is it compared to if you wrote your own map class?


As a beginner programmer the STL is better at everything than anything you could write yourself.

Even as an advanced programmer there aren't a whole lot of circumstances in which the STL isn't the best choice. And when it's not you're only good enough to make something better if you know it's not good in that situation.

In short: if you don't already know it's not good in a given situation you're probably not good enough to make something better. =)

-me

Share this post


Link to post
Share on other sites
ShadowPhoenix    109
Well, right now this map class seems like a magic Pandora boc: I store a key in it, and it magically also stores associate object with it. Then I can restore it. How does it work? How fast is it? Like, I know that in a non-linked array my take time is O(1). If I have to search thrught it, I get O(n). What about the map?
Is the storing algorithm O(n)? Is retrieving O(n^2)? Does it use inline assembly?

Say, is it possible to view the map.cpp?

Share this post


Link to post
Share on other sites
Zahlman    1682
Quote:
Original post by ShadowPhoenix
How does it work?


Very roughly, and omitting lots of other nice stuff that you don't care about yet: first, templates are used to define a structure that wraps up a key and a value together. Then, another structure is similarly defined, which I'll call a node, consisting of a key-value pair plus two pointers-to-nodes. The map code builds a binary search tree out of nodes, only considering the key when sorting them. Most popular implementations use a red-black tree, to ensure it is reasonably well balanced.

Quote:
How fast is it? Like, I know that in a non-linked array my take time is O(1). If I have to search thrught it, I get O(n). What about the map?
Is the storing algorithm O(n)? Is retrieving O(n^2)?


Map insertion, removal and retrieval are all O(lg n).

Quote:
Does it use inline assembly?


Generally, no. But that's hardly relevant anyway.

Quote:
Say, is it possible to view the map.cpp?


There is no *definitive* map.cpp (and for any given implementation, it doesn't have to *exist*; the relevant code could be embedded directly into the compiler as a resource, for example), but you can try searching for one online or on your hard drive, sure. The standard library's implementation files live in different places depending on what compiler you have.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this