xeloj

Word count problem


Hi,

I've been working on this homework problem where I have to write a program which counts the number of unique words in a text file, ignoring punctuation (other than apostrophes) and capitalization. For example, "can't" is a different word from "cant", but "Dog" and "dog" are the same word.

I can't use the STL and the only headers I can use are <stdio.h>, <stdlib.h>, and <memory.h>.

My basic idea to approach this is as follows (pseudo-code):

char c;
wordArray[10000];
wordCount = 0;

while(!eof)
{
    while(!whitespace || period || exclamation || questionmark)
    {
        wordHolder[20];
        arrayCounter = 0;
        c = getC(myFile);
        wordHolder[arrayCounter] = c;
        arrayCounter++;
    }
    CheckForDuplicate();
    wordHolder = wordArray[wordCount]; //I know this won't work...
    wordCount++;
}

I know the above has some problems with it, but that's basically my idea of how to do this program given the parameters of the assignment.

How do I get the character array into the word array as a complete word? Also, are there any other problems you guys can see me running into that I haven't identified?

Thanks in advance for any help.

Strings in C are copied using strcpy.

Also, why not directly read the word into the correct array position?
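To make the strcpy suggestion concrete, here is a minimal sketch of storing words in a fixed two-dimensional char array. The names WordList, appendWord and containsWord and the two size limits are made up for illustration, and the sketch assumes <cstring> (or <string.h>) is available, which the assignment's header list does not actually promise.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical limits, loosely based on the pseudo-code in the question.
const std::size_t MAX_WORDS = 10000;
const std::size_t MAX_WORD_LEN = 32;  // includes the terminating '\0'

struct WordList {
    char words[MAX_WORDS][MAX_WORD_LEN];
    std::size_t count;
};

// Copies a NUL-terminated word into the next free slot.
// Returns false if the list is full or the word does not fit.
bool appendWord(WordList& list, const char* word) {
    if (list.count >= MAX_WORDS) return false;
    if (std::strlen(word) >= MAX_WORD_LEN) return false;
    std::strcpy(list.words[list.count], word);  // length was checked above
    ++list.count;
    return true;
}

// Linear scan for an exact match using strcmp.
bool containsWord(const WordList& list, const char* word) {
    for (std::size_t i = 0; i < list.count; ++i) {
        if (std::strcmp(list.words[i], word) == 0) return true;
    }
    return false;
}
```

Reading "directly into the correct array position" would mean writing the incoming characters straight into list.words[list.count] and only incrementing count once the word turns out to be new, which avoids the copy altogether.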

I wouldn't copy any strings.

I would make it a tree (a trie). Each node contains an array whose size equals the number of different characters you allow (A-Z and ' in this case). Just traverse the tree for each word, and when a space is found, determine whether that word already existed or whether its last character before the space was newly added. Keep a counter for each new word found.
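A rough sketch of that tree, assuming ASCII input and 27 slots per node (26 letters plus the apostrophe). TrieNode, slotFor and insertWord are illustrative names, not anything from the assignment, and the end-of-word flag is one possible way to tell "already existed" from "new".

```cpp
const int ALPHABET = 27;  // 'a'..'z' plus the apostrophe

struct TrieNode {
    TrieNode* child[ALPHABET];
    bool isWord;  // true if some inserted word ends at this node

    TrieNode() : isWord(false) {
        for (int i = 0; i < ALPHABET; ++i) child[i] = nullptr;
    }
    ~TrieNode() {
        for (int i = 0; i < ALPHABET; ++i) delete child[i];
    }
    TrieNode(const TrieNode&) = delete;
    TrieNode& operator=(const TrieNode&) = delete;
};

// Maps a character to a child slot, or -1 if it is not part of a word.
// Assumes an ASCII-compatible character set.
int slotFor(char c) {
    if (c >= 'a' && c <= 'z') return c - 'a';
    if (c >= 'A' && c <= 'Z') return c - 'A';  // case-insensitive
    if (c == '\'') return 26;
    return -1;
}

// Walks the trie for one word, creating nodes as needed and skipping
// characters that are neither letters nor apostrophes.
// Returns true if the word had not been seen before.
bool insertWord(TrieNode& root, const char* word) {
    TrieNode* node = &root;
    bool sawChar = false;
    for (; *word != '\0'; ++word) {
        int slot = slotFor(*word);
        if (slot < 0) continue;
        if (!node->child[slot]) node->child[slot] = new TrieNode();
        node = node->child[slot];
        sawChar = true;
    }
    if (!sawChar || node->isWord) return false;
    node->isWord = true;
    return true;
}
```

The unique-word count is then just the number of times insertWord returns true. No strings are copied or compared; the cost per word is proportional to its length.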

Quote:
Original post by ToohrVyk
Strings in C are copied using strcpy.

Also, why not directly read the word into the correct array position?


Well if I was able to create "string" variables, I would just have a "string" holder and set that equal to the current array position, and then keep iterating.

Can that be done in C?

Okay, never mind the "C" only stuff.

I was told I could use C++ to complete the assignment so that makes it a lot easier.

However, I had a question for the data structures experts in here.

I want to use an array to store my word list, and then check against the array to see if a given word is a duplicate or not.

Is this a good way to go about it?

Is a hash table better???

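A hash table is much better. Checking each word against an unsorted array costs time proportional to the number of words already stored, so the whole count is quadratic. A hash table lookup takes expected constant time, which makes the overall complexity linear in the number of words. A trie is also linear, in the total number of characters read.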
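For illustration, here is what the hash table version can look like if the standard library is allowed. This sketch assumes a C++11 compiler (std::unordered_set did not exist in the standard when this thread was written) and assumes the words have already been lowercased and stripped of punctuation; countUnique is a made-up name.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Counts distinct strings with a hash set: each insert is expected O(1),
// so the whole pass is expected linear in the number of words.
std::size_t countUnique(const std::vector<std::string>& words) {
    std::unordered_set<std::string> seen;
    for (const std::string& w : words) {
        seen.insert(w);  // duplicates are ignored by the set
    }
    return seen.size();
}
```

With a plain array the same loop would need an inner scan over everything stored so far, which is where the quadratic cost comes from.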

Edit: OK, I should have read the rest of the thread first; but the fact that they'd even consider telling you what they originally told you sickens me, and I'm already sick.

Quote:
Original post by xeloj
Hi,

I've been working on yadda yadda yadda...

I can't use the STL and the only headers I can use are <stdio.h>, <stdlib.h>, and <memory.h>.


My honest recommendation to you is to drop the course and tell everyone you can not to take it. Seriously. I am telling you this as a university graduate (who went through some similarly worthless programming courses) with real-world, professional C++ experience.

Unfortunately, I can't recommend any alternative courses for you at your school or university, and probably couldn't even if I knew what that school or university was. In fact, I can't even recommend, off the top of my head, a school or university that teaches things properly. I really can't. I really, really wish I could, but I can't. The state of things is really that bad.

If you're even *mentioning* the STL, then your course is presumably claiming to teach C++.

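stdio.h and stdlib.h only survive in proper, standardized C++ as deprecated compatibility headers, as of nineteen freakin' ninety-eight, and memory.h was never standard at all. The proper names are <cstdio> and <cstdlib>; the memcpy/memset family that memory.h usually provides lives in <cstring>. (<memory> is a different header entirely: allocators and smart pointers.)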

Also, "the STL" is poor phrasing, because what we are really talking about is the standard C++ library. Not all of the STL is available from the standard C++ headers, and there are many other things covered in the C++ standard library (for example, all the *stream headers).

But more to the point, consider that phrase. Standard C++ library. This is as close to built-in-to-the-language as it is possible for code to get. The reason that courses at "educational" institutions make you do things manually is in some vain hope that you will learn something about "how the machine works" by hands-on-and-in experience.

In my mind, it's something like trying to teach about electricity by drawing a diagram of a battery and of a lightbulb on a chalkboard, along with "V = IR"; then putting you in the lab with a beaker of sulfuric acid, some coins, alligator clips, and lots of bare copper wire and electrical tape; and hoping that you will figure out on your own the concept of "insulation", or that sulfuric acid is really not something you want to get on your skin. The main difference being that nothing you type into your computer is likely to injure or kill anyone.

Quote:

My basic idea to approach this is as follows(pseudo-code):


You might not want to make your pseudo-code look so much like the implementation language. :) (Python programmers have an excuse ;) )

Okay so here's what I have so far.

 
#include <iostream>
#include <fstream>
#include <string>
#include <ctype.h>

using namespace std;

int main()
{
    string filename = "test.txt";

    string wordList[10000];

    //Open File Stream
    ifstream file_stream;
    file_stream.open(filename.c_str());

    //Variable that keeps count of "all" words
    unsigned int total_words = 0;

    //Variable that keeps count of "unique" words
    unsigned int unique_words = 0;

    if ( file_stream.is_open() )
    {

        while ( !file_stream.eof() ) // loop until the end of the file
        {
            string holder; // just a holder
            getline( file_stream, holder ); // read line from the file
            cout << "Current Word: " << holder << endl;

            holder[0] = tolower(holder[0]); //Converts first letter capitals to lower case

            //total_words++; // increment word count

            //cout << "We have " << total_words << " words in our array." << endl;
            for(int i = 0; i<10000; i++)
            {
                if(holder == wordList[unique_words])
                {
                    cout << holder << " creates collision(word is in current index)." << endl;
                }
                else
                {
                    wordList[unique_words] = holder;
                    cout << holder << " is inserted into word list." << endl;
                    unique_words++; // increment word count
                    cout << "Word " << unique_words << " in array: " << wordList[unique_words] << endl;
                    cout << "We have " << unique_words << " words in our array." << endl;
                    break;
                }
            }// End for-loop
        }// End while-loop
    }
    else
    {
        //catch case
        cout << "Could not load file: " << filename << endl;
    }
    return 0;
}




My problem is with iterating through the array to check whether the word already exists in it. Is it okay to compare the current value of holder, which is a string, against the array elements?

Also, I just noticed that my code only counts the words correctly if each one is on its own line, but I want it to work on an arbitrary paragraph and still count the words.

Any tips would be helpful!

[Edited by - xeloj on April 20, 2007 3:14:17 AM]

I personally would create a quick linked list of words and just iterate through them to find uniqueness. I think the linked list is a little more work up front, but much easier to mess with once the groundwork is laid. If you are not allowed to use something like an STL vector, list, or set just create your own quick linked list that is pointer based.


[Edited by - vtchill on April 20, 2007 4:16:07 PM]
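A quick pointer-based list along those lines might look like the following. WordNode, WordSet and addIfNew are hypothetical names, and std::string is used for the payload on the assumption that only the containers are off limits.

```cpp
#include <cstddef>
#include <string>

struct WordNode {
    std::string word;
    WordNode* next;
};

// Owns a singly linked list of distinct words.
struct WordSet {
    WordNode* head;
    std::size_t size;

    WordSet() : head(nullptr), size(0) {}
    ~WordSet() {
        while (head) {
            WordNode* next = head->next;
            delete head;
            head = next;
        }
    }
    WordSet(const WordSet&) = delete;
    WordSet& operator=(const WordSet&) = delete;
};

// Walks the list looking for the word; if it is absent, prepends a new node.
// Returns true when the word was new.
bool addIfNew(WordSet& words, const std::string& word) {
    for (WordNode* n = words.head; n; n = n->next) {
        if (n->word == word) return false;
    }
    WordNode* node = new WordNode;
    node->word = word;
    node->next = words.head;
    words.head = node;
    ++words.size;
    return true;
}
```

The list removes the fixed 10000-word cap, but each lookup is still a linear scan, so the overall cost stays quadratic like the array version.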
