Basic question

Started by
5 comments, last by Ectara 11 years, 1 month ago

What's the easiest way to find how many times every word appears in a string?

For example, in "Outside there are twelve birds,twelve cars and twelve trees".

Should I just search the string for the first whitespace, save its position, extract everything before it into a string, and then delete everything before the whitespace? After that, should I search the string for that word and, every time I find it, delete it (incrementing an int so I know how many times I found it)? Then repeat the process?

Is there a better way?

Is it possible to save every word that the user inputs via the console in a vector?

like:

for every word received as input, vector.push_back(word)

I thought about that, but I can't figure out a way to take each word separately. When the user writes a sentence and presses Enter, all the words are received at once...

There is a much easier way using the standard library.

Create a stringstream from the input string and read one word at a time from it, or use an input stream from the console (it will read one word at a time unless you use getline).

Store the words in a map of <string, int>. The first time you insert a word, store 1. If the word is already in the map, increment the integer.
You don't need a stringstream if your input is already coming from a stream. Otherwise, EWClay has the right of it.

cin >> string_var will read a string of characters up to the next whitespace, effectively reading one word at a time. You will need some post-processing to get rid of punctuation.

Couldn't you ignore punctuation and do something like "if (input_string.contains("twelve")) then ++twelve_counter"?


Store the words in a map of <string, int>. The first time you insert a word, store 1. If the word is already in the map, increment the integer.

It's even easier than that, because you don't need special logic to handle the first insert. The map's [] operator inserts a value-initialized element if the key is not present, and a value-initialized int is zero. So just use the [] operator and increment every time: if the key doesn't exist, its value is initialized to zero and then incremented to one, and everything is fine.

It's not the easiest way, but using a radix tree gives you a great deal of latitude in handling string searches (e.g. finding the position of each word, how many times each word appears, how many words have a given prefix, etc.).

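If you are managing very large documents with thousands of words, a radix tree can greatly reduce the time string search operations take compared to a linear scan through a vector, because a lookup follows a single path whose length depends on the word being searched for rather than on the number of words stored.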

I don't know if there's a radix tree library for C, but it's worth taking a look at. It's hard to understand but very easy to implement.

Tiago.M, Web Developer - Aspiring CG Programmer

Couldn't you ignore punctuation and do something like "if (input_string.contains("twelve")) then ++twelve_counter"?

That works only if you know your input precisely (in which case you probably already know how many there are of each word).

Otherwise, you could easily match a word that appears inside a longer, different word (searching for "car" also finds "cars"), resulting in erroneous counts.
