
Basic question


What's the easiest way to find how many times every word appears in a string?

 

For example, in "Outside there are twelve birds, twelve cars and twelve trees".

 

Should I just search the string for the first whitespace, save its position, extract everything before it into a separate string, and then delete that part from the original? After that, I would search the string for that word, delete it every time I find it (incrementing an int so I know how many times I found it), and then repeat the process.

 

Is there a better way?

 

Is it possible to save every word that the user inputs via the console in a vector?

 

like:

for every word received as input, vector.push_back(word)

 

I thought about that, but I can't figure out a way to take each word separately. I mean, when the user writes a sentence and presses Enter, all the words are received at once...

 

 

There is a much easier way using the standard library.

Create a stringstream from the input string and read one word at a time from it, or use an input stream from the console (it will read one word at a time unless you use getline).

Store the words in a map of <string, int>. The first time you insert a word, store 1. If the word is already in the map, increment the integer.

You don't need a stringstream if your input is already coming from a stream. But otherwise, EWClay has the right of it.

cin >> string_var reads a string of characters up to the next whitespace, effectively reading one word at a time. You will need some post-processing to get rid of punctuation.


Store the words in a map of <string, int>. The first time you insert a word, store 1. If the word is already in the map, increment the integer.

It's even easier than that, because you don't need any special logic for the first insert. If the key is not present, the map's [] operator inserts a value-initialized element, and a value-initialized int is zero. So just use the [] operator and increment the result every time: a missing key is first initialized to zero and then incremented to one, and everything is fine.


It's not the easiest way; however, a radix tree gives you a great deal of latitude for string searches (e.g. finding the position of each word, how many times each word appears, how many words have a given prefix, etc.).

 

If you are handling very large documents with thousands of words, a radix tree will make these search operations much faster than searching through a vector, because a lookup walks the tree one character at a time instead of comparing against every stored word.

 

I don't know if there's a radix tree library for C, but it's worth taking a look at. It's hard to understand but very easy to implement.


Couldn't you ignore punctuation and do something like "if (input_string.contains("twelve")) ++twelve_counter"?

Only if you know your input precisely (and then you probably already know how many of each word there are).

Otherwise, you could easily match words that appear inside a longer, different word (such as "twelve" inside "twelvefold"), resulting in erroneous counts.

Edited by Ectara
