Jump to content
  • Advertisement
Sign in to follow this  
guyaton

fast comparison

This topic is 4924 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

Guest Anonymous Poster
Quote:
Original post by guyaton
for clarification for myself, make a hash of the "comparing string" as i used in my example? havn't used a hash table in C++ but i know it is a member of the stl. how would i do it using c++ stl?

thanks again
~guyaton
No, you store the words in the string "some sample string for testing purposes" into a hash table. Like hash["some"]=0;hash["sample"]=0;... and then check if the comparing strings can be found in the hash: "hash.find("noun") != hash.end()", "hash.find("something") != hash.end()" and so on

Share this post


Link to post
Share on other sites
Advertisement
Why can't you just load up your reference strings in an array, and just cycle through any phrase something like this:


unsigned int i;
std::string my_phrase = "some sample string for testing purposes";

std::vector<std::string>my_words;
my_words.push_back("for");
my_words.push_back("noun");
my_words.push_back("something");
my_words.push_back("is");
my_words.push_back("test");
my_words.push_back("blah");

for ( i = 0; i < my_words.size(); ++i)
{
if (my_phrase.find(my_words) != std::string::npos)
{
// Do something if found.
}
}


--random_thinker

Edit: No, that won't work, it will show "test" is there...hang on...

[Edited by - random_thinker on August 29, 2005 3:20:09 PM]

Share this post


Link to post
Share on other sites
random_thinker: i was just trying to find something fairly efficiant. If I have 30 keyworks and say 40,000 works (even if i loaded it all line by line) it could take a while to search thru each word. This is why I am trying to find something efficiant.

~guyaton

edit: i came up with this real quick...still questioning how fast it is:


struct tempStr
{
string foundString;
unsigned char location;

tempStr( string &str, unsigned char loc )
{
foundString = str;
location = loc;
}
};
//.....

//elsewhere....
map<string, unsigned char> myHash;
myHash.insert( pair<string, unsigned char>("class", (unsigned char)0 ) );
myHash.insert( pair<string, unsigned char>("#include", (unsigned char)1) );

string someString = "thist is a classy function class";

string::size_type seeker = 0;



vector<tempStr> output;

while( seeker != someString.npos )
{
string::size_type tempSeeker;
string subsubstring = someString.substr( seeker, ( tempSeeker = someString.find_first_of( ' ', seeker ) ) - seeker );

//find( myHash.begin(), myHash.end(), someString );
map<string, unsigned char>::iterator someIter = myHash.find( subsubstring );

if( someIter == myHash.end() )
{
seeker = tempSeeker+1;
}
else
{
tempStr test = tempStr( subsubstring, (unsigned char)seeker ) ;
output.push_back( test );
}
}





edit #2: been reading on the Boyer-Moore Algorythm and it sounds pretty fast and cool. might check that out.

[Edited by - guyaton on August 29, 2005 4:45:13 PM]

Share this post


Link to post
Share on other sites
Oh, I see. You could also try using an ifstream that would only pick up 'whole words', something like the following (you might need to strip punctuation from current_string):


std::string current_string;
std::string my_words = "for noun something is test blah"; // or put these in a std::map<>, and later do a find(), depending upon the info you need.

std::ifstream my_text;
my_text.open("my_file.txt");

while (!my_text.fail())
{
my_text >> current_string;
if (my_words.find(current_string) != std::string::npos) // Could do a find() on a std::map<> of my_words here.
{
// Do something if found.
}
}






--random_thinker

Share this post


Link to post
Share on other sites
if anyone else is interested found a really good tut of what the Boyer-Moore algorithm is compliments of www.google.com

http://www.cs.utexas.edu/users/moore/best-ideas/string-searching/fstrpos-example.html

~guyaton

Share this post


Link to post
Share on other sites
guyaton,

I would have thought that if you could limit your string search to the 30 words rather than the 40,000 by streaming the 40,000 one at a time then the C++ string search algorithm should be fast enough, no?

--random_thinker

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
Quote:
Original post by random_thinker
guyaton,

I would have thought that if you could limit your string search to the 30 words rather than the 40,000 by streaming the 40,000 one at a time then the C++ string search algorithm should be fast enough, no?

--random_thinker


Depends upon what's fast enough, but your way requires (worst case) as number of keywords times number of text words string comparisons (which are sloooow). That's about as slow as you can do it, without repeating yourself.

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
You don't need/want boyer-moore here because you're searching for specific words.. And std::map is not a hash map, it's a tree which also keeps the elements in order (a feature you don't need here). There's no standard hash map in STL (yet) although many STL implementations place some sort of hash_map into the std-namespace. hash map has less indirection than tree structures, so it's easy on cache. It should be a very fast solution.. Whether you should place the comparing strings in the hash map or the words in the other string, depends on which ones change less often (you want to build the hash only rarely)

Share this post


Link to post
Share on other sites
Quote:
Original post by Anonymous Poster
You don't need/want boyer-moore here because you're searching for specific words.. And std::map is not a hash map, it's a tree which also keeps the elements in order (a feature you don't need here). There's no standard hash map in STL (yet) although many STL implementations place some sort of hash_map into the std-namespace. hash map has less indirection than tree structures, so it's easy on cache. It should be a very fast solution.. Whether you should place the comparing strings in the hash map or the words in the other string, depends on which ones change less often (you want to build the hash only rarely)
In all honesty that's generally quite good advice, and I must admit, I didn't read every previous post before suggesting the Boyer-Moore algorithm (particularly the ones about it always being single words). I also didn't post a link because there's bound to be HEAPS of stuff on the net about it.

I'm a big fan of hash-tables myself too, however the Boyer-Moore, or other related algorithms are very likely to be around as quick as hashing the words when you consider the time taken for hashing every word.
The Boyer-Moore algorithm has a reasonably good payback for a relatively smallish setup cost, not to mention a relatively smallish memory overhead compared to hashing.

It's all about tradeoffs between memory usage / speed / flexability / simplicity etc. I'm merely saying that it's still an option, depending on how important the things I just mentioned are.[smile]

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

GameDev.net is your game development community. Create an account for your GameDev Portfolio and participate in the largest developer community in the games industry.

Sign me up!