Sign in to follow this  

Search for matching string

This topic is 1115 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

Hi everyone. 
I'm trying to find a solution that would help me to find a matching string within an array.
i have a big array[10000] and within it there are strings that are the same ( only 1 matching string for each string)

strings can be different lengths.

 

the easiest and most expensive way in my mind is to sort all the strings by length size into different vectors, then loop through the vectors to find the matching string.

I could use array instead of string for read/write time improvement, but instead i am trying to see if i can use map.

Use map in order to have a key for each string which would represent length, and value would be the string.

 

How would i loop through all the strings that are of specific (map key?) string.length.

Thank You.

Share this post


Link to post
Share on other sites

There are many ways of doing it, here are some hints:

1. Use a hashvalue as key instead of length.

2. If you have a static array, calculate the hashkeys of each string, then sort the arrays (hashkey array+string array) and use binary search to find the first hashkey. Then compare each string with the same hashkey.

3. If you have a dynamic array, I would recommend to use a hashmap which allows multiple entries with the same hashkey or use list/vector to save the list of strings having the same hashkey, something like this:

map<int,list<string>>

Share this post


Link to post
Share on other sites

thank you for reply.

 

to make things simple for now i just sum the DEC values of chars in string to represent a hash key ( in my head, that works as well?)

 

i feel a bit stuck. 

	int found = 0;
	for (auto it = map.begin(); it != map.end(); it++)
	{
		auto search = map.find(it->first);

		if (search != map.end())
		{
			if (it->second != search->second && it->first == search->first)
			{
				found++;
			}
		}
	}

Is there a way to start iteration (find) from a specific position until the end of the map?
 

Share this post


Link to post
Share on other sites

to make things simple for now i just sum the DEC values of chars in string to represent a hash key ( in my head, that works as well?)

Notice that performance of a hash based solution depends heavily on the value distribution and collision probability of hash values. Because simply summing up character code values gives you a poor hash (e.g. "abc", "acb", "bac", "bca", "cab", and "cba" all give the same value), any measuring done with it is mostly worthless.

 

Well, a hash based solution usually need not use a map. (On the other hand, a map may be implemented by using a hash table.) If you want to use hashes, then e.g. create a table indexed by a hash (or a portion of a hash), where each entry of the table is a bucket (perhaps a list) of stings having the said hash value. This way reduces the search by a factor of N when using N buckets, assuming that the distribution of hash values is okay. It is called "hash table" and it well known and documented on the internet, e.g. here on wikipedia.

 

 

Another approach in case that the strings are not changed (at least not often) is the use of a kind of trie. However, they are a bit (or in case of some variants even much) more complex than a hash table.

Edited by haegarr

Share this post


Link to post
Share on other sites
This is C++. Just use std::unordered_map, which is a hash table that ships with any moderately recent C++ implementations. It's not the best hash table but it's 10000x better than rolling your own broken data structure.

Likewise, if you need a hash of a value for some other reason, use std::hash. It will provide a good hash for any of the built-in types including std::string. If you need to hash your own types, read up on the documentation of the standard library: bad hashes can easily be worse than not using hashing at all (e.g., a std::map will be better, and a std::map is pretty terrible).

Share this post


Link to post
Share on other sites

thank you all for suggestions.
one more question.

If i  use multimap, and have multiple keys with same value, but different data.
what is the easiest way to loop through all the specific keys?

 

Ashaman73

Or instead of multimap is it better to use the suggested map<int,list<string>>, my first try to make it work turned out to perform really slow in populating the string. 

Share this post


Link to post
Share on other sites

thank you all for suggestions.
one more question.
If i  use multimap, and have multiple keys with same value, but different data.
what is the easiest way to loop through all the specific keys?

 

You can use equal_range to get a pair of iterators for all values mapped to a particular key, then loop over them using standard techniques.

Share this post


Link to post
Share on other sites

Is this a frequent or infrequent thing? Or in other words, is the extra cost of creating hash tables or other metadata greater or less than the cost of all the lookups? For infrequent work a brute force approach is often good enough.
 
Second, is this only going to be used for an exact match comparison? For example, is there ever going to be a case where substrings are important, or you need detail from within a part of the string rather than a pure acceptance match?  
 
Those questions may suggest alternate data structures or patterns. There are a few other structures that can also provide great performance on different uses, such as performing an autocomplete system or or spell correction. A bit more information could help.
 
 
One person above, haegarr, mentioned one of these data structures. Depending on your answers it may be a great answer.
 
A very good structure for string acceptance is the trie. For several uses a trie can be faster than a hash table, but for other uses it can be slower, so knowing details are important. Also useful, a trie can provide alphabetical sorting, and some implementations can be heavily compressed requiring only a few bytes per entry rather than storing complete strings. It is also generally cache efficient and for that reason it is used in the fastest-known string sorting algorithm.

Share this post


Link to post
Share on other sites

If we would know your requirements, we could help you more. You have already an implementation in mind to solve a problem, but we only know this implementations and not the requirements. Requirements are something like this:

1. I have a fix set of strings, which will not change while the application is running.

2. The performance for preparing the strings is not that important.

3. I need to look up the strings very often, therefor the lookup performanc is very important.

 

Or like this:

1. The set of strings changes very often while the application is running.

2. I need to setup new sets very often, therefor the setup-time is performance critical.

3. I need to search in ranges, eg. iterate through all strings between a - c.

4. The look up needs to ignore case.

 

Once we know, what you need, we could suggest some specialist data structure/algorithm.

Share this post


Link to post
Share on other sites

This topic is 1115 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this