Data structure to represent/access synonym lists

3 comments, last by kirkd 10 years, 10 months ago

I have data which consists of lists of synonyms, and within each list there is a preferred synonym. For example, given [A,b,c,d,e] and [Z,y,x,w,v], the capitalized letter is the preferred name for that particular list of synonyms. I want to be able to access all the members of a synonym list with any member of that list, and to access the preferred name for a synonym list by any member of the list.

The things that come to mind are maps and multimaps in C++, and dictionaries in Python. I don't know of a multimap equivalent in Python. Even with these, though, it seems clunky.

For maps/dictionaries, I could create a vector/list for each synonym list and store it in the map/dictionary multiple times, once for each item in that list. In C++ I could store a pointer to each vector instead, to save some space. Something like:

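#include <map>
#include <string>
#include <vector>

int main() {
    std::vector<std::string> syn_list_a = {"A", "b", "c", "d", "e"};
    std::vector<std::string> syn_list_z = {"Z", "y", "x", "w", "v"};

    // Each synonym maps to a pointer to its list, so each list is stored once
    std::map<std::string, std::vector<std::string>*> syn_map;
    // Each synonym maps to the preferred name of its list
    std::map<std::string, std::string> pref_map;

    syn_map["A"] = &syn_list_a;
    syn_map["b"] = &syn_list_a;
    // ...
    syn_map["Z"] = &syn_list_z;
    syn_map["y"] = &syn_list_z;
    // ...

    pref_map["A"] = "A";
    pref_map["b"] = "A";
    // ...
    pref_map["Z"] = "Z";
    pref_map["y"] = "Z";
}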
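The hand-written entries in the snippet above can be generated by a loop. Below is a minimal C++17 sketch of the same pointer-based layout. It assumes the first entry of each list is its preferred name, as with A and Z in the example. The names SynList and SynonymIndex are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

using SynList = std::vector<std::string>;

// Hypothetical helper type: every synonym list is stored once, and both
// maps are filled by a loop instead of one hand-written line per synonym.
struct SynonymIndex {
    std::vector<SynList> lists;                     // each list stored once
    std::map<std::string, const SynList*> syn_map;  // any member -> its list
    std::map<std::string, std::string> pref_map;    // any member -> preferred name

    // Assumption: the first entry of each list is its preferred name.
    explicit SynonymIndex(std::vector<SynList> input) : lists(std::move(input)) {
        // `lists` is never resized after this point, so the pointers stay valid.
        for (const SynList& list : lists) {
            assert(!list.empty());
            for (const std::string& name : list) {
                syn_map[name] = &list;
                pref_map[name] = list.front();
            }
        }
    }

    // A copy would hold pointers into the original's `lists`, so forbid it.
    SynonymIndex(const SynonymIndex&) = delete;
    SynonymIndex& operator=(const SynonymIndex&) = delete;
};
```

With the two example lists loaded, idx.pref_map.at("y") gives "Z", and idx.syn_map.at("d") points at the whole [A,b,c,d,e] list.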

For Python, I could translate this to dictionaries.

But this all seems very inelegant and clunky. Any suggestions for a better way?


Depends on the exact rules you need to implement.


If those are your actual requirements, in C++ it would take two simple containers:

#include <map>
#include <string>

std::map<std::string, std::string> mLinksBySynonym;        // synonym -> preferred
std::multimap<std::string, std::string> mLinksByPreferred; // preferred -> synonym

The first one is indexed by synonym, so every key is unique. Look up with mLinksBySynonym.find(); if the returned iterator is not end(), use ->first or ->second to get the synonym or the preferred name.
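A small sketch of that lookup, using std::string for both the synonym and the preferred name. preferredFor is a hypothetical helper name. Returning std::optional (C++17) is just one way to signal an unknown synonym.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// synonym -> preferred name (each synonym is a unique key)
using LinksBySynonym = std::map<std::string, std::string>;

// Hypothetical helper: returns the preferred name for `synonym`,
// or std::nullopt if the synonym is not in the map.
std::optional<std::string> preferredFor(const LinksBySynonym& links,
                                        const std::string& synonym) {
    auto it = links.find(synonym);
    if (it == links.end()) {
        return std::nullopt;  // find() returned end(): unknown synonym
    }
    return it->second;  // it->first is the synonym, it->second the preferred name
}
```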

The second one is indexed by preferred name, which will be duplicated across entries. Look up with mLinksByPreferred.equal_range(), which returns a pair of iterators; everything between them is a match.
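A matching sketch for the reverse lookup. synonymsOf is a hypothetical name, and std::string again stands in for the preferred and synonym types.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// preferred name -> synonym (the preferred name repeats once per synonym)
using LinksByPreferred = std::multimap<std::string, std::string>;

// Hypothetical helper: collects every synonym stored under `preferred`.
std::vector<std::string> synonymsOf(const LinksByPreferred& links,
                                    const std::string& preferred) {
    std::vector<std::string> out;
    // equal_range returns [first, second): every element in it has key == preferred
    auto range = links.equal_range(preferred);
    for (auto it = range.first; it != range.second; ++it) {
        out.push_back(it->second);
    }
    return out;
}
```

Since C++11, insert and emplace place a new element at the end of any run of equal keys in a multimap, so synonyms added one at a time come back in the order they were added.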

If space is an issue, or if reverse lookup by preferred name is rare, you could keep just the first map and do a much slower iteration over the entire dataset for the reverse lookup.
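A sketch of that map-only variant. The reverse lookup visits every entry, so each query costs O(n) instead of the multimap's O(log n) plus the number of matches. synonymsByScan is a hypothetical name.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Only one container: synonym -> preferred name.
using LinksBySynonym = std::map<std::string, std::string>;

// Reverse lookup without a second index: scan every entry and keep the
// synonyms whose preferred name matches.
std::vector<std::string> synonymsByScan(const LinksBySynonym& links,
                                        const std::string& preferred) {
    std::vector<std::string> out;
    for (const auto& [synonym, pref] : links) {  // C++17 structured bindings
        if (pref == preferred) {
            out.push_back(synonym);
        }
    }
    return out;
}
```

Because std::map iterates in key order, the results come out sorted by synonym. With plain std::string comparison, uppercase sorts before lowercase.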

Hmmm... not quite there. This seems fine for getting the preferred name from any synonym, or the list of synonyms from any preferred name. It doesn't address getting the full list of synonyms from any single member of the list.

For example, given the synonym list [A,b,c,d,e,f], with A being the preferred name, I want to get the entire list as well as the preferred name by using any single member of that list. syn_list['d'] would return [A,b,c,d,e,f] as well as the preferred name 'A'. syn_list['f'] would return the same thing.

Then run two queries: the first, on the map, to get the canonical form; the second, on the multimap using that canonical form, to get all the aliases.
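The two queries can be combined as follows. This sketch wraps the map and the multimap in one hypothetical SynonymTable class. It assumes the first name in each list is the preferred one and that every name belongs to exactly one list. Adding a name to a second list would leave a stale multimap entry.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical wrapper around the map + multimap pair.
class SynonymTable {
public:
    // Assumption: names.front() is the preferred name; names are unique across lists.
    void addList(const std::vector<std::string>& names) {
        assert(!names.empty());
        const std::string& pref = names.front();
        for (const std::string& name : names) {
            mLinksBySynonym[name] = pref;
            mLinksByPreferred.emplace(pref, name);
        }
    }

    // Query 1: any member -> canonical (preferred) name.
    std::optional<std::string> preferred(const std::string& name) const {
        auto it = mLinksBySynonym.find(name);
        if (it == mLinksBySynonym.end()) return std::nullopt;
        return it->second;
    }

    // Query 2: canonical name -> every alias, which gives the whole list from any member.
    std::vector<std::string> fullList(const std::string& name) const {
        std::vector<std::string> out;
        auto pref = preferred(name);
        if (!pref) return out;  // unknown name: empty list
        auto range = mLinksByPreferred.equal_range(*pref);
        for (auto it = range.first; it != range.second; ++it) {
            out.push_back(it->second);
        }
        return out;
    }

private:
    std::map<std::string, std::string> mLinksBySynonym;        // synonym -> preferred
    std::multimap<std::string, std::string> mLinksByPreferred; // preferred -> synonym
};
```

For the [A,b,c,d,e,f] example, fullList("d") and fullList("f") both return the whole list, and preferred("d") returns "A".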

Ah, I get it now. The map provides the canonical form, and the multimap returns the whole list/set. Seems obvious now that you point it out.

