MeshGearFox

Difficulty with STL algorithm next_permutation().



I'm trying to write a program that generates anagrams based on user input. The user provides a string of words/characters/numbers; the program strips out the spaces and non-letters, sorts the string, then runs next_permutation on it. This part works.

Next, the program compares the generated words with a dictionary file to see if they're valid words, and removes anything that's not entirely made of valid words. That also works.

The problem is the last part. I want to remove redundant permutations from the vector containing the validated anagram strings. I'm doing this by going through the vector with for loops, tokenizing the string at index x in the vector, sorting and then permuting the vector of tokens, converting the tokens back into a single string, and comparing that against the list. This works when I try to generate two-word anagram "sentences," but I noticed some odd behavior when trying to generate three-word anagram "sentences": next_permutation gives the same result each time. For instance, if I gave the input "toy boat" and had it anagramize that into three separate words, I'd get output like:

a boy tot
at boy to
bay to to
boy a tot
boy at to
boy to at

and so on, when I really just want "a boy tot", "bay to to", and "at boy to."

Anyway, in the problematic function, I had it print out the test string at every step, and it appeared to be generating the exact same ones each time. So if I took "to" out of the dictionary and only got variations of "a boy tot" as output, the next_permutation call would ALWAYS give me "a boy tot" and never "tot a boy" or the like, which is what I want to test for. (Additionally I tried leaving the tokens unsorted. It actually gave me some permutations but not enough to catch everything).

Any ideas?

Full source: http://pastebin.com/m30ded777

Segment in question:
void remove_redundancy(vector<string> &permutations) {
    string test = "";
    vector<string> tokens;

    vector<string>::iterator it;

    // Add an empty entry to the end of
    // the vector. This way, if the last
    // entry in the vector would otherwise
    // be marked as redundant and deleted,
    // the iterator will safely point at
    // this instead of going out of bounds.
    permutations.push_back(" ");

    // Remove other redundant strings from vector.
    for(int x = 0; x < permutations.size(); x++) {
        // Tokenize a given sentence.
        tokenize(tokens, permutations[x]);
        // Sort tokens.
        sort(tokens.begin(), tokens.end());
        // Permutate the tokens.
        while (next_permutation(tokens.begin(), tokens.end())) {
            // Re-convert the tokens into a string.
            for(int y = 0; y < tokens.size(); y++) {
                // Put token onto the string.
                test = test + tokens[y] + " ";
            }
            // Remove trailing space.
            test = test.substr(0, test.length() - 1);

            // Compare string to strings in
            // the permutation vector.
            for(it = permutations.begin(); it != permutations.end(); it++) {
                // Test if strings match.
                if(test == *it) {
                    // Strings match. Remove repeat.
                    permutations.erase(it);
                }
            }

            // Clear out test string and tokens.
            test = "";
            tokens.clear();
        }
    }

    // Remove the empty entry
    // from earlier.
    permutations.pop_back();
}

Came up with a solution that, while still sort of chunky, works in a more coherent manner.


// This function removes redundant permutations.
// It converts a string in the permutation list into a
// vector of tokens. It then sorts these tokens.
// Then it converts the tokens back into a string,
// checks whether this string is already in the results
// vector, and if it isn't, it adds the string to the
// results vector.
// Sorting, here, basically removes all variation in the
// permutations, making redundancy checking easier.
vector<string> remove_redundancy(const vector<string> &permutations) {
    string test = "";
    vector<string> tokens;
    vector<string> results;

    vector<string>::iterator it;

    // Remove redundant strings from the vector.
    for(int x = 0; x < permutations.size(); x++) {
        // Tokenize a given sentence.
        tokenize(tokens, permutations[x]);

        // Sort tokens.
        sort(tokens.begin(), tokens.end());

        // Re-convert the tokens into a string.
        for(int y = 0; y < tokens.size(); y++) {
            // Put token onto the string.
            test = test + tokens[y] + " ";
        }

        // Remove trailing space.
        test = test.substr(0, test.length() - 1);

        // See if the test string is already in results.
        it = find(results.begin(), results.end(), test);

        // String not already in vector. Add it.
        if(it == results.end()) {
            results.push_back(test);
        }

        // Clear out test string and tokens.
        test = "";
        tokens.clear();
    }
    return results;
}
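
For reference, here is roughly how this might be driven end to end. The tokenize() below is a hypothetical whitespace-splitting stand-in for the real one in the pastebin, and the driver just feeds in two orderings of the same words plus one distinct sentence:

#include <algorithm>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

using namespace std;

// Hypothetical stand-in for tokenize(): split a sentence on whitespace.
void tokenize(vector<string> &tokens, const string &sentence) {
    istringstream stream(sentence);
    string word;
    while (stream >> word) {
        tokens.push_back(word);
    }
}

// remove_redundancy() as defined above goes here.

int main() {
    vector<string> permutations;
    permutations.push_back("a boy tot");
    permutations.push_back("tot a boy");   // same words, different order
    permutations.push_back("at boy to");

    vector<string> results = remove_redundancy(permutations);

    // Prints two canonical (word-sorted) sentences, not three.
    for (int i = 0; i < results.size(); i++) {
        cout << results[i] << endl;
    }
    return 0;
}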

You could also split the anagram into words and push them into a std::set. Two std::sets containing the same words should compare equal, so you can use that to test whether a new anagram should be added to the list or rejected.
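
For illustration, a minimal sketch of that idea; split_words() here is a hypothetical helper that just splits on whitespace:

#include <iostream>
#include <set>
#include <sstream>
#include <string>

using namespace std;

// Hypothetical helper: break an anagram sentence into a set of its words.
set<string> split_words(const string &sentence) {
    set<string> words;
    istringstream stream(sentence);
    string word;
    while (stream >> word) {
        words.insert(word);
    }
    return words;
}

int main() {
    // Same words in a different order: the sets compare equal,
    // so the second sentence would be rejected as a duplicate.
    set<string> a = split_words("a boy tot");
    set<string> b = split_words("tot a boy");
    cout << (a == b ? "duplicate" : "new anagram") << endl;
    return 0;
}

One thing to watch: a std::set collapses repeated words (e.g. the two "to"s in "bay to to"), so a std::multiset is the safer choice if a sentence can contain the same word twice.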

Quote:
(Additionally I tried leaving the tokens unsorted. It actually gave me some permutations but not enough to catch everything).


Because the next_permutation loop will run from the current ordering until the "last" lexicographical ordering, and never get to the "previous" ones.
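
A quick standalone demo (not part of the anagram code) shows the effect:

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

using namespace std;

int main() {
    // Start from a mid-sequence ordering rather than the sorted one.
    vector<string> tokens;
    tokens.push_back("tot");
    tokens.push_back("a");
    tokens.push_back("boy");

    // Only "tot a boy" and "tot boy a" are printed; the orderings that
    // come before "tot a boy" lexicographically are never reached.
    do {
        cout << tokens[0] << " " << tokens[1] << " " << tokens[2] << endl;
    } while (next_permutation(tokens.begin(), tokens.end()));
    return 0;
}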

Anyway. The easiest way to remove redundant permutations is not to put them in. Each time you successfully cut up a permutation of the original string into words, sort the words there, assemble them into a string and then put that string into a set rather than a vector. This will be much faster, because (a) you don't have to generate all permutations of the current string in order to check for a duplicate (because any duplicate would have the words in a known order due to the sorting) and (b) you don't have to look at all other strings to find a duplicate (because the set is sorted internally in a tree structure).
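
A rough sketch of that, assuming the words of a successfully cut-up permutation are already in a vector (add_unique() is just an illustrative name):

#include <algorithm>
#include <iostream>
#include <set>
#include <string>
#include <vector>

using namespace std;

// Sort the words, join them into one canonical string, and let
// std::set::insert() reject duplicates. Returns true if the
// sentence was new.
bool add_unique(set<string> &seen, vector<string> words) {
    sort(words.begin(), words.end());

    string canonical;
    for (size_t i = 0; i < words.size(); i++) {
        if (i > 0) canonical += " ";
        canonical += words[i];
    }

    // insert() returns a pair; .second is false when the
    // string was already in the set.
    return seen.insert(canonical).second;
}

int main() {
    set<string> seen;

    vector<string> first;
    first.push_back("tot");
    first.push_back("a");
    first.push_back("boy");

    vector<string> second;
    second.push_back("a");
    second.push_back("boy");
    second.push_back("tot");

    cout << add_unique(seen, first) << endl;   // 1: new sentence
    cout << add_unique(seen, second) << endl;  // 0: same word set, rejected
    return 0;
}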
