random permutation iterator

Started by
8 comments, last by yahastu 16 years, 3 months ago
I want to efficiently iterate all the possible permutations of a string in a random order. I already wrote a function to iterate the permutations in order:

void iterate_permutations(std::string pre, std::string word, void (*func)(std::string) )
{
	using namespace std;

	if (word.size() == 1)
		func(pre+word);
	else
	{
		for(unsigned int i=0; i<word.size(); ++i)
		{
			char c = word;
			string n_word = word;
			n_word.erase( n_word.begin() + i );
			iterate_permutations(pre + c, n_word, func);
		}
	}
}


I already tried making 'i' loop in random order, which is slightly more random, but but not really because of the other half of the word. ie, the following does not solve the problem:

void iterate_permutations_random(std::string pre, std::string word, void (*func)(std::string) )
{
	using namespace std;

	if (word.size() == 1)
		func(pre+word);
	else
	{
		//generate random order to traverse in
		int *order = new int[word.size()];
		for(int i=0; i<(int)word.size(); ++i)
			order = i;
		for(int i=0; i<(int)word.size(); ++i)
		{
			int j = randu(1, i);
			int order_i = order;
			order = order[j];
			order[j] = order_i;
		}

		for(unsigned int j=0; j<word.size(); ++j)
		{
			int i = order[j];

			char c = word;
			string n_word = word;
			n_word.erase( n_word.begin() + i );
			iterate_permutations_random(pre + c, n_word, func);
		}

		delete order;
	}
}


EDIT: Another condition is that the algorithm still work when the total number of permutations is greater than what can be stored in an integer.. [Edited by - yahastu on January 14, 2008 9:58:59 PM]
Advertisement
What is your motivation for this?

All permutations of P(n,n) elements is n! results. "Factorial growth" and "efficient" generally don't describe the same algorithm.


Perhaps if we knew your goal, we could come up with something that is a little more computationally-friendly.
Quote:Original post by frob
What is your motivation for this?

All permutations of P(n,n) elements is n! results. "Factorial growth" and "efficient" generally don't describe the same algorithm.

Perhaps if we knew your goal, we could come up with something that is a little more computationally-friendly.


Yeah, it is indeed a large number which is why it is important that the algorithm be efficient (in terms of not adding to the inherent time complexity and having a relatively small constant multiplier).

As for context, I'm trying to brute force attack an encryption scheme (not a serious one). The more I think about it, the more I think that it is probably impossible to write such an algorithm -- of course it doesn't hurt to see what others think. However, I'm thinking that probably the best possible alternative would be to see if the factorial is computable; if it is computable, then iterate permutations in sorted order...otherwise resort to monte-carlo sampling and use a hash table to avoid testing the same permutation twice.
If you're just trying to brute force something by generating all permutations and don't want to be gamed by someone deliberately choosing the last permutation, just start with a random permutation and permute from there in regular order.
In this case, since I am dealing with particularly BAD encryption, I may be able to detect if the key I test is similar to the true key...so that's why I want to go in random order; it maximizes my chances of getting CLOSE to the key by distributing the search across key space.
What do you mean by "close" in the permutation space? How do you measure closeness?
By close in permutation space (of strings) I mean that the two strings are similar. Anyway, aside from minor optimizations, I think this is as good as it gets for what I was asking:

size_t factorial(size_t num){	if (num==1) return 1;	size_t next = factorial(num-1)*num;	if (next < num ) return std::numeric_limits<int>::max();	return next;}


size_t num_permutations(std::string str){	using namespace std;	sort(str.begin(), str.end() );	str.erase( unique(str.begin(), str.end() ), str.end() );	return factorial(str.size());}


std::string random_permutation(std::string word){	std::string pre;	while( word.size() > 0 )	{		size_t j = rand() % word.size();		pre += word[j];		word.erase( word.begin()+j );	}	return pre;}


size_t nump = num_permutations(word);	HashTable<std::string, bool> used_key;	for(size_t i=0; i<nump; ++i){		std::string permutation = random_permutation(word);		bool val;		while( used_key.find(permutation, val) )			permutation = random_permutation(word);		used_key.insert( permutation, true );		cout << i << ": "<< permutation << endl;	}

I don't know about the ins and outs of what you're doing exactly, but you do know about std::random_shuffle() and std::next_permutation(), right?
Quote:Original post by the_edd
I don't know about the ins and outs of what you're doing exactly, but you do know about std::random_shuffle() and std::next_permutation(), right?


Quoted for emphasis. Also, that calculation of num_permutations is totally bogus. The hash_map usage is wrong, too: .find() will return an iterator, which you should be comparing to .end(), and thus there is no use for the map values - use a set instead. Also, if the number of permutations doesn't fit in an int, the map isn't going to fit in memory, so you're screwed anyway. If you don't want collisions, you're going to have to step through linearly. Sorry. (Consider that just trying the key multiple times might be as fast or faster, anyway, than trying to check for duplication.)
I wasn't sure how to calculate the number of permutations when certain characters are repeated.

I've since changed my approach, I am now looking at ways to sample the permutation space statistically for solutions that are more likely to occur, rather than trying all possibilities. If it grows so large that I run out of memory then a linear search would probably never finish either.

So, this topic can be considered solved as far as I'm concerned.

As for the hash map, it's usage is correct. it is my own container, and I use it because it's better than stdext::hash_map or std::map.

Feel free to criticize it though!


#pragma once

#include <vector>
#include <string>
#include <cstring>
#include "CoreMath.h"

typedef unsigned int UINT;

template<typename T>
struct is_equal
{
bool operator()(const T &a, const T &b) const
{
return (a == b);
}
};

template<typename T>
struct hash
{
// Fallback generic, uses a random number generator with int cast as seed
UINT operator()(const T &key) const
{
long seed = static_cast<T>(key);
return static_cast<UINT>( rand(seed)*10000.0f );
}
};

//specialize hash func for string
template<>
UINT hash<std::string>::operator()(const std::string &str) const
{
//djb2 hash
UINT hash = 5381;

const char *c = str.c_str();
for(UINT i=0; i<str.size(); ++i)
hash = ((hash << 5) + hash) + *c++;

return hash;
}


template<typename Key = std::string, typename Val = UINT, UINT Alloc = 300, typename HashFcn = hash<Key>, typename EqualKey = is_equal<Key> >
class HashTable
{
struct HashEntry
{
Key myKey;
Val myVal;
HashEntry *next;
EqualKey myEqual;

HashEntry(const Key &_myKey, const Val &_myVal)
:next(NULL), myKey(_myKey), myVal(_myVal){}

//inserts HashEntry into this node's chain
void insert(const Key &_key, const Val &_val)
{
if ( myEqual(myKey, _key) )
myVal = _val;
else {
if (next == NULL) next = new HashEntry(_key, _val);
else next->insert(_key, _val);
}
}

void remove(const Key &_key)
{
if (next == NULL) return;
else next->remove(_key);
}

//finds '_val', starts looking at itself then checks chain
bool find(const Key &_key, Val &_val) const
{
if ( myEqual(myKey, _key) ){
_val = myVal;
return true;
} else {
if (next == NULL) return false;
else return next->find(_key, _val);
}
}

//deletes all nodes in the chain (does not delete self)
void deleteChain()
{
if (next != NULL){
next->deleteChain();
delete next;
}
}
};

HashEntry *data[Alloc];
HashFcn myHash;
EqualKey myEqual;

public:
HashTable()
{
memset(data, NULL, Alloc*sizeof(HashEntry*) );
}

~HashTable()
{
for(UINT i=0; i<Alloc; ++i)
{
if (data != NULL)
{
data->deleteChain();
delete data;
}
}
}

void insert(const Key &_key, const Val &_val)
{
UINT code = myHash(_key) % Alloc;
HashEntry *&entry = data;

if (entry == NULL) entry = new HashEntry(_key, _val);
else entry->insert(_key, _val);
}

void remove(const Key &_key)
{
UINT code = myHash(_key) % Alloc;
HashEntry *&entry = data;

if (entry == NULL) return;
else if ( myEqual(entry->myKey, _key) ) {
entry->deleteChain();
delete entry;
entry = NULL;
} else entry->remove(_key);
}

bool find(const Key &_key, Val &_val) const
{
UINT code = myHash(_key) % Alloc;
HashEntry *entry = data;

if (entry == NULL) return false;
else return entry->find(_key, _val);
}

//ITERATOR INTERFACE

//returned by iterator
struct Pair
{
Key key;
Val val;
};

class iterator : public std::iterator<std::forward_iterator_tag, Val >
{
UINT i;
HashEntry *const (*data);
HashEntry *entry;
EqualKey myEqual;
Pair myPair;

void advance()
{
//end iterator cant advance
if (i < 0 || i >= Alloc) return;

//try to advance normally
if (entry != NULL) entry = entry->next;

//if that failed, advance to first non-null bucket
if (entry == NULL)
{
//if still null, look at next buckets..
while(entry == NULL && ++i < Alloc)
entry = data;
}
}

public:
iterator(HashEntry *const _data[Alloc], UINT _i) : i(_i), data(_data), entry(NULL)
{
entry = data[0];
advance();
}

iterator& operator=(const iterator& other) {
i = other.i;
data = other.data;
entry = other.entry;
return(*this);
}

bool operator==(const iterator& other) const{
return (i == other.i && data == other.data && entry == other.entry);
}

bool operator!=(const iterator& other) const{
return !(operator==(other));
}

iterator& operator++() { //prefix
if (entry != NULL)
advance();
return (*this);
}

Pair *operator*()
{
if (entry != NULL){
myPair.key = entry->myKey;
myPair.val = entry->myVal;
return &myPair;
} else return NULL;
}
};

//get iterators
iterator begin() const
{
return iterator(data, 0);
}

iterator end() const
{
return iterator(data, Alloc);
}
};

This topic is closed to new replies.

Advertisement