Jump to content
  • Advertisement
Sign in to follow this  
SonicD007

Most efficient way to find out if two words have the same letters within them

This topic is 2788 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

The topic title might sound a little confusing, but basically I want to unscramble one word by comparing it to another word. Right now I'm checking if both words have the same length then try to see if the scrambled word contains all the letters the normal word contains. At least that's what I'm trying to do. I'm not sure how I can tell if scrambled word contains the same letters as the original. I've looked online for a while now and a solution that seems promising (but also doesn't seem too good if there are special characters used i.e. # ¿ ♠) is to assign each letter a prime number and multiply these numbers together to get a unique number for that specific word.
example:
a = 2 b = 3 c = 5 d = 7 ect...
so the "word" abc would equal 30 where as the word cda would equal 70
This solution makes sense but is this the most efficient way of solving this problem? Is there any other way to do this?

Another way might be using the ASCII character values but...well..I'm not entirely sure how I would use them to get the results I want.

Thanks.

Share this post


Link to post
Share on other sites
Advertisement
If I understand what you're asking, you can make a copy of both strings, sort both of the copies and see if the copies are equal.

Share this post


Link to post
Share on other sites
I'm not completely sure but here is an idea,

Let A be some string of characters
Let B be some string of characters

To check if A and B has the same character you can do the following,

  • radixSort(A) # using the ascii values of each letter in A -- O(n)

  • radixSort(B) # using the ascii values of each letter in B -- O(n)

  • compare each letter by letter in A and B



Not sure if thats what your trying to achieve.

Share this post


Link to post
Share on other sites
You could use something similar to the counting sort algorithm. Count how many times each ASCII character appears in each string - if the counts are equal, then both strings contain the same characters.
e.g.
//assuming your "words" are less than 256 characters, we can use single-byte counters
typedef unsigned char number;

//assuming basic ASCII strings, we only have 256 different characters to consider
const int numCharacters = 256;
number word1Count[numCharacters];
number word2Count[numCharacters];

int sizeInBytes = sizeof(number)*numCharacters;
memset( word1Count, 0, sizeInBytes );
memset( word2Count, 0, sizeInBytes );

for( const char* str = word1; *str; ++str )
++word1Count[*str];

for( const char* str = word2; *str; ++str )
++word2Count[*str];

bool bothCountsAreEqual = (memcmp(word1Count, word2Count, sizeInBytes)==0);

Share this post


Link to post
Share on other sites
The problem with the prime number approach is you have a limited amount of characters you can check per number of bits available to store the multiplied primes. Assuming a 32-bit unsigned integer and 26 letters, 'z' would equal '101'. The string "zzzzzz" won't fit in 32-bits. In reality you only get like 5 characters that way. Even going up to 128 bits you only get up to 20 characters.

I'd go with SiCrane's or Hodgman's ideas.

Share this post


Link to post
Share on other sites
Quote:
Original post by Hodgman
You could use something similar to the counting sort algorithm. Count how many times each ASCII character appears in each string - if the counts are equal, then both strings contain the same characters.
e.g.*** Source Snippet Removed ***


this should be pretty efficient.

You can check first if the lengths of the strings are equal and if they are, you can use the same loop to count the occurrences.

Share this post


Link to post
Share on other sites
@nobodynews: Good catch there, I didn't even notice that error.
@Hodgman: I like that idea. I'm gonna go with your approach on this problem.

Thank you for all the replies. Learn something new everyday :)

Share this post


Link to post
Share on other sites
Here's an idea: Quickcomp. (Quickeq? I can't think of a good name.) Like quicksort, but you sort the two arrays in parallel, and early-out if the partition sizes differ.


def partition(A):
if len(A) == 1: return
pivot = A[0]
cursor = 0
for i in range(len(A)):
if A <= pivot:
A[cursor],A = A,A[cursor]
cursor += 1
return cursor

def quickcomp(A, B):
if len(A) != len(B): return false
if len(A) <= 1:
if len(A) == 0: return true
else: return (A[0] == B[0])

a_pivloc = partition(A)
b_pivloc = partition(B)

if not quickcomp(A[:a_pivloc-1], B[:a_pivloc-1]): return false
if not quickcomp(A[a_pivloc+1:], B[a_pivloc+1:]): return false

return true


Without thinking about it too hard, it seems like this should perform in average time O(n) if the sets differ, and O(n log n) if they are the same. My intuition is that that's the best you can do without non-comparison-based stuff.

Share this post


Link to post
Share on other sites
If your goal is to find all possible words based on scrambled word by searching some sort of database of words, presented solution taken as is could be to slow.
In such a case I would recommend preprocessing your words database first in the following manner:

1. You take each word from the database, sort it first, and next copy it to other database, remembering original word and sorted one. This new database should be sorted by sorted names (sounds strange ;))
For example, if you have words: 'cat', 'dog', 'god', 'good', 'ape', that would be:
act->cat
aep->ape
dog->dog, god
doog->good

It is better to perform this step previously and next work on preprocessed database, as when comparing scrambled name you will not have to sort each word from database during comparison, so it will be much faster.
If you fear it will take too long, consider that it has to be done only once and if you won't do this such way, you will have to perform this task during each search.

2. When searching for true word from the scrambled one, this task will be just simple searching word from the database, what could be obtained either by some sort of logarithmic search, or by using hash-arrays, or by constructing tree from the database.

Good luck
Pomnico

Edit:
By database I don't mean SQL necessary.

Share this post


Link to post
Share on other sites
Quote:
Original post by Sneftel
Here's an idea: Quickcomp. (Quickeq? I can't think of a good name.) Like quicksort, but you sort the two arrays in parallel, and early-out if the partition sizes differ.


def partition(A):
if len(A) == 1: return
pivot = A[0]
cursor = 0
for i in range(len(A)):
if A <= pivot:
A[cursor],A = A,A[cursor]
cursor += 1
return cursor

def quickcomp(A, B):
if len(A) != len(B): return false
if len(A) <= 1:
if len(A) == 0: return true
else: return (A[0] == B[0])

a_pivloc = partition(A)
b_pivloc = partition(B)

if not quickcomp(A[:a_pivloc-1], B[:a_pivloc-1]): return false
if not quickcomp(A[a_pivloc+1:], B[a_pivloc+1:]): return false

return true


Without thinking about it too hard, it seems like this should perform in average time O(n) if the sets differ, and O(n log n) if they are the same. My intuition is that that's the best you can do without non-comparison-based stuff.
That's a cool algorithm idea. I decided to implement it myself and one problem I ran into is that you need to ensure that the partitioning step picks the same item for both a and b. If it picks a different pivot then a different number of items will be less than it.

SiCrane mentioned my idea.

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

We are the game development community.

Whether you are an indie, hobbyist, AAA developer, or just trying to learn, GameDev.net is the place for you to learn, share, and connect with the games industry. Learn more About Us or sign up!

Sign me up!