Avoiding Repetition

Published December 27, 2006
Advertisement
In my game engine I am trying to have most, if not all, of the content created procedurally. This means lots of reliance on pseudo-randomness or seeded random number generators.

This also means that, when creating a list of names, avoiding duplicates can be kind of tricky.

My game has fifty towns with names based on the names of towns in Ukraine near the shore of the Black Sea. I broke the town names into prefix, root, and suffix pieces, then explored ways to re-combine them to make an arbitrary number of unique names.

Here are the prefixes:
Novo_Dobro_Velkyo_Ljubo_Mala_Kryva_Stari_


Here are the roots:
ValentynSoskarCebrykKalynBlakojStepanDmytrZvirLukashKaterLevkTarasOleksandrLozuvVytjazPercunPavlVoronYelzavZacharJanysOleksMychajlPoznVojnycMostZovtnBorysStrjukLukanNatasZerebLebedFedorScasl


Here are the suffixes:
ivkaynkaankaevooveovateajka


7 Prefixes, 35 roots, and 7 suffixes. And therein lay the second of my problems.

The first problem was as follows: I was using the initially generated random numbers for the towns as seeds for their names. I took the seed number and did a modulo 50 (%50) to determine the new seed for the names. This worked well enough, but I came up with many duplicates, because more than one of the random numbers would have the same result of the modulo (e.g. 768%50==18, 29418%50==18). Therefore, many duplicates. Since the initial numbers were for all intents and purposes random, I was left with going through the array of town names every time a new town was generated and making sure duplicates were re-calculated. This bothered me for two reasons:

1. If I want to use the engine for a large (e.g. 10000 towns) world, I would have to go through a rapidly growing array 9999 times to make sure avoid duplication
2. A town shouldn't have its own name constrained based on what another town is called. In other words, the towns should not be aware of each other.

So using seeded numbers was out the door. However, each town also has an index number (0 - n) which is unique for each town. This was a little more promising. All of my mental models seemed to work, so I whipped up some code and looked at the result.

Even more duplication! In a list of fifty towns, almost 20% of them had the same names. Unacceptable.

So I went back through my model and discovered something: The theory worked perfectly! It was the data set that needed a little tweaking.

Look at the three parts of the names: three lists, 7, 35, and 7 words long. Among those numbers, the Least Common Multiple is 35, and the Lowest Common Denominator is 7. Using these numbers I got a repeat period of 35, which meant 15 duplicated names out of a list of 50 towns. Almost 20%!

As a quick experiment, I commented out one of the root words, which left me with a number set of 7,34,7, which has a LCD of 1 and a LCM of 238. With that one change I suddenly had nearly eight times the number of unique usable names available, and they were all predictable!

I don't know if this is the best way to go about creating lists of names like this, because it requires constraining the data set to match a numeric quirk of the generator; it is, however, easy to debug and has a very low impact on the processor, as compared to going through a list of a thousand words a thousand times.

Nest: Putting this knowledge to use in the creation of individual town content.
0 likes 1 comments

Comments

Tesseract
[EDIT] Oops. Been too long since I was in a math class. I meant to say GREATEST Common Denominator.
December 28, 2006 11:35 AM
You must log in to join the conversation.
Don't have a GameDev.net account? Sign up!
Profile
Author
Advertisement
Advertisement