Sign in to follow this  
  • entries
    90
  • comments
    66
  • views
    72444

Avoiding Repetition

Sign in to follow this  
Tesseract

86 views

In my game engine I am trying to have most, if not all, of the content created procedurally. This means lots of reliance on pseudo-randomness or seeded random number generators.

This also means that, when creating a list of names, avoiding duplicates can be kind of tricky.

My game has fifty towns with names based on the names of towns in Ukraine near the shore of the Black Sea. I broke the town names into prefix, root, and suffix pieces, then explored ways to re-combine them to make an arbitrary number of unique names.

Here are the prefixes:

Novo_
Dobro_
Velkyo_
Ljubo_
Mala_
Kryva_
Stari_



Here are the roots:

Valentyn
Soskar
Cebryk
Kalyn
Blakoj
Stepan
Dmytr
Zvir
Lukash
Kater
Levk
Taras
Oleksandr
Lozuv
Vytjaz
Percun
Pavl
Voron
Yelzav
Zachar
Janys
Oleks
Mychajl
Pozn
Vojnyc
Most
Zovtn
Borys
Strjuk
Lukan
Natas
Zereb
Lebed
Fedor
Scasl



Here are the suffixes:

ivka
ynka
anka
evo
ove
ovate
ajka



7 Prefixes, 35 roots, and 7 suffixes. And therein lay the second of my problems.

The first problem was as follows: I was using the initially generated random numbers for the towns as seeds for their names. I took the seed number and did a modulo 50 (%50) to determine the new seed for the names. This worked well enough, but I came up with many duplicates, because more than one of the random numbers would have the same result of the modulo (e.g. 768%50==18, 29418%50==18). Therefore, many duplicates. Since the initial numbers were for all intents and purposes random, I was left with going through the array of town names every time a new town was generated and making sure duplicates were re-calculated. This bothered me for two reasons:

1. If I want to use the engine for a large (e.g. 10000 towns) world, I would have to go through a rapidly growing array 9999 times to make sure avoid duplication
2. A town shouldn't have its own name constrained based on what another town is called. In other words, the towns should not be aware of each other.

So using seeded numbers was out the door. However, each town also has an index number (0 - n) which is unique for each town. This was a little more promising. All of my mental models seemed to work, so I whipped up some code and looked at the result.

Even more duplication! In a list of fifty towns, almost 20% of them had the same names. Unacceptable.

So I went back through my model and discovered something: The theory worked perfectly! It was the data set that needed a little tweaking.

Look at the three parts of the names: three lists, 7, 35, and 7 words long. Among those numbers, the Least Common Multiple is 35, and the Lowest Common Denominator is 7. Using these numbers I got a repeat period of 35, which meant 15 duplicated names out of a list of 50 towns. Almost 20%!

As a quick experiment, I commented out one of the root words, which left me with a number set of 7,34,7, which has a LCD of 1 and a LCM of 238. With that one change I suddenly had nearly eight times the number of unique usable names available, and they were all predictable!

I don't know if this is the best way to go about creating lists of names like this, because it requires constraining the data set to match a numeric quirk of the generator; it is, however, easy to debug and has a very low impact on the processor, as compared to going through a list of a thousand words a thousand times.

Nest: Putting this knowledge to use in the creation of individual town content.
Sign in to follow this  


1 Comment


Recommended Comments

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now