There have been a few attempts in this domain. For example, Elite/Frontier was (as far as I understand) using a set of syllable tables and randomly appending entries together. Well, it was a bit more complex than that, but not much.
I wanted to test an idea that I'd had in mind for quite a while: generating realistic-sounding names.
For that, I took a list of existing names: the minor planets list. It contains more than 10,000 names.
The algorithm analyzes this list to build a syllable table. The first step is called syllabification: each word is split into syllables according to a few rules. As I haven't been able to find an existing algorithm to do that, I made my own. It uses a few simple rules based on the occurrences of consonants and vowels, written c and v respectively. x means "anything", and the dash marks the boundary between two syllables. The rules I used are:
xvcvx => xvcv-x
xcvccx => xcv-ccx
xcvvcx => xcv-vcx
xvccvx => xvc-cvx
xcvcvx => xcv-cvx
xvvcvx => xvv-cvx
xvcccx => xvc-ccx
Application to a few examples:
Mikula => Mi-ku-la
Lucidor => Lu-ci-dor
Friederike => Fri-ede-ri-ke
Flammario => Fla-mma-rio
Minanomachi => Mi-na-no-ma-chi
Of course, it doesn't work perfectly and some words are not split at the correct position, so I'll have to improve the rules a bit. This is typically the kind of thing that could take weeks to fine-tune, and I do not want to spend my time on that now.
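To make the rules above a bit more concrete, here is a minimal Python sketch of one possible reading of them. Several things are assumptions and not taken from the original code: the vowel set (a, e, i, o, u, y), the left-to-right scan that applies the first rule matching at the earliest position, the requirement that the trailing x of the first rule is non-empty, and all function names. Under that reading, the five example words come out split as listed.

```python
VOWELS = set("aeiouy")  # assumption: 'y' is treated as a vowel

# (pattern, split offset): "." stands for "any character".
# Each entry mirrors one rule, e.g. xcvccx => xcv-ccx becomes ("cvcc", 2).
RULES = [
    ("vcv.", 3),  # xvcvx  => xvcv-x (trailing x assumed non-empty)
    ("cvcc", 2),  # xcvccx => xcv-ccx
    ("cvvc", 2),  # xcvvcx => xcv-vcx
    ("vccv", 2),  # xvccvx => xvc-cvx
    ("cvcv", 2),  # xcvcvx => xcv-cvx
    ("vvcv", 2),  # xvvcvx => xvv-cvx
    ("vccc", 2),  # xvcccx => xvc-ccx
]


def cv_shape(word):
    """Map each letter to 'v' (vowel) or 'c' (anything else)."""
    return "".join("v" if ch in VOWELS else "c" for ch in word.lower())


def matches(pattern, window):
    """True if window has the pattern's length and agrees on every non-'.' slot."""
    return len(window) == len(pattern) and all(
        p in (".", w) for p, w in zip(pattern, window)
    )


def find_split(shape):
    """Return the index of the first split point, or None if no rule applies."""
    for start in range(len(shape)):
        for pattern, offset in RULES:
            if matches(pattern, shape[start:start + len(pattern)]):
                return start + offset
    return None


def syllabify(word):
    """Cut syllables off the front of the word until no rule matches."""
    syllables = []
    rest = word
    while True:
        cut = find_split(cv_shape(rest))
        if cut is None:
            syllables.append(rest)
            return syllables
        syllables.append(rest[:cut])
        rest = rest[cut:]


for name in ["Mikula", "Lucidor", "Friederike", "Flammario", "Minanomachi"]:
    print(name, "=>", "-".join(syllabify(name)))
```

The leading x of each rule is handled by the scan itself: whatever precedes the match (the "Fr" of Friederike, for instance) simply stays attached to the first syllable.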
Weird syllables are discarded: those with a single character, with more than 3 characters, or without any vowel.
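As a small illustration, the filter can be written as a single predicate. This is only a sketch: the name is_usable and the vowel set (with 'y' counted as a vowel) are assumptions, and what happens to the rest of a word once one of its syllables is rejected is not specified here.

```python
VOWELS = set("aeiouy")  # assumption: 'y' is treated as a vowel


def is_usable(syllable):
    """Keep only syllables of 2 or 3 characters that contain a vowel."""
    has_vowel = any(ch in VOWELS for ch in syllable.lower())
    return 2 <= len(syllable) <= 3 and has_vowel


candidates = ["Mi", "a", "mma", "tsch", "mm", "chi"]
print([s for s in candidates if is_usable(s)])
```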
Once that step is done, a table is built. It stores:
- for each syllable, the probability that it starts or ends a word;
- for each syllable, a table of the syllables that can follow it, each with its own "following" probability.
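The table described above could be built along these lines. It is a sketch under assumptions: the input is a list of words already split into syllables, the start probability is taken over all words, the end probability is the fraction of a syllable's occurrences that close a word, and build_tables and normalize are hypothetical names.

```python
from collections import Counter, defaultdict


def normalize(counts):
    """Turn raw counts into probabilities that sum to 1."""
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


def build_tables(split_words):
    """Return (start_prob, end_prob, follow_prob) for a list of syllable lists."""
    starts, ends, seen = Counter(), Counter(), Counter()
    followers = defaultdict(Counter)
    for syllables in split_words:
        if not syllables:
            continue
        starts[syllables[0]] += 1
        ends[syllables[-1]] += 1
        seen.update(syllables)
        # Count every adjacent pair (current syllable, next syllable).
        for current, following in zip(syllables, syllables[1:]):
            followers[current][following] += 1
    start_prob = normalize(starts)
    # Fraction of a syllable's occurrences that are word-final.
    end_prob = {s: ends[s] / seen[s] for s in seen}
    follow_prob = {s: normalize(c) for s, c in followers.items()}
    return start_prob, end_prob, follow_prob


words = [["mi", "ku", "la"], ["mi", "na"], ["lu", "na"]]
start_prob, end_prob, follow_prob = build_tables(words)
print(start_prob)
print(end_prob)
print(follow_prob)
```

With this layout, a pair of syllables that never appears together in the source list simply has no entry, which is what makes some combinations impossible at generation time.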
The name generator is pretty simple:
- it chooses a first syllable pseudo-randomly given a seed, according to the starting-syllable probabilities;
- it chooses each next syllable given the previous syllable and its "following" probabilities;
- once the word is generated, if the last syllable is not a terminator, it discards the whole name and retries.
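The three steps above can be sketched as follows, using a tiny hand-made table instead of the real one. How the length of a name is decided is not described above, so the random syllable count (min_syllables, max_syllables) is an assumption, as are the retry limit and the names generate_name and pick.

```python
import random

# Tiny hand-made tables, for illustration only.
START_PROB = {"mi": 0.5, "lu": 0.5}
FOLLOW_PROB = {
    "mi": {"ku": 0.5, "na": 0.5},
    "ku": {"la": 1.0},
    "lu": {"na": 1.0},
}
TERMINATORS = {"la", "na"}  # syllables seen at the end of a word


def pick(rng, probabilities):
    """Draw one key from a {syllable: probability} table."""
    syllables = list(probabilities)
    weights = list(probabilities.values())
    return rng.choices(syllables, weights=weights, k=1)[0]


def generate_name(seed, start_prob, follow_prob, terminators,
                  min_syllables=2, max_syllables=4, max_tries=1000):
    """Generate one name; the same seed always yields the same name."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        length = rng.randint(min_syllables, max_syllables)  # assumed length rule
        name = [pick(rng, start_prob)]
        # Extend the name until it is long enough or has no known follower.
        while len(name) < length and follow_prob.get(name[-1]):
            name.append(pick(rng, follow_prob[name[-1]]))
        if name[-1] in terminators:
            return "".join(name)
        # Otherwise discard the whole name and retry.
    return None


for seed in range(5):
    print(seed, generate_name(seed, START_PROB, FOLLOW_PROB, TERMINATORS))
```

Seeding a private random.Random instance keeps each name reproducible from its seed alone, independently of any other random draws in the program.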
As a result, some syllables have a high probability of following each other, and some combinations are not possible at all, so strange words should mostly be avoided.
Here are a few examples of randomly generated names:
zizzarco, griesha, pomoko, butsuma, tsunelle, koukama, korolunon, josondivaik, eryanda, rochenqian, lanagy, almamelnius ...
It's not completely finished yet, though: I also have to generate composite words, and words with special symbols.