
# Star name generator


I haven't posted an update for a few days, but as usual that doesn't mean i've been inactive. I've been optimizing the code (it's not finished yet, but it's progressing well), and today, as a distraction, i coded a procedural name generator for the star systems. There will be billions of systems in the universe and i cannot manually assign a name to each of them, so i needed an algorithm to do it automatically.

There have been a few attempts in this area. For example, Elite/Frontier (as far as i understand) used a set of syllable tables and randomly appended entries from them together. Well, it was a bit more complex than that, but not much.

I wanted to test an idea that i'd had in mind for quite a while: generating "realistic"-sounding names.

For that, i took a list of existing names: the minor planets list. It contains more than 10,000 names.

The algorithm analyzes this list to build a syllable table. The first step is called syllabification: each word is split into a set of syllables according to some rules. As i haven't been able to find an existing algorithm for that, i made my own. It uses a few simple rules based on the occurrences of consonants and vowels, written c and v respectively. x means "anything", and the dash marks the boundary between syllables. The rules i used are:

xvcvx => xvcv-x
xcvccx => xcv-ccx
xcvvcx => xcv-vcx
xvccvx => xvc-cvx
xcvcvx => xcv-cvx
xvvcvx => xvv-cvx
xvcccx => xvc-ccx

Application to a few examples:
Mikula => Mi-ku-la
Lucidor => Lu-ci-dor
Friederike => Fri-ede-ri-ke
Flammario => Fla-mma-rio
Minanomachi => Mi-na-no-ma-chi

Of course, it doesn't work perfectly and some words are not split at the correct position, so i'll have to improve the rules a bit. This is typically the kind of thing that could take weeks to fine-tune, and i do not want to waste my time on that now.
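As an illustration, here is a minimal Python sketch of one possible reading of the rules above. It is not the original code. It assumes that x may also match nothing (so a rule can fire at the very start or end of a word), that the leftmost matching rule wins, that scanning restarts from each cut, and that only a, e, i, o, u count as vowels. Under those assumptions it reproduces the five example splits; `syllabify`, `RULES` and `VOWELS` are names made up for the sketch.

```python
VOWELS = set("aeiou")  # assumption: the post does not say how "y" is treated

# Each rule is (c/v pattern between the two x's, characters kept before the dash).
RULES = [
    ("vcv", 3),   # xvcvx  => xvcv-x
    ("cvcc", 2),  # xcvccx => xcv-ccx
    ("cvvc", 2),  # xcvvcx => xcv-vcx
    ("vccv", 2),  # xvccvx => xvc-cvx
    ("cvcv", 2),  # xcvcvx => xcv-cvx
    ("vvcv", 2),  # xvvcvx => xvv-cvx
    ("vccc", 2),  # xvcccx => xvc-ccx
]


def syllabify(word):
    """Split a word into syllables using the leftmost matching rule."""
    pattern = "".join("v" if ch in VOWELS else "c" for ch in word.lower())
    syllables, start = [], 0
    while start < len(word):
        cut = None
        for i in range(start, len(word)):
            for interior, offset in RULES:
                # Only cut if something is left after the dash.
                if pattern.startswith(interior, i) and i + offset < len(word):
                    cut = i + offset
                    break
            if cut is not None:
                break
        if cut is None:
            break  # no rule applies to the rest of the word
        syllables.append(word[start:cut])
        start = cut
    syllables.append(word[start:])
    return syllables


for name in ["Mikula", "Lucidor", "Friederike", "Flammario", "Minanomachi"]:
    print(name, "=>", "-".join(syllabify(name)))
```

Other readings of the rules (a different priority order, or requiring x to be a real character) would split some words differently.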

Weird syllables are discarded (1-character, more than 3 characters, or no-vowels syllables).
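A small sketch of that filter in Python, assuming that only the offending syllable is dropped (the post does not say whether the whole word is thrown away) and that vowels are a, e, i, o, u. The names `is_usable` and `keep_usable` are hypothetical.

```python
VOWELS = set("aeiou")  # assumption: "y" is not counted as a vowel


def is_usable(syllable):
    """Keep syllables of 2 or 3 characters that contain at least one vowel."""
    s = syllable.lower()
    return 2 <= len(s) <= 3 and any(ch in VOWELS for ch in s)


def keep_usable(syllables):
    """Drop 1-character, over-long and vowel-less syllables."""
    return [s for s in syllables if is_usable(s)]


print(keep_usable(["a", "chri", "st", "mi", "dor"]))
```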

Once that step is done, a table is built. It stores:
- for each syllable, the probability that it starts or ends a word
- for each syllable, a table listing the syllables that can follow it, each with its own "following" probability.
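A table like this could be built roughly as follows. This is a sketch under assumptions, not the original data structure: it takes words that are already split into syllables, stores raw counts, and turns them into probabilities on demand. `build_table` and `normalise` are illustrative names.

```python
from collections import Counter, defaultdict


def build_table(split_words):
    """Count starting, ending and following syllables over pre-split words."""
    starts, ends = Counter(), Counter()
    follows = defaultdict(Counter)
    for syllables in split_words:
        if not syllables:
            continue
        starts[syllables[0]] += 1
        ends[syllables[-1]] += 1
        for prev, nxt in zip(syllables, syllables[1:]):
            follows[prev][nxt] += 1
    return starts, ends, follows


def normalise(counter):
    """Turn raw counts into probabilities that sum to 1."""
    total = sum(counter.values())
    return {key: count / total for key, count in counter.items()}


words = [["mi", "ku", "la"], ["lu", "ci", "dor"], ["mi", "na"]]
starts, ends, follows = build_table(words)
print(normalise(starts))         # how often each syllable starts a word
print(normalise(follows["mi"]))  # what can follow "mi", and how often
```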

The name generator is pretty simple:
- it chooses a first syllable pseudo-randomly given a seed, according to the starting-syllable probabilities;
- it chooses each next syllable given the previous syllable and its "following" probabilities;
- once the word is generated, if the last syllable is not a terminator, it discards the whole name and retries.
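The three steps above can be sketched in Python like this. The tiny tables are made up for the example, and the word length (2 to 4 syllables, picked at random) and the retry limit are assumptions, since the post does not say how either is decided.

```python
import random

# Toy statistics, hand-written for the sketch (weights are raw counts).
STARTS = {"mi": 2, "lu": 1}
ENDS = {"la", "dor", "na"}
FOLLOWS = {
    "mi": {"ku": 1, "na": 1},
    "ku": {"la": 1},
    "lu": {"ci": 1},
    "ci": {"dor": 1},
}


def weighted_pick(rng, weights):
    """Pick one key of `weights` with probability proportional to its value."""
    items = list(weights)
    return rng.choices(items, weights=[weights[i] for i in items])[0]


def generate_name(seed, max_tries=100):
    """Generate a name for a seed; retry when the last syllable is no terminator."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        length = rng.randint(2, 4)  # assumed length range
        name = [weighted_pick(rng, STARTS)]
        while len(name) < length and name[-1] in FOLLOWS:
            name.append(weighted_pick(rng, FOLLOWS[name[-1]]))
        if name[-1] in ENDS:
            return "".join(name)
    return None  # gave up after max_tries


print([generate_name(seed) for seed in range(5)])
```

The same seed always yields the same name, which is what lets a star system keep its name without storing it.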

As a result, some syllables have a high probability of following each other, and some combinations are not possible at all, so strange words should mostly be avoided.

Here are a few examples of randomly generated names:
zizzarco, griesha, pomoko, butsuma, tsunelle, koukama, korolunon, josondivaik, eryanda, rochenqian, lanagy, almamelnius ...

It's not completely finished yet though: i also have to generate composite words, and words with special symbols.

Not being funny or anything, but with billions of planets, there might be planets called ****, ****, **** and ****. Please don't put in rules to remove them. It will be the difference between your game being Legendary and being just a very good tech demo :).

Just a suggestion: when doing composite names, the second word should have increased probabilities of including syllables from the first one, or syllables that tend to follow them. It should be a good step towards giving the names a certain "ring".

Now that I think about it, this could also happen within the same word! But it works better as I first described it, in my opinion.

Another thing: have you noticed that the names in that list come from very different languages? It shows in the generated words: some start with visibly Japanese syllables and end in English :P Is there a way to avoid that? Maybe use a different list, although I liked many of the names in this one.

Quote:
 ...there might be planets called Shit, Fuck, Toss and Wank. Please don't put in rules to remove them. It will be the difference between your game being Legendary and being just a very good...

probably also the difference between being distributed as an all-ages title and a top-shelf adults-only job [rolleyes]

Have you looked into using the various forms of language-theory to generate names? Things like L-Systems, EBNF's, RegEx's etc... ?

Jack

Sheessssh. I do apologise. I assumed gamedev would **** any expletives! Honestly, sorry if I offended anyone.

I don't think i'll have to remove them, because unless i'm mistaken, there's no way these offending words can be generated from a set of simpler syllables. All the examples you've given are one syllable only, and they don't appear in my input names list.

Just a follow-up on this topic.

Today, i've been working on the name generator again. Unfortunately, i noticed one problem with the algorithm: the chance of name collisions (a different seed number producing the same name) is pretty high. In a list of only 1000 names, i already found a few collisions... so imagine what would happen with millions of names.

I analyzed the name generation algorithm to find out why two different seeds generated the same name. It's actually pretty logical. The syllable statistics are built from a list of words, to "teach" the system which syllables have a high chance of occurring and of following each other. Unfortunately, it quite often happens that the possible combinations of following syllables are extremely limited. As it's not easy to explain, i'll take an example.

Imagine there are 3 syllables in the system; i'll refer to them as "S1" to "S3". They all have an equal chance of occurring and of following each other. Now, let's imagine that i come across a single word made of a new set of syllables: S4-S5-S6.

I want to generate some names randomly. If i pick S1 as the starting syllable, i can still generate a good number of combinations (S1-S3-S2, S1-S2-S3, S1-S2, S1-S3). But if i pick S4 as the starting syllable, the only word i can generate is S4-S5-S6, because my statistics table doesn't yet contain a combination S4-S1, S4-S2 or S4-S3.
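The S1 to S6 example can be checked with a few lines of Python that enumerate every name reachable from a given starting syllable. The sketch assumes names of at most 3 syllables with no repeated syllable, and that S1, S2, S3 and S6 are valid terminators while S4 and S5 are not; the post only implies those details.

```python
# "Following" table for the example: S1..S3 follow each other freely,
# while S4, S5, S6 were only ever seen in the single word S4-S5-S6.
FOLLOWS = {
    "S1": ["S2", "S3"],
    "S2": ["S1", "S3"],
    "S3": ["S1", "S2"],
    "S4": ["S5"],
    "S5": ["S6"],
    "S6": [],
}
ENDS = {"S1", "S2", "S3", "S6"}  # assumed terminators


def reachable(start, max_len=3):
    """List every valid name that can be generated from `start`."""
    names = []

    def walk(path):
        if len(path) >= 2 and path[-1] in ENDS:
            names.append("-".join(path))
        if len(path) < max_len:
            for nxt in FOLLOWS[path[-1]]:
                if nxt not in path:
                    walk(path + [nxt])

    walk([start])
    return names


print(reachable("S1"))  # four different names
print(reachable("S4"))  # a single name, so every seed landing on S4 collides
```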

In order to fix that, i went back to a simpler approach. I still use a statistics table to know which syllables have a higher chance of occurring, but i no longer restrict which syllables can follow each other.
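A rough Python sketch of that simpler approach: every syllable is drawn independently, weighted by how often it occurs, with no "following" table at all. The weights, the length range and the name `generate_name_unrestricted` are made up for the illustration.

```python
import random


def generate_name_unrestricted(seed, weights, min_syllables=2, max_syllables=4):
    """Draw each syllable independently, weighted by its frequency."""
    rng = random.Random(seed)
    count = rng.randint(min_syllables, max_syllables)  # assumed length range
    syllables = list(weights)
    picks = rng.choices(
        syllables, weights=[weights[s] for s in syllables], k=count
    )
    return "".join(picks)


# Toy frequencies: "mi" is drawn more often than "la".
weights = {"mi": 5, "ku": 3, "la": 2}
print([generate_name_unrestricted(seed, weights) for seed in range(5)])
```

Any syllable can now follow any other, so the number of possible names grows with every syllable added to the table, at the price of allowing less natural combinations.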

I have also improved the syllabification process a lot. Instead of trying to match the characters against consonant / vowel rules, i simply recognize which characters form the beginning of a syllable. I use a hard-coded table with all the possible cases, like "bi", "bo", "bu", "gla", "gle", "gli", "ra", "re", "ri", etc. The results i get are much better.
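One way such a table could drive the splitting, as a hedged Python sketch: scan the word, and whenever a table entry matches at the current position, start a new syllable there. The table below holds only the entries quoted above (the real one is much larger), and trying longer entries first is an assumption.

```python
# Only the entries quoted in the post; the real table covers all cases.
SYLLABLE_STARTS = ("bi", "bo", "bu", "gla", "gle", "gli", "ra", "re", "ri")


def split_on_starts(word, starts=SYLLABLE_STARTS):
    """Cut the word wherever a known syllable beginning is recognized."""
    lower = word.lower()
    by_length = sorted(starts, key=len, reverse=True)  # longest entries first
    cuts, i = [0], 0
    while i < len(lower):
        match = next((s for s in by_length if lower.startswith(s, i)), None)
        if match is None:
            i += 1  # not a syllable start: the character stays in the current syllable
            continue
        if i > 0:
            cuts.append(i)
        i += len(match)
    return [word[a:b] for a, b in zip(cuts, cuts[1:] + [len(word)])]


print(split_on_starts("Rebus"))    # ['Re', 'bus']
print(split_on_starts("gliribo"))  # ['gli', 'ri', 'bo']
```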
