 Random Name-Generation Using Grammars
To be quite honest, the names from the new algorithm didn't seem to be that much of an improvement, but it's very cool if you want to make a game like EverQuest. You can so tell I'm a newb, right? Hehe.

 User Rating: 1015    Report this Post to a Moderator | Link

I've made some code up myself, I think it produces quite okayish names.
Given this text http://the-tech.mit.edu/Shakespeare/allswell/allswell.1.1.html

it produces things like
waldofour,gono,aive,sen,yoredyour,erallaned,estirondon,chioserdea,vitidesu,rgi,tarsthanch,insha

Take this text
http://www.crs4.it/Letteratura/Decamerone/Prima/1_01.htm

it produces things like
scoicomal,trave,garpav,ealqundon,zistosce,dazo,tem,omolal,ichionra,pra

and here's the code ( in python )

import random
import string

legal = 'abcdefghijklmnopqrstuvwxyz'
START = '1beg'  # pseudo-letter that marks the beginning of a word

# combinations[prev][next] counts how often letter `next` follows `prev`
combinations = {key: dict.fromkeys(legal, 0) for key in [START] + list(legal)}

# keep only the letters; punctuation and whitespace both end a word
text = open('aramaic.txt').read()
result = ''
for c in text.lower():
    if c in legal:
        result += c
    elif c in string.punctuation or c.isspace():
        result += ' '

# count the letter-to-letter transitions inside every word
for word in result.split():
    last = START
    for c in word:
        combinations[last][c] += 1
        last = c

class chooser:
    """Picks a follower letter with probability proportional to its count."""
    def __init__(self, c):
        self.values = []
        for key in c:
            self.values += [key] * c[key]

    def get(self):
        return random.choice(self.values)

choosers = {key: chooser(combinations[key]) for key in combinations}

names = []
for _ in range(10):
    last = START  # every name starts from the word-beginning state
    word = ''
    for _ in range(random.randint(3, 10)):
        if not choosers[last].values:  # this letter never had a follower
            break
        last = choosers[last].get()
        word += last
    names.append(word)

print(','.join(names))

 User Rating: 1015    Report this Post to a Moderator | Link

worked a bit more
now I can produce things like these

vami,yayama,tindorelo,atingedere,asol,yer,abuvin,emaion
demot,tinghti,lanx,leermon,emul,raliapa,tesson,ghall
daley,hovaq

But the algorithm is buggy, and the above is only about the best third of the output. Still, I am sure one can get names like these most of the time.


 User Rating: 1015    Report this Post to a Moderator | Link

the guy who wrote The Everchanging Book of Names gives some information about how his program generates random names. it seems similar to this method (to me, that is; i know relatively nothing about this topic)... maybe with both sources of information you can get better results.

 User Rating: 1214   |  Rate This User  Send Private MessageView Profile Report this Post to a Moderator | Link

When you're dealing with probabilities, what you have is a Markov model. Markov models come in different orders, depending on how much memory they have.

A table with zero memory, holding only a probability for each letter, would be a zeroth-order Markov model. A table with one step of memory, i.e. where the probability of each letter depends on which letter came before it, would be a first-order Markov model.
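To make the difference between the two orders concrete, here is a minimal sketch. The function names and the three sample words are made up for illustration and are not taken from anyone's code in this thread.

```python
from collections import Counter, defaultdict

def zeroth_order(words):
    """Letter frequencies with no memory of the previous letter."""
    return Counter(c for w in words for c in w)

def first_order(words):
    """For each letter, count which letters follow it."""
    table = defaultdict(Counter)
    for w in words:
        for prev, nxt in zip(w, w[1:]):
            table[prev][nxt] += 1
    return table

words = ["anna", "hanna", "nina"]   # placeholder training data
freq = zeroth_order(words)          # one count per letter
follow = first_order(words)         # one row of counts per preceding letter
print(freq["n"], dict(follow["n"]))
```

Dividing each count by the total of its row would turn the counts into the probabilities the table is supposed to hold.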

You can easily build Markov models in memory by either using really big tables, or by using hashing (as many slots in the table are likely to not be populated). You can populate them by feeding real text into them, which seemed to be what the first python script did. In fact, the first python script looks a lot like a first-order Markov model.
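As a rough illustration of the hashing variant: a Python dictionary is a hash table, so only the contexts that actually occur take up space. The helper name `build_sparse` and the sample words are assumptions for this sketch.

```python
from collections import Counter, defaultdict

def build_sparse(words, order):
    """Map each context of `order` letters to a Counter of following letters."""
    table = defaultdict(Counter)
    for w in words:
        for i in range(order, len(w)):
            table[w[i - order:i]][w[i]] += 1
    return table

words = ["gudrun", "gunther", "gustav"]   # placeholder training data
table = build_sparse(words, 2)

# slots that are really in use versus a dense 26 x 26 x 26 array
populated = sum(len(followers) for followers in table.values())
dense = 26 ** 3
print(populated, "of", dense, "slots populated")
```

Even with far more training text, most of the dense table would stay empty, which is the point of hashing here.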

Once you get to higher levels of Markov models, the random generator is likely to start spitting out whole words that were in the input. This is a great thing to base a text compression algorithm on, by the way -- in fact, most dictionary based text compression algorithms work at a low level by somehow modeling N-th order Markov models.

To implement a Markov model for name generation in English, you need 27 symbols: 26 for the alphabet, and one for "stop here" (let's call it "space" :-). If you want to be all binary, you could expand that to 32, which would leave room for some umlauts and other foreign characters if you wanted. However, the space needed grows exponentially with the order of the model (if you don't use hashing), so keeping the alphabet tight has value.
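Here is a small sketch of a first-order model with the extra "stop here" symbol, plus the size a dense table would need at each order. All names (`STOP`, `train`, `generate`, `dense_slots`) are hypothetical, and the code assumes a non-empty list of names made of plain letters.

```python
import random
from collections import Counter, defaultdict

STOP = " "  # the 27th symbol: marks both the start and the end of a name

def train(names):
    """Count letter-to-letter transitions, including into and out of STOP."""
    table = defaultdict(Counter)
    for name in names:
        prev = STOP
        for c in name.lower() + STOP:
            table[prev][c] += 1
            prev = c
    return table

def generate(table, rng, max_len=12):
    """Walk the table from STOP until STOP is drawn again or max_len is hit."""
    out, prev = "", STOP
    while len(out) < max_len:
        followers = table[prev]
        prev = rng.choices(list(followers), list(followers.values()))[0]
        if prev == STOP:
            break
        out += prev
    return out

def dense_slots(order, symbols=27):
    """Entries in a dense table: one per (context, next symbol) combination."""
    return symbols ** (order + 1)

table = train(["anna", "hanna"])
print(generate(table, random.Random(0)))
print([dense_slots(n) for n in range(4)])
```

The last line prints 27, 729, 19683 and 531441, which shows how quickly a dense table outgrows the alphabet.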

Hmm... I feel a generic-programming C++ template implementation coming on... who will post it first? ;-)

 User Rating: 1944   |  Rate This User  Send Private MessageView ProfileView Journal Report this Post to a Moderator | Link

Interesting.

I didn't know about Markov models, but this hash-table thingy seemed to me the easiest way to detect short-term dependencies.
Currently I use some sort of 4th-order Markov model, I guess: I first identify patterns of up to three letters that occur more than once in a given text, and then analyse the probability of such symbols following each other. Once I had that, the words started looking a lot better. But it's still not quite where I want it.

 User Rating: 1015    Report this Post to a Moderator | Link

quote:
Original post by krez
the guy who wrote The Everchanging Book of Names (http://ebon.pyorre.net/) gives some information about how his program generates random names. it seems similar to this method (to me, that is; i know relatively nothing about this topic)... maybe with both sources of information you can get better results.


Well, as far as I can tell, the Everchanging Book of Names uses predefined grammars to represent production rules. I didn't rely on a grammar and instead made a probability analysis, which tells me how likely a certain group of letters is to appear after another.
The methods are somewhat similar, in that I guess you could produce a grammar from the analysis.



 User Rating: 1015    Report this Post to a Moderator | Link

It is important for me to say that I didn't want to disparage the Markovian approach when I wrote this article, but I wanted to generate names for a role-playing game that would always be pronounceable. The only way I knew to ensure that was grammars.


[edited by - conman on November 26, 2003 9:52:31 AM]

 User Rating: 1054   |  Rate This User  Send Private MessageView Profile Report this Post to a Moderator | Link

i think the EBoN generates those grammars from name-lists (i.e. if you type up a text file with a thousand german names, it will start generating german-sounding names).

 User Rating: 1214   |  Rate This User  Send Private MessageView Profile Report this Post to a Moderator | Link

Made some further tryouts

My current implementation handles Markov models of any order n. The higher n goes, the more likely it is that an original word from the text comes up. I also give longer symbols preference over shorter ones when matching.
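The effect of the order can be sketched with a fixed-order model. This is not the poster's implementation; the `<` and `>` markers, the function names and the sample words are assumptions, and n must be at least 1.

```python
import random
from collections import Counter, defaultdict

START, END = "<", ">"   # hypothetical markers for word start and word end

def train(words, n):
    """Count which character follows each context of up to n characters."""
    table = defaultdict(Counter)
    for w in words:
        s = START + w + END
        for i in range(1, len(s)):
            table[s[max(0, i - n):i]][s[i]] += 1
    return table

def generate(table, n, rng, max_len=30):
    """Extend the word one character at a time from its last n characters."""
    s = START
    while len(s) <= max_len:
        followers = table[s[-n:]]
        nxt = rng.choices(list(followers), list(followers.values()))[0]
        if nxt == END:
            break
        s += nxt
    return s[1:]

words = ["anna", "hanna", "nina"]   # placeholder training data
rng = random.Random(0)
print([generate(train(words, 1), 1, rng) for _ in range(5)])
print([generate(train(words, 6), 6, rng) for _ in range(5)])
```

With n = 1 the output mixes the letters freely. Once n covers the longest word plus its start marker (6 here), every context is a full prefix of some training word, so the generator can only replay words from the input.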

For shakespeare english with markov_n=1 it makes
mpreans wisherrichil avilonos bllldiroune harabld pakieeadrd ithavisiopryospayou agojond mamusis rofanored

markov_n=2
hishathelt ciance strais wascomes asivermin perandmer lafece lowillikeena fort's copearommon

markov_n=3
fathere parolles wantage thusband buries ambitessible acuted appromine porrow naturalize

For a funny language unknown to me it makes
desiozko barmen ezinago estuko alkoholara eterna dagiten bestukeriaz nabarnago gordin

and for elven names taken from fantasy name generator
cilmanduil belimir elvararil tinithraling ririon tinith isebrir mithrand nilmanduil norfilithraldor

I noticed that the more material you feed into the statistics, the more likely they are to produce names that don't repeat the source.
Also, in the case of the elven names, the original set is 500 names big. 2 names were in the original set, 3 were pieces of other names, and the remaining 6 were assembled from other names.
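A check like the one described above can be automated roughly as follows. The three category labels, the function name and the sample names are my own choices for this sketch.

```python
def classify(name, originals):
    """Sort a generated name into one of three buckets."""
    if name in originals:
        return "original"            # copied verbatim from the source list
    if any(name in other for other in originals):
        return "piece"               # a substring of some source name
    return "new"                     # assembled from parts of several names

originals = {"elrond", "galadriel"}   # placeholder source list
for candidate in ["elrond", "adri", "elriel"]:
    print(candidate, classify(candidate, originals))
```

Running it over a whole batch of generated names gives a quick measure of how much the generator is merely replaying its input.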



 User Rating: 1015    Report this Post to a Moderator | Link

Clicky

 User Rating: 1015    Report this Post to a Moderator | Link

aaahh yes.... random name generation... reminds me of one of my first 'real' programs -> a random sentence generator.... a throwback to the days of yore... nice!!

 User Rating: 1032   |  Rate This User  Send Private MessageView Profile Report this Post to a Moderator | Link

Good article, just that this kind of random generation can be difficult when it comes to saving games or providing a uniform experience for every player. E.g. one player will encounter someone called Joe Bloggs who will be played by Jane Seymour in a different copy of the game - or even the same game but after it has been restarted. Plus random stuff makes it difficult to keep replays etc.

In any case, it was a fun and well-written article, just thought I would point out that this system should be used with care.

Cheers,

Paul

 User Rating: 1643   |  Rate This User  Send Private MessageView ProfileView Journal Report this Post to a Moderator | Link

That's not much of a problem as long as you can produce reproducible random numbers (by using the same seed, for example).
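A minimal sketch of the seeding idea, assuming each character has some stable id. `SYLLABLES`, `name_for` and the way the seed string is built are all hypothetical.

```python
import random

SYLLABLES = ["an", "bel", "dor", "el", "mir", "thra"]   # placeholder data

def name_for(world_seed, npc_id):
    """Derive a private generator from the world seed and the character id."""
    rng = random.Random(f"{world_seed}:{npc_id}")
    count = rng.randint(2, 3)
    return "".join(rng.choice(SYLLABLES) for _ in range(count)).capitalize()

# the same seed and id give the same name on every run and every machine
print(name_for(42, 7), name_for(42, 7), name_for(42, 8))
```

Giving each character its own generator means the name does not depend on how many other random numbers the game drew beforehand, which also helps with replays.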



 User Rating: 1015    Report this Post to a Moderator | Link

paulsdsrubbish: That's a persistence problem, not a generation problem. If you have a database of players, then it's easy to have the player create a profile up-front where the name is generated randomly, but then persisted in the profile. Restarts of the game wouldn't matter because the player name is taken from the player profile.

 User Rating: 1015    Report this Post to a Moderator | Link

I don't believe Markov models are really needed here. They are almost indispensable for pattern/speech recognition or generating hypotheses, but for generating plausible names the language-specific (or 'fantasy-language-specific') phonotactic rules are all that is needed.

For an even better algorithm I suggest using syllable/cluster-based rules. For example, you could define a syllable as 'a vowel (nucleus) with a string of phonemes/letters of ascending vocalicity before it and descending vocalicity after it'. 'Vocalicity' denotes a phoneme's 'loudness', and it requires that you classify phonemes as follows:
v1(stop consonants) = 0 (e.g. p,t,k,b,d,g)
v2(fricatives) = 1 (e.g. f,s,'sh', 'sch','ch' etc)
v3(liquids) = 2 (e.g. j,l,r)
v4(nasals) = 3 (e.g. n,m,'ng')
v5(vowels/diphthongs) = 4 (e.g. a,e,i,o,u,au,ei,ow etc)
So that, with every slot except v5 optional:
Syllable = (v1)(v2)(v3)(v4)(v5)(v4)(v3)(v2)(v1)

Haven't tried this, but I think it's a more accurate generative grammar.

Anyone care to try? 5-minute Perl
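Since nobody has posted the Perl yet, here is a rough Python sketch of the syllable template above. The class contents follow the post. The `p_optional` probability for filling each consonant slot and the function name are my own assumptions.

```python
import random

# consonant classes v1..v4 in order of ascending "vocalicity", then vowels (v5)
CLASSES = [
    ["p", "t", "k", "b", "d", "g"],                   # v1 stop consonants
    ["f", "s", "sh", "sch", "ch"],                    # v2 fricatives
    ["j", "l", "r"],                                  # v3 liquids
    ["n", "m", "ng"],                                 # v4 nasals
    ["a", "e", "i", "o", "u", "au", "ei", "ow"],      # v5 vowels/diphthongs
]

def syllable(rng, p_optional=0.3):
    """Build (v1)(v2)(v3)(v4) v5 (v4)(v3)(v2)(v1); only v5 is mandatory."""
    consonants = CLASSES[:4]
    onset = [rng.choice(c) for c in consonants if rng.random() < p_optional]
    nucleus = rng.choice(CLASSES[4])
    coda = [rng.choice(c) for c in reversed(consonants) if rng.random() < p_optional]
    return "".join(onset) + nucleus + "".join(coda)

rng = random.Random(1)
print(["".join(syllable(rng) for _ in range(2)).capitalize() for _ in range(5)])
```

A lower `p_optional` gives open, vowel-heavy syllables, while a higher one piles up consonant clusters that soon become hard to pronounce.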

 User Rating: 1015    Report this Post to a Moderator | Link

It's not that I think grammars are not useful for this kind of stuff, but they do only half the work. You still have to hack in the grammar itself. I think it is easier to produce such a grammar from text. And once you generate the grammar from text, it's also much easier to use the probabilities directly instead of translating them to a grammar.
I have arrived at an algorithm that can spit out words from any text which really sound like words from the language the source text was written in. It basically involves analyzing which letter follows a sequence of other letters, and then storing how often this occurs.

example?
None of the following names are in the original text, nor are they just parts of names. They are all unique. A python script first analyzes the source text (about 2000 german names) and then produces these 100 names in under a second.

Sentin Knuta Thilde Greta Angelbert Gerlis Mariana Herberhilde Leonorena Annifer Gudrunhild Andra Mirose Carlotte Kornelix Terenedi Judia Wilheiderike Anettilie Lariusz Wolfredy Flore Albrechthias Carose Mirosemar Ernstafa Ullahattina Lisabether Sebastino Eckartmut Jochelm Bertus Margretel Josel Musten Adeltrude Eckehardt Veresela Otfriedrik Denno Sopher Dietlef Artus Liancy Julina Elisel Rosalinda Margios Lothard Astav Gustel Annemarios Martmut Silvatorin Walbert Kristinos Josemal Bertraudi Hendrzej Heinharritas Berneli Helgarena Ingeburgarian Patrix Johammad Heidemal Felika Kristantin Guntraud Edgaret Erichel Liesela Ansgarek Wladimil Lillene Nicolaud Harritz Mirosef Mathilo Catrich Luciechthilip Fraudia Herrem Kristof Wanderoslaw Almar Engeborge Gabried Artha Ortwig Normanna Karlo Dennislaw Antjepan Hannifer Corianet Siegbertur Thomann Carleskardt Aydine Manusz Martharline Kadimil Denie Carold Cemar Herberhard Sebastin Hardt Ginan Mirosela Norbertraut Rosef Annegreta Galieslaw Eliselorinand Waltherenfriel Eckhardy Herwin Muharta Silvira Dorena Alfonstafa Erolinde Ingrit Verolind Vladine Karoslaw Dimir Lisberta Edele Nancesco Harre Irmtraute Tonio Gabrieder Iristanta Heikolantin Ernar Felitta Bodor Annelina Rolas Birgios Manut Herberto Kamine Giescha Ullah Wernold Verald Coristof Ullrik Raimunda Gernharina Herberthur Ortraute Ayselore Erharlinald Beatrinandy Ferda Coritz Marena Eckartwin Ismannina Emmad Verosemar Otmara Piotram Anjam Urseloreen Siegberthur Carosel Hannetty Alfonsta Reimarios Gabian Haliese Cathris Heribertraudi Carolina Ericita Zbignieleinhard Stantin Angelore Gesia Jerzysztof Adolfgan Sentine Galiesbeth Ulricha Rotraut Reinrich Januelie Chard Tille Abdula Franko Katas Henrichen



 User Rating: 1015    Report this Post to a Moderator | Link

The german names above were assembled using matches on the following 733 symbols, each having an average of 4 indexed letters. A < means the beginning of a word, and a > means the end of it.

<Ad <Al <An <Ar <Be <Bi <Bo <Br <Bu <Ca <Ch <Cl <Co <Da <De <Di <Do <Ec <Ed <El <Em <En <Er <Fa <Fr <Ga <Ge <Gi <Gr <Gu <Ha <He <Hi <Hu <Il <In <Ir <Is <Ja <Je <Jo <Ju <Ka <Ke <Kl <Ko <Kr <Le <Li <Lo <Lu <Ma <Me <Mi <Mo <Na <Ni <No <Ol <Ot <Pa <Pe <Ra <Re <Ri <Ro <Sa <Se <Si <So <St <Su <Ta <Th <Ti <To <Ul <Ve <Vi <Wa <Wi <Wo Ale And Ann Ant Arn Bea Ber Bri Bur Car Chr Cla Cor Dan Die Dor Eck Ell Eri Fra Fri Gab Ger Gun Han Har Hei Hel Hen Her Hil Ing Irm Jan Joh Jos Jul Kar Kat Kon Kri Leo Lie Man Mar Mic Mir Moh Nic Nik Nor Rei Ros Sab Sie Sig Sil Ste Sus The Wal Wil Wol abe abi abr adi ald ali alt ana anc and ane ani ann ans ant anu anz ara arc ard are arg ari arl aro arr art ata ath atr aud aus aut ber bet bri cha che col del der dia din dre dul eat ede eid ein ela ele elg eli ell elm elo ema ena ene enn enr eon eph era ere erh ern ero ert ese eta ete eth ett fri gar gel git gri ham han har hel her hil him hri ian ich ico ied ieg iel ies iet ika ike iko ild ili ill imi imo ina ind ine inh ise ist ita ith itt ius kha kol lan lau law lbe lde len lex lfr lga lhe lia lie lin lis lke lla lli lly lma lmu lor lot ltr lvi man mar met mil min mon mun mut nde ndr nel ner net nge nha nhi nie nis nja nna nne nni nny nri nst nti nto oha ola old olf ona oni ons ora ore org ori ose ott ram ran rau rdt red ren ret rga rgi rha ria ric rid rie rik rin ris rit rli rma rnd rne rol ros rst rth rtr rud run san sbe sel sla sta ste sti sto tan tef ten tha the thi tia tin tma ton tor tra tri tru tta tte tti ude udi uel ula uli und usa ust usz uth wig win <A <B <C <D <E <F <G <H <I <J <K <L <M <N <O <P <R <S <T <U <V <W Ad Al An Ar Be Bi Bo Br Bu Ca Ch Cl Co Da De Di Do Ec Ed El Em En Er Fa Fr Ga Ge Gi Gr Gu Ha He Hi Hu Il In Ir Is Ja Je Jo Ju Ka Ke Kl Ko Kr Le Li Lo Lu Ma Me Mi Mo Na Ni No Ol Ot Pa Pe Ra Re Ri Ro Sa Se Si So St Su Ta Th Ti To Ul Ve Vi Wa Wi Wo ab ac ad ah ai al am an ar as at au aw ba be bi br ca ce ch ci ck co da de di do dr dt du dw ea eb ec ed ee ef eg ei ek el em 
en eo ep er es et ex fa fi fr ga ge gi go gr ha he hi hm ho hr ia ib ic id ie ig ik il im in io ir is it iu ja ka ke kh ki ko la lb ld le lf lg lh li lk ll lm lo ls lt lv ly ma me mi mm mo mu na nc nd ne ng nh ni nj nk nn no nr ns nt nu ny nz ob of og oh ol om on op or os ot ph ra rc rd re rg rh ri rk rl rm rn ro rr rs rt ru ry rz sa sb sc se si sk sl sm ss st sz ta te th ti tj tl tm to tr tt tz ud ue ug ul un ur us ut va ve vi wa wi ys ze zi < A B C D E F G H I J K L M N O P R S T U V W a b c d e f g h i j k l m n o p r s t u v w x y z
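A symbol list like the one above could be collected roughly like this. It is a guess at the procedure based on the earlier description (patterns of up to three letters that occur more than once), not the poster's script. The function name and sample names are made up.

```python
from collections import Counter

START = "<"   # word-start marker, as in the list above

def find_symbols(words, max_len=3):
    """Return every substring of up to max_len characters that occurs more than once."""
    counts = Counter()
    for w in words:
        s = START + w
        for size in range(1, max_len + 1):
            for i in range(len(s) - size + 1):
                counts[s[i:i + size]] += 1
    return {sym for sym, n in counts.items() if n > 1}

symbols = find_symbols(["anna", "anne"])   # placeholder training data
print(sorted(symbols, key=lambda sym: (-len(sym), sym)))
```

Sorting longest first mirrors the order of the list in the post, which also starts with the three-character symbols.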

 User Rating: 1015    Report this Post to a Moderator | Link

interesting method, but could you post also some rules of your grammar created by this script?

 User Rating: 1054   |  Rate This User  Send Private MessageView Profile Report this Post to a Moderator | Link

quote:
Original post by conman
interesting method, but could you post also some rules of your grammar created by this script?


Uhm, so far I haven't produced grammars from these statistics, but I'll look into it.

Yet the method is not really describable by a grammar, in that it is iterative rather than recursive. That also means that I match the longest preceding symbol to find out which single letter to append.
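The iterative longest-match step can be sketched like this. It is only an approximation of the idea as described. The names, the `max_n` limit and the sample words are assumptions, and unlike the symbol list in the earlier post it keeps every context rather than only the repeated ones.

```python
import random
from collections import Counter, defaultdict

START, END = "<", ">"

def train(words, max_n):
    """Store followers for every context of 1 to max_n characters."""
    table = defaultdict(Counter)
    for w in words:
        s = START + w + END
        for i in range(1, len(s)):
            for n in range(1, min(max_n, i) + 1):
                table[s[i - n:i]][s[i]] += 1
    return table

def longest_match(table, s, max_n):
    """Return the followers of the longest known symbol that ends the string s."""
    for n in range(min(max_n, len(s)), 0, -1):
        followers = table.get(s[-n:])
        if followers:
            return followers
    raise KeyError(s)

def generate(table, max_n, rng, max_len=20):
    """Append one letter at a time until the end marker is drawn."""
    s = START
    while len(s) <= max_len:
        followers = longest_match(table, s, max_n)
        nxt = rng.choices(list(followers), list(followers.values()))[0]
        if nxt == END:
            break
        s += nxt
    return s[1:]

table = train(["anna", "hanna"], 3)   # placeholder training data
print(generate(table, 3, random.Random(0)))
```

Each step is a plain loop over the string built so far, which is why it reads more like a table lookup than like the expansion of a grammar rule.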


 User Rating: 1015    Report this Post to a Moderator | Link

Ok, sorry - I misunderstood. You wrote in an earlier post that you work around grammars by using only probabilities (related to Markov models). (I was confused because you are an anonymous poster, so one can't tell whether it's a different person or not...)


 User Rating: 1054   |  Rate This User  Send Private MessageView Profile Report this Post to a Moderator | Link

I guess the following is what comes closest to a grammar. It's a table expressing the statistics I built. Each line begins with the symbol to match, followed by the likely letter candidates and their probabilities.

<Wi =:t 0.08 e 0.23 l 0.62 n 0.08
<Ed =:e 0.27 d 0.09 g 0.09 i 0.18 m 0.09 u 0.09 w 0.18
<El =:e 0.12 f 0.12 i 0.12 k 0.06 m 0.06 l 0.24 s 0.18 v 0.06 z 0.06
k =:a 0.19 e 0.22 i 0.05 h 0.08 k 0.03 m 0.03 l 0.01 o 0.13 r 0.01 u 0.01 t 0.04 > 0.19
V =:i 0.25 a 0.19 e 0.31 l 0.06 o 0.19
me =:d 0.23 s 0.08 l 0.15 t 0.31 n 0.23
ma =:d 0.08 i 0.03 l 0.06 n 0.19 s 0.03 r 0.39 > 0.22
mi =:e 0.06 l 0.25 n 0.44 r 0.13 t 0.06 > 0.06
mu =:t 0.67 n 0.33
Li =:a 0.07 e 0.29 d 0.07 l 0.21 n 0.14 s 0.21
A =:c 0.01 b 0.02 d 0.07 h 0.03 l 0.19 n 0.43 s 0.02 r 0.15 u 0.02 y 0.03 x 0.01
<Fr =:a 0.32 i 0.53 e 0.16
l =:> 0.09 a 0.16 b 0.02 e 0.08 d 0.09 g 0.02 f 0.06 i 0.16 h 0.01 k 0.03 j 0.00 m 0.05 l 0.07 o 0.05 p 0.00 s 0.02 r 0.01 t 0.02 w 0.00 v 0.02 y 0.01 z 0.00
rt =:a 0.07 i 0.05 h 0.12 m 0.02 o 0.02 r 0.15 u 0.05 w 0.05 > 0.46
ru =:d 0.45 n 0.55
rs =:e 0.08 u 0.08 t 0.75 > 0.08
Ge =:s 0.05 r 0.82 o 0.14
rd =:a 0.05 i 0.02 o 0.02 u 0.02 t 0.10 y 0.02 > 0.76
re =:a 0.05 c 0.02 e 0.02 d 0.11 g 0.02 k 0.02 j 0.02 m 0.05 n 0.16 s 0.07 t 0.18 w 0.02 > 0.25
rg =:a 0.35 e 0.05 i 0.25 h 0.05 o 0.05 r 0.10 > 0.15
ra =:d 0.04 h 0.02 m 0.09 l 0.07 n 0.15 r 0.02 u 0.26 t 0.02 z 0.02 > 0.30
rl =:a 0.13 e 0.20 i 0.27 h 0.07 o 0.20 > 0.13
rn =:a 0.11 e 0.28 d 0.22 i 0.06 h 0.06 o 0.17 s 0.06 u 0.06
ro =:> 0.11 s 0.28 n 0.06 t 0.17 l 0.39
ri =:a 0.08 c 0.10 b 0.02 e 0.19 d 0.04 g 0.02 f 0.01 k 0.06 j 0.01 o 0.03 n 0.12 s 0.17 u 0.03 t 0.11 x 0.01 > 0.02
ist =:i 0.56 a 0.06 e 0.06 l 0.06 o 0.25
B =:a 0.07 e 0.46 i 0.09 j 0.02 o 0.09 r 0.17 u 0.09 > 0.02
Ka =:d 0.05 i 0.05 m 0.05 r 0.41 t 0.32 y 0.05 z 0.09
rau =:k 0.08 d 0.50 t 0.42
<A =:c 0.01 b 0.02 d 0.07 h 0.03 l 0.19 n 0.43 s 0.02 r 0.15 u 0.02 y 0.03 x 0.01
<N =:i 0.52 a 0.26 e 0.04 o 0.17
m =:a 0.31 e 0.11 g 0.01 i 0.14 h 0.01 k 0.01 m 0.04 o 0.06 u 0.10 t 0.02 y 0.01 > 0.18
hri =:s 0.87 n 0.13
di =:a 0.21 k 0.05 m 0.11 n 0.21 r 0.05 u 0.05 t 0.16 > 0.16
de =:g 0.03 m 0.10 l 0.23 r 0.13 u 0.03 > 0.47
da =:n 0.07 m 0.07 l 0.07 > 0.80
dr =:i 0.21 u 0.14 z 0.07 e 0.43 a 0.14
ild =:a 0.09 e 0.36 > 0.55
<Ge =:s 0.05 r 0.82 o 0.14
Ch =:a 0.13 r 0.87
Ca =:r 0.75 t 0.25
n =:a 0.13 c 0.01 e 0.14 d 0.08 g 0.03 f 0.01 i 0.07 h 0.02 k 0.01 j 0.01 o 0.03 n 0.11 s 0.04 r 0.02 u 0.01 t 0.03 v 0.00 y 0.01 z 0.02 > 0.20
is =:a 0.05 b 0.05 e 0.18 k 0.03 l 0.05 s 0.03 t 0.41 > 0.21
ir =:a 0.07 g 0.13 i 0.07 k 0.13 j 0.07 o 0.13 s 0.13 > 0.27
it =:a 0.20 e 0.03 h 0.14 m 0.03 o 0.03 r 0.03 t 0.23 z 0.06 > 0.26
ik =:a 0.19 e 0.27 m 0.04 o 0.19 t 0.12 > 0.19
im =:i 0.26 a 0.05 u 0.11 o 0.21 > 0.37
il =:a 0.02 b 0.02 d 0.24 f 0.02 i 0.11 h 0.04 k 0.02 m 0.07 l 0.17 o 0.07 s 0.02 t 0.02 v 0.07 > 0.11
in =:a 0.25 e 0.22 d 0.07 g 0.01 f 0.01 i 0.01 h 0.06 o 0.03 n 0.02 r 0.02 z 0.02 > 0.26
ia =:s 0.07 > 0.46 m 0.04 n 0.43
ic =:a 0.10 e 0.10 i 0.07 h 0.43 k 0.03 o 0.20 t 0.03 > 0.03
ie =:c 0.01 b 0.01 d 0.20 g 0.10 l 0.10 s 0.13 r 0.04 t 0.14 w 0.01 > 0.25
id =:a 0.07 e 0.20 i 0.13 o 0.07 r 0.07 > 0.47
ig =:i 0.17 l 0.08 n 0.08 r 0.17 u 0.08 > 0.42
tha =:r 0.45 > 0.55
eli =:a 0.14 c 0.07 e 0.21 k 0.07 n 0.14 s 0.07 u 0.07 t 0.07 x 0.07 > 0.07
ela =:h 0.08 n 0.17 > 0.75
har =:d 0.77 i 0.07 l 0.07 r 0.03 t 0.03 > 0.03
D =:i 0.37 a 0.17 e 0.11 o 0.29 > 0.06
ola =:i 0.17 o 0.08 n 0.17 s 0.08 u 0.08 > 0.42
ich =:a 0.31 e 0.15 > 0.54
Ha =:k 0.07 l 0.11 n 0.41 s 0.11 r 0.26 t 0.04
He =:i 0.27 n 0.20 r 0.20 l 0.29 d 0.05
o =:a 0.01 c 0.00 b 0.02 d 0.01 g 0.02 f 0.02 i 0.00 h 0.04 j 0.00 m 0.02 l 0.15 n 0.15 p 0.02 s 0.10 r 0.19 u 0.01 t 0.06 v 0.00 y 0.00 z 0.01 > 0.17
<Re =:i 0.53 c 0.07 g 0.20 n 0.20
<Ro =:b 0.12 g 0.06 m 0.06 l 0.12 n 0.12 s 0.41 t 0.06 y 0.06
nne =:d 0.05 g 0.05 m 0.05 l 0.29 s 0.10 r 0.05 t 0.14 > 0.29
nh =:a 0.55 i 0.36 o 0.09
ni =:a 0.03 c 0.03 e 0.28 d 0.03 f 0.03 k 0.09 m 0.03 o 0.03 n 0.09 s 0.16 t 0.03 > 0.16
nn =:a 0.18 e 0.42 i 0.12 o 0.06 s 0.02 y 0.10 > 0.10
no =:s 0.08 t 0.08 r 0.15 l 0.08 > 0.62
na =:n 0.03 r 0.03 l 0.05 t 0.03 > 0.85
nd =:a 0.08 e 0.14 i 0.03 o 0.03 r 0.28 u 0.03 t 0.06 y 0.03 > 0.33
ne =:d 0.03 g 0.02 m 0.02 l 0.14 s 0.05 r 0.08 t 0.11 > 0.56
ng =:a 0.07 r 0.07 e 0.57 o 0.14 > 0.14
tin =:a 0.29 e 0.12 o 0.06 > 0.53
ns =:j 0.19 t 0.38 g 0.06 > 0.38
nt =:a 0.13 e 0.07 i 0.33 h 0.07 j 0.07 o 0.27 r 0.07
E =:c 0.08 b 0.02 d 0.17 g 0.03 i 0.03 h 0.02 k 0.03 m 0.09 l 0.26 n 0.06 s 0.02 r 0.15 u 0.03 w 0.03
ert =:a 0.07 h 0.11 o 0.04 r 0.19 u 0.04 > 0.56
p =:a 0.06 e 0.06 h 0.47 o 0.06 p 0.12 > 0.24
be =:r 0.70 t 0.22 l 0.09
Ma =:d 0.02 i 0.03 h 0.03 j 0.02 l 0.02 n 0.07 r 0.74 t 0.05 x 0.03
Mi =:c 0.38 r 0.38 l 0.08 k 0.08 n 0.08
<Si =:m 0.16 b 0.05 e 0.37 l 0.21 g 0.21
F =:a 0.14 r 0.68 e 0.11 l 0.07
<Ar =:i 0.23 m 0.08 t 0.15 n 0.54
<Al =:b 0.13 e 0.25 f 0.19 i 0.13 m 0.19 o 0.06 w 0.06
<An =:e 0.05 d 0.16 g 0.08 i 0.03 k 0.05 j 0.03 n 0.41 s 0.03 t 0.16
mar =:i 0.21 a 0.07 > 0.71
st =:a 0.22 e 0.20 i 0.29 h 0.02 l 0.02 o 0.10 r 0.02 y 0.02 > 0.10
sa =:> 0.38 b 0.19 l 0.06 n 0.38
se =:f 0.11 i 0.04 m 0.04 l 0.29 p 0.07 > 0.46
Re =:i 0.53 c 0.07 g 0.20 n 0.20
Chr =:i 1.00
Ro =:b 0.12 g 0.06 m 0.06 l 0.12 n 0.12 s 0.41 t 0.06 y 0.06
tte =:> 1.00
em =:a 0.64 e 0.18 > 0.18
el =:a 0.16 b 0.01 e 0.08 g 0.05 i 0.18 h 0.01 m 0.12 l 0.05 o 0.09 s 0.01 t 0.03 > 0.21
eo =:> 0.17 r 0.25 d 0.08 p 0.08 n 0.42
en =:a 0.16 e 0.09 d 0.02 f 0.02 i 0.05 j 0.02 n 0.10 s 0.05 r 0.09 t 0.05 z 0.02 > 0.34
ei =:d 0.22 g 0.04 k 0.11 m 0.07 n 0.52 t 0.04
ed =:a 0.04 e 0.19 d 0.04 i 0.08 h 0.04 o 0.04 r 0.04 w 0.04 y 0.04 > 0.46
eg =:a 0.07 b 0.07 g 0.07 f 0.07 i 0.20 h 0.07 m 0.13 l 0.07 o 0.13 r 0.13
ea =:s 0.09 n 0.27 t 0.36 > 0.27
et =:a 0.09 e 0.12 i 0.02 h 0.19 m 0.02 l 0.07 r 0.05 t 0.28 > 0.16
es =:a 0.05 c 0.09 b 0.05 e 0.23 i 0.09 k 0.05 l 0.05 s 0.05 z 0.05 > 0.32
er =:a 0.04 b 0.01 e 0.04 d 0.03 i 0.03 h 0.04 k 0.01 m 0.02 l 0.02 o 0.05 n 0.07 s 0.02 r 0.03 t 0.30 w 0.01 z 0.02 > 0.23
G =:a 0.11 e 0.39 i 0.14 l 0.02 o 0.04 r 0.11 u 0.16 > 0.05
<Be =:a 0.19 k 0.05 r 0.48 t 0.14 n 0.14
ian =:a 0.10 c 0.05 e 0.25 k 0.05 n 0.10 > 0.45
r =:> 0.10 a 0.09 c 0.01 b 0.01 e 0.09 d 0.08 g 0.04 i 0.24 h 0.01 k 0.02 j 0.00 m 0.03 l 0.03 o 0.04 n 0.04 s 0.02 r 0.01 u 0.02 t 0.08 w 0.00 y 0.01 z 0.01
ann =:a 0.23 e 0.36 i 0.09 o 0.05 s 0.05 > 0.23
Di =:a 0.15 m 0.08 r 0.08 e 0.62 t 0.08
Wi =:t 0.08 e 0.23 l 0.62 n 0.08
H =:a 0.33 e 0.50 i 0.09 o 0.02 u 0.05 > 0.01
ja =:n 0.09 m 0.18 > 0.73
C =:a 0.27 e 0.02 h 0.34 l 0.14 o 0.18 u 0.02 > 0.02
s =:a 0.09 c 0.03 b 0.02 e 0.15 g 0.01 i 0.03 k 0.02 j 0.02 m 0.02 l 0.03 o 0.01 s 0.03 u 0.01 t 0.22 w 0.01 z 0.04 > 0.26
ein =:e 0.21 h 0.43 o 0.07 r 0.07 z 0.14 > 0.07
<Ca =:r 0.75 t 0.25
I =:b 0.03 d 0.03 g 0.03 m 0.03 l 0.11 n 0.29 s 0.16 r 0.26 w 0.03 v 0.05
<Ch =:a 0.13 r 0.87
In =:a 0.09 k 0.09 e 0.09 g 0.73
t =:a 0.14 e 0.13 f 0.01 i 0.09 h 0.12 j 0.01 m 0.02 l 0.01 o 0.05 r 0.09 u 0.01 t 0.10 w 0.01 y 0.01 z 0.01 > 0.21
<Ja =:c 0.15 k 0.08 m 0.08 n 0.46 q 0.08 s 0.08 r 0.08
on =:a 0.18 e 0.03 d 0.03 i 0.26 h 0.03 j 0.03 o 0.05 n 0.08 s 0.16 r 0.05 > 0.11
ol =:a 0.32 e 0.03 d 0.18 g 0.03 f 0.24 i 0.08 k 0.08 v 0.03 > 0.03
ot =:a 0.07 h 0.20 r 0.13 t 0.47 > 0.13
os =:a 0.08 e 0.38 i 0.08 l 0.13 t 0.04 w 0.04 > 0.25
or =:a 0.09 b 0.02 e 0.22 d 0.02 g 0.09 i 0.20 m 0.04 o 0.07 n 0.07 s 0.07 z 0.02 > 0.11
Ni =:e 0.08 c 0.42 l 0.08 k 0.33 n 0.08
J =:a 0.26 e 0.16 u 0.16 o 0.36 > 0.06
<Li =:a 0.07 e 0.29 d 0.07 l 0.21 n 0.14 s 0.21
ab =:i 0.29 y 0.07 r 0.29 e 0.29 a 0.07
am =:a 0.11 e 0.17 i 0.11 m 0.11 o 0.06 > 0.44
an =:a 0.05 c 0.04 e 0.06 d 0.09 g 0.01 f 0.01 i 0.08 k 0.03 j 0.01 n 0.20 s 0.04 u 0.04 t 0.04 z 0.04 > 0.28
ar =:a 0.04 c 0.03 b 0.01 e 0.05 d 0.24 g 0.08 i 0.18 k 0.02 m 0.01 l 0.09 o 0.04 s 0.02 r 0.03 t 0.06 y 0.01 > 0.11
au =:k 0.04 s 0.17 t 0.22 l 0.13 d 0.43
at =:a 0.17 e 0.07 i 0.10 h 0.23 j 0.07 m 0.03 o 0.03 r 0.20 t 0.07 > 0.03
u =:a 0.01 c 0.01 b 0.01 e 0.03 d 0.16 g 0.04 f 0.01 i 0.02 h 0.01 k 0.01 l 0.10 n 0.13 s 0.19 r 0.10 t 0.13 z 0.01
tt =:a 0.25 e 0.34 f 0.03 i 0.13 h 0.06 o 0.03 y 0.03 > 0.13
tr =:a 0.48 i 0.33 u 0.15 > 0.04
to =:f 0.13 l 0.06 n 0.25 p 0.13 s 0.06 r 0.25 > 0.13
th =:a 0.29 e 0.18 i 0.11 j 0.03 l 0.03 o 0.03 r 0.05 u 0.03 > 0.26
ti =:a 0.19 c 0.04 > 0.11 l 0.04 n 0.63
te =:f 0.10 l 0.08 n 0.15 p 0.08 r 0.08 > 0.51
ta =:f 0.02 l 0.02 n 0.16 s 0.05 r 0.05 v 0.02 > 0.67
rm =:a 0.38 e 0.15 g 0.08 i 0.15 h 0.08 t 0.15
K =:a 0.46 e 0.08 i 0.04 l 0.08 o 0.10 n 0.02 r 0.13 u 0.04 > 0.04
W =:a 0.31 e 0.06 i 0.41 l 0.03 o 0.16 u 0.03
Si =:m 0.16 b 0.05 e 0.37 l 0.21 g 0.21
v =:i 0.33 a 0.22 e 0.22 o 0.11 > 0.11
a =:> 0.32 c 0.01 b 0.02 e 0.00 d 0.02 f 0.00 i 0.01 h 0.01 k 0.00 j 0.00 m 0.03 l 0.06 o 0.00 n 0.17 q 0.00 s 0.02 r 0.22 u 0.03 t 0.05 w 0.01 v 0.00 y 0.00 x 0.00 z 0.00
<Mi =:c 0.38 r 0.38 l 0.08 k 0.08 n 0.08
<Ma =:d 0.02 i 0.03 h 0.03 j 0.02 l 0.02 n 0.07 r 0.74 t 0.05 x 0.03
L =:a 0.05 e 0.21 i 0.36 o 0.15 u 0.21 y 0.03
El =:e 0.12 f 0.12 i 0.12 k 0.06 m 0.06 l 0.24 s 0.18 v 0.06 z 0.06
Ed =:e 0.27 d 0.09 g 0.09 i 0.18 m 0.09 u 0.09 w 0.18
w =:i 0.39 a 0.17 e 0.13 o 0.04 > 0.26
b =:a 0.08 e 0.46 d 0.04 i 0.16 k 0.02 o 0.02 r 0.12 u 0.04 y 0.04 > 0.02
sti =:a 0.42 n 0.58
lin =:a 0.14 e 0.36 d 0.43 > 0.07
M =:i 0.14 a 0.66 e 0.09 u 0.03 o 0.09
ka =:n 0.07 r 0.20 > 0.73
ke =:h 0.12 r 0.06 > 0.82
Han =:s 0.36 n 0.64
Jo =:a 0.11 c 0.06 h 0.28 l 0.06 n 0.06 s 0.33 z 0.11
Ja =:c 0.15 k 0.08 m 0.08 n 0.46 q 0.08 s 0.08 r 0.08
hil =:i 0.18 d 0.73 o 0.09
ard =:y 0.03 a 0.03 t 0.11 > 0.83
arg =:a 0.55 i 0.18 r 0.18 o 0.09
ari =:a 0.15 e 0.19 k 0.04 j 0.04 o 0.08 n 0.23 s 0.04 u 0.12 t 0.12
arl =:a 0.15 e 0.23 i 0.15 h 0.08 o 0.23 > 0.15
c =:a 0.10 e 0.11 i 0.08 h 0.37 k 0.11 o 0.14 q 0.02 u 0.02 t 0.02 y 0.02 > 0.03
gar =:i 0.09 e 0.36 d 0.36 > 0.18
N =:i 0.52 a 0.26 e 0.04 o 0.17
rit =:a 0.23 h 0.08 z 0.15 t 0.15 > 0.38
ris =:s 0.05 t 0.76 > 0.19
rin =:a 0.57 n 0.07 > 0.36
rie =:t 0.09 > 0.17 l 0.13 d 0.61
ric =:e 0.08 i 0.08 h 0.58 k 0.08 o 0.08 > 0.08
<V =:i 0.25 a 0.19 e 0.31 l 0.06 o 0.19
<W =:a 0.31 e 0.06 i 0.41 l 0.03 o 0.16 u 0.03
<T =:a 0.14 e 0.03 i 0.24 h 0.31 o 0.17 r 0.07 u 0.03
<U =:d 0.07 l 0.43 n 0.07 r 0.14 t 0.21 w 0.07
<R =:a 0.15 e 0.31 i 0.08 o 0.35 u 0.06 y 0.02 > 0.02
<S =:a 0.16 e 0.07 i 0.33 o 0.07 u 0.07 t 0.18 w 0.02 v 0.02 y 0.05 > 0.04
<P =:a 0.44 h 0.11 e 0.28 i 0.17
<F =:a 0.14 r 0.68 e 0.11 l 0.07
<G =:a 0.11 e 0.39 i 0.14 l 0.02 o 0.04 r 0.11 u 0.16 > 0.05
<D =:i 0.37 a 0.17 e 0.11 o 0.29 > 0.06
<E =:c 0.08 b 0.02 d 0.17 g 0.03 i 0.03 h 0.02 k 0.03 m 0.09 l 0.26 n 0.06 s 0.02 r 0.15 u 0.03 w 0.03
<B =:a 0.07 e 0.46 i 0.09 j 0.02 o 0.09 r 0.17 u 0.09 > 0.02
<C =:a 0.27 e 0.02 h 0.34 l 0.14 o 0.18 u 0.02 > 0.02
y =:b 0.03 d 0.05 k 0.03 m 0.03 l 0.08 n 0.05 s 0.13 r 0.03 > 0.59
<O =:s 0.21 r 0.14 t 0.29 l 0.36
<L =:a 0.05 e 0.21 i 0.36 o 0.15 u 0.21 y 0.03
<M =:i 0.14 a 0.66 e 0.09 u 0.03 o 0.09
<J =:a 0.26 e 0.16 u 0.16 o 0.36 > 0.06
<K =:a 0.46 e 0.08 i 0.04 l 0.08 o 0.10 n 0.02 r 0.13 u 0.04 > 0.04
<H =:a 0.33 e 0.50 i 0.09 o 0.02 u 0.05 > 0.01
<I =:b 0.03 d 0.03 g 0.03 m 0.03 l 0.11 n 0.29 s 0.16 r 0.26 w 0.03 v 0.05
d =:a 0.07 e 0.14 d 0.01 g 0.01 i 0.09 h 0.00 j 0.00 m 0.00 o 0.04 r 0.07 u 0.02 t 0.03 w 0.02 y 0.01 > 0.46
O =:s 0.21 r 0.14 t 0.29 l 0.36
al =:b 0.05 e 0.08 d 0.27 f 0.05 i 0.16 k 0.05 l 0.05 p 0.03 t 0.14 v 0.03 > 0.08
ut =:a 0.06 e 0.06 h 0.22 t 0.06 z 0.06 > 0.56
us =:a 0.15 e 0.04 s 0.04 u 0.04 t 0.15 z 0.15 > 0.42
ur =:a 0.15 c 0.08 d 0.08 g 0.23 k 0.15 t 0.15 > 0.15
ul =:a 0.29 i 0.36 f 0.14 l 0.07 > 0.14
un =:d 0.33 h 0.17 o 0.11 n 0.06 t 0.11 > 0.22
ud =:e 0.18 g 0.05 i 0.23 o 0.05 r 0.05 w 0.05 > 0.41
z =:a 0.06 b 0.03 e 0.21 i 0.15 o 0.03 t 0.03 y 0.09 > 0.41
as =:a 0.06 c 0.19 k 0.06 m 0.06 s 0.13 t 0.06 > 0.44
Mar =:a 0.02 c 0.09 e 0.04 g 0.24 i 0.33 k 0.07 l 0.09 t 0.09 y 0.02
e =:> 0.25 a 0.02 c 0.01 b 0.01 e 0.01 d 0.04 g 0.03 f 0.02 i 0.05 h 0.01 k 0.01 j 0.00 m 0.02 l 0.13 o 0.02 n 0.10 p 0.01 s 0.04 r 0.16 u 0.00 t 0.07 w 0.00 x 0.01
Al =:b 0.13 e 0.25 f 0.19 i 0.13 m 0.19 o 0.06 w 0.06
An =:e 0.05 d 0.16 g 0.08 i 0.03 k 0.05 j 0.03 n 0.41 s 0.03 t 0.16
Ar =:i 0.23 m 0.08 t 0.15 n 0.54
ge =:n 0.13 r 0.20 b 0.13 l 0.33 > 0.20
ga =:> 0.26 r 0.58 n 0.16
<Ni =:e 0.08 c 0.42 l 0.08 k 0.33 n 0.08
P =:a 0.44 h 0.11 e 0.28 i 0.17
tra =:m 0.15 u 0.77 > 0.08
<He =:i 0.27 n 0.20 r 0.20 l 0.29 d 0.05
<Ha =:k 0.07 l 0.11 n 0.41 s 0.11 r 0.26 t 0.04
Fr =:a 0.32 i 0.53 e 0.16
f =:a 0.09 e 0.04 g 0.02 f 0.04 i 0.09 h 0.02 o 0.02 r 0.22 > 0.46
ied =:a 0.07 h 0.07 r 0.07 e 0.36 > 0.43
lf =:g 0.05 i 0.05 h 0.05 o 0.05 r 0.20 > 0.60
ld =:a 0.04 e 0.21 t 0.04 > 0.71
le =:e 0.04 f 0.04 i 0.04 m 0.07 o 0.04 n 0.30 s 0.07 x 0.15 > 0.26
la =:d 0.04 f 0.02 i 0.04 h 0.04 o 0.02 n 0.09 s 0.02 r 0.04 u 0.15 w 0.07 v 0.02 > 0.46
lo =:i 0.06 n 0.06 s 0.06 r 0.33 t 0.22 > 0.28
ll =:a 0.23 e 0.14 i 0.27 m 0.09 r 0.05 y 0.18 > 0.05
lm =:a 0.44 i 0.06 u 0.33 > 0.17
li =:a 0.13 c 0.04 b 0.02 e 0.15 k 0.02 l 0.02 n 0.26 p 0.04 s 0.07 u 0.04 t 0.04 v 0.04 x 0.02 > 0.13
< =:A 0.09 C 0.05 B 0.05 E 0.07 D 0.04 G 0.06 F 0.03 I 0.04 H 0.08 K 0.05 J 0.05 M 0.10 L 0.04 O 0.01 N 0.02 P 0.02 S 0.06 R 0.05 U 0.01 T 0.03 W 0.03 V 0.02 Y 0.00 Z 0.00
g =:a 0.22 b 0.02 e 0.18 d 0.01 g 0.01 f 0.01 i 0.12 h 0.02 m 0.02 l 0.02 o 0.09 n 0.01 r 0.08 u 0.04 y 0.01 > 0.12
R =:a 0.15 e 0.31 i 0.08 o 0.35 u 0.06 y 0.02 > 0.02
<In =:a 0.09 k 0.09 e 0.09 g 0.73
ber =:h 0.06 t 0.94
hr =:i 0.94 e 0.06
h =:a 0.35 e 0.16 i 0.13 j 0.01 m 0.04 l 0.01 o 0.03 n 0.01 r 0.11 u 0.01 t 0.01 > 0.15
S =:a 0.16 e 0.07 i 0.33 o 0.07 u 0.07 t 0.18 w 0.02 v 0.02 y 0.05 > 0.04
ett =:i 0.17 y 0.08 e 0.50 a 0.08 > 0.17
ch =:a 0.26 i 0.13 e 0.17 t 0.09 > 0.35
<Jo =:a 0.11 c 0.06 h 0.28 l 0.06 n 0.06 s 0.33 z 0.11
Hel =:e 0.17 m 0.33 l 0.25 g 0.25
Hei =:k 0.18 d 0.45 n 0.36
ad =:i 0.31 a 0.08 j 0.08 e 0.15 > 0.38
Ger =:a 0.11 d 0.11 h 0.17 l 0.11 o 0.11 n 0.06 r 0.06 t 0.28
Ann =:a 0.07 i 0.07 e 0.80 y 0.07
<Di =:a 0.15 m 0.08 r 0.08 e 0.62 t 0.08
i =:> 0.06 a 0.09 c 0.06 b 0.01 e 0.14 d 0.03 g 0.02 f 0.00 k 0.05 j 0.00 m 0.04 l 0.09 o 0.02 n 0.19 p 0.01 s 0.08 r 0.03 u 0.01 t 0.07 v 0.00 x 0.00 z 0.00
Be =:a 0.19 k 0.05 r 0.48 t 0.14 n 0.14
T =:a 0.14 e 0.03 i 0.24 h 0.31 o 0.17 r 0.07 u 0.03
hi =:a 0.11 e 0.05 m 0.21 l 0.58 n 0.05
ha =:e 0.04 i 0.02 m 0.08 n 0.12 r 0.58 t 0.02 > 0.15
he =:a 0.08 e 0.04 i 0.08 k 0.04 l 0.25 o 0.08 n 0.13 r 0.25 > 0.04
j =:a 0.52 c 0.05 e 0.10 o 0.10 > 0.24
ine =:r 0.14 > 0.86
ina =:n 0.04 l 0.04 > 0.92
U =:d 0.07 l 0.43 n 0.07 r 0.14 t 0.21 w 0.07
<Ka =:d 0.05 i 0.05 m 0.05 r 0.41 t 0.32 y 0.05 z 0.09
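For anyone who wants to reuse a dump like the one above: each line looks like a context, then `=:`, then pairs of follow-up symbol and probability. Here is a rough sketch of reading one such line back in. The function name `parse_rule` is made up for illustration, and reading `<` as "start of name" and `>` as "end of name" is my assumption from the data, not something the post states.

```python
def parse_rule(line):
    """Split one dump line into (context, {follower: probability}).

    Assumes the 'context =:sym prob sym prob ...' layout seen in the dump.
    """
    context, _, rest = line.partition(" =:")
    fields = rest.split()
    # Fields alternate: symbol, probability, symbol, probability, ...
    followers = {sym: float(p) for sym, p in zip(fields[0::2], fields[1::2])}
    return context.strip(), followers


if __name__ == "__main__":
    print(parse_rule("ga =:> 0.26 r 0.58 n 0.16"))
```

Note that the printed probabilities are rounded to two digits, so a parsed row will not always sum to exactly 1.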


quote:
Original post by Anonymous Poster
It's not that I think grammars are not useful for this kind of stuff, but they do only half the work. You still have to hack in the grammar itself. I think it is easier to produce such a grammar from text. And once you generate the grammar from text, it's also much easier to use the probabilities directly, instead of translating them to a grammar.
I have arrived at an algorithm that can spit out words from any text, and they really sound like words from the language the source text was written in. It basically involves analyzing which letter follows a sequence of other letters, and then storing how often this happens.

example?
None of the following names are in the original text, nor are they just parts of names. They are all unique. A Python script first analyzes the source text (about 2000 German names) and then produces these 100 names in under a second.

Sentin Knuta Thilde Greta Angelbert Gerlis Mariana Herberhilde Leonorena Annifer Gudrunhild Andra Mirose Carlotte Kornelix Terenedi Judia Wilheiderike Anettilie Lariusz Wolfredy Flore Albrechthias Carose Mirosemar Ernstafa Ullahattina Lisabether Sebastino Eckartmut Jochelm Bertus Margretel Josel Musten Adeltrude Eckehardt Veresela Otfriedrik Denno Sopher Dietlef Artus Liancy Julina Elisel Rosalinda Margios Lothard Astav Gustel Annemarios Martmut Silvatorin Walbert Kristinos Josemal Bertraudi Hendrzej Heinharritas Berneli Helgarena Ingeburgarian Patrix Johammad Heidemal Felika Kristantin Guntraud Edgaret Erichel Liesela Ansgarek Wladimil Lillene Nicolaud Harritz Mirosef Mathilo Catrich Luciechthilip Fraudia Herrem Kristof Wanderoslaw Almar Engeborge Gabried Artha Ortwig Normanna Karlo Dennislaw Antjepan Hannifer Corianet Siegbertur Thomann Carleskardt Aydine Manusz Martharline Kadimil Denie Carold Cemar Herberhard Sebastin Hardt Ginan Mirosela Norbertraut Rosef Annegreta Galieslaw Eliselorinand Waltherenfriel Eckhardy Herwin Muharta Silvira Dorena Alfonstafa Erolinde Ingrit Verolind Vladine Karoslaw Dimir Lisberta Edele Nancesco Harre Irmtraute Tonio Gabrieder Iristanta Heikolantin Ernar Felitta Bodor Annelina Rolas Birgios Manut Herberto Kamine Giescha Ullah Wernold Verald Coristof Ullrik Raimunda Gernharina Herberthur Ortraute Ayselore Erharlinald Beatrinandy Ferda Coritz Marena Eckartwin Ismannina Emmad Verosemar Otmara Piotram Anjam Urseloreen Siegberthur Carosel Hannetty Alfonsta Reimarios Gabian Haliese Cathris Heribertraudi Carolina Ericita Zbignieleinhard Stantin Angelore Gesia Jerzysztof Adolfgan Sentine Galiesbeth Ulricha Rotraut Reinrich Januelie Chard Tille Abdula Franko Katas Henrichen





I guess my generative phonetic approach was more directed at emulating real linguistic behaviour for the task at hand. I'm sure you can get realistic and proper results using statistical models, but frankly that's not how language works.
And you're also using a long list as input. That causes two 'problems':
(1) the names are too real (why not just randomly pick from a real list)
(2) it's inefficient (at least on a theoretical level).

But I agree, your approach produced some impressively real/realistic names.

That being said, I'd like to see computer games make use of language and linguistic technology. Speech rec, TTS and robust grammar parsing, here we go.


quote:
Original post by Anonymous Poster

A: I'm sure you can get realistic and proper results using statistical models, but frankly that's not how language works.

B: And you're also using a long list as input. That causes two 'problems':
(1) the names are too real (why not just randomly pick from a real list)
(2) it's inefficient (at least on a theoretical level).



A: I see no difference between a well-written grammar-based production system and the stuff I do. There is a slight difference in that I only append one letter at a time and look back for the longest matching sequence, but I'm sure this can be translated to an EBNF grammar; I just don't see a need for it.
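To make the "append a letter, look back for the longest matching sequence" idea concrete, here is a minimal Python 3 sketch. It is not the script from this thread: the names `build_table` and `generate`, the `order` and `max_len` parameters, and the use of `<` and `>` as start/end markers are all assumptions for illustration.

```python
import random
from collections import defaultdict

START, END = "<", ">"  # assumed boundary markers, not taken from the original script


def build_table(names, order=3):
    """Count which symbol follows every context of length 1..order."""
    table = defaultdict(lambda: defaultdict(int))
    for name in names:
        word = START + name + END
        for i in range(1, len(word)):
            for k in range(1, order + 1):
                if i - k < 0:
                    break  # context would reach past the start marker
                table[word[i - k:i]][word[i]] += 1
    return table


def generate(table, order=3, max_len=20, rng=random):
    """Append one symbol at a time, using the longest context found in the table."""
    out = START
    while len(out) < max_len:
        # Look back: try the longest context first, then shorter ones.
        for k in range(min(order, len(out)), 0, -1):
            followers = table.get(out[-k:])
            if followers:
                break
        else:
            break  # no known context at all (can happen with a pruned table)
        symbols = list(followers)
        weights = list(followers.values())
        nxt = rng.choices(symbols, weights=weights)[0]
        if nxt == END:
            break
        out += nxt
    return out[1:]  # strip the start marker


if __name__ == "__main__":
    table = build_table(["Anna", "Anne", "Hanna", "Hanne", "Johanna"])
    print([generate(table) for _ in range(5)])
```

With a tiny input like this the output mostly reproduces the training names. The variety described in the thread needs a list on the order of the 2000 names mentioned above.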

B1: The names are as real as the input you give the statistics. The system can produce vastly more words than are in the original text.
B2: I see three bottlenecks in such a system:
(1) Calculation of the statistics
(2) Too much data to ensure fast processing
(3) A lot of space is used up by storing the statistics
But for each of these problems there is a solution.
(1) Calculation of the statistics can take place once; after that you have a functioning set of rules, which you can stream to a file and read back in.
(2) If you find you have too much data to produce names fast, then throw some of it away! In fact I've built a method that scans the tables for symbols and follow-ups that appear rarely and deletes this data. And I found it to improve the quality of the names, which is odd considering that I destroy information.
(3) This is also addressed by the same method of eliminating rarely used combinations. It also helps greatly to implement it with associative arrays.
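The clean-up and the save/load step could be sketched roughly like this in Python 3. Again this is an illustration rather than the code used here: `prune`, `save_table`, `load_table` and the `min_count` threshold are made-up names, the table is assumed to be a dict of dicts (context -> follow-up -> count), and JSON is just one convenient way to stream it to a file.

```python
import json


def prune(table, min_count=2):
    """Drop follow-ups seen fewer than min_count times, then drop emptied contexts."""
    pruned = {}
    for context, followers in table.items():
        kept = {sym: n for sym, n in followers.items() if n >= min_count}
        if kept:
            pruned[context] = kept
    return pruned


def save_table(table, path):
    """Write the table to disk once, so the statistics need not be recomputed."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(table, fh)


def load_table(path):
    """Read a previously saved table back in."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


if __name__ == "__main__":
    table = {"an": {"n": 7, "d": 1}, "xq": {"z": 1}}
    print(prune(table))
```

One thing to watch: after pruning, a generator can run into a context that no longer exists, so it has to fall back to a shorter context or simply end the name there.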

To give an example: for a Markovian factor of 3, the streamed-out data occupies 50 KB of memory, with somewhat over 3000 symbols in the hash table. After I clean up a bit, the size drops to about 3 KB, the names sound better, and about 200 relevant symbols remain stored.

With that out of the way, I'd love to produce nice-looking grammars from my statistics, and I've already written a system that scans such grammars and produces names from them.

And thanks for the compliment about the impressive result. I also look forward to speech stuff making its way into games. Recognition and synthesis especially are going to be interesting, since they give us a way to circumvent the vast bandwidth demands of VoIP.
