show and tell: a new Java library for procedural random text generation

5 comments, last by GNPA 4 years, 8 months ago

Hey all.  I've created a Java library that offers a couple of different algorithms for generating random text (e.g. to generate character names or place names for an adventure game).  It's pretty well documented, covered by tests, scanned for code quality, and available via Maven Central -- overkill for this kind of project, I know, but this is really a project I conceived of while training myself in the Java language (having worked for the past several years in Python and JS).  I welcome feedback and contributions!

The library and docs: https://github.com/joeclark-phd/random-text-generators

A repo that uses the library to generate lots of example text: https://github.com/joeclark-phd/procedural-generation-examples

The main algorithm of interest is one that uses Markov chains to learn from a training dataset and produce new data with similar character sequences.  Depending on the training data, this can do a good job of making original new text that sounds like it belongs in the original dataset.  For example, I trained it on a database of about 1300 ancient Roman names, and here are some examples of the random text output:

caelis           domidus          pilianus        naso             recunobaro  
potiti           cerius           petrentius      herenialio       caelius     
venatius         octovergilio     favenaeus       surus            wasyllvianus
nentius          soceanus         lucia           eulo             atric       
caranoratus      melus            sily            fulcherialio     dula        
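
For anyone curious how a character-level Markov chain produces names like these, here is a minimal sketch. It is not the library's actual code or API: the class `MarkovNameSketch`, its methods, and the `^`/`$` start and end markers are all made up for illustration, and it assumes training names never contain those two marker characters.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical illustration class; NOT part of random-text-generators.
class MarkovNameSketch {
    private static final char START = '^'; // assumed marker, must not occur in names
    private static final char END = '$';   // assumed marker, must not occur in names

    private final int order; // how many preceding characters form the context
    private final Random random;
    // context -> every character observed right after it (duplicates act as weights)
    private final Map<String, List<Character>> transitions = new HashMap<>();

    MarkovNameSketch(int order, Random random) {
        this.order = order;
        this.random = random;
    }

    private String padding() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < order; i++) sb.append(START);
        return sb.toString();
    }

    // Record, for each context of `order` characters, which character followed it.
    void train(Iterable<String> names) {
        for (String name : names) {
            String padded = padding() + name.toLowerCase() + END;
            for (int i = order; i < padded.length(); i++) {
                String context = padded.substring(i - order, i);
                transitions.computeIfAbsent(context, k -> new ArrayList<>())
                           .add(padded.charAt(i));
            }
        }
    }

    // Walk the chain from the start context until END is drawn or maxLength is hit.
    String generate(int maxLength) {
        StringBuilder out = new StringBuilder(padding());
        while (out.length() - order < maxLength) {
            String context = out.substring(out.length() - order);
            List<Character> options = transitions.get(context);
            if (options == null) break; // nothing trained for this context
            char next = options.get(random.nextInt(options.size()));
            if (next == END) break;
            out.append(next);
        }
        return out.substring(order); // drop the start padding
    }
}
```

A higher `order` sticks closer to the training names; a lower one gives more novel (and more garbled) output. Usage would look like `new MarkovNameSketch(2, new Random()).train(names)` followed by repeated calls to `generate(12)`.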

Nice, I will try it out for my project. Feedback may take a while, though, as I am not at a stage where I need it yet, but it could come in handy for a future build. I am trying to wrap my head around what this text stream would look like. Do I just feed it my language files?

9 minutes ago, joeclark77 said:

It's pretty well documented, covered by tests, scanned for code quality, and available via Maven Central -- overkill for this kind of project, I know, but this is really a project I conceived of while training myself in the Java language

This is definitely not overkill in the "plug-and-play" Java world; it's the norm, not to say "expected," in the Java community. Gosh, this is why I love this language so much. In C++ you get some project files and an outdated manual for how to compile that bullsh** for some random Linux distro. :D

The stream is a Java 8 stream, really just a new kind of iterable. The way I do it in the example repo is to read text from a file with one text string (name) per line, then convert the file to a Stream with a one-liner.
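
As a rough sketch of that file-to-Stream step: the standard `Files.lines` call is the one-liner. The class name `TrainingFileSketch` and the method `readNames` are hypothetical, and collecting into a `List` here is only a stand-in for handing the stream to a generator's training method, which is not shown because it depends on the library's own API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper; NOT taken from the example repo.
class TrainingFileSketch {

    // Reads a UTF-8 text file with one name per line into a list,
    // trimming whitespace and skipping blank lines.
    static List<String> readNames(Path file) throws IOException {
        // Files.lines is the "one-liner" that turns a file into a Stream<String>.
        // try-with-resources closes the underlying file handle when done.
        try (Stream<String> lines = Files.lines(file)) {
            return lines.map(String::trim)
                        .filter(line -> !line.isEmpty())
                        .collect(Collectors.toList());
        }
    }
}
```

Because the stream is lazy, the file is read only as the consumer pulls lines from it, so it needs to stay open until training has finished.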

Yes, that's clear. Iterables with lambda support. I was speaking about the practical application of your library. For example, I don't see a word in your docs about saving a trained model. So one would need to train the library at every start, and then the question comes up of where to get the training sets from. I was kind of thinking out loud that one could use the language files already packed with every program. Or you would have to pack some training texts, but you would probably run into the problem that these random texts might not give good training results, and you would have no idea because you don't speak that particular language.

In your example, to generate Viking names, this is no problem of course.

I wonder if it would work with Chinese characters as well.

Oh, btw, is it intentional that it says "single-s e x dataset" in your docs (without the spaces)?

In my own game I had kind of planned to train a few text generators for different nations, from static files bundled with the game (therefore moddable), every time the game started up. They would stick around in memory to be reused every time a name was needed. But you're right, serialization is probably a good idea. 
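
To make the serialization idea concrete, here is one possible sketch using plain Java object serialization. Everything in it is an assumption for illustration: `SavedModelSketch`, `observe`, `save`, and `load` are hypothetical names, the model is reduced to a table of transition counts, and nothing here reflects whether or how the library itself supports saving.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;

// Hypothetical trained-model holder; NOT a class from random-text-generators.
class SavedModelSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    // context -> (next character -> how often it followed that context)
    final HashMap<String, HashMap<Character, Integer>> counts = new HashMap<>();

    // Record one observed transition during training.
    void observe(String context, char next) {
        counts.computeIfAbsent(context, k -> new HashMap<>())
              .merge(next, 1, Integer::sum);
    }

    // Write the trained counts to disk so a later start can skip training.
    void save(Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(this);
        }
    }

    // Read a previously saved model back into memory.
    static SavedModelSketch load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (SavedModelSketch) in.readObject();
        }
    }
}
```

A game could train once per nation, call `save`, and on later runs call `load` if the file exists. Since Java deserialization should only be used on trusted files, a moddable game might prefer to keep retraining from the plain-text name lists instead.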

And yes, the README does use the precise and correct terminology...

Thanks for sharing.

