joeclark77

show and tell: a new Java library for procedural random text generation


Hey all.  I've created a Java library that offers a couple of different algorithms for generating random text (e.g. to generate character names or place names for an adventure game).  It's pretty well documented, covered by tests, scanned for code quality, and available via Maven Central -- overkill for this kind of project, I know, but this is really a project I conceived of while training myself in the Java language (having worked for the past several years in Python and JS).  I welcome feedback and contributions!

The library and docs: https://github.com/joeclark-phd/random-text-generators

A repo that uses the library to generate lots of example text: https://github.com/joeclark-phd/procedural-generation-examples

The main algorithm of interest uses Markov chains to learn from a training dataset and produce new text with similar character sequences. Depending on the training data, it can do a good job of generating original text that sounds like it belongs with the training set. For example, I trained it on a database of about 1300 ancient Roman names, and here are some examples of the output:

caelis           domidus          pilianus        naso             recunobaro  
potiti           cerius           petrentius      herenialio       caelius     
venatius         octovergilio     favenaeus       surus            wasyllvianus
nentius          soceanus         lucia           eulo             atric       
caranoratus      melus            sily            fulcherialio     dula        
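For readers wondering what "Markov chains that learn character sequences" means in practice, here is a minimal sketch of a generic order-k character-level Markov chain. It is not the library's implementation or API: the class name `MarkovNameSketch`, the `#` padding marker, and the tiny training list are all hypothetical. Each k-character context records every character that followed it in training, so drawing uniformly from that list picks continuations in proportion to how often they occurred.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Random;

/** Hypothetical order-k character Markov chain; NOT the library's actual implementation or API. */
public class MarkovNameSketch {
    private static final char PAD = '#'; // assumed start/end marker; must not appear in training names

    private final int order;
    private final Map<String, List<Character>> followers = new HashMap<>();
    private final Random random;

    public MarkovNameSketch(int order, long seed) {
        if (order < 1) throw new IllegalArgumentException("order must be >= 1");
        this.order = order;
        this.random = new Random(seed);
    }

    private static String padding(int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, PAD);
        return new String(chars);
    }

    /** Records every character that follows each k-character context (duplicates kept as frequency). */
    public void train(Iterable<String> names) {
        for (String name : names) {
            String padded = padding(order) + name.toLowerCase(Locale.ROOT) + PAD;
            for (int i = order; i < padded.length(); i++) {
                String context = padded.substring(i - order, i);
                followers.computeIfAbsent(context, c -> new ArrayList<>()).add(padded.charAt(i));
            }
        }
    }

    /** Walks the chain from the all-padding start context until the end marker or maxLength. */
    public String generate(int maxLength) {
        StringBuilder out = new StringBuilder();
        String context = padding(order);
        while (out.length() < maxLength) {
            List<Character> options = followers.get(context);
            if (options == null) break; // no data for this context (e.g. untrained model)
            char next = options.get(random.nextInt(options.size())); // frequency-weighted pick
            if (next == PAD) break; // end-of-name marker drawn
            out.append(next);
            context = context.substring(1) + next; // slide the window by one character
        }
        return out.toString();
    }

    public static void main(String[] args) {
        MarkovNameSketch gen = new MarkovNameSketch(2, 42L);
        gen.train(Arrays.asList("caelius", "marcus", "lucius", "octavius", "petronius", "venatius"));
        for (int i = 0; i < 5; i++) {
            System.out.println(gen.generate(12));
        }
    }
}
```

The order is the main tuning knob in this kind of model: a higher order copies longer runs from the training names (more authentic, less original), while a lower order produces more novel but less plausible strings.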
Edited by joeclark77


Nice, I will try it out for my project. It may take a while before I have feedback, though, as I'm not at a stage where I need it yet, but this could come in handy for a future build. I am trying to wrap my head around what this text stream would look like. Do I just feed it my language files?

joeclark77 said:

It's pretty well documented, covered by tests, scanned for code quality, and available via Maven Central -- overkill for this kind of project, I know, but this is really a project I conceived of while training myself in the Java language

This is definitely not overkill in the "Plug-and-Play" Java world; it's normal, if not outright expected, in the Java community. Gosh, this is why I love this language so much. In C++ you get some project files and an outdated manual on how to compile that bullsh** for some random Linux distro. :D 


The stream is a Java 8 stream, really just a new kind of iterable. The way I do it in the example repo is just read text from a file with one text string (name) per line, then convert the file to a Stream with a one-liner.
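The file-to-stream one-liner mentioned above is presumably something like `Files.lines(Paths.get("names.txt"))`. The sketch below wraps it with proper cleanup; the class name `TrainingDataLoader`, the method `loadNames`, and the file name are hypothetical, and handing the result to the library's generator is left out because its training method isn't shown in this thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TrainingDataLoader {
    /** Reads one name per line, trimming whitespace and skipping blank lines. */
    public static List<String> loadNames(Path file) {
        // Files.lines returns a lazily populated Stream<String>;
        // try-with-resources closes the underlying file handle.
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.map(String::trim)
                        .filter(s -> !s.isEmpty())
                        .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = Paths.get(args.length > 0 ? args[0] : "names.txt"); // hypothetical file name
        List<String> names = loadNames(file);
        System.out.println("Loaded " + names.size() + " training names");
    }
}
```

Collecting into a `List` is a deliberate choice here: a `Stream` can be consumed only once and keeps the file open until it is closed, whereas a list can be reused to train several generators.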


Yes, that's clear: iterables with lambda support. I was asking about the practical application of your library. For example, I don't see anything in your docs about saving a trained model, so one would need to retrain at every start, and then the question arises of where to get the training sets. I was thinking out loud that one could use the language files already packed with every program. Otherwise you would have to ship some training texts, but you might run into the problem that those texts don't give good training results, and you would have no idea because you don't speak that particular language.

In your example, to generate Viking names, this is no problem of course.

I wonder if it would work with Chinese characters as well.
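On the Chinese question: a model that works on Java `char` values will handle most common Chinese characters, which sit in Unicode's Basic Multilingual Plane, but rarer ideographs (for example U+20000 in CJK Extension B) are stored as two-`char` surrogate pairs and would be split apart. Below is a hedged sketch of tokenizing by code point instead; `CodePointTokens` is a hypothetical helper, and whether a character-level Markov model produces convincing Chinese names (a very large alphabet, very short names) is a separate question this does not answer.

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointTokens {
    /** Splits text into Unicode code points rather than UTF-16 chars. */
    public static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        text.codePoints().forEach(cp -> out.add(new String(Character.toChars(cp))));
        return out;
    }

    public static void main(String[] args) {
        // U+674E (in the BMP, one char) followed by U+20000 (a surrogate pair, two chars)
        String name = "\u674E\uD840\uDC00";
        System.out.println("chars: " + name.length());             // 3 UTF-16 units
        System.out.println("code points: " + tokens(name).size()); // 2 characters
    }
}
```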

Oh, btw, is it intentional that it says "single-s e x dataset" in your docs (without the spaces)?


In my own game I had kind of planned to train a few text generators for different nations, from static files bundled with the game (therefore moddable), every time the game started up. They would stick around in memory to be reused every time a name was needed. But you're right, serialization is probably a good idea. 
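A minimal sketch of the serialization idea: save a trained model's transition counts after the first run and reload them on later startups instead of retraining. The map shape (context to next-character counts), the tab-separated format, and the class `TransitionTableIO` are assumptions, not the library's API. A plain-text format keeps the files moddable; Java's built-in object serialization would be shorter, but deserializing files that players can edit is a known security risk.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical persisted form of a trained character model: context -> (next char -> count). */
public class TransitionTableIO {
    /** Writes one "context<TAB>next<TAB>count" line per transition; assumes names contain no tabs or newlines. */
    public static void save(Map<String, Map<Character, Integer>> table, Writer out) throws IOException {
        BufferedWriter w = new BufferedWriter(out);
        for (Map.Entry<String, Map<Character, Integer>> ctx : table.entrySet()) {
            for (Map.Entry<Character, Integer> next : ctx.getValue().entrySet()) {
                w.write(ctx.getKey() + "\t" + next.getKey() + "\t" + next.getValue());
                w.newLine();
            }
        }
        w.flush(); // flush but don't close: the caller owns the Writer
    }

    /** Parses the format written by save(), merging counts for repeated lines. */
    public static Map<String, Map<Character, Integer>> load(Reader in) throws IOException {
        Map<String, Map<Character, Integer>> table = new TreeMap<>();
        BufferedReader r = new BufferedReader(in);
        String line;
        while ((line = r.readLine()) != null) {
            if (line.isEmpty()) continue;
            String[] parts = line.split("\t", -1); // -1 keeps a trailing empty field so bad lines are caught
            if (parts.length != 3 || parts[1].length() != 1) {
                throw new IOException("Malformed line: " + line);
            }
            table.computeIfAbsent(parts[0], k -> new TreeMap<>())
                 .merge(parts[1].charAt(0), Integer.parseInt(parts[2]), Integer::sum);
        }
        return table;
    }

    public static void main(String[] args) throws IOException {
        Map<String, Map<Character, Integer>> table = new TreeMap<>();
        table.computeIfAbsent("#c", k -> new TreeMap<>()).put('a', 3);
        StringWriter sw = new StringWriter();
        save(table, sw);
        System.out.println(load(new StringReader(sw.toString())).equals(table)); // round trip
    }
}
```

In a game, the loaded tables could be kept in memory per nation, exactly as described above, with the text files regenerated only when the training data changes.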

And yes, the README does use the precise and correct terminology...



Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.
