Evolution of Self-Reproducing Scripts

Started by
11 comments, last by WeirdoFu 18 years, 1 month ago
I've been experimenting for a while with self-reproducing, evolving scripts. I wrote the interpreter in C++, and the scripts are similar to basic with a command on each line, and a maximum of 4 variables (a, b, c, d). They have for-loops, mathematical ops, a data line (which can be read into a variable), etc. Several interpreters (usually 6) run at the same time, each randomly selects a script file (they are named "#.txt" from "-100.txt" to "100.txt"). The scripts reproduce themselves line by line (the current "breed" will copy itself to 6 new scripts). Once the running script terminates, it is deleted, and that interpreter will randomly pick another script. You can see I didn't include "fitness" function to evaluate which scripts deserve to live or die. Instead, the interpreter indiscriminantly deletes every script after it terminates (after its "lifespan" is over). I thought this was more "natural", since in biological evolution "natural selection" is an emergent characteristic of competition over limited resources (in this case, the file range from -100.txt to 100.txt). The ones that reproduce faster and with less mutation will win over the others... this means a balance between the number of offspring (affecting reproduction rate) and the file size (affecting likeliness to mutate and the reproduction rate). But... so far, there have been not a lot of interesting results. I think that since the way the scripts reproduce is by using a "copy_line" command with occassional mutations by the interpreter, that the reproduction is "artificial". But I can't think of any other way to have them reproduce... so I was wondering, is there any other model for how a script might reproduce itself? How might mutation work? How might I make the simulation more interesting? In biological evolution the "natural selection" function is itself determined by the things it is operating on, the parts develop relations like symbiosis and parasitism, and "layers" of relation/interaction develop (the ecosystem, the species, the population, the family, etc). This is the sort of complex, emergent behaviour I want to capture in my programming experiments... but so far a long way to go.
Advertisement
Uhm... what are you actually after? I mean, evolution is about "survival of the fittest", and how can you have that without something determining fitness? There are two normal ways i've heard of in genetic programming. The first one being to identify "good" candidates (or even just one "winner" candidate) and produce some random variations of that one as its "offspring". The second one is to identify "good" candidates and mix them together to produce "offspring".

Still, to get any kind of results you have to evaluate them and alter reproductions after their performance. There is no evolution in a world where everything survives.
Yeah... look - like I already said, after the script finishes, it is deleted, and there is only a limited amount of files (from -100.txt to 100.txt) that will be executed.

So, the scripts are competing for this limited number of files (the more files a script writes itself to, the greater the chances of it running and reproducing again).

But I don't explicity SELECT which ones will live or die - I leave that up to the "system". This is all part of the idea to create emergent behaviour in the system.

This is the way I see it:
Evolution is part diversification and part filtration.
New varieties are added to the system through "diversification" (mutation and interbreeding).
The least "successful" (whatever that means) varieties are "filtered" (through competition over limited resources or being preyed upon by predators).
A different set of criteria for success applies to each niche/variety (not all animals need to be fast, or lean, or muscley, etc).
The criteria for success is itself determined by the things it operates on.

The main point is this: there is no "fitness" function that applies to every animal. The "fitness" function is an emergent property.
polyfrag, it's not clear from your brief description exactly what is occurring with the scripts. All I can make out from what you have written is that 6 interpreters each open a file, run the script within and then delete the file. There was some mention of scripts reproducing themselves line by line, but that is not explained. Where do they copy themselves to? How do they do this? How is this reproduction triggered? Does it happen only if a certain script command make it happen (so a script that finds the beneficial mutation to include this command will survive)?

If you want 'natural selection' then you need selection pressure. There must be something in the environment that favours one script over another.

Personally, for reproduction, I'd consider sampling lines from two or more scripts to form an offspring script. You'll need to be smart about how you do this so that you don't end up with infinitely long scripts, or scripts of zero length. The easiest way is to enfore a constant script length, but that may not be desirable.

As for death, I see nothing wrong with deleting scripts once they're run, provided that they had a means of attempting reproduction during their run cycle.

Cheers,

Timkin
Argh!... okay, I'll try explaining this again.

The point of these experiments is to create a complex, emergent simulation of life. It's just for fun, so there's no algorithm or solution I'm trying to get out of it. I don't want the evolution to go in a straight line to an ideal state, but I want it to branch out to form niches and create unexpected results.


This is how the system runs:

I run 6 evol.exe's that will interpret and bring life to the scripts. Each one one randomly selects a file from the range -100.txt to 100.txt (this is where the scripts are written).

Each script tells the interpreter to reproduce it to another file within the -100 to 100 range. Here's an example script from an older experiment:

increment_var	c	#	#	#increment_var	c	#	#	#skip_lines	c	#	#	#data		15	#	#	#data		2	#	#	#increment_var	c	#	#	#increment_var	c	#	#	#read_var	d	c	a	#increment_var	c	#	#	#read_var	d	c	b	#increment_var	b	#	#	#write_var	d	c	b	#reset_var	c	#	#	#forloop		a	d	#	#copy_line	c	d	b	d


This script details how to make one copy of it. After the script is finished, the interpreter deletes it, selects another random script, and runs it.

The above is the smallest self-reproducing script (15 lines). There's also a script that makes 6 copies of itself and is 31 lines.

There's a limited number of files they can write themselves to, so a rough indicator of success/fitness at any time is how many files the script has written itself to.

So what factors favour either script over the other? Let's compare how the 15-line once-reproducing script compares with the 31-line six-times-reproducing script.

Reproduction rate of 15-liner:
1 offspring per 15 lines = 0.06666... reproduction rate

Reproduction rate of 31-liner:
6 offspring per 31 lines = 0.193548... reproduction rate

Each instruction-line takes time, so, according to reproduction rate alone, the 15-liner will out-reproduce the 31-liner. But there's also the fact that greater complexity means greater chance of mutation. Let's compare mutation rates:

Probability of mutation is around 1 in every 100 lines.

Mutation rate of 15-liner:
15 lines per 100 lines per mutation = (15 lines) / (100 lines / 1 mutation) = 0.15

Mutation rate of 31-liner:
31 lines per 100 lines per mutation = (31 lines)/ (100 lines / 1 mutation) = 0.31

So, according to mutation rate, the 15-liner will have a mutated offspring 1/6 of the time, and the 31-liner will have a mutated offspring 1/3 of the time.
So mutation is twice as likely for a 31-liner for any single offspring. But the 31-liner reproduces itself 6 times per run, so:

(6 offspring per parent) * (1/6 mutants per offspring) = 1 mutant

Every 31-liner should have about 1 mutate from its 6 offspring. Assuming mutation is always fatal to the offspring (destroying its ability to reproduce), we can adjust reproduction rate to account for mutations:

Reproduction rate adjusted with mutation rate for 15-liner:
(1 offspring per 15 lines) - (0.15 mutants per 15 lines) = 0.05666... reproduction rate

Reproduction rate adjusted with mutation rate for 31-liner:
(6 offspring per 31 lines) - (1 mutant per 31 lines) = 0.16129... reproduction rate...

But there's also the fact that not all mutations are fatal. It's not going to make any difference in this case, but there are 20-liners, 40-liners, 50-liners, etc, that have 3 offspring, 7 offspring, 8 offspring, etc., and it would be important in those cases.

The point is that there are selective pressures that determine which script-species will live or go extinct, and they didn't require me to define a fitness function. Instead the fitness function is an emergent property of the environmental conditions.
So what are your actual results -- you only mentioned that they "weren't interesting". What sort of results are you actually looking for, and what's missing on the result side of things?

The mutation rate is interesting - isn't it far too low to allow for some evolution of interesting features with scripts?
How are you mutating the scripts? One thing you might want to try to get more interesting results would be something like using a fixed-width integer to describe each command, then up the mutation rate, and have each mutation change only 1 bit in one dword (or maybe have a 50% chance after each mutation that mutation is finished). Some commands, such as 'increment_var', only have a single data item, so there is a good chance that a mutation will not have any affect. On the other hand, 'copy_line' uses all 4 variable slots, so any bit changing in such a line will result in a different command. This would make selection favor redundancy for large commands while preferring to use short commands in general.

For example, say there are 8 to 15 different instructions. That means 4 bits are needed to indicate which instruction is present. If there are only 4 possible values for any slot that takes a variable, and a function can take at most 4 variables, then that means 8 bits are needed to store 'parameters' making the total word size 12 bits.

A 'data' command might be represented as '0000' followed by an 8-bit number, so 'data 16' becomes '0000 0001 0110'. If 'increment_var' is '1000', then that data line might become 'increment_var a' if the first bit is toggled to a '1'. Later on, a bit near the end is changed, but 'increment_var' only takes a single parameter so it doesn't matter. Finally, the instruction can change back into a data command, but the value is changed and it is '0000 0001 1110' which is 'data 30' instead.

If you're doing something majorly different from this for the way you're mutating things, that might be your problem. Otherwise, the problem is likely your selection of instructions to implement.
"Walk not the trodden path, for it has borne it's burden." -John, Flying Monk
My response is basically the same as Stary's: what do you expect to see?

If what you're seeing isn't what you expect, then either your expectations are wrong or your implementation is wrong. Don't discount the former as more often than not in this sort of experimentation, it is the cause of your problems!

Cheers,

Timkin
The most interesting mutation was one where they started writing to negative file range (-1 to -100.txt). I found nothing else noteworthy. But the next experiment will be better.

I've already taken the things Extrarius mentioned into account, and the problem with this script-language was lack of robust structure and mutation. The new script-language (already finished and tested the interterpreter for it) is based on "Nodes" and "Bonds".

Basically, entire scripts are in the form of webs of inter-connected blocks (with loop-backs and cross-overs and other complex relations). Every type of building-blocks/nodes/atoms/parts/cells (whatever you want to call them) has rules for how other nodes can bond/link/connect to it, and how it can connect to others (like "reactivity capacity" of atoms, or the bumps on Lego blocks).

This way, it's possible to take a bunch of parts, mix them up, and come up with all the possible algorithms they will form together. This is the same way mutation occurs in living cells on the molecular level.

In the script-language, there are three categories of nodes: data/vars, math ops, flow-control structures, and (the most important) STRUCTURE-CONTROL STRUCTURES! These allow the script to actually modify the connections and parts inside themselves or others, so it can attack competing scripts, repair itself, communicate, upgrade itself, assimilate other scripts into its interal structure, and reproduce.

Structure-control nodes basically function as a spider - that is, 'hopping' along nodes, switching links between nodes, forming new links, destroying existing ones, scanning information about structures near it, and relaying the information back to its "parent" nodes. They follow simple rules but can be connected to create complex behaviour... like I already mentioned, communication, repairing, upgrading, assimilating, reproducing, etc.

All this activity uses up resources, so I thought up of a mini-physical/economical/ecological model for this world/system. Bonds and nodes use up matter/minerals/space/proteins/nutrients, which must obtained by "digesting" other script's nodes. Every instruction run by the interpreter uses up energy/life-flow, which must be obtained from destruction of matter (sort of like the law E=MC^2, that energy can be released directly from matter). But I'm still designing that aspect of the system...

I have some schematic diagrams of the script architecture and a working program, but no time to show it all right now, so I will post it later.
Are you actually running these things as parallel processes? The nondeterminism introduced by the operating system's scheduling would prevent you from being able to deterministically repeat a trial.

This topic is closed to new replies.

Advertisement