Biological Programming

Started by
17 comments, last by Sagar_Indurkhya 18 years, 1 month ago
Quote:Original post by Shannon Barber
Are you trying to perform computations using the cell or trying to write a "DNA compiler" that allows you to code the organism characteristics in a "higher level" language than nucleotide sequences?


I am trying to "code the organism characteristics in a 'higher level' language than nucleotide sequences."
Most people already "program" DNA at a higher level than the sequence, by splicing together cassettes containing either genes or regulatory sequences.

I am interested to know what sort of input you would type into your compiler. Do you picture entering:

print("Hello World!");

and having the E. coli spell out "Hello World!" in lights on the Petri dish?

Assuming that E. coli is the processor and DNA is the program, I think it would help to define more clearly what is input and what is output.



Quote:However, I am confused as to how I would start thinking about the programming language aspect.

Start with the simplest encoding and make abstractions when they feel natural.
Quote:Original post by Nypyren
(About the fact that DNA is hard to compress).


Any random sequence of data is hard to compress, and this is the case for DNA. You'd have the same trouble compressing an average sequence of coin flips.
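To make the coin-flip comparison concrete, here is a rough sketch that runs a general-purpose compressor over a randomly generated base sequence and over a highly repetitive one. The sequences are synthetic stand-ins made up for illustration (not real genomes), and the function name is hypothetical. Note that even the random sequence shrinks to a bit over a quarter of its ASCII size, because four equally likely letters only need about 2 bits each, but it should not go much below that.

```python
import random
import zlib

def compressed_ratio(text: str) -> float:
    """Compressed size divided by original size (smaller means more compressible)."""
    raw = text.encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)

rng = random.Random(0)  # fixed seed so the run is repeatable
random_seq = "".join(rng.choice("ACGT") for _ in range(100_000))
repetitive_seq = "ACGT" * 25_000  # same length, but one pattern repeated

print(f"random:     {compressed_ratio(random_seq):.3f}")
print(f"repetitive: {compressed_ratio(repetitive_seq):.3f}")
```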

Quote:
That got me thinking... the DNA is data. What operates on the DNA? So it's kind of like a Turing machine model - the DNA is the Turing tape, and all the entities that read/write DNA are Turing machines.


The main operation applied to DNA (in eukaryotes) is copying: either into more DNA (replication) or into RNA (transcription). The RNA is then picked up by ribosomes, where it serves as a template for proteins through a process similar to replication.

Quote:
- Since DNA isn't an active entity (it's just data), does that mean that comparing DNA between different species might be missing the possibility that whatever is reading the DNA might be performing totally different operations?!


They're not missing it. There are some differences in the actual codon-to-amino-acid associations. However, most associations are the same for all species.

Quote:
- DNA is susceptible to mutations and other things changing the sequence. But what about the other parts of the cell that are acting on the DNA?


These parts are coded for by the DNA, so they will change as the DNA mutates.

Quote:- How do you find out all of the rules that govern how DNA is read? These interactions occur at the molecular scale. How do you scientifically record and analyze a (human?) cell in normal operation?


Microscopy. However, a lot of work is done by killing the cell and isolating the important parts (DNA, proteins, ribosomes).

OK, well, since I haven't been blasted for being too far out of my depth, and seeing that you're the same age as me, let me fill you in on what my AP Bio textbook has to say about DNA expression (a field that is definitely not fully understood):

There are four types of codons for DNA: Adenine, Thymine, Guanine, and Cytosine. Abbreviations are A, T, G and C codons, respectively. As you probably know, DNA is a double helix, composed of two complementary strands, with A pairing with T and G with C. Therefore, a sequence that looks like this:

ACCTAGGAC would be paired with this:
TGGATCCTG
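The pairing rule above is just a character substitution, so it can be sketched in a few lines of Python. The function name is made up for illustration, and the sketch keeps the same left-to-right order as the example rather than reversing the strand.

```python
# Watson-Crick pairing for DNA: A<->T, G<->C.
PAIRING = str.maketrans("ATGC", "TACG")

def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand, in the same order."""
    return strand.upper().translate(PAIRING)

print(complement("ACCTAGGAC"))  # TGGATCCTG
```

Applying `complement` twice gives back the original strand, which is a quick sanity check on the table.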

Now, if you want to think about the most basic unit of gene expression, that would be a triplet of codons. DNA is read in threes, so the above would code for three amino acids (which make up the proteins coded for by DNA), like this:

ACC TAG GAC

If a single codon, say a G, were inserted as, say, the third codon, it would be read like this instead:

ACG CTA GGA C

As you can see, this will end up coding for completely different amino acids than it did before, and won't make the right protein at all.
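Here is a small sketch of that frame shift, assuming the sequence is simply read left to right in groups of three. Both helper names are hypothetical.

```python
def triplets(seq: str) -> list[str]:
    """Split a sequence into consecutive groups of three, left to right."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]

def insert_base(seq: str, base: str, position: int) -> str:
    """Insert one base so that it becomes the base at 1-based `position`."""
    index = position - 1
    return seq[:index] + base + seq[index:]

original = "ACCTAGGAC"
mutated = insert_base(original, "G", 3)

print(" ".join(triplets(original)))  # ACC TAG GAC
print(" ".join(triplets(mutated)))   # ACG CTA GGA C
```

Every triplet after the insertion point changes, which is why a single inserted base is so much more disruptive than a single substituted one.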

Through the processes of transcription and translation (in prokaryotic cells, like E. coli, things work a bit differently), these genes are eventually turned into an RNA sequence that is the complement of the section of DNA copied, except for replacing Thymine with Uracil. Only one side of the DNA sequence is expressed. For the first above sequence, if this sequence were to be expressed:

ACCTAGGAC the complementary RNA (mRNA) would be:
UGGAUCCUG
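Transcription as described here is the same substitution idea as base pairing, with U taking the place of T on the RNA side. A minimal sketch, assuming the input is the DNA strand being copied (the function name is made up):

```python
# DNA base on the copied strand -> complementary RNA base (uracil replaces thymine).
DNA_TO_RNA = str.maketrans("ATGC", "UACG")

def transcribe(template: str) -> str:
    """Return the mRNA complementary to the given DNA strand, in the same order."""
    return template.upper().translate(DNA_TO_RNA)

print(transcribe("ACCTAGGAC"))  # UGGAUCCUG
```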

That's transcription. Translation follows, when the mRNA sequence is "translated" into a protein. A big complex (the ribosome) is formed which reads each triplet and turns it into the appropriate amino acid. There are 4^3 = 64 possible triplets, so in principle up to 64 kinds of amino acids could be expressed, although there are actually only twenty. Most amino acids are coded for by more than one triplet.

So in this complex, the RNA is scanned and amino acids are added to the forming protein as coded for by the triplets on the RNA. Eventually, the new protein is complete and takes off to do its duty, which is dependent on its molecular makeup and its shape and structure.
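To show the 64-triplets-versus-20-amino-acids point, here is a sketch of translation as a table lookup. It uses the standard genetic code written with one-letter amino acid abbreviations (W = tryptophan, I = isoleucine, L = leucine, and so on) and `*` for a stop. It is only a toy: it starts reading at the first base instead of looking for a start codon, and the names are made up for illustration.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order ('*' marks a stop codon).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def translate(mrna: str) -> str:
    """Read mRNA three bases at a time until a stop codon or the end of the sequence."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino = CODON_TABLE[mrna[i:i + 3]]
        if amino == "*":
            break
        protein.append(amino)
    return "".join(protein)

print(len(CODON_TABLE))                        # 64 codons
print(len(set(CODON_TABLE.values()) - {"*"}))  # 20 amino acids
print(translate("UGGAUCCUG"))                  # WIL
```

So the mRNA from the transcription example would come out as tryptophan, isoleucine, leucine if it were read from its first base.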


There's lots more to it; the above is a simplification. Also, the process is definitely not fully understood. The issue I see here with your idea is that, unlike with a computer, we don't fully understand the fundamentals of how the process works. You can't really build a house if there isn't a foundation. If you do figure out all of these basics, you're probably in for a Nobel prize.

I would definitely encourage you to look into this further - let me know what you find out. I just don't think you're going to end up with a paradigm anything like computer programming. This is all off the top of my head, so you'll definitely want to do a *lot* more research if you're truly interested in this.
my siteGenius is 1% inspiration and 99% perspiration
Quote:Original post by silverphyre673...Biology...


I know, I've been reading through Campbell and Reece's for a couple of weeks. The problem is how to think of this on an abstract level.

Question: In a function-based programming language, we just call a function to do something at any time. In a cell, we can't do that. Everything is created when the cell is created, and activations are controlled by chemical reactions. Where is the jump? The leap in thought, I might say.
Quote:Original post by Sagar_Indurkhya
Question: In a function-based programming language, we just call a function to do something at any time. In a cell, we can't do that. Everything is created when the cell is created, and activations are controlled by chemical reactions. Where is the jump? The leap in thought, I might say.


The first thing that comes to mind is a macro system with first-class macros (basically a compile-time functional language). Not sure if that would be useful, though.
A correction is in order first: A codon is the term for a triplet of nucleotide bases. The nucleotides that make up a gene are A, T, G, and C (again, replacing T with U in RNA). Adenine, Thymine, Guanine and Cytosine are not codons, but TTT, GAC, ATG, etc. are. Sorry.

I think the most fundamental issue is to abstract the functions of proteins in the cellular system. You would have to be able to understand the effects of the amino acids making it up, as well as its shape and structure, on the actual cell. This would be really tough; I'm sure you could figure out a few general rules, but you must remember that the cell is a much more complex device than a computer, and that the "programs" (coding for proteins) that you develop for it can actually kill it.

The thing is that we mostly know how a protein is coded for (as I described above). However, we don't know everything - fortunately for you, prokaryotes are much simpler than eukaryotes. In eukaryotes, about 97% of the base pairs in their genomes don't code for proteins, and we don't really understand what they do.

I'm sure you could pretty easily come up with a programming language that codes for adding the amino acids - the only instruction is adding an amino acid, and this instruction just needs to take one argument, for the type of amino acid to add. Remember, though, that multiple codons can code for the same amino acid: in mRNA, UUU and UUC (or TTT and TTC, the equivalent in DNA) both code for the amino acid phenylalanine. The start codon, AUG, marks the beginning of a gene and also codes for methionine. This amino acid is often removed after the protein is formed. There are three stop codons, UAA, UAG and UGA, all of which code for the end of a gene.

So I suppose I could think of three instructions, one for starting a gene, one for adding an amino acid within a gene, and one for ending a gene. All of these really involve adding a nucleotide sequence, but this is one level of abstraction, albeit a minor one. You might do something like this:


START_GENE
ADD_ACID Phenylalanine
ADD_ACID Tryptophan
ADD_ACID Valine
END_GENE


This would translate to a genetic sequence of

ATGTTTTGGGTTTAG

It could also translate to this, which would theoretically result in the same protein:

ATGTTCTGGGTGTGA
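As a rough sketch of what a "compiler" for those three instructions might look like, the Python below turns the little program into every DNA sequence that would encode it. The codon lists are a deliberately partial table covering only the amino acids used in the example, and `assemble` and the other names are hypothetical. It also ignores everything a real gene needs beyond the coding sequence (promoters, regulatory regions, and so on).

```python
from itertools import product

# DNA codons for the amino acids used in the example (a deliberately partial table).
CODONS = {
    "Methionine": ["ATG"],
    "Phenylalanine": ["TTT", "TTC"],
    "Tryptophan": ["TGG"],
    "Valine": ["GTT", "GTC", "GTA", "GTG"],
}
START = ["ATG"]
STOPS = ["TAG", "TAA", "TGA"]

def assemble(program: str) -> list[str]:
    """Return every DNA sequence that encodes the given program."""
    choices = []
    for line in program.strip().splitlines():
        op, *args = line.split()
        if op == "START_GENE":
            choices.append(START)
        elif op == "ADD_ACID":
            choices.append(CODONS[args[0]])
        elif op == "END_GENE":
            choices.append(STOPS)
        else:
            raise ValueError(f"unknown instruction: {op}")
    # One sequence per combination of synonymous codons.
    return ["".join(codons) for codons in product(*choices)]

program = """
START_GENE
ADD_ACID Phenylalanine
ADD_ACID Tryptophan
ADD_ACID Valine
END_GENE
"""
sequences = assemble(program)
print(sequences[0])    # ATGTTTTGGGTTTAG
print(len(sequences))  # 24
```

The count of 24 comes from 2 codons for phenylalanine times 4 for valine times 3 stop codons, so even this tiny program has many equivalent "binaries". Which one works best in a given organism is a separate question this sketch does not try to answer.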

Because you'd need to figure out how the basics of DNA work, and whether your rules for abstraction really work, you would need to fully automate the process of coding the protein, adding it to the cell, and then seeing how the gene was expressed.

The most people have really been doing with this lately has been taking genes from one organism and adding them to another. A couple of examples that I can think of that have been done include putting a gene from fireflies into tobacco and making the plant glow, making plants glow when they need watering by adding a gene from a fluorescent jellyfish, and adding a gene that codes for insulin production to bacteria and producing the substance that way.

We haven't really been creating our own genes yet because we don't understand most of what goes on "under the hood." It's really, really complex, and depends both on how DNA expression works in general and on the intricacies of how the organism in question works (of course, the makeup of the organism is coded for by DNA too, so it all comes down to how genes interact with other genes).

If you do figure this out, please make sure to mention me when you get the Nobel prize [grin]. I'd definitely talk to a professor about this. Good luck.
Quote:Original post by silverphyre673...A codon is the term for a triplet of nucleotide bases...


Thanks, I've actually decided to base the language around membrane-receptor logic gate structures. Yes, I am indeed getting ready to discuss this with many professors. Thanks again!
