4/14/2013 mid-Atlantic
I am trying to find an analogy for exploring the genome as a program.
Like any other analogy, this one is as good as it is bad.
Imagine that a highly advanced, non-biological alien race, say "Crystal entities" or The Crystallites, came to Earth unnoticed to explore local life.
They discovered that life on Earth appeared approximately 2.2e21 picoseconds ago (about 70 of our years), though paleontological researchers found very early signs of life as far back as 4.7e21 ps ago. The Crystallite scientists still argue whether this mechanical garbage (known to us as Babbage garbage) can be considered life at all, especially because it never really came to life.
The evolution of life has come a long way, from elementary, monster-like vacuum-tube species with just a few codes in their instruction set to the sophisticated all-in-one creatures that form the current biosphere of Earth (the Internet, in our terms).
Despite all the wide differences in size, shape, color, architecture, and technology, they had one thing in common: all signals, commands, and everything else were organized in bytes of 8 bits. There were some branches of evolution that used 3-state bits or 6-bit bytes, but they ... died out in the fight for survival. Thus the 8-bit byte is the minimal unit of any physical signal, command, or data. Sure, a smaller portion of a byte could have some meaning, say a half-byte (one hex digit) or even a single bit (binary), but the minimal stored or processed unit is still the byte. Consecutive bytes can be combined into larger but still elementary units of 2, 4, 8, or even 16 bytes (what we call "a word"). The evolution of life knows attempts to extend a word with an extra byte (a tag describing the word and providing error tolerance) or even 1-2 extra bits to distinguish commands from data and to check for errors. But it looks like none of them survived the fight. It seems life prefers to be economical rather than error-free.
The Crystallites (TCs for short) started to explore different species (which we call mainframes, PCs, tablets, smartphones, printers, scanners, TVs, players, etc.), the whole zoo of life forms. They found that every living organism contains big blobs of digits, which they decided to call, say, chromosomes (what we call programs). They assigned a number to each chromosome known to them and started to explore each of them separately.
First, they sequenced as many chromosomes as possible to look at the hex content.
Then the TCs discovered that every chromosome has parts common to different species, which they decided to call, say, chromosome regions (what we could imagine as statically embedded DLLs). They discovered that each region contains coding fragments, each of which they decided to call, say, a gene (a function in our world).
The TCs decided to find out what each gene is responsible for. So they cut out a single gene and put it up for processing. They noticed that if they started execution from a particular offset, in 85% of the experiments the speakers started to produce a sound of random frequency (they were not aware of parameters). They even found that the function has multiple entry points and multiple exit points. So they classified the gene as belonging to a sound phenotype and conveniently named it RFSP, aka random-frequency-sound-producer. Tremendous work was done trying to find the functional responsibility of as many genes as possible.
Isn't that a silly way of exploring the unknown?
Probably not.
How else can you explore a black box you have no idea about?
So far, so good.
There is a catch. It is very easy to damage a gene so that it doesn't work properly.
But it is very hard to create a meaningful new gene, or to modify an existing one, so that it produces the result they need. Say, modify the RFSP gene so it produces a 440 Hz sound. It is possible. They can try changing the value of each byte of the gene and eventually replace a parameter with the constant 440. The gene does what they wanted. But the living species starts beeping instead of talking.
So genetic modification is extremely dangerous and may lead to unpredictable results if their knowledge is based on experimental data only. Knowledge of the instruction set and the common structure (architecture) is required to get any meaningful result. Understanding in this area would open up absolutely astonishing possibilities. But who knows when these breakthroughs may happen? You can spend many e20 ps and still have no meaningful result after all.
So The Crystallites decided to explore damage to genes that happens, though rarely, while chromosome regions are copied from one species to another.
They noticed that if the gene CNSS, aka create-new-SPREADSHEET, loses one bit at position 284, only half of the expected spreadsheet appears on screen. They named the mutation 1-284-0, classified the genetic disorder as SPREADSHITTING DISEASE, put descriptions in all their databases, and continued to explore other damage.
Because The Crystallites do not recognize our biological life (they just don't see it, are unable to see it), they probably consider the evolution of machine life to be a series of random changes, mutations, perfected over zillions of generations in the fight for survival. They may theorize about different directions in this evolution and discuss why some branches of species died out. They may find proof of an energy crisis that hit the Earth in prehistoric times and wiped out the less energy-efficient monstrous species.
They witnessed the birth of tiny 8-bit species which took over the world after all.
They may argue that such an evolution cannot be random, that there must be a master programmer, a god behind all this. They may discuss evolutionism vs. creationism. Uh-h-h. It's getting scary. I'd rather not elaborate on this further.
I think you've got the idea.
Use your imagination.
But please remember what I started with:
This analogy is just as good as it is bad. :-)
Even with all the oversimplification, somehow I feel that genetic science has gone down a similar path.