Saturday, April 20, 2013

6. I am not afraid to sound silly




4/13/2013 mid-Atlantic


As an amateur explorer without deep knowledge of the subject, I am not bound by any limits of well-established knowledge. My only limit is my imagination. I am not afraid of saying things that sound silly (from the point of view of official science) if I find them reasonable. Of course, I should not count on fundamental physical constants changing.

When I started reading about genomics, I was, frankly, very surprised, to put it mildly.

The last 50-60 years have transformed digital technology into a mature science with a well-defined and well-established organization in every facet of the industry, from chemical and physical achievements to data processing, distribution, and graphical interfaces.
I am happy that I witnessed most of this period myself and took part in these developments too.
No doubt there were lots of quarrels, as well as mutual agreements, over coding, protocols, and other subjects of common interest.

Side notes.
1. I remember the existence of 9 different code tables for the Cyrillic alphabet.
2. When the first prototypes of the Soviet supercomputer Elbrus were built, someone decided to make it super-Russian as well. So they renamed the hexadecimal digits
A B C D E F into Cyrillic
А Б В Г Д Е
and we had two consoles: one with Latin hexadecimal digits and another with Cyrillic ones.
Thus, say, 7BDF on the Latin console looked like 7БГЕ on the Cyrillic one.
In the end, the international standard was adopted. :)
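The Elbrus renaming is just a one-to-one substitution of six characters, so a tiny translation table converts between the two consoles. A minimal Python sketch (the function names are made up for illustration); note that only the Latin form can be handed to a standard hex parser such as int(..., 16):

```python
# Cyrillic "hex digits" from the Elbrus console: А Б В Г Д Е (U+0410..U+0415)
CYRILLIC_DIGITS = "\u0410\u0411\u0412\u0413\u0414\u0415"
LATIN_DIGITS = "ABCDEF"

# str.maketrans builds a character-to-character mapping for str.translate
TO_CYRILLIC = str.maketrans(LATIN_DIGITS, CYRILLIC_DIGITS)
TO_LATIN = str.maketrans(CYRILLIC_DIGITS, LATIN_DIGITS)


def latin_to_cyrillic(hex_str: str) -> str:
    """Show a Latin hex number the way the Cyrillic console would."""
    return hex_str.upper().translate(TO_CYRILLIC)


def cyrillic_to_latin(hex_str: str) -> str:
    """Convert a Cyrillic-console hex number back to Latin digits."""
    return hex_str.translate(TO_LATIN)


print(latin_to_cyrillic("7BDF"))                              # 7БГЕ
print(int(cyrillic_to_latin(latin_to_cyrillic("7BDF")), 16))  # 31711
```

Digits 0-9 pass through untouched because they are not in either mapping.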


What helped? RFCs? Maybe information technology itself? I do not have an answer, but it worked.

Now I decided to find out what "gene" means, so that I could apply its definition to digital processing of the hexadecimal representation of a genome. The definition turned out to be very broad: a piece of DNA sequence that encodes instructions. With such a definition, it is no wonder that they do not even know how many genes a genome has. They say that the human genome has 20,000-25,000 genes. A nice range for something that calls itself a "science"!
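For the hexadecimal representation mentioned here, one natural scheme is 2 bits per base, so one hex digit holds two bases. A minimal Python sketch: T = 00 and C = 01 are the values used for T and C elsewhere in this post, while A = 10 and G = 11 are my assumption for illustration (together they give the T, C, A, G order in which codon tables usually list bases). The function names are hypothetical.

```python
# Hypothetical 2-bit code: T = 00 and C = 01 as in this post;
# A = 10 and G = 11 are assumptions made for this sketch.
BASE_TO_BITS = {"T": 0b00, "C": 0b01, "A": 0b10, "G": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def dna_to_hex(seq: str) -> str:
    """Pack two bases into each hex digit (first base in the high 2 bits)."""
    seq = seq.upper()
    if len(seq) % 2:
        raise ValueError("this sketch needs an even number of bases")
    digits = []
    for i in range(0, len(seq), 2):
        high = BASE_TO_BITS[seq[i]]
        low = BASE_TO_BITS[seq[i + 1]]
        digits.append(format(high << 2 | low, "X"))
    return "".join(digits)


def hex_to_dna(hex_str: str) -> str:
    """Unpack each hex digit back into two bases (inverse of dna_to_hex)."""
    bases = []
    for ch in hex_str:
        value = int(ch, 16)
        bases.append(BITS_TO_BASE[value >> 2])
        bases.append(BITS_TO_BASE[value & 0b11])
    return "".join(bases)


print(dna_to_hex("TGCA"))  # 36
print(hex_to_dna("36"))    # TGCA
```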

Then I found that the situation is even worse. They classify genes mostly not by how they work but by how they break, and they cannot even agree on how to name them.

So we find gene entries like this one:
CSF2RA, ID 1438, updated on 30-Mar-2013,
Also known as: GMR; CD116; CSF2R; SMDP4; CDw116; CSF2RX; CSF2RY; GMCSFR; CSF2RAX; CSF2RAY; GM-CSF-R-alpha.

I pity them; I feel sorry for them.

Tremendous, absolutely astonishing work on DNA sequencing has been done in the past 20 years! But it all ended up in such poorly organized structures. They try to find the function of every gene experimentally, yet they have no clearer definition of a gene than "a piece of DNA that codes for proteins". No wonder that, even with the whole DNA sequence of a genome in hand, they do not know how many genes it contains. Each gene can be found only experimentally. Their way of exploring is to compare DNA sequences letter by letter in search of something broken, classify the broken part as a genetic disorder, and put this knowledge into all kinds of databases with every possible comment on the research.
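The letter-by-letter comparison described here can be sketched in a few lines of Python. This is a simplification: it assumes the two sequences are already aligned and have equal length, so it finds only single-letter substitutions, not insertions or deletions. The function name and the short sequences are made up for illustration.

```python
def find_substitutions(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference letter, sample letter) for each mismatch."""
    if len(reference) != len(sample):
        raise ValueError("this sketch only compares equal-length, aligned sequences")
    return [
        (pos, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference, sample), start=1)
        if ref != alt
    ]


print(find_substitutions("ATGTGCAA", "ATGTACAA"))  # [(5, 'G', 'A')]
```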

To define the location of a gene in a genome, they created a very artificial creature named the Gene Map Locus. Say, the human gene HFE (aka, aka, aka...) has Locus 6p21.3, which means that it is located on the short arm (p) of chromosome 6, in a place inconveniently defined by someone as region 21.3. I am glad that they at least have standard chromosome numbering. Everything else looks like total nonsense from a digital processing point of view. To find data on a gene, I need to know the name of the genetic disorder it causes!
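At least a locus string like 6p21.3 can be split mechanically into chromosome, arm and band. A minimal Python sketch that handles only simple single-band loci such as the HFE one; ranges like 6p21.3-p21.2 and other forms are not covered, and the function name is hypothetical. (By the usual cytogenetic convention, 21.3 reads as region 2, band 1, sub-band 3.)

```python
import re

# chromosome (1-2 digits, X or Y), arm (p = short, q = long), band designation
LOCUS_PATTERN = re.compile(
    r"(?P<chromosome>\d{1,2}|X|Y)(?P<arm>[pq])(?P<band>\d+(?:\.\d+)?)"
)


def parse_locus(locus: str) -> dict[str, str]:
    """Split a simple locus such as '6p21.3' into its parts."""
    match = LOCUS_PATTERN.fullmatch(locus.strip())
    if match is None:
        raise ValueError(f"unsupported locus format: {locus!r}")
    return match.groupdict()


print(parse_locus("6p21.3"))  # {'chromosome': '6', 'arm': 'p', 'band': '21.3'}
```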

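They classify mutations like this: C282Y, aka CYS282TYR. Here C and Y are not DNA bases but the one-letter codes of amino acids: cysteine at position 282 of the HFE protein is replaced by tyrosine, which at the DNA level comes from a G-to-A substitution (c.845G>A). A substitution of a C base with a T base would, from the hex point of view, simply be losing a 1 at that position, transforming 1(4) = 01(2) = C into 0(4) = 00(2) = T.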
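The two spellings in "C282Y aka CYS282TYR" are the one-letter and three-letter forms of the same name. A minimal Python sketch that converts between them, assuming the standard amino-acid abbreviations; it handles only simple substitutions of the form letter-number-letter, and the function name is hypothetical.

```python
import re

# Standard one-letter to three-letter amino-acid codes
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}

VARIANT_PATTERN = re.compile(r"([A-Z])(\d+)([A-Z])")


def expand_variant(name: str) -> str:
    """Turn a one-letter substitution name like 'C282Y' into 'Cys282Tyr'."""
    match = VARIANT_PATTERN.fullmatch(name.strip().upper())
    if match is None:
        raise ValueError(f"unsupported variant name: {name!r}")
    ref, position, alt = match.groups()
    if ref not in ONE_TO_THREE or alt not in ONE_TO_THREE:
        raise ValueError(f"unknown amino-acid letter in {name!r}")
    return f"{ONE_TO_THREE[ref]}{position}{ONE_TO_THREE[alt]}"


print(expand_variant("C282Y"))  # Cys282Tyr
```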


They defined 3-letter "bytes", which they call "codons", to form the table of the genetic code.

It seems that they chose the number 3 just because the resulting table fits nicely on one sheet of paper.
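The whole standard table fits in one 64-character string if the bases are listed in the conventional T, C, A, G order, which is a common way to encode it in software. A minimal Python sketch; the string is the standard genetic code (* marks a stop codon), and the helper name translate_dna is my own.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TTT, TTC, TTA, TTG, TCT, ... order
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# product(BASES, repeat=3) yields codons with the first base varying slowest
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_dna(seq: str) -> str:
    """Translate DNA codon by codon; leftover bases at the end are ignored."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


print(len(CODON_TABLE))                         # 64
print(len(set(AMINO_ACIDS) - {"*"}))            # 20
print(CODON_TABLE["TGC"], CODON_TABLE["TAC"])   # C Y
print(translate_dna("ATGTGCTAA"))               # MC*
```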

I feel that it will not be easy to find a common language with people who believe in the magic of the numbers 3 and 20 instead of a nice, round 4 (or, better, 8, because it gives the discovered pattern) and 32, or even 256, for amino acids, 236 of which it would be reasonable to add to the 20 already known. But what do I know about all this? Practically nothing. I am just playing with digits. :)
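The arithmetic behind these numbers: a word of n letters over the 4-letter DNA alphabet carries 2n bits and can take 4^n values, so 3-letter codons give 64 combinations and 4-letter ones give the 256 mentioned above. Two letters (16 values) would not be enough for 20 amino acids plus a stop signal. A small Python sketch (the function name is hypothetical):

```python
def smallest_word_length(symbols_needed: int, alphabet_size: int = 4) -> int:
    """Smallest n such that alphabet_size ** n >= symbols_needed."""
    n = 1
    while alphabet_size ** n < symbols_needed:
        n += 1
    return n


for n in range(1, 5):
    print(f"length {n}: {2 * n} bits, {4 ** n} combinations")

print(smallest_word_length(21))   # 3  (20 amino acids + stop)
print(smallest_word_length(256))  # 4
```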

I realized that contemporary genetics has a noble goal, but one different from that of data processing: its main goal is to fight genetic disorders, not to explore how it all should work, how it was originally programmed.
