
Thread: Information content of human genome

  1. #1
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts

    Information content of human genome

    Working on developing a cost estimate for AI.
    https://docs.google.com/document/d/1...Fn9IglW3o/edit

    Appendix A might be of interest. I am trying to estimate the information content of the human genome (40 hours to compress with paq8pxd) and compare with source code to estimate how much it would cost to create something equivalent in complexity to the human brain at birth. I estimate 300 million lines of code (USD $30 billion).
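
A quick reconstruction of the arithmetic (a sketch, not the spreadsheet in the document; the genome size is an assumed round figure, and the per-line cost is just what the two stated numbers imply):

Code:
# Back-of-envelope check of the Appendix A numbers.
GENOME_BASES = 3.1e9                   # assumption: human genome ~3.1 Gbp
raw_mb = GENOME_BASES * 2 / 8 / 1e6    # 2 bits per base before compression

LINES_OF_CODE = 300e6                  # stated estimate
COST_USD = 30e9                        # stated estimate
print(f"raw genome: ~{raw_mb:.0f} MB")                           # ~775 MB
print(f"implied cost: ${COST_USD / LINES_OF_CODE:.0f} per line") # $100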

  2. #2
    Tester
    Black_Fox's Avatar
    Join Date
    May 2008
    Location
    [CZE] Czechia
    Posts
    471
    Thanks
    26
    Thanked 9 Times in 8 Posts
    Quote Originally Posted by Matt Mahoney View Post
    I estimate 300 million lines of code (USD $30 billion).
    http://en.wikipedia.org/wiki/Blue_Brain_Project ?
    I am... Black_Fox... my discontinued benchmark
    "No one involved in computers would ever say that a certain amount of memory is enough for all time? I keep bumping into that silly quotation attributed to me that says 640K of memory is enough. There's never a citation; the quotation just floats like a rumor, repeated again and again." -- Bill Gates

  3. #3
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    The Blue Brain is an interesting research project, but it is like simulating another computer at the transistor level instead of the instruction set level, and without any of the software.

  4. #4
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,471
    Thanks
    26
    Thanked 120 Times in 94 Posts
I think they should start with something simpler, like the brain of a bee. Then make a robot that collects pollen from flowers with the same effectiveness and accuracy as a real bee.

  5. #5
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    The Blue Brain project is already working on simulating a mouse brain. Of course there is not much use for an artificial mouse or even an artificial human except for research. For production, you need to make AI labor cheaper than human labor. We have already automated lots of the easier parts of the problem. This has led to jobs that are more interesting and better paying because the simple, repetitive tasks (like most factory work) are now done by machines.

    Simulating bees might be interesting to learn how bees work so we can genetically engineer them to make more honey. Or we might figure out how to make artificial honey from corn syrup and other chemicals, or using genetically engineered bacteria.

  6. #6
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    The document mentions a few problems that are technically extremely hard to solve. For instance, interstellar travel.

I did a few calculations. The nearest star, Proxima Centauri, is 2.7x10^5 AU from the sun.

    Suppose that the greatest distance it was possible for early humans to travel was 100 miles on foot, as an extremely rough estimate.

The farthest a human has traveled to date has been the moon, which Wikipedia says is about 2.6x10^-3 AU. Taking 1 AU ≈ 10^8 miles, that comes to 2.6x10^5 miles.

    So, let's suppose that, due to technological progress since early mankind, the distance humans are able to travel has increased by about 3 orders of magnitude (as a rough estimate).

    The difference in distance between the earth and moon (2.6x10^-3 AU), compared to the distance to the nearest star (2.7x10^5 AU) is about 8 orders of magnitude.

So, in order to travel to another star, human capacity for travel will have to increase by 8 orders of magnitude. Wikipedia says mankind originated about 200,000 years ago.

Assuming that technological progress grows exponentially at a constant rate, and will continue at the same rate into the indefinite future, we can estimate that roughly an additional 8/3 x 200,000 years, or about half a million years, is needed for humans to achieve the technology for interstellar travel.
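
A quick sanity check of that arithmetic, using the same rough inputs (the exact logs come out a bit under the rounded 8/3 ratio):

Code:
import math

early_range_miles = 100            # farthest early humans could walk
moon_miles = 2.6e-3 * 1e8          # Earth-moon distance in miles
star_au, moon_au = 2.7e5, 2.6e-3   # Proxima Centauri and moon distances (AU)

gained = math.log10(moon_miles / early_range_miles)  # ~3.4 orders of magnitude
needed = math.log10(star_au / moon_au)               # ~8.0 more

# Constant exponential progress: 200,000 years bought us 'gained' orders.
years = needed / gained * 200_000
print(f"~{years:,.0f} more years to reach interstellar range")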

    This is a very coarse method of estimation, I admit, with dubious assumptions. However, I think it justifies classifying interstellar travel in a class of problems so difficult that, at best, no reasonable prediction can be made about when, or whether, it can be solved. My private opinion is that interstellar travel is impossible as a practical matter, and, given enough study, people might well reach the conclusion that even in principle it can't be done.

    I have great respect for your expert knowledge of compression, so I would look forward to it if you would critique my argument.

    Thanks.

  7. #7
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
I agree that interstellar space travel by humans is not practical. It takes too long. Using nuclear fission or fusion to convert about 1% of mass to energy means you can achieve speeds of around 10% of the speed of light. The nearest habitable planets might be 10-20 light years away at best, so it is not possible to reach them within a human lifetime.
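
The energy argument behind that speed limit, sketched non-relativistically (an illustration; the conversion fractions are rough textbook values, not from the post):

Code:
import math

def coast_speed(f):
    """If a fraction f of the ship's mass is converted to kinetic energy,
    0.5*v^2 = f*c^2, so v = sqrt(2f)*c (fine while v << c)."""
    return math.sqrt(2 * f)   # as a fraction of c

for name, f in [("fission ~0.1%", 0.001), ("fusion ~0.7%", 0.007), ("1%", 0.01)]:
    v = coast_speed(f)
    print(f"{name}: v = {v:.2f} c, 20 light years takes {20 / v:.0f} years")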

    If you are more patient, like hundreds of millions of years, it might be possible to use genetic engineering or nanotechnology to design the seeds of life, and spread them throughout the galaxy using conventional rockets.

  8. #8
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts
    Quote Originally Posted by Matt Mahoney View Post
    I agree that interstellar space travel by humans is not practical. It takes too long. Using nuclear fission or fusion to convert about 1% of mass to energy means you can achieve speeds of around 10% of the speed of light. The nearest inhabitable planets might be 10-20 light years away at best, so not possible to reach them within human lifetime.

    If you are more patient, like hundreds of millions of years, it might be possible to use genetic engineering or nanotechnology to design the seeds of life, and spread them throughout the galaxy using conventional rockets.
I don't think human life will exist on Earth for another thousand years. The descendants of mankind that survive, if any, will most likely be too dumb to get off this planet even after a million years. And even if they became as smart as us at our peak, there may not be enough resources left to do anything like that. I also suspect that the reason we do not see evidence of advanced civilizations is that there is a fatal flaw in nature that causes intelligent civilizations to destroy themselves. Maybe millions of civilizations reach our level of achievement and then quickly burn out, so that by the time the next civilization's window opens up they are gone. I'm not sure how often two such groups would occupy the same star system, but it would be nice if we discovered at least fossil insects on Mars.

Now the dream: if, and that's a big if, man does not destroy itself and science goes on, then in a few hundred years we would have intelligent machines to accelerate the advance. I think it might be possible to use large ramjet engines, and even to swing by groups of black holes to accelerate to very high speeds. And if worst comes to worst, put the people in the spacecraft into a frozen-like state.

  9. #9
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Nick Bostrom in http://www.nickbostrom.com/extraterrestrial.pdf offers two possible reasons why we have not observed life on other planets.

    1. There is a Great Filter in our past that makes spontaneous evolution of life extremely rare.
    2. There is a Great Filter in our future that makes survival past the development of advanced technology extremely rare.

But I think there is a third possibility: that if a very advanced civilization did arrive, we would be unaware of it. It would be like trying to explain the existence of human civilization to a mound of ants. Such beings may have created the world we observe, but since we know no other world, they would be invisible to us.

  10. #10
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by Matt Mahoney View Post
    Nick Bostrom in http://www.nickbostrom.com/extraterrestrial.pdf offers two possible reasons why we have not observed life on other planets.
    One important question that I'd like to know the answer to: from how far away could we observe ourselves? I imagine the observation would take the form of radio signals. Could we observe our own signals if they were coming from another star? I imagine they'd be pretty feeble, and they would be backlit by a giant, noisy star. It doesn't seem like a sure thing.

    If more of an explanation than that is required, I am somewhat pessimistic about the long term. I can't see how this civilization can survive much beyond the reserves of fossil fuels. I'm sure that the human race will outlast this civilization, but those future people will probably not be in a position to send out signals into space.

    Light travels incredibly slowly across long distances. The first significant radio transmissions, around 80-90 years ago, must have only reached a handful of stars. If the human race sends out radio signals for 250 years, this will look like a pulse going out into space. Another civilization would have to be in exactly the right stage at exactly the right time in order to catch it. If there was an opportunity to catch one coming from another star during the Roman Empire, we would have missed it.

A lot of serendipity went into getting to this point, besides. If we hadn't been blessed with fossil fuels, we would be living under pre-industrial conditions, and we would be invisible. As for our planet, a look at Venus shows what happens when conditions are not quite perfect.

The moon is believed to be the result of a collision with another planet, angled in just the right way that it tore off a large chunk of Earth and sent it into orbit. There is no other moon like it in the solar system. This moon might be another requirement for getting to this point: apparently it acts to stabilize the Earth's rotation, which prevents wild swings in the axis, and it has lengthened days from 4 hours to 24.

Evolution seems to proceed by jumping from equilibrium to equilibrium; if an asteroid hadn't extinguished the dinosaurs on schedule, dinosaurs might still be dominant. From what I understand about human evolution, we emerged in a narrow window of opportunity in Africa just in time to dodge the coming ice age, which abolished the temperate environment we came from and sent us into permanent exile under the harshest conditions.

It seems like everywhere you look, there is some coincidence that may or may not have been essential. We only have one example to extrapolate from. But it looks like there were a lot of amazingly improbable events.

    Nick Bostrom seems to favor outlandish explanations. I don't think you need a Great Filter. I can see an overabundance of candidates for little ones. The fate of the human race in the distant future is too much for one person to worry about, anyway. It's silly.

  11. #11
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts
    Quote Originally Posted by nburns View Post
    One important question that I'd like to know the answer to: from how far away could we observe ourselves? I imagine the observation would take the form of radio signals. Could we observe our own signals if they were coming from another star? I imagine they'd be pretty feeble, and they would be backlit by a giant, noisy star. It doesn't seem like a sure thing.

    If more of an explanation than that is required, I am somewhat pessimistic about the long term. I can't see how this civilization can survive much beyond the reserves of fossil fuels. I'm sure that the human race will outlast this civilization, but those future people will probably not be in a position to send out signals into space.

    Light travels incredibly slowly across long distances. The first significant radio transmissions, around 80-90 years ago, must have only reached a handful of stars. If the human race sends out radio signals for 250 years, this will look like a pulse going out into space. Another civilization would have to be in exactly the right stage at exactly the right time in order to catch it. If there was an opportunity to catch one coming from another star during the Roman Empire, we would have missed it.

    ...
Interesting how light speed, the fastest speed we know of, is called incredibly slow. But I agree with you: compared to the size of the known universe it's slow. And compared to how short a time humans have had the ability to send and receive signals, our time in the universe is almost infinitely small. But as the expanding bubble of our radio and 60 Hz power signals spreads across this small galaxy, if there is more advanced life in this galaxy, it will be intercepted. My guess is that the bubble is already so big that if highly advanced life exists, it has already been intercepted. It's selfish to think that, if many advanced civilizations are possible, we would be the first and only one at this point in time, unless there is usually only one advanced civilization per galaxy in this universe.

It's clear that if we advance, we will always be looking at the skies for radio signals, even if we move to other methods of communication. At our present state of science it's unlikely we will receive signals from others that we can recognize unless they are at a similar level of advancement, which is highly unlikely. I also think it's foolish for an advanced civilization to send out signals in hopes of a reply, since that would be a waste of time on their part. Yet a truly advanced civilization would investigate us if they got a signal. I think they would not respond to our signal with a signal of their own, since they would be fearful that we could have more potential as an advanced race that through fate started late. If they come, I think we would be able to observe them, but they will not go out of their way to warn us of their approach. They would most likely destroy us in any case.

Look at what happens when any civilization on Earth meets an isolated human civilization: the more warlike destroys the other. The Aztecs were more advanced than the Spanish in many ways, yet their vast wealth of knowledge was destroyed by the Spanish. Think what would have happened if a different intelligent species had evolved on Earth: one would have wiped out the other. In the universe it would be foolish to think an advanced race was anything like us, since it evolved on a different planet. I pray we are the only advanced species in this galaxy, since it's clear that if we are not, and they have been around only a thousand years longer than us, they will destroy or enslave us; there really are almost no other options for them.

  12. #12
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by biject.bwts View Post
Interesting how light speed, the fastest speed we know of, is called incredibly slow. ... My guess is that the bubble is already so big that if highly advanced life exists, it has already been intercepted.
    Here's what I came up with as far as which stars are within a few decades at the speed of light.

    http://www.atlasoftheuniverse.com/50lys.html

    It's not insignificant, but it's only a tiny sliver of the galaxy. The galaxy is 100,000 light-years across. Unless dangerous ETs are extremely common, they won't even know about us for many human lifetimes.
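
For scale (rough figures; the disc thickness is an assumption):

Code:
import math

r_local = 50          # light years, per the atlas page above
r_galaxy = 50_000     # the galaxy is ~100,000 ly across
thickness = 1_000     # assumed thickness of the galactic disc, ly

v_local = 4 / 3 * math.pi * r_local**3
v_disc = math.pi * r_galaxy**2 * thickness
print(f"fraction of the galactic disc: {v_local / v_disc:.0e}")  # ~7e-08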

    I don't think we should give other civilizations too much credit. I'm assuming that what's technologically hard for us is also hard for them. And we've benefited from a lot of good fortune, so we're probably way ahead of the game.

Quote Originally Posted by biject.bwts View Post
Yet a truly advanced civilization would investigate us if they got a signal. ... They would most likely destroy us in any case. Look at what happens when any civilization on Earth meets an isolated human civilization: the more warlike destroys the other.
    I'm not a physicist, but what I take from physics is that Einstein's laws of relativity are extremely well-tested and sound. I can't see anything in relativity that is good news for star travel. The limit on speed is bad enough, but the more you approach the speed of light, the more you gain in mass and the harder it becomes to accelerate. As you approach your destination, you have to expend half the energy of the trip to slow down.

    Star travel looks like one of the worst investments of resources you could possibly make. If ETs are smart, they probably long since realized this and invest their resources where there's a reasonable chance of return.

  13. #13
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts
    Quote Originally Posted by nburns View Post
It's not insignificant, but it's only a tiny sliver of the galaxy. The galaxy is 100,000 light-years across. Unless dangerous ETs are extremely common, they won't even know about us for many human lifetimes. ... Star travel looks like one of the worst investments of resources you could possibly make. If ETs are smart, they probably long since realized this and invest their resources where there's a reasonable chance of return.
Considering the billions of years that have passed, and assuming there is no fatal flaw by which all intelligent life destroys itself, it would be foolish of us to believe that they could not have reached our level a million years ago. They would have spread out through the galaxy by now, and they would have a vast array of sensors to track emerging civilizations and destroy or stop their development before they become a threat to their own survival.

We will most likely never see the beings from other planets if they are a million years ahead of us; they will most likely destroy our sun before we realize what is happening. You had better hope we are alone. I have watched science develop in my country, and it's dead. The only nation that could say "go to Mars and beyond" with its people is China. The US gave up on real science decades ago. We passed our peak and are accelerating into the dustbin of history. The only question is whether Chinese elites will make the same mistake when they are top dog, or whether they will place a premium on science and space exploration.

Somehow I fear they will learn nothing from our failures. Elites just tend to grab power and overestimate how intelligent they are.

  14. #14
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
I agree that detecting radio signals from another planet seems unlikely. If they are technically advanced, then they would want to transmit information as efficiently as possible. That means relaying signals at low power over short distances (because channel capacity increases only with the log of power) and not wasting energy by allowing it to leak into space. Also, they would compress the signals and use all of the available bandwidth, so to us it would just look like thermal noise.
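
That parenthetical is the Shannon-Hartley law, C = B log2(1 + S/N). A toy link budget (an illustration with made-up numbers and a simple inverse-square loss) shows why relaying wins: two hops at half the distance each need a quarter of the transmit power for the same rate, so half the total energy.

Code:
import math

def capacity(b_hz, s_w, n_w):
    """Shannon-Hartley: channel capacity in bits per second."""
    return b_hz * math.log2(1 + s_w / n_w)

B, N = 1e6, 1e-12        # 1 MHz bandwidth, 1 pW noise floor (arbitrary)
P, D = 1.0, 1000.0       # direct link: 1 W transmitter over distance D

def received(p_tx, d):   # free-space loss goes as 1/d^2
    return p_tx / d**2

direct = capacity(B, received(P, D), N)
relayed = capacity(B, received(P / 4, D / 2), N)   # same received power
print(f"direct hop: {direct / 1e6:.1f} Mbps")
print(f"each half-distance hop at P/4: {relayed / 1e6:.1f} Mbps")
# Two such hops deliver the same rate for half the total transmit power.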

Of course they might want to be known, just like we drew pictures on Voyager 1. It is conceivable they could beam a powerful signal to Earth if they somehow knew we would be listening. But how would they know? If spontaneous evolution were rare, and it took a lot of coincidences to reach advanced stages, then the nearest intelligent life may be in another galaxy. Because of speed-of-light delays, the best they could ever do is see Earth as it was before humans even evolved.

    I don't think energy would be too much of a problem. Fossil fuels are only a little cheaper than renewable sources like solar, so using them up would not be a big deal. Humans currently consume 16 TW of power worldwide, of which 5% is in the form of food. The Earth receives 174,000 TW of energy from the sun, with about half of that reaching the surface. The sun's total output is 3.846 x 10^26 W (10^14 TW), which could be captured by a Dyson sphere. If this energy were sent out in a tightly focused beam, say 10 microradians, then it would appear brighter than the total output of our galaxy, 5 x 10^36 W.
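
Checking that last claim (a small-angle cone approximation; only the 10 microradian beam and the two power figures come from the paragraph above):

Code:
import math

SUN_W = 3.846e26      # total solar output, W
GALAXY_W = 5e36       # total galactic output, W
BEAM_RAD = 10e-6      # full beam width, radians

# Solid angle of a narrow cone of half-angle BEAM_RAD/2.
omega = math.pi * (BEAM_RAD / 2) ** 2
effective = SUN_W * (4 * math.pi / omega)   # equivalent isotropic power
print(f"{effective:.1e} W, i.e. {effective / GALAXY_W:.0f}x the galaxy")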

  15. #15
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by Matt Mahoney View Post
    The sun's total output is 3.846 x 10^26 W (10^14 TW), which could be captured by a Dyson sphere. If this energy were sent out in a tightly focused beam, say 10 microradians, then it would appear brighter than the total output of our galaxy, 5 x 10^36 W.
That sounds extremely dangerous.

    I tend to be skeptical about technological advances I have not seen. I guess that's simply where we differ. It's basically the zero probability problem, isn't it? The more I learn about compression, the more it seems like a microcosm of life.

  16. #16
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by biject.bwts View Post
The US gave up on real science decades ago. We passed our peak and are accelerating into the dustbin of history. The only question is whether Chinese elites will make the same mistake when they are top dog, or whether they will place a premium on science and space exploration.
    The aliens are probably suffering from the same problems we are. They probably overestimate our technology, too. If they come for us, we can just bluff.

  17. #17
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Quote Originally Posted by nburns View Post
    That sounds extremely dangerous .
Yeah, I suppose blowing up planets would be one application. Actually it wouldn't be a big explosion, more like burning off the outer layers until nothing was left. The Earth's gravitational potential energy is equal to about 1 week's worth of the sun's total output.
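
That last figure checks out against the uniform-sphere binding energy formula U = 3GM^2 / 5R (a standard approximation; the constants below are the usual textbook values):

Code:
G = 6.674e-11               # gravitational constant, m^3 kg^-1 s^-2
M, R = 5.97e24, 6.371e6     # Earth's mass (kg) and radius (m)
SUN_W = 3.846e26            # total solar output, W

U = 3 * G * M**2 / (5 * R)   # ~2.2e32 J
week = SUN_W * 7 * 86400     # ~2.3e32 J
print(f"binding energy {U:.1e} J vs one week of sunlight {week:.1e} J")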

Quote Originally Posted by nburns View Post
The aliens are probably suffering from the same problems we are. They probably overestimate our technology, too. If they come for us, we can just bluff.
    Nope, no Dyson sphere. We can take them.

  18. #18
    Member
    Join Date
    Oct 2013
    Location
    Filling a much-needed gap in the literature
    Posts
    350
    Thanks
    177
    Thanked 49 Times in 35 Posts
I've always thought that advanced intelligences are likely to be artificial and digital, and to propagate themselves by transmitting their plans at the speed of light to anybody willing to reconstruct them---just transmit a blueprint and memory dump and say, "build this and load this data into it".

    It's by far the most efficient way to travel, and any civilization willing to propagate itself that way would have a gigantic advantage over any civilization that wasn't.

  19. #19
    Member
    Join Date
    Oct 2013
    Location
    Filling a much-needed gap in the literature
    Posts
    350
    Thanks
    177
    Thanked 49 Times in 35 Posts
    It seems to me that low-level compression-type estimates of genomic information content are probably way too high.

You have to recognize that you're looking at inefficiently coded instructions in a programming language, and understand what kind of programming language it is, to see which information is interesting and which is mostly arbitrary stuff that could be replaced with other arbitrary stuff with no loss of function.

    As I understand it, the vast majority of functional genes seem to be mostly productions in a fuzzy propositional production (rule) system.

    A typical gene is just a rule with a boolean-like conditional and a consequent that may encode several propositions, like

    If A and B and not C then E and F and G

The (fuzzy) values of propositions A, B, and C are implemented as concentrations of molecules with particular-shaped binding sites on them, which can bind to the inhibitory or promoting regions of the gene (the Left Hand Side of a rule) to encourage or discourage it from "firing" (in rule-based system parlance), i.e., being "transcribed" (in molecular genetics parlance).

When a gene is transcribed, the base sequence in the coding region directs the construction of a corresponding RNA molecule, which may itself be a signaling molecule, or may be translated to create a corresponding protein, which is the signaling molecule.

    Whichever happens, the usual thing is that all that matters about that transcribed sequence of bases or amino acids is the SHAPE it naturally folds up into, (given molecular kinetics in cytoplasm/nucleoplasm) and all that matters about that shape is certain importantly active REGIONS of that shape---areas on the surface of the molecule that geometrically fit (more or less) into the promoting or inhibiting binding sites in the control region of other genes (and/or the same gene).

    At a short timescale, what you have is basically a discrete but stochastic rule-based system, where the probabilities of the firing of different rules depends on the concentrations of molecules with suitably-shaped regions exposed. The more promoting molecules you have bouncing around in the plasm, the more often one will dock for a while to the promoter region of a gene, and the fewer you have, the less often one of them will dock to the inhibiting region of a gene.

    Whether a "rule" actually fires at a given moment depend on those concentrations, chance bouncing around of signalling molecules, and whether the gene transcription machinery is around there at the time and ready to grab it and go.

    The upshot of this is that the information content of genes must be a whole lot lower than it looks---astonishingly low.

Most sequences of coded RNA or protein are just structural stuff to affect how the signalling molecule folds, and any number of sequences would do the same job---a small minority of sequences, but still quite a large number. Those sequences are just there to ensure that the actually important regions of the molecule---the ones whose shape determines its activity in promoting or inhibiting gene firing---end up exposed and not interfering with each other.

    Changes to the RNA or protein sequence that don't affect the shapes of promoting or inhibiting regions, or affect whether they're exposed appropriately, do not matter for normal gene function.

    You see this in patterns of variation of highly conserved genes---genes that have been around for a zillion years, because they're important. You get a lot of random mutations in some that don't matter much, because they don't affect the function of the gene. You also get some regions that mutate interestingly, where most mutations are bad and go away, but others hang around because they're equivalent, and a gene may mutate back and forth between various equivalent forms over time.

    As I understand it, the actual information content of the genome must therefore be shockingly low. Most parts of most coding sequences are just there for spacing, not actually encoding interesting information.

You could probably tighten a guessed-at upper bound on the information content of the genome by taking that into account, and noticing which regions of genes vary phylogenetically and within species without much effect (i.e., noticing stretches that have a lot of randomish variants).
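
As a sketch of that kind of bound (a toy alignment with invented sequences): columns that vary freely across related sequences contribute little, so summing per-column conserved information tightens the naive 2-bits-per-base estimate.

Code:
import math
from collections import Counter

alignment = [            # four homologous toy sequences
    "ACGTTTGA",
    "ACGATTGA",
    "ACGCTTGA",
    "ACGGTTGA",
]

def entropy(column):
    n = len(column)
    return -sum(c / n * math.log2(c / n) for c in Counter(column).values())

# Conserved information per column = 2 bits minus observed variability.
info = sum(2 - entropy(col) for col in zip(*alignment))
print(f"{info:.1f} conserved bits out of {2 * len(alignment[0])} raw bits")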

Such an estimate would still be way too high, though, because it wouldn't take into account the fact that of all the equivalent sequences for some uninteresting section of coding DNA, evolution is only going to find a few, because its search is very greedy and favors isolated, locally harmless changes. Anything that requires a combination of two compensating mutations to generate a functionally equivalent molecule is much less likely to be found than a single harmless mutation, and sets of three or four are very unlikely to be found.

For example, if one mutation shortens the folded molecule between two crucial active sites so that they interfere, and another lengthens it so that they don't, you might find the latter first---assuming lengthening doesn't hurt much---and then compact it again with the first. RNA- and protein-folding effects tend to be bizarrely nonlinear, so harmless combinations of two or three mutations are at a huge disadvantage to singleton harmless mutations.

    That means that if evolution preserves sequence information, that may only be because it doesn't discover the vast majority of (more or less) equivalent sequences. Most of the information in the genome at the codon level isn't there because it "really matters" to what the gene is actually for, but because evolution doesn't know any better in the short run---it could be stripped out and replaced by something much simpler, but evolution doesn't know how.

    To understand the actual information content of the genome, we need to get a handle on a couple of basic things:

1. What do the genes look like when viewed as productions in a fuzzy propositional production system---how complicated are the boolean expressions on the left hand side (regulatory region) and the right hand side (coding region)? To figure that out, we need to know what binds with what.

    2. What else is going on, like conditional transcription, and with what effects. Genes often don't code for a single molecular product, but for a family of products, with odd editing things going on before the final molecule is produced. (AIUI, a gene that at first glance seems to produce a protein with one sequence may under different conditions produce 20 variant versions.)

    Depending on how that conditional transcription works and how it's effectively used, that could change things a lot. It may essentially act as a macro preprocessor that allows genes to encode significantly more information than they might naively seem to.

  20. #20
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    I suppose it depends on how you define information. By some definitions, random data has the most information. But irrespective of that, I broadly disagree that the human genome likely has little information. It's not all that useful to compare DNA to a programming language. In a programming language, every behavior and side effect is well-defined and nothing is hidden, whereas DNA came about through billions of years of experiments, and it's impossible to tell what's important and what's not. You might guess that two DNA sequences are equivalent, but if you tried to exchange them, you might find that the one not chosen by nature has a remote downstream effect that couldn't have been predicted. Moreover, random differences in genes that occur naturally are what distinguish individuals. So what information matters? How do you decide?

  21. #21
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
The roundworm C. elegans has 100M base pairs, 3% as much as a human, but both species have about the same number of genes, about 25,000. In humans, only 2% of our DNA codes for protein, but in C. elegans, most of it does.

    So you could argue that both genomes have the same information content. But there is an important difference. Humans evolve faster per generation because there are more harmless and beneficial mutations per harmful mutation. The evolutionary search space is "smoother". It is like having two equivalent programs, one verbose and modular so it is easy to modify, and the other highly optimized to fit in as little memory as possible. Do they really have the same information content?

  22. #22
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    I happen to be working with DNA right now. The bottom line is that we don't know what's important and what isn't. All of it is important until proven otherwise.

  23. #23
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    We don't know which genetic information is important. But we do know there is a wide range of genome sizes due to differences in levels of duplication and differences in genetic pressure to keep the genome small to enable rapid growth and reproduction. The best we can do is consider the smallest genomes among large groups of species.
    http://www.genomesize.com/statistics.php
    (1 pg = 10^9 base pairs).

  24. #24
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    Quote Originally Posted by nburns View Post
    It's not all that useful to compare DNA to a programming language. In a programming language, every behavior and side effect is well-defined and nothing is hidden
do you mean that you 100% understand Windows disassembly?

  25. #25
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
do you mean that you 100% understand Windows disassembly?
    I was wondering if I needed to qualify that. No, but I'd hope it's more tractable than biology.

  26. #26
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
so, this is your belief and not a proven fact: "In a programming language, every behavior and side effect is well-defined and nothing is hidden"

in fact, computers easily have unpredictable behavior, e.g. in a multicore environment. btw, 7-zip compression results are unpredictable when it uses 2+ threads

and even programming language definitions contain UB (undefined behavior)

overall, predictability and guarantees are properties of simple systems. when you have something as complex as C++ or a modern processor, guaranteed predictability has the price of reduced performance, and people prefer not to pay it. the same goes for God

  27. #27
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 71 Times in 55 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
so, this is your belief and not a proven fact: "In a programming language, every behavior and side effect is well-defined and nothing is hidden"

in fact, computers easily have unpredictable behavior, e.g. in a multicore environment. btw, 7-zip compression results are unpredictable when it uses 2+ threads

and even programming language definitions contain UB (undefined behavior)

overall, predictability and guarantees are properties of simple systems. when you have something as complex as C++ or a modern processor, guaranteed predictability has the price of reduced performance, and people prefer not to pay it. the same goes for God
Right. You can challenge whether programming is truly 100% predictable; it sometimes falls short. But at least predictability was a goal. Biology's only constraint was to produce an organism that could reproduce. So any predictable behavior is purely incidental.

  28. #28
    Member
    Join Date
    Oct 2013
    Location
    Filling a much-needed gap in the literature
    Posts
    350
    Thanks
    177
    Thanked 49 Times in 35 Posts
    nburns,

    "But irrespective of that, I broadly disagree that the human genome likely has little information. It's not all that useful to compare DNA to a programming language. In a programming language, every behavior and side effect is well-defined and nothing is hidden, whereas DNA came about through billions of years of experiments, and it's impossible to tell what's important and what's not."

    You have a very narrow concept of what's a programming language, one very biased by late-20th century programming practice.

One of the original powerful programming language families was the Post production system, from the better part of a century ago, and it was very much a nondeterministic, asynchronous rule-firing paradigm.

    Post was a mathematician and never actually mechanized production systems, but a major point of production systems was to show how computation could be both clearly mechanizable and formalizable so that you could reason about it mathematically.

    Post's production systems predate Turing machines, and one of Turing's biggest achievements was to show that Post production systems and his very-different-looking Turing machines could be equivalent.

    IMO, production systems are the original programming languages---nature invented them billions of years ago, and Post reinvented them about 90 years ago. Von Neumann machines are just a handy way for humans to make computers right now.

    Serial computation can be modeled as the firing of rules in a highly constrained production system, where consequents of particular rules are used to enable one and only one other rule to fire next.

That's exactly how you typically write boring, slow serial code in asynchronous production system languages, and how nature has apparently been doing it with genetic regulatory networks for a billion years or so.
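
A minimal illustration of that constraint (mine, not from any real system): each rule's consequent enables exactly one other rule, so the asynchronous system degenerates into ordinary serial execution.

Code:
rules = {
    # token enabled -> (next token to assert, action to perform)
    "start": ("step1", "load x"),
    "step1": ("step2", "add y"),
    "step2": ("halt",  "store z"),
}

token = "start"
while token != "halt":
    token, action = rules[token]   # only one rule is ever enabled at a time
    print(action)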

    When I say that gene expression is the implementation of a programming language, I'm NOT making a weak analogy to the kind of programming language most of us use when we program. I'm stating what I think is a literal truth---genetic regulation isn't like a programming language that many people are very familiar with, it literally IS a programming language of a kind that most of us are very unfamiliar with.

    I don't mean it as an introductory metaphor, which I agree would be very misleading. I mean it as a (claimed) scientific truth.

  29. #29
    Member
    Join Date
    Oct 2013
    Location
    Filling a much-needed gap in the literature
    Posts
    350
    Thanks
    177
    Thanked 49 Times in 35 Posts
    "Right. You can challenge whether programming is truly 100% predictable, which it sometimes falls short of. But at least it was a goal."

    It is a goal of most popular programming languages, but is very much not a goal of a bunch of other programming languages, like production systems meant to model higher-level aspects of neural processing, or languages for deriving maximally parallel asynchronous algorithms and protocols, parallel functional programming languages, hardware description languages, reactive planning languages for robotics, etc.

    You often want to reason about the weakest serialization constraints that yield a correct algorithm or protocol, maybe one with nondeterministic behavior in certain respects, and then derive a more efficient implementation by judiciously introducing further constraints to map it efficiently onto a particular range of hardware, transistor/latency/power budgets, or proposed neural substrate.

    If you're familiar with those kinds of things, viewing gene expression as firing of rules in an asynchronous, stochastic production system isn't weird in programming language terms at all. It's just very different from C.

  30. #30
    Member
    Join Date
    Oct 2013
    Location
    Filling a much-needed gap in the literature
    Posts
    350
    Thanks
    177
    Thanked 49 Times in 35 Posts
    nburns: "So any predictable behavior is purely incidental."

    I think that's very misleading. It's often strongly selected for, for the same basic reasons programmers select for it---it makes code easier to reuse and evolve in modular ways.

    Look at the conservation of the Hox complex of genes, which go back a zillion years in all sorts of macroscopic animals and a bunch of microscopic ones. Nature hit upon a serialized sequencing of rule firings that is very, very useful because it is so predictable---in many cases even very precise, timing-wise. Nature invents feedforward and feedback networks, phase-locked loops, etc. to generate sequenced and phased behavior bottom up, in much the same way that programmers do in hardware design languages, cognitive modeling languages, etc., and for the same fundamental reasons---it latches onto simple gimmicks that work and make problems more tractable.

    Looked at that way, I think it's pretty clear that most of the information in the genome is uninteresting noise in a certain basic sense.

Just look at retrotransposons (e.g., the ubiquitous LINEs and SINEs), which comprise a big fraction of the genome. It's pretty clear that they are overwhelmingly noise injected by viruses reproducing themselves within genomes for their own evolutionary reasons. They mostly don't do anything, and we know why they're there anyhow---they're hard to get rid of and usually don't kill their hosts, like fleas and tapeworms.

Sure, they may have some effects, and we may have evolved around their influences so that you can't just take them all out and have everything behave exactly the same. We may even put some of them to more or less productive use, but it's pretty obvious at this point that they're mostly there for no very good reason, and any number of LINEs and SINEs can be removed or replaced with negligible effects. Their main effects are just (1) to space the interesting genes out, such that any of a zillion other spacers would have much the same effect, and (2) occasionally to screw up gene transcription and provide another source of mutations, the overwhelming majority of them bad.


