The Naked Scientists

The Naked Scientists Forum

Topic: How much total data in global DNA, minus redundancies?

AndroidNeox

There's a lot of data in the DNA of any species, but there are commonalities across species. By using comparisons and compression, we could store the information in both the human and the chimp genomes in considerably less space than simply appending one to the other. Highly conserved genes save the most data, because they can be subtracted from the genomes of many species.
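The "store one genome in full, store the other as differences" idea can be sketched as reference-based delta encoding. This is only an illustration under stated assumptions, not how real genome archives work: it uses Python's standard `difflib.SequenceMatcher` as a stand-in for a proper genome aligner, the function names `delta_encode`/`delta_decode` are hypothetical, and the two sequences are made-up toy fragments, not real human or chimp DNA.

```python
from difflib import SequenceMatcher

def delta_encode(reference, target):
    """Describe `target` as copy/insert operations against `reference`."""
    # autojunk=False: with only four letters, every base would otherwise be
    # treated as "junk" once the sequences exceed 200 characters.
    matcher = SequenceMatcher(None, reference, target, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))           # reuse reference[i1:i2]
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))  # bases not taken from the reference
        # "delete": reference bases missing from target need no operation
    return ops

def delta_decode(reference, ops):
    """Rebuild the target sequence from the reference plus the operations."""
    parts = []
    for op in ops:
        if op[0] == "copy":
            parts.append(reference[op[1]:op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)

# Hypothetical toy fragments, for illustration only.
reference = "ACGTACGTTAGCCGATACGT"
target = "ACGTACGATAGCCGATTACGT"
ops = delta_encode(reference, target)
assert delta_decode(reference, ops) == target
print(ops)
```

Long shared stretches collapse into a single `copy` entry, so the more two genomes have in common, the shorter the list of operations that has to be stored.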

Does anyone have a handle on the total quantity of data, if you looked at the sum of all the species on Earth?

Would be interested to know how big such a database would have to be to archive the global genome.
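A starting point for the size question is the uncompressed upper bound: with four possible bases, each base needs log2(4) = 2 bits. The sketch below assumes a haploid human genome of roughly 3.1 billion base pairs; the function names are hypothetical, and the per-species genome sizes you would feed into `archive_size_bytes` would have to come from real data, which is not supplied here.

```python
BITS_PER_BASE = 2  # A, C, G, T: 4 symbols -> log2(4) = 2 bits per base

def raw_size_bytes(base_pairs):
    """Uncompressed storage for one genome at 2 bits per base."""
    return base_pairs * BITS_PER_BASE / 8

def archive_size_bytes(genome_sizes):
    """Naive total for many species: just add them up, no cross-species savings."""
    return sum(raw_size_bytes(bp) for bp in genome_sizes.values())

human_haploid_bp = 3_100_000_000  # approximate haploid human genome size
print(raw_size_bytes(human_haploid_bp) / 1e6, "MB")  # 775.0 MB
```

Anything that exploits shared sequence between species (the point of the question) would bring the total below this simple sum.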


evan_au

Re: How much total data in global DNA, minus redundancies?
« Reply #1 on: 06/07/2013 06:44:06 »
There are very large data-compression gains possible by comparing the DNA of different species. Chimpanzee DNA is said to be 93-99% similar to human DNA, depending on how you measure it (and what the person quoting the figure wants to prove). Some sections of DNA vary much more than others: some sections of so-called "junk" DNA can differ between individuals and species without apparent adverse effects, but changes to protein-coding regions often produce visible effects.
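One reason the quoted similarity varies so much is that insertions and deletions (gaps in an alignment) can be counted or ignored. The hypothetical `identity` function below computes both versions on an already-aligned pair, where `-` marks a gap; the toy alignment is invented and the resulting numbers say nothing about real human/chimp comparisons.

```python
def identity(aligned_a, aligned_b, count_gaps):
    """Percent identity of two aligned sequences ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            if count_gaps:
                compared += 1  # a gap column counts as a difference
            continue           # otherwise gap columns are simply skipped
        compared += 1
        matches += a == b      # True counts as 1
    return 100.0 * matches / compared if compared else 0.0

# Invented alignment: one substitution plus a 4-base insertion.
a = "ACGTACGTAC----GTACGT"
b = "ACGTACCTACTTTTGTACGT"
print(identity(a, b, count_gaps=False))  # 93.75
print(identity(a, b, count_gaps=True))   # 75.0
```

The same pair of sequences comes out at 93.75% or 75% depending on a single choice about gaps.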

There are even larger coding gains possible by comparing your DNA with your parents' DNA; it may eventually be possible to fit the difference into just a few kilobytes of storage.
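The parent-based saving comes from the fact that each chromosome copy you inherit is mostly a patchwork of a parent's two copies, plus a few new mutations. Under heavily simplifying assumptions (one chromosome copy, the parent's two haplotypes already known and of equal length, no insertions or deletions), the hypothetical functions below store a child's inherited copy as just the positions where it switches between the parental haplotypes plus any bases found in neither. The greedy encoder may record a mutation as an extra switch, but decoding is still exact.

```python
def encode_child(h0, h1, child):
    """Encode `child` as switch points between parental haplotypes h0/h1 plus mutations."""
    haps = (h0, h1)
    current = 0
    switches, mutations = [], []
    for i, base in enumerate(child):
        if base == haps[current][i]:
            continue                      # still copying the same parental haplotype
        if base == haps[1 - current][i]:
            switches.append(i)            # treat as a crossover to the other haplotype
            current = 1 - current
        else:
            mutations.append((i, base))   # base present in neither parental haplotype
    return switches, mutations

def decode_child(h0, h1, switches, mutations):
    """Rebuild the child's sequence from the parents plus the stored differences."""
    haps = (h0, h1)
    switch_set = set(switches)
    current = 0
    out = []
    for i in range(len(h0)):
        if i in switch_set:
            current = 1 - current
        out.append(haps[current][i])
    for i, base in mutations:
        out[i] = base
    return "".join(out)

h0, h1 = "AAAAAAAAAA", "CCCCCCCCCC"
child = "AAAACCCCGC"
print(encode_child(h0, h1, child))  # ([4], [(8, 'G')])
```

The stored record grows with the number of crossovers and new mutations, not with the length of the chromosome.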

Note that no individual has identical DNA in all their cells, as point mutations accumulate over successive rounds of cell division; and even where the DNA is identical, epigenetic marks differ from tissue to tissue. So a complete record of the DNA in every cell of a single individual would be very large.

In the end you need to record a "Reference" genome for an individual or a species, as the full DNA & epigenetic record for all cells or all individuals of all species is too large to store with today's technology.
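As a rough illustration of the reference idea, the sketch below builds a toy "reference" by majority vote at each position across several equal-length sequences, then stores each individual only as its substitutions against that reference. Real reference genome assemblies are not built this way, and the function names and sequences are hypothetical; the point is only that individuals then cost a short list of differences rather than a full genome.

```python
from collections import Counter

def consensus(sequences):
    """Toy reference: the most common base at each position (equal-length inputs)."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must all have the same length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*sequences))

def substitutions(reference, individual):
    """Positions where an individual differs from the reference, with its base."""
    return [(i, b) for i, (r, b) in enumerate(zip(reference, individual)) if r != b]

population = ["ACGT", "ACGA", "TCGT"]
ref = consensus(population)
print(ref)                                          # ACGT
print([substitutions(ref, s) for s in population])  # [[], [(3, 'A')], [(0, 'T')]]
```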
