Thread: DNA storage

  1. #1

  2. #2
    Matt Mahoney
    The authors encode 750 KB as DNA. The data took 2 days to write at a cost of $12,400 per MB and 15 days to read at a cost of $2,200 per MB ($220 per MB for larger files). Information density is 2.2 PB per gram, so storing all of the world's 3 ZB of data would take about 1400 kg. Data would be stable for 10,000 years. (We have recovered DNA from Neandertals and mammoths, but dinosaurs are long gone.) DNA synthesis and sequencing costs are dropping at a Moore's Law-like pace, but are still 10^8 times higher than disk.
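
    The mass estimate follows directly from the quoted density. A minimal sketch of the arithmetic, assuming decimal prefixes (1 PB = 10^15 bytes, 1 ZB = 10^21 bytes); the variable names are only illustrative:

```python
# Mass of DNA needed to hold the world's data at the reported density.
# Assumes decimal (SI) prefixes; the post does not say which convention is meant.
PB = 1e15  # bytes
ZB = 1e21  # bytes

density_bytes_per_gram = 2.2 * PB   # reported information density
world_data_bytes = 3 * ZB           # world data volume quoted in the post

mass_kg = world_data_bytes / density_bytes_per_gram / 1000
print(f"{mass_kg:.0f} kg")
```

    This comes out at roughly 1,400 kg, so a ton and a half is the right scale.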

    During reading, a 50 byte segment of one file was lost because of an unanticipated coding problem. They used a Huffman code of 5 or 6 bases per byte, under which a run of 0xFF bytes translated to a repeating, self-complementary DNA string that folded back on itself during sequencing. This problem could be fixed by compressing or encrypting the data first, so that long runs of identical bytes do not occur. They use 5-6 bases per byte instead of 4 because the writing process is simpler when bases don't repeat. With repeats forbidden, each base has only 3 possible values given the one before it, so the data is written as a base 3 differential code.

    The encoder is a machine like an inkjet printer that, in each pass, paints a single base onto millions of pixels, each containing a different DNA strand. Each strand is 117 bases of data (including length and index fields and a parity base) plus a 33 base promoter on each end that is identical for all strands. Reading is like normal paired-end sequencing, except that it skips the step of fragmenting the DNA into short strands and uses custom software to recover the data from the reads. The reads are 96 base pairs, which is sufficient when reading from both ends with some overlap (2 x 96 = 192 bases to cover a 117 base strand).
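
    The no-repeat rule is what forces base 3: each base is chosen from the three that differ from the previous one. Below is a minimal sketch of such a differential code. It is not the authors' encoder: the choice table (alphabetical order of the remaining bases), the starting base "A", and the fixed-width 6-trit byte mapping are all assumptions made for illustration, where the paper uses a Huffman code of 5 or 6 symbols per byte. The reverse_complement helper shows how one could test a string for the self-complementarity that caused the read failure.

```python
BASES = "ACGT"

def byte_to_trits(value, width=6):
    """Fixed-width base 3 digits of a byte (3**6 = 729 >= 256).
    Stand-in for the paper's Huffman code, which is not reproduced here."""
    digits = []
    for _ in range(width):
        value, r = divmod(value, 3)
        digits.append(r)
    return digits[::-1]  # most significant trit first

def trits_to_dna(trits, prev="A"):
    """Differential encoding: each trit picks one of the 3 bases != previous base."""
    out = []
    for t in trits:
        choices = [b for b in BASES if b != prev]  # hypothetical ordering
        prev = choices[t]
        out.append(prev)
    return "".join(out)

def dna_to_trits(dna, prev="A"):
    """Inverse of trits_to_dna, using the same starting base and ordering."""
    trits = []
    for base in dna:
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(base))
        prev = base
    return trits

def reverse_complement(dna):
    """Reverse complement; a strand equal to its own reverse complement
    can pair with itself and fold back."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

data = bytes([0xFF, 0x00, 0x41])
trits = [t for b in data for t in byte_to_trits(b)]
dna = trits_to_dna(trits)

assert all(a != b for a, b in zip(dna, dna[1:]))  # no base repeats
assert dna_to_trits(dna) == trits                 # round trip
print(dna, reverse_complement(dna) == dna)
```

    Because the output depends only on the previous base and the next trit, a periodic input such as a long run of one byte value can yield periodic DNA, which is why compressing or encrypting first would help.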

    DNA weighs about 10^-21 g per base, so the theoretical capacity should be about 10,000 times higher than what the authors got. The difference is redundancy: they used thousands of copies of each DNA strand, with 4x overlap, to reduce the error rate.
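
    One way to reconcile these numbers, as a rough sketch only. The 5.5 bases per byte (midpoint of 5 and 6) and the way the overhead is split between overlap, promoters and copies are my assumptions, not figures from the paper:

```python
# Back-of-envelope comparison of theoretical and achieved density.
GRAMS_PER_BASE = 1e-21          # from the post
BASES_PER_BYTE = 5.5            # assumption: midpoint of the 5-6 bases per byte code
ACHIEVED_BYTES_PER_GRAM = 2.2e15

# Single-copy density with no overhead at all.
theoretical = 1 / GRAMS_PER_BASE / BASES_PER_BYTE
raw_gap = theoretical / ACHIEVED_BYTES_PER_GRAM

# Known per-strand overhead: 4x overlap, and 2 x 33 promoter bases per 117 data bases.
strand_overhead = 4 * (117 + 2 * 33) / 117
gap_after_overhead = raw_gap / strand_overhead

print(f"raw gap:            {raw_gap:.2e}")
print(f"strand overhead:    {strand_overhead:.2f}x")
print(f"gap after overhead: {gap_after_overhead:.2e}")
```

    Under these assumptions the gap left after the overlap and promoter overhead is on the order of 10^4, which is the range one would expect if the rest is due to thousands of physical copies of each strand.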
    Last edited by Matt Mahoney; 25th January 2013 at 18:26.

  3. #3
    Bulat Ziganshin
    Is it only me who recalled Johnny Mnemonic?

  4. #4
    encode
    Quote Originally Posted by Bulat Ziganshin
    Is it only me who recalled Johnny Mnemonic?

    - Your storage capacity?
    - More than adequate.
