
Thread: Another? DNA contest

  1. #1
    Shelwien (Administrator)
    Join Date: May 2008
    Location: Kharkov, Ukraine

  2. #2
    Member
    Join Date: Dec 2011
    Location: Cambridge, UK
    EDIT: total fubar on my part; ignore the below unless you're curious about the *previous* contest.

    That contest is for compressing 454 sequencing instrument output files ("SFF" format).

    The instrument works via a technique known as pyrosequencing, in which a single base type is offered to the DNA template at a time; if it gets incorporated, a flash of light is emitted with an intensity proportional to the number of bases incorporated. E.g. if the template is:

    TCCAG

    and we flow a repeating sequence of T, A, C, G past it, then the events generated by the machine essentially amount to run lengths:

    TCCAG => T(1) A(0) C(2) G(0) T(0) A(1) C(0) G(1)
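    The run-length idea above can be sketched in a few lines of Python. This is a minimal, noise-free illustration assuming a fixed cyclic flow order of TACG, and it follows the post's simplification of matching the offered base directly against the template (in reality the incorporated base is the complement of the template base). The function name `flowgram` is hypothetical.

```python
def flowgram(template: str, flow_order: str, n_flows: int) -> list[int]:
    """Ideal (noise-free) flow counts: bases incorporated on each flow."""
    counts = []
    pos = 0  # next template position still waiting to be incorporated
    for i in range(n_flows):
        base = flow_order[i % len(flow_order)]  # flow order repeats cyclically
        run = 0
        # A single flow incorporates the whole homopolymer run of the offered base.
        while pos < len(template) and template[pos] == base:
            pos += 1
            run += 1
        counts.append(run)
    return counts


print(flowgram("TCCAG", "TACG", 8))  # [1, 0, 2, 0, 0, 1, 0, 1]
```

    Note that a homopolymer such as CC costs one flow, not two, which is why 454 signal intensity rather than flow count carries the run length.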

    Those 0, 1 and 2 values are theoretical. What actually happens is that you get a spread of values forming overlapping distributions around each ideal level, which you tease apart to work out where the 0, 1, 2, etc. peaks are. Graphically the intensity values, base-calls and quality values may look something like this (real data, but not from the competition):

    [Attached image: gap5_contig_editor.454trace.png]
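    Going the other way, the crudest possible base-caller simply rounds each flow signal to the nearest whole number of bases. This sketch assumes signals stored as hundredths of a base (so 100 means one base), as in SFF flowgram values, and a cyclic TACG flow order; the input values are made up for illustration and `call_bases` is a hypothetical name. Real base-callers model the per-level distributions described above rather than using a fixed rounding threshold.

```python
def call_bases(signals: list[int], flow_order: str) -> str:
    """Naive base-calling: round each flow signal to a whole homopolymer length."""
    bases = []
    for i, value in enumerate(signals):
        n = (value + 50) // 100  # integer round-half-up; 100 == one base
        bases.append(flow_order[i % len(flow_order)] * n)
    return "".join(bases)


# Made-up noisy signals for the TCCAG example.
print(call_bases([98, 12, 205, 3, 7, 110, 4, 96], "TACG"))  # TCCAG
```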

    So there are obviously strong correlations between the signal intensities and the base-calls, as one was called directly from the other, and likely strong correlations between the quality values and the signals too. I haven't looked at the competition data, but if it's like the early 454 data I saw, the signals will already have been normalised and processed to remove artifacts. The normalisation ensures that intensity 400 is the median value for a 4-mer (e.g. AAAA), 100 for a single base, and so on.
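    As a rough illustration of what that normalisation means (not 454's actual pipeline, which isn't described here), the sketch below rescales one read's raw intensities so that a 1-mer lands at 100, a 2-mer at 200, and so on. It assumes a preliminary set of integer calls is already available and estimates the per-base intensity as the median of signal divided by call over the non-zero flows; `normalise` and the raw values are hypothetical.

```python
from statistics import median


def normalise(raw: list[float], calls: list[int]) -> list[int]:
    """Rescale raw flow intensities so an n-mer sits near 100 * n."""
    # Hypothetical per-read scale: median intensity per incorporated base,
    # estimated only from flows whose preliminary call is non-zero.
    per_base = median(r / c for r, c in zip(raw, calls) if c > 0)
    return [round(r * 100 / per_base) for r in raw]


raw = [50, 3, 100, 1, 2, 52, 0, 48]  # made-up raw intensities
calls = [1, 0, 2, 0, 0, 1, 0, 1]     # preliminary calls for TCCAG
print(normalise(raw, calls))         # [100, 6, 200, 2, 4, 104, 0, 96]
```

    Using the median rather than the mean keeps a single badly behaved flow from shifting the scale for the whole read.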

    The other processing involves removing cross-correlations between flows. To improve the signal strength, the DNA being sequenced is many identical copies of the same molecule. In theory all copies incorporate the same bases at the same rate, but over time some lag behind, so the signal starts to spread into later flows. I believe a mathematical model for correcting this was published by Svantesson in the early days of pyrosequencing. It's presumably been improved since then, but her papers may give you an idea of the processing applied to the raw data before it's presented to the user in files.
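    To see why lagging copies make the signal spread, here is a toy forward simulation (the problem, not Svantesson's correction). It assumes a single hypothetical parameter `lag`: on every flow where a copy could extend, it fails to do so with that probability and stays behind. The simulator tracks what fraction of copies sits at each template position and, like the earlier run-length idea, matches the offered base directly against the template. With `lag=0` it reproduces the ideal run lengths; with `lag > 0`, part of the first T flow's signal reappears on the next T flow. All names are illustrative.

```python
def simulate_flowgram(template: str, flow_order: str, n_flows: int,
                      lag: float) -> list[float]:
    """Toy signal model where a fraction `lag` of copies fails to extend per flow."""
    # state[pos] = fraction of copies whose next base to incorporate is template[pos]
    state: dict[int, float] = {0: 1.0}
    signal = []
    for i in range(n_flows):
        base = flow_order[i % len(flow_order)]
        new_state: dict[int, float] = {}
        total = 0.0
        for pos, frac in state.items():
            # Homopolymer run of the offered base starting at this position.
            run = 0
            while pos + run < len(template) and template[pos + run] == base:
                run += 1
            if run == 0:
                # Nothing to incorporate: these copies stay where they are.
                new_state[pos] = new_state.get(pos, 0.0) + frac
                continue
            extended = frac * (1.0 - lag)
            total += extended * run  # light is proportional to bases added
            new_state[pos + run] = new_state.get(pos + run, 0.0) + extended
            new_state[pos] = new_state.get(pos, 0.0) + frac * lag  # laggards
        state = new_state
        signal.append(total)
    return signal


print(simulate_flowgram("TCCAG", "TACG", 8, lag=0.0))
print(simulate_flowgram("TCCAG", "TACG", 8, lag=0.1))
```

    The second run shows the 1-mer on flow 0 dropping to about 0.9 while roughly 0.09 of "ghost" signal appears on flow 4, which is the kind of cross-flow contamination the correction step has to undo.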

    I think discussing techniques for compressing the data may disqualify you from entering the contest, though. Explaining what the data means and what its attributes are is presumably OK. You'd have to check the rules carefully. (I'm not entering anyway.)
    Last edited by JamesB; 8th February 2012 at 17:22.

  3. #3
    Member
    Join Date: Dec 2011
    Location: Cambridge, UK
    Bah, ignore me: this is a new one I wasn't aware of. I had assumed it was a link to the previous sequencing-instrument compression competition from TopCoder:

    http://community.topcoder.com/longco...15023&pm=11734

    Apparently that one has finished, so the entrants are now discussing how they did it:

    http://apps.topcoder.com/forums/?mod...forumID=551174
