Thursday, April 22, 2010

Human Genome sequencing: computational aspect

Business
… Knome.com currently provides genome sequencing services, but the cost is about $99,500 per genome
… At the end of February 2009, Complete Genomics released a full sequence of a human genome … will contain approximately 80,000-100,000 false positive errors in each genome
… In June 2009, Illumina announced that they were launching their own Personal Full Genome Sequencing Service at a depth of 30X for $48,000 per genome
… In November 2009, Complete Genomics announced that they are now able to sequence a full genome for $1,700… Complete Genomics has previously released statements that it was unable to follow through on
… In March 2010, Pacific Biosciences said they have raised more than $256 million USD in venture capital money and that they will be shipping their first 10 full genome sequencing machines by the end of 2010. … by 2015 … $100 per genome


Technology and open source software
http://www.cbcb.umd.edu/research/assembly_primer.shtml
http://sourceforge.net/apps/mediawiki/amos/index.php?title=AMOS

Problem statement in computing terms
A long string over a 4-letter alphabet (A, T, G and C) is broken down into many pieces. Some of the pieces may also be missing because of biological limitations and human error. The string has to be reassembled by finding sub-sequences that look the same in different pieces, i.e. overlaps. This is error prone, but here is the good news:
multiple copies of the long string are shredded, and it has been shown that shredding 8-10 copies is enough to get a very good genome sequence.
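
To put a rough number on why 8-10 copies is plenty, here is a small sketch. It assumes an idealized model in which fragments start at uniformly random positions (usually called the Lander-Waterman model; that model is an assumption added here, not something taken from the sources above). Under that assumption, the expected fraction of bases that no fragment covers is about e^-c, where c is the average coverage. The class and method names are made up for illustration.

```java
public class Coverage {
    // Average coverage c = (number of fragments * fragment length) / genome length.
    static double coverage(long fragments, long fragmentLength, long genomeLength) {
        return (double) fragments * fragmentLength / genomeLength;
    }

    // Expected fraction of bases that no fragment covers, e^-c.
    // Assumption: fragment start positions are uniformly random (idealized model).
    static double expectedUncovered(double c) {
        return Math.exp(-c);
    }

    public static void main(String[] args) {
        for (int c : new int[] {1, 4, 8, 10}) {
            System.out.printf("coverage %2dx -> expected uncovered fraction %.6f%n",
                    c, expectedUncovered(c));
        }
    }
}
```

At 8x the expected uncovered fraction is already well below 0.1% in this model. Real data is less uniform than that, so treat this as an estimate only.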

State of the Art
Greedy Algorithms
Overlap-layout algorithm – Hamiltonian path
Eulerian path
Align-layout
BAC-by-BAC (hierarchical sequencing)
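
To make the first item in the list concrete, here is a minimal sketch of a greedy assembler: repeatedly find the two fragments with the longest suffix-prefix overlap and merge them until one string is left. It is a toy under strong assumptions (error-free fragments, all on the same strand, no repeats), and it is not how AMOS or any real assembler is implemented. Greedy merging also does not guarantee the shortest or the correct reconstruction. The names `GreedyAssembler`, `overlap` and `assemble` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GreedyAssembler {
    // Length of the longest suffix of a that is also a prefix of b.
    static int overlap(String a, String b) {
        int max = Math.min(a.length(), b.length());
        for (int len = max; len > 0; len--) {
            if (a.regionMatches(a.length() - len, b, 0, len)) {
                return len;
            }
        }
        return 0;
    }

    // Greedy step: merge the ordered pair with the largest overlap, repeat.
    static String assemble(List<String> input) {
        if (input.isEmpty()) {
            return "";
        }
        List<String> frags = new ArrayList<>(input);
        while (frags.size() > 1) {
            int bestI = 0, bestJ = 1, bestLen = -1;
            for (int i = 0; i < frags.size(); i++) {
                for (int j = 0; j < frags.size(); j++) {
                    if (i == j) continue;
                    int len = overlap(frags.get(i), frags.get(j));
                    if (len > bestLen) {
                        bestLen = len;
                        bestI = i;
                        bestJ = j;
                    }
                }
            }
            // Append the non-overlapping tail of fragment j to fragment i.
            String merged = frags.get(bestI) + frags.get(bestJ).substring(bestLen);
            // Remove the higher index first so the lower index stays valid.
            frags.remove(Math.max(bestI, bestJ));
            frags.remove(Math.min(bestI, bestJ));
            frags.add(merged);
        }
        return frags.get(0);
    }

    public static void main(String[] args) {
        List<String> reads = Arrays.asList("ATGCGT", "CGTTAC", "TACGGA");
        System.out.println(assemble(reads)); // prints ATGCGTTACGGA
    }
}
```

The pairwise search is quadratic in the number of fragments on every round, which is fine for three reads and hopeless for a genome. That cost is part of why the graph-based approaches in the list exist.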

… June 2008 the quantity of purity-filtered sequence data generated by our Genome Analyzer (Illumina) platforms reached 1 terabase, and our average weekly Illumina production output is currently 64 gigabases…


Looks inspiring?

-bala

Monday, April 19, 2010

Bug in Java's System.out.write with large byte array

Here is a program that just writes a lot of "a"s. Compile it and run it with
java B 60000
It doesn't write. Now try
java B 50000
This time it does write.

----------------------------------------------------------------------------------------
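import java.io.*;

public class B {
    public static void main(String[] args) throws IOException {
        // Number of bytes to write, taken from the first command-line argument.
        int n = Integer.parseInt(args[0]);
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < n; i++) {
            sb.append("a");
        }
        byte[] b = sb.toString().getBytes();
        // Write the whole array to standard output in one call.
        System.out.write(b);
        System.out.println("Number of bytes: " + b.length);
    }
}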

---------------------------------------
My colleague Venkatesh pointed out that an error has occurred. Because this method doesn't throw an exception, it sets an error flag instead, which you can check with:

System.out.println("Error=" + System.out.checkError());
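
To see the mechanism in isolation, here is a small sketch that wraps a deliberately failing stream in a PrintStream. It does not reproduce the 60000-byte failure itself or explain its cause. It only shows that PrintStream swallows the IOException from the underlying stream and reports it through checkError(). `CheckErrorDemo` and `FailingStream` are made-up names standing in for a broken standard output.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;
import java.util.Arrays;

public class CheckErrorDemo {
    // Hypothetical stream that fails on every write.
    static class FailingStream extends OutputStream {
        @Override
        public void write(int b) throws IOException {
            throw new IOException("simulated write failure");
        }
    }

    // Writes n bytes through a PrintStream and returns its error flag.
    static boolean writeAndCheck(OutputStream target, int n) {
        PrintStream ps = new PrintStream(target);
        byte[] data = new byte[n];
        Arrays.fill(data, (byte) 'a');
        ps.write(data, 0, data.length); // does not throw, even if target fails
        return ps.checkError();         // flushes, then reports the error flag
    }

    public static void main(String[] args) {
        System.out.println("failing target: " + writeAndCheck(new FailingStream(), 60000));
        System.out.println("healthy target: " + writeAndCheck(new ByteArrayOutputStream(), 60000));
    }
}
```

The first line prints true and the second prints false. So if a write to System.out silently produces nothing, calling checkError() right after it is a cheap way to find out that something went wrong, though it won't tell you what.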