Sunday, 8 June 2008

bioinformatics - Too few transcripts from transcriptome assembler Oases

I am trying to run Oases for transcriptome assembly. The result is far from what I expected, so I would like to ask whether I am running it correctly. Thanks.



Here is my running command:



python scripts/oases_pipeline.py -m 25 -M 29 -o output -d " -strand_specific -shortPaired data/reads.fa" -p " -min_trans_lgth 100 -ins_length 300"


My library is strand-specific and paired-end, with 67 bp reads. The reads are interleaved ("shuffled") in a single FASTA file as:



>0(left_mate_forwarded)
ACTC...
>1(right_mate_reverse_complemented)
TATA...
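
For reference, the interleaved layout above (left mate immediately followed by its right mate) can be produced or double-checked with a few lines of Python. This is only a minimal sketch: the function name `interleave_pairs` and the `(header, sequence)` tuple representation are assumptions made for illustration, not part of Velvet or Oases, and the sketch does not change read orientation in any way.

```python
from itertools import zip_longest


def interleave_pairs(left_records, right_records):
    """Interleave two lists of (header, sequence) tuples as left, right, left, right...

    Hypothetical helper for illustration only. It raises ValueError if the
    two inputs do not contain the same number of reads, since a missing
    mate would shift every following pair out of register.
    """
    interleaved = []
    for left, right in zip_longest(left_records, right_records):
        if left is None or right is None:
            raise ValueError("mate files have different numbers of reads")
        interleaved.append(left)
        interleaved.append(right)
    return interleaved


if __name__ == "__main__":
    # Toy records; real sequences would be the full 67 bp reads.
    left = [("0", "ACTC")]
    right = [("1", "TATA")]
    for header, seq in interleave_pairs(left, right):
        print(">" + header)
        print(seq)
```

A check like this only confirms that mates are adjacent and that none are missing; whether the orientation of each mate matches what the assembler expects is a separate question.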


I got some transcripts, but they are far from the annotated transcripts and also far from the Trinity result. The longest contig from Oases is ~2500 bp (vs. ~10000 bp from Cufflinks and ~6000 bp from Trinity). The N50 value is also low. Oases reports only 20 contigs that cover the full length of a Cufflinks transcript (~4000 in total), while Trinity reports ~650.
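
To make the comparison above reproducible, here is a minimal sketch of how the longest contig and N50 could be computed the same way for each assembler's FASTA output. The function names `read_fasta_lengths` and `n50` are hypothetical, and N50 is taken here in its usual sense: the length L such that contigs of length at least L contain at least half of the total assembled bases.

```python
def read_fasta_lengths(lines):
    """Return the sequence length of every record in an iterable of FASTA lines."""
    lengths = []
    current = 0
    seen_header = False
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seen_header:
                lengths.append(current)
            seen_header = True
            current = 0
        else:
            current += len(line)
    if seen_header:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


if __name__ == "__main__":
    # Toy input; in practice pass an open file handle for each assembly.
    toy = [">a", "ACGT", "AC", ">b", "GG"]
    lengths = read_fasta_lengths(toy)
    print("longest:", max(lengths), "N50:", n50(lengths))
```

Applying the same script to each assembly, with the same minimum-length cutoff, avoids differences that come only from how each tool reports its statistics.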



The dataset I am using is a subset of S. pombe. Does it matter?



Could somebody help me work out whether something is wrong here?
