If you examine the human genome, ~99% of the introns are under 500 kb. I would assume that a limit between 250 kb and 500 kb is reasonable for gene prediction. You may mispredict the structure of the few genes that have very large introns, but this should be a small number. Furthermore, most popular sequence aligners tend to set an intron length limit between 500 kb and 750 kb.
Just keep in mind that you may increase the number of false positive introns you detect if you set this limit too high. Therefore, it may be worthwhile to try a few settings and evaluate the results.
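One way to compare a few settings is to take an existing annotation and check what fraction of its introns each candidate limit would still allow. The following is only a minimal sketch: the function names, the 1-based inclusive exon coordinates, and the toy transcripts are all assumptions made for illustration, not real annotation data or any tool's API.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: one exon is (start, end), 1-based inclusive.
Exon = Tuple[int, int]


def intron_lengths(exons: List[Exon]) -> List[int]:
    """Return the lengths of the gaps between consecutive exons of one transcript."""
    ordered = sorted(exons)
    lengths = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        gap = next_start - prev_end - 1
        if gap > 0:  # skip adjacent or overlapping exons
            lengths.append(gap)
    return lengths


def fraction_within(lengths: List[int], max_len: int) -> float:
    """Fraction of introns whose length does not exceed max_len."""
    if not lengths:
        return 0.0
    return sum(1 for n in lengths if n <= max_len) / len(lengths)


def summarize(transcripts: Dict[str, List[Exon]], cutoffs: List[int]) -> Dict[int, float]:
    """For each candidate limit, report the fraction of annotated introns it retains."""
    all_lengths = [n for exons in transcripts.values() for n in intron_lengths(exons)]
    return {cutoff: fraction_within(all_lengths, cutoff) for cutoff in cutoffs}


# Toy data (invented): two transcripts, one of which has a 600 kb intron.
toy_transcripts = {
    "tx1": [(1, 100), (201, 300), (1301, 1400)],
    "tx2": [(1, 50), (600051, 600100)],
}

for cutoff, frac in summarize(toy_transcripts, [250_000, 500_000, 750_000]).items():
    print(f"limit {cutoff:>7} bp keeps {frac:.1%} of introns")
```

This only measures how many known introns a limit would exclude. It says nothing about false positives, so you would still want to run the predictor or aligner at each setting and compare the output.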
EDIT:
My guess is that larger introns are constrained in mammals for two reasons:
- They are more difficult for the spliceosome to excise properly. The spliceosome may assemble less efficiently at the proper 5' / 3' splice sites and the branch point site. There is also a higher chance that there will be cryptic / decoy splice sites within the intron.
- They increase the time it takes to transcribe the gene.