Thursday, 10 March 2011

dna sequencing - Why do restriction enzymes tend to have an even number of bases in their recognition site?

I think the apparent bias comes from the over-representation of recognition sites of length 6:



data<-c(16, 16, 12, 12, 6, 6, 6, 6, 4, 16, 6, 6, 6, 6, 15, 15, 6, 6, 6, 6, 11, 11, 6, 6, 4, 4, 6, 6, 11, 12, 6, 6, 23, 23, 6, 6, 6, 6, 9, 12, 4, 4, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 7, 10, 10, 6, 4, 6, 6, 11, 11, 9, 9, 6, 6, 6, 6, 5, 5, 8, 8, 6, 6, 8, 8, 6, 9, 10, 10, 6, 6, 6, 5, 5, 6, 4, 6, 6, 5, 5, 14, 14, 6, 6, 6, 16, 6, 6, 6, 6, 15, 15, 6, 6, 6, 6, 6, 18, 18, 7, 7, 11, 11, 20, 20, 6, 13, 4, 4, 6, 6, 6, 6, 6, 6, 6, 11, 6, 6, 7, 7, 6, 6, 6, 6, 6, 12, 6, 6, 10, 10, 23, 23, 7, 7, 23, 23, 12, 12, 6, 6, 6, 6, 10, 10, 6, 6, 8, 8, 6, 6, 35, 35, 11, 11, 7, 7, 6, 6, 9, 9, 8, 8, 16, 16, 6, 6, 17, 17, 6, 6, 6, 6, 23, 23, 6, 6, 4, 4, 21, 21, 12, 12, 20, 20, 6, 6, 6, 6, 6, 7, 7, 6, 6, 6, 6, 5, 5, 11, 11, 6, 11, 11, 6, 6, 5, 5, 7, 7, 11, 11, 11, 11, 6, 6, 12, 12, 6, 6, 6, 21, 21, 9, 9, 8, 8, 7, 7, 16, 16, 4, 4, 6, 6, 6, 6, 7, 7, 18, 18, 6, 6, 6, 6, 6, 23, 23, 7, 34, 34, 39, 39, 6, 6, 12, 12, 5, 5, 19, 19, 8, 8, 8, 8, 4, 6, 6, 6, 6, 6, 5, 5, 6, 6, 6, 6, 7, 7, 4, 4, 15, 15, 7, 7, 7, 7, 14, 14, 11, 11, 27, 27, 12, 12, 4, 4, 10, 10, 6, 6, 8, 8, 7, 7, 8, 8, 7, 7, 5, 5, 7, 7, 6, 7, 7, 6, 6, 8, 8, 39, 39, 6, 6, 12, 12, 8, 8, 7, 13, 13, 8, 8, 8, 8, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 5, 5, 7, 17, 17, 17, 17, 11, 11, 15, 15, 6, 6, 6, 5, 5, 5, 6, 6, 5, 7, 7, 12, 6, 12, 6, 6, 6, 5, 6, 6, 7, 2, 5, 11, 5, 6, 4, 5, 7, 5, 4, 4, 6, 4, 5, 6, 7, 12, 6, 7, 7, 7, 6, 4, 4, 7, 5, 6, 6, 6, 7, 5, 12, 13, 5, 6, 6, 6, 5, 11, 11, 5, 6, 10, 5, 5, 11, 6, 5, 5, 6, 5, 6, 5, 6, 6, 7, 7, 6, 5, 5, 7, 6, 5, 6, 5, 6, 5, 5, 7, 6, 6, 6, 3, 5)
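library(ggplot2)  # needed for ggplot() below
# Count sites of each length; bins are centred on the integers 1..40
h <- hist(data, breaks = 0.5:40.5, plot = FALSE)
df <- data.frame(counts = h$counts, mids = h$mids)
# Flag even lengths so they can be coloured separately
df$even <- (df$mids %% 2 == 0)
ggplot(df, aes(x = mids, y = counts, fill = even)) + geom_bar(stat = "identity")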


Histogram of recognition site length



The histogram of lengths shows no obvious bias towards even recognition-site lengths apart from the peak at length 6.
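
One way to check this beyond eyeballing the plot is to count even and odd lengths with and without length 6. The following is only a sketch: `parity_counts` and the `toy` vector are made-up names for illustration, and the toy values are not the real data. To run it on the real lengths, pass the `data` vector defined above.

```r
# Tabulate even vs. odd recognition-site lengths, optionally dropping some lengths.
# `site_lengths` is a numeric vector of lengths; `exclude` lists lengths to ignore.
parity_counts <- function(site_lengths, exclude = NULL) {
  kept <- site_lengths[!(site_lengths %in% exclude)]
  parity <- factor(ifelse(kept %% 2 == 0, "even", "odd"),
                   levels = c("even", "odd"))
  table(parity)
}

# Hypothetical toy input, for illustration only (NOT the real data)
toy <- c(4, 5, 6, 6, 6, 7, 8, 11)

all_counts  <- parity_counts(toy)               # every length included
no_six      <- parity_counts(toy, exclude = 6)  # length 6 removed
print(all_counts)
print(no_six)
```

If the even and odd counts are roughly balanced once length 6 is excluded, that supports the reading that the 6-mers alone drive the apparent preference for even lengths.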
