Phylogenetic analysis of genes that are not present in every species may present you with some problems, but it is entirely feasible to analyze, at the same time, the relatedness of the species (based on your enrichment scores) and the relatedness of the genes (based on those same scores).
Forgive me if this is too 'simple' an analysis, but you have not said what you have tried.
My approach would be to use R (an open-source statistics environment) to generate a heatmap of your data matrix. There are plenty of options for the clustering method, but the defaults tend to produce quite a nice heatmap, with hierarchical cluster analysis performed on both dimensions of the data (you can cluster only one dimension, or neither, by setting Rowv=NA and/or Colv=NA). I used the following code to simulate some data and generate the heatmap below:
# generate a 10x10 matrix of random, normally distributed data
x <- matrix(rnorm(100), nrow=10, ncol=10)
# draw the heatmap; passing "" for "labRow" and "labCol" simply suppresses the row and column labels
heatmap(x, xlab="Genes", ylab="Species", labRow="", labCol="")
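As a rough sketch of the clustering options mentioned above, the calls below cluster only the rows, turn clustering off entirely, and swap in a different distance and linkage. The Manhattan distance and average linkage are purely illustrative choices, and `set.seed(1)` is only there to make the simulated data repeatable.

```r
# Simulated 10 x 10 matrix; rows stand in for species, columns for genes.
set.seed(1)
x <- matrix(rnorm(100), nrow = 10, ncol = 10)

# Cluster the rows (species) only; Colv = NA keeps the columns in their given order.
hm_rows <- heatmap(x, Colv = NA, xlab = "Genes", ylab = "Species")

# No clustering at all: both dendrograms are suppressed.
hm_none <- heatmap(x, Rowv = NA, Colv = NA, xlab = "Genes", ylab = "Species")

# Cluster both dimensions, but with a different distance and linkage
# (illustrative choice only: Manhattan distance, average linkage).
hm_alt <- heatmap(x,
                  distfun = function(m) dist(m, method = "manhattan"),
                  hclustfun = function(d) hclust(d, method = "average"),
                  xlab = "Genes", ylab = "Species")
```

heatmap() invisibly returns the row and column orderings it used (rowInd and colInd), which is handy if you want to reorder your own table to match the plot.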
Because the data are randomly generated there are no real patterns, but if you were to put your real data in there it would no doubt look better. (In an actual analysis you will want to leave the labels on; I removed them only to simplify the plot.) The function can handle missing values (NA), so you could include all the genes you want to analyze, even if not all species have them, provided every pair of species (and every pair of genes) still shares at least some non-missing scores so that a distance between them can be computed.
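To illustrate the point about labels and missing values, here is a minimal sketch with a labelled matrix in which a few genes are marked as absent (NA) from some species. The species and gene names, the matrix size, and the positions of the missing values are all made up for the example.

```r
# Hypothetical labelled score matrix: 6 species (rows) x 8 genes (columns).
set.seed(42)
species <- paste0("species_", 1:6)   # placeholder names
genes   <- paste0("gene_", 1:8)      # placeholder names
scores  <- matrix(rnorm(48), nrow = 6, dimnames = list(species, genes))

# Mark a few genes as absent from particular species.
scores["species_2", "gene_3"] <- NA
scores["species_5", c("gene_1", "gene_7")] <- NA

# Row and column names are used as labels automatically; NA cells are left blank.
hm <- heatmap(scores, xlab = "Genes", ylab = "Species")
```

If a species shared no scored genes at all with another species, the distance between them would itself be NA and the clustering step would fail, so very sparse rows or columns may need to be dropped first.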
Using this method you could see the most closely related species by gene enrichment, and also which are the most closely related genes (in terms of enrichment for your TF).
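If you want the groupings as lists rather than reading them off the plot, one possible sketch is to run the same kind of clustering directly and cut each tree into groups. It uses Euclidean distance and complete linkage, which are the defaults of dist() and hclust() and therefore of heatmap(). The simulated data, the placeholder names, and the number of groups (k) are assumptions for illustration only.

```r
# Hypothetical labelled score matrix: 6 species (rows) x 8 genes (columns).
set.seed(7)
scores <- matrix(rnorm(48), nrow = 6,
                 dimnames = list(paste0("species_", 1:6), paste0("gene_", 1:8)))

# Hierarchical clustering with the default settings
# (Euclidean distance, complete linkage).
species_tree <- hclust(dist(scores))      # rows = species
gene_tree    <- hclust(dist(t(scores)))   # columns = genes, hence the transpose

# Cut each tree into a chosen number of groups (k is an arbitrary choice here).
species_groups <- cutree(species_tree, k = 2)
gene_groups    <- cutree(gene_tree, k = 3)

# List the members of each group.
print(split(names(species_groups), species_groups))
print(split(names(gene_groups), gene_groups))
```

With real data, the group memberships tell you which species have similar enrichment profiles and which genes behave alike across species; plot(species_tree) draws the corresponding dendrogram on its own.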