Constructing Phylogenetic Tree from Overlapping (Positive, Negative) Genes over Multiple Species
We want to reconstruct the phylogeny from the proximity of genes on whole genomes. To do so, we define a measure of the similarity of gene proximity. Specifically, we want to estimate the difference between two gene pairs, {g_{a,x} , g_{b,x}} and {g_{a,y} , g_{b,y}}, on two different species (chromosomes) S_{x} and S_{y} , where g_{a,x} and g_{b,x} are orthologous to g_{a,y} and g_{b,y} , respectively. In the following, a gene symbol with a single subscript, g_{q} , denotes the generic gene identity, so g_{q} is orthologous to every g_{q,W} , where S_{W} ∈ all species. For brevity we consider only two genes here; the similarity between multiple genes on two species can be computed by extending the same procedure. Let begin(q,A) denote the starting position (in base pairs) of gene g_{q,A} on species S_{A}. Similarly, end(q,A) denotes the ending position of gene g_{q,A} on S_{A}. The length of a gene is written |g_{a,x}|, where |g_{a,x}| = |begin(a,x) - end(a,x)|.
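The notation above can be mirrored in a small data structure. The following is only an illustrative sketch: the class name `Gene`, its field names, and the helper `orthologs` are hypothetical and do not come from the original text. It assumes positions are integer base-pair coordinates and that the absolute value in the length definition is there so that the order of begin and end does not matter.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Gene:
    """One gene g_{q,A}: generic identity q located on species S_A."""
    gene_id: str   # generic gene identity, e.g. "a" for g_a
    species: str   # species (chromosome) label, e.g. "x" for S_x
    begin: int     # begin(q, A), in base pairs
    end: int       # end(q, A), in base pairs

    def length(self) -> int:
        # |g_{q,A}| = |begin(q,A) - end(q,A)|
        return abs(self.begin - self.end)


def orthologs(genes: Iterable[Gene], gene_id: str) -> Dict[str, Gene]:
    """Collect every g_{q,W} for a fixed generic identity q, keyed by species W."""
    return {g.species: g for g in genes if g.gene_id == gene_id}


genes = [
    Gene("a", "x", 100, 250),
    Gene("b", "x", 220, 400),
    Gene("a", "y", 900, 750),   # begin > end is tolerated by length()
    Gene("b", "y", 870, 1050),
]
```

With this layout, `orthologs(genes, "a")` returns the copies of g_a on S_x and S_y, which is the pairing the similarity measure compares.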
Let sim( S_{x} , S_{y} | g_{a} , g_{b} ) denote the similarity of the configuration of two genes g_{a} , g_{b} on S_{x} compared with that of g_{a} , g_{b} on S_{y} . Note that our measure is not symmetric: sim( S_{x} , S_{y} | g_{a} , g_{b} ) and sim( S_{y} , S_{x} | g_{a} , g_{b} ) need not be equal. The maximal common interval between (g_{a,x} , g_{b,x}) and (g_{a,y} , g_{b,y}) is found by sliding (g_{a,y} , g_{b,y}) along S_{x} so as to maximize the overlap between g_{a,x} and g_{a,y} and between g_{b,x} and g_{b,y} . If g_{a,x} and g_{b,x} are completely identical to g_{a,y} and g_{b,y} , respectively, then sim( S_{x} , S_{y} | g_{a} , g_{b} ) = 1. In Figure 2, S_{y} has been shifted slightly to the right in order to maximize the common (overlapping) intervals. Common_{a} denotes the overlapping interval between g_{a,x} and g_{a,y} , and Common_{b} the overlapping interval between g_{b,x} and g_{b,y} .
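The sliding-and-overlap idea can be sketched in code. The exact formula for sim( ) is not reproduced in the text, so the version below rests on an assumption: it takes the largest total overlap |Common_{a}| + |Common_{b}| over all rigid shifts of the S_{y} pair and divides it by |g_{a,x}| + |g_{b,x}|. That choice is consistent with the two stated properties (identical pairs give 1, and the measure is not symmetric), but it should be read as one possible instance rather than the authors' definition. The function names `overlap` and `sim` and the tuple representation are likewise hypothetical.

```python
from typing import Tuple

Interval = Tuple[int, int]  # (begin, end) in base pairs


def _norm(g: Interval) -> Interval:
    # Order the endpoints so that begin <= end.
    return (min(g), max(g))


def overlap(u: Interval, v: Interval) -> int:
    """Length of the common interval of u and v (0 if they are disjoint)."""
    return max(0, min(u[1], v[1]) - max(u[0], v[0]))


def sim(ax: Interval, bx: Interval, ay: Interval, by: Interval) -> float:
    """Assumed form: max over shifts of (Common_a + Common_b) / (|g_ax| + |g_bx|)."""
    ax, bx, ay, by = _norm(ax), _norm(bx), _norm(ay), _norm(by)

    def total(shift: int) -> int:
        # Move both S_y genes by the same amount, keeping their spacing.
        moved_a = (ay[0] + shift, ay[1] + shift)
        moved_b = (by[0] + shift, by[1] + shift)
        return overlap(ax, moved_a) + overlap(bx, moved_b)

    # The total overlap is piecewise linear in the shift, so its maximum is
    # reached at a shift that lines up an endpoint on S_x with one on S_y.
    candidates = {p - q for g, h in ((ax, ay), (bx, by)) for p in g for q in h}
    best = max(total(s) for s in candidates)
    return best / ((ax[1] - ax[0]) + (bx[1] - bx[0]))


# Same configuration, displaced by 50 bp on S_y.
same = sim((100, 200), (180, 300), (50, 150), (130, 250))

# Genes on S_y are half as long as on S_x: the two directions differ.
forward = sim((0, 100), (100, 200), (0, 50), (50, 100))
backward = sim((0, 50), (50, 100), (0, 100), (100, 200))
```

Normalizing by the lengths on S_{x} only is what makes this sketch asymmetric; a different normalization would give a different measure.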
This gives the result shown in Figure 2. Using this sim( ) measure, we can construct a phylogenetic tree from the proximity information of multiple genes. For this, a standard distance-based method such as neighbor joining can be applied.
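Distance-based tree-building methods expect a symmetric distance matrix, whereas sim( ) is not symmetric. The text does not say how this is reconciled, so the sketch below makes an assumption: it averages the two directions and uses one minus that average as the distance. The function name `distance_matrix` and the dictionary layout of the precomputed similarities are hypothetical, and the numbers are made-up illustrative values, not results.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def distance_matrix(
    species: List[str],
    sim: Dict[Tuple[str, str], float],
) -> List[List[float]]:
    """Build a symmetric distance matrix from directed similarity values.

    sim[(x, y)] holds sim(S_x, S_y | ...) for every ordered pair of species.
    Assumption: d(x, y) = 1 - (sim(x, y) + sim(y, x)) / 2.
    """
    n = len(species)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        x, y = species[i], species[j]
        mean_sim = (sim[(x, y)] + sim[(y, x)]) / 2.0
        d[i][j] = d[j][i] = 1.0 - mean_sim
    return d


# Illustrative values only.
example_sim = {
    ("x", "y"): 0.5, ("y", "x"): 1.0,
    ("x", "z"): 0.25, ("z", "x"): 0.25,
    ("y", "z"): 0.5, ("z", "y"): 0.5,
}
matrix = distance_matrix(["x", "y", "z"], example_sim)
```

The resulting matrix can then be handed to any distance-based tree-building routine; other symmetrizations (for example taking the minimum or maximum of the two directions) are equally plausible.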
Graphics Application Lab., Dept. of Computer Science,
Molecular Biology & Phylogeny Lab.,
Pusan National University
San-30, Jangjeon-dong, Keumjeong-gu, Pusan, 609-735, South Korea.
Phone: +82-51-582-5009 Fax: +82-51-515-2208