Constructing Phylogenetic Tree from Overlapping (Positive, Negative) Genes over Multiple Species

(To use EvOG, a JRE (version 1.6 or newer) must be installed on your computer.)

We want to reconstruct the phylogeny from the proximity of genes on whole genomes. For this we define a metric function that computes the similarity of gene proximity. We estimate the difference between two gene pairs, {ga,x , gb,x} and {ga,y , gb,y}, on two different species (chromosomes) Sx and Sy . Here ga,x and gb,x are orthologous to ga,y and gb,y , respectively. In the following, gx denotes the generic gene identity, so gx is orthologous to gx,W for every species SW . For brevity, this chapter considers only two different genes; the similarity between multiple genes on two different species can be computed by extending the following procedure. Let begin(p,A) denote the starting position (in base pairs) of gene gp,A on species SA . Similarly, end(p,A) denotes the ending position of gene gp,A on species SA . |ga,x| denotes the length of gene ga,x , that is, |ga,x| = |begin(a,x) - end(a,x)|.
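The notation above can be illustrated with a minimal Python sketch. The class name Gene, its field names, and the coordinates are hypothetical and chosen only for this example; they are not part of EvOG.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """One gene on one species (hypothetical container for illustration)."""
    name: str     # generic gene identity, e.g. "ga"
    species: str  # species (chromosome) label, e.g. "Sx"
    begin: int    # starting position in base pairs
    end: int      # ending position in base pairs

    def length(self) -> int:
        # |g| = |begin - end|; the absolute value makes the result
        # independent of which coordinate is larger.
        return abs(self.begin - self.end)


# Made-up coordinates for the gene pair {ga,x , gb,x} on species Sx.
ga_x = Gene("ga", "Sx", 100, 400)
gb_x = Gene("gb", "Sx", 900, 650)

print(ga_x.length(), gb_x.length())
```

The absolute value reflects the definition |ga,x| = |begin(a,x) - end(a,x)|, so the length does not depend on the order of the two coordinates.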

Figure 1. Computing the similarity of gene proximity over two different species.

Let sim( Sx , Sy | ga , gb ) denote the similarity of the configuration of the two genes ga , gb on Sx compared with that of ga , gb on Sy . Note that our measure is not symmetric, that is, in general

sim( Sx , Sy | ga , gb ) ≠ sim( Sy , Sx | ga , gb )

Now we give a formal definition of sim( Sx , Sy | ga , gb ) as follows.

The maximal common interval between (ga,x , gb,x) and (ga,y , gb,y) is obtained by moving (ga,y , gb,y) along Sx so as to maximize the common intervals between ga,x and ga,y and between gb,x and gb,y . If ga,x and gb,x are completely identical to ga,y and gb,y , respectively, then sim( Sx , Sy | ga , gb ) = 1.

In Figure 2, we align Sy by moving it slightly to the right in order to maximize the common (overlapping) intervals. The intervals Commona and Commonb denote the overlapping intervals of ga and gb , respectively, between Sx and Sy .

Figure 2. Computing the maximal common interval. We move Sy slightly to the right in order to maximize the common intervals.
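The shifting step can be sketched in Python as follows. This is only an illustration under stated assumptions: genes are given as (lo, hi) coordinate pairs, a single shift is applied to all genes of Sy, and the function names and coordinates are hypothetical. The sketch computes the common intervals only; it does not reproduce the formal definition of sim( ).

```python
def overlap(a, b):
    """Length of the intersection of two intervals given as (lo, hi)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def max_common_interval(genes_x, genes_y):
    """Shift all genes of Sy by one offset to maximize the total overlap.

    genes_x, genes_y: dict mapping gene name -> (lo, hi) on Sx and Sy.
    Returns (best_shift, {gene name: common length at that shift}).
    """
    # The total overlap is piecewise linear in the shift, so a maximum
    # is reached where an endpoint on Sy lines up with one on Sx.
    candidates = {0}
    for name, (xlo, xhi) in genes_x.items():
        ylo, yhi = genes_y[name]
        candidates.update({xlo - ylo, xlo - yhi, xhi - ylo, xhi - yhi})

    def common(shift):
        return {
            name: overlap(genes_x[name],
                          (genes_y[name][0] + shift, genes_y[name][1] + shift))
            for name in genes_x
        }

    best = max(sorted(candidates), key=lambda s: sum(common(s).values()))
    return best, common(best)


# Made-up coordinates: moving Sy to the right increases both overlaps.
genes_x = {"ga": (100, 400), "gb": (600, 900)}
genes_y = {"ga": (50, 300), "gb": (520, 850)}

shift, common = max_common_interval(genes_x, genes_y)
print(shift, common)
```

In this example the entries of the returned dictionary play the role of Commona and Commonb, and a positive shift corresponds to moving Sy to the right.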

So we finally get the following result in Figure 2.



The direction of each gene (upstream or downstream) must be taken into account when computing sim( ). The computation above is valid only if the directions of the matched genes are consistent, since matching an upstream gene to a downstream gene is not reasonable.
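A minimal sketch of such a direction check is given below. It assumes that each gene direction is recorded as "+" or "-"; this encoding and the function name are hypothetical and used only for illustration.

```python
def directions_consistent(strand_x, strand_y):
    """Return True only if every gene has the same direction on both species.

    strand_x, strand_y: dict mapping gene name -> "+" or "-"
    (an assumed encoding of the gene direction).
    """
    return all(strand_x[name] == strand_y[name] for name in strand_x)


strand_x = {"ga": "+", "gb": "-"}
strand_y = {"ga": "+", "gb": "-"}
strand_z = {"ga": "+", "gb": "+"}  # gb has the opposite direction

print(directions_consistent(strand_x, strand_y))  # True
print(directions_consistent(strand_x, strand_z))  # False
```

A caller would compute the common intervals only when this check succeeds.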

Using this sim( ) measure, we can construct a phylogenetic tree from the proximity information of multiple genes. A standard method such as Neighbor-Joining can be applied directly.
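The following Python sketch shows one way to prepare the input for Neighbor-Joining. Two assumptions are made here that are not stated in the text: the asymmetric sim( ) values are turned into a symmetric distance by d = 1 - (sim(x,y) + sim(y,x)) / 2, and only the first Neighbor-Joining step (choosing the pair with the smallest Q value) is shown. The function names and the sim( ) values are made up.

```python
def to_distance(sim):
    """Turn asymmetric similarities into a symmetric distance matrix.

    sim: dict of dicts with sim[x][y] in [0, 1].
    Assumption: d(x, y) = 1 - mean of the two directed similarities.
    """
    species = sorted(sim)
    dist = [[0.0 if x == y else 1.0 - (sim[x][y] + sim[y][x]) / 2.0
             for y in species] for x in species]
    return species, dist


def nj_pick_pair(species, dist):
    """First Neighbor-Joining step: pick the pair minimizing
    Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)."""
    n = len(dist)
    row_sum = [sum(row) for row in dist]
    best, best_q = None, None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - row_sum[i] - row_sum[j]
            if best_q is None or q < best_q:
                best, best_q = (species[i], species[j]), q
    return best


# Made-up, asymmetric sim( ) values for four species.
sim = {
    "A": {"A": 1.0, "B": 0.82, "C": 0.42, "D": 0.31},
    "B": {"A": 0.78, "B": 1.0, "C": 0.38, "D": 0.29},
    "C": {"A": 0.38, "B": 0.42, "C": 1.0, "D": 0.72},
    "D": {"A": 0.29, "B": 0.31, "C": 0.68, "D": 1.0},
}

species, dist = to_distance(sim)
print(nj_pick_pair(species, dist))
```

With four species the two complementary pairs always have the same Q value, so either (A, B) or (C, D) may be chosen first; both choices lead to the same unrooted tree.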


Graphics Application Lab., Dept. of Computer Science,
Molecular Biology & Phylogeny Lab.,
Pusan National University
San-30, Jangjeon-dong, Keumjeong-gu, Pusan, 609-735, South Korea.
Phone: +82-51-582-5009 Fax: +82-51-515-2208