ASSIGNMENT 2
Comparative Genomics
Comparative genomics is the practice of analyzing and comparing the genetic material of different species. Comparative genomics can:
1. Tell us what is common and what is unique between different species at the sequence level.
2. Provide what may be the surest and most reliable way to identify genes and predict their functions and interactions.
3. Reveal the functions of human genes and other DNA regions by studying their counterparts in lower organisms.
Three major research directions are:
1. Understanding the similarities and differences between genomes.
2. Predicting gene function.
3. Developing efficient algorithms for comparing large, genome-scale sequences.
Comparison of complete genome sequences.
What to compare? 1. Statistics of the genome => size of the genome
=>Overall (G+C) content
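The two genome statistics listed above are simple to compute from a raw sequence. The following is a minimal sketch, assuming the genome is available as a plain string of nucleotide letters; the function name `genome_stats` is a hypothetical helper chosen for illustration.

```python
def genome_stats(sequence: str) -> dict:
    """Return genome size and overall (G+C) content for a DNA string.

    Assumption: `sequence` holds only nucleotide letters (upper or lower case).
    """
    seq = sequence.upper()
    size = len(seq)
    gc_count = seq.count("G") + seq.count("C")
    # (G+C) content is the fraction of bases that are G or C.
    gc_content = gc_count / size if size else 0.0
    return {"size": size, "gc_content": gc_content}


if __name__ == "__main__":
    print(genome_stats("ATGCGC"))
```

Comparing these two numbers across genomes is usually the first step, since they need no gene prediction at all.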
What to compare? 2. Predicted ORFs =>total number
=>percentage of genome
=>Average length
=>Predicted genes with
1. Homology and assigned function 2. Homology but no assigned function
=>*H. pylori-specific genes
*What is H. pylori? Helicobacter pylori is a bacterium that colonizes the human gastric mucosa. It can cause different diseases or even be beneficial to the infected host.
=> Strain-specific genes
=>Location of strain specific genes
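The ORF statistics listed above (total number, percentage of genome, average length) can be sketched as follows. This assumes the predicted ORFs are given as half-open `(start, end)` coordinate pairs and that the genome length is positive; `orf_stats` is a hypothetical name, and the ORF prediction itself is not shown.

```python
def orf_stats(orfs, genome_length):
    """Summarize predicted ORFs given as 0-based, half-open (start, end) pairs.

    Assumption: genome_length > 0. Overlapping ORFs are merged before
    computing coverage so that shared bases are counted only once.
    """
    lengths = [end - start for start, end in orfs]
    covered = 0
    current_start = current_end = None
    for start, end in sorted(orfs):
        if current_end is None or start > current_end:
            # A new, non-overlapping interval begins: close the previous one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlaps (or touches) the current interval: extend it.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return {
        "total": len(orfs),
        "percent_of_genome": 100.0 * covered / genome_length,
        "average_length": sum(lengths) / len(lengths) if lengths else 0.0,
    }


if __name__ == "__main__":
    print(orf_stats([(0, 300), (200, 500), (600, 900)], 1000))
```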
What to compare? 3. Paralogues and Orthologues
=>Paralogue families
=>DNA sequence difference between orthologues
=>Protein-sequence difference between orthologues
What to compare? 4. Genomic Organization and gene order
=>Duplication
=>Inversion and translocation
=>Gene order: Conservation of immediate neighbors
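Conservation of immediate neighbors can be illustrated with a small sketch. It assumes each genome is represented as a list of orthologue identifiers in chromosomal order, treats the chromosome as linear, and ignores gene orientation; `conserved_neighbor_pairs` is a hypothetical helper name.

```python
def conserved_neighbor_pairs(order_a, order_b):
    """Return the gene pairs that are immediate neighbors in both genomes.

    Assumptions: both lists use shared orthologue identifiers, chromosomes
    are treated as linear, and orientation is ignored (pairs are unordered).
    """
    def adjacent_pairs(order):
        return {frozenset(pair) for pair in zip(order, order[1:])}

    return adjacent_pairs(order_a) & adjacent_pairs(order_b)


if __name__ == "__main__":
    genome_a = ["g1", "g2", "g3", "g4", "g5"]
    genome_b = ["g3", "g2", "g1", "g5", "g4"]
    for pair in conserved_neighbor_pairs(genome_a, genome_b):
        print(sorted(pair))
```

In this example the inverted block g1-g2-g3 keeps its internal adjacencies, so inversions only break neighbor pairs at their boundaries.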
Detecting Protein Interactions
=>Protein interactions control the lives of biological cells
=>Protein interactions can be detected using experimental methods:
Biochemistry, Molecular Biology, Genetics
=>Computational methods based on subunit interfaces, gene order, phylogenetic profiles, and gene fusion.
2. Predicting Protein Interaction Based on Gene Fusion
Definitions:
- Gene fusion event: certain protein families in a given species consist of fused domains.
- Interaction: either a direct physical interaction or an indirect functional association.
Method: The input is the translation of all ORFs in complete genomes. One genome serves as the query and the other as the reference.
Procedure: 1. The query set is compared against itself using BLASTP; pairwise sequence similarities are recorded in a binary matrix T.
2. The query set is compared against the reference set using BLASTP; pairwise sequence similarities are recorded in a binary matrix Y.
3. For each entry C in the reference set, collect the pairs (A, B) from the query set where both A and B are similar to C.
- Look up (A, B) in matrix T.
- If (A, B) is null in T, run Smith-Waterman to confirm dissimilarity.
- If A and B are confirmed dissimilar, collect (A, B) as a candidate for a fusion event.
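Step 3 of the procedure can be sketched in a few lines once the BLASTP results are available. This is a minimal sketch under stated assumptions, not the original implementation: the binary matrices are stored sparsely (T as a set of unordered query pairs, Y as a mapping from each reference protein to the query proteins similar to it), and the Smith-Waterman check is passed in as a hypothetical callable `confirm_dissimilar`, which here defaults to accepting every pair.

```python
from itertools import combinations


def fusion_candidates(T, Y, confirm_dissimilar=lambda a, b: True):
    """Collect query pairs (A, B) that may be fused in a reference protein C.

    T: set of frozenset({A, B}) for query proteins similar to each other.
    Y: dict mapping reference protein C -> set of similar query proteins.
    confirm_dissimilar: hypothetical stand-in for the Smith-Waterman check;
        it should return True when A and B are confirmed dissimilar.
    """
    candidates = {}
    for c, hits in Y.items():
        for a, b in combinations(sorted(hits), 2):
            if frozenset((a, b)) in T:
                continue  # A and B are similar to each other: not a fusion signal
            if confirm_dissimilar(a, b):
                # Record which reference protein supports this candidate pair.
                candidates.setdefault((a, b), []).append(c)
    return candidates


if __name__ == "__main__":
    T = {frozenset(("A", "D"))}
    Y = {"C1": {"A", "B"}, "C2": {"A", "D"}, "C3": {"B"}}
    print(fusion_candidates(T, Y))
```

Storing only the non-null entries keeps memory use proportional to the number of BLASTP hits rather than to the square of the number of ORFs.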
3. Relatively Fast Alignment of Whole Genome Sequences
Sequence Alignment – Genome Scale
Challenges - Large size of the DNA sequence to be aligned
=>Memory
=>Speed
- Occurrence of both short and long insertions and deletions
- Large-scale changes such as tandem repeats and large scale reversals
- High degree of divergence in the third position of codons
A suffix tree based Method
- Designed for fast alignment of large genome sequences
- Has been used to align two genome sequences of about 4 Mb each
Three Steps 1. Identify all Maximal Unique Matches (MUMs)
2. Extract the longest set of matches that occur in the same order in both genomes
3. Close the local gaps by identifying inserts, repeats, tandem repeats, small mutated regions, and SNPs
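The first two steps can be illustrated on toy sequences. The sketch below deliberately swaps in a brute-force search for the suffix tree, so it is only suitable for very short inputs, and it interprets "longest set" as the ordered, non-overlapping chain with the greatest total matched length. The names `find_mums`, `longest_ordered_chain`, and the `min_len` parameter are assumptions for illustration. Step 3 (gap closing) is not shown.

```python
def count_occurrences(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def find_mums(a, b, min_len=3):
    """Brute-force MUM search: returns (pos_in_a, pos_in_b, length) triples.

    A real implementation would use a suffix tree; this version is a
    naive substitute meant only for toy inputs.
    """
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue  # extendable to the left, so not maximal
            length = 0
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1  # extend to the right as far as possible
            if length < min_len:
                continue
            match = a[i:i + length]
            if count_occurrences(a, match) == 1 and count_occurrences(b, match) == 1:
                mums.append((i, j, length))
    return mums


def longest_ordered_chain(mums):
    """Pick non-overlapping matches in the same order in both genomes,
    maximizing total matched length (quadratic dynamic programming)."""
    mums = sorted(mums)
    if not mums:
        return []
    best = [m[2] for m in mums]
    prev = [-1] * len(mums)
    for k, (i, j, length) in enumerate(mums):
        for p in range(k):
            pi, pj, plen = mums[p]
            # p may precede k only if it ends before k starts in both genomes.
            if pi + plen <= i and pj + plen <= j and best[p] + length > best[k]:
                best[k] = best[p] + length
                prev[k] = p
    k = max(range(len(mums)), key=best.__getitem__)
    chain = []
    while k != -1:
        chain.append(mums[k])
        k = prev[k]
    return chain[::-1]


if __name__ == "__main__":
    mums = find_mums("ACGTTGCA", "ACGTGTGCA")
    print(mums)
    print(longest_ordered_chain(mums))
```

The regions between consecutive matches in the chain are the local gaps that the third step would then classify as inserts, repeats, or SNPs.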
Pros and Cons
Pros: - Very fast for alignment of genomes of different strains of the same species or genomes of similar species
- Can handle long insertions and deletions
- Can detect reversals, SNPs, repeats and tandem repeats
Cons: - Speed suffers significantly for less similar sequences
Other Research Areas in Comparative Genomics
- Using genome comparison for exon prediction and regulatory region prediction
- Building phylogenetic trees based on genome comparison
- Visualization of genome alignment