Our first approach to this project was to design a Java program that connected directly to NCBI's BLAST search engine and submitted queries to that site. Our software would then parse the results returned by the BLAST searches and feed them, in a readable form, into our matrix algorithm. The matrix algorithm calculates distances relative to a source and a target organism and outputs the data graphically as a "tree". We chose a graphical tree because our focus was ease of use and readability for the end user.
The problem with the original approach was the difficulty of designing the web interface to interact with NCBI's BLAST site. At the suggestion of Prof. Arkin, we decided instead to run our own BLAST searches locally by downloading the necessary genome data and the BLAST source code from the internet.
The modular design is divided into the following portions:
(details of each are discussed in the PowerPoint presentation)
The input portion contains HTML/CGI files for user input and a Perl program for running BLAST and parsing the result.
1. The HTML/CGI part has two files: blast.htm and blast.cgi.
http://sahara.lbl.gov/~lyan/cgi-bin/blast.htm
The web interface above allows the user to input a pathway name, organism names, protein names, multiple protein sequences, and a threshold. blast.cgi saves that information into 5 files: Pathwayname, org.file.txt, Proteinnames, Proteinsequences, and Threshold. It then runs the bp.pl9 program.
2. Protein databases and BLAST tools (e.g. blastall, fastacmd) for 5 organisms are located in /usr2/people/lyan/public_html. These databases and BLAST tools are used by "bp.pl9", which will do the following:
The output portion contains five classes in four different files. The classes are: a) Intree, b) Dphy, c) Orthotable, d) Dismat2 and e) Newtree. Each is stored in its own file except for Dphy, which is saved as part of Newtree.java. Intree.java takes in a text file containing the output of the clustering algorithm and reads in the organisms, names and distances needed to draw the phylogenetic tree. Dphy creates a JPanel and contains the method that draws the phylogenetic tree on that panel. Dismat2.java takes organism names and their distances from Metric.java and paints a graphical display of the distance matrix in another JPanel. Orthotable.java takes in organism names and a HashMap of Vectors of orthologs in the different organisms from Metric.java and counts the number of orthologs to the seed genes in each organism. These counts are then displayed in a JTable in a third JPanel. Newtree.java is the main display class: it calls methods from all the other classes and creates a tabbed pane containing the above three JPanels inside a JFrame. Newtree is then called by Phylogenetics to display the desired results.
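The counting step described for Orthotable can be illustrated with a minimal sketch. This is not the project's actual code: the class name OrthologCountSketch, the method names, the organism and gene labels, and the assumption that the map is keyed by organism name with one list of ortholog identifiers per organism are all hypothetical. A List is used here in place of the Vector mentioned above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.swing.table.DefaultTableModel;

// Hypothetical sketch of the ortholog-counting step; names and data layout are assumptions.
class OrthologCountSketch {

    // Counts how many orthologs are recorded for each organism.
    // Assumes the map is keyed by organism name; a missing (null) list counts as zero.
    static Map<String, Integer> countOrthologs(Map<String, List<String>> orthologsByOrganism) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : orthologsByOrganism.entrySet()) {
            List<String> orthologs = entry.getValue();
            counts.put(entry.getKey(), orthologs == null ? 0 : orthologs.size());
        }
        return counts;
    }

    // Packs the counts into a two-column table model that a JTable could display.
    static DefaultTableModel toTableModel(Map<String, Integer> counts) {
        Object[][] rows = new Object[counts.size()][2];
        int i = 0;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            rows[i][0] = entry.getKey();
            rows[i][1] = entry.getValue();
            i++;
        }
        return new DefaultTableModel(rows, new Object[] {"Organism", "Orthologs"});
    }

    public static void main(String[] args) {
        // Made-up example data, for illustration only.
        Map<String, List<String>> example = new LinkedHashMap<>();
        example.put("orgA", List.of("geneA1", "geneA2"));
        example.put("orgB", List.of("geneB1"));
        example.put("orgC", List.of());

        DefaultTableModel model = toTableModel(countOrthologs(example));
        for (int row = 0; row < model.getRowCount(); row++) {
            System.out.println(model.getValueAt(row, 0) + "\t" + model.getValueAt(row, 1));
        }
    }
}
```

In a display class, the returned model could be passed to new JTable(model) and the table placed in a JPanel. A LinkedHashMap is used so that the rows keep the order in which the organisms were added.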