Design spec for Genealogy Database

Version 1.4
Date 17th Dec 02
Author Jeya Balaji [email protected]
What is new  
Earlier version genealogy13.html



Content

  • Aim
  • Functionality
  • Technology
  • Design
    1. Table design
    2. Class design
  • Web admin
  • Cost estimation
  • Issues



  • Aim

    Main theme: To collect details of ancestors and current generation, so that, genealogy charts will be available online.

    Supplementary idea: This database can be utilized for matrimonial search; ie with person info database, complete family background of a girl or a boy can be known.


    Website functionality

    The functionalities available in this website are listed here :


    Technology




    Design

    MVC model is adapted for implementing the web application. Jsp pages are used to display the records and collect data. servlets and beans are used in the back-stage to help this process.


    Database table design

    The main table, personTable, holds the records of individual. personTable and spouseTable stores the relationship between people.
    The basic relationship stored are father, mother and spouse. With these three relations, we should be able to retrieve all other relatives.

    matrimonialTable is used to place matrimonial info. The table horoscopeMatchingFactor determines the possibility of match between two horoscope.


    50k to 100k people (records) are expected to be available in the personTable.
    Here is the person detail -
    
    create table personTable (
    	personId     mediumint not null auto_increment,
    	name         varchar(40),
    	initial      varchar(10),
    	familiarName varchar(40),
    	sex          enum ('m', 'f'),
    	birthDate    date,
    	expireDate   date,
    
    	fatherId     mediumInt,
    	motherId     mediumInt,
    
    	religionId   mediumInt   fk classificationTable,
    	sectorId     mediumInt   fk classificationTable,		 
    
    	education    varchar(100),
    	profession   varchar(100),
    
    	street       varchar(40),
    	placeId      mediumInt   fk placeTable,
    	native       mediumInt   fk placeTable,
    
    	nationality  mediumInt   fk placeTable,
     
    
    	homePhone    varchar(20),
    	workPhone    varchar(20),
    	mobile       varchar(20),
    
    
    	email        varchar(70),
    	hideEmail    enum ('0', '1'),
    	webpage      varchar(100),
    	photo        varchar(100),
    
    	bloodgroup   enum('A+', 'A-' ...),
          
    	maritalStatus enum('single', 'married', 'divorce'),
    	rasi          enum(0 - 12),
    	nakshatram    enum(0-27),
    
    
    	hobbies      varchar(200),
    	description  varchar(500),
    
    	recordStatus enum('available', 'pending'),
         
    	primary key (personId)
    );
    
    // do we need a separate table or can we use varray in the personTable to hold list of spouse?
    create table spouseTable (
    	husbandId    mediumInt,
    	wifeId       mediumInt,
    	marriageDate date,
    
    	primary key (husbandId, wifeId)
    );
    
    create table matrimonialTable (
    	matriId     mediumInt,
    	personId    mediumInt,
    	
    	height      int,
    	weight      int,
    	
    	complexion  enum('fair', 'medium', 'black'),
    	smoking     enum('never', 'sometimes', 'often'),
    	drinking    enum('never', 'sometimes', 'often'),
    	
    	description varchar(500),
    	expectation varchar(500),
    
    	primary key matriId
    );
    
    



    HoroscopeMatchingFactor determines matching criteria of all possible combinations of rasi and nakshatram. This table will be useful for searching matrimonial table based on good horoscope match for a particular horoscope.
    create table HoroscopeMatchingFactor (
    	girlRasi          enum(0-12),
    	girlNakshatram    enum(0-27),
    	boyRasi           enum(0-12),
    	boyNakshatram     enum(0-27),
    	matchingFactor    float,
    
    	primary key   (girlRasi, girlNakshatram, boyRasi, boyNakshatram)
    )
    
    HoroscopeMatchingFactor table can have 2 indexes based on (girlRasi, girlNakshatram) and (boyRasi, boyNakshatram)


    As per social norms, person can be classified based on different belief and sectors (religion, caste, sub divisions). These divisions are maintained in an auxiliary table with the following structure.
    create table classificationTable (
    	divisionId      mediumInt,
    	baseDivisionId  mediumInt,
    	description     varchar(100),
    	divisionType    enum('belief', 'sector'),
    
    	primary key divisionId
    );
    
    Two divisionId (one for belief and one for sector) alone will be stored in the person record. Using this id, other divisions (parent, sub divisions) can be retrieved from the classificationTable.


    The city, state, country names are stored in placeTable.
    create table placeTable (
    	placeId         mediumInt,
    	basePlaceId     mediumInt,
    	pincode         varchar(10),
    	placeType       enum('country', 'state', 'city', 'town', 'village', 'area'),
    
    	primary key placeId
    );
    




    Class design

    Person is a java bean, which is vital object for all operations. Person bean is filled with the data from the database.
    Most of the search are done with the primary key (personId) lookup.

    Instead of having the attributes of a person (like name, addr etc) as separate variables, we can have them in a hashtable. Pros and cons are discussed in the issue section.

    Person bean has all the attribute related to a person. Note that zodiac sign and star (rasi and nakshatram) are available in a separate class Horoscope. Person class has Horoscope class.

    These classes are accessed from servlets and jsp.
    (TODO : yet to plan the jsp, servlets)

    Planning to provide SessionManager which will hold a pool of recently accessed person records.
    (TODO : have to sketch other responsibilities of SessionManager)

    
    class Person
    {
    	insertPersonRecord()
    	{ 
    	}
    	
    	updatePersonRecord()
    	{
    	}
    
    	deletePersonRecord()
    	{
    
    	}
    
    		
    	retrievePersonRecord()   // from db
    	{
    
    	}
    
    	retrievePersonName()
    	{
    
    	}
    
    	formatPersonRecordForDisplay()
    	{
    
    	}
    
    	formatPersonNameForDisplay()
    	{
    
    	}
    
    	protected int personId;
    	protected String name;
    
    	protected Hashtable personData;  // all other info of a person
    
    	protected Horoscope horoscope;
    }
    
    
    
    class Horoscope
    {
    	
    
    	matchHoroscope(Horoscope otherHoroscope)
    	{
    		girlHoroscope = this;
    		boyHoroscope = otherHoroscope;
    		for(int i = 0; i < 10; i++)
    		{
    			call match[I](girlHoroscope, boyHoroscope);
    		}
    
    		return numberOfMatches, crispDesc, detailDesc;
    	}
    
    	// there r 10 match function.  
    	// Each function returns a float indicating the matching 
    	// criteria.  Also these function place their comment into  
    	// crispDesc & detailDesc.
    	match[0-9](Horoscope girlHoroscope, boyHoroscope)
    	{
    
    	}
    
    	private static String rasiName[] = { "Mesham", � };
    	private static String nakshtramName[] = {"Ashwini", �};
    
    	private byte rasi;
    	private byte nakshatram;
    
    	private float numberOfMatches = 0;
    	private String[] crispDesc, detailDesc;
    }
    
    
    class SessionManager
    {
     	retrievePerson(int personId)
    	{
    		Person person = personPool.retrievePerson(personId);
    	  	if (person == null)  // retrieve from db
    	  	 	person = Person.retrievePerson(personId);
    	  	return person;
    	}
    	  
    	PersonPool personPool;
    }
    
    
    // a container to hold recently accessed person records in this session
    class PersonPool
    {
    	retrievePerson(int personId)
    	{
    	}
    	 
    	addPerson(Person person)
    	{
    	  	// implement lru algorithm to handle person object
    	}
    }
    
    
    



    LogWriter is an utility class to note different log messages generated by various modules.
    
    class LogWriter
    {
    	
    
    	int message(int messageLevel, int messageType, message)
    	{
    		/*
    		messageLevel is an indicator useful to consider to enter this log or not.
    		A global debugLevel is set with the number between 0 to 9
    
    		If messageLevel falls below debugLevel, then the message is entered (to the db)
    
    		messageLevel takes the following meaning :
    		
    1. Error message
    2. Warnings
    3. Sql template
    4. Sql command
    5. Session state
    */ } }



    Web admin

    Tasks for Web Admin





    Activities

    Time Frame Duration Action Cost Remarks
    asap   Register Domain name $30 register thru dotdnr.com or webhost?
    First of Feb 1 month or more Use trial period    
    Mid March 1 or 2 months Register for economy pack $7 per month  
    Mid May   Register for business pack $20 per month Upgrade based on traffic and data growth




    Web hosts chart

    This application's requirement is
    Details collected from Jan 10th, 2003 to Jan 16th, 2003
    Host Cost
    $ setup / monthly
    Facilities Other Plans Description
    webindia.com 0 / 25
    • Redhat Linux 7.0
    • 50 MB
    • 1 GB transfer per month
    • 100 MB bandwidth
    • OC 48
    • Jdk 1.3 - shared
    • Apache http 1.3.27 - shared
    • Tomcat 4.0.1 (Jsp 1.2) - shared
    • PostgreSql
    • 10 Pop3 id
    • $10 - less space
    • $33 - more space
    • Indian site
    • Have a close friend
    • No control panel
    • No lib like struts, cocoon, log4j
    myservlethosting.com 0 / 60
    • Redhat Linux 7.x
    • 75 MB
    • 4 GB bandwidth
    • 96 MB Ram
    • Jdk 1.3
    • Jsp 1.2
    • Cocoon, log4j, struts
    • Postgresql 7
    • OC 3 connection
    • $25 - 50 MB, shared jvm, engine
    • $90 - more space
     
    mmaweb.net 50 / 50
    • Solaris
    • 200 MB
    • 10 GB monthly data transfer
    • 80 MB Ram
    • Jsp
    • Cocoon
    • Postgresql
    • OC 3 multiple connection
    • $25 - 100 MB, 50 MB Ram, no Postgresql
    • $100 - more space
    • Looks good among Postgresql services
    • Got a positive feedback from Bear Bibeault, ranch-mate at javaranch.com
    verio.com 50 / 200
    • Solaris
    • 400MB
    • Jsp
    • Postgresql
    • Oracle Gateway (?? what is this?)
    Many plans ranging from $25 to $375
    • Number one web host
    • Seems costly
    +250 / +250
    • Oracle 8i license
    +$400 or +$600 for Oracle with more connection and space
    • Have to pay separately for Oracle
    ejip.net 135 / 135
    • 100 MB
    • Jsp 1.2
    • Oracle 8i
    Price varies when u choose different jsp container, db, email type etc
    • $20 pm extra for web mail
    • Many such service are available at additional cost
    • Should have already or buy Oracle 8i production license from them. cost?
    • Free trial account for 20 days
    95 / 95
    • Postgresql 7.2.1
    $105 for MS Sql Server 2k  
    wantjava.com 150 / 75
    • Solaris
    • Jsp 1.1
    • Mysql
       
    +100 / +100
    • Oracle
       
    eqhost.com 50 / 50
    • Linux
    • 150 MB
    • unlimited data transfer
    • Mysql, MS sql
    • Shared OC48 (2.56Gbps) Internet access node
    • Service start as low as $10



    Criteria for a good Webhost

    Here are some points to consider while looking for a webhost.




    Development environment

    The development environment I have in my local machine is as follows :



    Cost estimation

    Investment

    Article Amount
    (in Rs)
    Description
    Books 2000 Got the books
    • Professional JSP
    • Beginning XML
    • Oracle Bible

    Planning to buy couple of books of this nature (bpbpub.com)
    • 81 87105 86 0 Creating Commercial webpages
    • 81 7635 046 X Dynamic Html unleashed
    • 81 7656 271 8 XHTML
    • 81 7656 306 4 Advanced Html with DHtml
    • 81 7656 274 2 Commercial Application
    Printing 1000
    • Letter pad
    • Visiting card
    • ...
    Hard disk ???? Not sure if a hard disk is needed in the local machine.
    If needed, it may cost around 5k for 40GB
    Setup fee 1 month charge Most sites charge a setup fee
    Misc 2000  

    Running cost (per month)

    Article Amount
    (in Rs)
    Description
    Domain name 250  
    Web space 1500 Oracle cost seems very high; around $250 pm.
    Looking for other db like postgresql
    Internet/Power 1500  
    Domain name promotion 2500
    • Cable TV advt
    • Stickers in cafes
    • Notice in newspapers
    Expecting this cost will be cut in 3 to 6 months.
    Salary ???? Expecting people from different fields like :
    • Marketing, sales
    • Data collection
    • Web admin




    Issues

    Hashing

    Ideally, Person class should be a entity bean mapping to one record of the PersonTable. I am thinking about a lazy way to code Person class.
    Instead of having all the field names as an attribute(in a separate variable), we can have one hashtable which will hold all the name, value pairs.

    Advantage:
    Disadvantage:

    A question concerning the field names in the hashtable. Let us say that there are 100 Person objects in the jvm at a particular time. Each object has a hashtable which has almost same set of field names (we can expect all of the objects having many common fields). Take the scenario: there are 15 fields in each hash, out of which 10 are common to all the 100 hashtables. Will the jvm hold 15 * 100 strings (field names) in the memory or will it hold say 15 to 100 strings in the memory and use some indexing mechanism in the hashtable ...


    Finding the relationship

    A person record has three basic relationships : father, mother and a list of spouse.
    We can view the table as a graph structure, each person as a node in the graph.
    Our aim is to traverse through the graph (breadth first search) and find a path between two nodes.


    Instead of viewing as a single graph, it will be easy to picturize them as a collection of well connected graphs. Nodes in a graph has a field to mention which group they belong. This field will help us to take a quick guess on the feasibility of finding a path. However, we take this info just as a suggestion; irrespective of the field suggesting positive or negative about the path, we explore the graph to find the path.

    Is this field essential? Apart from space for such field in each record, we have to code for maintaining the definition. ie, when a new node enters the system, we have to check if different groups can be clubbed together through this new node. Also, deletion of a node should be considered for splitting groups.

    A temp table can be used to do the search. The nodes visited is placed in the temp table. Traversing the graph takes place parallely from both the direction, ie, from person 1 and person 2.

    The idea of navigation is simple. Get the relative's id and put them in the table, until a path is found. Is there any possibility of "heuristic" search?

    The above logic will fetch one path. How can we find other genuine paths between those 2 nodes? ie the new path shouldn't be a slight deviation from the first path; instead it should imply a new relationship between the nodes. Note that some of the nodes may be common between two different paths.



    Client side processing

    Instead of session level record pooling, we can try sending xml files to the client. xsl will be used at client machine to do the presentation in the browser.


    Advantage Disadvantage

    Data entry

    The webserver is expected to hold the main database. When there is a need to enter bulk data, it will be wise (in India) to enter them offline. (Recent trend - in cities, dishnet has come up with a scheme for unlimited direct net access without using phone line. If we have such connection, then we don't have to worry about this cost.)

    For bulk data entry, a local database is provided with facility to enter and upload records.

    The PersonId may be different for a record going from local to main db. The transferring code changes all the PersonId columns in the local db to that of main db and then transfers the records. This will ensure the id consistency between the 2 db. After uploading, the data collector will be prompted to delete the uploaded records (of the local db) or restore them.

    Note that the local db will hold a small fraction of the main db's PersonTable records.

    Important point to remember : only the personTable can be uploaded from local to main db. None of the other table will be uploaded. So, the data collector is not expected to add items to other tables of the local db. Instead, she/he should modify the main database and then synchronize the local db with that of main.

    This logic seems cumbersome. If there is a way to avoid this functionality, I would like to do that. So, I will try to get a unlimited connection (without using phone). Or collect the data in xml format (or which ever format is easy to type) and then try to upload the data to the main db using some automated/semi-automated process.




    House keeping

    Periodical check of the database is required to ensure the validity and correctness of the entered data. This operation can be done in a semi-automatic fashion.



    Choosing the classification

    We have numerous sectors, believes, and sub divisions (religion, caste, subdivisions etc) to which a person belongs.
    Here is a way to display that list and allow the user to choose out of the list.

    The division and subdivision relationship is retrieved and placed in xml files. These files will be used by the end user to choose appropriate belief and sector.

    xml file displays the content in a tree structure. If there is an easier method to display classifications as a tree structure, that could be used.


    Misc



    Resolved issues





    Notes

    Hosted by www.Geocities.ws

    1