Design spec for Genealogy Database

Version 1.3
Date 1st Dec 02
Author Jeya Balaji
What is new Most of the sections went through considerable changes.
Earlier version genealogy12.html
Recent version genealogy14.html



Content

  1. Aim
  2. Functionality
  3. Technology
  4. Design
    1. Table design
    2. Class design
  5. Web admin
  6. Cost estimation
  7. Issues



Aim

Main theme: To collect details of ancestors and current generation, so that, genealogy charts will be available online.

Supplementary idea: This database can be utilized for matrimonial search; ie with person info database, complete family background of a girl or a boy can be known.


Website functionality

The functionalities available in this website are listed here :


Technology




Design

MVC model is adapted for implementing the web application. Jsp pages are used to display the records and collect data. servlets and beans are used in the back-stage to help this process.


Database table design

The main table, personTable, holds the records of individual. personTable and spouseTable stores the relationship between people.
The basic relationship stored are father, mother and spouse. With these three relations, we should be able to retrieve all other relatives.

matrimonialTable is used to place matrimonial info. The table horoscopeMatchingFactor determines the possibility of match between two horoscope.


50k to 100k people (records) are expected to be available in the personTable.
Here is the person detail -

create table personTable (
	personId     mediumint not null auto_increment,
	name         varchar(40),
	initial      varchar(10),
	familiarName varchar(40),
	sex          enum ('m', 'f'),
	birthDate    date,
	expireDate   date,

	fatherId     mediumInt,
	motherId     mediumInt,

	religionId   mediumInt   fk classificationTable,
	sectorId     mediumInt   fk classificationTable,		 

	education    varchar(100),
	profession   varchar(100),

	street       varchar(40),
	placeId      mediumInt   fk placeTable,
	native       mediumInt   fk placeTable,

	homePhone    varchar(20),
	workPhone    varchar(20),
	mobile       varchar(20),


	email        varchar(70),
	hideEmail    enum ('0', '1'),
	webpage      varchar(100),
	photo        varchar(100),

	bloodgroup   enum('A+', 'A-' ...),
      
	maritalStatus enum('single', 'married', 'divorce'),
	rasi          enum(0 - 12),
	nakshatram    enum(0-27),


	hobbies      varchar(200),
	description  varchar(500),

	recordStatus enum('available', 'pending'),
     
	primary key (personId)
);

// do we need a separate table or can we use varray in the personTable to hold list of spouse?
create table spouseTable (
	husbandId    mediumInt,
	wifeId       mediumInt,
	marriageDate date,

	primary key (husbandId, wifeId)
);

create table matrimonialTable (
	matriId     mediumInt,
	personId    mediumInt,
	
	height      int,
	weight      int,
	
	complexion  enum('fair', 'medium', 'black'),
	smoking     enum('never', 'sometimes', 'often'),
	drinking    enum('never', 'sometimes', 'often'),
	
	description varchar(500),
	expectation varchar(500),

	primary key matriId
);




HoroscopeMatchingFactor determines matching criteria of all possible combinations of rasi and nakshatram. This table will be useful for searching matrimonial table based on good horoscope match for a particular horoscope.
create table HoroscopeMatchingFactor (
	girlRasi          enum(0-12),
	girlNakshatram    enum(0-27),
	boyRasi           enum(0-12),
	boyNakshatram     enum(0-27),
	matchingFactor    float,

	primary key   (girlRasi, girlNakshatram, boyRasi, boyNakshatram)
)
HoroscopeMatchingFactor table can have 2 indexes based on (girlRasi, girlNakshatram) and (boyRasi, boyNakshatram)


As per social norms, person can be classified based on different belief and sectors (religion, caste, sub divisions). These divisions are maintained in an auxiliary table with the following structure.
create table classificationTable (
	divisionId      mediumInt,
	baseDivisionId  mediumInt,
	description     varchar(100),
	divisionType    enum('belief', 'sector'),

	primary key divisionId
);
Two divisionId (one for belief and one for sector) alone will be stored in the person record. Using this id, other divisions (parent, sub divisions) can be retrieved from the classificationTable.


The city, state, country names are stored in placeTable.
create table placeTable (
	placeId         mediumInt,
	basePlaceId     mediumInt,
	pincode         varchar(10),
	placeType       enum('country', 'state', 'city', 'town', 'village', 'area'),

	primary key placeId
);




Class design

Person is a java bean, which is vital object for all operations. Person bean is filled with the data from the database.
Most of the search are done with the primary key (personId) lookup.

Instead of having the attributes of a person (like name, addr etc) as separate variables, we can have them in a hashtable. Pros and cons are discussed in the issue section.

Person bean has all the attribute related to a person. Note that zodiac sign and star (rasi and nakshatram) are available in a separate class Horoscope. Person class has Horoscope class.

These classes are accessed from servlets and jsp.
(TODO : yet to plan the jsp, servlets)

Planning to provide SessionManager which will hold a pool of recently accessed person records.
(TODO : have to sketch other responsibilities of SessionManager)


class Person
{
	insertPersonRecord()
	{ 
	}
	
	updatePersonRecord()
	{
	}

	deletePersonRecord()
	{

	}

		
	retrievePersonRecord()   // from db
	{

	}

	retrievePersonName()
	{

	}

	formatPersonRecordForDisplay()
	{

	}

	formatPersonNameForDisplay()
	{

	}

	protected int personId;
	protected String name;

	protected Hashtable personData;  // all other info of a person

	protected Horoscope horoscope;
}



class Horoscope
{
	

	matchHoroscope(Horoscope otherHoroscope)
	{
		girlHoroscope = this;
		boyHoroscope = otherHoroscope;
		for(int i = 0; i < 10; i++)
		{
			call match[I](girlHoroscope, boyHoroscope);
		}

		return numberOfMatches, crispDesc, detailDesc;
	}

	// there r 10 match function.  
	// Each function returns a float indicating the matching 
	// criteria.  Also these function place their comment into  
	// crispDesc & detailDesc.
	match[0-9](Horoscope girlHoroscope, boyHoroscope)
	{

	}

	private static String rasiName[] = { "Mesham", � };
	private static String nakshtramName[] = {"Ashwini", �};

	private byte rasi;
	private byte nakshatram;

	private float numberOfMatches = 0;
	private String[] crispDesc, detailDesc;
}


class SessionManager
{
 	retrievePerson(int personId)
	{
		Person person = personPool.retrievePerson(personId);
	  	if (person == null)  // retrieve from db
	  	 	person = Person.retrievePerson(personId);
	  	return person;
	}
	  
	PersonPool personPool;
}


// a container to hold recently accessed person records in this session
class PersonPool
{
	retrievePerson(int personId)
	{
	}
	 
	addPerson(Person person)
	{
	  	// implement lru algorithm to handle person object
	}
}






Web admin

Tasks for Web Admin

Criteria for a good Webhost

Here are some points to consider while looking for a webhost.



Development environment

The development environment I have in my local machine is as follows :



Cost estimation

Investment

Article Amount
(in Rs)
Description
Books 2000 Got the books
  • Professional JSP
  • Beginning XML
  • Oracle Bible

Planning to buy couple of books of this nature (bpbpub.com)
  • 81 87105 86 0 Creating Commercial webpages
  • 81 7635 046 X Dynamic Html unleashed
  • 81 7656 271 8 XHTML
  • 81 7656 306 4 Advanced Html with DHtml
  • 81 7656 274 2 Commercial Application
Printing 1000
  • Letter pad
  • Visiting card
  • ...
Hard disk ???? Not sure if a hard disk is needed in the local machine.
If needed, it may cost around 5k for 40GB
Misc 2000  

Running cost (per month)

Article Amount
(in Rs)
Description
Domain name 250  
Web space 1500  
Internet/Power 1500  
Domain name promotion 2500
  • Cable TV advt
  • Stickers in cafes
  • Notice in newspapers
Expecting this cost will be cut in 3 to 6 months.
Salary ???? Expecting people from different fields like :
  • Marketing, sales
  • Data collection
  • Web admin




Issues

Hashing

Ideally, Person class should be a entity bean mapping to one record of the PersonTable. I am thinking about a lazy way to code Person class.
Instead of having all the field names as an attribute(in a separate variable), we can have one hashtable which will hold all the name, value pairs.

Advantage:
Disadvantage:

A question concerning the field names in the hashtable. Let us say that there are 100 Person objects in the jvm at a particular time. Each object has a hashtable which has almost same set of field names (we can expect all of the objects having many common fields). Take the scenario: there are 15 fields in each hash, out of which 10 are common to all the 100 hashtables. Will the jvm hold 15 * 100 strings (field names) in the memory or will it hold say 15 to 100 strings in the memory and use some indexing mechanism in the hashtable ...


Finding the relationship

A person record has three basic relationships : father, mother and a list of spouse.
We can view the table as a graph structure, each person as a node in the graph.
Our aim is to traverse through the graph (breadth first search) and find a path between two nodes.


Instead of viewing as a single graph, it will be easy to picturize them as a collection of well connected graphs. Nodes in a graph has a field to mention which group they belong. This field will help us to take a quick guess on the feasibility of finding a path. However, we take this info just as a suggestion; irrespective of the field suggesting positive or negative about the path, we explore the graph to find the path.

Is this field essential? Apart from space for such field in each record, we have to code for maintaining the definition. ie, when a new node enters the system, we have to check if different groups can be clubbed together through this new node. Also, deletion of a node should be considered for splitting groups.

A temp table can be used to do the search. The nodes visited is placed in the temp table. Traversing the graph takes place parallely from both the direction, ie, from person 1 and person 2.

The idea of navigation is simple. Get the relative's id and put them in the table, until a path is found. Is there any possibility of "heuristic" search?

The above logic will fetch one path. How can we find other genuine paths between those 2 nodes? ie the new path shouldn't be a slight deviation from the first path; instead it should imply a new relationship between the nodes. Note that some of the nodes may be common between two different paths.



Data entry

The webserver is expected to hold the main database. When there is a need to enter bulk data, it will be wise (in India) to enter them offline. (Recent trend - in cities, dishnet has come up with a scheme for unlimited direct net access without using phone line. If we have such connection, then we don't have to worry about this cost.)

For bulk data entry, a local database is provided with facility to enter and upload records.

The PersonId may be different for a record going from local to main db. The transferring code changes all the PersonId columns in the local db to that of main db and then transfers the records. This will ensure the id consistency between the 2 db. After uploading, the data collector will be prompted to delete the uploaded records (of the local db) or restore them.

Note that the local db will hold a small fraction of the main db's PersonTable records.

Important point to remember : only the personTable can be uploaded from local to main db. None of the other table will be uploaded. So, the data collector is not expected to add items to other tables of the local db. Instead, she/he should modify the main database and then synchronize the local db with that of main.

This logic seems cumbersome. If there is a way to avoid this functionality, I would like to do that. So, I will try to get a unlimited connection (without using phone). Or collect the data in xml format (or which ever format is easy to type) and then try to upload the data to the main db using some automated/semi-automated process.




House keeping

Periodical check of the database is required to ensure the validity and correctness of the entered data. This operation can be done in a semi-automatic fashion.



Choosing the classification

We have numerous sectors, believes, and sub divisions (religion, caste, subdivisions etc) to which a person belongs.
Here is a way to display that list and allow the user to choose out of the list.

The division and subdivision relationship is retrieved and placed in xml files. These files will be used by the end user to choose appropriate belief and sector.

xml file displays the content in a tree structure. If there is an easier method to display classifications as a tree structure, that could be used.


Misc



Resolved issues

Hosted by www.Geocities.ws

1