Design spec for Genealogy Database
| Version
| 1.3 |
| Date |
1st Dec 02 |
| Author |
Jeya Balaji |
| What is new |
Most of the sections went through considerable changes.
|
| Earlier version |
genealogy12.html |
| Recent version |
genealogy14.html |
Content
- Aim
- Functionality
- Technology
- Design
- Table design
- Class design
- Web admin
- Cost estimation
- Issues
Aim
Main theme: To collect details of ancestors and current generation, so that, genealogy charts will be available online.
Supplementary idea: This database can be utilized for matrimonial search; ie with person info database, complete family background of a girl or a boy can be known.
Website functionality
The functionalities available in this website are listed here :
- Add/edit person records
- Search one or more person
- See the genealogy chart for a family
- Find relationship between 2 person
- Search bride/groom
- Match bride/groom by horoscope
Technology
- Oracle database holds the data of people and maintains relationship between them.
- Java beans are used to pull the data from the db.
- Jsp is used to present the data and accept user queries.
- Servlets mediate beans and jsps
- xml, xsl (if required)
Design
MVC model is adapted for implementing the web application. Jsp pages are used to display the records and collect data. servlets and beans are used in the back-stage to help this process.
Database table design
The main table, personTable, holds the records of individual. personTable and spouseTable stores the relationship between people.
The basic relationship stored are father, mother and spouse. With these three relations, we should be able to retrieve all other relatives.
matrimonialTable is used to place matrimonial info. The table horoscopeMatchingFactor determines the possibility of match between two horoscope.
50k to 100k people (records) are expected to be available in the personTable.
Here is the person detail -
create table personTable (
personId mediumint not null auto_increment,
name varchar(40),
initial varchar(10),
familiarName varchar(40),
sex enum ('m', 'f'),
birthDate date,
expireDate date,
fatherId mediumInt,
motherId mediumInt,
religionId mediumInt fk classificationTable,
sectorId mediumInt fk classificationTable,
education varchar(100),
profession varchar(100),
street varchar(40),
placeId mediumInt fk placeTable,
native mediumInt fk placeTable,
homePhone varchar(20),
workPhone varchar(20),
mobile varchar(20),
email varchar(70),
hideEmail enum ('0', '1'),
webpage varchar(100),
photo varchar(100),
bloodgroup enum('A+', 'A-' ...),
maritalStatus enum('single', 'married', 'divorce'),
rasi enum(0 - 12),
nakshatram enum(0-27),
hobbies varchar(200),
description varchar(500),
recordStatus enum('available', 'pending'),
primary key (personId)
);
// do we need a separate table or can we use varray in the personTable to hold list of spouse?
create table spouseTable (
husbandId mediumInt,
wifeId mediumInt,
marriageDate date,
primary key (husbandId, wifeId)
);
create table matrimonialTable (
matriId mediumInt,
personId mediumInt,
height int,
weight int,
complexion enum('fair', 'medium', 'black'),
smoking enum('never', 'sometimes', 'often'),
drinking enum('never', 'sometimes', 'often'),
description varchar(500),
expectation varchar(500),
primary key matriId
);
HoroscopeMatchingFactor determines matching criteria of all possible combinations of rasi and nakshatram. This table will be useful for searching matrimonial table based on good horoscope match for a particular horoscope.
create table HoroscopeMatchingFactor (
girlRasi enum(0-12),
girlNakshatram enum(0-27),
boyRasi enum(0-12),
boyNakshatram enum(0-27),
matchingFactor float,
primary key (girlRasi, girlNakshatram, boyRasi, boyNakshatram)
)
HoroscopeMatchingFactor table can have 2 indexes based on (girlRasi, girlNakshatram) and (boyRasi, boyNakshatram)
As per social norms, person can be classified based on different belief and sectors (religion, caste, sub divisions). These divisions are maintained in an auxiliary table with the following structure.
create table classificationTable (
divisionId mediumInt,
baseDivisionId mediumInt,
description varchar(100),
divisionType enum('belief', 'sector'),
primary key divisionId
);
Two divisionId (one for belief and one for sector) alone will be stored in the person record.
Using this id, other divisions (parent, sub divisions) can be retrieved from the classificationTable.
The city, state, country names are stored in placeTable.
create table placeTable (
placeId mediumInt,
basePlaceId mediumInt,
pincode varchar(10),
placeType enum('country', 'state', 'city', 'town', 'village', 'area'),
primary key placeId
);
Class design
Person is a java bean, which is vital object for all operations.
Person bean is filled with the data from the database.
Most of the search are done with the primary key (personId) lookup.
Instead of having the attributes of a person (like name, addr etc) as separate variables, we can have them in a hashtable. Pros and cons are discussed in the issue section.
Person bean has all the attribute related to a person. Note that zodiac sign and star (rasi and nakshatram) are available in a separate class Horoscope. Person class has Horoscope class.
These classes are accessed from servlets and jsp.
(TODO : yet to plan the jsp, servlets)
Planning to provide SessionManager which will hold a pool of
recently accessed person records.
(TODO : have to sketch other responsibilities of SessionManager)
class Person
{
insertPersonRecord()
{
}
updatePersonRecord()
{
}
deletePersonRecord()
{
}
retrievePersonRecord() // from db
{
}
retrievePersonName()
{
}
formatPersonRecordForDisplay()
{
}
formatPersonNameForDisplay()
{
}
protected int personId;
protected String name;
protected Hashtable personData; // all other info of a person
protected Horoscope horoscope;
}
class Horoscope
{
matchHoroscope(Horoscope otherHoroscope)
{
girlHoroscope = this;
boyHoroscope = otherHoroscope;
for(int i = 0; i < 10; i++)
{
call match[I](girlHoroscope, boyHoroscope);
}
return numberOfMatches, crispDesc, detailDesc;
}
// there r 10 match function.
// Each function returns a float indicating the matching
// criteria. Also these function place their comment into
// crispDesc & detailDesc.
match[0-9](Horoscope girlHoroscope, boyHoroscope)
{
}
private static String rasiName[] = { "Mesham", � };
private static String nakshtramName[] = {"Ashwini", �};
private byte rasi;
private byte nakshatram;
private float numberOfMatches = 0;
private String[] crispDesc, detailDesc;
}
class SessionManager
{
retrievePerson(int personId)
{
Person person = personPool.retrievePerson(personId);
if (person == null) // retrieve from db
person = Person.retrievePerson(personId);
return person;
}
PersonPool personPool;
}
// a container to hold recently accessed person records in this session
class PersonPool
{
retrievePerson(int personId)
{
}
addPerson(Person person)
{
// implement lru algorithm to handle person object
}
}
Web admin
Tasks for Web Admin
- Analyze the daily report generated by House keeping module and fix the issues
- Go thru ALL new/edited records and approve them
Criteria for a good Webhost
Here are some points to consider while looking for a webhost.
- Web space
- File space to hold html, script, class files
- Database space
- Runtime memory
Are these size specification flexible?
- Number of ip addr (port) provided
- Number of schemas (users?) allowed in the database
- Backup, recovery procedure
- Software version
- Jdk, jsp
- Web server
- Database
- Other api
- Bandwidth, scalability
- Statistical analyses of nature and behavior of users
Development environment
The development environment I have in my local machine is as follows :
- java 1.3
- Tomcat 4.0
- Oracle 8 - Trying for a recent version (>= Oracle 8i)
- Ant 1.5.1 - a java based build tool from Apache
- JCreater - a java IDE from Xinox
- CoffeeCup - a html editor
- Winston Veerender, a friend, suggested Eclipse from www.eclipse.org as a java IDE; Will try that if it is more suitable than JCreater for this application.
Cost estimation
Investment |
| Article |
Amount (in Rs) |
Description |
| Books |
2000 |
Got the books
- Professional JSP
- Beginning XML
- Oracle Bible
Planning to buy couple of books of this nature (bpbpub.com)
- 81 87105 86 0 Creating Commercial webpages
- 81 7635 046 X Dynamic Html unleashed
- 81 7656 271 8 XHTML
- 81 7656 306 4 Advanced Html with DHtml
- 81 7656 274 2 Commercial Application
|
| Printing |
1000 |
- Letter pad
- Visiting card
- ...
|
| Hard disk |
???? |
Not sure if a hard disk is needed in the local machine.
If needed, it may cost around 5k for 40GB
|
| Misc |
2000 |
|
Running cost (per month) |
| Article |
Amount (in Rs) |
Description |
| Domain name |
250 |
|
| Web space |
1500 |
|
| Internet/Power |
1500 |
|
| Domain name promotion |
2500 |
- Cable TV advt
- Stickers in cafes
- Notice in newspapers
Expecting this cost will be cut in 3 to 6 months.
|
| Salary |
???? |
Expecting people from different fields like :
- Marketing, sales
- Data collection
- Web admin
|
Issues
Hashing
Ideally, Person class should be a entity bean mapping to one record of the PersonTable.
I am thinking about a lazy way to code Person class.
Instead of having all the field names as an attribute(in a separate variable), we can have one hashtable which will hold
all the name, value pairs.
Advantage:
- Ease of coding and maintenance
- Saves lots of space, because expecting most of the fields to be null.
These null fields will not be in the hash.
- Modifying the table will affect only the xsl file
(or any presentation layer)
Disadvantage:
- Accessing the attributes of the Person class will be 2 or 3 times slower than directly accessing a variable.
- Space for the field name stored along with actual value should be considered
A question concerning the field names in the hashtable. Let us say that there are 100 Person
objects in the jvm at a particular time. Each object has a hashtable which has almost same set
of field names (we can expect all of the objects having many common fields).
Take the scenario: there are 15 fields in each hash, out of which 10 are common to all the 100 hashtables.
Will the jvm hold 15 * 100 strings (field names) in the memory or will it hold say 15 to 100 strings
in the memory and use some indexing mechanism in the hashtable ...
Finding the relationship
A person record has three basic relationships : father, mother and a list of spouse.
We can view the table as a graph structure, each person as a node in the graph.
Our aim is to traverse through the graph (breadth first search) and find a path between two nodes.
Instead of viewing as a single graph, it will be easy to picturize them as a collection of well connected graphs. Nodes in a graph has a field to mention which group they belong. This field will help us to take a quick guess on the feasibility of finding a path. However, we take this info just as a suggestion; irrespective of the field suggesting positive or negative about the path, we explore the graph to find the path.
Is this field essential? Apart from space for such field in each record, we have to code for maintaining the definition. ie, when a new node enters the system, we have to check if different groups can be clubbed together through this new node. Also, deletion of a node should be considered for splitting groups.
A temp table can be used to do the search. The nodes visited
is placed in the temp table. Traversing the graph takes place parallely from both the direction, ie, from person 1 and person 2.
- Can we implement this logic in plsql?
- Do we require java stored proc?
- Will these proc handle multi-thread?
- Do we have to use different temp table names?
The idea of navigation is simple. Get the relative's id and put them in the table, until a path is found. Is there any possibility of "heuristic" search?
The above logic will fetch one path. How can we find other genuine paths between those 2 nodes? ie the new path shouldn't be a slight deviation from the first path; instead it should imply a new relationship between the nodes. Note that some of the nodes may be common between two different paths.
Data entry
The webserver is expected to hold the main database.
When there is a need to enter bulk data, it will be wise (in India) to enter them offline. (Recent trend - in cities, dishnet has come up with a scheme for unlimited direct net access without using phone line. If we have such connection, then we don't have to worry about this cost.)
For bulk data entry, a local database is provided with facility to enter and upload records.
The PersonId may be different for a record going from local to main db. The transferring code changes all the PersonId columns in the local db to that of main db and then transfers the records. This will ensure the id consistency between the 2 db. After uploading, the data collector will be prompted to delete the uploaded records (of the local db) or restore them.
Note that the local db will hold a small fraction of the main db's PersonTable records.
Important point to remember : only the personTable can be uploaded from local to main db. None of the other table will be uploaded. So, the data collector is not expected to add items to other tables of the local db. Instead, she/he should modify the main database and then synchronize the local db with that of main.
This logic seems cumbersome. If there is a way to avoid this
functionality, I would like to do that. So, I will try to
get a unlimited connection (without using phone). Or collect
the data in xml format (or which ever format is easy to type) and then try to upload the data to the main db using some
automated/semi-automated process.
House keeping
Periodical check of the database is required to ensure the validity and correctness of the entered data. This operation can be done in a semi-automatic fashion.
- For each new/edited person record, see if there is a similar record
- Send a report about new records and other relevant details to the web admin
Choosing the classification
We have numerous sectors, believes, and sub divisions (religion, caste, subdivisions etc) to which a person belongs.
Here is a way to display that list and allow the user to choose out of the list.
The division and subdivision relationship is retrieved and placed in xml files. These files will be used by the end user to choose appropriate belief and sector.
xml file displays the content in a tree structure. If there is an easier method to display classifications as a tree structure, that could be used.
Misc
- In the personTable, can I use varray to hold spouse Ids? This will avoid a separate spouseTable.
- What is the webspace we need for this application?
- Is it legal, ethical to mention dowry range, so that matrimonial search can be done based on it?
Resolved issues
- What size should the personId be? (mediumInt or longInt ?)
- In the personTable, I am expecting most of the fields to be null for most of the records (fields like webpage, email, addr etc). Will storage be utilization be optimal in such case or is it advisable to have two tables (first table with vital info & auxiliary table with sparsial data)?
- Ram: There will be minor a difference if you split the tables and the aux table will not have a row if all the cols in it are null. But it is not worth splitting the table. Go ahead with one if you expect only 50k to 100k rows.
- Will the table be indexed on primary key automatically or do we have to do that?
- Ram: Table will be indexed on PK automatically. But you may want to add additional indexes based on the queried cols. e.g. Religion, age etc. this will improve the performance.
- What is index-only table?
- Ram: IOTs are special kind of tables where data and index entries are stored in the same segment. (Normal tables will have different data and index segments). IOTs improve query performance. Generally used in look up tables which does not involve high dml activity but get large number of queries. If you have many dmls on this table the performance may get affected.