Design spec for Genealogy Database
Content
Aim
Functionality
Technology
Design
- Table design
- Class design
Web admin
Cost estimation
Issues
Aim
Main theme: To collect details of ancestors and current generation, so that, genealogy charts will be available online.
Supplementary idea: This database can be utilized for matrimonial search; ie with person info database, complete family background of a girl or a boy can be known.
Website functionality
The functionalities available in this website are listed here :
- Add/edit person records
- Search one or more person
- See the genealogy chart for a family
- Find relationship between 2 person
- Search bride/groom
- Match bride/groom by horoscope
- Bride/groom places a wish list (a list of items they would like to receive as marriage gift)
Technology
- Database holds the data of people and maintains relationship between them.
- Java beans are used to pull the data from the db.
- Jsp is used to present the data and accept user queries.
- Servlets mediate beans and jsps
- xml, xsl (if required)
Design
MVC model is adapted for implementing the web application. Jsp pages are used to display the records and collect data. servlets and beans are used in the back-stage to help this process.
Database table design
The main table, personTable, holds the records of individual. personTable and spouseTable stores the relationship between people.
The basic relationship stored are father, mother and spouse. With these three relations, we should be able to retrieve all other relatives.
matrimonialTable is used to place matrimonial info. The table horoscopeMatchingFactor determines the possibility of match between two horoscope.
50k to 100k people (records) are expected to be available in the personTable.
Here is the person detail -
create table personTable (
personId mediumint not null auto_increment,
name varchar(40),
initial varchar(10),
familiarName varchar(40),
sex enum ('m', 'f'),
birthDate date,
expireDate date,
fatherId mediumInt,
motherId mediumInt,
religionId mediumInt fk classificationTable,
sectorId mediumInt fk classificationTable,
education varchar(100),
profession varchar(100),
street varchar(40),
placeId mediumInt fk placeTable,
native mediumInt fk placeTable,
nationality mediumInt fk placeTable,
homePhone varchar(20),
workPhone varchar(20),
mobile varchar(20),
email varchar(70),
hideEmail enum ('0', '1'),
webpage varchar(100),
photo varchar(100),
bloodgroup enum('A+', 'A-' ...),
maritalStatus enum('single', 'married', 'divorce'),
rasi enum(0 - 12),
nakshatram enum(0-27),
hobbies varchar(200),
description varchar(500),
recordStatus enum('available', 'pending'),
primary key (personId)
);
// do we need a separate table or can we use varray in the personTable to hold list of spouse?
create table spouseTable (
husbandId mediumInt,
wifeId mediumInt,
marriageDate date,
primary key (husbandId, wifeId)
);
create table matrimonialTable (
matriId mediumInt,
personId mediumInt,
height int,
weight int,
complexion enum('fair', 'medium', 'black'),
smoking enum('never', 'sometimes', 'often'),
drinking enum('never', 'sometimes', 'often'),
description varchar(500),
expectation varchar(500),
primary key matriId
);
HoroscopeMatchingFactor determines matching criteria of all possible combinations of rasi and nakshatram. This table will be useful for searching matrimonial table based on good horoscope match for a particular horoscope.
create table HoroscopeMatchingFactor (
girlRasi enum(0-12),
girlNakshatram enum(0-27),
boyRasi enum(0-12),
boyNakshatram enum(0-27),
matchingFactor float,
primary key (girlRasi, girlNakshatram, boyRasi, boyNakshatram)
)
HoroscopeMatchingFactor table can have 2 indexes based on (girlRasi, girlNakshatram) and (boyRasi, boyNakshatram)
As per social norms, person can be classified based on different belief and sectors (religion, caste, sub divisions). These divisions are maintained in an auxiliary table with the following structure.
create table classificationTable (
divisionId mediumInt,
baseDivisionId mediumInt,
description varchar(100),
divisionType enum('belief', 'sector'),
primary key divisionId
);
Two divisionId (one for belief and one for sector) alone will be stored in the person record.
Using this id, other divisions (parent, sub divisions) can be retrieved from the classificationTable.
The city, state, country names are stored in placeTable.
create table placeTable (
placeId mediumInt,
basePlaceId mediumInt,
pincode varchar(10),
placeType enum('country', 'state', 'city', 'town', 'village', 'area'),
primary key placeId
);
Class design
Person is a java bean, which is vital object for all operations.
Person bean is filled with the data from the database.
Most of the search are done with the primary key (personId) lookup.
Instead of having the attributes of a person (like name, addr etc) as separate variables, we can have them in a hashtable. Pros and cons are discussed in the issue section.
Person bean has all the attribute related to a person. Note that zodiac sign and star (rasi and nakshatram) are available in a separate class Horoscope. Person class has Horoscope class.
These classes are accessed from servlets and jsp.
(TODO : yet to plan the jsp, servlets)
Planning to provide SessionManager which will hold a pool of
recently accessed person records.
(TODO : have to sketch other responsibilities of SessionManager)
class Person
{
insertPersonRecord()
{
}
updatePersonRecord()
{
}
deletePersonRecord()
{
}
retrievePersonRecord() // from db
{
}
retrievePersonName()
{
}
formatPersonRecordForDisplay()
{
}
formatPersonNameForDisplay()
{
}
protected int personId;
protected String name;
protected Hashtable personData; // all other info of a person
protected Horoscope horoscope;
}
class Horoscope
{
matchHoroscope(Horoscope otherHoroscope)
{
girlHoroscope = this;
boyHoroscope = otherHoroscope;
for(int i = 0; i < 10; i++)
{
call match[I](girlHoroscope, boyHoroscope);
}
return numberOfMatches, crispDesc, detailDesc;
}
// there r 10 match function.
// Each function returns a float indicating the matching
// criteria. Also these function place their comment into
// crispDesc & detailDesc.
match[0-9](Horoscope girlHoroscope, boyHoroscope)
{
}
private static String rasiName[] = { "Mesham", � };
private static String nakshtramName[] = {"Ashwini", �};
private byte rasi;
private byte nakshatram;
private float numberOfMatches = 0;
private String[] crispDesc, detailDesc;
}
class SessionManager
{
retrievePerson(int personId)
{
Person person = personPool.retrievePerson(personId);
if (person == null) // retrieve from db
person = Person.retrievePerson(personId);
return person;
}
PersonPool personPool;
}
// a container to hold recently accessed person records in this session
class PersonPool
{
retrievePerson(int personId)
{
}
addPerson(Person person)
{
// implement lru algorithm to handle person object
}
}
LogWriter is an utility class to note different log messages generated by various modules.
class LogWriter
{
int message(int messageLevel, int messageType, message)
{
/*
messageLevel is an indicator useful to consider to enter this log or not.
A global debugLevel is set with the number between 0 to 9
If messageLevel falls below debugLevel, then the message is entered (to the db)
messageLevel takes the following meaning :
- Error message
- Warnings
- Sql template
- Sql command
- Session state
*/
}
}
Web admin
Tasks for Web Admin
- Analyze the daily report generated by House keeping module and fix the issues
- Go thru ALL new/edited records and approve them
Activities
| Time Frame |
Duration |
Action |
Cost |
Remarks |
| asap |
|
Register Domain name |
$30 |
register thru dotdnr.com or webhost? |
| First of Feb |
1 month or more |
Use trial period |
|
|
| Mid March |
1 or 2 months |
Register for economy pack |
$7 per month |
|
| Mid May |
|
Register for business pack |
$20 per month |
Upgrade based on traffic and data growth |
Web hosts chart
This application's requirement is
- Jsp 1.2
- Database (Oracle preferable; great with 9i)
Since hosting with Oracle is beyond my budget, considering Postgresql
MySql is not suitable; due to lack of subqueries, triggers, stored proc, transaction ...
- Unix preferable
- 75 to 100MB space
- Extras like xml parser, struts preferred
Details collected from Jan 10th, 2003 to Jan 16th, 2003
| Host |
Cost $ setup / monthly |
Facilities |
Other Plans |
Description |
| webindia.com |
0 / 25 |
- Redhat Linux 7.0
- 50 MB
- 1 GB transfer per month
- 100 MB bandwidth
- OC 48
- Jdk 1.3 - shared
- Apache http 1.3.27 - shared
- Tomcat 4.0.1 (Jsp 1.2) - shared
- PostgreSql
- 10 Pop3 id
|
- $10 - less space
- $33 - more space
|
- Indian site
- Have a close friend
- No control panel
- No lib like struts, cocoon, log4j
|
| myservlethosting.com |
0 / 60 |
- Redhat Linux 7.x
- 75 MB
- 4 GB bandwidth
- 96 MB Ram
- Jdk 1.3
- Jsp 1.2
- Cocoon, log4j, struts
- Postgresql 7
- OC 3 connection
|
- $25 - 50 MB, shared jvm, engine
- $90 - more space
|
|
| mmaweb.net |
50 / 50 |
- Solaris
- 200 MB
- 10 GB monthly data transfer
- 80 MB Ram
- Jsp
- Cocoon
- Postgresql
- OC 3 multiple connection
|
- $25 - 100 MB, 50 MB Ram, no Postgresql
- $100 - more space
|
- Looks good among Postgresql services
- Got a positive feedback from Bear Bibeault, ranch-mate at javaranch.com
|
| verio.com |
50 / 200 |
- Solaris
- 400MB
- Jsp
- Postgresql
- Oracle Gateway (?? what is this?)
|
Many plans ranging from $25 to $375
|
- Number one web host
- Seems costly
|
| +250 / +250 |
|
+$400 or +$600 for Oracle with more connection and space |
- Have to pay separately for Oracle
|
| ejip.net |
135 / 135 |
|
Price varies when u choose different jsp container, db, email type etc
|
- $20 pm extra for web mail
- Many such service are available at additional cost
- Should have already or buy Oracle 8i production license from them. cost?
- Free trial account for 20 days
|
| 95 / 95 |
|
$105 for MS Sql Server 2k |
|
| wantjava.com |
150 / 75 |
|
|
|
| +100 / +100 |
|
|
|
| eqhost.com |
50 / 50 |
- Linux
- 150 MB
- unlimited data transfer
- Mysql, MS sql
- Shared OC48 (2.56Gbps) Internet access node
|
- Service start as low as $10
|
|
Criteria for a good Webhost
Here are some points to consider while looking for a webhost.
- Web space
- File space to hold html, script, class, image files
- Database space
- Runtime memory (How to measure?)
Are these size specification flexible?
- Deploy War file
- ftp, telnet facility
- scalability
- Popularity of the host
- Check the other sites in the host server
- If possible, get feedback from few site owners
- What is the load of the server? How many sites can it handle?
- Accounts
- Number of ip addr (port) provided
- Number of domains and sub domains
- Number of schemas (users?) allowed in the database
- Emails (pop, mime), forwarding, web access
- Backup, recovery procedure
- Security
- Software version
- OS - Unix blend preferable
- Jdk, jsp
- Web server
- Database
- Other api
- Bandwidth
- Traffic volume
- What is the upper limit?
- What happens when the limit is crossed?
- Do we have to pay more for the extra traffic?
- How easy to upgrade to higher options?
- 24/7 hours reliability
- Auditing, statistical analyses of nature and behavior of users
- Technical support
- Toll free number
- Quick response for email
- Online help, user guide
- The big question. Cost
- Setup cost (the site or individual component)
- Any hidden cost?
- Can we get a trial period (1 month) for the web space?
This will be useful to know the capabilities of the site before entering into a long term contract.
Development environment
The development environment I have in my local machine is as follows :
- java 1.3
- Tomcat 4.1.12
- Struts - MVC framework from apache.org
- Oracle 8 - Trying for a recent version (>= Oracle 8i)
- Have to download PostgreSql
- Ant 1.5.1 - a java based build tool from Apache
- Eclipse - a java IDE eclipse.org
- CoffeeCup - a html editor
Cost estimation
Investment |
| Article |
Amount (in Rs) |
Description |
| Books |
2000 |
Got the books
- Professional JSP
- Beginning XML
- Oracle Bible
Planning to buy couple of books of this nature (bpbpub.com)
- 81 87105 86 0 Creating Commercial webpages
- 81 7635 046 X Dynamic Html unleashed
- 81 7656 271 8 XHTML
- 81 7656 306 4 Advanced Html with DHtml
- 81 7656 274 2 Commercial Application
|
| Printing |
1000 |
- Letter pad
- Visiting card
- ...
|
| Hard disk |
???? |
Not sure if a hard disk is needed in the local machine.
If needed, it may cost around 5k for 40GB
|
| Setup fee |
1 month charge |
Most sites charge a setup fee |
| Misc |
2000 |
|
Running cost (per month) |
| Article |
Amount (in Rs) |
Description |
| Domain name |
250 |
|
| Web space |
1500 |
Oracle cost seems very high; around $250 pm.
Looking for other db like postgresql
|
| Internet/Power |
1500 |
|
| Domain name promotion |
2500 |
- Cable TV advt
- Stickers in cafes
- Notice in newspapers
Expecting this cost will be cut in 3 to 6 months.
|
| Salary |
???? |
Expecting people from different fields like :
- Marketing, sales
- Data collection
- Web admin
|
Issues
Hashing
Ideally, Person class should be a entity bean mapping to one record of the PersonTable.
I am thinking about a lazy way to code Person class.
Instead of having all the field names as an attribute(in a separate variable), we can have one hashtable which will hold
all the name, value pairs.
Advantage:
- Ease of coding and maintenance
- Saves lots of space, because expecting most of the fields to be null.
These null fields will not be in the hash.
- Modifying the table will affect only the xsl file
(or any presentation layer)
Disadvantage:
- Accessing the attributes of the Person class will be 2 or 3 times slower than directly accessing a variable.
- Space for the field name stored along with actual value should be considered
A question concerning the field names in the hashtable. Let us say that there are 100 Person
objects in the jvm at a particular time. Each object has a hashtable which has almost same set
of field names (we can expect all of the objects having many common fields).
Take the scenario: there are 15 fields in each hash, out of which 10 are common to all the 100 hashtables.
Will the jvm hold 15 * 100 strings (field names) in the memory or will it hold say 15 to 100 strings
in the memory and use some indexing mechanism in the hashtable ...
Finding the relationship
A person record has three basic relationships : father, mother and a list of spouse.
We can view the table as a graph structure, each person as a node in the graph.
Our aim is to traverse through the graph (breadth first search) and find a path between two nodes.
Instead of viewing as a single graph, it will be easy to picturize them as a collection of well connected graphs. Nodes in a graph has a field to mention which group they belong. This field will help us to take a quick guess on the feasibility of finding a path. However, we take this info just as a suggestion; irrespective of the field suggesting positive or negative about the path, we explore the graph to find the path.
Is this field essential? Apart from space for such field in each record, we have to code for maintaining the definition. ie, when a new node enters the system, we have to check if different groups can be clubbed together through this new node. Also, deletion of a node should be considered for splitting groups.
A temp table can be used to do the search. The nodes visited
is placed in the temp table. Traversing the graph takes place parallely from both the direction, ie, from person 1 and person 2.
- Can we implement this logic in plsql?
- Do we require java stored proc?
- Will these proc handle multi-thread?
- Do we have to use different temp table names?
The idea of navigation is simple. Get the relative's id and put them in the table, until a path is found. Is there any possibility of "heuristic" search?
The above logic will fetch one path. How can we find other genuine paths between those 2 nodes? ie the new path shouldn't be a slight deviation from the first path; instead it should imply a new relationship between the nodes. Note that some of the nodes may be common between two different paths.
Client side processing
Instead of session level record pooling, we can try sending xml files to the client. xsl will be used at client machine to do the presentation in the browser.
Advantage
-
server has less work of formatting the presentation. The load is shifted to client which handles the xsl file.
-
Once the xml doc (file?) is available to the client, that particular document (record) can be reused and displayed with different xsl files. In my application, I thought of designing a record pooling mechanism at session level, so that recently accessed records will be available immediately without the need to refer the db. But the same mechanism will be available if recent xml doc will be available at client machine and there is a mechanism to reuse them.
Disadvantage
- Overhead cost of converting data into xml doc etc. Read in the book that databases can produce pure xml result sets. Have to explore that.
- Browser support for xml, xsl. Have to study different browsers� (versions) functionality and mention which are the browsers suitable for the site.
- Major disadvantage : When a record is displayed in a page, reference to more than 3 records are given as a link. In order to display the text for that link, there should be as many client request (for those xml doc) as there r links. This repeated client to server transaction will not occur if the jsp formats the presentation instead of client side xsl.
Data entry
The webserver is expected to hold the main database.
When there is a need to enter bulk data, it will be wise (in India) to enter them offline. (Recent trend - in cities, dishnet has come up with a scheme for unlimited direct net access without using phone line. If we have such connection, then we don't have to worry about this cost.)
For bulk data entry, a local database is provided with facility to enter and upload records.
The PersonId may be different for a record going from local to main db. The transferring code changes all the PersonId columns in the local db to that of main db and then transfers the records. This will ensure the id consistency between the 2 db. After uploading, the data collector will be prompted to delete the uploaded records (of the local db) or restore them.
Note that the local db will hold a small fraction of the main db's PersonTable records.
Important point to remember : only the personTable can be uploaded from local to main db. None of the other table will be uploaded. So, the data collector is not expected to add items to other tables of the local db. Instead, she/he should modify the main database and then synchronize the local db with that of main.
This logic seems cumbersome. If there is a way to avoid this
functionality, I would like to do that. So, I will try to
get a unlimited connection (without using phone). Or collect
the data in xml format (or which ever format is easy to type) and then try to upload the data to the main db using some
automated/semi-automated process.
House keeping
Periodical check of the database is required to ensure the validity and correctness of the entered data. This operation can be done in a semi-automatic fashion.
- For each new/edited person record, see if there is a similar record
- Send a report about new records and other relevant details to the web admin
Choosing the classification
We have numerous sectors, believes, and sub divisions (religion, caste, subdivisions etc) to which a person belongs.
Here is a way to display that list and allow the user to choose out of the list.
The division and subdivision relationship is retrieved and placed in xml files. These files will be used by the end user to choose appropriate belief and sector.
xml file displays the content in a tree structure. If there is an easier method to display classifications as a tree structure, that could be used.
Misc
- In the personTable, can I use varray to hold spouse Ids? This will avoid a separate spouseTable.
- What is the webspace we need for this application?
- Is it legal, ethical to mention dowry range, so that matrimonial search can be done based on it?
Resolved issues
- What size should the personId be? (mediumInt or longInt ?)
- In the personTable, I am expecting most of the fields to be null for most of the records (fields like webpage, email, addr etc). Will storage be utilization be optimal in such case or is it advisable to have two tables (first table with vital info & auxiliary table with sparsial data)?
- Ram: There will be minor a difference if you split the tables and the aux table will not have a row if all the cols in it are null. But it is not worth splitting the table. Go ahead with one if you expect only 50k to 100k rows.
- Will the table be indexed on primary key automatically or do we have to do that?
- Ram: Table will be indexed on PK automatically. But you may want to add additional indexes based on the queried cols. e.g. Religion, age etc. this will improve the performance.
- What is index-only table?
- Ram: IOTs are special kind of tables where data and index entries are stored in the same segment. (Normal tables will have different data and index segments). IOTs improve query performance. Generally used in look up tables which does not involve high dml activity but get large number of queries. If you have many dmls on this table the performance may get affected.
Notes
- HashMap seems to be suitable for holding the person detail
- LinkedList is ideal to implement lru algorithm for person pool