Presents your SQL SERVER E-NEWSLETTER for December 9, 2003 <-------------------------------------------> A CASE FOR TAKING YOUR DATABASE DESIGN TO 4NF Most normalization texts describe First Normal Form (1NF), Second Normal Form (2NF), and Third Normal Form (3NF), and only a few texts go beyond this level to Boyce-Codd Normal Form (BCNF) and Fourth Normal Form (4NF). Almost every text that discusses normalization suggests that 3NF is enough--I disagree, and I'll explain why. Let's start by defining BCNF and 4NF. BCNF A relation is in BCNF if every determinant is a candidate key; that is, if every attribute is unique. A table in 3NF is usually also in BCNF. An example of a table that isn't in BCNF is one that contains interdependent attributes. This means that information is being provided that is unnecessary. Here's an example: CREATE TABLE Enrollments EnrollmentID int IDENTITY( 1, 1), CourseID int, CourseName varchar(100), StudentID int, StudentName varchar(100), EnrollmentDate DateTime In this example, the CourseName column depends on the CourseID, and, similarly, the StudentName column depends on the StudentID. This is fixed by turning the single table into three tables: Students, Courses, and Enrollments. 4NF In order to be in 4NF, a table must be in BCNF. In addition, only one column may be multi-valued. There are numerous examples of tables that aren't in 4NF. Here's an example: CREATE TABLE Addresses AddressID int IDENTITY( 1, 1), StreetAddress varchar(100), CityID int, StateID int, CountryID int We'll look at what's wrong with this design. Assume that the last three columns reference the tables Cities, States, and Countries. In this design, it becomes possible to select the following items (resolving the foreign keys): StreetAddress = '123 Main Street' CityID = "London" StateID = "New York" CountryID = "Tanzania" There are a number of cities called London in the world, but none of them are in New York, and New York isn't in Tanzania. You can prevent such nonsensical data either by writing a lot of front-end code to handle the related lookups or by taking the design to 4NF. This would mean moving the CountryID column out of the Addresses table and into the States table, and moving the StateID column out of the Addresses table and into the Cities table. To make it clear to your users which city they're selecting, you might write a query that concatenates the Name columns from all three tables, so that a given choice looks like this: London, Ontario, Canada Taking your database design to 4NF may seem like more effort than it's worth, but I've seen countless examples of nonsense data entered into tables that are merely 3NF. Depending on the application, this failure can be costly. Arthur Fuller has been developing database applications for 20 years. His experience includes Access ADPs, Microsoft SQL 2000, MySQL, and .NET. ----------------------------------------