Handout 13

Functional Dependencies and Normalization*


1 Informal Design Guidelines for Relational Databases

What is relational database design?

The formal concepts of functional dependencies and normal forms

1.1 Semantics of the Relation Attributes

Informally , each tuple should represent one entity or relationship instance.

1.2 Redundant information in Tuples And Update Anomalies

Mixing attributes of multiple entities may cause problems

Anomaly Examples

Relational database theory is interested in eliminating update anomalies from a database.

For example,

Student_Activity

SID       Activity       Fee

100       Skiing         200

150       Swimming       50

175       Squash         50

200       Swimming       50

Insertion Anomalies

Assume we would like to add a new activity: Softball costs $25. The problem is that in the above database we cannot add softball until someone signs up for softball. This is called an insertion anomaly.

Deletion Anomalies

Now assume that SID=175 drops out of the university so this row is deleted from the database. The problem is that we have now lost information about squash, also. This is called a deletion anomaly.

Modification Anomalies

Finally, assume that the price of swimming increases to $60. In this case, two tuples must be updated. It is possible to accidentally update only one tuple; then we would have inconsistency in the database. This is called a modification anomaly.

Better Representation

A better representation is:

Student

SID       Activity
100       Skiing
150       Swimming
175       Squash
200       Swimming

ActivityFees

Activity  Fee
Skiing    200
Swimming  50
Squash    50

We can do a natural join to get the original table back, plus we don't have the insertion, deletion and modification anomalies.

1.3 Null Values in Tuples

1.4 Spurious Tuples


2 Functional Dependencies

2.1 Definition of FD

Examples of FD constraints:

SSN -> ENAME

PNUMBER -> {PNAME , PLOCATION}

{SSN , PNUMBER} -> HOURS

An FD is a property of the attributes in the schema R

The constraint must hold on every relation instance r(R)


3 Normilization

3.1 Introduction to Normalization

3.2 First Normal Form

An Informal definition - First normal form requires that every datum (Table/Row/Column entry) must be "atomic". That is, as far as the database is concerned, it must not be a collection of items. The fact that the datum might have structure in some context outside the database is irrelevant. It can sometimes be tough to decide if a particular database design violate first normal form. Consider a database of literature references. An obvious decision is to have one field to contain the title of the reference. It is likely that you will want to search the database for individual words in that title. Does that mean that this is not an atomic value, because you are interested in parts of the title (words)? Most people would say "no", but why this is true is difficult to justify.

Unlike Second and Higher Normal Forms, which rest on formal analysis of a schema, enforcement of First Normal Form ought to be built in to a DBMS package. When it isn't, the enforcement must come from self-discipline on the part of the Database Designer. In this sense, First Normal Form is similar to the rules that require Primary Keys to be Unique and Not Null, and Secondary Keys to have a non-Null value in the Table to which they refer.

1-NF Example


3.3 Second Normal Form

Definition:

Examples:

An Informal definition - Consider a table with a compound primary key; e.g. a primary key that is made up by a comination of two or more columns. A database is in second normal form if no (non-key) column depends on part of the primary key; that you always need all of the primary key to specify the contents of all other columns.

Example 2-NF


3.4 Third Normal Form

Definition:

Y -> X and X -> Z

Examples:

An Informal definition - A database is in the third normal form if no column depends on any non-key column.

Example 3-NF


BCNF ( Boyce-Codd Normal Form)

Relations are in third normal form and there are no Boyce-Codd dependencies. A Boyce-Codd dependency is where a (part of the) key is dependent on a non-key attribute.

Example - BCNF

Suppose we want to keep track of our inventory based on location and product and we also want to keep track of which salespeople are selling which products and where they are located. Consider the following table:

 
LOCATION PRODUCT # SALESPERSON QTY REMAINING
NORTH A1

111

100

NORTH B1

222

500

SOUTH A1

333

60

SOUTH B1

333

50

SOUTH C1

444

150

NORTH C1

111

2000

Is the table in Third Normal Form? Yes, it is. All of the non-key attributes are dependent on the entire primary key. Are there any data anomalies associated with this table?

In order to solve the problems associated with this table we need to break it up. Begin by performing a dependency analysis. This analysis shows the following dependencies:

LOCATION, PRODUCT # -> SALESPERSON, QTY_REMAINING

SALESPERSON -> LOCATION

We therefore break up the table into two tables where every determinant (an attribute that determines the value of another attribute) in the new tables is a candidate key. By doing this we achieve Boyce-Codd Normal Form (BCNF).


Here are the results. Notice that the anomalies have been resolved.

LOCATION_PRODUCT  
     
LOCATION PRODUCT # QTY REMAINING
NORTH A1

100

NORTH B1

500

SOUTH A1

60

SOUTH B1

50

SOUTH C1

150

NORTH C1

2000

 

SALESPERSON_LOCATION
   
SALESPERSON LOCATION
111 NORTH
222 NORTH
333 SOUTH
444 SOUTH


Informal definition - Violations of Fourth Normal Form can be recognized by columns that "don't go together". It adheres to the rules of BNCF plus there are no multivalued dependencies.

Example - 4NF

Suppose we want to keep track of the classes a student takes that satisfy major requirements and elective requirements. We might build the following table:

 

STU_NUM MAJOR ELECTIVES
0001 MIS310  
0001 MIS201  
0001   LIT101
0001   ECON101

Are there any problems associated with building a table in this manner? This table might actually work; however, it leaves many null values. These "empty" spaces take up room on our storage media and are therefore not desirable.

In order to eliminate these spaces we need to break up the table. The table above actually represents multiple sets of multivalued dependencies. That is MAJOR and ELECTIVE may have multiple values for every student. By eliminating these multiple sets of multivalued dependencies we can achieve Fourth Normal Form (4NF).

Review the results below. Why do we need to change the primary key? Click here for the answer.

MAJOR  
   
STU_NUM MAJOR
0001 MIS310
0001 MIS201

 

ELECTIVES  
   
STU_NUM ELECTIVES
0001 LIT101
0001 ECON101

Fifth, Higher, and Domain-Key Normal Forms

In the words of Elmasri and Navathe in "Fundamentals of Database Systems":

The practical utility of normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect by the database designers and users who must discover these constraints. 


* portions from Elmasir/Navathe, Fundamentals of Database Systems, 2e.