Sunday, May 30, 2010

Database Normalization

It is Sunday. I have had dinner and am thinking about what I can do today to improve my database management skills. Ah, great: a very basic topic of relational databases, "Normalization", is one I need to explore more, because I get confused every time I face a question on it.

Normalization is a systematic way of organizing data in a database. This includes creating tables and establishing relationships between those tables according to rules designed both to protect the existing data (from accidental deletions or amendments) and to make the database more flexible by eliminating redundancy and inconsistent dependency.

Redundant data wastes disk space and creates maintenance problems. If data that exists in more than one place must be changed, the data must be changed in exactly the same way in all locations. A customer address change is much easier to implement if that data is stored only in the Customers table and nowhere else in the database.
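To see why a single storage location helps, here is a minimal sketch using Python's built-in sqlite3 module. The column names, the customer "Alice" and both addresses are my own assumptions for illustration; they are not from the original article.

```python
import sqlite3

# In-memory database; all table contents below are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT,
        address     TEXT
    );
    CREATE TABLE Orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES Customers(customer_id)
    );
    INSERT INTO Customers VALUES (1, 'Alice', '12 Old Street');
    INSERT INTO Orders VALUES (10, 1), (11, 1);
""")

# The address lives only in Customers, so one UPDATE changes it everywhere.
conn.execute(
    "UPDATE Customers SET address = ? WHERE customer_id = ?",
    ("34 New Road", 1),
)

# Every order picks up the new address through the join.
order_addresses = conn.execute("""
    SELECT o.order_id, c.address
    FROM Orders o
    JOIN Customers c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()
print(order_addresses)
```

If the address had been copied into each Orders row instead, the same change would need one update per copy, and a missed copy would leave the data inconsistent.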

There are a few rules for database normalization. Each rule is called a "normal form." If the first rule is observed, the database is said to be in "first normal form." If the first three rules are observed, the database is considered to be in "third normal form." Although other levels of normalization are possible, third normal form is considered the highest level necessary for most applications.

First Normal Form

* Eliminate repeating groups in individual tables.
* Create a separate table for each set of related data.
* Identify each set of related data with a primary key.

Basically, a relational table is considered to be in 1NF if it has these 4 properties:
1. Entries in columns are single-valued.
2. Entries in columns are of the same kind.
3. Each row is unique.
4. Each column has a unique name.

See Example 1:

Table 1. (Not in 1NF)
Item_ID  Item_Number             Item_Amount
100      ITM001, ITM002, ITM003  500, 450, 300

Table 1. (In 1NF)
Item_ID  Item_Number  Item_Amount
100      ITM001       500
101      ITM002       450
103      ITM003       300
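The step from the first table to the second can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the function name to_first_normal_form are my own, and the sketch simply hands out consecutive Item_ID values starting from an assumed 100, so the IDs it produces are not meant to match the table above exactly.

```python
# Hypothetical in-memory copy of the row that is not in 1NF:
# each column holds a comma-separated list instead of a single value.
unnormalized = {
    "Item_Number": "ITM001, ITM002, ITM003",
    "Item_Amount": "500,450,300",
}


def to_first_normal_form(row, first_id=100):
    """Split multi-valued columns so every entry holds a single value."""
    numbers = [n.strip() for n in row["Item_Number"].split(",")]
    amounts = [int(a.strip()) for a in row["Item_Amount"].split(",")]
    if len(numbers) != len(amounts):
        raise ValueError("Item_Number and Item_Amount lists differ in length")
    # One output row per item; Item_ID is assigned sequentially (assumption).
    return [
        {"Item_ID": first_id + i, "Item_Number": n, "Item_Amount": a}
        for i, (n, a) in enumerate(zip(numbers, amounts))
    ]


rows = to_first_normal_form(unnormalized)
for r in rows:
    print(r)
```

After the split, every column entry is single-valued and each row can be identified by its own Item_ID.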


Second Normal Form

* Create separate tables for sets of values that apply to multiple records.
* Relate these tables with a foreign key.

Records should not depend on anything other than a table's primary key (a compound key, if necessary). For example, consider a customer's address in an accounting system. The address is needed by the Customers table, but also by the Orders, Shipping, Invoices, Accounts Receivable, and Collections tables. Instead of storing the customer's address as a separate entry in each of these tables, store it in one place, either in the Customers table or in a separate Addresses table.


Unnormalized table:

Student#  Advisor  Adv-Room  Class1  Class2  Class3
1022      Jones    412       101-07  143-01  159-02
4123      Smith    216       201-01  211-02  214-01

Note the multiple class values (Class1, Class2, Class3) for each Student# value in the above table. The class values are not functionally dependent on Student# (the primary key), so this relationship is not in second normal form.

The following two tables demonstrate second normal form:

Students:
Student#  Advisor  Adv-Room
1022      Jones    412
4123      Smith    216


Registration:
Student#  Class#
1022      101-07
1022      143-01
1022      159-02
4123      201-01
4123      211-02
4123      214-01
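The Students and Registration tables above could be declared roughly as follows with Python's sqlite3 module. Treat it as a sketch: the column names student_id, adv_room and class_id are my substitutes for Student#, Adv-Room and Class#, and the compound primary key on Registration is my own assumption about how the table would be keyed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Students (
        student_id INTEGER PRIMARY KEY,
        advisor    TEXT,
        adv_room   INTEGER
    );
    CREATE TABLE Registration (
        student_id INTEGER NOT NULL REFERENCES Students(student_id),
        class_id   TEXT    NOT NULL,
        PRIMARY KEY (student_id, class_id)   -- compound key (assumption)
    );
    INSERT INTO Students VALUES (1022, 'Jones', 412), (4123, 'Smith', 216);
    INSERT INTO Registration VALUES
        (1022, '101-07'), (1022, '143-01'), (1022, '159-02'),
        (4123, '201-01'), (4123, '211-02'), (4123, '214-01');
""")

# All classes for one student now come from rows, not from Class1..Class3 columns.
classes_1022 = [
    row[0]
    for row in conn.execute(
        "SELECT class_id FROM Registration WHERE student_id = ? ORDER BY class_id",
        (1022,),
    )
]
print(classes_1022)

# The foreign key rejects a registration for a student that does not exist.
try:
    conn.execute("INSERT INTO Registration VALUES (9999, '101-07')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)
```

With this layout a student can take any number of classes without changing the table structure, which the fixed Class1, Class2, Class3 columns could not offer.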


Third Normal Form

* Eliminate fields that do not depend on the key.

Values in a record that are not part of that record's key do not belong in the table. In general, any time the contents of a group of fields may apply to more than a single record in the table, consider placing those fields in a separate table.

For example, in an Employee Recruitment table, a candidate's university name and address may be included. But you need a complete list of universities for group mailings. If university information is stored in the Candidates table, there is no way to list universities with no current candidates. Create a separate Universities table and link it to the Candidates table with a university code key.
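A rough sketch of that split, again with Python's sqlite3 module, is below. The university names, addresses, codes and the candidate are all made-up sample data, and the column names are my own choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Universities (
        university_code TEXT PRIMARY KEY,
        name            TEXT NOT NULL,
        address         TEXT
    );
    CREATE TABLE Candidates (
        candidate_id    INTEGER PRIMARY KEY,
        name            TEXT NOT NULL,
        university_code TEXT REFERENCES Universities(university_code)
    );
    -- Hypothetical sample rows: U2 has no candidates at the moment.
    INSERT INTO Universities VALUES
        ('U1', 'Example University', '1 Campus Way'),
        ('U2', 'Sample Institute',   '2 College Road');
    INSERT INTO Candidates VALUES (1, 'Candidate A', 'U1');
""")

# The full mailing list no longer depends on who is currently a candidate.
mailing_list = conn.execute(
    "SELECT name, address FROM Universities ORDER BY university_code"
).fetchall()

# Universities with no current candidates can be found with a LEFT JOIN.
without_candidates = conn.execute("""
    SELECT u.name
    FROM Universities u
    LEFT JOIN Candidates c ON c.university_code = u.university_code
    WHERE c.candidate_id IS NULL
""").fetchall()

print(mailing_list)
print(without_candidates)
```

Because the university details live in their own table, a university stays on the list even when its last candidate is removed.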

Why We Need Database Normalization:
Normalization is part of successful database design; without normalization, database systems can be inaccurate, slow, and inefficient, and they might not produce the data you expect.

Third Normal Form: Eliminate Data Not Dependent On Key

In the last example, Adv-Room (the advisor's office number) is functionally dependent on the Advisor attribute. The solution is to move that attribute from the Students table to the Faculty table, as shown below:

Students:

Student#  Advisor
1022      Jones
4123      Smith


Faculty:

Name   Room  Dept
Jones  412   42
Smith  216   42
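To check that nothing is lost by moving the room out of Students, here is a small sqlite3 sketch in Python that joins the two tables back together. Using the advisor's name as the key of Faculty follows the tables above; the lower-case column names are my own assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Faculty (
        name TEXT PRIMARY KEY,
        room INTEGER,
        dept INTEGER
    );
    CREATE TABLE Students (
        student_id INTEGER PRIMARY KEY,
        advisor    TEXT REFERENCES Faculty(name)
    );
    INSERT INTO Faculty  VALUES ('Jones', 412, 42), ('Smith', 216, 42);
    INSERT INTO Students VALUES (1022, 'Jones'), (4123, 'Smith');
""")

# The advisor's room is looked up through the join instead of being
# stored again on every student row.
student_rooms = conn.execute("""
    SELECT s.student_id, s.advisor, f.room
    FROM Students s
    JOIN Faculty f ON f.name = s.advisor
    ORDER BY s.student_id
""").fetchall()
print(student_rooms)
```

The join gives back the same Student#, Advisor and Adv-Room combination as the earlier Students table, but each room is now stored once per advisor.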

This article was originally posted on one of the Microsoft sites; I just worked through it, simplified it, and put it on my blog.
I think it is a good starting point for the basics of normalization.