Saturday, July 5, 2014

Normalization of Relational Database and its types

A database stores data in an organized way so that it can be searched and retrieved later. It should contain one or more tables. A table is much like a spreadsheet, in that it's made up of rows and columns. All rows have the same columns, and each column contains the data itself. If it helps, think of your tables in the same way that you would a table in Excel.

Data can be inserted, retrieved, updated, and deleted from a table. The word, created, is generally used instead of inserted, so, collectively, these four functions are affectionately abbreviated as CRUD.



What is Relational Database?
A relational database is a type of database that organizes data into tables, and links them, based on defined relationships. These relationships enable you to retrieve and combine data from one or more tables with a single query. The relational database was invented by E. F. Codd at IBM in 1970.

What is Database Normalization?
Database normalization is the process of organizing the fields and tables of a relational database to minimize redundancy. Normalization usually involves dividing large tables into smaller (and less redundant) tables and defining relationships between them.

The main objective of Database Normalization is:
  • To eliminate reduntant(useless) data
  • To ensure data dependencies 


Edgar F. Codd, the inventor of the relational model, introduced the concept of normalization and what we now know as the First Normal Form (1NF) in 1970. Codd went on to define the Second Normal Form (2NF) and Third Normal Form (3NF) in 1971, and Codd and Raymond F. Boyce defined the Boyce-Codd Normal Form (BCNF) in 1974. Informally, a relational database table is often described as "normalized" if it is in the Third Normal Form. Most 3NF tables are free of insertion, update, and deletion anomalies.



First Normal Form (1NF):
Each field in a table contains different information. As per 1NF, no two rows of data must contain repeating group of information i.e each set of column must have a unique value, such that multiple columns cannot be used to fetch the same row.

For example:
In a school, school wants records of its all students and related subjects then table looks like below:

Student Id Student Name Address Subjects  Faculty Fee
1 Reshma Kathmandu Botany, Physics, Statistics Science 2000

In 1NF, each field should contain only one value. Like there shouldn't be more than one value like separated with commas. Rather that that, we must separate such data into multiple rows in a different table. So we separate subject from above table and create new table with Student Id as identifier of subjects.

Student Id Subjects
1 Botany
1 Physics
1 Statistics

Using 1NF, data redundancy increases but there each row will have a unique value.


Second Normal Form (2NF):
A table that is in first normal form (1NF) must meet additional criteria if it is to qualify for second normal form. Specifically: a table is in 2NF if and only if it is in 1NF and no non-prime attribute is dependent on any proper subset of any candidate key of the table. A non-prime attribute of a table is an attribute that is not a part of any candidate key of the table.

Simply, a table is in 2NF if and only if it is in 1NF and every non-prime attribute of the table is dependent on the whole of a candidate key.

Now continuing the above example:
The table following 2NF will be as below:

Table 1  (Student table)
Student Id Student Name Student Address
Faculty
Fee
1 Reshma Kathmandu Science 2000

Table 2  (Subject table)
Student Id Subjects
1 Botany
1 Physics
1 Statistics

Now, in table 1, Student Id is primary key and Student Name and Student Address should depend on its primary key.

Third Normal Form (3NF): 
A table which must be in 2NF and no duplicate information is permitted. It applies that every non-prime attribute of table must be dependent on primary key.

In the above example, fee depends in admission year more than student. So remove these values and create a new table like below:
Table 1  (Student table)
Student Id Student Name Student Address
1 Reshma Kathmandu

Table 2  (Subject table)
Student Id Subjects
1 Botany
1 Physics
1 Statistics

Table 3 (Faculty table)

Faculty
Fee
Science 2000

Tables should be normalized till 3NF, if not we will proceed further other levels of Normalization.

Boyce and Codd Normal Form (BCNF or 3.5NF):
Boyce and Codd Normal Form is a higher version of the Third Normal Form.  It is a slightly stronger version of the third normal form (3NF). BCNF was developed in 1974 by Raymond F. Boyce and Edgar F. Codd to address certain types of anomaly not dealt with by 3NF as originally defined.

A 3NF table which does not have multiple overlapping candidate keys is said to be in BCNF. 

No comments:

Post a Comment