19 October 2024
1. Overview of Codd’s Rules for Normalising a Database
E.F. Codd, the founder of relational database theory, introduced database normalisation to reduce redundancy and improve data integrity. His normalisation process is divided into stages, known as normal forms (NF). Each step addresses specific types of anomalies that can occur in a poorly designed database. Below are the key normalisation rules:
2. Codd's Normal Forms
First Normal Form (1NF):
Each table must have atomic values (no multiple values in a single field).
Every column should contain only one value (no lists or arrays in one cell).
There must be a unique identifier (a primary key) for each record in the table.
Example: Instead of having a field that contains multiple phone numbers, there should be separate rows or a related table for each phone number associated with the person.
Second Normal Form (2NF):
It must satisfy all requirements of 1NF.
It should eliminate partial dependency, meaning non-primary key attributes must depend on the entire primary key, not just a part of it.
Example: In a table where the primary key is a combination of multiple columns (e.g., Student ID + Course ID), the student's name should not depend only on Student ID; it must depend on the combination of Student ID and Course ID.
Third Normal Form (3NF):
It must satisfy all requirements of 2NF.
It eliminates transitive dependency, meaning non-key attributes should only depend on the primary key and not on other non-key attributes.
Example: If a table has a Student ID, Course ID, and the Instructor Name, the Instructor Name should be removed and placed in a separate table, as it’s not directly dependent on the primary key (Student ID + Course ID).
3. Advanced Normal Forms
Boyce-Codd Normal Form (BCNF):
A stricter version of 3NF. It eliminates any remaining anomalies where a non-primary key attribute functionally determines a part of the primary key.
Example: If a table contains both Professor ID and Department, and Department determines Professor ID, BCNF requires that the relationship between Department and Professor ID be normalised into separate tables.
Fourth Normal Form (4NF):
It must satisfy all requirements of BCNF.
It addresses cases of multi-valued dependency, ensuring that no record contains multiple independent sets of values for different attributes.
Example: In a table containing Employee ID, Skill, and Language, an employee’s skills and languages are unrelated, so the table should be split into two tables: one for Employee-Skill and another for Employee-Language.
Fifth Normal Form (5NF):
It must satisfy all requirements of 4NF.
It eliminates join dependency, ensuring that data should not be decomposed beyond the necessity to represent complex relationships.
Example: If a table captures relationships between three entities, breaking it into smaller tables shouldn’t cause the loss of any meaningful relationships between the entities.
4. Key Definitions
Atomic values: Each data entry must be indivisible (no lists or sets inside a cell).
Partial dependency: When a non-primary key column depends only on part of a composite primary key, rather than the entire key.
Transitive dependency: When a non-primary key column depends on another non-primary key column instead of directly on the primary key.
Multi-valued dependency: A situation where one attribute in a table uniquely determines another set of attributes, leading to redundancy.
Join dependency: When the combination of multiple tables is necessary to represent a relationship between entities.
5. The Big Picture
Codd's normalisation rules are designed to improve database efficiency and avoid problems like redundant data, update anomalies, and inconsistencies. By applying these forms step-by-step, a database becomes more structured, ensuring that data is stored logically and can be queried effectively.