Database normal forms

Original Source: Song Yuanjian

Introduction

The role of the database normal forms in database design has always been ambiguous. Textbooks give academic definitions of the normal forms, but their use in real projects is less encouraging. This article uses plain language and a simple demo database that does not conform to any normal form, and normalizes it step by step from first normal form to fourth normal form.

Goals of the normal forms

Applying the normal forms brings many benefits, and the most important ones boil down to three points:

1. Reduce data redundancy (this is the main benefit; the others follow from it)

2. Eliminate anomalies (insert anomalies, update anomalies, delete anomalies)

3. Make the data better organized

But normalization is a double-edged sword: it also has disadvantages, which are discussed later in the article.

What is a normal form

Simply put, the normal forms are rules for eliminating duplicate data and reducing redundancy, so that the data in the database is better organized and disk space is used more effectively. Satisfying a higher normal form requires first satisfying the lower ones (for example, a table in 2NF must also be in 1NF).

DEMO

Let's start with an unnormalized table, which looks as follows:

First, a brief description of its columns: employeeId is the employee id, departmentName is the department name, job is the position, jobDescription is the job description, skill is the employee’s skills, departmentDescription is the department description, and address is the employee’s address.
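As a rough sketch, the table described above could be declared with Python's built-in sqlite3 module as shown here. The table name `employee`, the column types, and the sample rows are assumptions made for illustration; the composite primary key (employeeId, departmentName) follows the discussion in the 2NF section.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Unnormalized table: every fact about an employee lives in one wide row.
# Table name and column types are assumptions, not taken from the article.
conn.execute("""
    CREATE TABLE employee (
        employeeId            INTEGER,
        departmentName        TEXT,
        job                   TEXT,
        jobDescription        TEXT,
        skill                 TEXT,
        departmentDescription TEXT,
        address               TEXT,
        PRIMARY KEY (employeeId, departmentName)
    )
""")

# Hypothetical sample rows.
rows = [
    (1, "Dev", "Programmer", "Writes code", "C#, sql, javascript",
     "Builds products", "XX Road XX Community XX, Beijing"),
    (2, "Dev", "Programmer", "Writes code", "C#, UML, Ruby",
     "Builds products", "XX Road XX Community XX, Beijing"),
]
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# The same department description is stored once per employee: redundancy.
redundant_copies = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE departmentDescription = 'Builds products'"
).fetchone()[0]
print(redundant_copies)
```

The count shows the redundancy that the following sections remove step by step.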

Apply first normal form (1NF) to the table

If all attributes of a relation schema R are atomic (indivisible) basic data items, then R ∈ 1NF.

Simply put, first normal form requires that every attribute be indivisible. A database that does not conform to first normal form cannot be called a relational database. In the table above, it is not difficult to see that address can be subdivided, for example "XX Road XX Community XX, Beijing", so the table clearly does not conform to first normal form. To apply first normal form, this attribute needs to be decomposed into another table, as follows:
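One possible shape of this decomposition, sketched with Python's sqlite3 module. The `address` table, its key `addressId`, and the way the address string is split into city, road, community, and number are assumptions for illustration; other columns of the employee table are omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical address table: each part of the address is its own column.
    CREATE TABLE address (
        addressId INTEGER PRIMARY KEY,
        city      TEXT,
        road      TEXT,
        community TEXT,
        number    TEXT
    );
    -- The employee row now points at an address instead of holding free text.
    CREATE TABLE employee (
        employeeId     INTEGER,
        departmentName TEXT,
        addressId      INTEGER REFERENCES address(addressId),
        PRIMARY KEY (employeeId, departmentName)
    );
    INSERT INTO address VALUES (1, 'Beijing', 'XX Road', 'XX Community', 'XX');
    INSERT INTO employee VALUES (1, 'Dev', 1);
""")

# With atomic columns, the city can be queried directly instead of parsing text.
city = conn.execute("""
    SELECT a.city
    FROM employee e
    JOIN address a ON a.addressId = e.addressId
    WHERE e.employeeId = 1
""").fetchone()[0]
print(city)
```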

Apply second normal form (2NF) to the table

If the relation schema R ∈ 1NF and every non-prime attribute is fully functionally dependent on the candidate key of R, then R ∈ 2NF.

Simply put, every non-key attribute must depend on the whole primary key, not on just part of it. It follows that a table in first normal form whose primary key is a single column is automatically in second normal form. The purpose is to further reduce insert anomalies and update anomalies. In the table above, departmentDescription is determined by departmentName, which is one part of the composite primary key (employeeId, departmentName), but it is not determined by employeeId. departmentDescription therefore depends on only part of the primary key (a partial dependency). Applying second normal form gives the following table:
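A minimal sketch of this step with Python's sqlite3 module, assuming the partially dependent column is moved into a separate `department` table keyed by departmentName. The table names and sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- departmentDescription depends only on departmentName, so it moves here.
    CREATE TABLE department (
        departmentName        TEXT PRIMARY KEY,
        departmentDescription TEXT
    );
    CREATE TABLE employee (
        employeeId     INTEGER,
        departmentName TEXT REFERENCES department(departmentName),
        job            TEXT,
        jobDescription TEXT,
        PRIMARY KEY (employeeId, departmentName)
    );
    INSERT INTO department VALUES ('Dev', 'Builds products');
    INSERT INTO employee VALUES (1, 'Dev', 'Programmer', 'Writes code');
    INSERT INTO employee VALUES (2, 'Dev', 'Programmer', 'Writes code');
""")

# Changing the description now touches one row, however many employees exist.
cursor = conn.execute(
    "UPDATE department SET departmentDescription = 'Builds and ships products' "
    "WHERE departmentName = 'Dev'"
)
updated_rows = cursor.rowcount

# Every employee sees the same, single copy of the description.
descriptions = {
    row[0]
    for row in conn.execute("""
        SELECT d.departmentDescription
        FROM employee e
        JOIN department d ON d.departmentName = e.departmentName
    """)
}
print(updated_rows, descriptions)
```

Because the description is stored once, an update cannot leave two employees of the same department with different descriptions, which is the update anomaly this step is meant to remove.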

Apply third normal form (3NF) to the table

If in the relation schema R<U, F> there is no candidate key X, attribute group Y, and non-prime attribute Z (Z ⊈ Y) such that X→Y, Y→Z, and Y↛X all hold, then R<U, F> ∈ 3NF.

Simply put, third normal form eliminates transitive dependencies, that is, non-key attributes that depend on other non-key attributes rather than directly on the key. In the 2NF table above, jobDescription (job responsibilities) is determined by job (post), so jobDescription depends on the key only through job, which does not conform to third normal form. The relationship diagram after applying third normal form is:

In the tables above, no non-key attribute depends on another non-key attribute any longer, so they conform to third normal form.
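The 3NF step can be sketched in the same way, assuming jobDescription is moved into a separate `job` table keyed by job. The table names and the 'Tester' row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- jobDescription depends on job, not on the employee, so it moves here.
    CREATE TABLE job (
        job            TEXT PRIMARY KEY,
        jobDescription TEXT
    );
    CREATE TABLE employee (
        employeeId     INTEGER,
        departmentName TEXT,
        job            TEXT REFERENCES job(job),
        PRIMARY KEY (employeeId, departmentName)
    );
    -- A job can be described before anyone holds it (no insert anomaly).
    INSERT INTO job VALUES ('Programmer', 'Writes code');
    INSERT INTO job VALUES ('Tester', 'Tests code');
    INSERT INTO employee VALUES (1, 'Dev', 'Programmer');
""")

# Deleting the only programmer does not lose the description of the job
# (no delete anomaly).
conn.execute("DELETE FROM employee WHERE employeeId = 1")
remaining_jobs = [row[0] for row in conn.execute("SELECT job FROM job ORDER BY job")]
print(remaining_jobs)
```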

Apply Boyce-Codd normal form (BCNF) to the table

Suppose the relation schema R<U, F> ∈ 1NF. If for every functional dependency X→Y of R in which Y is not contained in X, X contains a candidate key, then R ∈ BCNF.
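The definitions in this article are all stated in terms of functional dependencies X→Y. The following Python sketch checks whether such a dependency is contradicted by a set of sample rows. The helper name `holds` and the sample data, including the email addresses, are hypothetical; note that sample data can only refute a dependency, it cannot prove that the dependency holds for all possible data.

```python
def holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[column] for column in lhs)
        value = tuple(row[column] for column in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample rows.
rows = [
    {"employeeId": 1, "email": "a@example.com", "job": "Programmer"},
    {"employeeId": 2, "email": "b@example.com", "job": "Programmer"},
]

# email determines every other column here, so it behaves like a candidate key.
email_determines_all = holds(rows, ["email"], ["employeeId", "job"])

# job does not: two different employees share the same job.
job_determines_all = holds(rows, ["job"], ["employeeId", "email"])

print(email_determines_all, job_determines_all)
```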

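Simply put, BCNF is a stricter version of third normal form: every determinant (the left-hand side of a functional dependency) must contain a candidate key. This article applies an even simpler rule of thumb, namely that each table should have only one candidate key (a column whose value is different in every row can be called a candidate key), which is stricter than the formal definition requires. In the noNf table of the third normal form above, each employee's email is unique (two people would hardly share the same email address), so email is a second candidate key. By this rule of thumb the table does not yet conform to BCNF and is decomposed further. The relationship diagram is: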

Apply fourth normal form (4NF) to the table

Suppose the relation schema R<U, F> ∈ 1NF. If for every non-trivial multi-valued dependency X→→Y (Y ⊈ X) of R, X contains a candidate key, then R ∈ 4NF.

Simply put, fourth normal form eliminates multi-valued dependencies in a table, which reduces the work of keeping the data consistent. In the BCNF table above, the skill attribute of an employee may hold values such as "C#, sql, javascript" and "C#, UML, Ruby", that is, several values packed into a single attribute. This can make the database content inconsistent: for example, one row may write "C#" while another writes "C#.net". The solution is to move the multi-valued attribute into a new table. The relationship diagram after applying fourth normal form is as follows:

For the skill table, the possible values are:
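As a sketch of what such a skill table might contain, the two multi-valued strings quoted above can be split into one row per employee and skill. The table layout (employeeId, skillName) and the employee ids are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical layout: one row per (employee, skill) pair.
conn.execute("""
    CREATE TABLE skill (
        employeeId INTEGER,
        skillName  TEXT,
        PRIMARY KEY (employeeId, skillName)
    )
""")

# The multi-valued strings from the text, keyed by assumed employee ids.
multi_valued = {1: "C#, sql, javascript", 2: "C#, UML, Ruby"}
for employee_id, skills in multi_valued.items():
    for name in skills.split(","):
        conn.execute("INSERT INTO skill VALUES (?, ?)", (employee_id, name.strip()))

# Each skill is now a separate value, so it can be matched exactly.
csharp_employees = [
    row[0]
    for row in conn.execute(
        "SELECT employeeId FROM skill WHERE skillName = 'C#' ORDER BY employeeId"
    )
]
total_rows = conn.execute("SELECT COUNT(*) FROM skill").fetchone()[0]
print(csharp_employees, total_rows)
```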

Summary

From the decomposition steps above, it is not difficult to see that the higher the normal form applied, the more tables there are. Having many tables brings problems of its own:

1. Queries have to join multiple tables, which makes them more complex

2. Joining multiple tables at query time reduces database query performance
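To illustrate the first point, the sketch below reassembles a single employee's department and job descriptions from three normalized tables. The schema and sample data are the same hypothetical ones used for illustration throughout; in the unnormalized table the same information would come from one row with no join at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (
        departmentName        TEXT PRIMARY KEY,
        departmentDescription TEXT
    );
    CREATE TABLE job (
        job            TEXT PRIMARY KEY,
        jobDescription TEXT
    );
    CREATE TABLE employee (
        employeeId     INTEGER,
        departmentName TEXT REFERENCES department(departmentName),
        job            TEXT REFERENCES job(job),
        PRIMARY KEY (employeeId, departmentName)
    );
    INSERT INTO department VALUES ('Dev', 'Builds products');
    INSERT INTO job VALUES ('Programmer', 'Writes code');
    INSERT INTO employee VALUES (1, 'Dev', 'Programmer');
""")

# Two joins are needed to get back what used to be a single wide row.
row = conn.execute("""
    SELECT e.employeeId, d.departmentDescription, j.jobDescription
    FROM employee e
    JOIN department d ON d.departmentName = e.departmentName
    JOIN job j ON j.job = e.job
    WHERE e.employeeId = 1
""").fetchone()
print(row)
```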

Today the cost of disk space is basically negligible, so the space wasted by redundant data is no longer, by itself, a reason to apply the normal forms.

Therefore, a higher normal form is not always better; it depends on the actual situation. Third normal form already greatly reduces data redundancy and the occurrence of insert, update, and delete anomalies. In my personal opinion, applying third normal form is sufficient in most cases, and in some circumstances second normal form is acceptable too.

Since I am still at an early stage in my study of databases, I would welcome corrections from more experienced readers if anything above is inaccurate.

Reference: "Database paradigm those things", Cloud + Community, Tencent Cloud. https://cloud.tencent.com/developer/article/1054588