Tuesday, 13 May 2014

Functional Dependency and Normalization

Purpose of Normalization

Normalization is a technique for producing a set of suitable relations that support the data requirements of an enterprise.

Characteristics of a suitable set of relations include:

- the minimal number of attributes necessary to support the data requirements of the enterprise;
- attributes with a close logical relationship are found in the same relation;
- minimal redundancy with each attribute represented only once with the important exception of attributes that form all or part of foreign keys.

The benefits of using a database that has a suitable set of relations is that the database will be:
easier for the user to access and maintain the data;

take up minimal storage space on the computer.

Data Redundancy and Update Anomalies

  • Staff Branch relation has redundant data
    –the details of a branch are repeated for every member of staff.
  • In contrast, the branch information appears only once for each branch in the Branch relation and only the branch number (branch No) is repeated in the Staff relation, to represent where each member of staff is located..
  • Relations that contain redundant information may potentially suffer from update anomalies.

Data Redundancy and Update Anomalies

Relations that contain redundant information may potentially suffer from update anomalies.
•Types of update anomalies include
•e.g. insert new staff B007, 22 Deer Road, London?
•e.g. delete staff 102 (branch info of B007 is lost)
•e.g. change address of B003 (must update all staffs in this location)

Functional Dependencies

Functional dependency describes relationship between attributes.
•For example, if A and B are attributes of relation R, B is functionally dependent on A (denoted A -> B), if each value of A in R is associated with exactly one value of B in R.
•e.g. staffNo → branchNo (in staffBranch)

Characteristics of Functional Dependencies

Diagrammatic representation 

The determinant of a functional dependency refers to the attribute or group of attributes on the left-hand side of the arrow. 

An Example Functional Dependency

Full/Partial Functional Dependency

Partial functional dependency

• staffNo, sName → branchNo

–Since some attributes can be removed (sName)

Full functional dependency

• staffNo → branchNo

–No attributes can be removed

Transitive Dependencies

Transitive dependency describes a condition where A, B, and C are attributes of a relation such that if A → B and B → C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C).

If A → B and B → C, then A → C

Example Transitive Dependency

Consider functional dependencies in the StaffBranch relation

staffNo (A) → sName, position, salary,

branchNo(B), bAddress(C)

branchNo (B) → bAddress (C)

if A → B and B → C, then A → C

• Transitive dependency, branchNo (B) → bAddress (C) exists on staffNo via branchNo.

    Example - Using sample data to identify functional dependencies.

    •Function dependencies between attributes A to E in the Sample relation.
    A -> C (fd1)
    C ->A (fd2)
    B -> D (fd3)
    A, B -> E (fd4)

    Example - Identifying Primary Key for Sample Relation

    •The determinants in the Sample relation are A, B, C, and (A, B).
    •However, the only determinant that functionally determines all the other attributes of the relation is (A, B).
    •(A, B) is identified as the primary key for this relation.

      Unnormalized Form (UNF)

      •A table that contains one or more repeating groups.
      •To create an unnormalized table
      –Transform the data from the information source (e.g. form) into table format with columns and rows.

      First Normal Form (1NF)

      •A relation in which the intersection of each row and column contains one and only one value.
      •Repeating group
      –(propertyNo, pAddress, rentStart, rentFinish, rent, ownerNo, oName)

      Second Normal Form (2NF)

      •A relation that is in 1NF and every non-primary-key attribute is fully functionally dependent on the primary key.
      •fd2 clientNo -> cName
      •fd3 propertyNo -> pAddress, rent,
      ownerNo, oName

      Third Normal Form (3NF)

      •A relation that is in 1NF and 2NF and in which no non-primary-key attribute is transitively dependent on the primary key.
      •fd6 ownerNo -> oName


      A table in 3NF but some rare case it has still trivial FD which violates the BCNF. BCNF (3.5) is strict form of 3NF in which all key are dependent of super key for remove all types transitive dependencies which some time create insert anomalies

      For each Person / Shop Type combination, the table tells us which shop of this type is geographically nearest to the person's home. We assume for simplicity that a single shop cannot be of more than one type.

      The candidate keys of the table are:
      • {Person, Shop Type}
      • {Person, Nearest Shop}

      Because all three attributes are prime attributes (i.e. belong to candidate keys), the table is in 3NF. The table is not in BCNF, however, as the Shop Type attribute is functionally dependent on a non-superkey: Nearest Shop.

      The violation of BCNF means that the table is subject to anomalies. For example, Eagle Eye might have its Shop Type changed to "Optometrist" on its "Fuller" record while retaining the Shop Type "Optician" on its "Davidson" record. This would imply contradictory answers to the question: "What is Eagle Eye's Shop Type?" Holding each shop's Shop Type only once would seem preferable, as doing so would prevent such anomalies from occurring:

      Fourth normal form (4NF)

      Fourth normal form (4NF) is a normal form used in database normalization. Introduced by Ronald Fagin in 1977, 4NF is the next level of normalization after Boyce–Codd normal form (BCNF). Whereas the second, third, and Boyce–Codd normal forms are concerned with functional dependencies, 4NF is concerned with a more general type of dependency known as a multivalued dependency. A Table is in 4NF if and only if, for every one of its non-trivial multivalued dependencies X Y, X is a superkey—that is, X is either a candidate key or a superset thereof


      Each row indicates that a given restaurant can deliver a given variety of pizza to a given area.
      The table has no non-key attributes because its only key is {Restaurant, Pizza Variety, Delivery Area}. Therefore it meets all normal forms up to BCNF. If we assume, however, that pizza varieties offered by a restaurant are not affected by delivery area (i.e. a restaurant offers all pizza varieties it makes to all areas it supplies), then it does not meet 4NF. The problem is that the table features two non-trivial multivalued dependencies on the {Restaurant} attribute (which is not a superkey). The dependencies are:
      {Restaurant} ->>{Pizza Variety}
      {Restaurant} ->>{Delivery Area}
      These non-trivial multivalued dependencies on a non-superkey reflect the fact that the varieties of pizza a restaurant offers are independent from the areas to which the restaurant delivers. This state of affairs leads to redundancy in the table: for example, we are told three times that A1 Pizza offers Stuffed Crust, and if A1 Pizza starts producing Cheese Crust pizzas then we will need to add multiple rows, one for each of A1 Pizza's delivery areas. There is, moreover, nothing to prevent us from doing this incorrectly: we might add Cheese Crust rows for all but one of A1 Pizza's delivery areas, thereby failing to respect the multivalued dependency {Restaurant} ->>{Pizza Variety}.
      To eliminate the possibility of these anomalies, we must place the facts about varieties offered into a different table from the facts about delivery areas, yielding two tables that are both in 4NF: