Databases are relational. The concept of a relational database

Home > Lecture

Lecture DB Chapter 2 RELATIONAL DATABASES 2.1. Terms and Definitions The development of relational databases began in the late 1960s, when the first works appeared that discussed the possibilities of using the methods of formalized data presentation in the form of tables that are familiar to a specialist. Some experts called this way of presenting information decision tables, others - tabular algorithms. Theorists of relational databases called the tabular way of presenting information datalogical models. The founder of the theory of relational databases is an IBM employee, Dr. E. F. Codd, who published on June 6, 1970 the article “A Relational Model of Data for Large Shared Data Banks”. In this article, for the first time, the term "relational data model" was used, which marked the beginning of relational databases. Relational database theory developed in the 1970s. in the USA by Dr. E. F. Codd, relied on the mathematical apparatus of set theory. He proved that any set of data CAN be represented in the form of two-dimensional tables of a special kind, known in mathematics as ratios. From the English word "relation" "relationship"), the name "relational data model" came about. At present, the theoretical basis for the design of databases (DB) is the mathematical apparatus of relational algebra (see subsection 1.2). Thus, a relational database is information (data) about objects, presented in the form of two-dimensional arrays - tables, united by certain links. The database can also consist of one table. Before proceeding to further study of relational databases, consider the terms and definitions used in theory and practice. database table- a two-dimensional array containing information about one class of objects. In the theory of relational algebra, a two-dimensional array (table) is called attitude. The table consists of the following elements: field, cell, record (Fig. 2.1). Field contains the values ​​of one of the features that characterize the objects of the database. The number of fields in the table corresponds to the number of features that characterize the objects of the database. 22 Cell contains a specific value of the corresponding field (attribute of one object). Recording- table row. It contains the values ​​of all features that characterize one object. The number of records (rows) corresponds to the number of objects whose data are contained in the table. In database theory, the term record corresponds to the concept short-tezh- a sequence of attributes interconnected by the relation AND (AND). In graph theory tuple means a simple branch of a directed tree graph. In table. 2.1 shows the terms used in the theory and practice of developing relational databases. One of the important concepts necessary for constructing the optimal structure of relational databases is the concept of a key, or a key field. key a field is considered whose values ​​uniquely determine the values ​​of all other fields in the table. For example, the field "Passport Number", or "Taxpayer Identification Number (TIN)", uniquely identifies the characteristics of any individual (when compiling the appropriate database tables FOR the personnel departments or accounting departments of the enterprise).
23

A table key can be not one, but several fields. In this case, the set of fields can only be a possible table key if two time-independent conditions are satisfied: uniqueness and minimality. Each field that is not part of the primary key is called a non-key field of the table.

Uniqueness key means that at any given time a database table cannot contain any two distinct records that have the same key field values. Fulfillment of the uniqueness condition is mandatory. Condition minimality key fields means that only the combination of values ​​of the selected fields meets the requirements for the uniqueness of records in the database table. This also means that none of the fields included in the key can be excluded from it without violating uniqueness. When forming a database table key consisting of several fields, you must be guided by the following provisions: you should not include table fields in the key, the values ​​of which by themselves uniquely identify records in the table. For example, you shouldn't create a key that simultaneously contains the fields "passport number" and "taxpayer identification number", since each of these attributes can uniquely identify records in the table; you cannot include a non-unique field in the key, that is, a field whose values ​​can be repeated in the table. Each table must have at least one possible key, which is chosen as primary key. If there are fields in the table, the values ​​of each of which uniquely determine the records, then these fields can be taken as alternative keys. For example, if you select the taxpayer identification number as the primary key, then the passport number will be an alternative key. 2.2. Normalizing Relational Database Tables A relational database is a set of tables interconnected. The number of tables in one file or one database depends on many factors, the main of which are: the composition of database users, ensuring the integrity of information (especially important in multi-user information systems), ensuring the smallest amount of required memory and minimum time data processing. 24

Accounting for these factors in the design of relational databases is carried out by methods of normalizing tables and establishing links between them.

Table normalization are ways of dividing a single database table into multiple tables that generally meet the requirements listed above. Table normalization is a sequential change in the structure of the table until it satisfies the requirements of the last form of normalization. There are six forms of normalization in total:
    first normal form (First Normal Form - 1NF); second normal form (Second Normal Form - 2NF); third normal form (Third Normal Form - ЗNF); Boyce-Codd normal form (Brice-Codd Normal Form-BCNF); fourth normal form (Forth Normal Form - 4NF); Fifth Normal Form - 5NF, or PJ/NF ).
When describing normal forms, the following concepts are used: “functional dependence between fields”; "full functional dependence between fields"; "many-valued functional dependence between fields"; "transitive functional dependence between fields"; "mutual independence between fields". functional dependency between the fields A and B is called the dependence in which each value of A at any time corresponds to a single value of B from all possible ones. An example of a functional relationship is the relationship between the taxpayer identification number and the number of his passport. Complete functional dependency between the composite field A and the field B is called the dependence in which the field B depends functionally on the field A and does not depend functionally on any subset of the field A. Multivalued functional dependency between fields is defined as follows. Field A uniquely defines field B if for each value of field A there is a “well-defined set” of corresponding values ​​of field B. ) and “Score” (field B), then field B has a “well-defined set” of valid values: 1, 2, 3, 4, 5, i.e. for each value of the "Subject" field, there is a multi-valued "well-defined set" of values ​​​​of the "Rating" field. transitive functional dependency between fields A and C Exists if field C is functionally dependent on field B, and field B is functionally dependent on field A; in this case, there is no functional dependence of the field A on the field B. Mutual independence between fields is defined as follows. Several fields are mutually independent if none of them is functionally dependent on the other. First normal form. A table is in first normal form if and only if none of the fields contains more than one value and any key field is not empty. The first normal form is the basis of the relational data model. Any table in a relational database is automatically in first normal form, otherwise it is simply not possible by definition. Such a table should not contain fields (features) that could be divided into several fields (features). Unnormalized, as a rule, are tables that were not originally intended for computer processing of the information contained in them. For example, in table. 2.2 shows a fragment of the table from the reference book "Universal metal-cutting machines", published by the Experimental Research Institute of Metal-cutting Machines (ENIMS). This table is not normalized for the following reasons. 1. It contains lines that have several values ​​of one field in one cell: “Maximum machining diameter, mm” and “Spindle speed, rpm”. 2. One field - "Overall dimensions (length x width x height), mm" can be divided into three fields: "Length, mm", "Width, mm" and "Height, mm". The expediency of such a division can be justified by the need for subsequent calculations of areas or volumes occupied. The source table must be converted to first normal form. To do this, it is necessary: ​​to divide the fields "Maximum machining diameter, mm" and "Spindle speed, rpm" into several fields in accordance with the number of values ​​contained in one cell;
26

The field "Overall dimensions (length x width x height), mm", divided into three fields: "Length, mm", "Width, mm", "Height, mm". The key field of this table can be the field “Machine Model” or “No. p / p” The normal form has a table. 2.3. Let's consider one more example. On fig. 2.2 shows a fragment of the test score sheet, which, as in the previous example, was not originally intended for computer processing. Suppose we want to create a database for automated processing of the results of the test-examination session in accordance with
27

with the content of the examination sheet. To do this, we will convert the content of the form into database tables. Based on the need to comply with the conditions of functional dependence between the fields, it is necessary to form at least two tables (Fig. 2.3) (the key fields in each table are in bold). The first table contains the results of passing the test (exam) by each student in a particular subject. The second table contains the resulting results of passing the test (exam) of a specific group of students in a specific subject. In the first table, the field "Student's full name" is the key, and in the second table - the field "Discipline". The tables must be linked to each other by the fields "Discipline" and "Group Code".

The presented table structures fully meet the requirements of the first normal form, but are characterized by the following disadvantages: adding new data to tables requires entering values ​​for all fields; in each line of each table, it is necessary to enter the repeated values ​​​​of the fields "Discipline", "Teacher's full name", "Group code". Consequently, with such a composition of tables and their structure, there is an obvious redundancy of information, which, of course, will require additional memory. To avoid these shortcomings, it is necessary to bring the tables to the second or third normal form. Second normal form. A table is in second normal form if it satisfies the requirements of first normal form and all of its fields that are not included in the primary key are fully functionally dependent on the primary key. 28

If a table has a simple primary key consisting of only one field, then it is automatically in second normal form.

If the primary key is composite, then the table need not be in second normal form. Then it must be divided into two or more tables in such a way that the primary key uniquely identifies the value in any field. If the table has at least one field that does not depend on the primary key, then additional columns must be included in the primary key. If there are no such columns, then a new column must be added. Based on these conditions that determine the second normal form, we can draw the following conclusions on the characteristics of the compiled tables (see Fig. 2.3). In the first table, there is no direct connection between the key field and the “Teacher's full name” field, since different teachers can take a test or exam in one subject. In the table, there is a complete functional dependence only between all other fields and the key field "Discipline". Similarly, in the second table there is no direct connection between the key field and the field "Teacher's full name". To optimize the database, in particular, to reduce the required amount of memory due to the need to repeat the values ​​of the fields "Subject" and "Name of the teacher" in each record, it is necessary to change the structure of the database - to convert the source tables into the second normal form. The composition of the tables of the modified database structure is shown in fig. 2.4. The transformed database structure consists of six tables, two of which are interconnected (the key fields in each table are in bold). All tables satisfy the requirements of the second normal form. The fifth and sixth tables have duplicate values ​​in the fields, but given that these values ​​are integers instead of text data, the total amount of memory required to store information is much less than in the original tables (see Figure 2.1). In addition, the new database structure will make it possible to fill in the tables by various specialists (divisions of management services). Further optimization of database tables is reduced to bringing them to the third normal form. Third normal form. A table is in third normal form if it satisfies the definition of second normal form and none of its non-key fields are functionally dependent on any other non-key fields. 29

You can also say that a table is in third normal form if it is in second normal form and every non-key field is not transitively dependent on the primary key. The requirement of the third normal form is that all non-key fields depend only on the primary key and do not depend on each other. In accordance with these requirements, as part of the database tables (see Fig. 2.3), the first, second, third and fourth tables belong to the third normal form. To bring the fifth and sixth tables to the third normal form, we will create a new table containing information about the composition of the subjects for which exams or tests are held in groups of students. As a key, we will create a "Counter" field that sets the record number in the table, since each record must be unique. thirty

As a result, we get a new database structure, which is shown in Fig. 2.5 (key fields in each table are in bold). This structure contains seven tables that meet the requirements of the third normal form.

Boyce's normal form is Codd's. A table is in Boyce-Codd normal form only if any functional dependency between its fields is reduced to a complete functional dependency on a possible key. According to this definition, in the database structure (see Figure 2.4), all tables comply with the requirements of the Boyce-Codd normal form. Further optimization of database tables should be reduced to a complete decomposition of tables. Full table decomposition call such a collection of an arbitrary number of its projections, the connection of which completely coincides with the contents of the table. A projection is a copy of a table that does not include one or more columns of the new table. Fourth normal form. The fourth normal form is a special case of the fifth normal form, when the complete decomposition must be a union of two projections.
31

It is very difficult to find a table that is in fourth normal form but does not satisfy the definition of fifth normal form.

Fifth normal form. A table is in fifth normal form if and only if, in each of its complete decompositions, all projections contain a possible key. A table that does not have any complete decomposition is also in fifth normal form. In practice, database table optimization ends up in third normal form. Reducing tables to the fourth and fifth normal forms is, in our opinion, of purely theoretical interest. In practice, this problem is solved by developing queries to create a new table. 2.3. Designing relationships between tables The process of normalizing the initial database tables allows you to create the optimal structure of the information system - to develop a database that requires the least memory resources and, as a result, provides the shortest access time to information. At the same time, the division of one source table into several requires the fulfillment of one of the most important conditions for the design of information systems - ensuring the integrity of information during the operation of the database. In the above example of normalizing the original tables (see Figure 2.3), from two tables, we eventually got seven tables reduced to the third and fourth normal forms. As practice shows, in real production and business, databases are multi-user systems. This applies both to the creation and maintenance of data in separate tables, and to the use of information for decision-making. In the example discussed above, in a really functioning educational process management system at a university or college, the initial formation of study groups is carried out by admissions committees when enrolling applicants based on the results of entrance exams. Further maintenance of information about the composition of students in groups in universities is assigned to the deans, and in colleges - to educational departments or relevant structures. The composition of academic disciplines by groups is determined by other services or specialists. Information about the teaching staff is formed in the personnel departments. The results of credit and examination sessions are necessary for the heads of the dean's office and departments, including for making decisions on granting scholarships to 32 successful students or “withdrawal from the scholarship” of unsuccessful students. Any change in any of the database tables must find an adequate change in all other tables. This is the essence of ensuring the integrity of the database. In practice, this task is carried out by establishing links between database tables. Let us formulate the basic rules for establishing links between tables. 1. Select the main and subordinate tables from two linked tables. 2. Select a key field in each table. The key field of the main table is called primary key. The key field of a sub-table is called foreign key. 3. Linked table fields must have the same data type. 4. The following types of links are established between the tables: "one to one"; "one to many"; many-to-many: a one-to-one relationship is established in cases where a particular row in the master table is linked to only one row in the child table at any given time; a one-to-many relationship is established in cases where a particular row in the main table at any given time
33 is associated with several rows of the sub-table; in this case, any row of the subordinate table is associated with only one row of the main table; A many-to-many relationship is established when a particular row in the master table is at any given time linked to more than one row in the slave table, and at the same time one row in the slave table is linked to more than one row in the master table. When changing the value of the primary key in the main table, the following behavior of the dependent table is possible. Cascading When the primary key data in the main table changes, the corresponding foreign key data in the dependent table changes. All existing connections are saved. Restriction. If you try to change the value of the primary key to which the rows in the dependent table are associated, the changes are rejected. It is allowed to change only those values ​​of the primary key for which a relationship with a dependent table is not established. Establishment (Relation). When the primary key data is changed, the foreign key is set to NULL. Information about row ownership of the dependent table is lost. If you change several values ​​of the primary key, then in the dependent table there are several groups of rows that were previously associated with the changed keys. After this, it is impossible to determine which row was associated with which primary key. On fig. 2.6 shows the link diagrams between the tables of the database presented in fig. 2.5. test questions 1. Define the following elements of a database table: field, cell, record. 2. What do the concepts "key", "key field" mean? 3. Which key field is called the primary key, and which is the foreign key? 4. What is the process of normalizing database tables? 5. What five normal forms of database tables do you know? 6. Define the following types of relationships between database tables: "one to one"; "one to many"; "many to many".

The basic concepts of relational databases are data type, domain, attribute, tuple, primary key, and relationship. Let us show the meaning of these concepts using the example of the EMPLOYEES relation, which contains information about the employees of some organization:

1. Data type

concept data type in a relational data model is completely adequate to the concept of a data type in programming languages. Typically, modern relational databases allow the storage of character, numeric data, bit strings, specialized numeric data (such as "money"), as well as special "temporal" data (date, time, time interval). An approach to extending the capabilities of relational systems with abstract data types is being actively developed (the corresponding capabilities are available, for example, systems of the Ingres/Postgres family). In our example, we are dealing with three types of data: character strings, integers, and "money".

2. Domain

concept domain more database-specific, although it has some analogies with subtypes in some programming languages. In its most general form, a domain is defined by specifying some basic data type to which the elements of the domain belong, and an arbitrary logical expression applied to the element of the data type. If this boolean expression evaluates to true, then the data element is a domain element. The most correct intuitive interpretation of the concept of a domain is the understanding of a domain as a valid potential set of values ​​of a given type. For example, the "Names" domain in our example is defined on the base character string type, but its values ​​can only include strings that can represent a name (in particular, such strings cannot begin with a soft character). The semantic meaning of the domain concept should also be noted: data are considered comparable only if they belong to the same domain. In our example, the values ​​of the "Pass Numbers" and "Group Numbers" domains are of the integer type, but are not comparable. Note that most relational DBMSs do not use the concept of a domain, although it is already supported in Oracle V.7.

3. Relationship schema, database schema

A relation schema is a named set of pairs (attribute name, domain name (or type, if the concept of a domain is not supported)). The degree or "arity" of the relation scheme is the cardinality of this set. The degree of the EMPLOYEE relationship is four, which means it is 4-ary. If all the attributes of one relation are defined on different domains, it makes sense to use the names of the corresponding domains to name the attributes (keeping in mind, of course, that this is just a convenient way of naming and does not eliminate the distinction between the concepts of domain and attribute). A database schema (in a structural sense) is a set of named relationship schemas.

4. Tuple, relation

A tuple corresponding to a given relation schema is a set of (attribute name, value) pairs that contains one occurrence of each attribute name that belongs to the relation schema. "Value" is a valid domain value for this attribute (or data type if the concept of a domain is not supported). Thus, the degree or "arity" of the tuple, i.e. the number of elements in it, coincides with the "arity" of the corresponding scheme of the relation. Simply put, a tuple is a set of named values ​​of a given type.
A relation is a set of tuples corresponding to the same relation schema. Sometimes, in order not to be confused, they say "relation-schema" and "relation-instance", sometimes the schema of the relation is called the header of the relation, and the relation as a set of tuples is called the body of the relation. In fact, the concept of relational schema is closest to the concept of a structural data type in programming languages. It would be quite logical to allow separately defining a relationship schema and then one or more relationships with that schema.
The usual everyday representation of a relation is a table, the header of which is the schema of the relation, and the rows are the tuples of the relation-instance; in this case, the attribute names name the columns of this table. Therefore, sometimes they say "table column", meaning "relation attribute". A relational database is a set of relationships whose names match the relationship schema names in the database schema.

Fundamental Properties of Relationships

1. Absence of duplicate tuples

The property that relations do not contain duplicate tuples follows from the definition of a relation as a set of tuples. In classical set theory, by definition, each set consists of distinct elements. This property implies that each relation has a so-called primary key - a set of attributes, the values ​​of which uniquely determine the tuple of the relation. For every relation, at least the full set of its attributes has this property. However, when formally defining a primary key, it is required to ensure its "minimality", i.e. the set of attributes of the primary key should not include such attributes that can be discarded without prejudice to the main property - to uniquely identify a tuple. concept primary key is extremely important in connection with the concept of database integrity.

2.No ordering of tuples

The non-ordering property of relation tuples is also a consequence of defining an instance relation as a set of tuples. The absence of the requirement to maintain order on the set of tuples of the relation gives additional flexibility to the DBMS when storing databases in external memory and when executing database queries. This does not contradict the fact that when formulating a query to the database, for example, in the SQL language, one can require the resulting table to be sorted according to the values ​​of some columns. Such a result, generally speaking, is not a relation, but some ordered list of tuples.

3.Lack of ordering of attributes

Relationship attributes are not ordered because, by definition, a relationship schema has many pairs (attribute name, domain name). The attribute name is always used to refer to an attribute value in a relation tuple. This property theoretically allows, for example, modifying the schemes of existing relationships not only by adding new attributes, but also by deleting existing attributes. However, in most existing systems this possibility is not allowed, and although the ordering of the set of relation attributes is not explicitly required, often their order in the linear form of the relation schema definition is used as the implicit order of the attributes.

4.Atomicity of attribute values.

All attribute values ​​are atomic. This follows from the definition of a domain as a potential set of values ​​of a simple data type, i.e. domain values ​​cannot contain sets of values ​​(relations). It is common to say that in relational databases only normalized relations or relations represented in the first normal form are allowed.
Relational data model. According to Data, the relational model consists of three parts that describe different aspects of the relational approach: the structural part, the manipulation part, and the integral part. In the structural part of the model, it is fixed that the only data structure used in relational databases is a normalized n-ary relation. In the manipulation part of the model, two fundamental mechanisms for manipulating relational databases are asserted - relational algebra and relational calculus. The first mechanism is based mainly on the classical set theory (with some refinements), and the second one is based on the classical logical apparatus of the first-order predicate calculus.

Entity and reference integrity. Finally, in the integral part of the relational data model, two basic integrity requirements are fixed, which must be supported in any relational DBMS. The first requirement is called entity integrity requirement. An object or entity of the real world in relational databases corresponds to tuples of relations. Specifically, the requirement is that any tuple of any relation is distinguishable from any other tuple of this relation, i.e. in other words, any relation must have a primary key. As we saw in the previous section, this requirement is automatically satisfied if the basic properties of relations are not violated in the system. The second requirement is called referential integrity requirement and is somewhat more complex. It is obvious that while respecting the normalization of relations, complex entities of the real world are represented in a relational database in the form of several tuples of several relations.

Relational operations and calculus.

Having proposed a relational data model, E.F. Codd also created a tool for convenient work with relations - relational algebra. Each operation of this algebra uses one or more tables (relations) as its operands and produces a new table as a result, i.e. allows you to "cut" or "glue" tables (Fig. 3.3).

Rice. 3.3. Some Relational Algebra Operations
Data manipulation languages ​​have been created that make it possible to implement all the operations of relational algebra and almost any combination of them. Among them, the most common are SQL (Structured Query Language - structured query language) and QBE (Quere-By-Example - pattern queries) [ , ]. Both are very high-level languages ​​in which the user specifies what data is to be retrieved without specifying the procedure for obtaining it. With a single query in any of these languages, you can join several tables into a temporary table and cut out the required rows and columns from it (selection and projection).

Database language support

To work with the database, special languages ​​are used, generally referred to as database languages.

In the first databases, there were 2 languages:

1. SDL base schema definition language.

2. DML data manipulation language.

The first of them served to determine the logical structure of the database, and the second contained a set of operators that allowed you to manipulate data, that is, enter it into the database and delete it. In modern DBMS, usually one language is supported, containing all the necessary tools for working with the database. This language allows you to both create a database and provide users with a database.

Today, the most widely spoken language is

S structured

L angle

This language both supports and creates a database schema and allows you to manipulate this data. It contains all the necessary tools to ensure the integrity of the database. These integrity constraints are contained in special directories, which allow you to control the integrity of the database state at the language level. Special SQL statements define so-called database views. View - ϶ᴛᴏ requests that are stored in the database. For a user, a view is a table that can be used to limit or extend the visibility of a database to a particular data user. The SQL language contains so special operands that provide authorization for access to database objects. Since different users have different permissions for working with data, these permissions are described in special tables - catalogs that are supported at the language level.

The basic concepts of relational databases are: data type, domain, attribute, tuple, primary key, relation.

The data type in the relational model is usually understood to be the same as the data type in programming languages, that is, data can be character, numeric, bit strings, special numeric data (money), as well as special temporal data (time, date, time interval). ).

In the most general form, a domain is defined by specifying some basic data type to which the elements of this domain belong, the concept of a domain is its understanding as a valid multiple value database. The domain has a semantic load. Data is considered comparable only if it belongs to the same domain.

By tuple, it is customary to understand a set of pairs of database elements that contain one occurrence of each attribute seed in the relation schema.

Relation schema - ϶ᴛᴏ named set of pairs of elements. And in

tuple = attribute name͵ value, that is, a tuple is a set of named values ​​of a given type.

Relation - ϶ᴛᴏ a set of tuples corresponding to some one schema, that is, a relational database - ϶ᴛᴏ a set of relations whose names match the names of relation schemas in the database structure.

Relational databases are the most common at the present time, although along with generally recognized advantages, they also have a number of disadvantages. The advantages of the relational approach include:

The presence of a small set of abstractions that make it relatively easy to model most of the common subject areas and allow precise formal definitions, while remaining intuitive;

The presence of a simple and at the same time powerful mathematical apparatus, based mainly on set theory and mathematical logic, and providing a theoretical basis for a relational approach to organizing databases;

Possibility of non-navigational data manipulation without the need to know the specific physical organization of databases in external memory.

Relational systems did not immediately become widespread. While the main theoretical results in this area were obtained back in the 70s, and at the same time the first prototypes of relational DBMS appeared, for a long time it was considered impossible to achieve an effective implementation of such systems. However, the advantages noted above and the gradual accumulation of methods and algorithms for organizing relational databases and managing them led to the fact that already in the mid-80s, relational systems practically ousted early DBMS from the world market.

At present, the main object of criticism of relational DBMS is not their lack of efficiency, but some limitations inherent in these systems (a direct consequence of simplicity) when used in so-called non-traditional areas (the most common examples are design automation systems), which require extremely complex data structures. Another often noted disadvantage of relational databases is the inability to adequately reflect the semantics of the subject area. In other words, the possibilities of representing knowledge about the semantic specificity of the subject area in relational systems are very limited. Modern research in the field of post-relational systems is mainly devoted to eliminating these shortcomings.

The basic concepts of relational databases are data type, domain, attribute, tuple, primary key, and relationship.

concept data type in a relational data model is completely adequate to the concept of a data type in programming languages. Typically, modern relational databases allow the storage of character, numeric data, bit strings, specialized numeric data (such as "money"), as well as special "temporal" data (date, time, time interval). An approach to extending the capabilities of relational systems with abstract data types is being actively developed (the corresponding capabilities are available, for example, systems of the Ingres/Postgres family).

Relational model data structures. The relational data model organizes and presents data in the form of tables or relations. relation is a term that comes from mathematics and refers to a simple two-dimensional table. The relational approach to building databases uses the terminology of relational theory. The simplest two-dimensional table is defined as relation.

table is the basic data structure type (object) of the relational model. The structure of the table is determined by the set columns. Each row of the table contains one value in the corresponding column. A table cannot have two identical rows. The total number of lines is not limited.

Column matches some data element - attribute, which is the simplest data structure. A table cannot define multiple elements, a group, or a repeating group, as in the network and hierarchical models discussed above. Each column of the table must have name corresponding data element (attribute).

The table column with the values ​​of the corresponding attribute is called domain, and strings with values ​​of different attributes - tuple.

Relational table-relationship. On fig. 9 is an illustration of a relational relation table R. Formal definition relations R (relational table) relies on the concept of its domains D i , (columns) and tuples K j (lines). The relation R defined on the sets of domains (D i ) is the subset Cartesian (direct) product of domains D 1 *D 2 *…..*D n

table relation(see Fig. 1) contains columns with the names of data elements - attributes (А 1 , А 2 , ...). The d attribute values ​​are in the content part of the table and form rows and columns. Multiple attribute values ​​in one column form one domain D i. Multiple attribute values ​​in one row form one tuple K j . Attitude R is formed by a set of ordered tuples.

R=(Кj), J=1- m Кj=(d 1j, d 2 j ,…d nj ),

where n is the number of relationship domains; defines the dimension of the relationship;

j – tuple number;

m is the total number of tuples in the relation, called coordinate number relations.

Fig.9. Illustration of a relational table-relationship

Domain. In its most general form, a domain is defined by specifying some basic data type to which the elements of the domain belong, and an arbitrary logical expression applied to the element of the data type. If this boolean expression evaluates to true, then the data element is a domain element.

The most correct intuitive interpretation of the concept of a domain is the understanding of a domain as a valid potential set of values ​​of a given type. For example, the "Names" domain in our example is defined on the base character string type, but its values ​​can only include strings that can represent a name (in particular, such strings cannot begin with a soft character).

The semantic meaning of the domain concept should also be noted: data are considered comparable only if they belong to the same domain. In our example, the values ​​of the "Pass Numbers" and "Group Numbers" domains are of the integer type, but are not comparable. Note that most relational DBMSs do not use the concept of a domain, although it is already supported in Oracle V.7.

Relationship schema, database schema. A relation schema is a named set of pairs (attribute name, domain name (or type, if the concept of a domain is not supported)). The degree or "arity" of the relation scheme is the cardinality of this set. The degree of the EXAMPLE relation is SIX, that is, it is 6-ary. If all the attributes of one relation are defined on different domains, it makes sense to use the names of the corresponding domains to name the attributes (keeping in mind, of course, that this is just a convenient way of naming and does not eliminate the distinction between the concepts of domain and attribute). A database schema (in a structural sense) is a set of named relationship schemas.

A list that gives relational table names, their attributes (keys underlined) and foreign key definitions is called relational database schema. It is a preliminary result of the creation of a relational database life cycle stage. Example:

WORKER[ WORKER ID, NAME, HOURLY-RATE, SKILL-TYPE, SVPV-ID]

Foreign keys: SKILL-TYPE refers to SKILL

SVPV-ID refers WORKER

ASSIGNMENT[ WORKER ID, BLDG ID, START-DATE, NUMBER-OF-DAYS]

Foreign keys: WORKER-ID refers to WORKER

BLDG-ID refers to BVILDING

BVILDING[ BLDG ID, ADDRESS, TYPE, QLTY-LEVEL, STATVS]

SKILL[ SKILL-TYPE, BONUS-RATE, HOURS-PER-WEEK]

Tuple, relation. The tuple corresponding to a given relation schema is a set of (attribute name, value) pairs that contains one occurrence of each attribute name that belongs to the relation schema. "Value" is a valid value of the given attribute's domain (or data type if the concept of domain is not supported). Thus, the degree or "arity" of a tuple, i.e. the number of elements in it, coincides with the "arity" of the corresponding relation scheme. Simply put, a tuple is a set of named values ​​of a given type.

Attitude is a set of tuples corresponding to the same relation scheme. Sometimes, in order not to be confused, they say "relation-schema" and "relation-instance", sometimes the schema of the relation is called the header of the relation, and the relation as a set of tuples is called the body of the relation. In fact, the concept of relational schema is closest to the concept of a structural data type in programming languages. It would be quite logical to allow separately defining a relationship schema and then one or more relationships with that schema.

However, this is not common in relational databases. The schema name of a relation in such databases is always the same as the name of the corresponding instance relation. In classic relational databases, once the database schema is defined, only instance relationships change. New tuples can appear in them and existing tuples can be deleted or modified. However, in many implementations, changing the database schema is also allowed: defining new and changing existing relationship schemas. This is called the evolution of the database schema.

The usual everyday representation of a relation is a table, the heading of which is the schema of the relation, and the rows are the tuples of the relation-instance; in this case, the attribute names name the columns of this table. Therefore, sometimes they say "table column", meaning "relation attribute". When we move on to the practicalities of organizing relational databases and management tools, we will use this mundane terminology. This terminology is followed in most commercial relational DBMSs.

A relational database is a set of relationships whose names are the same as the relationship schema names in the database schema.

As you can see, the basic structural concepts of the relational data model (except for the concept of the domain) have a very simple intuitive interpretation, although in the theory of relational databases they are all defined absolutely formally and precisely.

Relationship table key. Tuples must not be repeated internally relationship tables and, accordingly, they must have a unique identifier - primary key. One or more attributes whose values ​​uniquely identify a table row are key tables.

The primary key is called simple , when it consists of a single attribute, or composite, when it consists of multiple attributes. In addition to the primary key, a relation can also have secondary keys.

secondary key this is a key whose values ​​can be repeated in different tuple strings. They can be used to search for a group of rows with the same secondary key value.

External key - it is a set of attributes of one table that is the key of another (or the same) table. Foreign keys provide important relationships between tables. They are used to link data from one table to data in another table. Foreign key attributes do not have to have the same names as the key attributes they correspond to.


Similar information.


RELATIONAL DATABASE AND ITS FEATURES. TYPES OF RELATIONS BETWEEN RELATIONAL TABLES

Relational database is a set of interrelated tables, each of which contains information about objects of a certain type. A table row contains data about one object (for example, a product, a customer), and the table columns describe various characteristics of these objects - attributes (for example, name, product code, customer information). Records, i.e. table rows, have the same structure - they consist of fields that store object attributes. Each field, i.e. column, describes only one characteristic of the object and has a strictly defined data type. All records have the same fields, only they display different informational properties of the object.

In a relational database, every table must have a primary key, a field or combination of fields that uniquely identifies each row in the table. If the key consists of several fields, it is called composite. The key must be unique and uniquely identify the entry. A single entry can be found by the value of the key. Keys are also used to organize information in the database.

Relational database tables must meet the requirements of relational normalization. Normalization of relations is a formal apparatus of restrictions on the formation of tables, which allows you to eliminate duplication, ensures the consistency of data stored in the database, and reduces labor costs for maintaining the database.

Let the Student table be created, containing the following fields: group number, full name, record number, date of birth, specialty name, faculty name. Such an organization of information storage will have a number of disadvantages:

  • duplication of information (the name of the specialty and faculty are repeated for each student), therefore, the volume of the database will increase;
  • the procedure for updating information in the table is difficult due to the need to edit each table entries.

Table normalization is designed to address these shortcomings. Available three normal forms of relationships.

First normal form. A relational table is reduced to first normal form if and only if none of its rows contains more than one value in any of its fields and none of its key fields is empty. So, if you want to get information from the Student table by the name of the student, then the Full Name field should be divided into the Surname, First Name, Patronymic parts.

Second normal form. A relational table is defined in second normal form if it satisfies the requirements of first normal form and all of its fields that are not included in the primary key are fully functionally dependent on the primary key. To bring the table to the second normal form, it is necessary to determine the functional dependence of the fields. Functional dependency of fields is a dependency in which only one value of the descriptive attribute corresponds to a certain value of the key attribute in the instance of the information object.

Third normal form. A table is in third normal form if it satisfies the requirements of second normal form and none of its non-key fields are functionally dependent on any other non-key fields. For example, in the Student table (Group No., Full Name, Gradebook No., Date of Birth, Headman) three fields - Gradebook No., Group No., Headman are in transitive dependence. The group number depends on the record book number, and the Headman depends on the group number. To eliminate the transitive dependency, it is necessary to transfer some of the fields of the Student table to another Group table. The tables will take the following form: Student (Group No., Full Name, Gradebook No., Date of Birth), Group (Group No., Headman).

The following operations are possible with relational tables:

  • Merging tables with the same structure. The result is a common table: first the first, then the second (concatenation).
  • Intersection of tables with the same structure. Result - those records that are in both tables are selected.
  • Subtraction of tables with the same structure. Result - those records that are not in the subtrahend are selected.
  • Sample (horizontal subset). Result - records that meet certain conditions are selected.
  • Projection (vertical subset). The result is a relation containing some of the fields from the source tables.
  • Cartesian Product of Two Tables The entries in the resulting table are obtained by concatenating each entry in the first table with each entry in the other table.

Relational tables can be related to each other, so data can be retrieved from multiple tables at the same time. The tables are linked together in order to ultimately reduce the size of the database. The relationship of each pair of tables is provided if they have the same columns.

There are the following types of information links:

  • one to one;
  • one-to-many;
  • many-to-many.

One-to-one communication assumes that one attribute of the first table corresponds to only one attribute of the second table and vice versa.

One-to-Many Relationship assumes that one attribute of the first table corresponds to several attributes of the second table.

Many-to-many relationship assumes that one attribute of the first table corresponds to several attributes of the second table and vice versa.

Share