Relational algebra — Safekipedia Discoverer

In database theory, relational algebra is a theory that uses algebraic structures for modeling data and defining queries on it with well founded semantics. The theory was introduced by Edgar F. Codd.

The main application of relational algebra is to provide a theoretical foundation for relational databases, particularly query languages for such databases, chief among which is SQL. Relational databases store tabular data represented as relations. Queries over relational databases often likewise return tabular data represented as relations.

The main purpose of relational algebra is to define operators that transform one or more input relations to an output relation. Given that these operators accept relations as input and produce relations as output, they can be combined and used to express complex queries that transform multiple input relations (whose data are stored in the database) into a single output relation (the query results).

Unary operators accept a single relation as input. Examples include operators to filter certain attributes (columns) or tuples (rows) from an input relation. Binary operators accept two relations as input and combine them into a single output relation. For example, taking all tuples found in either relation (union), removing tuples from the first relation found in the second relation (difference), extending the tuples of the first relation with tuples in the second relation matching certain conditions, and so forth.

Introduction

Relational algebra is a way to work with data that was introduced by E.F. Codd in 1970. It helps us understand how to ask questions about data stored in tables.

In relational algebra, data is organized in rows and columns, like in a table. Each row is called a tuple, and each column has a special name called an attribute. These attributes help us find and organize specific pieces of information from the table.

Set operators

Main article: Set theory

Relational algebra uses ideas from set theory, like set union, set difference, and Cartesian product. For set union and set difference, the two groups of data must have the same types of information. The Cartesian product combines two groups of data that don’t share any common names, creating new combinations of information. This helps in working with tables of data in databases.

Projection

Main article: Projection (relational algebra)

A projection is a way to focus on just certain pieces of information from a group of data. Imagine you have a table full of details about different animals, like their names, types, and colors. If you only want to see the names and colors, you would use a projection. This operation helps us narrow down the data to only the columns we need, making it easier to work with large sets of information. In databases, this idea is used to remove extra details and keep only what’s important.

Selection

Main article: Selection (relational algebra)

In relational algebra, a generalized selection is a way to pick out certain rows from a table. It uses a rule, called φ, to decide which rows to keep. This rule can include simple conditions and logical words like "and," "or," and "not." For example, if you have an address book and want to see only friends or business contacts, you could write a selection that keeps every row where "isFriend" is true or "isBusinessContact" is true. The result would be a new table with just those rows.

Rename

Main article: Rename (relational algebra)

A rename is a way to change the name of an attribute, or column heading, in a table. This helps when you need to make names match for combining tables together. For example, you might change an attribute called "isFriend" to "isBusinessContact" so that two tables can be joined properly. This operation does not change the data itself, only the name of the column.

Joins and join-like operators

Main article: Join (relational algebra)

Joins are important operations in relational algebra that help combine information from different tables based on related columns. They allow us to bring together data that matches certain conditions, making it easier to understand relationships between different sets of information. These operations are fundamental in querying and managing data in relational databases.

Common extensions

Relational algebra can be expanded with extra operations to handle more complex data tasks. One important expansion is the outer join, which combines information from two tables and includes rows that don’t have matching entries in the other table. This helps to show missing data by using a special "null" value.

Another key expansion is aggregation, where we calculate totals, averages, counts, maximums, and minimums from groups of data. For example, we can find the highest balance in each branch of a bank by grouping accounts by branch and then finding the maximum balance in each group. These tools make relational algebra more useful for real-world database queries.

Main article: Join (SQL) § Outer join

Use of algebraic properties for query optimization

Relational database systems often have a special part called a query optimizer. This optimizer tries to find the best way to run a query by looking at different options, checking how hard each one is, and picking the easiest one. When queries are written using special math rules called relational algebra, the optimizer can change the query using these rules to make it faster.

Queries can be shown as trees, where the middle parts are actions, the ends are data tables, and smaller trees are smaller parts of the query. The main goal is to make these trees smaller and simpler, so the computer can work faster. Another goal is to find parts that are the same in different queries and only do them once, which saves time.

Implementations

The first query language based on Codd's ideas was Alpha, created by Dr. Codd himself. Later, ISBL was developed, and it inspired many others. Business System 12 was an early relational database system that followed these ideas.

In 1998, Chris Date and Hugh Darwen created a language called Tutorial D for teaching about relational databases. Today, there are implementations like Rel and Bmg that follow these principles. Even SQL, the most common database language, is loosely based on relational algebra, though it works with tables that allow duplicate entries, which changes some of the rules.