Relational algebra

In database theory, relational algebra is a set of operations for working with data, built on precise, math-like rules. It was created by Edgar F. Codd.

Relational algebra is mainly used for relational databases. These databases store data in tables. The main query language for these databases is SQL. Queries in relational algebra return data in tables too.

The goal of relational algebra is to use special operations to change one or more tables into a new table. These operations can be linked together to create complicated queries. They turn many tables stored in a database into one result table.

Unary operators work on a single table. They can pick certain columns or rows to keep. Binary operators work on two tables and combine them into one. For example, they can collect all rows that appear in either table (union) or remove rows from the first table that appear in the second table (difference).

Introduction

E. F. Codd introduced relational algebra in 1970. It gives us a precise way to ask questions about data stored in tables.

In relational algebra, data is organized in rows and columns, like in a table. Each row is called a tuple. Each column has a name called an attribute. These attributes help us find specific information from the table.
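The idea of tuples and attributes can be sketched in Python. This is only one possible model, not a standard one: the helper names `make_relation` and `attributes` and the sample animal data are made up for illustration. Each tuple is stored as a frozen set of (attribute, value) pairs, so a table becomes a plain Python set and repeated rows disappear on their own.

```python
def make_relation(rows):
    """Build a relation (a set of tuples) from an iterable of dicts."""
    # frozenset makes each row hashable, so the outer set can drop duplicates.
    return {frozenset(row.items()) for row in rows}


def attributes(relation):
    """Return the attribute (column) names used by the relation's tuples."""
    return {name for tup in relation for name, _ in tup}


animals = make_relation([
    {"name": "Rex", "type": "dog", "color": "brown"},
    {"name": "Tom", "type": "cat", "color": "gray"},
    {"name": "Rex", "type": "dog", "color": "brown"},  # same tuple again
])

print(len(animals))                 # 2
print(sorted(attributes(animals)))  # ['color', 'name', 'type']
```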

Set operators

Main article: Set theory

Relational algebra uses ideas from set theory, such as set union, set difference, and the Cartesian product. For set union and set difference, the two tables must have the same set of attributes. For the Cartesian product, the two tables must not share any attribute names. The product pairs every row of the first table with every row of the second, which creates every possible combination of rows.
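The three set operators can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: tables are modeled as sets of frozen (attribute, value) pairs, and all function names and sample data are hypothetical.

```python
from itertools import product


def to_relation(rows):
    """Turn a list of dicts into a set of hashable tuples."""
    return {frozenset(row.items()) for row in rows}


def attrs(relation):
    """Attribute names that appear in the relation."""
    return {name for tup in relation for name, _ in tup}


def _check_same_attrs(r, s):
    # Union and difference need both inputs to have the same attributes.
    # An empty relation carries no header in this simple model, so it is skipped.
    if r and s and attrs(r) != attrs(s):
        raise ValueError("both relations need the same attributes")


def union(r, s):
    _check_same_attrs(r, s)
    return r | s


def difference(r, s):
    _check_same_attrs(r, s)
    return r - s


def cartesian_product(r, s):
    if attrs(r) & attrs(s):
        raise ValueError("attribute names must not overlap")
    # Pair every tuple of r with every tuple of s.
    return {a | b for a, b in product(r, s)}


friends = to_relation([{"name": "Ann"}, {"name": "Bob"}])
coworkers = to_relation([{"name": "Bob"}, {"name": "Cy"}])
colors = to_relation([{"color": "red"}, {"color": "blue"}])

print(len(union(friends, coworkers)))           # 3
print(len(difference(friends, coworkers)))      # 1
print(len(cartesian_product(friends, colors)))  # 4
```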

Projection

Main article: Projection (relational algebra)

A projection is a way to look at just some parts of information from a group of data. Imagine you have a table with details about different animals, like their names, types, and colors. If you only want to see the names and colors, you would use a projection. This operation helps us focus on only the columns we need. It makes working with big sets of information easier. In databases, this idea is used to keep only the important details and remove extra ones.
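The animal example can be sketched in Python. This is a rough illustration, with a table modeled as a list of dicts. The `project` function and the sample rows are invented here. Because a relation is a set, rows that become identical after the extra columns are dropped are kept only once.

```python
def project(relation, keep):
    """Keep only the columns listed in `keep`, dropping duplicate rows."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[name] for name in keep)
        if key not in seen:  # a relation never holds the same tuple twice
            seen.add(key)
            result.append(dict(zip(keep, key)))
    return result


animals = [
    {"name": "Rex", "type": "dog", "color": "brown"},
    {"name": "Tom", "type": "cat", "color": "gray"},
    {"name": "Rex", "type": "cat", "color": "brown"},
]

print(project(animals, ["name", "color"]))
# [{'name': 'Rex', 'color': 'brown'}, {'name': 'Tom', 'color': 'gray'}]
```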

Selection

Main article: Selection (relational algebra)

In relational algebra, a generalized selection is a way to pick out certain rows from a table. It uses a condition, written φ, to decide which rows to keep. This condition can be built from simple comparisons joined by the logical operators "and," "or," and "not." For example, if you have an address book and want to see only friends or business contacts, you could write a selection that keeps every row where "isFriend" is true or "isBusinessContact" is true. The result would be a new table with just those rows.
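The address book example might look like this in Python. It is a small sketch, assuming a table is a list of dicts and the condition φ is an ordinary Python function. The `select` helper and the three contacts are made up for illustration.

```python
def select(relation, phi):
    """Keep the rows for which the condition phi is true."""
    return [row for row in relation if phi(row)]


address_book = [
    {"name": "Ann", "isFriend": True, "isBusinessContact": False},
    {"name": "Bob", "isFriend": False, "isBusinessContact": True},
    {"name": "Cy", "isFriend": False, "isBusinessContact": False},
]

# phi: isFriend = true OR isBusinessContact = true
result = select(
    address_book,
    lambda row: row["isFriend"] or row["isBusinessContact"],
)

print([row["name"] for row in result])  # ['Ann', 'Bob']
```

The original table is left unchanged; the selection builds a new one.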

Rename

Main article: Rename (relational algebra)

A rename is a way to change the name of a column in a table. We do this so that tables can be combined more easily. For example, you might change a column named "isFriend" to "isBusinessContact" to help two tables match up. This operation changes only the column name, not the data inside the table.
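A rename can be sketched as follows, again assuming a table is a list of dicts. The `rename` function is a hypothetical helper written for this example. It refuses to reuse a name that already exists, because two columns cannot share a name.

```python
def rename(relation, old, new):
    """Return a copy of the table with column `old` renamed to `new`."""
    result = []
    for row in relation:
        if old not in row:
            raise KeyError(f"no column named {old!r}")
        if new in row and new != old:
            raise ValueError(f"column {new!r} already exists")
        # Only the key changes; every value is carried over untouched.
        result.append({(new if key == old else key): value
                       for key, value in row.items()})
    return result


friends = [
    {"name": "Ann", "isFriend": True},
    {"name": "Bob", "isFriend": False},
]

print(rename(friends, "isFriend", "isBusinessContact"))
# [{'name': 'Ann', 'isBusinessContact': True},
#  {'name': 'Bob', 'isBusinessContact': False}]
```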

Joins and join-like operators

Main article: Join (relational algebra)

Joins are key operations in relational algebra. They combine information from different tables when the tables have related columns. The most common kind, the natural join, pairs each row of one table with the rows of the other table that have equal values in the columns the two tables share. This makes it simpler to see how different groups of information connect. Joins are important for asking questions and handling data in relational databases.
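A natural join, one common kind of join, can be sketched with two nested loops. This is a simple illustration, not how a real database runs joins. The function name and the employee and department data are invented for the example.

```python
def natural_join(r, s):
    """Pair rows of r and s that agree on every column name they share."""
    result = []
    for a in r:
        for b in s:
            common = a.keys() & b.keys()
            if all(a[name] == b[name] for name in common):
                result.append({**a, **b})  # merge the two matching rows
    return result


employees = [
    {"name": "Ann", "dept": "sales"},
    {"name": "Bob", "dept": "lab"},
]
departments = [
    {"dept": "sales", "manager": "Dee"},
]

print(natural_join(employees, departments))
# [{'name': 'Ann', 'dept': 'sales', 'manager': 'Dee'}]
```

Bob is missing from the result because no department row matches "lab". If the two tables share no column names, this function returns every pairing of rows, which is the Cartesian product.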

Common extensions

Relational algebra can be extended with extra operations to handle more complex data tasks. One important extension is the outer join. Like a normal join, it combines information from two tables, but it also keeps rows that have no matching row in the other table. The columns that would have come from the other table are filled with a special "null" value, which marks the data as missing.
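A left outer join, one kind of outer join, can be sketched like this. It is an illustration only: Python's `None` stands in for the null value, and the function name, the `s_attrs` parameter, and the sample data are assumptions made for the example.

```python
def left_outer_join(r, s, s_attrs):
    """Join r with s, keeping unmatched rows of r padded with None.

    s_attrs lists the column names of s, so padding works even if s is empty.
    """
    result = []
    for a in r:
        matches = [
            b for b in s
            if all(a[name] == b[name] for name in a.keys() & b.keys())
        ]
        if matches:
            result.extend({**a, **b} for b in matches)
        else:
            # No partner row: fill the columns that only s has with None.
            padding = {name: None for name in s_attrs if name not in a}
            result.append({**a, **padding})
    return result


employees = [
    {"name": "Ann", "dept": "sales"},
    {"name": "Bob", "dept": "lab"},
]
departments = [{"dept": "sales", "manager": "Dee"}]

print(left_outer_join(employees, departments, ["dept", "manager"]))
# [{'name': 'Ann', 'dept': 'sales', 'manager': 'Dee'},
#  {'name': 'Bob', 'dept': 'lab', 'manager': None}]
```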

Another key extension is aggregation. This calculates sums, averages, counts, maximums, and minimums over groups of rows. For example, we can find the highest balance in each branch of a bank by grouping accounts by branch and then finding the maximum balance in each group. These tools make relational algebra more useful for real-world database queries.
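The bank example can be sketched in Python as a group-then-maximum step. The `group_max` function, the column names, and the balances are all hypothetical and exist only to show the idea.

```python
def group_max(relation, group_by, target):
    """For each value of `group_by`, find the largest value of `target`."""
    best = {}
    for row in relation:
        key = row[group_by]
        if key not in best or row[target] > best[key]:
            best[key] = row[target]
    return [{group_by: key, f"max_{target}": value}
            for key, value in best.items()]


accounts = [
    {"account": 1, "branch": "A", "balance": 100},
    {"account": 2, "branch": "A", "balance": 250},
    {"account": 3, "branch": "B", "balance": 80},
]

print(group_max(accounts, "branch", "balance"))
# [{'branch': 'A', 'max_balance': 250}, {'branch': 'B', 'max_balance': 80}]
```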

Main article: Join (SQL) § Outer join

Use of algebraic properties for query optimization

Relational database systems have a part called a query optimizer. The optimizer looks for an efficient way to run a query. It considers different plans that give the same result, estimates the cost of each one, and picks the plan with the lowest estimated cost.

Queries can be shown as trees. In these trees, the internal nodes are operators, the leaves are tables, and subtrees are smaller parts of the query (subexpressions). The main goal is to change a tree into an equivalent tree whose intermediate results are smaller. Smaller intermediate results mean less work for the computer. Another goal is to find subexpressions that appear more than once, within one query or across queries that run at the same time, and to compute them only once. This saves time.
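One well-known rewrite of this kind moves a selection closer to the leaves of the tree, so that rows are filtered before tables are combined. The sketch below uses invented data and helper names. It shows that both orders give the same answer, while the early filter builds a much smaller intermediate table.

```python
from itertools import product


def select(relation, phi):
    """Keep the rows for which phi is true."""
    return [row for row in relation if phi(row)]


def cross(r, s):
    """Cartesian product of two tables with different column names."""
    return [{**a, **b} for a, b in product(r, s)]


orders = [{"order_id": i, "amount": 10 * i} for i in range(1, 101)]
regions = [{"region": name} for name in ("north", "south", "east")]


def is_big(row):
    # The condition only looks at a column of `orders`, so the selection
    # may be applied before or after the product without changing the result.
    return row["amount"] > 900


combined = cross(orders, regions)     # intermediate table: 300 rows
late = select(combined, is_big)       # filter after combining

filtered = select(orders, is_big)     # intermediate table: 10 rows
early = cross(filtered, regions)      # combine after filtering

print(len(combined), len(filtered))   # 300 10
print(late == early)                  # True
```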

Implementations

The first query language based on Codd's algebra was Alpha, which Codd developed himself. Later, ISBL was developed, and it inspired many others. Business System 12 was an early relational database system that followed the ISBL example.

In 1998, Chris Date and Hugh Darwen proposed a language called Tutorial D for teaching relational database theory. Today, there are implementations such as Rel, which implements Tutorial D, and Bmg, a relational algebra library for Ruby. Even SQL, the most common database language, is loosely based on relational algebra.