Spreadsheet

A spreadsheet is a computer application for computation, organization, analysis and storage of data in tabular form. It works like a digital version of paper accounting worksheets. You can enter numbers, words, or formulas into cells in a table. These formulas can automatically calculate values based on other cells.

In modern spreadsheets, several sheets, often called "worksheets" or "sheets", are gathered together in a file known as a "workbook". You can switch between these sheets using tabs. When you refer to a cell in a multi-sheet book, you add the sheet name, for example, "Sheet 1!C10".

Spreadsheets are very useful for trying out different ideas, known as "what-if" analysis, because you can change values and see how it affects calculations quickly. They can perform many kinds of math, including basic arithmetic, mathematical functions, and special financial accountancy tasks. They are used everywhere in business and many other areas where people need to organize and work with lists of information.

Basics

LANPAR, available in 1969, was the first electronic spreadsheet for big computers. VisiCalc (1979) was the first spreadsheet for small computers and helped make the Apple II very popular. Lotus 1-2-3 was the top spreadsheet when DOS was the main operating system. Today, Microsoft Excel is the most used spreadsheet on Windows and Macintosh computers. In 2006, Google launched a web version called Google Sheets, which is part of Google Drive.

A spreadsheet is a table with rows and columns. Each space in the table is called a cell. Columns are labeled with letters like "A" and "B," and rows are numbered like 1 and 2. You can refer to a cell by its column and row, like "C10." Cells can hold numbers, words, or dates. You can also put formulas in cells, which are special instructions that do calculations. For example, =5*3 would show the result 15.

The powerful part of spreadsheets is that formulas can use values from other cells. If you write =5*C10 and cell C10 has the number 3, the result will be 15. You can even use formulas on groups of cells, like adding up a list of numbers with the SUM function.

History

Paper spreadsheets

People have been organizing information in tables — grids with columns and rows — for thousands of years. Ancient groups like the Babylonians used clay tablets to keep records as early as 1800 BCE. Later, people used paper ledgers and astronomical charts to track important data.

Since around 1906, the word "spreadsheet" has been used in accounting to describe these grid-like tables in ledgers. Before computers, spreadsheets were also called "spreads" because they often appeared across two pages of a book or on large sheets of paper.

Electronic spreadsheets

Batch spreadsheet report generator BSRG

A "batch" spreadsheet works like a computer program that processes data and creates reports. This idea was first described in a 1961 paper by Richard Mattessich. Later work by Mattessich used computers to handle accounting and budgeting on big machines called mainframe computers using a language named FORTRAN IV. These early spreadsheets added or subtracted entire rows or columns of numbers instead of single cells.

In 1962, a program named BCL for Business Computer Language was created for an IBM 1130 computer. In 1963, it was updated for an IBM 7040 by R. Brian Walsh at Marquette University, Wisconsin. This program was written in Fortran. A book about BCL was published in 1964.

In the late 1960s, Xerox improved BCL for their computer system.

LANPAR spreadsheet compiler

A big step in spreadsheet history was made by Rene K. Pardo and Remy Landau. In 1970, they applied for a patent for a spreadsheet system that could calculate results in the correct order. After many years, their invention was recognized in court in 1983. The software was named LANPAR — LANguage for Programming Arrays at Random. It was created in 1969 after Pardo and Landau graduated from Harvard University.

LANPAR was used by Bell Canada, AT&T, and other telephone companies for budgeting. It was also used by General Motors. Its special feature was that it could calculate results in the right order without needing the user to update the spreadsheet repeatedly.

Autoplan/Autotab spreadsheet programming language

In 1968, three former employees from General Electric in Phoenix, Arizona started their own software company. They created a program called AutoPlan to help make business tables easier. This program worked on GE's time-sharing service and later on IBM mainframes under the name AutoTab. Another similar product, CSSTAB, was offered by National CSS.

AutoPlan/AutoTab was not an interactive program with a visual interface. Instead, it was a simple scripting language where users defined names, labels, and formulas for rows and columns.

IBM Financial Planning and Control System

In 1976, Brian Ingham at IBM Canada developed the IBM Financial Planning and Control System. It ran on IBM mainframes and was one of the first programs for financial planning to hide the programming language from users. It updated automatically when new versions were released. Users could set simple math relationships between rows and columns. It could handle very large spreadsheets and loaded financial data each month.

APLDOT modeling language

An early example of a powerful spreadsheet was APLDOT, created in 1976 at the United States Railway Association on an IBM 360/91 computer. It was used for many years to help with financial and costing models for the US Congress and for Conrail. APLDOT was called a "spreadsheet" because analysts used it in similar ways to paper spreadsheets.

VisiCalc for the Apple II

The idea of spreadsheets became widely known because of VisiCalc, created for the Apple II in 1979 by Dan Bricklin and Bob Frankston. It turned personal computers into business tools.

VisiCalc running on an Apple II

VisiCalc was the first spreadsheet to include many key features like a visual interface, automatic updates, and easy formula creation. Dan Bricklin got the idea after watching a professor correct calculations on a blackboard. His idea became VisiCalc.

VisiCalc for the Apple II became so popular that people bought Apple II computers just to use it. It was later made to work on other computers like CP/M machines, Atari 8-bit computers, and the Commodore PET, but it is most famous as an Apple II program.

SuperCalc for CP/M

SuperCalc was a spreadsheet made by Sorcim in 1980. It was included with the CP/M software package for the Osborne 1 portable computer and quickly became the standard spreadsheet for CP/M.

Lotus 1-2-3 spreadsheet for IBM PC DOS

When Lotus 1-2-3 came out in November 1982, it helped make the IBM Personal Computer more popular. It was made for IBM PC DOS and had faster speed and better graphics than VisiCalc on the Apple II, which made it very popular.

Microsoft Excel for Apple Macintosh and Windows

Main article: Microsoft Excel

Microsoft created the first version of Excel for the Apple Macintosh on September 30, 1985. It was later made to work on Windows, with version 2.05 released in November 1987. Microsoft's Windows 3.x in the early 1990s helped Excel become more popular than Lotus 1-2-3. By 1995, Excel was the leading spreadsheet, and in 2013, IBM stopped making Lotus 1-2-3.

Google Sheets, Online, Web-based spreadsheets

Mainframe spreadsheets

The Works Records System at ICI developed in 1974 on IBM 370/145
ExecuCalc, from Parallax Systems, Inc.: Released in late 1982, ExecuCalc was the first mainframe "visi-clone" which duplicated the features of VisiCalc on IBM mainframes with 3270 display terminals. Over 150 copies were licensed (35 to Fortune 500 companies). DP managers were attracted to compatibility and avoiding then-expensive PC purchases (see 1983 Computerworld magazine front page article and advertisement.)

Other spreadsheets

Notable current spreadsheet software:

Apache OpenOffice Calc is free and open-source.
Calligra Sheets (formerly KCalc)
Collabora Online Calc for mobile and desktop apps are free, open-source, cross-platform enterprise-ready editions of LibreOffice.
Corel Quattro Pro (WordPerfect Office)
Gnumeric is a free spreadsheet by the GNU project
Kingsoft Spreadsheets
LibreOffice Calc is free, open-source and cross platform.
Numbers is Apple Inc.'s spreadsheet software, part of iWork.
OnlyOffice Docs Spreadsheet editor is free and open source.
PlanMaker (SoftMaker Office)
Pyspread

Discontinued spreadsheet software:

20/20
3D-Calc for Atari ST computers
As Easy As
Framework by Forefront Corporation/Ashton-Tate (1983–84)
GNU Oleo – A traditional terminal mode spreadsheet for UNIX/UNIX-like systems
IBM Lotus Symphony (2007)
Javelin Software
KCells
Lucid 3-D
Lotus Improv
Lotus Jazz for Macintosh
Lotus Symphony (1984)
MultiPlan
Claris' Resolve (Macintosh)
NeoOffice
Resolver One
Borland's Quattro Pro
SC IM (formerly SC - Spreadsheet Calculator)
SIAG
SuperCalc
T/Maker
Target Planner Calc for CP/M and TRS-DOS
Wingz for Macintosh

Other products

Several companies tried to enter the spreadsheet market with new ideas. Lotus created Lotus Improv, which was successful in finance because of its strong data mining abilities.

Spreadsheet 2000 tried to make formulas much simpler but did not succeed widely.

Concepts

The main ideas are about a grid of cells, called a sheet, with either raw data, called values, or formulas in the cells. Formulas tell how to compute new values from existing ones. Values are general numbers, but can also be pure text, dates, months, etc. Extensions of these concepts include logical spreadsheets. Various tools for programming sheets, visualizing data, remotely connecting sheets, displaying cells' dependencies, etc. are commonly provided.

A "cell" can be thought of as a box for holding data. A single cell is usually referenced by its column and row (C2 would represent the cell containing the value 30 in the example table below). Usually rows, representing the dependent variables, are referenced in decimal notation starting from 1, while columns representing the independent variables use 26-adic bijective numeration using the letters A-Z as numerals. Its physical size can usually be tailored to its content by dragging its height or width at box intersections (or for entire columns or rows by dragging the column- or row-headers).

An array of cells is called a sheet or worksheet. It is analogous to an array of variables in a conventional computer program (although certain unchanging values, once entered, could be considered, by the same analogy, constants). In most implementations, many worksheets may be located within a single spreadsheet. A worksheet is simply a subset of the spreadsheet divided for the sake of clarity. Functionally, the spreadsheet operates as a whole and all cells operate as global variables within the spreadsheet (each variable having 'read' access only except its containing cell).

A cell may contain a value or a formula, or it may simply be left empty. By convention, formulas usually begin with = sign.

A value can be entered from the computer keyboard by directly typing into the cell itself. Alternatively, a value can be based on a formula (see below), which might perform a calculation, display the current date or time, or retrieve external data such as a stock quote or a database value.

Computer scientist Alan Kay used the term value rule to summarize a spreadsheet's operation: a cell's value relies solely on the formula the user has typed into the cell. The formula may rely on the value of other cells, but those cells are likewise restricted to user-entered data or formulas. There are no 'side effects' to calculating a formula: the only output is to display the calculated result inside its occupying cell. There is no natural mechanism for permanently modifying the contents of a cell unless the user manually modifies the cell's contents. In the context of programming languages, this yields a limited form of first-order functional programming.

A standard of spreadsheets since the 1980s, this optional feature eliminates the need to manually request the spreadsheet program to recalculate values (nowadays typically the default option unless specifically 'switched off' for large spreadsheets, usually to improve performance). Some earlier spreadsheets required a manual request to recalculate since the recalculation of large or complex spreadsheets often reduced data entry speed. Many modern spreadsheets still retain this option.

Recalculation generally requires that there are no circular dependencies in a spreadsheet. A dependency graph is a graph that has a vertex for each object to be updated, and an edge connecting two objects whenever one of them needs to be updated earlier than the other. Dependency graphs without circular dependencies form directed acyclic graphs, representations of partial orderings (in this case, across a spreadsheet) that can be relied upon to give a definite result.

This feature refers to updating a cell's contents periodically with a value from an external source—such as a cell in a "remote" spreadsheet. For shared, Web-based spreadsheets, it applies to "immediately" updating cells another user has updated. All dependent cells must be updated also.

Use of named column variables x & y in Microsoft Excel. Formula for y=x2 resembles Fortran, and Name Manager shows the definitions of x & y.

Once entered, selected cells (or the entire spreadsheet) can optionally be "locked" to prevent accidental overwriting. Typically this would apply to cells containing formulas but might apply to cells containing "constants" such as a kilogram/pounds conversion factor (2.20462262 to eight decimal places). Even though individual cells are marked as locked, the spreadsheet data are not protected until the feature is activated in the file preferences.

A cell or range can optionally be defined to specify how the value is displayed. The default display format is usually set by its initial content if not specifically previously set, so that for example "31/12/2007" or "31 Dec 2007" would default to the cell format of date. Similarly adding a % sign after a numeric value would tag the cell as a percentage cell format. The cell contents are not changed by this format, only the displayed value.

Some cell formats such as "numeric" or "currency" can also specify the number of decimal places.

This can allow invalid operations (such as doing multiplication on a cell containing a date), resulting in illogical results without an appropriate warning.

Depending on the capability of the spreadsheet application, each cell (like its counterpart the "style" in a word processor) can be separately formatted using the attributes of either the content (point size, color, bold or italic) or the cell (border thickness, background shading, color). To aid the readability of a spreadsheet, cell formatting may be conditionally applied to data; for example, a negative number may be displayed in red.

A cell's formatting does not typically affect its content and depending on how cells are referenced or copied to other worksheets or applications, the formatting may not be carried with the content.

In most implementations, a cell, or group of cells in a column or row, can be "named" enabling the user to refer to those cells by a name rather than by a grid reference. Names must be unique within the spreadsheet, but when using multiple sheets in a spreadsheet file, an identically named cell range on each sheet can be used if it is distinguished by adding the sheet name. One reason for this usage is for creating or running macros that repeat a command across many sheets. Another reason is that formulas with named variables are readily checked against the algebra they are intended to implement (they resemble Fortran expressions). The use of named variables and named functions also makes the spreadsheet structure more transparent.

Cell reference

In place of a named cell, an alternative approach is to use a cell (or grid) reference. Most cell references indicate another cell in the same sheet, but a cell reference can also refer to a cell in a different sheet within the same spreadsheet, or (depending on the implementation) to a cell in another spreadsheet entirely, or a value from a remote application.

A typical cell reference in "A1" style consists of one or two case-insensitive letters to identify the column (if there are up to 256 columns: A–Z and AA–IV) followed by a row number (e.g., in the range 1–65536). Either part can be relative (it changes when the formula it is in is moved or copied), or absolute (indicated with $ in front of the part concerned of the cell reference). The alternative "R1C1" reference style consists of the letter R, the row number, the letter C, and the column number; relative row or column numbers are indicated by enclosing the number in square brackets. Most current spreadsheets use the A1 style, some providing the R1C1 style as a compatibility option.

Animation of a simple spreadsheet that multiplies values in the left column by 2, then sums the calculated values from the right column to the bottom-most cell. In this example, only the values in the A column are entered (10, 20, 30), and the remainder of cells are formulas. Formulas in the B column multiply values from the A column using relative references, and the formula in B4 uses the SUM() function to find the sum of values in the B1:B3 range.

When the computer calculates a formula in one cell to update the displayed value of that cell, cell reference(s) in that cell, naming some other cell(s), causes the computer to fetch the value of the named cell(s).

A cell on the same "sheet" is usually addressed as:

=A1

A cell on a different sheet of the same spreadsheet is usually addressed as:

=SHEET2!A1 (that is; the first cell in sheet 2 of the same spreadsheet).

Some spreadsheet implementations in Excel allow cell references to another spreadsheet (not the currently open and active file) on the same computer or a local network. It may also refer to a cell in another open and active spreadsheet on the same computer or network that is defined as shareable. These references contain the complete filename, such as:

='C:\Documents and Settings\Username\My spreadsheets\[main sheet]Sheet1!A1

In a spreadsheet, references to cells automatically update when new rows or columns are inserted or deleted. Care must be taken, however, when adding a row immediately before a set of column totals to ensure that the totals reflect the values of the additional rows—which they often do not.

A circular reference occurs when the formula in one cell refers—directly, or indirectly through a chain of cell references—to another cell that refers back to the first cell. Many common errors cause circular references. However, some valid techniques use circular references. These techniques, after many spreadsheet recalculations, (usually) converge on the correct values for those cells.

Cell ranges

Likewise, instead of using a named range of cells, a range reference can be used. A reference to a range of cells typically takes the form (A1:A6), which specifies all the cells in column A from row 1 through row 6. A formula such as "=SUM(A1:A6)" would add all the cells specified and put the result in the cell containing the formula itself.

Use of user-defined function sq(x) in Microsoft Excel

In the earliest spreadsheets, cells were a simple two-dimensional grid. Over time, the model has expanded to include a third dimension, and in some cases a series of named grids, called sheets. The most advanced examples allow inversion and rotation operations which can slice and project the data set in various ways.

A formula identifies the calculation needed to place the result in the cell it is contained within. A cell containing a formula, therefore, has two display components; the formula itself and the resulting value. The formula is normally only shown when the cell is selected by "clicking" the mouse over a particular cell; otherwise, it contains the result of the calculation.

A formula assigns values to a cell or range of cells, and typically has the format:

where the expression consists of:

values, such as 2, 9.14 or 6.67E-11;
references to other cells, such as, e.g., A1 for a single cell or B1:B3 for a range;
arithmetic operators, such as +, -, *, /, and others;
relational operators, such as >=, 100, "More than 100%", SUM(A1:A6)

Further examples:

=IF(AND(A1<>"",B1<>""),A1/B1,"") means that if both cells A1 and B1 are not <> empty "", then divide A1 by B1 and display, other do not display anything.

=IF(AND(A1<>"",B1<>""),IF(B1<>0,A1/B1,"Division by zero"),"") means that if cells A1 and B1 are not empty, and B1 is not zero, then divide A1 by B1, if B1 is zero, then display "Division by zero", and do not display anything if either A1 and B1 are empty.

=IF(OR(A1<>"",B1<>""),"Either A1 or B1 show text","") means to display the text if either cells A1 or B1 are not empty.

Subroutine in Microsoft Excel writes values calculated using x into y.

The best way to build up conditional statements is step by step composing followed by trial and error testing and refining code.

A spreadsheet does not have to contain any formulas at all, in which case it could be considered merely a collection of data arranged in rows and columns (a database) like a calendar, timetable, or simple list. Because of its ease of use, formatting, and hyperlinking capabilities, many spreadsheets are used solely for this purpose.

Spreadsheets usually contain several supplied functions, such as arithmetic operations (for example, summations, averages, and so forth), trigonometric functions, statistical functions, and so forth. In addition there is often a provision for user-defined functions. In Microsoft Excel, these functions are defined using Visual Basic for Applications in the supplied Visual Basic editor, and such functions are automatically accessible on the worksheet. Also, programs can be written that pull information from the worksheet, perform some calculations, and report the results back to the worksheet. In the figure, the name sq is user-assigned, and the function sq is introduced using the Visual Basic editor supplied with Excel. Name Manager displays the spreadsheet definitions of named variables x & y.

Functions themselves cannot write into the worksheet but simply return their evaluation. However, in Microsoft Excel, subroutines can write values or text found within the subroutine directly to the spreadsheet. The figure shows the Visual Basic code for a subroutine that reads each member of the named column variable x, calculates its square, and writes this value into the corresponding element of named column variable y. The y column contains no formula because its values are calculated in the subroutine, not on the spreadsheet, and simply are written in.

Whenever a reference is made to a cell or group of cells that are not located within the current physical spreadsheet file, it is considered as accessing a "remote" spreadsheet. The contents of the referenced cell may be accessed either on the first reference with a manual update or more recently in the case of web-based spreadsheets, as a near real-time value with a specified automatic refresh interval.

Many spreadsheet applications permit charts and graphs (e.g., histograms, pie charts) to be generated from specified groups of cells that are dynamically re-built as cell contents change. The generated graphic component can either be embedded within the current sheet or added as a separate object. To create an Excel histogram, a formula based on the REPT function can be used.

In the late 1980s and early 1990s, first Javelin Software and Lotus Improv appeared. Unlike models in a conventional spreadsheet, they utilized models built on objects called variables, not on data in cells of a report. These multi-dimensional spreadsheets enabled viewing data and algorithms in various self-documenting ways, including simultaneous multiple synchronized views. For example, users of Javelin could move through the connections between variables on a diagram while seeing the logical roots and branches of each variable. This is an example of what is perhaps its primary contribution of the earlier Javelin—the concept of traceability of a user's logic or model structure through its twelve views. A complex model can be dissected and understood by others who had no role in its creation.

In these programs, a time series, or any variable, was an object in itself, not a collection of cells that happen to appear in a row or column. Variables could have many attributes, including complete awareness of their connections to all other variables, data references, and text and image notes. Calculations were performed on these objects, as opposed to a range of cells, so adding two-time series automatically aligns them in calendar time, or in a user-defined time frame. Data were independent of worksheets—variables, and therefore data, could not be destroyed by deleting a row, column, or entire worksheet. For instance, January's costs are subtracted from January's revenues, regardless of where or whether either appears in a worksheet. This permits actions later used in pivot tables, except that flexible manipulation of report tables, was but one of many capabilities supported by variables. Moreover, if costs were entered by week and revenues by month, the program could allocate or interpolate as appropriate. This object design enabled variables and whole models to reference each other with user-defined variable names and to perform multidimensional analysis and massive, but easily editable consolidations.

Spreadsheets that have a formula language based upon logical expressions, rather than arithmetic expressions are known as logical spreadsheets. Such spreadsheets can be used to reason deductively about their cell values.

My Spreadsheet
	A	B	C	D
01	Sales	100000	30000	70000
02	Purchases	25490	30	200

=expression

Programming issues

Spreadsheets have become a favorite tool for people who create their own programs but aren't professional programmers. Many find it simpler to do calculations with spreadsheets than by writing traditional programs. This is because spreadsheets use space to show how things are connected, which is easier for people to understand than typing lines of code. They are also forgiving, letting parts of a program work even if other parts aren't finished yet.

Spreadsheet programs are made to do many kinds of tasks using space instead of time to organize things. They can update automatically when information changes and have grown to include tools from programming languages to do even more.

Shortcomings

Spreadsheets are very useful tools, but they also have some problems. One issue is that the way cells are labeled with letters and numbers can feel unfriendly. Many people who use spreadsheets haven’t had formal training, and often no one checks their work for mistakes. Research suggests that about 1% of all formulas in spreadsheets may be wrong.

Spreadsheets can be tricky to use for complex tasks because each cell must be handled one at a time. It’s easy to forget what each cell means, especially when there are many. Using named variables instead of cell addresses can help make spreadsheets easier to understand and check for errors. When a formula needs to be changed, it can be difficult if it appears in many cells. Changing the size of a table by adding or removing rows and columns can also be very time-consuming and sometimes requires starting over. Working together on spreadsheets can be challenging when everyone is editing individual cells.

There are other problems too. Some people think other types of software might be better for certain tasks like budgeting or statistics. Older versions of a popular spreadsheet program could only handle a limited number of rows and columns, which caused problems with large amounts of data. Without proper checks, it’s easy for mistakes or even intentional errors to slip into spreadsheets. Because spreadsheets don’t always track who made changes or when, it can be hard to follow rules or catch errors. Despite these issues, tools exist to help improve spreadsheets, but many users aren’t aware of them or don’t use them.

Spreadsheet risk

Further information: Financial modeling § Accounting, and List of spreadsheet mistakes

Spreadsheet risk is the chance of getting the wrong answer from a spreadsheet that people use to make important decisions. This can happen when someone puts in the wrong numbers, makes a mistake in the calculations, or forgets to update important information, like exchange rates. These errors can sometimes cost very large amounts of money.

Even though these mistakes can be serious, studies show that many big companies don’t have strong rules to check their spreadsheets. This means mistakes can more easily happen. In one famous case, a student found big mistakes in a spreadsheet that was used to support important economic decisions.