Safekipedia

String (computer science)

Adapted from Wikipedia

Illustration showing how a string is made up of individual characters in computer science.

In computer programming, a string is a sequence of characters, such as letters, numbers, or symbols, used to represent words, sentences, or other pieces of text. These sequences can be fixed, meaning they cannot change after they are created, or they can be changed and grown as needed. Strings are often stored in memory using an array data structure that holds each character in order.

Strings are typically made up of characters, and are often used to store human-readable data, such as words or sentences.

Depending on the programming language, a string can either take up a fixed amount of space in memory or adjust its size dynamically. When a string is written directly in the code, it is called a string literal or an anonymous string.

In more advanced areas like formal languages, which are studied in mathematical logic and theoretical computer science, a string is a finite sequence of symbols chosen from a set called an alphabet. This helps computer scientists understand how languages and computations work at a deeper level.

Purpose

Strings are mainly used to store text that people can read, such as words and sentences. They help programs show messages to users or receive input from them. For example, a program might display a message like "file upload complete" to end users. Users can also type in text, like "I got a new job today" on a social media site, which the program stores in a database.

Strings can also hold data that isn’t meant for reading, like letters representing DNA sequences such as "AGATGCCGT" or special codes like "?action=edit" in a query string. The word "string" can sometimes refer to a sequence of other kinds of data, such as a string of bits, but used on its own it usually means a sequence of characters.

History

The word "string" has been used for a long time to describe things arranged in a line. In the 1800s, people who arranged letters for printing used "string" to talk about a row of printed letters.

Later, the idea of a "string" was used in math and language studies to mean a group of symbols in a certain order, without worrying about what the symbols mean. This helped people study how symbols behave in rules-based systems. One of the first computer languages to work well with strings was COMIT in the 1950s, followed by SNOBOL in the early 1960s.

String datatypes

See also: Comparison of programming languages (string functions)

A string datatype is a type of data used in computer programming to store sequences of characters. Strings are very important and are used in almost every programming language. Some languages treat strings as primitive types, while others treat them as composite types. The way a programming language handles strings can change how they are written and used.

Strings can be fixed-length, meaning they have a set maximum size decided when the program is written. Or they can be variable-length, which means their size can change while the program runs, depending on what needs to be stored. Most modern programming languages use variable-length strings, but the length of a string is still limited by how much memory is available.

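Example of a null-terminated string: "FRANK" stored in a 10-byte buffer, with each byte value shown in hexadecimal. The bytes after NUL are leftover data and are not part of the string.

F    R    A    N    K    NUL  k    e    f    w
46   52   41   4E   4B   00   6B   65   66   77

Example of a length-prefixed string: "FRANK" stored in a 10-byte buffer, where the first byte holds the length (5).

length  F    R    A    N    K    k    e    f    w
05      46   52   41   4E   4B   6B   65   66   77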
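
The two layouts shown above for "FRANK", one ending in a NUL byte and one starting with a length byte, can be tried out in a few lines of Python. This is only a rough sketch: the function names are made up for this example, the text is assumed to be plain ASCII, and the length is assumed to fit in one byte.

```python
def encode_null_terminated(text: str) -> bytes:
    """Store the characters followed by a single NUL (0x00) byte."""
    data = text.encode("ascii")
    if b"\x00" in data:
        raise ValueError("a null-terminated string cannot contain NUL")
    return data + b"\x00"


def decode_null_terminated(buffer: bytes) -> str:
    """Read characters up to (not including) the first NUL byte."""
    end = buffer.index(b"\x00")  # raises ValueError if there is no NUL
    return buffer[:end].decode("ascii")


def encode_length_prefixed(text: str) -> bytes:
    """Store a one-byte length, then the characters (255 at most)."""
    data = text.encode("ascii")
    if len(data) > 255:
        raise ValueError("too long for a one-byte length prefix")
    return bytes([len(data)]) + data


def decode_length_prefixed(buffer: bytes) -> str:
    """Read the length from the first byte, then that many characters."""
    length = buffer[0]
    return buffer[1:1 + length].decode("ascii")


# Both buffers hold "FRANK" followed by the leftover bytes "kefw".
null_buf = encode_null_terminated("FRANK") + b"kefw"
len_buf = encode_length_prefixed("FRANK") + b"kefw"

print(null_buf.hex(" ").upper())  # 46 52 41 4E 4B 00 6B 65 66 77
print(len_buf.hex(" ").upper())   # 05 46 52 41 4E 4B 6B 65 66 77
print(decode_null_terminated(null_buf))  # FRANK
print(decode_length_prefixed(len_buf))   # FRANK
```

In both cases the leftover bytes are ignored when the string is read back, either because reading stops at NUL or because the length says to stop after five characters.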

Literal strings

Main article: String literal

Sometimes, we need to put words and sentences inside files that both people and computers can read. This is important for writing code or setting up programs. Marking the end of a string with a special invisible character does not work well in these files, because we can't see that character or type it easily on a keyboard.

There are two common ways to do this. One way is to put the words between quotation marks, like "hello" or 'hello'. This works for most programming languages. If we need to use special characters like the quotation mark itself or characters we can't see, we can use escape sequences, which usually start with a backslash. Another way is to end the word with a newline, like in some Windows files called INI files.
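
As a small illustration of quotation marks and escape sequences, here is how they look in Python. Other languages follow similar rules, but the exact escapes differ, so treat this as one example and not a general rule. The variable names are only for illustration.

```python
# Either kind of quotation mark can surround a string literal.
double_quoted = "hello"
single_quoted = 'hello'
print(double_quoted == single_quoted)  # True

# A backslash starts an escape sequence.
with_quote = "She said \"hi\""        # \" puts a quotation mark in the string
with_newline = "line one\nline two"   # \n is an invisible newline character
with_backslash = "C:\\temp"           # \\ is a single backslash

print(with_quote)          # She said "hi"
print(len(with_newline))   # 17
print(with_backslash)      # C:\temp
```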

Non-text strings

A string in computer science can mean any sequence of similar data, not just letters. For example, a bit string or byte string can hold binary data, like information from a computer or phone. Whether this data is stored as a string depends on what the programmer needs and what the programming language can do.
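
Python's built-in bytes type is one example of a byte string. The short sketch below uses arbitrary example values and shows that a byte string holds numbers from 0 to 255, and that text turns into bytes once an encoding such as UTF-8 is chosen.

```python
# A byte string is a sequence of integers from 0 to 255, not characters.
header = bytes([0x89, 0x50, 0x4E, 0x47])  # arbitrary example values

print(len(header))    # 4
print(header[0])      # 137
print(list(header))   # [137, 80, 78, 71]

# Text becomes a byte string once an encoding is chosen.
word = "h\u00e9llo"              # "héllo", five characters
encoded = word.encode("utf-8")   # the letter é needs two bytes in UTF-8
print(len(word), len(encoded))   # 5 6
```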

In the programming language C, there is a big difference between a "string" (which always ends with a special null character) and an "array of characters" (which may not). Using C's string functions on an array that has no null character at the end can cause safety problems, because the functions keep reading past the end of the array.

String processing algorithms

There are many ways to work with strings in computer programs, and each way has its own good and bad points. We can compare these methods by how fast they work and how much space they need. The term stringology was coined in 1984 by computer scientist Zvi Galil to describe the study of these methods.

Some types of methods include finding parts of strings, changing strings, sorting them, using special patterns called regular expressions, breaking strings apart, and finding patterns in sequences. More advanced methods use special tools like suffix trees and finite-state machines.
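
Finding one string inside another is the simplest of these methods to try out. The sketch below is the plain "check every position" approach, not one of the faster methods used in practice, and the function name find_substring is made up for this example.

```python
def find_substring(text: str, pattern: str) -> int:
    """Return the index of the first place pattern appears in text, or -1."""
    n, m = len(text), len(pattern)
    # Try every position where the pattern could still fit.
    for start in range(n - m + 1):
        if text[start:start + m] == pattern:
            return start
    return -1


print(find_substring("AGATGCCGT", "GCC"))  # 4
print(find_substring("hello", "xyz"))      # -1
```

In the worst case this compares about len(text) times len(pattern) characters, which is why faster methods and tools like suffix trees are studied.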

Character string-oriented languages and utilities

Character strings are very helpful in computer programs, so some languages are made to work with them easily. Examples include AWK, Icon, MUMPS, Perl, Rexx, Ruby, sed, SNOBOL, Tcl, and TTM.

Many tools on Unix systems can change and work with strings simply. Files and streams can also be treated like strings. Some APIs such as Multimedia Control Interface, embedded SQL, or printf use strings to store commands.

Scripting languages like Perl, Python, Ruby, and Tcl use special text patterns called regular expressions to help with text tasks. Perl is well-known for this. Some languages, like Perl and Ruby, also let you add values directly into strings while writing code, which is called string interpolation.
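
Here is a short Python sketch of both ideas. It uses Python's standard re module for regular expressions and Python's f-strings, which are Python's own form of string interpolation. The sample sentence is invented for this example.

```python
import re

text = "Order 66 shipped on 2024-05-17, order 7 is pending."

# The pattern \d+ matches one or more digits in a row.
numbers = re.findall(r"\d+", text)
print(numbers)  # ['66', '2024', '05', '17', '7']

# Parentheses capture parts of the match, here year, month and day.
match = re.search(r"(\d{4})-(\d{2})-(\d{2})", text)
year, month, day = match.groups()

# An f-string puts the values straight into the text.
summary = f"Shipped in month {month} of {year}"
print(summary)  # Shipped in month 05 of 2024
```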

Character string functions

See also: Comparison of programming languages (string functions)

String functions help us make strings or change what they say, and they can also tell us about a string. Different programming languages have different names for these functions.

A simple example is the string length function. This tells us how many characters are in a string, counting spaces too, without changing the string. It might be called length, len, or size. For instance, length("hello world") would give us the number 11. Another common function is concatenation, which joins two strings together, often using the + sign.
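
In Python, for example, the length function is called len and concatenation uses the + sign. The variable names below are only for illustration.

```python
greeting = "hello world"

# len counts every character, including the space.
print(len(greeting))  # 11

# The + sign joins strings end to end.
joined = "hello" + " " + "world"
print(joined)              # hello world
print(joined == greeting)  # True

# Asking for the length did not change the string.
print(greeting)  # hello world
```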

Some microprocessor instruction set architectures have special commands for working with strings, like copying blocks of text (for example, REPNZ MOVSB on Intel x86).

Formal theory

In formal theory, a string is a finite sequence of symbols, where every symbol is chosen from a set called an alphabet.

For example, if we have a set of symbols like {0, 1}, the combination "01011" is a string made from those symbols. The length of a string is simply how many symbols it contains. The empty string has no symbols at all.

Strings can be combined, cut into pieces, reversed, or rotated, and they can be ordered in a specific way called lexicographical order, similar to dictionary order.
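
These ideas can be tried out with ordinary Python strings. This is only a sketch: the helper names is_string_over and rotate_left are made up for this example, and Python compares strings by character code, which matches dictionary order for the symbols 0 and 1.

```python
alphabet = {"0", "1"}
s = "01011"


def is_string_over(text: str, symbols: set) -> bool:
    """Check that every symbol of text comes from the alphabet."""
    return all(ch in symbols for ch in text)


def rotate_left(text: str, k: int) -> str:
    """Move the first k symbols to the end of the string."""
    if not text:
        return text
    k %= len(text)
    return text[k:] + text[:k]


print(is_string_over(s, alphabet))  # True
print(len(s))                       # 5
print(len(""))                      # 0, the empty string has no symbols

print(s + "10")           # combined (concatenation): 0101110
print(s[1:4])             # a piece (substring): 101
print(s[::-1])            # reversed: 11010
print(rotate_left(s, 2))  # rotated: 01101

# Lexicographical order, like words in a dictionary.
print(sorted(["10", "01", "011", "0"]))  # ['0', '01', '011', '10']
```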

This article is a child-friendly adaptation of the Wikipedia article on String (computer science), available under CC BY-SA 4.0.