Nucleic acid notation
Adapted from Wikipedia
Nucleic acid notation is a system used by scientists to represent the building blocks of DNA, which carries the instructions for all living things. This notation was formalized in 1970 by the International Union of Pure and Applied Chemistry (IUPAC). It uses four simple letters, G, C, A, and T, to stand for the four nucleotides found in DNA.
DNA, short for deoxyribonucleic acid, is like a tiny instruction manual inside every cell of a living organism. Understanding DNA helps scientists learn about life, health, and how animals and plants grow and change.
As scientists have learned more about DNA, they have created new ways to write and study these instructions. Some of these new notations use the size and shape of symbols to make it easier to analyze and work with genetic information. These tools help researchers solve mysteries about living things and even create new medicines.
Single nucleobase and nucleoside
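Under the IUPAC system, nucleobases are represented by the first letters of their chemical names: guanine (G), cytosine (C), adenine (A), and thymine (T). The shorthand also includes eleven "ambiguity" characters, one for each possible combination of two or more of the four bases. For example, R means A or G, and N means any base. These characters let scientists report sequencing errors or positions where DNA differs from one sample to another.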
The system also provides symbols for rare nucleobases and ways to show changes to the basic DNA building blocks. These symbols help scientists describe complex DNA structures and modifications clearly.
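The letter codes described above can be thought of as a lookup table. The short Python sketch below assumes the standard IUPAC table of four base letters plus eleven extra letters for variations. The names IUPAC_CODES and matches are made up for this example and are not part of the IUPAC rules.

```python
# Each IUPAC letter maps to the bases it may stand for.
# The first four are the plain bases; the other eleven are ambiguity codes.
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def matches(pattern, sequence):
    """Return True if each base in `sequence` is allowed by the code
    at the same position in `pattern` (hypothetical helper)."""
    if len(pattern) != len(sequence):
        return False
    return all(
        base in IUPAC_CODES[code]
        for code, base in zip(pattern.upper(), sequence.upper())
    )
```

For instance, the pattern GAYT would match both GACT and GATT, because Y stands for C or T, but it would not match GAAT.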
Nucleic acid chain
The carbons in the ribose sugar that forms the backbone of the nucleic acid chain are numbered from 1' to 5'. These numbers give the chain a direction, and sequences are usually written from the 5' end to the 3' end. This matches the order in which DNA and RNA are built and the order in which the ribosome reads messages.
We can add extra groups to these chains by using special prefixes or suffixes. For example, (CNEt)-A-C-(Ph) shows a chain with a cyanoethyl group at one end and a phenyl group at the other. Sometimes, groups can connect two positions, like in A-C>p, which has a cyclic phosphate cap linking the 2' and 3' positions.
Legibility
The system for writing DNA uses letters like G, C, A, and T, which are easy to find on a keyboard and widely used. However, because uppercase letters all have the same height, they can be hard to tell apart at a glance, especially C and G, which look very similar.
When scientists store DNA sequences in files, such as FASTA files, they sometimes use lowercase letters to mark repetitive parts of the DNA whose exact length is not always known.
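As a small illustration of the lowercase convention, the Python sketch below finds the lowercase stretches in a sequence string. The function name lowercase_regions is invented for this example, and the sequence in the usage note is made up, not real data.

```python
import re


def lowercase_regions(sequence):
    """Return (start, end) positions for each run of lowercase letters.

    Positions count from 0, and `end` is one past the last lowercase letter.
    """
    return [(m.start(), m.end()) for m in re.finditer(r"[a-z]+", sequence)]
```

Calling lowercase_regions("ACGTacgtacGGCCtt") returns [(4, 10), (14, 16)], which points to the two lowercase stretches "acgtac" and "tt".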
Alternative visually enhanced notations
Scientists have created different ways to show DNA sequences to make them easier to read. These methods use special symbols or shapes instead of the usual letters G, C, A, and T. One method, called the Stave Projection, uses circles on lines like musical notes to represent the DNA bases.
Another method uses different shapes—like rectangles, squares, small circles, and diamonds—to stand for the DNA bases. There are also fonts, such as the DNA Skyline, that use tall blocks to show the bases. These creative ways help scientists compare and study DNA more easily.
Base pairing
Base pairing between two chains of nucleic acids should be shown using a "•" symbol. For example, you might see A•T, which means adenine pairs with thymine. The IUPAC rules from 1970 say we should not use "-" because that symbol represents a covalent bond, nor should we use ":" or "/" as these can be mistaken for ratios. Leaving out any symbol can also cause confusion, as it might look like a polymer sequence.
In some special cases, like Hoogsteen base pairing, scientists need to show different kinds of hydrogen bonds. Since IUPAC has not given specific rules for this, some researchers use symbols like "*" or ":" to make the differences clear.
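The "•" rule for ordinary base pairs can be sketched in a few lines of Python. This example assumes only the two standard DNA pairs, A with T and G with C. The names WATSON_CRICK and pair_notation are hypothetical, and special cases such as Hoogsteen pairs are left out.

```python
# Standard DNA pairs: adenine with thymine, guanine with cytosine.
WATSON_CRICK = {"A": "T", "T": "A", "G": "C", "C": "G"}


def pair_notation(base1, base2):
    """Write a standard base pair with the "•" symbol.

    Returns None when the two bases do not form a standard pair.
    """
    if WATSON_CRICK.get(base1) == base2:
        return f"{base1}•{base2}"
    return None
```

Here pair_notation("A", "T") gives "A•T", while pair_notation("A", "G") gives None because adenine and guanine do not form a standard pair.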
This article is a child-friendly adaptation of the Wikipedia article on Nucleic acid notation, available under CC BY-SA 4.0.