# Nucleic acid notation

Nucleic acid notation is a system scientists use to show the building blocks of DNA. DNA carries the instructions for all living things. This notation was created in 1970 by the International Union of Pure and Applied Chemistry (IUPAC). It uses simple letters—G, C, A, and T—to stand for the four main parts in DNA.

DNA, short for deoxyribonucleic acid, is like a tiny instruction manual inside every cell of a living organism. Learning about DNA helps scientists understand life, health, and how animals and plants grow and change.

As scientists have learned more about DNA, they have created new ways to write and study these instructions. Some of these new notations use the size and shape of symbols to make it easier to analyze genetic information. These tools help researchers learn about living things and create new medicines.


## Single nucleobase and nucleoside

Under the IUPAC system, nucleobases are shown by the first letters of their names: guanine, cytosine, adenine, and thymine. This short way of writing helps scientists talk about DNA.

The system also has symbols for rare nucleobases and ways to show changes in DNA building blocks. These symbols help scientists describe DNA clearly. 

<table><caption><a href="/wiki/International_Union_of_Pure_and_Applied_Chemistry">IUPAC</a> degenerate base symbols</caption><tbody>
<tr><th rowspan="2">Description</th><th rowspan="2">Symbol</th><th colspan="5">Bases represented</th><th rowspan="2"><a href="/wiki/Complementarity_(molecular_biology)">Complementary</a><br>bases</th></tr>
<tr><th>No.</th><th>A</th><th>C</th><th>G</th><th>T</th></tr>
<tr><td><a href="/wiki/Adenine">Adenine</a></td><td><b>A</b></td><td rowspan="5">1</td><td>A</td><td></td><td></td><td></td><td>T</td></tr>
<tr><td><a href="/wiki/Cytosine">Cytosine</a></td><td><b>C</b></td><td></td><td>C</td><td></td><td></td><td>G</td></tr>
<tr><td><a href="/wiki/Guanine">Guanine</a></td><td><b>G</b></td><td></td><td></td><td>G</td><td></td><td>C</td></tr>
<tr><td><a href="/wiki/Thymine">Thymine</a></td><td><b>T</b></td><td></td><td></td><td></td><td>T</td><td>A</td></tr>
<tr><td><a href="/wiki/Uracil">Uracil</a></td><td><b>U</b></td><td></td><td></td><td></td><td>U</td><td>A</td></tr>
<tr><td>Weak</td><td><b>W</b></td><td rowspan="6">2</td><td>A</td><td></td><td></td><td>T</td><td>W</td></tr>
<tr><td>Strong</td><td><b>S</b></td><td></td><td>C</td><td>G</td><td></td><td>S</td></tr>
<tr><td><a href="/wiki/Amine">Amino</a></td><td><b>M</b></td><td>A</td><td>C</td><td></td><td></td><td>K</td></tr>
<tr><td><a href="/wiki/Ketone">Ketone</a></td><td><b>K</b></td><td></td><td></td><td>G</td><td>T</td><td>M</td></tr>
<tr><td><a href="/wiki/Purine">Purine</a></td><td><b>R</b></td><td>A</td><td></td><td>G</td><td></td><td>Y</td></tr>
<tr><td><a href="/wiki/Pyrimidine">Pyrimidine</a></td><td><b>Y</b></td><td></td><td>C</td><td></td><td>T</td><td>R</td></tr>
<tr><td>Not A</td><td><b>B</b></td><td rowspan="4">3</td><td></td><td>C</td><td>G</td><td>T</td><td>V</td></tr>
<tr><td>Not C</td><td><b>D</b></td><td>A</td><td></td><td>G</td><td>T</td><td>H</td></tr>
<tr><td>Not G</td><td><b>H</b></td><td>A</td><td>C</td><td></td><td>T</td><td>D</td></tr>
<tr><td>Not T<sup>a</sup></td><td><b>V</b></td><td>A</td><td>C</td><td>G</td><td></td><td>B</td></tr>
<tr><td>Any one base</td><td><b>N</b></td><td>4</td><td>A</td><td>C</td><td>G</td><td>T</td><td>N</td></tr>
<tr><td>Gap</td><td><b>-</b></td><td>0</td><td></td><td></td><td></td><td></td><td>-</td></tr>
<tr><td colspan="8"><sup>a</sup> Not U for RNA</td></tr>
</tbody></table>
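The table can also be written as a small lookup in code. The sketch below is only an illustration, not part of the IUPAC rules. It covers DNA letters only (no U and no gap symbol), and the names `IUPAC_BASES`, `complement_symbol`, and `matches` are made up for this example.

```python
# Bases represented by each symbol, copied from the table (DNA only).
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "AT", "S": "CG", "M": "AC", "K": "GT", "R": "AG", "Y": "CT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

# Complements of the four plain bases.
BASE_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def complement_symbol(symbol):
    """Return the symbol that stands for the complements of every base `symbol` covers."""
    wanted = frozenset(BASE_COMPLEMENT[base] for base in IUPAC_BASES[symbol])
    for candidate, bases in IUPAC_BASES.items():
        if frozenset(bases) == wanted:
            return candidate
    raise ValueError(f"no symbol covers {sorted(wanted)}")


def matches(pattern, sequence):
    """Check whether a plain sequence fits a pattern written with degenerate symbols."""
    if len(pattern) != len(sequence):
        return False
    return all(base in IUPAC_BASES[sym] for sym, base in zip(pattern, sequence))


print(complement_symbol("M"))   # M covers A and C, whose complements are T and G
print(matches("RYN", "ACG"))    # R covers A, Y covers C, N covers any base
```

Working out the complementary symbols from the four plain bases gives the same answers as the last column of the table, which is a handy way to check it.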

## Nucleic acid chain

The [carbons](/wiki/Carbon) in the [ribose](/wiki/Ribose) sugar are numbered 1' through 5', and the 3' and 5' carbons link neighboring units into the backbone of nucleic acids. This numbering gives a chain its direction, and sequences are usually written from the 5' end to the 3' end. This is also the direction in which DNA and RNA are built and in which messages are read by the [ribosome](/wiki/Ribosome).

We can add extra groups to these chains with special prefixes or suffixes. For example, (CNEt)-A-C-(Ph) shows a chain with a cyanoethyl group at one end and a phenyl group at the other. Sometimes, groups connect two positions, like in A-C>p, which has a cyclic phosphate cap linking the 2' and 3' positions.

## Legibility

The system for writing DNA uses letters like G, C, A, and T. These letters are easy to type and commonly used. But sometimes, the letters C and G can look very similar, making them hard to tell apart. 

Scientists may use lowercase letters when writing DNA sequences in files. This helps show parts of the DNA that repeat many times, especially when the exact length isn’t known.
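A short sketch shows how a program might find the lowercase parts of a sequence. The function name `lowercase_runs` and the sample sequence are invented for this example, and real sequence files may follow other conventions.

```python
import re


def lowercase_runs(sequence):
    """Return (start, end, text) for each run of lowercase letters; `end` is exclusive."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(r"[a-z]+", sequence)]


# Positions are counted from 0, as Python does.
print(lowercase_runs("ACGTacacacGGTAtt"))
# → [(4, 10, 'acacac'), (14, 16, 'tt')]
```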

## Alternative visually enhanced notations

Scientists have created different ways to show DNA sequences to make them easier to read. These methods use special symbols or shapes instead of the usual letters G, C, A, and T. 

One method, called the Stave Projection, uses circles on lines like musical notes to represent the DNA bases. 

Another method uses different shapes (rectangles, squares, small circles, and diamonds) to stand for the DNA bases. There are also fonts, such as DNA Skyline, that use tall blocks to show the bases. These creative ways help scientists compare and study DNA more easily.

<figure class="inline-figure"><img src="https://upload.wikimedia.org/wikipedia/commons/e/ed/Stave_Projection.jpg" data-caption="The Stave Projection uses spatially distributed dots to enhance the legibility of DNA sequences."><figcaption>The Stave Projection uses spatially distributed dots to enhance the legibility of DNA sequences.</figcaption></figure>

Main article: [multiple sequence alignment](/wiki/Multiple_sequence_alignment)


## Base pairing

Base pairing between two chains of nucleic acids should be shown using a "•" symbol. For example, you might see A•T, which means adenine pairs with thymine. The IUPAC rules from 1970 say we should not use "-" because that symbol represents a [covalent bond](/wiki/Covalent_bond). We should also not use ":" or "/" as these can be mistaken for ratios. Leaving out any symbol can cause confusion, as it might look like a polymer sequence.
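These rules can be turned into a small checker. The sketch below is only an illustration of the 1970 guidance for ordinary base pairs. The names `PAIR_SYMBOL`, `DISCOURAGED`, and `check_pair` are made up for this example.

```python
PAIR_SYMBOL = "•"

# Separators the 1970 rules advise against, with the reason given for each.
DISCOURAGED = {
    "-": "'-' stands for a covalent bond",
    ":": "':' can be mistaken for a ratio",
    "/": "'/' can be mistaken for a ratio",
}


def check_pair(text):
    """Return None if `text` looks like a well-written base pair, else a short reason."""
    if PAIR_SYMBOL in text:
        left, _, right = text.partition(PAIR_SYMBOL)
        return None if left and right else "a base is missing on one side"
    for separator, reason in DISCOURAGED.items():
        if separator in text:
            return reason
    return "no symbol: could be read as a polymer sequence"


print(check_pair("A•T"))  # None, because the pair is written as recommended
print(check_pair("A-T"))  # explains why '-' is discouraged
```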

In some special cases, like Hoogsteen base pairing, scientists need to show different kinds of hydrogen bonds. Since IUPAC has not given specific rules for this, some researchers use symbols like "\*" or ":" to make the differences clear.