Structural Biology 101

An interactive guide to the molecular world.

Why molecular structure matters

Every biological process you can think of — a muscle contracting, a virus entering a cell, DNA being copied — is ultimately driven by the three-dimensional shapes of molecules. An enzyme fits its substrate like a key in a lock. An antibody recognises a pathogen by the contours of its surface. A drug works (or doesn't) because of how snugly it nestles into a protein's binding pocket.

Structural biology is the field that figures out these shapes. Its core principle is simple: structure determines function. If you know the 3D arrangement of atoms in a molecule, you can begin to understand — and even predict — what that molecule does.

On the right you can see a small protein called Crambin, isolated from the seeds of an Abyssinian cabbage. It's one of the smallest proteins with a known crystal structure — just 46 amino acids — which makes it a perfect starting point. Try rotating it with your mouse. You can also look at it as a molecular surface to see its overall shape, or as space-filling spheres to get a sense of its real volume. Switch back to ribbon view whenever you like.

In the pages that follow, you'll learn what molecules like this are made of, how they fold into shape, and where their structures come from.

Atoms and bonds

At the most fundamental level, every molecule is a collection of atoms connected by chemical bonds. The atoms you'll encounter most often in biology are carbon (C), nitrogen (N), oxygen (O), hydrogen (H), and sulfur (S). Each element plays a different role: carbon forms the backbone skeleton, nitrogen and oxygen create the polar and reactive sites, and sulfur can lock parts of the chain together with disulfide bridges.

The viewer now shows the same Crambin protein as a ball-and-stick model: small spheres for atoms connected by cylindrical bonds. Atoms are coloured by element — a convention chemists call CPK colouring:

Carbon — gray or green
Nitrogen — blue
Oxygen — red
Sulfur — yellow
Hydrogen — white (often hidden for clarity)
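
As a rough sketch, the colour convention above can be written as a simple lookup table. The names CPK_COLOURS and colour_for, and the "pink" fallback for unlisted elements, are assumptions made for illustration; each molecular viewer defines its own palette.

```python
# Element symbol -> display colour, following the list above.
# The "pink" fallback for unlisted elements is an illustrative choice.
CPK_COLOURS = {
    "C": "gray",
    "N": "blue",
    "O": "red",
    "S": "yellow",
    "H": "white",
}


def colour_for(element: str) -> str:
    """Return the display colour for an element symbol (case-insensitive)."""
    return CPK_COLOURS.get(element.strip().upper(), "pink")


print([colour_for(symbol) for symbol in "NCOS"])
```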

There are two other common ways to show atomic detail.

In a space-filling (or VdW) view, each atom is drawn at its real van der Waals radius — the spheres overlap and merge into a compact, bumpy surface. This is the most physically accurate picture of what the molecule "looks like" to another molecule approaching it.

In a pure stick view the atoms are implied — you only see the bonds as thin cylinders. This is the most common representation for examining bonding geometry and is much easier to read for large molecules.

The ball-and-stick view you started with is a compromise between the two — small spheres for atoms, sticks for bonds.

Wikipedia: Atom · Wikipedia: Chemical bond

Amino acids

Proteins are built from a set of 20 standard amino acids — you can think of them as the letters of the protein alphabet. String them together, and you get a word; fold that word in three dimensions, and you get a functional protein.

Every amino acid shares the same core structure — a backbone made of nitrogen, carbon-alpha (Cα), and a carbonyl group (C=O). What makes each of the 20 amino acids unique is the side chain (also called the R group) attached to the Cα. Side chains range from a single hydrogen atom (glycine, the simplest) to bulky aromatic rings (tryptophan, the largest).

Side chain properties

Side chains determine how an amino acid behaves:

Hydrophobic (e.g. leucine, valine) — avoid water, tend to pack into the protein's interior
Polar (e.g. serine, threonine) — form hydrogen bonds, often found on the surface
Charged (e.g. glutamate, lysine) — carry positive or negative charge at physiological pH
Aromatic (e.g. phenylalanine, tyrosine) — bulky ring structures with special optical and chemical properties
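
The grouping above can be sketched as a small lookup table. Only the eight example residues named in the list are included, so this is deliberately incomplete; the peptide and the function name classify are hypothetical.

```python
from collections import Counter

# Only the examples named in the list above; a full table would cover all 20.
SIDE_CHAIN_CLASS = {
    "LEU": "hydrophobic", "VAL": "hydrophobic",
    "SER": "polar", "THR": "polar",
    "GLU": "charged", "LYS": "charged",
    "PHE": "aromatic", "TYR": "aromatic",
}


def classify(residues):
    """Count how many residues fall into each side-chain class."""
    return Counter(
        SIDE_CHAIN_CLASS.get(name.upper(), "unclassified") for name in residues
    )


peptide = ["Glu", "Leu", "Ser", "Leu", "Val", "Gly"]  # hypothetical peptide
print(classify(peptide))
```

A tally like this gives a first hint of whether a stretch of sequence is more likely to sit in the protein's interior or on its surface.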

Wikipedia: Amino acid · Wikipedia: Peptide bond

Proteins: from chain to shape

In the viewer you see a dipeptide — two amino acids from lactate dehydrogenase (PDB: 4OJN), a metabolic enzyme found in nearly every living cell. The two residues are joined by a peptide bond — a covalent link between the carbonyl carbon of one amino acid and the nitrogen of the next. This is how amino acids chain together into a polypeptide. Now reveal the entire chain — over 300 residues. The view becomes a dense tangle of atoms where individual details are lost.

Clearly, staring at thousands of atoms is not the best way to understand a protein. Biochemists think about protein architecture in four levels of increasing scale — from the sequence of amino acids up to multi-chain assemblies. Let's walk through each one.

Primary structure

The primary structure is simply the linear sequence of amino acids — a string of letters like Glu-Leu-Ala-Leu-Val... This sequence, encoded in DNA, contains all the information the chain needs to fold into its final shape.
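
In practice, sequences are usually written with one letter per residue rather than three. A minimal sketch of the conversion, using the standard one-letter codes (the function name to_one_letter is just an illustrative choice):

```python
# Standard three-letter -> one-letter codes for the 20 amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def to_one_letter(sequence: str) -> str:
    """Convert 'Glu-Leu-Ala' style notation to a one-letter string."""
    return "".join(THREE_TO_ONE[name.upper()] for name in sequence.split("-"))


print(to_one_letter("Glu-Leu-Ala-Leu-Val"))  # ELALV
```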

Secondary structure

Switch to a cartoon view and the chaos of atoms resolves into recognisable shapes. Locally, the backbone folds into repeating patterns held together by hydrogen bonds:

α-helices — right-handed coils where each backbone NH hydrogen-bonds to the C=O four residues earlier. In the cartoon they appear as spiralling ribbons. Zoom in on Lys 317 to see one up close — it sits in the middle of a long helix that runs from Thr 309 to Lys 328.
β-sheets — extended strands lying side by side, connected by hydrogen bonds between them. They appear as flat arrows showing the chain direction. Zoom in on Leu 48 to see a strand within a six-stranded β-sheet at the core of the enzyme.
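
The helix rule above (each NH bonds to the C=O four residues earlier) is easy to enumerate. This is an idealised sketch: it assumes every residue in the stated range takes part, whereas real helices often fray at the ends and viewers detect hydrogen bonds from atomic geometry. The function name is hypothetical.

```python
def helix_hbonds(first: int, last: int):
    """List (NH donor, C=O acceptor) residue numbers for an ideal alpha-helix."""
    return [(i, i - 4) for i in range(first + 4, last + 1)]


# The helix described above runs from residue 309 to residue 328.
pairs = helix_hbonds(309, 328)
print(len(pairs))
print([pair for pair in pairs if 317 in pair])
```

Under these assumptions Lys 317 appears twice: once as the donor (to residue 313) and once as the acceptor (from residue 321).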

Tertiary structure

Zoom back out to see the tertiary structure — the complete three-dimensional fold of a single polypeptide chain. Hydrophobic side chains pack into the core while polar residues face the surrounding water. The overall shape of this fold determines what the enzyme can bind and how it catalyses its reaction.

Quaternary structure

Many proteins function not as lone chains but as multi-chain assemblies — this is quaternary structure. Lactate dehydrogenase is a tetramer of four identical subunits. In the secondary-structure colouring you can see that each subunit shares the same fold — the same helices and sheets repeated four times.

At this level of organisation we usually care about the subunits themselves rather than their internal elements, so it makes sense to colour by chain — now each subunit stands out as a distinct unit. The four chains pack together into a compact assembly; only the complete tetramer is catalytically active.

Wikipedia: Protein structure

Nucleotides

If amino acids are the alphabet of proteins, then nucleotides are the alphabet of nucleic acids. DNA uses four: adenine (A), guanine (G), cytosine (C) and thymine (T) — you can see all four in the viewer.

Zoom in on adenine to look at the three parts every nucleotide is built from:

Phosphate group — links nucleotides into a chain and gives DNA its negative charge
Sugar — a five-carbon deoxyribose ring that connects the phosphate to the base
Nitrogenous base — the "letter" that carries the genetic information

DNA vs RNA

The chemical difference between DNA and RNA is small but consequential. Compare deoxyadenosine (left) with adenosine (right): RNA's ribose has an extra hydroxyl group at the 2' position. This makes RNA more flexible and reactive, while DNA's missing hydroxyl makes its double helix more chemically stable — perfect for long-term information storage.

The other difference is one base: thymine (left) vs uracil (right). Thymine carries an extra methyl group — DNA uses thymine, RNA uses uracil.

Wikipedia: Nucleotide

DNA and RNA

Nucleotides link together through their phosphate and sugar groups to form long chains. What makes these chains extraordinary is base pairing: the bases on opposite strands recognise each other through hydrogen bonds. Adenine always pairs with thymine, and guanine always pairs with cytosine — this is Watson–Crick base pairing, the foundation of heredity.
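
Because the pairing rules are fixed, one strand fully determines the other. The sketch below applies the A–T and G–C rules; since the two strands of a double helix run in opposite directions, the partner is read in reverse. The function name and the example sequences are illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def partner_strand(strand: str) -> str:
    """Return the Watson-Crick partner, read in its own 5'-to-3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


print(partner_strand("ATGC"))  # GCAT
```

Applying the function twice returns the original strand, which is the property that lets each strand act as a template for copying the other.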

In the viewer you see a fragment of DNA — 12 base pairs of the classic double helix, one of the first DNA structures solved by X-ray crystallography (1981). Look at an A–T pair — two hydrogen bonds hold the bases together. A G–C pair is stronger — three hydrogen bonds. Switch to a cartoon view to see the iconic double helix shape: two sugar-phosphate backbones wind around each other while the base pairs stack inside. Go back to sticks to see the atoms again.
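
The two-versus-three hydrogen bond counts can be added up along a duplex. The sequence below, CGCGAATTCGCG, is assumed here to be that of the 1981 dodecamer; the text above does not state it, so treat it as an illustrative input.

```python
# Hydrogen bonds contributed by the base pair each base forms.
HBONDS_PER_PAIR = {"A": 2, "T": 2, "G": 3, "C": 3}


def watson_crick_hbonds(strand: str) -> int:
    """Total base-pair hydrogen bonds in a fully paired duplex."""
    return sum(HBONDS_PER_PAIR[base] for base in strand.upper())


dodecamer = "CGCGAATTCGCG"  # assumed sequence, see the note above
print(watson_crick_hbonds(dodecamer))  # 8 G-C pairs * 3 + 4 A-T pairs * 2 = 32
```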

RNA

RNA is usually single-stranded, but it can fold into complex three-dimensional shapes by base-pairing with itself. A classic example is transfer RNA (tRNA) — the adaptor molecule that carries amino acids to the ribosome during protein synthesis. Its single chain folds into a distinctive L-shape, held together by internal base pairs. Ribosomal RNA forms the catalytic core of the ribosome itself — the molecular machine that reads the genetic code and builds proteins.

Wikipedia: DNA · Wikipedia: RNA

How structures are determined

Every structure you've seen in this tutorial — amino acids, proteins, DNA — was determined experimentally. But you can't see individual atoms with a light microscope: they're far too small. So how did scientists figure out where every atom sits in that DNA double helix?

X-ray crystallography

The DNA fragment from the previous section was solved by X-ray crystallography in 1981. The process goes like this: you grow a crystal of your molecule, shoot a beam of X-rays through it, and record the diffraction pattern — the way the rays scatter off the atoms. From this pattern, a computer reconstructs an electron density map — a 3D cloud showing where electrons (and therefore atoms) are most likely to be. The viewer shows the DNA structure fitted inside its electron density. Each atom was placed by hand into the densest regions of the map.

Most known molecular structures were solved this way. X-ray crystallography remains the workhorse of structural biology, capable of resolving detail down to about 1 Å, and finer still for exceptionally well-ordered crystals.

The angstrom

Structural biologists measure distances in angstroms (Å). One angstrom is 10⁻¹⁰ m, or 0.1 nm, about the radius of a hydrogen atom. A typical C–C bond is 1.5 Å long, and the diameter of the DNA double helix is roughly 20 Å. When we say a structure was solved at "1.9 Å resolution", we mean that features at least 1.9 Å apart can be told apart. That is enough to trace the chain and place side chains with confidence, although two atoms bonded 1.5 Å apart still merge into a single blob of density.
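
The unit conversions above are simple enough to sketch in a few lines. The helper names are hypothetical, and is_distinguishable encodes only the rule of thumb stated in the text, not a rigorous definition of resolution.

```python
ANGSTROM_IN_METRES = 1e-10


def angstrom_to_nm(value: float) -> float:
    """Convert angstroms to nanometres (10 angstroms = 1 nm)."""
    return value / 10.0


def is_distinguishable(separation: float, resolution: float) -> bool:
    """Rule of thumb: features closer together than the resolution blur into one."""
    return separation >= resolution


print(angstrom_to_nm(20.0))            # DNA helix diameter in nm
print(angstrom_to_nm(1.5))             # C-C bond length in nm
print(is_distinguishable(20.0, 1.9))   # helix-sized features at 1.9 A resolution
```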

Cryo-electron microscopy (cryo-EM)

Cryo-EM is the method that triggered the "resolution revolution" of the 2010s. Instead of growing crystals, you flash-freeze molecules in a thin layer of ice, image thousands of individual copies with an electron microscope, and computationally average the images to reconstruct the 3D shape. Cryo-EM excels at large complexes (ribosomes, viral capsids, membrane proteins) that are difficult or impossible to crystallise. Modern detectors routinely reach 2–3 Å resolution.
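
Why does averaging thousands of images help? Each image is very noisy, but the noise differs from copy to copy while the molecule stays the same. The toy sketch below averages noisy copies of a one-dimensional "particle". It assumes the copies are already perfectly aligned, which real cryo-EM software has to work out for itself, and every name and number in it is illustrative.

```python
import random


def noisy_copy(signal, sigma, rng):
    """One simulated 'image': the true signal plus Gaussian noise."""
    return [value + rng.gauss(0.0, sigma) for value in signal]


def average(copies):
    """Average many aligned copies, position by position."""
    count = len(copies)
    return [sum(column) / count for column in zip(*copies)]


def mean_abs_error(estimate, truth):
    return sum(abs(e - t) for e, t in zip(estimate, truth)) / len(truth)


rng = random.Random(0)                               # fixed seed for repeatability
signal = [0.0, 1.0, 0.0, 2.0, 0.0, 1.0, 0.0, 0.0]    # toy 1D "particle"
copies = [noisy_copy(signal, 1.0, rng) for _ in range(1000)]

err_single = mean_abs_error(copies[0], signal)
err_avg = mean_abs_error(average(copies), signal)
print(f"single image error: {err_single:.3f}, averaged error: {err_avg:.3f}")
```

For independent noise the error of the average shrinks roughly with the square root of the number of copies, which is why large particle counts matter.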

NMR spectroscopy

Nuclear magnetic resonance works in solution, giving you a view of how the molecule behaves under conditions closer to its natural environment. NMR is limited to relatively small proteins (typically under 40 kDa) but uniquely reveals dynamics — which parts of the molecule are rigid and which are flexible.

Computational prediction

In 2020, AlphaFold from DeepMind showed that AI can predict protein structures from sequence alone with near-experimental accuracy. The AlphaFold database now contains predicted structures for over 200 million proteins — far more than all experimental methods combined have produced in half a century. These predictions don't replace experiments, but they give researchers a starting model that can be refined and validated.

Method	Size range	Resolution	Key strength
X-ray	Any	0.5–3 Å	High resolution, well-established
Cryo-EM	> 50 kDa	2–4 Å	Large complexes, no crystals needed
NMR	< 40 kDa	Ensemble	Solution state, dynamics
AlphaFold	Any protein	~1–2 Å (typical model error, not an experimental resolution)	Fast, sequence-only input

Wikipedia: X-ray crystallography · Wikipedia: Cryo-EM · AlphaFold DB

Structure databases and file formats

Once a structure is solved, it's deposited in a public database so that anyone in the world can download and study it. The main repositories are:

RCSB Protein Data Bank — the primary archive for experimentally determined structures. Over 220,000 entries and growing. Each structure gets a four-character accession code (like 1CRN for Crambin).
PDBe — the European partner site of the PDB, hosted at EMBL-EBI. Besides coordinates, PDBe provides electron density maps for crystallographic entries, so you can inspect the experimental evidence that each model was built into.
PDB-REDO — automatically re-refined and rebuilt versions of PDB entries, often with improved quality.
AlphaFold Protein Structure Database — predicted structures for ~200 million proteins from the UniProt database. Incredibly useful when no experimental structure exists.
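
Accession codes make it easy to fetch a structure programmatically. The sketch below only builds a download address and does not contact any server. The URL pattern is an assumption based on RCSB's public file service and should be checked against the site's own documentation; the helper names and the "digit plus three alphanumerics" check are likewise illustrative.

```python
import re

# Assumed URL pattern for RCSB's file service; verify before relying on it.
RCSB_DOWNLOAD = "https://files.rcsb.org/download/{code}.pdb"


def is_pdb_code(code: str) -> bool:
    """Four characters: a digit followed by three letters or digits."""
    return re.fullmatch(r"[0-9][A-Za-z0-9]{3}", code) is not None


def download_url(code: str) -> str:
    """Build the download address for a four-character accession code."""
    if not is_pdb_code(code):
        raise ValueError(f"not a four-character PDB code: {code!r}")
    return RCSB_DOWNLOAD.format(code=code.upper())


print(download_url("1crn"))
print(download_url("4OJN"))
```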

The PDB file format

The classic format for storing atomic coordinates is the PDB format — a fixed-column text file dating back to the 1970s. Each line is exactly 80 characters wide, with atom positions recorded as x, y, z coordinates in angstroms.

ATOM      1  N   THR A   1      17.047  14.099   3.625  1.00 13.79           N
ATOM      2  CA  THR A   1      16.967  12.784   4.338  1.00 10.80           C
ATOM      3  C   THR A   1      15.685  12.755   5.133  1.00  9.19           C

Each line contains the record type (ATOM), atom serial number, atom name (N, CA, C), residue name (THR), chain ID (A), residue number, the x, y, z coordinates, the occupancy (1.00 here), a temperature factor (B-factor) indicating how much the atom vibrates, and the element symbol at the end of the line.
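
Because the format is fixed-column, a record is read by slicing at known positions rather than by splitting on spaces. The sketch below does this for one ATOM record laid out on the standard columns (serial in columns 7–11, coordinates in columns 31–54, and so on). The Atom class and function name are illustrative, and a real parser would also handle alternate locations, insertion codes and charges.

```python
import math
from dataclasses import dataclass


@dataclass
class Atom:
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float
    element: str


def parse_atom_line(line: str) -> Atom:
    """Slice one ATOM/HETATM record by its fixed columns (0-based slices)."""
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        b_factor=float(line[60:66]),
        element=line[76:78].strip(),
    )


line = "ATOM      1  N   THR A   1      17.047  14.099   3.625  1.00 13.79           N"
atom = parse_atom_line(line)
print(atom)

# Distance from this nitrogen to the CA coordinates of the second record,
# which comes out at roughly 1.5 angstroms, a typical covalent bond length.
ca_position = (16.967, 12.784, 4.338)
print(round(math.dist((atom.x, atom.y, atom.z), ca_position), 3))
```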

HETATM records

Non-standard residues — ligands, water molecules (HOH), metal ions, and modified residues — use HETATM instead of ATOM. In the viewer you can see them as small spheres alongside the protein, or hide them to focus on the protein itself.
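
Hiding or showing these atoms comes down to filtering on the record type at the start of each line. A minimal sketch follows; the water record is made up for illustration, and the function names are hypothetical.

```python
def split_records(lines):
    """Separate polymer atoms (ATOM) from hetero atoms (HETATM)."""
    atoms = [line for line in lines if line.startswith("ATOM  ")]
    hetatms = [line for line in lines if line.startswith("HETATM")]
    return atoms, hetatms


def waters(hetatm_lines):
    """Keep only water molecules, identified by the residue name HOH."""
    return [line for line in hetatm_lines if line[17:20] == "HOH"]


records = [
    "ATOM      1  N   THR A   1      17.047  14.099   3.625  1.00 13.79           N",
    "ATOM      2  CA  THR A   1      16.967  12.784   4.338  1.00 10.80           C",
    # Made-up water record, for illustration only.
    "HETATM  500  O   HOH A 101      10.000  10.000  10.000  1.00 20.00           O",
]

atoms, hetatms = split_records(records)
print(len(atoms), len(hetatms), len(waters(hetatms)))
```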

Modern formats

Format	Type	Advantages
PDB	Fixed-column text	Universal support, human-readable
mmCIF	Dictionary-based text	No column limits, extensible — now the primary PDB archive format
bCIF	Binary compressed	10–100x smaller, fast streaming for large assemblies

Electron density maps

Atomic coordinates are only half the story. For structures solved by X-ray crystallography, the underlying experimental evidence is an electron density map, a 3D grid of values showing where electrons are concentrated; cryo-EM produces an analogous map of electrostatic potential. Depositing maps alongside coordinates lets other scientists verify that the atomic model actually fits the data.

Format	Source	Notes
CCP4 / MRC	X-ray & cryo-EM	Binary grid format, the most widely used for density maps
MTZ	X-ray	Stores structure factors (Fourier coefficients) — the map is computed on the fly
DSN6 / BRIX	X-ray	Legacy format from O and FRODO, still encountered in older datasets

PDBe provides pre-computed density maps for most crystallographic entries. For cryo-EM structures, maps are deposited in the EMDB (Electron Microscopy Data Bank).
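
To give a feel for what a binary grid format looks like, the sketch below decodes a few fields from a CCP4/MRC header. The byte offsets follow my understanding of the MRC2014/CCP4 layout (a 1024-byte header of 4-byte words, grid size and data mode in words 1–4, cell edges in words 11–13, the text "MAP " in word 53) and should be checked against the format specification. It also assumes little-endian data and uses a synthetic header rather than a real file.

```python
import struct

HEADER_SIZE = 1024  # bytes in the main header, before any extended header


def read_map_header(header: bytes) -> dict:
    """Decode a few fields from a CCP4/MRC map header (little-endian assumed)."""
    if len(header) < HEADER_SIZE:
        raise ValueError("need the full 1024-byte header")
    # Words 1-4: columns, rows, sections and the data mode (2 = 32-bit floats).
    nc, nr, ns, mode = struct.unpack_from("<4i", header, 0)
    # Words 11-13: unit cell edge lengths in angstroms.
    cell = struct.unpack_from("<3f", header, 40)
    return {
        "grid": (nc, nr, ns),
        "mode": mode,
        "cell": cell,
        "has_map_stamp": header[208:212] == b"MAP ",
    }


# Synthetic header so the sketch runs without a real map file.
fake = bytearray(HEADER_SIZE)
struct.pack_into("<4i", fake, 0, 64, 64, 32, 2)
struct.pack_into("<3f", fake, 40, 50.0, 50.0, 25.0)
fake[208:212] = b"MAP "

info = read_map_header(bytes(fake))
print(info)
```

With a real file, the same function could be fed the first 1024 bytes read in binary mode; the grid values themselves follow the header.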

wwPDB format specification · wwPDB file formats

Molecular simulations

Every structure we've looked at so far is a frozen snapshot — a single arrangement of atoms captured at one moment in time. But real molecules are in constant motion: bonds vibrate, side chains rotate, loops open and close. Molecular simulations let us watch these motions unfold on a computer.

Molecular dynamics

The most common approach is molecular dynamics (MD). Starting from an experimental structure, the simulation applies Newton's equations of motion to every atom. At each tiny time step, typically one or two femtoseconds (10⁻¹⁵ s), the forces on every atom are calculated and the atoms are moved accordingly. Millions of these steps add up to nanoseconds of simulated time and billions to microseconds; reaching milliseconds takes on the order of a trillion steps. The resulting motion reveals how the molecule breathes, flexes, and responds to its environment.
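
The "calculate forces, then move atoms" loop can be sketched for a single particle on a spring. The velocity Verlet scheme is used here as one common choice of integrator; the text above does not name one, so treat it as an assumption. The units are arbitrary reduced units and all names are illustrative.

```python
import math


def velocity_verlet(x, v, force, mass, dt, steps):
    """Integrate Newton's equations in 1D; returns positions and final velocity."""
    positions = [x]
    f = force(x)
    for _ in range(steps):
        v_half = v + 0.5 * dt * f / mass      # half-step velocity update
        x = x + dt * v_half                   # move the particle
        f = force(x)                          # recompute force at new position
        v = v_half + 0.5 * dt * f / mass      # finish the velocity update
        positions.append(x)
    return positions, v


k = 1.0                                       # spring constant (reduced units)


def spring(x):
    return -k * x


positions, v_end = velocity_verlet(x=1.0, v=0.0, force=spring,
                                   mass=1.0, dt=0.01, steps=1000)
energy = 0.5 * v_end ** 2 + 0.5 * k * positions[-1] ** 2
print(positions[-1], math.cos(10.0))          # numerical vs exact position
print(energy)                                 # should stay close to the initial 0.5

# How many 2 fs steps fit into one nanosecond of simulated time?
print(round(1e-9 / 2e-15))                    # 500000
```

A real simulation does the same thing for every atom in three dimensions, with forces coming from a force field rather than a single spring.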

Force fields

The equations and parameters that describe how atoms interact — bond stretching, angle bending, electrostatics, van der Waals forces — are collected into a force field. Common force fields include AMBER, CHARMM, and GROMOS. Choosing the right force field for your system is one of the key decisions in setting up a simulation.
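
Three of the terms named above can be written as short functions. These are generic textbook forms, not the exact expressions of AMBER, CHARMM or GROMOS, which differ in conventions (for example, whether the bond term carries a factor of one half). All parameter values are hypothetical, and the Coulomb constant assumes energies in kcal/mol, distances in Å and charges in units of the electron charge.

```python
def bond_energy(r, r0, k):
    """Harmonic bond stretching around the equilibrium length r0."""
    return 0.5 * k * (r - r0) ** 2


def lennard_jones(r, epsilon, sigma):
    """Van der Waals term: steep repulsion at short range, weak attraction beyond."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(r, q1, q2, ke=332.06):
    """Electrostatic energy between two point charges."""
    return ke * q1 * q2 / r


# Hypothetical parameters, for illustration only.
print(bond_energy(1.6, r0=1.5, k=300.0))
print(lennard_jones(2 ** (1 / 6) * 3.4, epsilon=0.2, sigma=3.4))  # well minimum
print(coulomb(3.0, q1=1.0, q2=-1.0))
```

The Lennard-Jones energy reaches its minimum of -epsilon at a separation of 2^(1/6) times sigma, which is what the second line evaluates.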

What simulations reveal

Simulations can show things that static structures cannot:

How a protein folds from an unstructured chain into its native shape
How a drug molecule finds and binds to its target
Conformational changes — the large-scale motions that let enzymes do their work
How proteins behave in a lipid membrane or under different conditions of temperature and pH

Trajectories

A simulation produces a trajectory — a sequence of coordinate frames, each recording the position of every atom at a given point in simulated time. Think of it as a movie of the molecule. Molecular viewers can play trajectories frame by frame, letting you see loops swing open, helices unwind, or ligands slide into a binding pocket.

Format	Software	Notes
XTC	GROMACS	Compressed coordinates, very compact
TRR	GROMACS	Full precision with velocities and forces
DCD	NAMD, CHARMM	Binary format, widely supported across viewers
NetCDF (NC)	AMBER	Self-describing binary, portable across platforms
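
Whatever the file format, a trajectory boils down to a list of frames, each holding one position per atom. The sketch below uses a made-up two-atom trajectory and measures how far each frame has drifted from the first using the root-mean-square deviation (RMSD). It skips the superposition step that analysis tools normally apply first, and all names are illustrative.

```python
import math


def rmsd(frame_a, frame_b):
    """Root-mean-square deviation between two frames, without superposition."""
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must contain the same number of atoms")
    total = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(frame_a, frame_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(total / len(frame_a))


# Made-up trajectory: three frames, two atoms, coordinates in angstroms.
trajectory = [
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.5, 1.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.5, 2.0, 0.0)],
]

for index, frame in enumerate(trajectory):
    print(index, round(rmsd(trajectory[0], frame), 3))
```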

Wikipedia: Molecular dynamics · GROMACS · AMBER