The ATGCs of DNA
Sometimes I feel sad about how little I know of modern science...
I understand the scientific process all right, but the new discoveries often seem way out of reach. I'm not equipped to really comprehend the Big Bang 13.8 billion years ago, the warping of spacetime by a black hole, or particles like the Higgs boson. I'm not alone, and this isn't a new human condition. Science has always been working at the frontiers, beyond common knowledge. Despite the learnings of the ancient Greeks, some people two thousand years ago may have struggled with the idea of a round Earth. A person two hundred years ago may not have been able to grasp the connection between electricity and magnetism. Today these concepts are trivial. Perhaps general relativity will be second nature to future generations. Does this make me feel better? Yes, I guess so, a little bit.
I accept that I'm unlikely to ever fully understand in detail all the fantastic discoveries of modern science. For some of the more impactful ideas, however, I want to at least understand the basics. A couple years ago, I pushed myself to fathom the essence of quantum physics. Then I tackled Einstein's gravity. I found a general approach to learning that works well for me (more on this later...). Now I'm striving to wrap my mind around the age-old inquiry: what is life and how does it work? Or more specifically: what is DNA and how does it work?
Watson and Crick
In 1953, James Watson and Francis Crick built ball-and-stick models to help discover the structure of deoxyribonucleic acid, or DNA. In this kind of model, the balls are atoms and the sticks are bonds between them. I figure there's no better way for me to understand what DNA is than to build a ball-and-stick model myself. Then I'll use another model to simulate how DNA copies itself and how the DNA code is used to produce proteins.
On a Saturday in January, 2018, I'm sitting on the carpet in our guest bedroom where there's room to lay out all the contents of the kit I bought online from the Molecular Models Company. I feel excited, like I used to when I was a kid unpacking a new box of LEGO on the floor and previewing the construction steps. The atoms are small plastic balls with stubs where bonds will be (you might recall that atoms have a nucleus of positive protons and neutral neutrons surrounded by shells of negative electrons, and electrons in the outer shells can sometimes be shared between atoms to form bonds). Hydrogen atoms are white. Carbon atoms are black. Oxygen atoms are red. Nitrogen atoms are blue. Phosphorus atoms are purple. There are only these five elements in DNA.
The first step in the model instructions, which are nine pages long and include illustrations, is to assemble the bases. A base is a molecule that tends to accept a proton or donate an electron pair (an acid, which we'll run into later as the "A" in DNA, is a molecule that tends to accept an electron pair or donate a proton). I'll be building 20 base molecules: five called thymine, five called adenine, five called cytosine, and five called guanine. These bases are often referred to as T, A, C, and G. Their main structures are flat rings built with carbon and nitrogen atoms—the nitrogen atoms make them bases because nitrogen can donate an electron pair. T and C look alike. Both have one ring in the shape of a hexagon, with four carbons and two nitrogens at the corners. I connect each atom to its two neighbors using 20-millimetre-long grey tubes that slip over the stubs on the atoms. These grey tubes represent strong covalent bonds formed by the sharing of electrons. I adorn the rings around the outside by connecting single atoms or simple molecules (of no more than four atoms) to most of the corners. The appendages are composed of hydrogen, oxygen, carbon, and nitrogen atoms.
Purines and Pyrimidines
The T and C bases are called pyrimidines, and the bigger A and G bases are called purines. Each purine has a pyrimidine-like hexagon with a pentagonal ring attached by sharing two adjacent corner carbon atoms.
The next step is to join the bases into pairs using hydrogen bonds. A hydrogen bond is a weak electrostatic attraction between a partially positive hydrogen atom and a partially negative atom such as oxygen or nitrogen. Hydrogen bonds are represented by 30-millimetre-long clear tubes in the model. The weaker the bond, the longer the bond length—recall the stronger covalent bond is represented by a 20-millimetre-long tube. The base pairs in DNA need to be approximately the same length in order to fit within the double helix structure I'll be making later. The sizes of the four bases and the exact appendages of atoms on the hexagonal rings make it such that T and A pair and C and G pair. T and C don't pair because they are both small pyrimidines. Likewise, A and G don't pair because they are both large purines. If the base pairs need to be the same length, we need one pyrimidine and one purine per pair. Why doesn't T pair with G, and C with A? The appendages of atoms on the hexagonal rings don't favor these combinations. It turns out T pairs with A via two hydrogen bonds and C pairs with G via three hydrogen bonds. I let this sink in overnight.
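The pairing rules are compact enough to write down as a lookup. Here is a minimal Python sketch of them, assuming nothing beyond what the paragraph above says: two ring classes, and two allowed pairs with two and three hydrogen bonds. The names BASE_CLASS, HYDROGEN_BONDS, hydrogen_bonds, and same_width are my own inventions for illustration.

```python
# Ring class of each base: purines have two rings, pyrimidines have one.
BASE_CLASS = {"A": "purine", "G": "purine", "T": "pyrimidine", "C": "pyrimidine"}

# Hydrogen bonds per allowed pair, as stated in the text.
HYDROGEN_BONDS = {frozenset("AT"): 2, frozenset("CG"): 3}


def hydrogen_bonds(base1: str, base2: str) -> int:
    """Return the number of hydrogen bonds between two bases, or 0 if they do not pair."""
    return HYDROGEN_BONDS.get(frozenset((base1, base2)), 0)


def same_width(base1: str, base2: str) -> bool:
    """A pair has the right length only if it joins one purine and one pyrimidine."""
    return BASE_CLASS[base1] != BASE_CLASS[base2]


if __name__ == "__main__":
    for pair in ("TA", "CG", "TC", "AG", "TG", "CA"):
        print(pair, same_width(*pair), hydrogen_bonds(*pair))
```

T-G and C-A pass the width test but score zero hydrogen bonds, which is the sketch's way of saying the appendages don't line up.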
The following day I'm at it again. The instructions tell me to assemble 18 phosphate molecules and 20 deoxyribose sugar molecules. I make the phosphates by attaching four red oxygen atoms to a central purple phosphorus atom using grey tubes (covalent bonds). Then I make the deoxyribose sugars by building covalently bonded pentagonal rings of four carbons and one oxygen, with attachments on the perimeter composed of hydrogen, oxygen, and carbon atoms. The "d" in deoxyribose is the "D" in DNA.
This is easy and relaxing work. It's Sunday afternoon and, while it's cold outside, I'm warmed by unobstructed sunlight flooding in through a southern window. The bedroom is quiet and peaceful and cozy. Constructing the phosphates and sugars is a repetitive exercise. My hands slide plastic stubs into tubes over and over while my mind wanders. I wonder why DNA is an acid when yesterday I built all those bases. After finishing all the molecules, I take a break to look this up. The phosphates I made are phosphoric acid without the hydrogen atom nuclei. In phosphoric acid, three of the oxygen atoms bonded to the central phosphorus atom are also bonded to single hydrogen atoms, the positive nuclei (i.e. protons) of which can be donated. Because the pyrimidine and purine bases—as I am about to see—are on the inside of DNA, and the phosphoric acids are on the outside, the phosphoric acids dominate and DNA is overall acidic.
Piles of bases...
I now have tidy piles of base pairs, phosphates, and sugars, and I'm ready to put them all together into the macromolecule DNA. I should say that the order of construction specified in the model instructions (thoughtfully designed for human hands) is not the order of construction in reality. In living organisms, the building block of DNA is the nucleotide: a phosphate attached to a sugar attached to one of the four bases. The human body makes nucleotides from scratch in the liver, or salvages them from degraded RNA (to be introduced later) and DNA.
The model stand that will support the DNA is a 23-inch-tall rod on a 15-inch-diameter wood base. The rod penetrates 10 shelves of acrylic glass spaced 2 inches apart vertically. The model instructions specify the sequence of base pairs to be placed on each shelf from bottom to top. I follow it, not wanting to make a mistake. The flat base pairs rest easily on the shelves. At one point I question why I'm worried about following the sequence; the base pairs are approximately the same length and can be placed with any of the four bases on the left and any on the right. The truth is the order doesn't matter for the model, but for real DNA, as we shall see in the next model, the order is the code of life.
With the 10 base pairs in place, I begin attaching sugar molecules. A carbon atom on each sugar covalently bonds to a nitrogen atom on the end of each base. The final step is to insert and connect the phosphates between the sugars to produce the two sugar-phosphate backbones. Before I do this, all the shelves are oriented in the same direction. For the phosphates to fit between the sugars, I have to rotate the shelves. I start inserting phosphates on one side only, from the bottom up. I know that each backbone of DNA is in the shape of a helix, but I'm surprised by the amount of twisting of the shelves I have to do to fit the phosphates. I feel like I'm being overly aggressive. I carry on—trusting the instructions which have not let me down so far—and of course, it turns out just right. The finished backbone makes approximately one helical turn. I insert the phosphates on the other side to complete the second backbone, and I'm done. I spend some time comparing the model to diagrams of DNA in a textbook. Yes, it's correct!
Before I move on to the next model, let me describe my recent pattern of learning. I fell into it a couple years ago as I struggled to understand quantum mechanics, a modern marvel of science. Physicists talk about wave-particle duality. Engineers use it to design computers. Quantum mechanics is surely a bizarre and important thing, but what the heck is it?! I wanted to know.
I read popular science books. There are many good ones. And from these books, I garnered a clue. But I wanted more than a clue, so I looked for other ways to learn. I decided to focus my attention on the double-slit experiment, which embodies the main concepts of quantum mechanics. I read technical textbooks, pushing myself to continue even when I wasn't comprehending much. I watched online lectures. As a last resort, I consulted Wikipedia (don't tell my wife; I tell her anybody can write anything on there).
All this passive learning helped. I turned the corner in understanding, however, when I decided to try to perform the double-slit experiment myself. I did it at home with a laser pointer. Now things were starting to make sense. I drew the interference pattern of light. I played with the math of waves. Neurons in my brain were firing. Emboldened, I contacted universities in search of a lab with the double-slit experiment apparatus. I flew across the country from my home in Portland, Oregon to Providence, Rhode Island to observe the experiment performed one photon at a time at Brown University. I reached my level-of-understanding goal by active learning.
Making a project out of it—that was the key.
To learn about general relativity, I witnessed scientists reproduce Sir Arthur Eddington's famous 1919 eclipse experiment during the solar eclipse of August 21, 2017. By finding out how light from distant stars bends as it passes the massive sun, I came to appreciate the tenets of Einstein's theory of gravity.
Now I'm on to DNA: what it is and how it works. I started with popular science books (e.g. DNA by James Watson), textbooks (e.g. Molecular Biology of the Gene by Watson, Hopkins, Roberts, Steitz, and Weiner), and a video lecture series (Understanding Genetics: DNA, Genes, and Their Real-World Applications by David Sadava). Then I began my project: model building. First I built the ball-and-stick model, as you've seen. Now I'm working on a model that shows how DNA copies itself and how proteins are made.
The Lab-Aids DNA-RNA Protein Synthesis Model Kit I purchased online is designed to be used by students in a classroom under the guidance of a teacher. As I'm unpacking the plastic pieces of the kit on the carpet in our guest bedroom, I feel like a teacher working through the model to make sure I know what I'm doing before leading the students through it. I do have the Teacher's Guide!
Step one is to build the DNA molecule, which will be much simpler than the ball-and-stick model I built before. In this model, each base, phosphate, and sugar is one piece of plastic. Each base molecule is a 25-millimetre-long soft tube. T is green, A is orange, C is blue, and G is yellow. Each phosphate molecule is a 25-millimetre-long soft white tube. Each sugar molecule is a small black pentagon with stubs.
The instructions say to make 18 nucleotides. I know what a nucleotide is from the previous model: a phosphate attached to a sugar attached to one of the four bases. I make each nucleotide by sliding a phosphate tube and a base tube onto stubs of the sugar pentagon. Five nucleotides are made with T, five with A, four with C, and four with G. I know from the previous model that T pairs with A and C pairs with G, so the number of T bases equals the number of A bases and the number of C bases equals the number of G bases (this is Chargaff's rule of base pairing).
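The counting argument can be checked with a few lines of Python. This is only a sketch: it takes one nine-base strand of the model (AGTCTAGCT, the strand that comes up during the transcription exercise), builds its partner from the pairing rule, and tallies the bases. The names PAIR and base_counts are hypothetical helpers, not part of any kit or library.

```python
from collections import Counter

# Pairing rule from the ball-and-stick model: T with A, C with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def base_counts(strand: str) -> Counter:
    """Count the bases in a double-stranded molecule, given one of its strands."""
    partner = "".join(PAIR[base] for base in strand)
    return Counter(strand + partner)


counts = base_counts("AGTCTAGCT")
print(counts["T"], counts["A"], counts["C"], counts["G"])  # 5 5 4 4
```

Whatever strand is passed in, the T and A counts come out equal, and so do the C and G counts, because every base on one side brings its partner along on the other.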
I make two "half-ladders" of nine nucleotides each. Then I lay them side by side and connect the base pairs using small rods—representing hydrogen bonds—that slip into the base tubes. The result is an 11.5-inch-long DNA molecule in the shape of a ladder, with nine base pair rungs. It's flat. The Student's Guide doesn't say anything at this point about the shape of the DNA molecule. After consulting the Teacher's Guide, I gather that this is an opportunity to ask the students about the shape. Because the model is constructed using flexible plastic tubes, it can be twisted into the form of a double helix. I do it again and again; it never gets old.
DNA is contained in chromosomes in the nucleus of cells. This gives us the "N" in DNA. Human cells have 23 pairs of chromosomes, one set of 23 from Mom and one set of 23 from Dad. For a cell to divide into two cells, each possessing 23 identical pairs of chromosomes (a process called mitosis), DNA must replicate.
Before I simulate DNA replication, I pretend I'm a liver and make 18 more nucleotides. Then, to begin replication, I unzip the DNA molecule along the weak hydrogen bonds between base pairs. As the double helix splits into two separate strands, I attach the bases of the new nucleotides to the freshly unpaired bases on each strand. As always, T pairs with A and C pairs with G. I started unzipping at the "top". Half-way through the process, the model looks like the letter "Y": the original double helix is at the bottom and two new double helices are branching off at the top. When I'm done, I have two identical copies of the original DNA.
Prove to yourself that the copies are identical: Suppose the first three base pairs in the original DNA are T-A, C-G, A-T. The two unzipped strands would be T-_, C-_, A-_ and _-A, _-G, _-T. Knowing that T pairs with A and C pairs with G, pencil in the new bases.
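If you'd rather let a computer do the penciling, here is a small Python sketch of the same exercise. It treats the ladder as two strings read rung by rung, left side and right side, which is a simplification of the flat model and ignores the chemical direction of real strands. The function names partner_strand and replicate are made up for this illustration.

```python
# Pairing rule: T with A, C with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def partner_strand(strand: str) -> str:
    """Build the strand that pairs with the given one, rung by rung."""
    return "".join(PAIR[base] for base in strand)


def replicate(left: str, right: str):
    """Unzip a ladder and rebuild the missing side of each half."""
    copy_one = (left, partner_strand(left))    # old left side, new right side
    copy_two = (partner_strand(right), right)  # new left side, old right side
    return copy_one, copy_two


original = ("TCA", "AGT")  # rungs T-A, C-G, A-T
copy_one, copy_two = replicate(*original)
print(copy_one, copy_two)
print(copy_one == original and copy_two == original)  # True
```

Each copy keeps one old strand and gains one new one, yet both end up identical to the original.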
It is critically important that when a cell divides, each new cell retains the exact same version of DNA as in the original cell, because DNA is the code for making the proteins that do the things we recognize as life. Proteins provide the structures of cells and therefore of bones, muscles, skin, and hair. Other proteins, called enzymes, provide shapes that facilitate chemical reactions. The DNA code is used to create proteins through the processes of transcription and translation.
Proteins are not made in the nucleus, where DNA is. They are made in ribosomes in the cytoplasm of the cell. The cell needs some way to get the code from the nucleus to the ribosomes. This is where we first meet ribonucleic acid, or RNA. RNA is different from DNA in two ways. The sugar molecule in RNA is ribose, which is very similar to, but not exactly the same as, deoxyribose. And instead of thymine (T), RNA uses the pyrimidine uracil (U). RNA comes in different flavors. Messenger RNA, or mRNA, transcribes the code from DNA in the nucleus and delivers it to the ribosomes.
I start the transcription exercise by building nine nucleotides. These will eventually form the mRNA, so I use purple pentagons for ribose and lavender tubes for U instead of black pentagons for deoxyribose and green tubes for T. Then I unzip one of my DNA molecules from the replication activity. As the double helix splits into two separate strands (as it did during replication), I attach the bases of the mRNA nucleotides to the freshly unpaired bases on one of the strands (which goes AGTCTAGCT). U pairs with A, A with T, C with G, and G with C. The mRNA nucleotides are now side by side, and covalent bonds form between the neighboring sugar and phosphate molecules. As the new strand of mRNA forms, I unzip it from the DNA strand and zip the two DNA strands back together. Voila, I now have a single strand of mRNA, and the DNA molecule is restored. In a real cell, at this point the mRNA would leave the nucleus through pores in the membrane and travel to a ribosome in the cytoplasm.
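Transcription in the model boils down to a letter-for-letter substitution, which a short Python sketch can mimic. It assumes only the pairing rule just described (U with A, A with T, C with G, G with C) and reads the strand position by position, as I did with the plastic pieces. The names DNA_TO_MRNA and transcribe are hypothetical.

```python
# Which mRNA base attaches to each exposed DNA base.
DNA_TO_MRNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(dna_strand: str) -> str:
    """Return the mRNA strand that pairs with the exposed DNA strand."""
    return "".join(DNA_TO_MRNA[base] for base in dna_strand)


print(transcribe("AGTCTAGCT"))  # UCAGAUCGA
```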
My mRNA strand contains only nine bases (UCAGAUCGA). A real mRNA strand would be much longer. The DNA in each human chromosome contains millions of base pairs, which group into hundreds to thousands of genes. A gene, which contains hundreds to thousands of base pairs, is an important unit of code. It used to be thought that each gene is translated into one protein; now we know it's about three proteins on average. In the transcription exercise, I unzipped the entire DNA molecule, which was no big deal since it had only nine base pairs. In real life, transcription does not involve unzipping an entire DNA molecule with its millions of base pairs; only a segment corresponding to a specific gene unzips and is transcribed. The mRNA carries the code of this specific gene, with its thousands of bases, out to the ribosomes.
I place the mRNA strand with nine bases on a flat piece of purple plastic in the shape, I presume, of a ribosome. The ribosome is purple like the ribose sugar pentagon because ribosomes, as you might tell from the name, contain ribose. Its structure, which I guess is complex, will have to be a lesson for another day.
Elsewhere in the cytoplasm, another type of RNA called transfer RNA, or tRNA, is gathering amino acids. The tRNA molecules are different from the mRNA molecules. One side of tRNA does look like mRNA, but much shorter: it holds a string of only three bases. The other side looks like a keyhole. The key that fits the keyhole is a part of an amino acid.
Translation: from gene to protein
Amino acids are the building blocks of proteins. There are 20 of them. The human body can make some; the rest we get from food. All amino acids have a central carbon atom covalently bonded to four characters: a hydrogen atom, an amino group, a carboxylic acid group, and a "side chain." The side chain is the only character that differs among the 20 amino acids. It is the key of the amino acid. Each of the three amino acids in the model has a different key: rectangle, triangle, and half-circle.
Each of the three tRNA model pieces has, on one side, a keyhole slot for either a rectangle, triangle, or half-circle. The other side has stubs for three bases. Together, the three bases form a unit that is called a triplet. There is a specific relationship between the base triplet, tRNA with its keyhole, and amino acid with its key. Bear with me; this is going to be awesome.
The bases on the three tRNA molecules have to pair with the bases on the mRNA, which is waiting patiently on the ribosome. The first three bases on the mRNA are U, C, and A, so the bases on the first tRNA have to be A, G, and U. I put these bases onto the first tRNA. This tRNA has, on its other side, a rectangular keyhole holding an amino acid with a rectangular key. The next three bases on the mRNA are G, A, and U, so the bases on the second tRNA have to be C, U, and A. The second tRNA has the triangular keyhole holding the amino acid with the triangular key. The last three bases on the mRNA are C, G, and A, so the bases on the last tRNA have to be G, C, and U. The last tRNA has the half-circle keyhole holding the amino acid with the half-circle key. I consult a textbook to find that tRNA molecules with base triplets AGU, CUA, and GCU always have, on their other sides, the amino acids serine (rectangular key), aspartic acid (triangular key), and arginine (half-circle key), respectively.
Now I attach the bases of the three tRNA molecules to the bases of the mRNA molecule using hydrogen bonds. In this arrangement, the three amino acids are side by side, with the amino group of one next to the carboxylic acid group of another. I make covalent bonds between the neighboring amino acids using grey tubes. Finally, I detach the string of three bonded amino acids from the tRNA molecules. I now have a representation of a very short protein molecule. In real life, a long string of hundreds of amino acids would fold into a specific protein dictated by the exact type and sequence of the amino acids.
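The translation step can be sketched in Python too. This is a toy version under loud assumptions: the lookup table holds only the three triplets from my model (the real genetic code has 64), bases are matched position by position as in the kit, and the names RNA_PAIR, AMINO_ACID_FOR_TRIPLET, triplets, trna_triplet, and translate are all invented for illustration.

```python
# RNA-to-RNA pairing between mRNA and tRNA bases.
RNA_PAIR = {"A": "U", "U": "A", "C": "G", "G": "C"}

# Only the three mRNA triplets used in the model, not the full genetic code.
AMINO_ACID_FOR_TRIPLET = {
    "UCA": "serine",
    "GAU": "aspartic acid",
    "CGA": "arginine",
}


def triplets(mrna: str) -> list:
    """Split an mRNA strand into consecutive groups of three bases."""
    return [mrna[i:i + 3] for i in range(0, len(mrna), 3)]


def trna_triplet(mrna_triplet: str) -> str:
    """Return the tRNA bases that pair with one mRNA triplet."""
    return "".join(RNA_PAIR[base] for base in mrna_triplet)


def translate(mrna: str) -> list:
    """Look up the amino acid delivered for each mRNA triplet."""
    return [AMINO_ACID_FOR_TRIPLET[t] for t in triplets(mrna)]


mrna = "UCAGAUCGA"
print([trna_triplet(t) for t in triplets(mrna)])  # ['AGU', 'CUA', 'GCU']
print(translate(mrna))  # ['serine', 'aspartic acid', 'arginine']
```

A triplet outside the three-entry table raises a KeyError, which seems fair for a model kit that ships with only three amino acids.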
The genetic code
Let's catch our breath and summarise what has happened. When we unzipped the DNA molecule for transcription, these bases were exposed: AGT, CTA, and GCT. The corresponding mRNA molecule had the following bases: UCA, GAU, and CGA. During translation, the corresponding tRNA molecules had the following bases/amino acids: AGU/serine, CUA/aspartic acid, and GCU/arginine. Here is the same information in a table:
DNA bases      AGT       CTA              GCT
mRNA bases     UCA       GAU              CGA
tRNA bases     AGU       CUA              GCU
amino acids    serine    aspartic acid    arginine
Serine, aspartic acid, and arginine then bonded to form part of a specific protein molecule. These are the broad strokes of how the DNA code is used to produce proteins. I am in awe that we know how this works.
It's the end of another Sunday. My legs ache from sitting for so long on the carpet. I pull open the curtain; it's almost dark outside. Tomorrow I have to go to work.
I'm done with both models, which is sad because I really enjoyed them. I do still have a lot of questions. Different kinds of enzymes are required for all the processes I learned about. I've acquired only a vague sense of how enzymes work. The compositions of ribosomes and tRNA molecules are still mysterious to me. This was beyond the scope of the Lab-Aids model kit. I don't know how mRNA and tRNA find their way to ribosomes. Even though I'm tired—it's painstaking work keeping track of all those bases and their letters—I scour Molecular Biology of the Gene looking for answers. Despite the amazing learning experience I just had, I leave the guest bedroom a little frustrated at some of the details I still don't know.
After a good night's sleep, I wake up with the proper perspective. I remember my objective: to develop a basic understanding of what DNA is and how its code is used to make proteins. Mission accomplished. Molecular biologists say that life obeys chemistry, and now I can appreciate how. If I want to learn more, I have the foundation to do so.