How does CRISPR work?

A guide to the latest genome editing technology...

23 February 2018

Scientists in the USA reported recently that they’d successfully used the molecular editing tool CRISPR/Cas9 to alter the genomes of human embryos. This comes hot on the heels of similar claims from two research groups in China, who also reported human embryo editing. Here, Alex Ashcroft explains this new genetic science that's taking the world by storm...

CRISPR is the technique all three groups used to edit the DNA of human embryos. It stands for “Clustered regularly interspaced short palindromic repeats”, and you’ll find in this attempt at a no-nonsense guide to CRISPR that scientists love acronyms; in fact, CRISPR is itself short for CRISPR/Cas9, the full name of the technology.

The Central Dogma

In order to appreciate how CRISPR works, we must first visit the “central dogma” of genetics. This was first described in 1958 by Francis Crick, who shared the 1962 Nobel Prize with Maurice Wilkins and James Watson for figuring out the structure of DNA. The central dogma explains how the information encoded in DNA makes cells, and ultimately organs and organisms. At its simplest level, the central dogma argues that a gene encoded on DNA (deoxyribonucleic acid) is copied into an RNA (ribonucleic acid) form, and this so-called "messenger RNA" molecule is used to generate proteins, which carry out most of the functions inside cells.

The reason for this complex molecular dance is that the DNA genome is located inside the cell's nucleus, which is a special membrane-enclosed bag. Proteins, meanwhile, are made outside the nucleus on the cellular equivalent of a weaving loom called a ribosome. RNA acts as the messenger between the DNA and the loom. In essence, the mRNA is a copy of the "knitting pattern" which is read by the ribosome to stitch together the right amino acid "threads" to make the desired protein.

You can think of the DNA genome as a cell's master collection of knitting patterns. The weaver doesn't want to carry this entire lot around the workshop, so he jots down a messenger RNA "copy" of the design he wants to weave, and carries just this copy to the loom where he uses the RNA instructions to dial up just the desired protein.
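To make the knitting analogy a little more concrete, here is a minimal sketch of the two steps in Python. The DNA "pattern" is made up, the codon table is only a tiny slice of the real one, and the function names `transcribe` and `translate` are just labels chosen for this illustration. It also ignores the direction in which real strands are read.

```python
# Pairing rules used when a DNA template is copied into RNA.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

# A deliberately tiny slice of the genetic code (the real table has 64 codons).
CODON_TABLE = {
    "AUG": "Met",
    "UUU": "Phe",
    "GGC": "Gly",
    "AAA": "Lys",
    "UAA": "STOP",
}


def transcribe(template_dna: str) -> str:
    """Copy a DNA template into messenger RNA, one letter at a time."""
    return "".join(DNA_TO_RNA[base] for base in template_dna)


def translate(mrna: str) -> list[str]:
    """Read the mRNA three letters at a time until a stop signal is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein


template = "TACAAACCGTTTATT"  # made-up DNA "knitting pattern"
mrna = transcribe(template)
print(mrna)             # AUGUUUGGCAAAUAA
print(translate(mrna))  # ['Met', 'Phe', 'Gly', 'Lys']
```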

So what is so special about RNA? Well it's actually very similar to DNA; in fact, the main difference between them is that a single "unit" or genetic "letter", which is known as a base, of DNA has one less oxygen atom than an equivalent RNA unit. There are some other minor differences too, like the fact that, in vertebrates, RNA molecules encode just a single gene whereas DNA can carry so-called “junk” that doesn’t code directly for proteins.

One other subtlety to be aware of is that when DNA creates an RNA molecule for a given gene, it creates an RNA molecule that is the genetic mirror image of the DNA sequence. This means that RNA matches a specific DNA sequence and can pair up with, or “bind”, to it. This isn’t really important for generating proteins, but it is a property scientists take advantage of in CRISPR.
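A rough way to picture this "matching" is as a letter-by-letter check. The sketch below is an illustration only: `can_bind` is a made-up helper, and it lines the two strands up side by side, whereas real strands pair while running in opposite directions.

```python
# Allowed partners, written as (RNA letter, DNA letter).
# RNA uses "U" where DNA would use "T".
PAIRS = {("A", "T"), ("U", "A"), ("G", "C"), ("C", "G")}


def can_bind(rna: str, dna: str) -> bool:
    """True if every RNA letter sits opposite its partner letter in the DNA."""
    return len(rna) == len(dna) and all(
        (r, d) in PAIRS for r, d in zip(rna, dna)
    )


print(can_bind("AUGC", "TACG"))  # True: every letter has its partner
print(can_bind("AUGC", "TACC"))  # False: the last letters do not pair
```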

Repeated sequences of DNA are essential for CRISPR function

CRISPR can ultimately alter the DNA sequence of a given gene. Sometimes it can be used to fix a broken gene so the new DNA can make a functional protein. This is the goal with genome editing of human embryos: to cure diseases, caused by a single gene, before they can manifest themselves clinically. In other cases, scientists might be trying to remove a gene in an experimental or "model" organism to better understand its function. But how does CRISPR help them to achieve this?

Genomic information is encoded in DNA with four simple molecules that scientists refer to with single letters: “A”, “T”, “G” and “C”. Generically, these letters are called bases; because each one pairs up with a partner on the opposite strand of the DNA double helix, lengths of DNA are usually counted in base pairs. Every gene is created from unique sequences of these four letters, and the average human gene is between 10 and 15 thousand base pairs long. It’s fairly easy to imagine how you can generate a truly unique order of the letters (or gene sequences) when you have 15000 slots to play with. But what happens when you start looking at DNA sequences on a much smaller scale - a scale of, say, just 3 letters?

Well you could have three of the same letters all in a row - perhaps, AAA or GGG. You might get all different letters: ATC, ATG, GTA, CAT and so on. Or you might get duplicates of one letter - for example AAT or CCG.

There are numerous different ways you can write “A”, “T”, “G” and “C” when you have just three spaces. And it gets even more complex when you consider the fact that any letter could be duplicated. Nevertheless, there are only so many different ways you can do this, meaning that the number of possible combinations is finite. So you might imagine that, if you look at DNA sequences on the scale of just 3 base pairs, you’ll start to see the same pattern, or “motif”, come up again and again. And it turns out that throughout the genomes of most species there is a three letter motif that occurs with high frequency. It's called the “protospacer adjacent motif” and scientists have abbreviated this to the "PAM" domain.
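Just how finite is easy to check: four letters in three slots gives 4 x 4 x 4 = 64 possibilities. The short sketch below lists them all and sorts them into the three kinds described above; the category labels and the helper `describe` are just for this illustration.

```python
from collections import Counter
from itertools import product

LETTERS = "ATGC"

# Every possible three-letter sequence: 4 choices for each of 3 slots.
motifs = ["".join(combo) for combo in product(LETTERS, repeat=3)]


def describe(motif: str) -> str:
    """Sort a three-letter motif into one of the three kinds in the text."""
    distinct = len(set(motif))
    if distinct == 1:
        return "all the same"           # e.g. AAA
    if distinct == 2:
        return "one letter duplicated"  # e.g. AAT
    return "all different"              # e.g. ATC


summary = Counter(describe(m) for m in motifs)
print(len(motifs))  # 64
print(summary)
```

With only 64 options to go round, any given three-letter motif is bound to crop up again and again in a genome that is billions of letters long.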

Initially, scientists just ignored the PAM domain and other common 3 letter base pair sequences, because they couldn’t do anything with them. They also weren’t useful; since they were repeated throughout the genome, scientists couldn’t use them to tell one gene from another. But, recently, some scientists discovered that a protein called Cas9 could recognise the PAM domain. At this point, everything changed…

The Cas9 protein cuts DNA at PAM domains

Cas9 is an “endonuclease”, meaning that it can cut DNA. Under the right conditions, when Cas9 finds a PAM domain it creates a break in the DNA nearby.

A cell’s DNA breaks all the time, so the majority of species have evolved DNA repair mechanisms to fix this problem. But these mechanisms don’t always repair the DNA perfectly. Thus, if Cas9 cuts DNA enough times, eventually, by chance, it can leave behind an altered gene sequence.

At this point all we have is a protein that can recognise a three letter sequence in DNA and cause breaks at a predictable location nearby. PAM domains occur frequently throughout the genome, and without a way to target one PAM domain over another, Cas9 is a useless tool.
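To see how often such a motif turns up, here is a small sketch that scans a made-up stretch of DNA. It rests on two assumptions that are not spelled out above: that the motif is "NGG" (any letter followed by two Gs), which is the one recognised by the widely used Cas9 from Streptococcus pyogenes, and that the cut falls about three letters before it. Only one strand is searched, and the function names are hypothetical.

```python
import re

# Assumption: the PAM is "NGG", i.e. any letter followed by two Gs.
# The lookahead (?=...) lets overlapping motifs be found as well.
PAM_PATTERN = re.compile(r"(?=[ACGT]GG)")

# Assumption: Cas9 cuts roughly 3 letters before the start of the PAM.
CUT_OFFSET = 3


def find_pam_sites(dna: str) -> list[int]:
    """Return the start position of every NGG motif on this strand."""
    return [match.start() for match in PAM_PATTERN.finditer(dna)]


def cut_positions(dna: str) -> list[int]:
    """Approximate cut position for each PAM (cuts off the start are skipped)."""
    return [p - CUT_OFFSET for p in find_pam_sites(dna) if p >= CUT_OFFSET]


dna = "TTATGCGGTACCAGGTT"  # made-up sequence
print(find_pam_sites(dna))  # [5, 12]
print(cut_positions(dna))   # [2, 9]
```

Even in this 17-letter snippet there are two candidate sites, which is exactly why an untargeted Cas9 would be of little use.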

This is the genius insight that several scientists made: they found a way to target the Cas9 protein to specific PAM domains and in doing so created CRISPR. They're now fighting over who did it first and thus who gets to patent it. Some say it's the scientific scandal of the decade!

CRISPR/Cas9 - how it works

The mechanism generated by scientists is conceptually rather simple, but very long-winded. There are many steps needed before a gene can be edited. First you need to find a PAM domain that’s in a gene that you want to edit. Then you need to look at the DNA next to the domain: are the next 18-21 base pairs unique? If not, then you need to look for another PAM domain and repeat the process. Once you have a PAM domain in a useful place that’s immediately next to a unique sequence of DNA... then the hard work starts. You need to synthesise a piece of RNA that exactly matches the sequence of DNA adjacent to the PAM domain. You then attach this RNA molecule to another piece of RNA that can bind the Cas9 protein.
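The search described here can be sketched in a few lines of Python. This is a simplified illustration under several assumptions: the PAM is taken to be "NGG", the neighbouring stretch is fixed at 20 letters and read from just before the PAM, only one strand is examined, and the guide contains only that neighbouring stretch (the PAM itself is left for Cas9 to read). The names `candidate_guides` and `count_occurrences` are invented for the example.

```python
import re

GUIDE_LENGTH = 20  # assumption: within the 18-21 letter range


def count_occurrences(genome: str, target: str) -> int:
    """Count every place the target appears, including overlapping copies."""
    return sum(
        genome.startswith(target, i)
        for i in range(len(genome) - len(target) + 1)
    )


def candidate_guides(gene: str, genome: str) -> list[tuple[str, str]]:
    """Return (guide RNA, PAM) pairs whose neighbouring DNA is unique."""
    guides = []
    for match in re.finditer(r"(?=[ACGT]GG)", gene):  # assumed NGG motif
        pam_start = match.start()
        if pam_start < GUIDE_LENGTH:
            continue  # not enough DNA before this PAM
        target = gene[pam_start - GUIDE_LENGTH:pam_start]
        if count_occurrences(genome, target) == 1:
            # The guide carries the same letters as the DNA next to the PAM,
            # with RNA's "U" in place of DNA's "T".
            guides.append((target.replace("T", "U"), gene[pam_start:pam_start + 3]))
    return guides


gene = "ACGTTAGCATCGATCCTAGA" + "TGG"  # made-up gene ending in a PAM
genome = "TTTT" + gene + "CCCC"        # made-up genome containing the gene
print(candidate_guides(gene, genome))
```

If the same 20 letters appeared a second time anywhere in the genome, this sketch would reject that PAM and move on to the next one, just as the steps above describe.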

This complex of RNA molecules is almost like the ultimate diplomat. The segment of RNA that matches the sequence of DNA adjacent to the PAM domain is able to recognise and bind onto the DNA at that exact location in the genome. The other half of the RNA supermolecule binds - "escorts" - the Cas9 protein, facilitating its interactions with the target piece of DNA. As we've already discussed, when the Cas9 protein sees a PAM domain, it breaks the DNA next to it.

The broken DNA, in turn, activates the cell’s DNA repair mechanisms. At this point, a scientist can choose one of two routes in genome editing: the very accurate but inefficient mechanism (called HDR) or the more efficient but less accurate process (NHEJ).

What is NHEJ?

NHEJ stands for “non-homologous end joining”. NHEJ is the easiest approach for scientists to take when using CRISPR. Scientists get the CRISPR system into the nucleus of a cell, and basically just hope for the best.

This is not an exaggeration. The theory is that if Cas9 keeps cutting its target DNA and the repair mechanisms keep fixing it, then eventually, by chance, something will go wrong and the gene remains broken. Perhaps the DNA repair mechanism will add in the wrong base pair, creating a broken gene, or perhaps small sections of DNA will accidentally be inserted or deleted (changes known as "indels").
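Why would losing a letter or two be enough to break a gene? Because the protein-making machinery reads the message in groups of three letters, so a small deletion can scramble everything downstream. The sketch below uses a made-up gene and hypothetical helper names to show the effect.

```python
def codons(seq: str) -> list[str]:
    """Split a sequence into complete three-letter words."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def delete(seq: str, position: int, count: int = 1) -> str:
    """Remove `count` letters starting at `position`."""
    return seq[:position] + seq[position + count:]


gene = "ATGGCTAAACCGTGA"  # made-up gene

print(codons(gene))                # ['ATG', 'GCT', 'AAA', 'CCG', 'TGA']
print(codons(delete(gene, 4)))     # ['ATG', 'GTA', 'AAC', 'CGT']
print(codons(delete(gene, 3, 3)))  # ['ATG', 'AAA', 'CCG', 'TGA']
```

Removing a single letter changes every three-letter word after the deletion, whereas removing exactly three letters drops one word and leaves the rest readable. Which of these happens after a cut is down to chance.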

There are many more errors that DNA repair machinery can make, and the hope with all of them is that they will break the target gene. But, as you might imagine, with a random system, the scientists have no idea what CRISPR will change in the DNA and if that change will even be useful. But if all a scientist wants to do is break a gene, and doesn’t care how it happens, then NHEJ is the way to go as it is by far the easiest of the two systems. It is also the most efficient, although it's difficult to give precise estimates of CRISPR NHEJ efficiency because many different things influence it including which gene you are targeting, which species you are using, how you are getting CRISPR inside the cell and whether you are using cells or actual embryos.

What is HDR?

“Homology directed repair”, or HDR, is the more complicated of the CRISPR approaches. Scientists use HDR if they want to control what the altered DNA sequence will be. This is obviously a lot harder.

Up until the DNA repair stage, HDR works the same as NHEJ. However, in HDR, scientists want to trick a cell’s DNA repair mechanism into adding in a piece of DNA they designed.

They do this by flanking their new DNA sequence with bits of DNA that exactly match the DNA on either side of the CRISPR cut site. They hope that instead of repairing the original DNA sequence, the repair mechanism will add in their chosen bit of DNA, in a manner analogous to cut and paste in a word document.
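The cut-and-paste idea can be mimicked with ordinary text handling. In this sketch the genome, the cut position and the new DNA are all made up, the flanking "arms" are only three letters long (real ones are far longer), and the helpers `build_donor` and `repair_with_donor` are invented names. A cell does this with repair enzymes, not with a text search.

```python
def build_donor(genome: str, cut_site: int, new_dna: str, arm_length: int) -> str:
    """Flank the new DNA with copies of the letters on either side of the cut."""
    left_arm = genome[cut_site - arm_length:cut_site]
    right_arm = genome[cut_site:cut_site + arm_length]
    return left_arm + new_dna + right_arm


def repair_with_donor(genome: str, donor: str, arm_length: int) -> str:
    """Paste the donor's middle section in where both arms match the genome.

    Assumes arm_length is at least 1.
    """
    left_arm, right_arm = donor[:arm_length], donor[-arm_length:]
    new_dna = donor[arm_length:-arm_length]
    site = genome.find(left_arm + right_arm)
    if site == -1:
        return genome  # arms do not match anywhere: nothing is pasted
    cut = site + arm_length
    return genome[:cut] + new_dna + genome[cut:]


genome = "AAACCCGGGTTT"
donor = build_donor(genome, cut_site=6, new_dna="TAG", arm_length=3)
print(donor)                                # CCCTAGGGG
print(repair_with_donor(genome, donor, 3))  # AAACCCTAGGGGTTT
```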

This sort of method is useful when scientists want to add something into a genome. For example, a scientist may want to transfer a human gene to a mouse to study what it does. Scientists can also use this technique to replace a broken copy of a gene with a working version, so it has potential for new medical treatments for inherited diseases. HDR offers a way to fix a broken gene in a very precise manner, but it’s very inefficient. In mouse embryos, for example, some scientists have been able to get it to work only 5-23% of the time.

Since the less accurate NHEJ arises because of random errors in the cell’s own natural DNA repair mechanisms, it is theoretically possible for NHEJ and HDR to occur in the same cell. It’s not always possible for scientists to engineer a workaround for this problem. But is it always an issue? Well, when scientists are genetically modifying individual cells, it’s not that big a deal. However, when scientists are altering the DNA of embryos it is a huge problem. This is because embryos can be “mosaic”.

What is mosaicism?

The overwhelming majority of people and animals have the same DNA in every cell in their body. A very small percentage have different DNA in different cells and we say that they are mosaic.

Mosaic animals and humans occur naturally, but it is possible to produce them accidentally with CRISPR. CRISPR is typically injected directly into the nucleus of single cell embryos. But these embryos will eventually divide to form two cells, and those two cells into four, and so on until a complete organism arises.

What happens if the CRISPR machinery hangs around after the first cell division? Then you would have CRISPR machinery in two different cells, each acting independently of each other. If the random editing of NHEJ occurred again in one cell and not the other, then you’d end up with 2 cells with different DNA sequences. And if the first 2 cells of an embryo have different DNA sequences, then they would create an entire organism with 2 different sets of DNA. It would be mosaic.
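A toy model makes the arithmetic clear. Here an embryo is just a list of cells, each labelled with the version of the gene it carries, and `divide` is a made-up helper that doubles every cell. The edit is applied to only one of the first two cells, which is the scenario described above.

```python
def divide(cells: list[str]) -> list[str]:
    """Every cell splits into two daughters carrying the same DNA."""
    return [version for version in cells for _ in range(2)]


embryo = ["original"]      # single cell embryo
embryo = divide(embryo)    # first division: 2 cells
embryo[0] = "edited"       # leftover CRISPR alters just one of them

for _ in range(3):         # three more rounds of division
    embryo = divide(embryo)

edited = embryo.count("edited")
print(len(embryo), edited)   # 16 8
print(len(set(embryo)) > 1)  # True: the embryo is mosaic
```

However many more times the cells divide, half of them descend from the edited cell and half do not.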

As with all limitations with CRISPR, mosaicism is not such a problem when working with cells in a lab, but it becomes a huge issue in the clinic. When developing a therapy, even if it’s a genetic therapy, scientists and doctors need to know exactly what they are doing. This is about more than just scientists being control freaks! Consider what would happen if the mosaic genome editing produced some cells with DNA that had been fixed and some with DNA that hadn’t, or some with DNA that’s been fixed and subsequently broken.

In either scenario, you’d end up with a patient that still has some broken cells. And because we don’t know which organs have developed from which embryonic cells, the patient will have some organs made from healthy cells and some organs made from broken cells, and scientists would have no idea which was which. Not a great start for a therapy... So, unsurprisingly, scientists are putting a lot of effort into limiting the development of mosaic CRISPR animals.

Off-target cutting

The problem of mosaic animals in CRISPR seems complicated enough. But there is one final level of complexity to CRISPR that not only exacerbates the problems with mosaicism but is also, on its own, a major hurdle for using CRISPR therapeutically. I am talking about the infamous off-target effects.

When scientists use CRISPR they spend a lot of time designing the genome editing machinery that will be used so as to limit off-target effects as much as possible. Unfortunately, it’s impossible to avoid them completely. So what are they? And why do they matter?

Well, CRISPR is composed of the Cas9 enzyme that cuts the DNA and the RNA supermolecule that diplomatically brings the Cas9 enzyme to the correct bit of DNA.

The RNA supermolecule contains a sequence that allows it to attach to the Cas9 enzyme and a sequence that recognises the bit of DNA next to the important PAM domain.

The bit of DNA next to the PAM domain, which the RNA diplomat targets, is usually 18 - 21 basepairs long and theoretically completely unique.

The problem is that the human genome contains around 3 billion of these basepairs. With so much DNA to search, it’s likely that an 18-21 basepair sequence, or something very similar to it, will turn up at least once more somewhere else.

What happens if this repeated sequence also happens to lie next to a PAM domain, which is not uncommon?

It’s hardly surprising what happens in such a scenario: The CRISPR machinery finds and cuts both sites in the genome, meaning that it alters the target site and some other random (“off-target”) bit of DNA.

It’s nearly impossible to design RNA diplomats that don’t recognise at least one off-target site but you can try to minimise their impact.
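Hunting for such look-alike sites is essentially a pattern search. The sketch below is a simplified illustration: it assumes an "NGG" PAM directly after the target, scans only one strand, uses a 7-letter target so the example stays readable, and optionally tolerates a few mismatched letters. The names `mismatches` and `find_sites` are made up for the example.

```python
def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def find_sites(genome: str, target: str, max_mismatches: int = 0):
    """Find stretches that resemble the target and sit just before an NGG."""
    n = len(target)
    sites = []
    for start in range(len(genome) - n - 2):
        candidate = genome[start:start + n]
        pam = genome[start + n:start + n + 3]
        if pam[1:] == "GG" and mismatches(candidate, target) <= max_mismatches:
            sites.append((start, candidate, pam))
    return sites


target = "GATTACA"  # made-up, unrealistically short target
genome = "CCGATTACATGGAAAAGATTACAAGGTTGATTACATCA"

for site in find_sites(genome, target):
    print(site)
# (2, 'GATTACA', 'TGG')
# (16, 'GATTACA', 'AGG')
```

The target appears three times in this made-up genome, but only two copies sit next to a PAM, so those are the two places that would be cut: the intended site and one off-target site.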

If the off-target site occurs inside a gene that codes for a different protein, then you may be in big trouble. If CRISPR, by chance, breaks that gene then it could cause a disease. For instance, if both the CFTR gene copies you get from mum and from dad are broken, then you would unfortunately have a nasty disease called cystic fibrosis.

So scientists generally try to design RNA supermolecules where the off-target regions are not located within other genes. The theory is that if a piece of DNA that doesn’t encode a protein is broken it’s less likely to cause a disease and thus is not such a big deal.

However, it is becoming apparent that DNA that doesn’t make proteins, originally called “junk DNA”, actually plays some really important physiological roles. There isn’t really any DNA that is actually useless.

Whilst messing up the so-called junk DNA may not produce diseases in the same way damaging genes might, it may produce an organism that is less evolutionarily “fit”. Everything that’s supposed to be there will be there, and it will work... but it may not work as well as it could. Perhaps this will mean the metabolism is slower than it could be.

Can we get around off-target effects?

Given how problematic off-target cutting by CRISPR can be, it’s no surprise that scientists try to limit it as much as possible.

Scientists can only use CRISPR when they know the complete genomic sequence of the species they want to edit. Because the whole genome sequence is known, scientists can computationally find RNA supermolecules with the fewest possible expected off-target sites.
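In spirit, that computation is a ranking exercise: count the look-alike sites for each possible target and prefer the one with the fewest. The sketch below is not how any particular design tool works; it is a bare-bones illustration that assumes an "NGG" PAM, scans a single strand of a made-up genome, allows up to two mismatched letters, and uses the invented names `count_sites` and `rank_targets`.

```python
def count_sites(genome: str, target: str, max_mismatches: int) -> int:
    """Count PAM-adjacent stretches that differ from the target by few letters."""
    n = len(target)
    return sum(
        1
        for i in range(len(genome) - n - 2)
        if genome[i + n + 1:i + n + 3] == "GG"  # assumed NGG motif
        and sum(a != b for a, b in zip(genome[i:i + n], target)) <= max_mismatches
    )


def rank_targets(genome: str, targets: list[str], max_mismatches: int = 2):
    """Sort targets so those with the fewest matching sites come first.

    A count of 1 means only the intended site was found.
    """
    return sorted((count_sites(genome, t, max_mismatches), t) for t in targets)


genome = "CCGATTACATGGAAAAGATTACAAGGTTCCTGCGTAGGAA"  # made-up genome
print(rank_targets(genome, ["GATTACA", "CCTGCGT"]))
# [(1, 'CCTGCGT'), (2, 'GATTACA')]
```

Here "CCTGCGT" would be the better choice, because the only place it matches is the intended one.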

But mother nature still has a few tricks up her sleeves. Some scientists have investigated how accurate this off-target prediction method is in cells. They found that CRISPR always seems to cut in some unexpected places.

This doesn’t seem to be the fault of the algorithms - but rather the RNA supermolecule is so eager to bring Cas9 to DNA that sometimes it accidentally recognises the wrong bits of DNA. Interestingly, it seems to do this in a consistent manner - something that scientists are still investigating.

So we know that off-target cutting is going to happen and that it potentially has pretty severe consequences. But is it really such a big deal - surely scientists have found a solution, given the hype about CRISPR, right?

Well, as is becoming a theme in this guide - it’s a lot easier to deal with in the lab. When genetically editing rodents, scientists can get around off-target effects by “backcrossing” the edited animals.

Backcrossing isn’t a new idea. Scientists, and people who breed animals, like dogs or horses, have been doing it for centuries (although it has gone by many different names).

When you backcross, you essentially want to move a trait from one strain or breed to another. For instance, you might cross black labradors with a brown dog, perhaps a chocolate lab that occurred by chance, to produce a half breed. You would then cross the brown-coated offspring that look most like labs with actual labs, and so on. Over time, across many generations, you get a strain of labrador where the desired trait (brown coats) has been bred in.

With genetic testing, we don’t need to look for an obvious trait, such as coat colour, we can just test for the presence or absence of the desired version of a certain gene.

Scientists can thus, over time and across generations, move the genetically altered gene from one rodent strain to another. If they do this for a sufficient number of generations, eventually all the off-target effects will be lost and they will have created a strain of mice, for example, that has normal DNA and one edited gene.
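A back-of-the-envelope calculation shows why this works. It assumes the simplest case: the unwanted off-target change is present in only one of the animal's two copies, it lies on a different chromosome from the desired edit, and every cross is to an unedited animal. Under those assumptions each pup has a 1 in 2 chance of inheriting the off-target change, so the odds halve with every generation. The function name is hypothetical.

```python
def chance_off_target_remains(generations: int) -> float:
    """Chance that one unlinked off-target change survives this many backcrosses.

    Assumes a 1-in-2 chance of inheriting it at every generation.
    """
    return 0.5 ** generations


for generations in (1, 5, 10):
    print(generations, chance_off_target_remains(generations))
# 1 0.5
# 5 0.03125
# 10 0.0009765625
```

After ten generations, the chance that any one such change is still lurking is less than 1 in 1000. Changes that sit close to the edited gene on the same chromosome are harder to shake off, which this simple sum ignores.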

We can’t do this in humans. In fact, the major concern with using CRISPR to edit human embryos is that some cells in the embryo will go on to become the “germline”. These are cells that will develop into eggs or sperm and thus pass the altered genes onto the next generation. Passing on a fixed gene to the next generation is not so bad but passing on off-target alterations, that we cannot fully predict, is pretty scary.

Until scientists develop a way to use CRISPR and guarantee there will be no off-target effects, it’s unlikely that the technique will be used to edit human embryos, regardless of the ethical debates surrounding designer babies.

So why are scientists excited about CRISPR?

With all these limitations to CRISPR, you might be wondering what all the hype is about. CRISPR still is the most powerful genome editing tool that we have but it’s unlikely that, in its current form, it will be used to edit human embryos therapeutically.

CRISPR is special because it has so many potential targets, because of all the PAM domains. It also allows scientists to edit the genomes of new organisms as long as they have a sequenced genome. It’s also comparatively cheap and quick. Earlier techniques would need 18 - 24 months to generate a single mutant mouse. Now you can do it with a single injection, in theory.

CRISPR is thus an extremely powerful tool for basic research. And it also does actually have some potential therapeutic applications in humans.

Genetic diseases that are caused by a single gene and affect a single organ, such as Duchenne Muscular Dystrophy, are a great example of how CRISPR might currently be used in the clinic.

Scientists could take an individual's muscle stem cells, fix the broken gene with CRISPR and give the altered cells back to the patient as therapy. Such an approach is the theoretical basis of gene therapy, although other genome editing techniques may be used.

Another example of how CRISPR could work in the clinic is to take a patient's immune cells and edit their genome with CRISPR so that they target cancer.

There are many more (theoretical) examples of how CRISPR might be used therapeutically, without editing human embryos. In these approaches the problems of off-target effects and mosaicism are less important because scientists would only be altering one cell type and those altered cells would never be inherited by the next generation.

Despite its limitations, CRISPR does somewhat live up to its hype: it has revolutionised basic scientific research and may underpin a host of medical breakthroughs waiting to happen...

Alexandra Ashcroft is a PhD student at the University of Cambridge and recently completed an internship with the Naked Scientists supported by the UK's Genetics Society, during which this article was written.