July 17, 2011

This Week's Finds (Week 316)

John Baez

Here on This Week's Finds I've been talking about the future and what it might hold. But any vision of the future that ignores biotechnology is radically incomplete. Just look at this week's news! They've 'hacked the genome':

Ed Yong, Hacking the genome with a MAGE and a CAGE, Discover, 14 July 2011.

Or maybe they've 'hijacked the genetic code':

Nicholas Wade, Genetic code of E. coli is hijacked by biologists, New York Times, 14 July 2011.

What exactly have they done? These articles explain it quite well... but it's so cool I can't resist talking about it.

Basically, some scientists from Harvard and MIT have figured out how to go through the whole genome of a bacterium and change every occurrence of one codon to some other codon. It's a bit like the 'global search and replace' feature of a word processor. You know: that trick where you can take a document and replace one word with another every place it appears.

To understand this better, it helps to know a tiny bit about the genetic code. You may know this stuff, but let's quickly review.

DNA is a double-stranded helix bridged by pairs of bases, which come in 4 kinds:

adenine (A)
thymine (T)
cytosine (C)
guanine (G)

Because of how they're shaped, A can only connect to T, while C can only connect to G.

So, all the information in the DNA is contained in the list of bases down either side of the helix. You can think of it as a long string of 'letters', like this:

ATCATTCAGCTTATGC...

This long string consists of many sections, which are the instructions to make different proteins. In the first step of the protein manufacture process, a section of this string is copied to a molecule called 'messenger RNA'. In this stage, the T gets copied to uracil, or U. The other three bases stay the same.
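If you like seeing things as code, here's a minimal Python sketch of that copying step. It's a simplification that treats the DNA string as plain text and just swaps T for U, exactly as described above; the function name `transcribe` is only an illustrative choice.

```python
def transcribe(dna: str) -> str:
    """Copy a string of DNA letters into messenger RNA letters (T becomes U)."""
    dna = dna.upper()
    unknown = set(dna) - set("ACGT")
    if unknown:
        raise ValueError(f"unexpected bases: {sorted(unknown)}")
    # A, C and G stay the same; only T changes.
    return dna.replace("T", "U")


mrna = transcribe("ATCATTCAGCTTATGC")
print(mrna)  # AUCAUUCAGCUUAUGC
```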

Here's some messenger RNA:

You'll note that the bases come in groups of three. Each group is called a 'codon', because it serves as the code for a specific amino acid. A protein is built as a string of amino acids, which then curls up into a complicated shape.

Here's how the genetic code works:

The three-letter names like Phe and Leu are abbreviations for amino acids: phenylalanine, leucine and so on.

While there are 4³ = 64 codons, they code for only 20 amino acids. So, typically more than one codon codes for the same amino acid. If you look at the chart, you'll see one exception is methionine, which is encoded only by AUG. AUG is also the 'start codon', which tells the cell where a protein starts. So, methionine shows up at the start of every protein, at least at first. It's usually removed later in the protein manufacture process.
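If you'd like to check this counting yourself, here's a small Python sketch. It assumes the standard genetic code written as a 64-letter string of one-letter amino acid abbreviations (F for Phe, L for Leu, M for Met and so on, with * for a stop codon), with the codons ordered UUU, UUC, UUA, UUG, UCU, and so on. The names `BASES`, `AMINO_ACIDS` and `CODON_TABLE` are just illustrative.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, in the order UUU, UUC, UUA, UUG, UCU, ...
# '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 4^3 codons to its amino acid (or '*').
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# How many codons encode each amino acid, ignoring stop codons.
codons_per_amino_acid = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

print(len(CODON_TABLE))            # 64
print(len(codons_per_amino_acid))  # 20
print([c for c, aa in CODON_TABLE.items() if aa == "M"])  # ['AUG']
```

Running the same query for W shows that tryptophan is the other amino acid with just one codon, UGG.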

There are also three 'stop codons', which mark the end of a protein. They have cute names:

amber: UAG
ochre: UAA
opal: UGA

UAG was named after Harris Bernstein, whose last name means 'amber' in German. The other two names were just a way of continuing the joke.
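To see start and stop codons at work, here's a toy translator in Python. It's only a sketch: it assumes the standard genetic code as a 64-letter string, begins reading at the first AUG it finds, and ignores everything else a real cell does. The names `translate` and `STOP_NAMES` are hypothetical helpers for this example.

```python
from itertools import product

# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...; '*' means stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(("".join(c) for c in product("UCAG", repeat=3)), AMINO_ACIDS))
STOP_NAMES = {"UAG": "amber", "UAA": "ochre", "UGA": "opal"}


def translate(mrna: str):
    """Return (protein, name of the stop codon reached), or (protein, None) if none is reached."""
    start = mrna.find("AUG")  # the start codon sets the reading frame
    if start == -1:
        return "", None
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_NAMES:
            return "".join(protein), STOP_NAMES[codon]
        protein.append(CODON_TABLE[codon])
    return "".join(protein), None


print(translate("GGAUGUUUCUGUAGCC"))  # ('MFL', 'amber')
```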

And now we're ready to understand how a team of scientists led by Farren J. Isaacs and George M. Church are 'hacking the genome'. They're going through the DNA of the common E. coli bacterium and replacing every instance of amber with ochre!

This is a lot more work than the word processor analogy suggests. They need to break the DNA into lots of fragments, change amber to ochre in these fragments, and put them back together again. Read Ed Yong's article for more.

So, they're not actually done yet.

When they are, they'll have an E. coli bacterium with no amber codons, just ochre in their place. And it'll act just the same as ever, since amber and ochre are both stop codons.
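Here's one reason the word processor analogy is too simple, shown as a toy Python sketch. This is string manipulation only, not the lab procedure. It assumes we already know a gene's reading frame, and the function name `recode_stop` and the short made-up gene are purely hypothetical. The point is that a codon should be swapped only where it sits in frame, while a naive search and replace also hits the same three letters where they straddle two codons.

```python
def recode_stop(cds: str, old: str, new: str) -> str:
    """Replace codon `old` with codon `new`, but only at in-frame positions."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return "".join(new if codon == old else codon for codon in codons)


# Made-up gene: ATG CTA GCA TAG. The letters TAG also appear out of frame,
# straddling the codons CTA and GCA.
gene = "ATGCTAGCATAG"

for new_stop in ("TAA", "TGA"):  # the two other stop codons, in DNA letters
    print(recode_stop(gene, "TAG", new_stop))
# ATGCTAGCATAA
# ATGCTAGCATGA

# Naive global replace also rewrites the out-of-frame TAG,
# turning the codon GCA (alanine) into ACA (threonine).
print(gene.replace("TAG", "TAA"))  # ATGCTAACATAA
```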

That's a lot of work for no visible effect. What's the point?

The point is that they'll have freed up the codon amber for other purposes! This will let them do various further tricks.

First, with some work, they could make amber code for a new, unnatural amino acid that's not one of the usual 20. This sounds like a lot of work, since it requires tinkering with the cell's mechanisms for translating codons into amino acids: specifically, its set of transfer RNA and synthetase molecules. But this has already been done! Back in 1990, Jennifer Normanly found a viable mutant strain of E. coli that 'reads through' the amber codon, not stopping the protein there as it should. People have taken advantage of this to create E. coli where amber codes for a new amino acid:

Nina Mejlhede, Peter E. Nielsen, and Michael Ibba, Adding new meanings to the genetic code, Nature Biotechnology 19 (2001), 532-533.

But I guess getting an E. coli that's completely free of amber codons would let us put amber codons only where we want them, getting better control of the situation.
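To make the idea of giving amber a new meaning concrete, here's a small Python sketch. It assumes the standard genetic code as a 64-letter string and invents a modified table in which UAG stands for a new amino acid, written X here. The names `STANDARD`, `EXPANDED` and `translate`, the letter X and the short message are all hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...; '*' means stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = dict(zip(("".join(c) for c in product("UCAG", repeat=3)), AMINO_ACIDS))

# Hypothetical modified code: amber (UAG) no longer means stop,
# but codes for an unnatural amino acid we call X.
EXPANDED = {**STANDARD, "UAG": "X"}


def translate(mrna: str, table: dict) -> str:
    """Read codons from the first letter until a stop codon or the end of the string."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = table[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


mrna = "AUGUUUUAGCUGUAA"  # AUG UUU UAG CUG UAA
print(translate(mrna, STANDARD))  # MF
print(translate(mrna, EXPANDED))  # MFXL
```

The same message gives two different proteins depending on which table reads it, which is why any stray amber codons left in the genome would be a nuisance.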

Second, tweaking the genetic code this way could yield a strain of E. coli that's unable to 'breed' with the normal kind. This could increase the safety of genetic engineering. Of course bacteria are asexual, so they don't precisely 'breed'. But they do something similar: they exchange genes with each other! Three of the most popular ways are:

conjugation: two bacteria come into contact and pass DNA from one to the other.

transformation: a bacterium takes up free DNA from its surroundings, such as a loop of DNA called a plasmid that another bacterium has released.

transduction: a virus carries DNA from one bacterium to another.

Thanks to these tricks, drug resistance and other traits can hop from one species of bug to another. So, for the sake of safe experiments, it would be nice to have a strain of bacteria whose genetic code was so different from others that it couldn't share DNA.

And third, a bacterium with a modified genetic code could be resistant to viruses! I hadn't known it, but the biotech firm Genzyme was shut down for three months and lost millions of dollars when its bacteria were hit by a virus.

This third application reminds me of a really spooky story by Greg Egan, called "The Moat". In it, a detective discovers evidence that some people have managed to alter their genetic code. The big worry is that they could then set loose a virus that would kill everyone in the world except them.

That's a scary idea, and one that just became a bit more practical... though so far only for E. coli, not H. sapiens.

So, I've got some questions for the biologists out there.

A virus that attacks bacteria is called a bacteriophage, or affectionately, a 'phage'. Here's a picture of one:

Isn't it cute?

Whoops—that wasn't one of the questions. Here are my questions for biologists:

To what extent are E. coli populations kept under control by phages, or perhaps somehow by other viruses?

If we released a strain of virus-resistant E. coli into the wild, could it take over, thanks to this advantage?

What could the effects be? For example, if the E. coli in my gut became virus-resistant, would their populations grow enough to make me notice?

and more generally:

What are some of the coolest possible applications of this new MAGE/CAGE technology?

Also, on a more technical note:

What did people actually do with that strain of E. coli that 'reads through' amber?

How could such a strain be viable, anyway? Does it mostly avoid using the amber codon, or does it somehow survive having a lot of big proteins where a normal E. coli would have smaller ones?

Finally, I can't resist mentioning something amazing I just read. I said that our body uses 20 amino acids, and that 'opal' serves as a stop codon. But neither of these is the whole truth! Sometimes opal codes for a 21st amino acid, called selenocysteine. And this one is different from the rest. Most amino acids contain carbon, hydrogen, oxygen and nitrogen, and cysteine contains sulfur, but selenocysteine contains... you guessed it... selenium!

Selenium is right below sulfur on the periodic table, so it's sort of similar. If you eat too much selenium, your breath starts smelling like garlic and your hair falls out. Horses have died from the stuff. But it's also an essential trace element: you have about 15 milligrams in your body. We use it in various proteins, which are called... you guessed it... selenoproteins!

So, a few more questions:

Do humans use selenoproteins containing selenocysteine?

How does our body tell when opal is getting used to code for selenocysteine, and when it's getting used as a stop codon?

Are there any cool theories about how life evolved to use selenium, and how the opal codon got hijacked for this secondary purpose?

Finally, here's the new paper that all the fuss is about. It's not free, but you can read the abstract for free:

Farren J. Isaacs, Peter A. Carr, Harris H. Wang, Marc J. Lajoie, Bram Sterling, Laurens Kraal, Andrew C. Tolonen, Tara A. Gianoulis, Daniel B. Goodman, Nikos B. Reppas, Christopher J. Emig, Duhee Bang, Samuel J. Hwang, Michael C. Jewett, Joseph M. Jacobson, and George M. Church, Precise manipulation of chromosomes in vivo enables genome-wide codon replacement, Science 333 (15 July 2011), 348-353.

For more discussion go to my blog, Azimuth.

Pessimists should be reminded that part of their pessimism is an inability to imagine the creative ideas of the future - Brian Eno