Imagine the human genome as a string stretched along the length of a football field, with all the genes that encode proteins clustered at one end. Take two big steps forward from that end, and all the protein-coding information is already behind you.
There are about three billion base pairs in the human genome, but only about 2% of them encode proteins. This seemingly wasteful allocation of genetic material is not unique to humans: even many bacteria appear to devote 20% of their genome to noncoding filler.
Many mysteries still surround what noncoding DNA is and whether it really is worthless junk or something more. At least some of it has turned out to be vitally important biologically. But even beyond the question of its functionality (or lack thereof), researchers are beginning to see how noncoding DNA can serve as a genetic resource for cells and a nursery for new genes.
"Slowly, slowly, slowly, the concept of 'junk DNA' [has] started to die," says geneticist Cristina Sisu at Brunel University London.
Scientists have casually referred to "junk DNA" since the 1960s, but the term was first used formally in 1972 by geneticist and evolutionary biologist Susumu Ohno, who argued that large genomes would inevitably contain sequences that did not encode any proteins. Afterwards, researchers uncovered hard evidence of how abundant this junk is in genomes, how diverse its origins are, and how much of it is transcribed into RNA despite lacking the blueprints for proteins.
Technological advances in sequencing, particularly in the past two decades, have done a lot to shift how scientists think about noncoding DNA and RNA, Sisu said. Although these noncoding sequences don’t carry protein information, they are sometimes shaped by evolution to different ends. As a result, the functions of the various classes of “junk” — insofar as they have functions — are getting clearer.
Cells use some of their noncoding DNA to create a variety of RNA molecules that regulate or assist protein production in various ways. These include small nuclear RNAs, microRNAs, small interfering RNAs, and many others. Some are short segments, typically fewer than two dozen base pairs long, while others are much longer. Some exist as double strands or fold back on themselves in hairpin loops. Yet all of them can bind selectively to a target, such as a messenger RNA transcript, to either promote or inhibit its translation into protein.
These RNAs can profoundly affect the wellbeing of an organism. In mice, shutting down certain microRNAs has caused disorders ranging from tremors to liver dysfunction.
In humans and many other organisms, the largest category of noncoding DNA consists of transposons, segments of DNA that can move around within a genome. These "jumping genes" tend to make many copies of themselves throughout the genome, says Seth Cheetham, a geneticist at the University of Queensland in Australia. Retrotransposons are especially prolific because they spread efficiently by making RNA copies of themselves that are converted back into DNA at another location in the genome. About half of the human genome is made up of transposons; in some maize plants, that figure reaches about 90%.
Noncoding DNA also appears within the genes of humans and other eukaryotes, in the intron sequences that interrupt the protein-coding exons. When a gene is transcribed, the exon RNA is spliced together into mRNA, while much of the intron RNA is discarded. Nevertheless, some of the intron RNA is converted into small RNAs that are involved in protein production. Researchers suspect that introns accelerate gene evolution by making it easier to reshuffle exons into new combinations.
A large and variable portion of the noncoding DNA in genomes consists of highly repeated sequences of varying lengths. Telomeres, which cap chromosome ends, are largely composed of these. The repeats likely help maintain the integrity of chromosomes (the shortening of telomeres as a result of repeat loss is linked to aging). But many repeats in cells serve no known purpose, and they can be gained or lost during evolution with no apparent ill effects.
One category of noncoding DNA that scientists currently find especially interesting is the pseudogenes, which are usually considered to be remnants of working genes that were accidentally duplicated and then degraded through mutation. As long as one copy of the original gene functions, natural selection may exert little pressure to keep the redundant copy intact.
Akin to broken genes, pseudogenes might seem like the epitome of genomic garbage. Cheetham warns, however, that some pseudogenes might not be fakes after all. Many of them were presumed to be defective copies of recognized genes and labeled as pseudogenes without any experimental evidence that they weren't functional.
It is also possible for pseudogenes to become functionally active. "Sometimes they can actually control the activity of the gene from which they were copied," Cheetham said, if their RNA is similar enough to that of the working gene to be able to interact with it. The discovery in 2010 that the PTENP1 pseudogene had found a second life as an RNA regulating tumor growth convinced many researchers to examine pseudogene junk more closely.
Because dynamic noncoding sequences can generate so many genomic changes, they can be both the engine for the evolution of new genes and the raw material for those genes. Researchers have discovered an example of this in the ERVW-1 gene, which encodes a protein essential for the development of the placenta in Old World monkeys, apes, and humans. The gene arose from a retroviral infection in an ancestral primate about 25 million years ago, hitching a ride on a retrotransposon into the animal's genome. Cheetham said the retrotransposon "basically co-opted this element, jumping around the genome, and turned it into something that's really crucial to human development."
But how much of this DNA therefore qualifies as true “junk” in the sense that it serves no useful purpose for a cell? This is hotly debated. In 2012, the Encyclopedia of DNA Elements (Encode) research project announced its findings that about 80% of the human genome seemed to be transcribed or otherwise biochemically active and might therefore be functional. However, this conclusion was widely disputed by scientists who pointed out that DNA can be transcribed for many reasons that have nothing to do with biological utility.
Alexander Palazzo of the University of Toronto and T. Ryan Gregory of the University of Guelph have described several lines of evidence — including evolutionary considerations and genome size — that strongly suggest “eukaryotic genomes are filled with junk DNA that is transcribed at a low level.” Dan Graur of the University of Houston has argued that because of mutations, less than a quarter of the human genome can have an evolutionarily preserved function. Those ideas are still consistent with the evidence that the “selfish” activities of transposons, for example, can be consequential for the evolution of their hosts.
Cheetham thinks that dogma about “junk DNA” has weighed down inquiry into the question of how much of it deserves that description. “It’s basically discouraged people from even finding out whether there is a function or not,” he said. On the other hand, because of improved sequencing and other methods, “we’re in a golden age of understanding noncoding DNA and noncoding RNA,” said Zhaolei Zhang, a geneticist at the University of Toronto who studies the role of the sequences in some diseases.
Researchers may be less inclined in the future to describe noncoding sequences as junk, as there are now so many other ways to label them. According to Sisu, the field needs to keep an open mind when evaluating noncoding DNA and RNA and their biological importance. "Take a step back," she said. "One person's trash is another person's treasure."