Much of the human genome consists of regulatory regions that control which genes are expressed at a particular time within the cell. These regulatory elements can be located close to the target gene or up to 2 million base pairs away from the target.
To enable these interactions, the genome folds itself into a three-dimensional structure that brings distant regions closer together. Using a new technique, MIT researchers have shown that they can map these interactions with 100 times higher resolution than was previously possible.
“We are excited to be able to reveal a new layer of 3D structure at such high resolution,” says Anders Sejr Hansen, the Underwood-Prescott Career Development Assistant Professor of Biological Engineering at MIT and the senior author of the study.
The researchers’ findings indicate that many genes interact with dozens of different regulatory elements, although further study is needed to determine which of these interactions is most important for the regulation of a particular gene.
“We are excited to offer the research community a tool that helps them untangle the mechanisms that drive gene regulation,” says Viraat Goel, an MIT graduate student and one of the paper’s lead authors.
Miles Huseyin, an MIT postdoc, is also a lead author of the paper, which appears today in Nature Genetics.
High-resolution maps
Scientists estimate that more than half of the genome is made up of regulatory elements that control genes, while the genes themselves make up only about 2 percent of the genome. Genome-wide association studies, which link genetic variants to specific diseases, have identified many variants that fall within these regulatory regions. Identifying which genes these regulatory elements interact with could help researchers understand how these diseases arise and, possibly, how to treat them.
Detecting these interactions requires identifying which parts of the genome come into contact with each other when chromosomes are packed into the nucleus. To fit within the tiny confines of the nucleus, chromosomes are organized into structural units called nucleosomes, in which strands of DNA wrap tightly around proteins.
More than a decade ago, a team including MIT researchers developed a method called Hi-C, which revealed that the genome is organized as a “fractal globule,” a structure that allows a cell to pack its DNA tightly while avoiding knots. This structure also allows the DNA to unfold easily and refold when needed.
To perform Hi-C, researchers use restriction enzymes to cut the genome into many small pieces and then biochemically link pieces that sit close together in three-dimensional space within the nucleus. They then identify the interacting pieces by amplifying and sequencing them.
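To make the final sequencing step more concrete, here is a minimal Python sketch of how mapped read pairs are commonly tallied into a contact matrix. It assumes each read pair has already been aligned to a single chromosome and reduced to two genomic positions; the function name `contact_matrix`, the positions, and the bin size are hypothetical choices for illustration, not part of any specific Hi-C pipeline.

```python
def contact_matrix(read_pairs, region_start, region_end, bin_size):
    """Tally mapped read pairs into a symmetric contact matrix.

    Each read pair is (pos1, pos2): the genomic coordinates of the two
    linked fragments on the same chromosome. Pairs with either end outside
    [region_start, region_end) are ignored. A smaller bin_size gives finer
    resolution but spreads the same reads over more matrix cells.
    """
    n_bins = (region_end - region_start + bin_size - 1) // bin_size
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in read_pairs:
        if not (region_start <= pos1 < region_end and region_start <= pos2 < region_end):
            continue
        i = (pos1 - region_start) // bin_size
        j = (pos2 - region_start) // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # mirror off-diagonal contacts to keep symmetry
    return matrix


# Hypothetical read pairs (positions in base pairs).
pairs = [(120, 950), (130, 980), (400, 450), (5_000, 60)]
m = contact_matrix(pairs, region_start=0, region_end=1_000, bin_size=250)
for row in m:
    print(row)
```

The last pair is dropped because one end lies outside the region, and the two long-range pairs land in the same off-diagonal cell, which is how repeated contacts between two loci show up as a strong signal.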
While Hi-C reveals a great deal about the genome’s overall three-dimensional organization, its resolution is too limited to pick out specific interactions between genes and regulatory elements such as enhancers. Enhancers are short stretches of DNA that can help activate a gene’s transcription by interacting with its promoter, the site where transcription begins.
To achieve the resolution needed to find these interactions, the MIT team built on a newer technique called Micro-C, which was invented by researchers at the University of Massachusetts Medical School led by Stanley Hsieh and Oliver Rando. Micro-C was first applied to budding yeast in 2015 and later to mammalian cells in three papers published in 2019 and 2020 by researchers including Hansen, Hsieh, Rando, and others at UC Berkeley and UMass Medical School.
Micro-C achieves higher resolution than Hi-C by using an enzyme called micrococcal nuclease to chop up the genome. The restriction enzymes used in Hi-C cut the genome only at specific DNA sequences, which are randomly distributed, producing fragments that are larger and vary widely in size. In contrast, micrococcal nuclease cuts the genome uniformly into nucleosome-sized fragments of 150 to 200 base pairs each. This uniformity and small fragment size give Micro-C its superior resolution over Hi-C.
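The difference in fragment sizes can be illustrated with a toy Python sketch. It cuts a random sequence wherever the 4-base motif GATC occurs (the site recognized by MboI and DpnII, restriction enzymes used in common Hi-C protocols) and compares the result with an idealized digestion into consecutive pieces of 150 to 200 bp. Real micrococcal nuclease digestion is not perfectly regular, and the uniform random sequence, the seed, and the function names are assumptions made purely for illustration.

```python
import random
import statistics


def restriction_fragments(seq, site="GATC"):
    """Return fragment lengths after cutting at every occurrence of `site`.

    The cut is placed at the start of each site (MboI/DpnII cut before GATC).
    """
    k = len(site)
    cuts = [i for i in range(len(seq) - k + 1) if seq[i:i + k] == site]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def nucleosome_sized_fragments(length, rng, low=150, high=200):
    """Idealized digestion into consecutive pieces of `low` to `high` bp.

    The final piece may be shorter if the sequence runs out.
    """
    sizes = []
    remaining = length
    while remaining > 0:
        size = min(rng.randint(low, high), remaining)
        sizes.append(size)
        remaining -= size
    return sizes


rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(100_000))

hic = restriction_fragments(genome)
microc = nucleosome_sized_fragments(len(genome), rng)

for name, sizes in [("restriction (GATC)", hic), ("nucleosome-sized", microc)]:
    print(f"{name}: n={len(sizes)}, mean={statistics.mean(sizes):.0f} bp, "
          f"min={min(sizes)}, max={max(sizes)}, stdev={statistics.stdev(sizes):.0f}")
```

In a random sequence a given 4-base site occurs about once every 256 bases, but the gaps between sites are highly irregular, so the restriction fragments show a much wider spread in size than the nucleosome-sized pieces.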
However, because Micro-C analyzes the entire genome, reaching the resolution needed to pinpoint the kinds of interactions the researchers wanted to see would require an enormous amount of sequencing. Because contacts are measured between pairs of sites, the number of reads required grows with the square of the number of sites: to see how 100 different genomic sites interact with each other, you need to sequence at least 100 times 100, or 10,000, reads. The human genome is very large and contains about 22 million sites at nucleosome resolution. Mapping the entire human genome with Micro-C would therefore require at least 22 million times 22 million (about 484 trillion) sequence reads, at a cost of more than $1 billion.
To lower this cost, the team devised a method for more targeted sequencing of genome interactions, allowing them to focus on the parts of the genome that contain genes of interest. Restricting the analysis to regions spanning a few million base pairs reduces the number of potential genomic loci a thousand-fold and, because the number of reads scales with the square of the number of loci, cuts sequencing costs a million-fold, to about $1,000. The new method, called Region Capture Micro-C (RCMC), can generate maps 100 times more informative than other published technologies at a fraction of the cost.
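The read counts in the paragraph above follow from a simple quadratic rule. This short Python sketch uses the article's simplified lower bound of n × n reads for n sites; the function name `min_reads` is hypothetical.

```python
def min_reads(n_sites: int) -> int:
    """Simplified lower bound from the text: n sites need at least n * n reads."""
    return n_sites * n_sites


print(min_reads(100))                   # 10000
print(f"{min_reads(22_000_000):.2e}")   # 4.84e+14
```

Because the cost grows with the square of the number of sites, doubling the number of sites quadruples the required sequencing, which is why whole-genome maps at nucleosome resolution are out of reach.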
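The million-fold saving follows from the same quadratic relationship: shrinking the number of loci by a factor k shrinks the number of pairs, and hence reads, by k². A minimal sketch, assuming cost is proportional to read count and using the article's round numbers (about $1 billion for the whole genome and a 1,000-fold reduction in loci); the function name is hypothetical:

```python
def targeted_cost(whole_genome_cost: float, loci_reduction: float) -> float:
    """Cost after reducing loci `loci_reduction`-fold.

    Assumes cost is proportional to reads, and reads to loci squared.
    """
    return whole_genome_cost / loci_reduction ** 2


print(targeted_cost(1e9, 1_000))  # 1000.0
```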
“We now have a way to get super-resolution, very affordable 3D genome structure maps. Previously, it was financially inaccessible because you would need millions, if not billions, of dollars to get high resolution,” says Hansen. “The only limitation is that you can’t get the whole genome, so you need to know roughly the region you’re interested in, but you can get very high resolution, and very affordable.”
In this study, the researchers focused on five regions ranging in size from a few hundred thousand to about 2 million base pairs, which they chose because previous studies had revealed interesting features in them. One of these regions contains a well-characterized gene called Sox2, which plays a key role in tissue formation during embryonic development.
After capturing and sequencing the DNA fragments of interest, the researchers found several enhancers that interact with Sox2, as well as previously unseen interactions between nearby genes and enhancers. In other regions, especially those dense with genes and enhancers, some genes interacted with up to 50 other pieces of DNA, and on average each interacting site contacted about 25 others.
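Conceptually, focusing on a handful of regions means working only with read pairs whose two ends both fall inside a captured region. The Python sketch below shows that selection as a filter under simplifying assumptions: regions and read ends are plain tuples, all coordinates are hypothetical, and the real RCMC protocol enriches these fragments in the lab before sequencing rather than discarding reads afterward, which is where the cost saving comes from.

```python
from typing import List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), end exclusive
ReadEnd = Tuple[str, int]      # (chromosome, position)


def in_any_region(end: ReadEnd, regions: List[Region]) -> bool:
    """True if a read end lies inside any of the captured regions."""
    chrom, pos = end
    return any(chrom == c and start <= pos < stop for c, start, stop in regions)


def captured_pairs(pairs, regions):
    """Keep read pairs whose two ends both lie inside a captured region."""
    return [(a, b) for a, b in pairs
            if in_any_region(a, regions) and in_any_region(b, regions)]


# Hypothetical captured regions and read pairs.
regions = [("chr3", 34_000_000, 35_000_000), ("chr8", 84_000_000, 85_000_000)]
pairs = [
    (("chr3", 34_100_000), ("chr3", 34_900_000)),  # both ends captured: kept
    (("chr3", 34_100_000), ("chr5", 10_000_000)),  # one end outside: dropped
    (("chr8", 84_500_000), ("chr8", 84_600_000)),  # both ends captured: kept
]
print(len(captured_pairs(pairs, regions)))  # 2
```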
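Figures such as “up to 50” partners per gene and “about 25” on average describe how many distinct sites each anchor contacts. Given a list of interactions, such numbers could be computed as in the Python sketch below; the site names and interaction list are hypothetical, and how interactions are called as significant in the first place is outside the scope of this sketch.

```python
from collections import defaultdict
from statistics import mean


def partner_counts(interactions):
    """Map each site to the number of distinct sites it interacts with."""
    partners = defaultdict(set)
    for a, b in interactions:
        if a != b:  # ignore self-contacts
            partners[a].add(b)
            partners[b].add(a)
    return {site: len(others) for site, others in partners.items()}


# Hypothetical interactions between named sites.
interactions = [
    ("geneA", "enh1"), ("geneA", "enh2"), ("geneA", "enh3"),
    ("geneB", "enh1"), ("enh2", "enh3"),
    ("geneA", "enh1"),  # duplicate contact, counted only once
]
counts = partner_counts(interactions)
print(counts["geneA"])                                          # 3
print(max(counts.values()), round(mean(list(counts.values())), 2))  # 3 2.0
```

Storing partners in sets means repeated observations of the same contact do not inflate the count, so the result reflects the number of distinct partners rather than the number of reads.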
“People have seen multiple interactions of one DNA fragment before, but usually on the order of two or three, so seeing that many of them was a very significant difference,” Huseyin says.
However, the researchers’ technique does not reveal whether all of these interactions occur simultaneously or at different times, or which of these interactions is most important.
The researchers also found that DNA appears to fold itself into nested “microcompartments” that facilitate these interactions, but they were unable to determine how these microcompartments form. The researchers hope that further study of the underlying mechanisms will shed light on the fundamental question of how genes are regulated.
“Although we don’t currently understand why these microcompartments form, and we have all these open questions in front of us, we at least have a tool to rigorously ask these questions,” Goel says.
In addition to pursuing these questions, the MIT team also plans to work with researchers at Boston Children’s Hospital to apply this type of analysis to regions of the genome that have been linked to blood disorders in genome-wide association studies. They are also collaborating with researchers at Harvard Medical School to study variants associated with metabolic disorders.
The research was funded by a Koch Institute Support Grant (core) from the National Cancer Institute, the National Institutes of Health, the National Science Foundation, a Solomon Buchsbaum Research Support Committee Award, the Koch Institute Frontier Research Fund, an NIH Fellowship, and an EMBO Fellowship.