If the human genome is like the blueprint for a house, a new study on the non-coding parts of DNA maps the wiring, connecting the lights to their switches.
Around 98.5% of human DNA is non-coding, meaning it doesn’t carry instructions for making proteins. A new study has connected many of these non-coding regions to the genes they affect and laid out guidelines for how researchers can continue this work.
Understanding the non-coding portion of our DNA is critical for understanding the genetic components of disease, said Steven Reilly, an assistant professor of genetics at Yale School of Medicine who co-led the study.
“When we find mutations in DNA that are associated with some trait or disease, they’re often in these non-coding regions,” said Reilly. “Understanding which genes these mutations impact is really critical.”
The study was published in Nature Methods.
For the study, Reilly and his colleagues set out to understand how non-coding regions of DNA known as “enhancers” and “promoters” are linked to genes. Promoters are stretches of DNA just upstream of genes that control whether those genes are transcribed into mRNA, which is eventually translated into protein. Molecules that activate genes bind to promoters to initiate the process. Enhancers are regions of DNA that act as additional control elements for promoters, instructing them in which cells and at what times to turn on. However, enhancers can sit quite far away from the genes they control, making it hard to predict which genes a mutation in an enhancer might affect.
Essentially, these genetic regulators help turn genes on and off.
The research is part of the Encyclopedia of DNA Elements (ENCODE) Consortium, a 20-year-long project. Funded by the National Human Genome Research Institute, the consortium includes more than 30 institutions.
In earlier phases of the project, researchers mapped out where enhancers and promoters are located in the human genome. If the genome is a blueprint for a house, Reilly said, finding the enhancers and promoters is like locating the light switches. This study, he said, was about identifying the wiring plan: which lights (or genes) those switches turn on and off, with promoters comparable to a regular light switch and enhancers more like a dimmer knob.
To do this, the researchers used CRISPR, a gene-targeting tool, to turn off small sections of DNA, one at a time, and then observed what happened to genes. Normally, CRISPR homes in on a specific DNA sequence and cuts it. Here, the researchers used a modified version tethered to a molecule that silenced nearby DNA rather than cutting it.
This, Reilly said, essentially allowed them to flick the light switches on and off.
And they did this with large parts of the genome, not just with what they suspected were enhancers or promoters.
“The good news was that the only things that seemed to do anything were the things we’d already mapped out as enhancers or promoters,” said Reilly. “So there weren’t some secret light switches we hadn’t known about. That confirms that when we’re looking at a DNA variation that might impact disease, the enhancer and promoter maps we have are the places to look.”
In a more surprising finding, the researchers discovered that individual enhancers could affect multiple genes. It was as if one light switch turned on several lights.
“We originally had tended to think that one enhancer was affecting one gene, but we found it was really common for one enhancer to impact many genes,” said Reilly. “That says that if you have a mutation in an enhancer that’s associated with a disease, you might need to be looking for several impacted genes, not just one.”
Together, the researchers performed these experiments on more than 540,000 sections of DNA.
Doing this work together and systematically allowed the group to find patterns and identify best practices that they likely wouldn’t have through separate experiments, Reilly said.
The group was collectively able to determine the best way to carry out these particular CRISPR experiments, identifying which guide RNAs should be used to direct CRISPR to its targets and which analysis methods are most accurate. This will help other researchers run these types of experiments on their DNA regions of interest more effectively and efficiently, said Reilly.
“Particularly if researchers are working with patient cell samples, which they may only have a certain amount of, they’ll want to use our guidelines to maximize their chances of linking enhancers to their target genes,” he said.
Additionally, the researchers found that when using this type of CRISPR screening, it matters which of the two DNA strands you target.
“Depending on which strand you target, you will get different results of how big of an effect the CRISPR-mediated DNA repression has on genes,” said Reilly. “Knowing these differences will allow researchers to design the right analysis methods.”
This particular finding wouldn’t have been possible without the large collaborative effort of this work, he added.
“We only saw this because we were analyzing hundreds of these experiments. You need to assemble really large datasets to see these patterns,” said Reilly. “This has been the theme of the human genome work from the beginning. The genome is huge. One person or one lab can’t tackle it all. And this work has been a cool example of how large-scale collaborations work and their necessity for this monumental task of understanding the human genome.”
The ENCODE Consortium, which was launched in 2003, is coming to an end with many of its main goals achieved. Going forward, Reilly aims to use the best practices that have come out of this work to do these types of analyses in more complicated systems. One goal is to better understand how many genes are involved in the development of disease or in conferring observable traits like height.
“We have a good sense of what DNA variants exist, but we don’t have a good sense of how those variants affect genes,” said Reilly. “This study gives us a roadmap to do those experiments better.”
Source: Yale University