By Natasha Murape

Scattered along the 3 billion bases of the human genome are approximately 21,000 protein-coding genes. These genes account for only about 3% of the genome. The remaining vast stretches of DNA, much of it repetitive, were previously dismissed as “junk”: evolutionary debris without a biological function. More recently, however, dumpster diving through the human genome has unearthed a treasure trove of hidden regulatory networks that hold clues to gene regulation and disease risk. In genetics, “junk DNA” refers to non-protein-coding DNA regions. The term was coined in the 1960s and popularised by Susumu Ohno. It arose from the assumption that the vast non-coding regions consisted of non-functional, repetitive DNA elements. Today, however, the concept of junk DNA is being revised as emerging evidence reveals that many non-coding regions act as regulatory elements that control gene expression and shape disease risk.

Non-coding regions of genomes, particularly abundant in higher organisms, were poorly understood by early researchers; their size, complex composition and lack of protein-coding sequence led many geneticists to assume they were non-functional. These regions are often characterised by short and simple DNA sequences, sometimes as few as two to three nucleotides long, repeated thousands of times throughout the genome. Additionally, the length and composition of these repeats tend to vary widely between species, between members of the same species, and sometimes between cells of the same organism. This stands in sharp contrast to the structure of known genes, which are generally longer, unique and highly conserved, a sign of their crucial function. From this observation, early researchers concluded that sequences exhibiting such drastic variability were unlikely to have an important biological function. Furthermore, prominent early geneticists took the reductionist view that any DNA that did not code for protein was likely junk.
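To make the idea of a short tandem repeat concrete, here is a minimal Python sketch that scans a DNA string for units of two to three nucleotides repeated back to back. The function name, the thresholds and the toy sequence are all assumptions chosen for illustration; they do not come from any study discussed here, and real repeat-finding tools are considerably more sophisticated.

```python
import re


def find_short_tandem_repeats(sequence, min_unit=2, max_unit=3, min_copies=4):
    """Return (start, unit, copies) for each run of a short unit repeated back to back.

    All thresholds are illustrative assumptions, not values from the literature.
    """
    # A unit of min_unit..max_unit bases, followed by at least
    # (min_copies - 1) further identical copies of that same unit.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


# Toy sequence (made up): five copies of "AC" and five copies of "CAG".
toy_sequence = "TTG" + "AC" * 5 + "GGCT" + "CAG" * 5 + "TT"
print(find_short_tandem_repeats(toy_sequence))
# [(3, 'AC', 5), (17, 'CAG', 5)]
```

Changing the number of copies in the toy sequence changes the reported repeat length, which is the kind of variability between individuals and cells described above.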

Contrary to earlier assumptions of non-functionality, short tandem repeats within non-coding DNA have been shown to play a critical role in regulating gene expression by shaping transcription factor binding. Gene expression is regulated by transcription factors (TFs), proteins that bind to short DNA sequences called motifs. However, TFs do not always behave as predicted, occasionally binding to regions that lack motifs or avoiding strong motifs. One explanation lies in variation within non-coding regions, such as short tandem repeats, which seem to alter the local DNA landscape around motifs and influence transcription factor affinity. A 2023 study by Horton and colleagues demonstrated that length variations within short tandem repeats around a motif could change transcription factor binding by as much as 70-fold, ultimately modulating the expression of nearby protein-coding genes. Given that short tandem repeats are associated with polygenic diseases ranging from autism to cancers, these findings are important because they may provide clinical insights and therapeutic strategies for such conditions.

Non-protein-coding regions of the genome, once thought to be inert, also generate functional RNAs that regulate gene expression. A large fraction of the eukaryotic transcriptome consists of non-coding RNAs, including microRNA (miRNA), long non-coding RNA (lncRNA) and circular RNA (circRNA). Among these, microRNAs stand out as a significant discovery and are now recognised as key regulators of various cellular processes. The first miRNA to be identified, lin-4, was discovered in Caenorhabditis elegans in the early 1990s through the work of Victor Ambros and Gary Ruvkun. In his laboratory, Ruvkun demonstrated that lin-4 repressed expression of lin-14 at the post-transcriptional stage. Ambros was concurrently investigating the regulation of lin-14 by lin-4, and comparison of their findings revealed that the lin-4 RNA and the lin-14 transcript shared complementary sequences. Together, they showed that lin-4 bound to the lin-14 mRNA transcript, thereby blocking its translation into protein. This discovery revealed an unusual mechanism of gene regulation that is not peculiar to C. elegans but is present throughout the animal kingdom. Since then, miRNAs have been shown to be essential for normal cell and tissue development. Dysregulation of miRNAs has been linked to the onset of cancers; moreover, mutations in the genes coding for miRNAs have been associated with hearing loss, eye disorders, and skeletal disorders.
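The complementarity that lets a small RNA recognise its target can be illustrated with a short Python sketch. It computes the reverse complement of a small RNA and looks for that sequence in an mRNA, which is where the two strands could pair base by base. The sequences below are invented for illustration and are not the real lin-4 or lin-14 sequences; requiring a perfect match is also a simplifying assumption, since real miRNAs typically pair only partially with their targets.

```python
# Watson-Crick pairing for RNA: A pairs with U, G pairs with C.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the sequence that would pair with `rna`, read 5' to 3'."""
    return rna.upper().translate(COMPLEMENT)[::-1]


def complementary_sites(small_rna, mrna):
    """Return every mRNA position where `small_rna` could pair perfectly.

    Perfect pairing is a simplifying assumption made for this sketch.
    """
    target = reverse_complement(small_rna)
    mrna = mrna.upper()
    sites = []
    start = mrna.find(target)
    while start != -1:
        sites.append(start)
        start = mrna.find(target, start + 1)
    return sites


# Hypothetical sequences, made up for this example.
toy_small_rna = "AUGGCUA"
toy_mrna = "GGCUAGCCAUAACGUAGCCAUCC"

print(reverse_complement(toy_small_rna))            # UAGCCAU
print(complementary_sites(toy_small_rna, toy_mrna))  # [3, 14]
```

In this toy case the mRNA carries two sites that match the small RNA, loosely mirroring how a single regulatory RNA can have more than one binding site on the transcript it represses.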

In conclusion, the notion of “junk DNA” is increasingly recognised as misleading, as non-coding regions once dismissed as genomic debris are now understood to contain regulatory elements that control gene expression and influence disease risk. Evidence shows that short tandem repeats within non-coding regions influence transcription factor binding, while non-coding RNAs regulate genes at the post-transcriptional level. These discoveries paint a different picture of regions once considered inert. It may be time to retire the phrase “junk DNA”, as it fails to capture the biological activity observed in these genomic regions.

References

  1. Horton CA, et al. Short tandem repeats bind transcription factors to tune eukaryotic gene expression. Science. 2023;381:eadd1250. doi:10.1126/science.add1250
  2. Lee RC, Feinbaum RL, Ambros V. The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell. 1993;75(5):843-854. doi:10.1016/0092-8674(93)90529-y
  3. Wightman B, Ha I, Ruvkun G. Posttranscriptional regulation of the heterochronic gene lin-14 by lin-4 mediates temporal pattern formation in C. elegans. Cell. 1993;75(5):855-862. doi:10.1016/0092-8674(93)90530-4
