As leading biologists around the world worked to decode the human genome in the late 1990s, they tried to estimate the number of genes contained in the 3 billion base pairs that make up our genetic code, and numerous predictions emerged. According to the prevailing estimate of about a decade ago, humans had around 100,000 genes responsible for the countless cellular reactions that underlie the biological functions of the body. It has since been revealed, however, that we possess only about 25,000 genes, nearly the same number as the flowering plant Arabidopsis and only slightly more than the roundworm Caenorhabditis elegans.
This surprising finding has reinforced a new understanding among geneticists: our genome, along with those of some other mammalian species, is far more dynamic and complex than previously thought. Even the most fundamental doctrine in genetics, one gene/one protein, has given way to a more accurate and general view: a single gene can produce more than one protein. In exploring how, where, and when genes are expressed, researchers have discovered that the regulation of gene expression is not confined to external factors such as regulatory proteins. The genome also regulates its own expression through non-coding DNA regions and through structural and chemical modifications. One of the greatest challenges for biologists today is therefore to demonstrate how all these factors cooperate to orchestrate the process known as “gene expression.”
Relative gene count in humans (Photo: sciencemag)
To explain how the human genome achieves an astonishing level of complexity with a relatively small number of genes, researchers point to the crucial role of “alternative splicing.” Human genes contain both coding DNA regions, known as exons, and non-coding DNA regions. In many genes, the exons can be combined in different ways at different times, and each combination yields a different protein. When alternative splicing was first discovered, biologists considered it an “anomaly” of transcription; researchers now agree that it occurs in at least half, if not most, of our genes. Alternative splicing helps explain how a modest number of genes can produce hundreds of thousands of different proteins. One mystery remains, however: how does the transcription machinery decide which parts of a gene will be read at a given point in time?
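The combinatorial effect of alternative splicing can be illustrated with a small Python sketch. It is only a toy model under stated assumptions: the exon names, the function `enumerate_isoforms`, and the idea that each optional exon is kept or skipped independently are all hypothetical simplifications. In real cells, splicing is regulated and only some combinations actually occur.

```python
from itertools import combinations


def enumerate_isoforms(exons, optional):
    """Return every exon combination obtained by keeping or skipping
    each optional exon, preserving the original exon order.

    exons    -- exon names in genomic order (hypothetical labels)
    optional -- names of exons that may be skipped (an assumption of this toy model)
    """
    optional_in_order = [e for e in exons if e in set(optional)]
    isoforms = []
    # Choose 0, 1, 2, ... optional exons to skip, in every possible way.
    for k in range(len(optional_in_order) + 1):
        for skipped in combinations(optional_in_order, k):
            isoforms.append(tuple(e for e in exons if e not in skipped))
    return isoforms


# Hypothetical gene: five exons, of which the three internal ones are optional.
exons = ["E1", "E2", "E3", "E4", "E5"]
isoforms = enumerate_isoforms(exons, optional=["E2", "E3", "E4"])

for isoform in isoforms:
    print("-".join(isoform))
print(len(isoforms), "possible transcripts from one gene")
```

In this simplified model, a gene with k optional exons yields 2^k possible transcripts, so three optional exons give 8 and ten would give 1,024. This is one way to see how about 25,000 genes could, in principle, account for a far larger number of proteins.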
Alternative splicing is not the only mechanism that has drawn attention in explaining the one gene/multiple proteins phenomenon. Researchers have noted an intriguing point: to function effectively at a specific time and place, genes require hundreds of assistants. These assistants include proteins that can indirectly turn genes on or off by, for instance, adding methyl groups to DNA or acetyl groups to the histone proteins that package it. Other proteins, such as transcription factors, interact more directly with genes: they land on and occupy positions near the genes they help control. As with the exon combinations of alternative splicing, different transcription factors can bind to various target sites to fine-tune gene expression. A further challenge for biologists is to demonstrate how regulatory proteins, whether acting indirectly or directly, coordinate seamlessly within such a complex system, and how they relate to the alternative splicing mechanism described above.
RNA molecules (Photo: psc.edu)
Over the past decade, researchers have also been increasingly captivated by the pivotal role of histone proteins and RNA in regulating gene expression. Histone proteins are fundamental components for packaging and condensing DNA within the cell nucleus, and they help chromosomes maintain a tightly coiled state. Even a slight change in this configuration in a given region can swing open the door, allowing one or more genes in that area to be transcribed.
Besides coding for proteins, genes also produce RNA. Small RNA molecules, some fewer than 30 bases long, have now been confirmed to act much like other gene regulatory factors. Many biologists who previously focused on messenger RNA and other larger RNA molecules have turned their attention to these smaller counterparts, including microRNA and small nuclear RNA. So far, small RNA molecules have been found to play a crucial role in determining cell fate during an organism's development, but the underlying mechanisms are not yet fully understood.
Since the start of the 21st century, researchers have made significant strides in understanding the biological mechanisms described above. Alongside traditional methods that remain effective, biologists have drawn on genetic information from organisms across different branches of the evolutionary tree to conduct extensive comparisons and analyses. These analyses are gradually revealing how mechanisms such as alternative splicing may have evolved and which regions of the genome act as regulatory regions, which in turn can help us understand how these regions perform their functions. Experiments on classic model organisms like mice, such as adding or removing regulatory regions and modifying RNA, remain important, and computer modeling has also been of great assistance to researchers. However, the central question remains unanswered: how do all these elements of the genetic machinery intertwine to create that wondrous product, our entire body?
Trần Hoàng Dũng