The difference between the estimates of human gene number by the public consortium (31,000 genes) and the privately funded Celera group (39,000 genes) left many wondering which figure was right. However, because the two groups use different gene-identification techniques, their estimates are not directly comparable. Each estimate would be affected by even minor changes in laboratory procedure. For example, changes in methodology led to an initial estimate of 120,000 genes being lowered to 81,000, a figure that now also seems indefensible.
This highlights how much uncertainty remains in the study of molecular genetics. A closer look at the scientific literature, at what researchers themselves say about the current state of knowledge, reveals just how little is yet known, in stark contrast to the impression given by much of the popular daily news media.
Researchers have described the recently published sequence of the human genome as a 'rough draft'.1 Although around 90% of the 'gene-rich'2 portion of the genome has been sequenced, only about a quarter of the whole genome is considered 'finished'.
The core difficulty is that the published sequences are not continuous: there are gaps. Too many gaps make it difficult to identify clearly which base sequences belong to specific genes, so some predicted genes could turn out to be 'pseudogenes', or just fragments of real genes. (Researchers concede that the evidence for at least 10,000 of these predicted genes is weak.1) Also, until the sequence is 'finished', researchers won't know what is 'missing'; it might turn out that even 'gene-poor' areas are critical for gene regulation.
A leading genetics expert aptly summed up the present state of knowledge when he cautioned, 'It is important to remember that no statements can be made with high precision because the draft sequences have holes and imperfections, and the tools for analysis remain limited …'.3
References and notes
1. Bork, P. and Copley, R., The draft sequences: Filling in the gaps, Nature 409(6822):818–820, 2001.
2. The estimated total size of the genome is 3.2 Gb (gigabases), of which about 2.95 Gb is the gene-rich, 'euchromatic' portion. The remaining (heterochromatic) regions of the chromosomes have not yet been examined.
3. Baltimore, D., Our genome unveiled, Nature 409(6822):814–816, 2001.