Originally published in Journal of Creation 14, no 3 (December 2000): 55-71.
The claim that pseudogenes and their respective variations are shared between primates in a nested hierarchy, and can only be explained through common evolutionary descent, is found wanting.
The human genome is believed to be littered with pseudogenes, which are gene-like structures that do not code for proteins because of some presumed defect.3 A recently-published4 abridged example is shown (Table 1). Useful summaries on this topic are available.5,6 The term pseudogene, as used here, encompasses both the classical and the retroposited varieties, the latter of which includes interspersed repeats*, notably SINEs* and LINEs*.7 Creationist scientists (including me) generally assume that God would not create purposeless genes in different primates, and that God did not independently disable the same genes in humans and nonhuman primates during the Curse.
Unfortunately, the distinction between empirical observation and evolutionary interpretation is often particularly difficult in molecular biology. There is always an element of subjectivity in the process of aligning sequences of homologous (orthologous*) DNA8,9 (Fig. 1), and this is aggravated by non-corresponding segments of the same.10 Furthermore, it is unclear just how close the resemblance must be to rule out a fortuitous match-up of mistakenly orthologous sequences. For instance, there is ambiguity11 about the status of one 34 bp (base-pair*) segment exhibiting 68% nucleotide* correspondence between the human and rat genomes. And last, molecular similarities, including those of pseudogenes, do not create self-evident truths, but must be interpreted:
‘At face value, this is just wrong—alignment procedures delineate similarity between sequences but tell us nothing about their common ancestors, if such ever existed. To give an absurd but relevant example, poly-A* tails of any two processed pseudogenes are perfectly alignable, but it would be a stretch to consider them homologous.’12
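To make the percentage figures above concrete, the following is a minimal Python sketch of how nucleotide correspondence between two already-aligned segments is commonly tallied. The function name `percent_identity` and both example sequences are hypothetical; the sequences were constructed only so that 23 of 34 columns match (about 68%), and are not the human and rat sequences cited above.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned columns in which both sequences carry the same base.

    Both strings must already be aligned (equal length). A column in which
    either sequence has a gap ('-') is counted as a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(seq_a)


# Hypothetical 34-column alignment: the first 23 columns agree, and the
# last 11 columns of seq_b are shifted so that every one of them differs.
seq_a = ("ACGT" * 9)[:34]
seq_b = seq_a[:23] + ("ACGT" * 3)[:11]

print(round(percent_identity(seq_a, seq_b), 1))  # 23 of 34 columns match

# Two poly-A tails of equal length are identical at every column.
print(percent_identity("A" * 20, "A" * 20))
```

The second call returns 100%, which is the situation described in the quotation: a perfect score reports similarity only, and by itself says nothing about how the two sequences came to be alike.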
Contrary to the assertions of some,6 the presumed temporal persistence of supposedly-useless pseudogenes actually constitutes a serious problem for evolution. The manufacture of DNA is energetically costly to the cell, and natural selection should remove DNA were it actually useless.13 A mechanism for removal is now known.14
If they are actually selectively neutral and subject to random mutations, ‘old’ pseudogenes should in fact be scrambled beyond recognition. Apropos to this, orthologous SINEs have now been found in different phyla,15 and the cited researchers recognize that the (evolutionary) maintenance of a close correspondence between such phylogenetically*-distant organisms is very difficult to explain if SINEs are of no use to their carriers. More on this later.
If pseudogenes are functional, they are no different from any other homologous structure found in nature. These all reflect the fact that God used the same ‘blueprint’ or ‘art form’ repeatedly when constructing different living things. In this case, the orthologous placement of pseudogenes, and their respective differences, are moot.
The importance of pseudogene-caused genetic diseases6 is apt to be exaggerated because, by their very nature, deleterious retroelements* are so obvious.16 The opposite is the case with beneficial pseudogenes. In fact, for at least some pseudogenes, failure to observe them coding a product under experimental conditions is not ipso facto proof of their inability to do so:
‘In these and other examples it cannot be stated with certainty that a gene is unequivocally either a pseudogene or a gene. It is possible that analysis has not been performed in the appropriate temporospatial conditions to detect expression.’17
One argument adduced in support of pseudogene nonfunction is the observation that they contain many more nucleotide differences (which are assumed to be mutations), and are more variable in terms of base-pair composition than their paralogous* protein-coding genes. Yet this observation is compatible with function.2 In fact, as mentioned below, an ability to code for a product useful to the host organism hardly exhausts the possibilities for pseudogene function. It is interesting to note that the inferred nucleotide-substitution rate in pseudogenes shows only crude correspondence with primate phylogeny, for which reason it has to be manipulated post hoc by up to tenfold18–20 in order to contrive an agreement between the timing of different episodes of primate evolution. Pseudogenes whose age is deduced on the basis of the numbers of nucleotide differences from their coding paralogs show only a weak relationship between age and the numbers of indels*.21 Each branch of the phylogenetic tree, of pseudogenes relative to primate evolution, exhibits widely divergent rates of indel formation.22
Some pseudogenes cannot be straightforwardly portrayed as inactivated copies of their paralogous genes. These include the human AS pseudogenes, each of which shares a concerted pattern of 19 nucleotides that sets it apart from its inferred gene paralog.23
A large and rapidly-growing body of evidence for pseudogene functionality exists, most of which will be presented in a forthcoming paper.24 Earlier-known evidences are given elsewhere.5 There is a theory25 which proposes that pseudogenes interact with antisense RNA*. The functionality of Alu* units has long been suspected,26 and recently confirmed.27,28
The distinction between ‘processed genes’ and ‘processed pseudogenes’ is not, contrary to one critic,6 the result of creationist confusion, but is instead the product of the critic’s semantics. After all, the former is but a functional version of the presumably-nonfunctional latter.29 Evolutionists assume that certain retropseudogenes have become ‘recruited’ by evolutionary processes and are thereby secondarily functional. These are called processed genes. But this of course begs the question about them having lost function to begin with! The claim6 that functioning pseudogenes are manifestations of beneficial mutations is also an egregious act of begging the question. The latter also reflects the following prejudicial and erroneous notion: if ‘crippling mutations’6 prevent a protein-coding function, this is ipso facto synonymous with no function.
Max’s response6 to this evidence is to use the ‘ATT’ (Appeal to Technicalities) fallacy and the ‘ATM’ (Appeal to Marginalization) fallacy.30 Each is described in turn.
Users of the ATT fallacy engage in much post hoc quibbling about the broad applicability of contrary evidence.31 One example32 is the belittling of the discovery of ‘junk DNA’ function33 by pointing out the (correct) fact that this noncoding DNA differs from that in pseudogenes. But discoveries of this nature, and successive ones,34 cannot be dichotomized so easily (see Abstract). This is especially so in light of the fact that the identical ‘nonfunctional unless proven functional’ mentality besets our understanding of all types of noncoding DNA. Finally, and as noted earlier (and especially in the forthcoming paper24), evidence for function is not limited to generic ‘junk DNA’, but is now known for representatives of all the major types of pseudogenes. Therefore, attempts to depreciate the significance of such function (as by asserting that it is only true of a few processed pseudogenes6) appear to be another use of the ATT fallacy.
The ATM fallacy treats evidence as a simple numbers game.35 But, as pointed out by the philosopher of science Sir Karl Popper,36 evidence cannot be treated in this way (e.g. as so many points for, versus so many points against, a theory). Indeed, one contrary observation is often sufficient to falsify a theory. Popper’s philosophy clarifies the fact that, contrary to Max,6 the ‘nonfunctional pseudogenes’ argument is not substantiated by large numbers of apparently nonfunctional pseudogenes, but is instead falsified by a significant and rapidly growing body of evidence which demonstrates such function.
In response to those who assume that pseudogene nonfunction is well established,6 we must consider several factors, not the least of which is the following:
‘Science is supposed to advance step by step, with all conclusions supported by adequate evidence. Yet conclusions are sometimes widely accepted without much evidence, and woe to those who come along later with data supporting what is already “received” wisdom.’37
Apropos to this, it is acknowledged38,39 that nonfunctional-pseudogene beliefs took hold at a time when the genome was little understood, and when sociobiology dominated all of biology,40 favouring such attitudes. The classic essays by Orgel, Crick, Doolittle and Sapienza, which largely inspired the notion that noncoding DNA is useless, parasitic and ‘selfish’, are recognizably anthropomorphic and speculative.41 In addition, Howard and Sakamoto40 stress the fact that, majority opinion notwithstanding, pseudogene-nonfunction beliefs rest largely upon negative evidence. In stark contrast to Max,6 others are not so sure that we know, even to this day, what pseudogenes cannot do (see also Abstract):
‘Short interspersed repetitive DNA elements (SINEs) are found in various eukaryotes* … . We still do not know the biological significance of these elements and how these elements evolved to the present status.’42
‘Do these elements [LINEs and SINEs] serve a generally useful function or are they simply “selfish DNA”?’43
‘However, this is not a strong argument, and whether L1* is “selfish” remains to be determined.’44
‘The problem is that generally one does not know whether a pseudogene has any noncoding phenotypic effect and whether the effect is deleterious or advantageous.’45
In addition, the ‘few known functional pseudogenes implies few functional pseudogenes’ thinking, though presented by Max6 as virtual fact, is recognizably no more than a hypothesis.46 Moreover, this hypothesis is either explicitly or implicitly rejected by various investigators, who recognize the fact that the relatively small number of known functional pseudogenes is not at all commensurate with their overall significance:
‘There are severe limits to our recognition of the roles of mobile elements … the knowledge of all of the control elements that may be important to genes is still very restricted. Since mobile elements occur and carry out useful functions in positions many kilobases from the initiation of transcription even those significant mobile elements that have been inserted within the last few million years may not have been principally recognized. Thus it can be argued that 21 examples represent a large number.’47
‘The question then is: which of the hundreds of thousands of Alu inserts are contributing to the regulation of nearby genes, and which are without significant effect?’48
Not surprisingly, the perceived rarity of functional pseudogenes has been self-perpetuating:
‘… given the fact that there are a million Alu elements in the human genome and there have been no systematic studies to identify which of them have regulatory functions, it must only be a matter of time before human-specific Alus are found to control gene expression (emphasis added).’28
‘Recognizing that Alu repeats might be junk DNA, most researchers chose to study their mobility and incidental effects on genome structure, as opposed to their possible function.’49
Other investigators40 have also discussed how low expectations of pseudogene function have been self-fulfilling.
Apropos to this, it is erroneous to compare overall pseudogene function to a defendant in a criminal trial pleading innocence because evidence favourable to him may emerge in the future.6 To begin with, he is actually seeking acquittal as a result of the current state of evidence.50 Second, evolutionists cite ‘use current evidence only’ arguments6 selectively, i.e. for pseudogenes, but certainly not for naturalistic theories for life’s origins, otherwise they would admit the complete inadequacy of such theories, and acknowledge an external Designer. But a double standard is followed instead, and we are assured that no Designer is needed because, ‘Even though today we cannot explain life’s origins mechanistically, one day we probably will.’
A large fraction of most pseudogenes differs considerably from their paralogous genes. For instance, a compilation of 65 primate pseudogene sequences,51 totalling 80.6 kb*, indicates that parts of the pseudogene sequences resemble their paralogs at not much higher than chance levels (50% for two unrelated strands of DNA). Less than one-third of the 80.6 kb aggregate sequences are 85% similar to their paralogs, and a very small unspecified fraction of the same reaches 90%. The authors point out that progressively lower levels of similarity mean progressively greater ambiguity as to the origins and the timing of the accumulated pseudogene/gene differences. Taken to its logical conclusion, this means that ‘shared mistake’ arguments cannot even have relevance, let alone validity, for a large fraction (perhaps the majority) of pseudogenes.
Numerous pseudogenes consist of multiple paralogous copies in each primate genome. In such cases, ‘shared mistakes’ take on a life of their own. Evolutionists must essentially ‘shop around’ for the closest match52 in trying to deduce the orthologous pairings of pseudogenes from primate to primate. This can also occur in the case of multiple Alu repeats.53 If evolutionary ‘trees’ indicate an anomaly in which the pseudogenes of distantly-related primates resemble each other more closely than those of more closely-related primates, this can always be blamed after-the-fact on either an artefact of the ‘tree’ itself, or on an incorrect pairing of orthologs.54
Let us now consider those pseudogenes which have only single copies per primate genome. In doing this, I will adhere to the evolutionary methodology of counting only shared similarities and dissimilarities each of which simultaneously differs from that of ‘less derived’ primates.6 Even so, as shown below, while some pseudogenes appear to be hierarchically shared (as illustrated in Fig. 2) between primates,6 others definitely are not. Most of the latter are apomorphic*. (C), however, is an example of phylogenetic discordancy: it occurs in humans and orangs, but not in any primates of intermediate evolutionary derivation.
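The distinction drawn above between hierarchically shared, apomorphic and phylogenetically discordant pseudogenes can be sketched in a few lines of Python. The branching order used here, ((((human, chimp), gorilla), orang), gibbon), is assumed as a stand-in for Fig. 2, and the function name `classify_pattern` and its labels are hypothetical. The sketch also assumes that a pseudogene arises once and is never subsequently lost, which is the criterion on which the ‘nested hierarchy’ claim rests.

```python
# Assumed branching order (a stand-in for Fig. 2, not data from the source):
# ((((human, chimp), gorilla), orang), gibbon), stored as the set of its clades.
TAXA = frozenset({"human", "chimp", "gorilla", "orang", "gibbon"})
CLADES = {
    frozenset({"human", "chimp"}),
    frozenset({"human", "chimp", "gorilla"}),
    frozenset({"human", "chimp", "gorilla", "orang"}),
}


def classify_pattern(carriers):
    """Label the set of taxa that carry a given single-copy pseudogene."""
    carriers = frozenset(carriers)
    if not carriers <= TAXA:
        raise ValueError(f"unknown taxa: {sorted(carriers - TAXA)}")
    if not carriers:
        return "absent"
    if len(carriers) == 1:
        return "apomorphic"      # confined to one taxon
    if carriers == TAXA:
        return "universal"       # present everywhere, so uninformative
    if carriers in CLADES:
        return "hierarchical"    # carriers form a clade of the assumed tree
    return "discordant"          # shared, but not by any clade


if __name__ == "__main__":
    examples = {
        "human + chimp": {"human", "chimp"},
        "gorilla only": {"gorilla"},
        "human + orang": {"human", "orang"},
        "human + gorilla": {"human", "gorilla"},
    }
    for label, carriers in examples.items():
        print(label, "->", classify_pattern(carriers))
```

Under these assumptions a human-plus-orang pattern, like that of (C), is labelled discordant, as is a human-plus-gorilla pattern. Any later appeal to a deleted locus or an independent inactivation amounts to relaxing the single-origin, no-loss assumption built into the sketch.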
Years ago, I called attention to a pseudogene which was shared by humans and gorillas but not chimps.55 It has since been alleged that the chimp pseudogene is lacking because its locus* had been deleted.56 This is an inference which rests on the assumption that all primates are evolutionarily related, and so any differences in DNA sequences must be of secondary origin. Other phylogenetic studies may have ignored missing loci.57 This complication, usually reckoned ‘missing information’, eventually makes any phylogenetic analysis uninformative.58
Moreover, missing loci cannot come to the rescue of evolutionists in still other hierarchy-defying instances of pseudogene deployment:
‘These include two of the OR genes (hOR17-7 and OR17-209), which are intact in human and chimpanzee, but are pseudogenes in gorilla, due to one-base deletions*. In both cases, the gorilla pseudogenes are accompanied by an intact variant, a potential case of heterozygosity with one of the alleles being a pseudogene.’59
Other examples of gorilla-only pseudogenes are given below. Otherwise, one OR pseudogene is human-specific and another set of OR pseudogenes are shared by humans and chimps but, ironically, are believed to be of independent origin.
Evolutionists can always invoke the ‘gene inactivation occurred after divergence’ claim, after the fact, in such situations, but such thinking is admittedly an assumption.60 More pointedly, this ad hoc rationalization begs the question about pseudogenes forming a phylogenetic nested hierarchy in the first place. And it is far from the only one. Gene conversions* can also be invoked for apomorphic pseudogenes, as was the case with the human-only BC200-Beta pseudogene.61
In other primates, the deployment of known pseudogenes also often fails to conform to a nested evolutionary hierarchy. The spider monkey has an apomorphic gamma-globin pseudogene.62 Elsewhere, seemingly orthologous DRB3 pseudogenes in the tamarin and titi contain different ‘inactivating mutations’. According to evolutionary storytelling, once upon a time some genes had come to resemble each other by convergence* before each one of them had become a pseudogene.63 In still another example, we encounter an exact reversal of the usual evolutionary expectation of genes increasingly becoming converted into pseudogenes in progressively more derived primates.6 Apropos to this, an inferred inactivation of the theta-1 globin genes exists in the less-derived non-primates (e.g. rabbit) and in the less-derived galago, but it is the more-derived higher primates that have functional orthologs instead.64 As a final example, nuclear pseudogenes in the primate family Cebidae portray a confusing phylogenetic picture, and this is largely blamed on confounding homoplasies* among the pseudogenes.65
Ironic to those who highlight pseudogenes as an accumulation of ‘shared mistakes’,6 there are evolutionists who are suspicious of pseudogenes as a means of charting the course of primate evolution:
‘Pseudogenes appear to be subject to virtually no selection and have, therefore, been used to provide the missing data. However, most pseudogenes are members of gene families in which frequent exchange of sequences among members may complicate interpretations of sequence divergence and phylogeny.’66
Finally, to put the ‘shared pseudogenes’ argument in a broader context, note that evolutionists cannot even agree as to which particular genomic structures can only be explained by shared evolutionary descent. The mitochondrial gene order in birds has been shown to have arisen independently.67 The MHC complex exhibits considerable similarities among primates, with most of these genetic motifs believed to predate the chimp-human divergence.68 Yet, in a major about-face, evolutionists now recognize that complex MHC genetic motifs can arise independently.63,69 They currently reckon only 7 of 13 allelic lineages, and only at most a few of the 135 alleles of the DRB1 locus, as predating the human-chimp split.70
Both creationists and evolutionists recognize the fact that the majority of classical pseudogenes have always been located close to their protein-coding paralogous genes. But retropseudogenes are believed to have been retrotransposited at considerable distances from the paralogous parent gene, and only shared evolutionary ancestry is supposed to be able to account for such coincident placement in different primates.6 The most numerous retropseudogenes, by far, are SINEs (especially Alus71) and LINEs (notably L1 elements), each of which numbers in the hundreds of thousands72 in the human genome alone. Evolutionists believe that these elements are periodically inserted during the course of primate evolution (Fig. 3), and that each such episode generates a unique new family of interspersed repeats, creating markers suitable for phylogenetic analyses.
There are, however, numerous rationalizations available for dealing with inserted elements that fail to conform to a nested hierarchy. Contrary to the claim that successive families of LINEs and SINEs are hierarchically deployed among animals, there are many instances where clearly intact loci lack the predicted interspersed element. This occurs between members of different species73 as well as different orders.74 The rationalization invoked is this: the LINE or SINE element did not happen to integrate into that part of the host population which eventually survived into the present.
Evolutionists have long believed that Alu insertion* is an irreversible process; hence the absence-presence of an Alu at an orthologous site constitutes an ipso facto primitive-derived polarity. Were a formerly-inserted Alu to undergo subsequent deletion, this event would supposedly be betrayed by the simultaneous deletion of some of the flanking sequence*.6 To the contrary, precise excisions of Alu units can occur: a gorilla-human shared Alu is absent at the orthologous chimp locus, and an extra 12 bp right Alu-flanking repeat, added to an empty-site sequence, marks the missing-Alu spot.75
Members of the ‘wrong’ family of inserted repeats can even share particular orthologous sites. In one instance, an old-family Alu in the gibbon was found to be situated at the orthologous site of a new-family Alu in gorillas, chimps and humans.76 The former was then assumed to be a template for the evolution of the latter. In another instance,77 a modern-family Alu unit was found in humans, located anomalously at the site expected for an orthologous older-family unit. So a gene conversion event was conjured up, after the fact, for having supposedly reconfigured and ‘modernized’ the onetime old Alu family member to make it nearly identical to a modern human-specific Alu family member.
For the longest time, many evolutionists have argued that the parallel* insertion of essentially identical retropseudogene units, at the orthologous site in different animals, is a virtual impossibility. One estimate placed the odds against such an event at one in many billions.78 Wouldn’t you know it—the same SINE units,79,80 as well as LINE units,81 have now been discovered independently emplaced at orthologous sites in different genomes.
For the vast majority of the ostensibly-younger Alus, occurrence in a nested hierarchy is out of the question, as most of them are apomorphic.78 Furthermore, the Ya5 Alu family is a showcase of a violated nested hierarchy. Originally believed to occur only in humans, Ya5 Alu repeats turned up in chimps,82 and then gorillas. So it was then supposed that the source gene had generated Ya5 retropseudogenes prior to the human-chimp-gorilla divergence, and so, in accordance with a nested hierarchy, these ape Ya5 Alus would also be found at the orthologous sites in humans. But they were not, and this development was thus explained away:
‘However, it is also remarkable that according to our interpretation, the PV EPL must have been active at least once in each of the three divergent HCG lineages.’76
Remarkable indeed. We are seriously asked to believe that the PV EPL source gene became activated independently in all three primates, and many times in two of them, after their mutual divergence. The plasticity of organic evolution is a sight to behold!
Of course, the belief that families of interspersed elements form nested hierarchies is predicated on the belief that the families are factual entities. But, not only are apomorphic nucleotide substitutions found, but also ones which appear, disappear, and then reappear again in ostensibly progressively more derived Alu families.83 The same occurs in L1 families.74 In addition, there are so many recent L1 families in existence that they have no clear-cut boundaries, and it is admittedly difficult to sort out the resulting ‘inconsistent pattern of shared characters’.84 Such blurring also occurs between the older Alu families.85
Earlier, I noted that the molecular ‘clock’ varies considerably from one pseudogene to another. The same holds for the rate of nucleotide substitutions in Alu units. The accumulation of what may ironically be called unshared-mistake nucleotide differences, between orthologous human-chimp Alu elements, differs significantly from one Alu element to another. Obviously, independent of ‘age’ and degree of evolutionary relatedness, nucleotide-substitution rates turn out to be governed by the base composition of the host DNA.86
Can we really be sure that the same interspersed repeat is located at the identical location in different primate genomes? Evolutionists commonly believe that orthologous inserted-element units and orthologous flanking sequences (including any flanking repeats) can all be unambiguously identified. The actual situation is not as clear-cut. As discussed below, orthologs are usually far from identical, and there are features which reduce the distinctiveness of each inserted element from another.
To begin with, the Alu units themselves, apart from varying in terms of nucleotide sequence, do not even have to be of equal length to be judged orthologous.78 In particular, the differences in length of the poly-A tail, between presumably orthologous Alu units, are often excused on the basis of the vulnerability of homopolymeric* sequences to episodes of partial deletion after insertion.87 In addition, the direct repeats* which usually surround each retropseudogene often have ambiguous boundaries with the Alu unit itself and/or the surrounding flanking sequence.26 Furthermore, owing to the prevalence of (A+T) upstream of Alu insertions,88 the direct repeats are also (A+T)-rich, thereby reducing their capability of differing from their counterparts in unrelated pseudogenes. This further diminishes the distinctiveness of suspected orthologous pairings.
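The effect of (A+T)-richness on distinctiveness can be illustrated with a short Python sketch. It rests on a simplifying assumption not made in the sources cited above: that two short sequences are compared base by base, without gaps, and that each base is drawn independently from the same composition, so that the chance of a match at any one position is the sum of the squared base frequencies. The function names and the example composition (40% A, 40% T, 10% C, 10% G) are hypothetical.

```python
from collections import Counter


def base_frequencies(seq: str) -> dict:
    """Observed frequency of each of A, C, G and T in a sequence."""
    seq = seq.upper()
    counts = Counter(base for base in seq if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: counts[base] / total for base in "ACGT"}


def chance_match_probability(freqs: dict) -> float:
    """Probability that two bases drawn independently from `freqs` coincide."""
    return sum(p * p for p in freqs.values())


# Equal base composition versus a hypothetical (A+T)-rich direct repeat.
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
at_rich = base_frequencies("AATTAATTCG")  # 4 A, 4 T, 1 C, 1 G

print(chance_match_probability(uniform))
print(round(chance_match_probability(at_rich), 2))
```

Under these assumptions the skewed composition raises the per-position chance of a match from 0.25 to about 0.34, so two unrelated (A+T)-rich repeats are expected to resemble each other more than two unrelated sequences of even composition would.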
Now consider flanking sequences. The earlier-discussed fact that there is always some uncertainty in aligning sequences18 implies that there must always be an element of doubt as to whether ostensibly orthologous retropseudogenes are really located in exactly the same position in two or more genomes. In fact, it is acknowledged that the exact positions of many retroposed elements are uncertain or erroneous.89 Although primers can recognize presumably orthologous retropseudogene sequences whose flanking regions differ by as much as 25–30%,90 there are no absolute rules for the minimum degree of similarity required to justify such orthologous pairings.89 There are even published instances91 of orthologous pairings of LINE elements being accepted by several teams of investigators and then, upon reinvestigation, relocated hundreds of bases apart. Orthologous Alus, with dissimilarities in flanking sequences approaching 30%, are not limited to distantly-related primates, but are known to occur even in human-chimp comparisons, with the flanking repeats additionally differing in both base composition and overall length.87 In severe cases, the flanking regions of prospective orthologs are so dissimilar to each other that the orthologous pairing itself is doubtful.58
An unavoidable fudge factor is created by matching inexact sequences. There are even instances where the nucleotide differences in the presumed-orthologous flanking sequences actually form phylogenetically discordant groupings:
‘Thus, there is a C and an A shared by the gorilla and orangutan; a G shared by the baboon and rhesus; a C shared by the gorilla and pygmy chimpanzee; and a T shared by the orangutan and baboon. These examples of shared characters are discordant. The orangutan cannot have a recent common ancestry with the gorilla and with the baboon. The shared nucleotides can be interpreted as having arisen independently in two lineages. This raises the question of how many of such “shared nucleotides”, that have been used to support common ancestry, have actually arisen independently in two lineages (emphasis added)?’71
The flanking sequences which surround paralogous and orthologous retropseudogenes, already imprecisely similar to each other, are evidently not free to differ from each other in an unconstrained manner. An examination of three paralogous AS pseudogenes, each of which is compared to its orthologous pseudogene in different primates, indicates that flanking sequences vary from each other in a very nonrandom pattern of nucleotide substitutions that recur in parallel.92 This raises further doubts about the diagnostic uniqueness, in terms of nucleotide sequence, of each flanking sequence in the genome, as well as the belief that each such sequence is so unique that it (and its contained retropseudogene) can only be explained by shared evolutionary ancestry.
To begin with, most pseudogenes contain multiple, nonunique alterations relative to their coding paralogs, making it often difficult to declare which one ostensibly inactivated the original gene.93 Moreover, orthologous primate pseudogenes can have different ‘inactivating mutations’.63 The fact that some orthologous human-chimp pseudogenes contain the same stop codon*6 appears impressive until one realizes that this is often not the case. For instance, a gorilla-specific CYP21 pseudogene has a stop codon while its indisputably-functional chimp ortholog does not and its human pseudogene ortholog does—but at a different location in the sequence.94 The CD8B1 gene provides another example of a gorilla-only stop codon.95 Elsewhere, a human OR pseudogene has a stop codon while its orthologous chimp pseudogene does not.96 And, when coincidental stop codons do occur, this is hardly compelling evidence for ‘shared mistakes’ in view of evidence for parallel nucleotide substitutions and parallel deletions (discussed later). The latter is relevant to frameshift-generated stop codons. Finally, we would expect coincidental stop codons because there are only three possibilities, and even these do not occur at subequal frequencies in pseudogenes.97
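The link between frameshifts and stop codons mentioned above can be shown with a minimal Python sketch. The three stop codons of the standard genetic code are TAA, TAG and TGA; everything else here is hypothetical, including the 18-base example sequence and the helper names `first_stop` and `delete_base`.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code, DNA alphabet


def first_stop(seq: str, frame: int = 0):
    """Return (index, codon) of the first in-frame stop codon, or None."""
    seq = seq.upper()
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            return i, codon
    return None


def delete_base(seq: str, index: int) -> str:
    """Return the sequence with the single base at `index` removed."""
    return seq[:index] + seq[index + 1:]


# Hypothetical open reading frame: ATG GCT GAT GAC CTG AAA (no stop codon).
intact = "ATGGCTGATGACCTGAAA"

print(first_stop(intact))                  # no in-frame stop
print(first_stop(delete_base(intact, 3)))  # one-base deletion in codon 2
print(first_stop(delete_base(intact, 7)))  # one-base deletion in codon 3
```

In this constructed example the intact sequence has no in-frame stop, while two different one-base deletions each shift the reading frame so that the same downstream TGA is read as a stop at the same position. A coincident stop codon in two sequences therefore does not, by itself, identify which upstream change produced it.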
Nucleotide substitutions in pseudogenes, far from qualifying as ‘shared mistakes within the shared mistakes (pseudogenes)’, often contradict evolutionary schemes. The alpha-1,3-GT pseudogene, for instance, includes a nucleotide substitution at position 726 which is uniquely shared by cows, squirrel monkeys and gorillas.98 In the alpha-1,2-fucosyltransferase pseudogene,99 at position 258, the human and orang uniquely share a C, while chimp and gorilla uniquely share T. The rat and chimp uniquely share C at position 55 in the GLO pseudogene.100 Many nucleotide substitutions in the long Eta-globin pseudogene are either apomorphic or phylogenetically discordant.101 Orthologous Alu units of even closely related primates (e.g. humans and chimps) frequently exhibit considerable variance in nucleotide positions.87
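The kind of tabulation behind such position-by-position comparisons can be sketched in Python as follows. The first alignment column below encodes the position-258 pattern described above (C in human and orang, T in chimp and gorilla); the second column is a hypothetical concordant case added for contrast. The reference branching order, (((human, chimp), gorilla), orang), and the function names are assumptions made for illustration. A group of taxa is treated as concordant if it, or the remainder of the sampled taxa, forms a clade of that tree.

```python
from collections import defaultdict

# Assumed reference tree for the four taxa: (((human, chimp), gorilla), orang).
REFERENCE_CLADES = {
    frozenset({"human", "chimp"}),
    frozenset({"human", "chimp", "gorilla"}),
}


def site_groups(column: dict) -> dict:
    """Group taxa by the base each one carries at a single alignment column."""
    groups = defaultdict(set)
    for taxon, base in column.items():
        groups[base.upper()].add(taxon)
    return {base: frozenset(taxa) for base, taxa in groups.items()}


def discordant_groups(column: dict, clades=REFERENCE_CLADES) -> dict:
    """Return the shared-base groups that match no clade of the assumed tree."""
    all_taxa = frozenset(column)
    result = {}
    for base, taxa in site_groups(column).items():
        if len(taxa) < 2 or taxa == all_taxa:
            continue  # unique or universal bases carry no grouping signal
        if taxa in clades or (all_taxa - taxa) in clades:
            continue  # group (or its complement) is a clade of the tree
        result[base] = taxa
    return result


# Position 258 as described in the text.
position_258 = {"human": "C", "orang": "C", "chimp": "T", "gorilla": "T"}
# Hypothetical column in which the human-chimp pair is set apart.
concordant = {"human": "T", "chimp": "T", "gorilla": "C", "orang": "C"}

print(discordant_groups(position_258))
print(discordant_groups(concordant))
```

With the assumed tree, both groupings at position 258 are reported as discordant, whereas the contrasting column yields an empty result.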
Indels don’t fare much better, evolutionarily speaking. One can examine the 25,689 bases of the primate Beta-globin cluster (of which nearly half is the Eta-globin pseudogene) and quickly see that the vast majority of indels in the entire sequence are apomorphies. Furthermore, there are so many indels in the whole nearly-26 kb sequence [tabulated elsewhere22] that large ‘holes’ (Fig. 4) exist in the claimed sequence alignment of primates’ DNA. Still other indels are phylogenetically discordant. Although these include individual repetitive nucleotides, this fact must be put in perspective: some form of repetition is prevalent throughout even coding sequences.102
Elsewhere, a CYP chimp pseudogene has an 8 bp deletion not shared with its orang-utan, gorilla or human orthologous pseudogenes.94 A TPI chimp pseudogene has a long insertion* not found in its human orthologous pseudogene,103 while a DRB6 chimp pseudogene contains two insertions not shared with its human orthologous pseudogene.104 Not to be outdone, the gorilla ADPRTP1 pseudogene has a 30 bp duplicated region absent from its human orthologous pseudogene.105 In another instance, we observe a unique 6-base deletion/substitution sequence in the SHMT pseudogene undergoing a phylogenetic somersault: it is absent in the (ancestral) New World monkeys, present in the (more derived) Old World monkeys, and then is absent once again in the (most highly derived) apes and humans.106
Whether or not they occur only in pseudogenes, numerous molecular ‘shared events’ (mistakes or not), once considered virtually foolproof ‘perfect markers’ of evolutionary relatedness, have fallen victim to contrary evidence:
‘Nonetheless, almost every new molecular approach to phylogenetic inference has been ballyhooed as capable of “revolutionizing” the field … . Similar claims have been made for other kinds of data in the past. For instance, DNA-DNA hybridization data were once purported to be immune from convergence, but many sources of convergence have been discovered for this technique. Structural rearrangements of genomes were thought to be such complex events that convergence was highly unlikely, but now several examples of convergence in genome rearrangements have been discovered. Even simple insertions and deletions within coding regions have been considered to be unlikely to be homoplastic, but numerous examples of convergence and parallelism of these events are now known. Although individual nucleotides and amino acids are widely acknowledged to exhibit homoplasy, some authors have suggested that widespread simultaneous convergence in many nucleotides is virtually impossible. Nonetheless, examples of such convergence have been demonstrated in experimental evolution studies.’58
Of course, evolutionists still have faith (sic) in most if not all of these molecular markers. But they can hardly maintain any longer that common evolutionary descent is required to explain such things as ‘shared mistakes’.
[Table: ‘Autosomal pseudogene sequence’ versus ‘Phylogeny’; table body not recovered.]
It has been asserted6 that evolutionary trees constructed on the basis of DNA similarities ‘agree remarkably well with the evolutionary trees derived earlier from anatomic similarities’. This statement is egregiously untrue. If anything, primate phylogenies are in a mess as a result of major contradictions between molecular and morphological data.57,107,108 Consider some recent craniodental data, which is very robust, statistically speaking. In a virtual mockery of pseudogene-based phylogenies (Fig. 2, Table 2), humans branch off first, followed by chimps, and finally a gorilla-orang clade*.108 (Gibbon was not considered in this study.)
Pseudogene-derived phylogenies are not even consistent with each other (Table 2). A common rationalization6 would have us believe that any difficulties in resolving the human-chimp-gorilla trichotomy have no impact on the validity of evolutionary theory itself. But consider the original prediction:
‘High expectations were placed on molecular methods, when these were first introduced, as to their power to resolve the trichotomy problem.’107
It is transparent special pleading to exalt molecular methods when they are predicted to support evolutionary notions, and then turn around and say that they are no threat to evolutionary theory when they fail! And, regardless of any post hoc rationalization invoked by the evolutionist to try to discredit it, the prima facie evidence (Table 2) refutes the claim that pseudogenes qualify as unambiguous shared mistakes among primates.
Of course, such inconsistencies are not limited to the H-C-G trichotomy. Barriel109 recently compared the previously-discussed Beta-globin data101 with 75 morphological elements, from another study, in order to construct a general primate phylogeny. The two data sets were found to conflict with each other, and so were ‘reconciled’ by being pooled together. The morphological data alone had placed the orang-utan as the sister group of the Homo/Pan/Gorilla clade (as in Fig. 2), but the pooled data displaced orang with the gibbon. In another study,110 Alu sequences were cited in support of the tarsier as the sister group of the anthropoid apes (and man), but this was acknowledged to contradict other phylogenies which place tarsiers elsewhere in the primate evolutionary tree. Overall, primate phylogenies constructed on the basis of retropseudogenes are not even confirmed by those based on other retroposons, the latter of which exhibit considerable phylogenetic conflicts among just themselves.111
Phylogenies based on ‘shared mistakes’ are not, of course, limited to primates, and the origin of whales has received much attention.6 Yet there are widely divergent phylogenetic inferences based on different lines of evidence.112 As usual, much evidence contradicting evolutionary relatedness is disregarded by the standard attribution to convergence. Apropos of the unconventional hippo-cetacean clade controversy, we are now in the proverbial situation of an irresistible force (pro: SINEs) encountering an immovable object (con: very strong skeletal evidence113). While some evolutionists insist that a favoured line of evidence trumps any dissenting evidence, other evolutionists warn against making such an assumption.114
All of the myriad problems with ‘convergent’ evolution, both molecular and morphological, are much too pervasive to be wished away as unimportant. If organic evolution is science, in the Popperian sense, and therefore subject to potential falsification, evolutionists must eventually acknowledge the fact that the overall profusion of divergent and contradictory phylogenies, pertaining to all forms of life, falsifies macroevolution itself.
Phylogenetically-shared pseudogenes, as ‘shared mistakes’, have been compared to plagiarized written errors.6 A defendant was convicted of plagiarism by a court which recognized that, whereas similarity in books’ contents is to be expected from independently-acting authors writing about the identical topic, the same cannot be said about exact written errors. But this, of course, assumes the essential random nature of such errors, with concomitant extreme improbability of independent duplication. The court in question would have seen things differently had the ‘duplicated’ errors actually been only partly coincident from one book to another, especially if it was discovered that similar writing errors could arise independently after all.115 I will show that both considerations are very much applicable to pseudogenes.
Figure 5 illustrates a retropseudogene insertion in its genomic context. In contrast to the assertion that processed pseudogenes are inserted at random locations into DNA,6 Miyamoto116 concludes that the tacit belief in the randomness of SINE insertion into the genome is ‘the least convincing assumption’ related to their role as phylogenetic markers. He cites evidence which shows that specific target-site selection by retroelements is common. Let us develop this further, examining progressively finer levels of nonrandomness.
To begin with, lengthy Alu-barren intervals of host DNA are much more common than can be accounted for by a model which assumes constant probability of Alu insertion.117 It is hardly surprising that the density of Alu repeats, per kb of host DNA, varies widely according to location in the genome.118 Furthermore, Alu units often occur in clusters,119 even to the point of aggregating at almost the same orthologous position in different animals.120 They are often found inserted, at the same spot, into each other.121,122 Evidence that the same site in the same primate is invaded repeatedly by Alus is recognized as indicating that these are hotspots for Alu insertion,122 and the same holds for L1 insertions.123
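To make the 'constant probability' comparison concrete, here is a minimal Python sketch. It assumes that such a model means each base independently receives an insertion with the same probability p; the cited study may formulate its model differently. The insertion positions are hypothetical, invented only for illustration, and the function names are not taken from any source.

```python
def prob_barren_run(p, length):
    """Probability that `length` consecutive bases all escape insertion,
    assuming each base is hit independently with probability p."""
    return (1.0 - p) ** length


def gaps_between(positions):
    """Distances between consecutive insertion positions."""
    ordered = sorted(positions)
    return [b - a for a, b in zip(ordered, ordered[1:])]


# Hypothetical insertion coordinates (bases), not data from the cited work.
insert_positions = [100, 400, 650, 900, 20900, 21200]

gaps = gaps_between(insert_positions)
# Estimate the per-base insertion probability from the mean gap.
p_hat = 1.0 / (sum(gaps) / len(gaps))
longest = max(gaps)

print(gaps)
print(prob_barren_run(p_hat, longest))
```

If the probability printed for the longest barren run is very small, yet such runs are observed often, the constant-probability assumption fits the data poorly. That is the form of the argument summarized above.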
The vast majority of Alus are located in the richest 40% (in terms of G+C) host DNA,124 and a disproportionate share of these insertions occur into 40–46% G+C host DNA.125 Both the tail and target regions are strongly enriched in A.126 There exists an astonishing positive correlation between (G+C) and CG-dimer* levels in Alus, or CG-dimer islands, and the (G+C) levels in the host DNA.127
The polynucleotide sequences located upstream some 10–20 sites from inserted Alu repeats and other retropseudogenes, are strongly biased towards certain hexamers*,128 and the same holds for L1 elements.129
Of the 1024 (4^5) possible pentanucleotide* patterns that could occur upstream from Alu repeats, only three are by far the most frequent.130 These, and successive, observations are recognized as evidence suggesting,131 and even indicating,132 site-specific insertions for retropseudogenes.
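The figure of 1024 follows from four possible bases at each of five positions. As a minimal sketch, the Python below enumerates every pentanucleotide and tallies which ones occur in a set of upstream windows. The three window strings are hypothetical, invented for illustration, and are not data from the cited study.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def all_kmers(k):
    """Return every possible k-mer over the four DNA bases."""
    return ["".join(p) for p in product(BASES, repeat=k)]


def count_kmers(sequences, k):
    """Tally every overlapping k-mer across the given sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


# Hypothetical upstream windows, invented purely for illustration.
upstream = ["TTAAAAGT", "CTAAAAGG", "GTAAAACT"]

pentamers = all_kmers(5)
counts = count_kmers(upstream, 5)

print(len(pentamers))  # 1024
print(counts.most_common(3))
```

With real upstream sequences, a tally dominated by a handful of the 1024 patterns would be the kind of bias described above.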
There exists a higher level of nonrandomness, one that is largely independent of, and therefore superimposed upon, the departures from randomness discussed thus far. Alu units are found concentrated in mitotic hotspots, early-replicating chromosomal bands, and other genomic locations.133 Moreover, the insertion of both LINEs and SINEs is believed to be strongly governed by the timing of chromosomal events.134 Locally, SINEs are believed to insert into existing breaks in the host DNA.135 Finally, experimental evidence136,137 demonstrates that there are very specific cleavage hotspots, for retropseudogene insertion, in bent or coiled DNA. All of these observations indicate that the widespread independent acquisition of interspersed elements (including retropseudogenes) is a workable proposition.
Can retropseudogenes be directly acquired by one individual organism from another? Some6 try to belittle the fact of horizontally-transmitted* genetic information as much as possible. But the list of known or strongly-suspected instances27 is now too large to be swept under the rug. Newer examples include the surprising discovery of SINE elements shared by distantly-related salmonid species,138 as well as between such evolutionarily-distant creatures as rodents and squids.15 There are also horizontally-shared LINE elements between vertebrate classes.139
It is not difficult to envision parallel occurrences of ‘shared mistakes’ because, as we have seen, coincidences between orthologous pseudogenes of different primates are, as a whole, very inexact. Also, as shown below, the similarities between indisputably unrelated pseudogenes are astonishing, and this indicates that only a limited number of degrees of freedom exist by which any given pseudogene can potentially differ from its paralogous gene, paralogous pseudogene(s), and/or orthologous pseudogene(s).
Consider some additional constraints: the DNA ‘alphabet’ consists of only 4 letters (bases), and the abundances of each nucleotide usually differ significantly from 25%,140 regardless of the etiology of the DNA sequence. Most pseudogenes, in comparison with their coding paralogs, are enriched in the following order: A>T>G>C.51 The same holds for Eta-globin pseudogene orthologs that are ‘progressively older’ insofar as they are shared by progressively more kinds of primates.141 Likewise, the inferred ‘mutational decay’ of AS pseudogenes shows a striking parallel pattern of nucleotide substitutions in different paralogous AS pseudogenes.92
Overall, transitional* nucleotide substitutions occur nearly twice as often as predicted by chance in pseudogenes.142 And, if there is a single base which differs from a consensus of 4 other orthologs, this nonconforming base is very likely to be a transition instead of a transversion*.143 Nor are the bases serially independent. For instance, if its right-side neighbour is G, the nucleotide C is particularly prone to vary, from pseudogene to pseudogene, as a transition.97 Nucleotide triplets also occur at strongly nonrandom frequencies.51
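A chance baseline for transitions can be worked out directly. The minimal Python sketch below assumes that 'chance' means all twelve possible base substitutions are equally likely; the cited study may define its baseline differently. Under that assumption, four of the twelve are transitions, so one third would be expected. The list of observed substitutions is hypothetical, and the function names are illustrative only.

```python
from itertools import permutations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(ref, alt):
    """Classify a single-base substitution as a transition or transversion."""
    if ref == alt:
        raise ValueError("ref and alt must differ")
    pair = {ref, alt}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def transition_fraction(substitutions):
    """Fraction of (ref, alt) substitutions that are transitions."""
    kinds = [classify(ref, alt) for ref, alt in substitutions]
    return kinds.count("transition") / len(kinds)


# All 12 ordered substitutions, treated as equally likely (an assumption).
all_substitutions = list(permutations("ACGT", 2))
expected = transition_fraction(all_substitutions)

# Hypothetical observed substitutions, invented for illustration.
observed_pairs = [("C", "T"), ("G", "A"), ("A", "G"),
                  ("T", "C"), ("A", "C"), ("G", "T")]
observed = transition_fraction(observed_pairs)

print(f"expected under equal likelihood: {expected:.3f}")
print(f"observed in toy data: {observed:.3f}")
```

In this toy case, transitions make up two thirds of the substitutions, twice the one-third baseline.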
As with the example of lightning proved to strike twice, once it is shown that pseudogene alterations can happen independently but coincidentally, ‘shared mistakes’ no longer compel shared evolutionary ancestry. Evolutionists try to get around this by now arguing that genuine synapomorphies* invariably outnumber convergent ones. In most instances, this is a theory-driven assumption, because:
‘One can never tell whether two taxa share a nucleotide state by descent (homology) or chance (analogy).’71
More important, the common supposition that convergent molecular events occur too sporadically or disjointedly to account for the parallel deployment of ‘shared events’ (mistakes or not), in different organisms, is decisively contradicted by recent experimental evidence. Independent nucleotide substitutions144 and indels145,146 can occur in a sufficiently concerted manner to completely obscure accepted ancestor-descendant relationships.
The following is a rigorous example of evolutionists attempting to screen out the effects of convergence. This study101 involved an examination of the 17.2 kb sequence of the Eta-globin pseudogene that is shared by humans, chimps and gorillas. Among nucleotide substitutions, 12 parallel transitions and 7 transversions unique to humans and chimps were found, compared to only 3 total substitutions exclusively shared by humans and distantly-related monkeys. Assuming a random distribution of substitutions, statistical analysis indicated that, at most, 7 of the 12, and 1 of the 7, of the said human-chimp synapomorphies could have arisen fortuitously. But such results do not compel an evolutionary origin because:
‘Naturally, these apparent synapomorphies could still have arisen separately under nonrandom conditions (e.g. if there were selective pressure in two species to preserve the same change, or a propensity of a nucleotide at a particular position to mutation in a particular direction). The simplest explanation, however, is that these changes are actual synapomorphies.’20
Now evolution of humans and chimps from a common ancestor has never been observed; nonrandom base substitutions and conserved orthologous base positions have manifested themselves countless times (and examples of both are reported in this work). So which explanation is simpler? Furthermore, it would take only a very weak common biasing effect (that is, a tiny deviation from randomness), imposed over such a long sequence (17.2 kb) to, at minimum, make up the difference between 7 and 12, and between 1 and 7.
Consider some constraints on pseudogene variance imposed by indels. From pooled data comprising 78 pseudogenes, it is evident that deletions are much more common than insertions. The size distribution of indels is strongly skewed, with over 50% of them only one base in length, and relatively few longer than five bases.8,92 The DNA content deleted from pseudogenes is itself nonrandom, consisting preferentially of repeated elements within short simple tandem arrays.147
Finally, with so many divergent and contradictory phylogenies in existence, at least one of them is bound to fortuitously coincide with the broad outlines of pseudogene deployment, and alteration, among primates. Consider also the following:
‘… the circularity of using inferred phylogenies to infer properties of molecular evolution that themselves influenced the reconstruction.’144
The repeated independent insertion of seemingly orthologous SINE units is facilitated by the (previously noted) fact that each SINE unit can potentially differ by only a very limited degree from another such unit. Were each Alu unit very different from another such unit, the chance of coincidental similarity in different primates, without common evolutionary descent, would be extremely small. Instead, Alus display an average global similarity of 70% to each other,148 and this rises to 81–98% within each Alu family’s respective consensus sequence.149
A ‘census’ of up to 290 base positions150 shows that insertions within Alus are very nonrandom in terms of both the insertion’s position and length. As for nucleotide substitutions, hardly any of the 290 positions display less than a 70% preference for a particular base, with most of the remaining ≤30% dominated by one ‘second choice’. In fact, 195 positions are called CONSBI (conserved before insertion) because fewer than 14% of all Alus deviate from the preferred nucleotide at these positions.151 About half of the remaining sites (23 pairs, 46 total) consist of CG doublet hotspots which are prone to mutate frequently and (phylogenetically) unpredictably from one Alu element to another.83 For this reason, many investigators disregard these in phylogenetic analyses.
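The per-position 'census' can be sketched in a few lines of Python. This is a minimal illustration, assuming the sequences are already aligned to equal length. The eight four-base sequences are hypothetical, and the function names are not from the cited work. Only the 14% threshold comes from the text above.

```python
from collections import Counter


def position_profile(aligned):
    """For each column, return (preferred base, fraction of sequences deviating)."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")
    profile = []
    for column in zip(*aligned):
        base, n = Counter(column).most_common(1)[0]
        profile.append((base, 1 - n / len(column)))
    return profile


def conserved_positions(aligned, max_deviation=0.14):
    """Indices of columns where fewer than max_deviation of sequences deviate."""
    return [i for i, (_, dev) in enumerate(position_profile(aligned))
            if dev < max_deviation]


# Hypothetical toy alignment, invented purely for illustration.
toy = ["GGCC", "GGCT", "GGCC", "GACC", "GGCA", "GGCC", "GGTG", "GGCC"]

for index, (base, deviation) in enumerate(position_profile(toy)):
    print(index, base, f"{deviation:.3f}")
print(conserved_positions(toy))
```

In the toy alignment the first three columns pass the 14% test and the fourth does not, which mirrors the split between conserved positions and more variable ones.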
Such exclusion of nucleotides, however, only raises questions about both the paralogous and orthologous (phylogenetic) significance of the remaining ones. How do we know that the other so-called informative nucleotide substitutions are not also hotspots (albeit less extreme ones)? Nucleotide substitutions would then occur independently in primates in an apparently hierarchical manner, thus creating both the ‘Alu families’ and Alu-based phylogenies, but without making the hotspot locations as obvious. The earlier-discussed evidence for concerted parallel genomic alterations makes the foregoing consideration all the more plausible. Moreover, there is evidence152 that nucleotide substitutions in the L1 during replication are nonrandom.
The factors governing pseudogene deployment and alteration, from primate to primate, are highly nonrandom. Consequently, assertions about the impossibility of independent shared ‘mistakes’6 are incorrect (Fig. 6). The only way that this conclusion could be contradicted would be through the performance of very detailed statistical tests which would examine all of the relevant factors.
A valid statistical test of retropseudogenes must, at a minimum, take into account the following:
1. The fundamental overall nonrandomness (i.e. 50% random similarity in bases51) of the DNA molecule itself.
2. The ubiquitous presence of indels and resulting subjectivity in the alignment of units.
3. The liberties created by the after-the-fact invocation of missing loci.
4. The several different levels of nonrandomness pertaining to the insertion points themselves in the genome.
5. The large number of ‘trials’ (for independent ‘orthologous’ insertions) created by the vast number of known SINE units.
6. The fudge factor created by tolerating varying and often considerable amounts of sequence differences in the flanking sequences (and flanking repeats) when accepting them as orthologous.
7. The limited degree by which one SINE unit can differ from another.
8. The nonrandomness of nucleotide substitutions, indels, etc., in the retropseudogene unit itself.
Considerations 1–3, and 7–8, must likewise be tested in a manner that is relevant to classical pseudogenes.
Until such tests are performed, and rigorously substantiate the premise that classical pseudogenes cannot possibly originate from the independent disabling of orthologous genes in different organisms, and that retropseudogenes cannot be inserted independently in the same corresponding locations in different primates, evolutionistic arguments about shared ‘mistakes’6 should not be given credence.
Not enough is yet known about eukaryotic genomes to construct a comprehensive creationist model of pseudogenes. Nevertheless, the belief that ‘pseudogenes are unequivocal support for evolution’6 is invalid. New evidence is constantly being published that weakens or invalidates one or other long-held evolutionistic beliefs about pseudogenes. Now, more than ever, it is an exciting time to be a creationist scientist.
Alu—A category of well-known SINEs.
Antisense RNA—RNA which copies the DNA from the reverse direction.
Apomorphy—A trait which is unique to the organism in question. It is not shared with either ‘less derived’ or ‘more derived’ organisms.
Base—Denoting the 4 biochemicals (A—Adenine, G—Guanine, C—Cytosine, T—Thymine (U—Uracil in RNA)) that are part of a nucleotide. The information to code for proteins can be stored in sequences of bases.
bp—Abbreviation for base-pair.
Clade—A branching-off point of an organism or closely-related set of organisms relative to presumably-ancestral organisms.
Convergence—The acquisition, by organisms, of shared traits independently (without having inherited them from a shared evolutionary ancestor).
Deletion—The removal of a segment of the DNA sequence followed by reconnection of the free ends of the molecular ‘chain’. Compare Insertion.
Dimer—An association of two Bases.
Direct Repeats—That part of the Flanking Sequence which is duplicated prior to the insertion of the retropseudogene. See Fig. 5. The direct repeats are illustrated in italics.
Eukaryotes—Organisms which have an organized cell nucleus. All living things, except bacteria and archaebacteria, are eukaryotes.
Flanking Sequence—That part of the DNA ‘chain’ which immediately precedes, and immediately comes after, a retropseudogene. See Fig. 5.
Gene Conversion—The process whereby one gene is used as a template to ‘overprint’ another. The latter thereby is forced to resemble the former.
Hexamer—A string of six Bases.
Homoplasy—Convergence and Parallelism.
Homopolymer—A chain of identical bases: AAAAA … , CCCCC … , GGGGG … , or TTTTT … .
Horizontal Transmission—The direct transmission of genetic information from one living individual to another.
Indel—Acronym for Insertion or Deletion. See Fig. 4.
Insertion—The addition of a new segment of the DNA sequence followed by reconnection of the free ends of the molecular ‘chain’. Compare Deletion.
Intergenic—Occurring on the DNA molecule between genes.
Interspersed Repeats—A group of genomic elements which occur in great profusion. Notable interspersed repeats are LINEs and SINEs.
kb—Abbreviation for kilobase; 1000 Bases.
L1—A group of well-known LINEs.
LINE—Long interspersed nuclear element. A group of retropseudogenes that occur in the hundreds of thousands in the human genome, and which are typically about 7,000 bases long.
Locus (Loci)—A specific position on a chromosome.
Nested Hierarchy—A series of progressively narrowly-defined subsets which reflect presumably-increasing evolutionary derivation. For example, a member of the vertebrates gave rise to mammals, a member of the mammals gave rise to primates, and a member of the primates gave rise to humans. See Fig. 2 for an ‘advanced’-primate nested hierarchy.
Nucleotide—A compound of a sugar, phosphate and base. DNA and RNA are composed of nucleotides.
Ortholog—Gene and/or pseudogene which is a counterpart to a similar gene and/or pseudogene in another primate. An ortholog is presumed to be a copy of an ancestral gene sequence. Refer to Fig. 1. Compare Paralog.
Parallelism—The acquisition, by organisms, of shared traits independently (without having inherited them from a shared evolutionary ancestor). See Fig. 6.
Paralog—Copy of the same gene, pseudogene, etc. within the same organism. See Fig. 1. Compare Ortholog.
Pentanucleotide—A chain of five Nucleotides.
Phylogen(-ic, -y)—Related to the construction of an evolutionary ‘tree’.
Poly-A—Consisting of many adenine bases in succession: AAAAAAAA … .
Poly-A tail—A sequence of adenine bases at the end of an RNA molecule or a pseudogene.
Purine—The Bases adenine (A) and guanine (G).
Pyrimidine—The Bases cytosine (C) and thymine (T).
Retro- (-element, -poson, -pseudogene)—A (given structure) created by the reverse transcription (in effect, ‘backfiring’) of RNA back into the host DNA.
SINE—Short interspersed nuclear element. A group of retropseudogenes that occur in the hundreds of thousands in the human genome, and each of which is typically about 300 bases long.
Stop codon—A triplet of Nucleotides which puts a stop to protein synthesis.
Synapomorphy—A trait which is shared by two or more organisms, and which supposedly is the result of a recent common evolutionary ancestor.
Tail—see Poly-A tail.
Transition—In the DNA molecule, the replacement of one Purine by another Purine, or the replacement of one Pyrimidine by another Pyrimidine. Compare Transversion.
Transversion—In the DNA molecule, the replacement of a Purine by a Pyrimidine, or vice-versa. Compare Transition.
If one examines the overall percentage of the primate OR gene repertoire that is occupied by pseudogenes, one does observe a crude increase in percentage relative to increasingly-derived infra-orders of primates. But this crude progression breaks down as soon as one includes the prosimians. These least-derived primates have a percent pseudogene content which overlaps that of even the highly-derived hominoids. Rouquier, S. et al., The olfactory receptor gene repertoire in primates and mouse, Proc. Nat. Acad. Sci. USA 97:2873, 2000.
Elsewhere, an even more conspicuous absence of a common set of gene-inactivating mutations occurs in the delta-globin and psi-eta-globin pseudogenes of the Old World monkeys. Far from being ‘shared mistakes’, the various ostensibly gene-silencing frameshifts, deletions, and point mutations are each unique to rhesus, colobus, and the baboon. Vincent, K.A. and Wilson, A.C., Evolution and transcription of Old World monkey globin genes, J. Molecular Biology 207:466, 478, 1989.