Research Paper

Ordering events in a developing genetic code

Pages 1-8 | Accepted 20 Dec 2023, Published online: 03 Jan 2024

ABSTRACT

Preexisting partial genetic codes can fuse to evolve towards the complete Standard Genetic Code (SGC). Such code fusion provides a path of ‘least selection’, readily generating precursor codes that resemble the SGC. Consequently, such least selections produce the SGC via minimal, thus rapid, change. Optimal code evolution therefore requires delayed wobble. Early wobble encoding slows code evolution, very specifically diminishing the most likely SGC precursors: near-complete, accurate codes which are the products of code fusions. In contrast: given delayed wobble, the SGC can emerge from a truncation selection/evolutionary radiation based on proficient fused coding.

Introduction

A route to the SGC

Recent analysis suggests an evolutionary path to the Standard Genetic Code (SGC) using routine chemical-kinetic rules. It depends only on typical, explicit evolutionary events, yet realizes the SGC within a feasible protobiotic context.

The proposed path [Citation1,Citation2] employs partial codes, formed without wobble, which fuse by combining independently arising coding compartments. These independent codes may have solved different problems during their separate origins, but partial codes with compatible assignments can join to approach ‘complete’ coding (20 amino acids plus start and stop) and ‘full’ coding (all triplets assigned), as in the SGC.

Fusion of partially-assigned codes is decisive, gathering codes that function across a scattered, possibly divergent, population, even before a unified genome exists [Citation3,Citation4]. Such fusions necessarily make more competent, easily selected coding tables. Moreover, fusion has notable corrective powers, because a fusing population excludes deviation: for example, it excludes random assignments, or assignments unrelated to an underlying organizing principle [Citation2]. Because only fused codes with compatible assignments will be highly fit, the population of successfully fused codes is more homogeneous than were its separated, independent precursors. A fusing population therefore converges to a unified, common coding scheme that joins consistent assignments, common among partial ancestral codes. If nucleated by stereochemical assignments consistent with the SGC, such a fused population evolves towards the SGC.

Consistency with stereochemical origins

Therefore, an origin in which some cognate coding triplets existed as conserved functional elements in RNA binding sites for amino acids is plausible [Citation5,Citation6]. Such persistent required RNA sequences could later serve as primordial coding triplets, and would be joined in later code fusions. Triplets need not originate as contact points for bound amino acids, only being physically associated with their amino acids. This in turn is ensured by their experimentally-determined essential functions in characterized binding sites [Citation6], where their conservation and response to mutation have been established.

Building a code

It is unlikely that all such RNA-amino acid interactions have been detected by modern RNA-amino acid affinity selections conducted under laboratory conditions. So, the initial RNA foundation for the SGC may be broader than currently apparent. However, because not all amino acids form RNA complexes equally readily, it is plausible that some SGC assignments were made later, by other means; for example, when early amino acids or peptides collaborated with primordial RNA in coding complexes. As examples, such collaboration could originate with aminoacylated RNAs [Citation7], peptidyl-RNAs formed on an oligonucleotide aminoacylation catalyst [Citation8], noncovalent RNA-peptide complexes [Citation9] or perhaps primordial ribonucleoprotein granules [Citation10]. Continuing code fusion would build such later ribonucleopeptide assignments on a pre-existing amino acid-RNA binding foundation, likely conserving early RNA assignments in the final, near-universal SGC.

Here, the rate of wobble initiation, and thus the timing of wobble appearance, is varied to detect wobble’s effects on evolutionary progress towards the SGC.

Results

Varying wobble onset

This work varies onset times for simple wobble [Citation11]. Figure 1 portrays the fraction of codes making wobble assignments versus time. The probability of wobble onset is varied: Pwob = 0.0, 0.0004, 0.001, 0.005 and 0.02/passage. A passage is the unit of evolutionary time, the interval for one step of code evolution (see Methods and [Citation14]). For example, Pwob = 0.001/passage yields a mean of 1 wobble initiation at 1000 passages. This implies that [1 − e^(−1)] = 63.2% of Pwob = 0.001 environments have begun wobble coding after 1000 passages, as for the blue curve in Figure 1. Below, the time of 63.2% wobble assignment ranges from latest (at infinity, Pwob = 0.0) down to earliest at 50 passages (Pwob = 0.02). Note especially that even with very slow wobble onset (e.g. 0.0004/passage in Figure 1, red), significant commitment to wobble occurs early. Realistic wobble appearance is not sudden, but progresses exponentially; some code environments have begun wobble, even at early times.
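The relation between Pwob and the fraction of environments that have begun wobble can be checked numerically. The sketch below assumes that wobble onset is an independent event with constant probability per passage, so that the exact (geometric) fraction is closely approximated by the exponential expression used in the text. The function names are illustrative and are not taken from the model's Pascal source.

```python
import math

def fraction_wobbling(p_wob: float, passages: int) -> float:
    """Exact fraction of environments that have begun wobble after `passages`
    steps, assuming an independent onset chance p_wob at every passage."""
    return 1.0 - (1.0 - p_wob) ** passages

def fraction_wobbling_exp(p_wob: float, passages: float) -> float:
    """Continuous-time approximation of the same quantity: 1 - exp(-p_wob * t)."""
    return 1.0 - math.exp(-p_wob * passages)

def passages_to_63_percent(p_wob: float) -> float:
    """Mean waiting time 1/p_wob, the time at which about 63.2% of
    environments have begun wobble (infinite when p_wob is zero)."""
    return math.inf if p_wob == 0.0 else 1.0 / p_wob

if __name__ == "__main__":
    # Onset probabilities per passage used in the text.
    for p in (0.0, 0.0004, 0.001, 0.005, 0.02):
        print(p, passages_to_63_percent(p), round(fraction_wobbling_exp(p, 300), 3))
```

Under these assumptions, even Pwob = 0.0004 gives roughly 11% of environments wobbling by 300 passages, which illustrates why slow onset still implies some early wobble.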

Figure 1. Progressive advent of wobble coding. Mean fraction of environmental codon assignments utilizing simple Crick wobble instead of standard base pairing is plotted versus time in passages. Plots cover times discussed in the text, for pwob = 0.0 (brown), 0.0004 (red), 0.001 (blue), 0.005 (green) and 0.02 (orange) per passage. Other probabilities have standard values: pmut (related assignment [Citation12,Citation13] to unused neighbour) = 0.00975, Pdecay (loss of assignment) = 0.00975, Pinit (initial assignment) = 0.15, Prand (random assignment) = 0.05, pfus (code fusion) = 0.002, Ptab (new code arises with one assignment) = 0.08 per passage.

Early wobble and complete coding

Early wobble is obstructive to code evolution (Figure 2). With no wobble (Pwob = 0.0, wobble onset at indefinitely long time), mean time to evolve ≥ 20 encoded functions is about 670 passages under present assumptions (legend, Figure 1). But if instead wobble begins early (Pwob = 0.01, 63.2% at 100 passages), near-complete coding is delayed to about 3500 passages. Wobble hindrance increases approximately linearly for slowly-appearing wobble, but reaches a limit when wobble is essentially always present (rightward, Figure 2). Below, to investigate wobble’s negative effects, conditions at the left of Figure 2 (no wobble, Pwob = 0.0) are compared with those rightward in Figure 2 (wobble appears early, Pwob = 0.02).

Figure 2. Early wobble appearance delays complete coding. Pwob (as in fig. 1) is plotted versus mean time (in passages, for 100 environments) to evolve the first code with ≥ 20 assigned functions in an environment. System constants are those listed in fig. 1. Error bars are standard errors of means.

A time of optimal completeness

Figure 3 begins resolving time effects. It shows the fraction of near-complete, active codes (that is, unfused + successfully fused codes) in 1000 evolving code environments. Codes evolve with (blue, Pwob = 0.02) and without (red, Pwob = 0.0) simple Crick wobble [Citation14]. Under these conditions, a peak of near-complete coding systems (≥20 functions encoded) transiently evolves near 300 passages (Figure 3). But even more striking, early wobble suppresses complete coding (reduces it to ≈ 18%) without greatly changing its timing. Thus, Figure 3 suggests an explanation for evolutionary delays in Figure 2: wobble reduces the number of near-complete codes, the probable precursors of the closely-related SGC. Wobble hinders approach to the SGC. Moreover, wobble inhibition appears permanent, not overcome at later times.

Figure 3. Early wobble obstructs complete coding. Fraction of active codes in 1000 environments that have ≥ 20 assigned functions vs time in passages, for pwob = 0.0 (nowob, squares) and pwob = 0.02 (wob, circles). Except for pwob, probabilities/passage are those in fig. 1. Error bars are standard errors.

Fused and unfused codes make distinguishable contributions

Figure 4A shows the distribution of completion among codes active during peak environments (300 passages, Figure 3). It plots the frequency of each such code versus its number of encoded functions. Active unfused codes with only 2 functions are most frequent (blue, at left), and the relative abundance of more complete codes declines regularly, though codes with 20 and 21 functions exist at low levels among environments at 300 passages (blue, at right).

Figure 4. Early wobble specifically disrupts fusion to more complete, SGC-like codes. A – fused and unfused active codes are distinct. Mean environmental frequencies versus number of encoded functions in 1000 environments after 300 passages. Pwob = 0.0, and other probabilities/passage as in fig. 1. Unfused codes (unfus) are red, successful fusions (fused) are grey, and total active codes (sum of unfused and active fusions) are blue. B – frequencies of fused and unfused (labelled nofus) active codes versus number of assignments, with (labelled wob, solid lines, pwob = 0.02) and without (labelled nowob, dashed lines, pwob = 0.0) wobble. From 1000 environments after 200 passages; other probabilities as in fig. 1. C – frequencies of fused and unfused active codes versus number of assignments, with (marked wob, solid lines, pwob = 0.02) and without (marked nowob, dashed lines, pwob = 0.0) wobble. From 1000 environments after 300 passages; other probabilities as in fig. 1. Horizontal arrows mark codes altered by wobble (≥15 encoded functions) and closest to the SGC (≥20 encoded functions). D – frequencies of fused and unfused (labelled nofus) active codes versus number of assignments, with (solid lines, pwob = 0.02) and without (dashed lines, pwob = 0.0) wobble. From 1000 environments after 500 passages; other probabilities as in fig. 1.

Total codes can be divided into two groups with differing fusion histories. A mostly younger group has never fused (red, Figure 4A). Codes with few functions are mostly unfused, and they contribute little or nothing to the most complete codes. A second group has undergone successful fusions (grey, Figure 4A) and so has gained encoded functions. These fused codes account for almost all near-complete coding (compare blue and grey, Figure 4A; [Citation2]). Thus, fused rather than unfused codes are plausible SGC precursors [Citation1]. So the near-complete code peak formed in the absence of wobble (Figure 3) is populated predominantly by fused codes from the region where blue (all codes) and grey (fused) curves converge (Figure 4A). We can sharpen this conclusion by asking: what is wobble’s effect on near-complete fused codes?

A specific wobble effect on near-complete coding

These data are in Figure 4B (200 passages), 4C (300 passages) and 4D (500 passages). These three populations show competence early in the fusion era (200 passages), pass through the peak of completeness (300 passages), and end with environments likely to be past the point at which an SGC-candidate would have been selected (500 passages). Each panel shows a distribution of coding completeness, paralleling the frequency versus assignments presentation of Figure 4A. However, Figures 4B–D present only data for active unfused and fused codes, plotted together so that non-wobbling (dashed lines) and early wobbling populations (solid lines) can be compared.

The similarities in these figures strengthen conclusions below about the likely path of code evolution. Figures 4B–D all have the same ordinate, so that the decrease in active codes as unsuccessful code fusions accumulate is evident in successive panels. Code fates across the peak of complete codes can be described simply: there is little reproducible change among unfused codes when early wobble occurs. Unfused codes with their few assignments are mostly unchanged by wobble, and this is particularly evident among the most competent unfused codes with many functions encoded.

But fused codes differ greatly from the unfused. In fact, fusion is nearly universal for all codes with ≥ 20 functions, at 200, 300 and 500 passages (Figure 4B–D). Inhibition by early wobble is observed at all times within the code completeness peak (Figure 4B–D). Early wobble selectively quenches production of codes with ≥ 15 functions, and wobble’s inhibition becomes more extreme if more complete encoding is required. At 200 passages, successful fusions with ≥ 20 functions are decreased 4.4-fold by early wobble (Pwob = 0.02, Figure 4B), at 300 passages they are decreased 5.5-fold, and at 500 passages these SGC-like codes are decreased 8.8-fold (Figure 4C, D). Selection of a complete SGC from almost-complete codes will be slowed by wobble, and SGC appearance will be relatively slower the later selection occurs.

Wobble and accurate assignment

Complete assignment of SGC-like functions has so far been emphasized. But there is a parallel wobble effect on code accuracy. Fusing codes with differing assignments for the same triplet produces ambiguous translation, making an ambiguous code fusion less fit. Selection presumably eliminates these discordant codes [Citation2]. Successful fused codes are therefore more homogeneous than their unfused parents, converging on coding assignments common in the initial unfused codes. Convergence to a unique SGC is plausible.

In Figure 5, wobble coding accuracy is explored in 1000 environments at 300 passages, representing the most likely time for a near-complete code (Figure 3). Only active, unfused coding tables are represented. In order to summarize the inhibitory wobble effect, the entire region altered by early wobble is averaged (Figure 4C, ≥ 15 functions encoded). Such averages usefully summarize distributions; however, note that such distributions extend to near-complete codes, some identical to the SGC [Citation2].

Figure 5. The most complete codes in 1000 environments at 300 passages with (wob) and without wobble (nowob), have different accuracies. Frequency among active codes of the most accurate coding (0, 1, 2 or 3 misassignments – mis0, 1, 2 or 3, respectively), with (pwob = 0.02) and without (pwob = 0.0) early wobble. Numbers (e.g. 5.6×) are frequencies for non-wobbling codes divided by frequencies of codes with wobble. Error bars are standard errors for plotted frequencies. Environmental events, save for pwob, have the probabilities in fig. 1.

Early wobble decreases accuracy

Figure 5 concentrates on the most likely precursors for selection of the SGC; that is, codes with one (mis1), two (mis2) or three (mis3) misassignments, or identical (mis0) to the SGC. Addition of wobble, here and elsewhere, always decreases resemblance to the SGC. That is, wobbling codes (wob) are always further from the SGC than non-wobbling codes (nowob). Among these SGC-like codes, the numbers in Figure 5 (e.g. 5.6×) are the fold decrease in accurate coding: that is, integrated abundance of codes identical to the SGC is decreased 5.6-fold by early wobble. This factor declines as distance from the SGC increases: wobbling codes with three differences from the SGC are 2.3-fold fewer than those with postponed wobble. Wobble is more detrimental, the more accurate an SGC precursor is. SGC resemblance among fusing codes will most readily evolve by avoiding wobble while complete coding is being attained.

Discussion

Evolving toward the SGC: completeness

Simplified Crick wobble (only G/U and U/G pairing, [Citation14]) has strong effects if it is probable and becomes frequent early (Figure 1) during codon assignment in primordial codes. Early wobble can prolong near-complete codon assignment by a factor of 5 or more (Figure 2). This negative effect is quantitatively explained (Figure 3) by early wobble’s inhibition of highly complete, proficient code formation arising from early fusion.

Wobble inhibition can be further resolved by the distribution of assignments among nascent codes (Figure 4). Active codes are either yet unfused (red, Figure 4A) or successfully fused (grey, Figure 4A). Many other codes are fused and ambiguous: thus lost, and not further discussed here [Citation2]. Unfused codes dominate coding with few assignments; in contrast, fused codes with summed assignments almost exclusively account for the most complete codes (Figure 4A). When the peak of complete coding (Figure 3) is examined, wobble specifically alters codes completed by fusion (Figure 4B–D). Consequently, wobble’s obstruction occurs during code fusions. Wobble specifically depresses formation of more complete codes (≥15 functions, marked in Figure 4C), depressing the likely precursors of the SGC. Thus, early wobble prevents progress towards code completion, probably because assignments enlarged by wobble present a larger target for conflict between fusing codes. This specifically removes more complete wobbling codes from the fused population, particularly depleting codes that already have many assigned functions (Figure 4B–D).

Evolving toward the SGC: accuracy

Discussion so far is about completing codon assignments. But there is a second wobble effect, on the accuracy of coding. Code fusion not only sums assignments to yield more complete codes, it also causes codes to converge on assignments shared among fusing codes [Citation2]. In fact, fusion creates a prolonged crescendo of improving adherence to shared assignments. Early wobble obstructs this prolonged convergence to assignments with similar origins.

Figure 5 illustrates this second result, showing that among active codes, the four most SGC-like code inventories (mis0, 1, 2 and 3) are all depressed by early wobble. Figure 5 also shows that wobble’s accuracy penalty increases as the SGC is approached. Evolution would likely select a code which was near-complete (Figure 3) if it could do so without compromising approach to a single code [Citation15]. The most homogeneous codes will evolve when wobble arises late.

Late Crick wobble was initially suggested because it helps fill the coding table [Citation14] and helps join sections of fused codes [Citation1]. Present work adds that late wobble helps complete SGC coding and also helps converge to a uniform set of assignments, as in the SGC (Figure 5).

In a simpler coding system, previously studied [Citation16], there was a similarly-timed early optimum for SGC selection, resulting from increase in completeness with time, along with decrease in accuracy with time. Here, codes evolving via fusion again display an optimal time for SGC selection, but with a newfound optimum in completeness (Figure 3), considered alone.

Accurate modern wobble requires multimolecular RNA mechanics

To prevent wobble at first and second codon positions, elaborate multimolecular structures are required. For example, most mutations of the nt 27–43 base pair at the top of the tRNATrp anticodon hairpin increase first-position wobble, with effects up to 40-fold [Citation17]. Mutations throughout central structures of tRNAAla increase both first and second codon position wobble [Citation18], as well as improper third position wobble [Citation19]. Thus, a specific tRNA structure has likely evolved to regulate first, second and third position wobble. As emphasized in previous exposition, there are also multiple tertiary ribosomal RNA structure checks on the shape of the codon-anticodon complex [Citation20,Citation21]. Thus, to complement full coding, complete coding, and accurate coding arguments made here, there is a compelling structural argument that accurate third position wobble would have been adopted late, after a sophisticated translation apparatus had evolved to improve simple base-pairing. Using the structural argument, earlier calculations implemented wobble as a late event [Citation2,Citation14].

Origins of code order

Coding table structure here is attributable to a small number of captures of related codons (via Pmut [Citation14]) and to convergence in assignments due to fusion [Citation2], based on extension from a subset [Citation22,Citation23] of ancient stereochemical assignments due to amino acid:RNA interactions [Citation5,Citation6]. Such stereochemical relations may persist today in regulatory roles, for example, in modern RNA-binding proteins [Citation24]. However, this should not be read as an argument against selection against errors [Citation25,Citation26] or against a code co-evolving with amino acid synthesis pathways [Citation27–30]. Instead, the contrary seems more plausible: a code fused from diverse origins is consistent with varied sources of order in a code assembled from fragments. Diverse SGC ordering principles are also consistent with code regions with varied physicochemical patterns, e.g. within different code columns [Citation31].

Selection of the historical code

Fusion of nascent partial codes notably offers a persistent series of SGC-like codes, termed the ‘crescendo’ [Citation2], for selection as the historical Standard Genetic Code. But what bridged the gap between these long-lasting, varied, SGC-like, late-wobbling precursors and the historical SGC? Because of the crescendo, the final selection plausibly was undemanding: it was a ‘least selection’ [Citation32] from a highly fit subset within a broad code distribution [Citation14]. About 4% of active non-wobbling codes, ≥ 15 functions, are identical to the SGC at 300 passages (Figure 5): this abundance seems favourable for subsequent SGC emergence.

Selections for favoured characteristics are of different kinds (reviewed in [Citation32]). The more effective is truncation selection: accepting only entities better than a threshold [Citation33]. Truncation, uniquely, makes improvements of arbitrary size, given large populations. Further, there is a subtler form of truncation that is frequent, sometimes called an ‘evolutionary radiation’ [Citation32]. In such a radiation, minority possessors of some quality become a dominant biota. All radiations are truncations also, thereby associated with rapid evolution. It seems plausible that more complete late-wobbling codes described here supported faster growth based on more varied, accurate protein synthesis. Accordingly, truncation selection during a translation-sustained radiation is a plausible founding event for the SGC.

Biology as anthology

Other similar paths to biological complexity appear likely. Functional parts arising separately, selectively combined after fusion, genetics or horizontal gene transfer, may provide a general route to multipart, multifunctional competence. This kind of multiply-selected evolutionary progress is deeply characteristic of biological systems. Such unions define biology, including evolution by natural selection as an example. Accordingly, living systems are those that preserve and readily conjoin advances arising separately (compare [Citation34]). But anthology does not offer costless progress; it imposes constraints on concurrent evolutionary events, like wobble coding.

Methods

Calculation

Evolutionary time is calculated in passages. In one passage, the model [Citation14] allows a randomly-chosen codon in each existing coding table to do one and only one of the following, each with its probability Pnnn: make a new assignment (Pinit) of a random function (Prand) or an SGC function and, if wobble has begun (Pwob), also assign the additional simple wobble triplet (only third position G:U and U:G pairs allowed [Citation14]); or an assignment may decay (Pdecay); or an occupied triplet may capture an unassigned neighbour related by a single mutation (Pmut) and assign it its existing function or, alternatively, an amino acid with a closely related polar requirement [Citation12,Citation13]. At each passage, there is a probability that persistent wobble coding will arise (Pwob). Because all events have finite probabilities, a passage can also see no change at all [Citation14].
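A single passage for one coding table might be sketched as follows. This is a simplified illustration, not the author's Pascal implementation. Several details are assumptions of the sketch: Prand is treated as the fraction of initial assignments that are random; wobble is represented by pairing third-position U with C and A with G; start is not modelled as a separate function; a wobble partner is not removed when its parent assignment decays; and capture of a neighbour copies the existing function only (the polar-requirement alternative is omitted). All names are hypothetical.

```python
import random
from itertools import product

BASES = "UCAG"
# Standard Genetic Code, codons in UCAG order; '*' marks stop.
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
SGC = dict(zip(CODONS, AAS))
FUNCTIONS = sorted(set(AAS))  # 20 amino acids plus stop
# Assumed third-position wobble partners (U with C, A with G).
WOBBLE_PARTNER = {"U": "C", "C": "U", "A": "G", "G": "A"}

def neighbours(codon):
    """All codons that differ from `codon` by a single point mutation."""
    return [codon[:i] + b + codon[i + 1:]
            for i in range(3) for b in BASES if b != codon[i]]

def passage(table, wobbling, rng,
            p_init=0.15, p_rand=0.05, p_decay=0.00975, p_mut=0.00975):
    """Apply at most one event to one randomly chosen codon of `table`,
    a dict mapping codon -> encoded function. Returns the same dict."""
    codon = rng.choice(CODONS)
    r = rng.random()
    if codon not in table:
        if r < p_init:  # initial assignment: random or SGC-like
            func = rng.choice(FUNCTIONS) if rng.random() < p_rand else SGC[codon]
            table[codon] = func
            if wobbling:  # also assign the wobble triplet if it is free
                partner = codon[:2] + WOBBLE_PARTNER[codon[2]]
                table.setdefault(partner, func)
    elif r < p_decay:  # loss of an existing assignment
        del table[codon]
    elif r < p_decay + p_mut:  # capture of an unassigned neighbour
        free = [n for n in neighbours(codon) if n not in table]
        if free:
            table[rng.choice(free)] = table[codon]
    return table

if __name__ == "__main__":
    rng = random.Random(0)
    table = {}
    for _ in range(300):
        passage(table, wobbling=False, rng=rng)
    print(len(table), "codons assigned,", len(set(table.values())), "functions")
```

Drawing a single random number per passage and partitioning it among events is one way to keep the events mutually exclusive, so that most passages produce no change when probabilities are small.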

Coding tables arise and undergo passages in an environment [Citation2]: at each passage, a new coding table can appear (Ptab). After multiple coding tables exist in an environment (possibly tens or hundreds ultimately exist), two randomly chosen tables can fuse their coding assignments (Pfus). If the fusing codes have one or more conflicting codon assignments, neither code evolves further. These unsuccessful fusions are usually ignored by including only active tables (unfused plus successfully fused) in summary calculations. Fusion initially increases more-than-linearly with time [Citation2] because of the accumulation of possible fusion partners (via Ptab). However, later coding evolution occurs in a pseudo-steady state with a nearly invariant number of still-evolving coding tables.
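The fusion rule described here can be illustrated with a minimal sketch. It assumes a coding table is represented as a mapping from codon to encoded function, and that an unsuccessful fusion is signalled by returning None; the names `fuse`, `encoded_functions` and `is_near_complete` are hypothetical and are not taken from the original program.

```python
from typing import Dict, Optional

Code = Dict[str, str]  # codon -> encoded function (amino acid, start or stop)

def fuse(code_a: Code, code_b: Code) -> Optional[Code]:
    """Combine two partial codes. Return the merged code, or None when any
    codon shared by both parents carries different assignments."""
    for codon, function in code_a.items():
        if codon in code_b and code_b[codon] != function:
            return None  # ambiguous fusion: neither parent evolves further
    return {**code_a, **code_b}

def encoded_functions(code: Code) -> int:
    """Number of distinct functions a code encodes."""
    return len(set(code.values()))

def is_near_complete(code: Code, threshold: int = 20) -> bool:
    """True when the code encodes at least `threshold` functions."""
    return encoded_functions(code) >= threshold

if __name__ == "__main__":
    a = {"GGU": "G", "GCU": "A"}
    b = {"GCU": "A", "GAU": "D"}   # agrees with a at GCU
    c = {"GCU": "V"}               # conflicts with a at GCU
    print(fuse(a, b))  # compatible: assignments are summed
    print(fuse(a, c))  # conflicting: fusion fails
```

Because only compatible parents yield a merged code, repeated fusion in this sketch both sums assignments and filters out discordant ones, which is the behaviour the text attributes to fusing populations.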

A sufficient number of environments are followed (possibly hundreds or thousands) to get a precise idea of the effect of changes (that is, to get a sufficiently small standard error for some mean environmental quantity). Evolution can stop at varied points: for example, when one coding table in the environment becomes complete or near-complete. Full details of coding history (including coding tables) in each environment can be recorded, and are potentially available for analysis. But usually, only a quantity from the most complete codes, or the average of all environments at a given passage, is of interest. Calculated output is therefore limited and selected to emphasize data applicable to a current question. Examples of code in Pascal (executed in the developmental environment; Lazarus v 2.2.ORC1, Free Pascal 3.2.2 under 64-bit Windows 10) and of analysis in Microsoft Excel (2016) are available on request.
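The summary statistic described here, a mean over environments with its standard error, can be computed as in the sketch below. It assumes one value per independent environment and uses the sample standard deviation divided by the square root of the number of environments; the function name is illustrative only.

```python
import math
import statistics
from typing import Sequence, Tuple

def mean_and_sem(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard error of the mean for one quantity measured once
    in each independent environment."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two environments are needed for a standard error")
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # sample SD / sqrt(n)
    return mean, sem

if __name__ == "__main__":
    # Hypothetical times (in passages) to reach >= 20 functions in four environments.
    times = [610.0, 702.0, 655.0, 713.0]
    print(mean_and_sem(times))
```

Since the standard error shrinks with the square root of the number of environments, following four times as many environments roughly halves the error bar.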

This numerical Monte Carlo kinetic model for code evolution has been diagrammed in standard computing notation [Citation14], code fusion logic has been described [Citation2], sample numerical [Citation35] and coding table [Citation14] output is available, and integration of these findings with SGC history was proposed [Citation36].

Assignments

Pinit is reduced here to ¼ of the value initially used for calculations [Citation14], so evolutionary time in passages is slowed, very approximately, four-fold. However, calculations assume only one or no coding event per passage, so a slowed Pinit yields a more accurate estimate of kinetics. Moreover, Pinit here is the same as in initial analyses of fusion [Citation1,Citation2], so those results should be quantitatively similar.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

The author(s) reported there is no funding associated with the work featured in this article.

References