Most living organisms rely on double-stranded DNA (dsDNA) to store their genetic information and perpetuate themselves. This biological information has been considered the main target of evolution. However, here we show that symmetries and patterns in the dsDNA sequence can emerge from the physical peculiarities of the dsDNA molecule itself and the maximum entropy principle alone, rather than from biological or environmental evolutionary pressure. This randomness accounts for the human codon biases and the context-dependent mutation patterns observed in human populations. Thus, the ‘exceptional symmetries’ of DNA, which emerge from randomness, have to be taken into account when looking for the information encoded in DNA. Our results suggest that the energy constraints of the double helix and, more generally, the physical properties of dsDNA are the hard drivers of the overall DNA sequence architecture, whereas selective biological processes act as soft drivers, which only under extraordinary circumstances overtake the overall entropy content of the genome.
The biological information contained within a dsDNA genome, in terms of a linear sequence of nucleotides, has traditionally been considered the main target of selective pressures and neutral drift [1–3]. However, in this information-centered perspective, certain emerging traits of the genetic code, such as symmetries between nucleotide abundances [4–7], codon preferences [8,9] and context-dependent mutation patterns, are difficult to explain. In 1950, Erwin Chargaff made the important observation that the four nucleotides composing a double helix of DNA (adenine, A; cytosine, C; guanine, G; and thymine, T) are symmetrically abundant (number of A = number of T and number of C = number of G). This symmetry, named Chargaff’s first parity rule, played a crucial role in the discovery, in 1953, of the double helix structure of DNA [12–14]. In 1968, Chargaff extended his original observation into Chargaff’s second parity rule [15–17], which states that the same identities found for double-helix DNA also hold, approximately, on each single strand of the same molecule. In other words, on every single strand of a dsDNA genome, the number of adenines is almost equal to the number of thymines and the number of guanines is almost equal to the number of cytosines. This rule does not hold for single-stranded DNA (ssDNA) genomes, and it has been found to be globally valid for all dsDNA genomes with the exception of mitochondrial genomes [18,19]. An updated confirmation of these previous observations is reported in Figure 1 and in Supplementary Table 1S available online at https://academic.oup.com/bib, based on all reference genomes downloaded from the NCBI repository (ftp://ftp.ncbi.nlm.nih.gov/genomes/).
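As an illustration of what Chargaff’s second parity rule asserts, the following minimal Python sketch counts the four nucleotides on a single strand and reports how far the A/T and C/G pairs are from parity. The function names, the toy sequence and the relative imbalance measure |x − y| / (x + y) are assumptions introduced here for illustration only; they are not the procedure used to produce Figure 1 or Supplementary Table 1S.

```python
from collections import Counter
from typing import Tuple


def nucleotide_counts(strand: str) -> Counter:
    """Count A, C, G and T on one strand, ignoring other symbols (e.g. N)."""
    return Counter(base for base in strand.upper() if base in "ACGT")


def parity_deviations(strand: str) -> Tuple[float, float]:
    """Return the relative A/T and C/G imbalances on a single strand.

    Each value is |x - y| / (x + y): 0.0 means perfect parity,
    1.0 means only one base of the pair is present.
    This measure is an illustrative assumption, not taken from the paper.
    """
    counts = nucleotide_counts(strand)

    def imbalance(x: str, y: str) -> float:
        total = counts[x] + counts[y]
        return abs(counts[x] - counts[y]) / total if total else 0.0

    return imbalance("A", "T"), imbalance("C", "G")


# Hypothetical toy strand, chosen by hand; real genomes would be read from FASTA files.
toy_strand = "ATGCGCATTAGCCGATATGC"
at_dev, cg_dev = parity_deviations(toy_strand)
print(f"A/T imbalance: {at_dev:.3f}, C/G imbalance: {cg_dev:.3f}")
```

On a real dsDNA genome, the second parity rule predicts that both values computed on either single strand are close to zero, whereas for ssDNA or mitochondrial genomes they need not be.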