Abstract
Recent studies show that DNA sequences of the letters A, C, G and T exhibit an inverse power-law frequency spectrum of the form 1/f^a, where f is the frequency and a is the exponent1-5. The inverse power-law form of the power spectra of fractal space-time fluctuations is generic to dynamical systems in nature and is identified as self-organized criticality6-9. In this study it is shown that the power spectra of the frequency distributions of the bases A, C, G and T in Human chromosome 1 DNA exhibit self-organized criticality. DNA is a quasicrystal possessing maximum packing efficiency10 in a hierarchy of spirals or loops. Self-organized criticality implies that non-coding introns may not be redundant, but may serve to organize the effective functioning of the coding exons in the DNA molecule as a complete unit.
DNA topology is of fundamental importance for a wide range of biological processes11. Since the topological state of genomic DNA is important for its replication, recombination and transcription, there is immediate interest in obtaining information about the supercoiled state from sequence periodicities12, 13. Identification of dominant periodicities in the DNA sequence will help to understand the important role of coherent structures in genome sequence organization14, 15. Li16 has discussed meaningful applications of spectral analyses in DNA sequence studies. Recent studies indicate that DNA sequences of the letters A, C, G and T exhibit an inverse power-law frequency spectrum of the form 1/f^a, where f is the frequency and a is the exponent. It is possible, therefore, that the sequences have long-range order1-3, 17-19. Power spectra of fractal space-time fluctuations of dynamical systems such as fluid flows, stock market price fluctuations, heart beat patterns, etc., exhibit an inverse power-law form identified as self-organized criticality6 and represent a self-similar eddy continuum. A general systems theory7-9 developed by the author shows that such an eddy continuum can be visualised as a hierarchy of successively larger scale eddies enclosing smaller scale eddies. Since the large eddy is the integrated mean of the enclosed smaller eddies, the eddy energy (variance) spectrum follows the statistical normal distribution according to the Central Limit Theorem20. Hence the additive amplitudes of eddies, when squared, represent the probabilities, which is also an observed feature of the subatomic dynamics of quantum systems such as the electron or photon21-23.
The long-range correlations intrinsic to self-organized criticality in dynamical systems are signatures of quantumlike chaos associated with the following characteristics: (a) The fractal fluctuations result from an overall logarithmic spiral trajectory with the quasiperiodic Penrose tiling pattern7-9 for the internal structure. (b) Conventional continuous periodogram power spectral analyses of such spiral trajectories will reveal a continuum of wavelengths with progressive increase in phase. (c) The broadband power spectrum will have embedded dominant wavebands, the bandwidth increasing with wavelength, and the wavelengths being functions of the golden mean. The first 13 values of the model predicted7-9 dominant peak wavelengths are 2.2, 3.6, 5.8, 9.5, 15.3, 24.8, 40.1, 64.9, 105.0, 167.0, 275, 445.0 and 720 in units of the block length of 10 bp (base pairs) in the present study. Wavelengths (or periodicities) close to the model predicted values have been reported in weather and climate variability8, prime number distribution24, the distribution of the non-trivial Riemann zeta zeros25 and stock market economics26. (d) The conventional power spectrum plotted as the variance versus the frequency on a log-log scale will now represent the eddy probability density on a logarithmic scale versus the standard deviation of the eddy fluctuations on a linear scale, since the logarithm of the eddy wavelength represents the standard deviation, i.e. the r.m.s. (root mean square) value of the eddy fluctuations. The r.m.s. value of the eddy fluctuations can be represented in terms of the statistical normal distribution as follows. A normalized standard deviation t = 0 corresponds to a cumulative percentage probability density equal to 50 for the mean value of the distribution. For the overall logarithmic spiral circulation the logarithm of the wavelength represents the r.m.s. value of the eddy fluctuations, and the normalized standard deviation t is defined for the eddy energy as

$$t = \frac{\log L}{\log T_{50}} - 1 \qquad (1)$$
The parameter L in Eq. 1 is the wavelength, and T50 is the wavelength up to which the cumulative percentage contribution to total variance is equal to 50, so that t = 0 at L = T50. The variable log T50 also represents the mean value of the r.m.s. eddy fluctuations and is consistent with the concept of the mean level represented by r.m.s. eddy fluctuations. Spectra of time series of fluctuations of dynamical systems, for example meteorological parameters, follow the model predicted universal spectrum8 when plotted as cumulative percentage contribution to total variance versus t.
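As a worked illustration of the normalized standard deviation, the following minimal sketch assumes that Eq. 1 has the form t = log L / log T50 - 1, which gives t = 0 at L = T50 as required. The function names and the sample value T50 = 5.5 (in units of 10 bp) are hypothetical and chosen only for illustration; the comparison with the normal distribution uses the standard cumulative form, which equals 50 percent at t = 0.

```python
import math


def normalized_std_dev(wavelength, t50):
    """Assumed form of Eq. 1: t = log(L) / log(T50) - 1, so t = 0 when L == T50.

    The ratio of two logarithms does not depend on the logarithm base.
    """
    return math.log(wavelength) / math.log(t50) - 1.0


def normal_cumulative_percent(t):
    """Cumulative percentage probability of the standard normal distribution at t."""
    return 50.0 * (1.0 + math.erf(t / math.sqrt(2.0)))


if __name__ == "__main__":
    t50 = 5.5  # hypothetical value, in units of 10 bp blocks
    for wavelength in (5.5, 9.5, 64.9, 720.0):
        t = normalized_std_dev(wavelength, t50)
        pct = normal_cumulative_percent(t)
        print(f"L = {wavelength:6.1f}  t = {t:6.3f}  normal cumulative % = {pct:6.2f}")
```

Because t depends only on the ratio of logarithms, the same value is obtained whether wavelengths are expressed with natural or base-10 logarithms, provided L and T50 share the same unit.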
The Human chromosome 1 DNA base sequence was obtained from the Entrez databases, Homo sapiens Genome (build 30), at http://www.ncbi.nlm.nih.gov/entrez. The first 10 contiguous data sets, comprising a total of 9931745 bases, were scanned to give 280 unbroken data sets of length 35000 bases each for the study. The number of times each of the four bases A, C, G and T occurs in successive blocks of 10 bases was determined, giving 4 groups of 3500 frequency values for each data set.
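The block-counting step can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names are hypothetical, and how gaps or ambiguous symbols were handled when selecting "unbroken" data sets is not stated here, so the splitter simply cuts consecutive non-overlapping pieces and drops any trailing partial piece.

```python
BASES = "ACGT"


def split_into_data_sets(contig, set_len=35000):
    """Cut a contiguous sequence into consecutive data sets of set_len bases.

    Assumption: a trailing piece shorter than set_len is discarded.
    """
    n_sets = len(contig) // set_len
    return [contig[i * set_len:(i + 1) * set_len] for i in range(n_sets)]


def block_base_counts(sequence, block_len=10):
    """For each base, count its occurrences in successive blocks of block_len bases."""
    seq = sequence.upper()
    n_blocks = len(seq) // block_len
    counts = {base: [] for base in BASES}
    for i in range(n_blocks):
        block = seq[i * block_len:(i + 1) * block_len]
        for base in BASES:
            counts[base].append(block.count(base))
    return counts


if __name__ == "__main__":
    toy = "AACCGGTTAA" + "ACGTACGTAC"  # two blocks of 10 bases
    print(block_base_counts(toy))
```

A data set of 35000 bases yields 3500 blocks, so each of the four lists returned for such a data set has 3500 entries, matching the 4 groups of 3500 values described above.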
The power spectra of the frequency distributions of the bases were computed accurately by an elementary but very powerful method of analysis developed by Jenkinson (1977)27, which provides a quasi-continuous form of the classical periodogram, allowing systematic allocation of the total variance and degrees of freedom of the data series to logarithmically spaced elements of the frequency range (0.5, 0). The cumulative percentage contribution to total variance was computed starting from the high frequency side of the spectrum. The power spectra were plotted as cumulative percentage contribution to total variance versus the normalized standard deviation t. The average variance spectra for the 280 data sets and the statistical normal distribution are shown in Fig. 1 for the four bases. The 'goodness of fit' (statistical chi-square test) between the variance spectra and the statistical normal distribution is significant at less than or equal to the 5% level for 98.6, 99.3, 98.9 and 97.9 percent of the 280 data sets for the four bases A, C, G and T, respectively. The average and standard deviation of the wavelength T50, up to which the cumulative percentage contribution to total variance is equal to 50, are also shown in Fig. 1. The power spectra exhibit dominant wavebands where the normalized variance is equal to or greater than 1. The dominant peak wavelengths were grouped into 13 class intervals, 2 - 3, 3 - 4, 4 - 6, 6 - 12, 12 - 20, 20 - 30, 30 - 50, 50 - 80, 80 - 120, 120 - 200, 200 - 300, 300 - 600 and 600 - 1000 (in units of 10 bp block lengths), chosen to include the model predicted dominant peak length scales mentioned earlier. The average class interval-wise percentage frequencies of occurrence of dominant wavelengths are shown in Fig. 2, along with the percentage contribution to total variance in each class interval corresponding to the normalized standard deviation t computed from the average T50 (Fig. 1) for each of the four bases.
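The grouping of dominant peak wavelengths into the 13 class intervals can be illustrated with a short sketch. The function names are hypothetical, and treating each interval as half-open (lower edge included, upper edge excluded) is an assumption, since the handling of boundary values is not stated. The sketch also lists the ratios of successive model predicted peak wavelengths for comparison with the golden mean.

```python
import bisect

# Model predicted dominant peak wavelengths, in units of 10 bp blocks.
PREDICTED_PEAKS = [2.2, 3.6, 5.8, 9.5, 15.3, 24.8, 40.1,
                   64.9, 105.0, 167.0, 275.0, 445.0, 720.0]

# Edges of the 13 class intervals, in units of 10 bp blocks.
CLASS_EDGES = [2, 3, 4, 6, 12, 20, 30, 50, 80, 120, 200, 300, 600, 1000]

GOLDEN_MEAN = (1 + 5 ** 0.5) / 2


def class_interval_index(wavelength):
    """Index of the interval [lower, upper) containing wavelength, or None.

    Assumption: intervals are half-open, so a value on an edge goes to the
    interval that starts at that edge.
    """
    if not CLASS_EDGES[0] <= wavelength < CLASS_EDGES[-1]:
        return None
    return bisect.bisect_right(CLASS_EDGES, wavelength) - 1


def percentage_frequencies(peak_wavelengths):
    """Percentage of the given peak wavelengths falling in each class interval."""
    counts = [0] * (len(CLASS_EDGES) - 1)
    for wavelength in peak_wavelengths:
        idx = class_interval_index(wavelength)
        if idx is not None:
            counts[idx] += 1
    total = sum(counts)
    return [100.0 * c / total if total else 0.0 for c in counts]


def successive_ratios(values):
    """Ratio of each value to the one before it."""
    return [b / a for a, b in zip(values, values[1:])]


if __name__ == "__main__":
    print([class_interval_index(w) for w in PREDICTED_PEAKS])
    print([round(r, 3) for r in successive_ratios(PREDICTED_PEAKS)])
    print(round(GOLDEN_MEAN, 3))
```

With these edges each class interval contains exactly one of the 13 predicted peak wavelengths, which is the stated purpose of the grouping.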
The variance spectra for almost all the 280 data sets exhibit the universal inverse power-law form 1/f^a of the statistical normal distribution (Fig. 1), where f is the frequency and the spectral slope a decreases with increase in wavelength and approaches 1 for long wavelengths. This result is also seen in Fig. 2, where the wavelength class interval-wise percentage frequency distribution of dominant wavelengths closely follows the corresponding computed variation of the percentage contribution to the total variance as given by the statistical normal distribution. The inverse power-law form of the power spectra implies long-range spatial correlations in the frequency distributions of the bases in DNA. Microscopic-scale quantum systems such as the electron or photon exhibit non-local connections or long-range correlations and are visualized to result from the superimposition of a continuum of eddies. Therefore, by analogy, the observed fractal fluctuations of the frequency distributions of the bases exhibit quantumlike chaos in the Human chromosome 1 DNA. The eddy continuum acts as a robust unified whole fuzzy logic network with global response to local perturbations. Therefore, artificial modification of the base sequence structure at any location may have a significant, noticeable effect on the function of the DNA molecule as a whole. Further, the presence of introns, which do not have meaningful code, may not be redundant, but may serve to organize the effective functioning of the coding exons in the DNA molecule as a complete unit2.
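One common way to estimate the exponent a of a 1/f^a spectrum is an ordinary least-squares fit of log power against log frequency. The sketch below shows that generic technique only; it is not the procedure used in this study, the function name is hypothetical, and the spectrum in the usage example is synthetic.

```python
import math


def spectral_exponent(frequencies, powers):
    """Estimate a in power ~ 1/f^a by least squares on the log-log spectrum.

    Returns the negative of the fitted slope of log(power) versus log(frequency).
    """
    xs = [math.log(f) for f in frequencies]
    ys = [math.log(p) for p in powers]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


if __name__ == "__main__":
    # Synthetic spectrum with a known exponent, for illustration only.
    freqs = [0.01 * k for k in range(1, 51)]
    powers = [f ** -1.0 for f in freqs]
    print(spectral_exponent(freqs, powers))
```

Applying the fit separately to short-wavelength and long-wavelength portions of a spectrum gives local exponents, which is one way to examine a slope that varies with wavelength.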
The results imply that the DNA base sequence self-organizes spontaneously to generate the robust geometry of a logarithmic spiral with the quasiperiodic Penrose tiling pattern for the internal structure. The space-filling geometric figure of the Penrose tiling pattern has intrinsic local five-fold symmetry28 and ten-fold symmetry. One of the three basic components of DNA, deoxyribose, is a five-carbon sugar and may represent the local five-fold symmetry of the quasicrystalline structure of the quasiperiodic Penrose tiling pattern of the DNA molecule as a whole. The DNA molecule shows ten-fold symmetry in the arrangement of 10 bases per turn of the double helix. The study of plant phyllotaxis in botany shows that quasicrystalline structure provides maximum packing efficiency for seeds, florets, leaves, etc29, 10, 30. The quasicrystalline structure of the quasiperiodic Penrose tiling pattern may be the geometrical structure underlying the packing of 10^3 to 10^5 micrometres of DNA in a eukaryotic (higher organism) chromosome into a metaphase structure a few microns long. The spatial geometry of the DNA is therefore organized into a hierarchy of helical structures. Such a concept may explain the observed loops of DNA in the metaphase chromosome31. For example, the average class interval-wise percentage distribution of dominant periodicities shows a peak in the wavelength interval 6 - 12 in units of 10 bp, i.e. 60 to 120 bp, for all four bases (Fig. 2). This predominant wavelength interval of 60 to 120 bp may correspond to the coil length of each of the two DNA coils on the basic nucleosome unit of the chromatin fibre. Also, the value of T50 ranges from 5 to 6 in units of 10 bp, i.e. from 50 to 60 bp (Fig. 1), indicating again the predominance of the fundamental coil length in the double coil of DNA in nucleosomes.