r-scan statistics of a Poisson process with events transformed by duplications, deletions, and displacements

Chingfer Chen; Samuel Karlin

doi:10.1239/aap/1189518639



Published online by Cambridge University Press: 01 July 2016


Affiliation (both authors): Stanford University
Postal address: Department of Mathematics, Stanford University, Stanford, CA 94305, USA.


Abstract


A stochastic model of a dynamic marker array, in which markers can disappear, duplicate, and move relative to their original positions, is constructed to reflect the nature of long DNA sequences. The sequence changes of deletion, duplication, and displacement follow these stochastic rules: (i) the original distribution of the marker array {…, X−2, X−1, X0, X1, X2, …} is a Poisson process on the real line; (ii) each marker is replicated l times; replication or loss of marker points occurs independently; (iii) each replicated point is independently and randomly displaced by an amount Y relative to its original position, with the displacements Y sampled from a continuous density g(y). Limiting distributions for the maximal and minimal statistics of the r-scan lengths (the collection of distances between r + 1 successive markers) for the l-shift model are derived with the aid of the Chen-Stein method and properties of Poisson processes.
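Rules (i) to (iii) can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' construction: it works on a finite window rather than the whole real line, it reads "replicated l times" as drawing a random copy count l for each marker with l = 0 meaning deletion, and it takes r-scan lengths to be the distances between the i-th and (i + r)-th markers after sorting. The names `simulate_l_shift`, `r_scan_lengths`, `copy_probs`, and `displace`, as well as the Gaussian choice for g, are hypothetical and chosen only for illustration.

```python
import random


def simulate_l_shift(rate, length, copy_probs, displace, rng):
    """Simulate transformed markers from originals on [0, length).

    copy_probs[k] is the assumed probability that a marker leaves k copies
    (k = 0 is a deletion); displace(rng) draws one displacement Y.
    """
    # (i) Original markers: homogeneous Poisson process via exponential gaps.
    originals = []
    x = rng.expovariate(rate)
    while x < length:
        originals.append(x)
        x += rng.expovariate(rate)

    # (ii) Draw an independent copy count for each marker;
    # (iii) displace every copy independently.
    counts = list(range(len(copy_probs)))
    markers = []
    for x in originals:
        n_copies = rng.choices(counts, weights=copy_probs)[0]
        for _ in range(n_copies):
            markers.append(x + displace(rng))
    markers.sort()
    return markers


def r_scan_lengths(markers, r):
    """Distances spanned by r + 1 successive markers in a sorted list."""
    return [markers[i + r] - markers[i] for i in range(len(markers) - r)]


rng = random.Random(0)
markers = simulate_l_shift(
    rate=1.0,
    length=1000.0,
    copy_probs=[0.2, 0.6, 0.2],           # assumed: delete, keep, duplicate
    displace=lambda g: g.gauss(0.0, 0.5),  # assumed Gaussian density for Y
    rng=rng,
)
scans = r_scan_lengths(markers, r=3)
print(len(markers), min(scans), max(scans))
```

The minimum and maximum of `scans` are the finite-window analogues of the minimal and maximal r-scan statistics whose limiting distributions the paper studies. Displaced copies near the ends of the window may fall outside [0, length), so edge effects are ignored in this sketch.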

Keywords

r-scan statistic; Poisson process; Chen-Stein method; Poisson approximation

MSC classification

Primary: 60H30: Applications of stochastic analysis (to PDE, etc.)

Secondary: 60G70: Extreme value theory; extremal processes

Type: General Applied Probability
Information: Advances in Applied Probability, Volume 39, Issue 3, September 2007, pp. 799-825

DOI: https://doi.org/10.1239/aap/1189518639
Copyright: Copyright © Applied Probability Trust 2007

