Fast String Matching in Stationary Ergodic Sources

John Shawe-Taylor

doi:10.1017/S0963548300002169

Published online by Cambridge University Press: 12 September 2008

Affiliation: Department of Computer Science, Royal Holloway and Bedford New College, University of London, Egham, Surrey TW20 0EX, UK. E-mail: john@dcs.rhbnc.ac.uk

Abstract

A connection is made between the theory of ergodicity and the expected complexity of string searching. In particular, a substring search algorithm is introduced which, when applied to searching in text that has been produced by an appropriate stationary ergodic source, has an expected running time of O((N/m + m) log m), for a text string of length N and search string of length m. Similar expected complexity results have been obtained before, but the analysis is performed in a significantly more general framework, which models with greater accuracy the statistics of many types of strings, including natural language. The analysis also sheds light on the performance of the Boyer-Moore algorithm and the Sunday algorithm when applied to natural language.
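For orientation, the following is a minimal sketch of the Sunday ("quick search") algorithm named in the abstract, as it is commonly presented following reference [3]. It is not the algorithm introduced in this paper, whose details are not given on this page. The function name `sunday_search` is a hypothetical choice made for this illustration.

```python
def sunday_search(text: str, pattern: str) -> list[int]:
    """Return the start index of every occurrence of pattern in text.

    Illustrative sketch of Sunday's quick-search heuristic; the name
    sunday_search is an assumption, not taken from the paper.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # For each character, the distance from its last occurrence in the
    # pattern to the position one past the end of the pattern.
    shift = {ch: m - i for i, ch in enumerate(pattern)}

    matches = []
    pos = 0
    while pos <= n - m:
        # Compare the current window with the pattern.
        if text[pos:pos + m] == pattern:
            matches.append(pos)
        nxt = pos + m  # index of the character just past the window
        if nxt >= n:
            break
        # Characters absent from the pattern allow a jump of m + 1.
        pos += shift.get(text[nxt], m + 1)
    return matches


if __name__ == "__main__":
    print(sunday_search("abracadabra", "abra"))
```

The shift is decided by the character immediately after the current window, so a character that does not occur in the pattern lets the search skip m + 1 positions at once. This is the kind of behaviour whose dependence on the statistics of the text the abstract refers to.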

Information

Type: Research Article
Published in: Combinatorics, Probability and Computing, Volume 5, Issue 4, December 1996, pp. 415–427

DOI: https://doi.org/10.1017/S0963548300002169
Copyright: Copyright © Cambridge University Press 1996

References

[1] Knuth, D. E., Morris, J. H. and Pratt, V. R. (1977) Fast pattern matching in strings. SIAM J. Comput. 6 323–350.

[2] Smit, G. V. (1982) A comparison of three string matching algorithms. Software - Practice & Experience 12 57–66.

[3] Sunday, D. M. (1990) A very fast substring search algorithm. Comm. ACM 33 132–142.

[4] Boyer, R. S. and Moore, J. S. (1977) A fast string searching algorithm. Comm. ACM 20 762–772.

[5] Guibas, L. J. and Odlyzko, A. M. (1980) A new proof of the linearity of the Boyer-Moore string searching algorithm. SIAM J. Comput. 9 672–682.

[6] Baeza-Yates, R. A. (1989) String searching algorithms revisited. Proc. Workshop in Algorithms and Data Structures. Lecture Notes in Computer Science 382, pp. 75–96. Springer-Verlag.

[7] Horspool, N. (1980) Practical fast searching in strings. Software - Practice & Experience 10 501–506.

[8] Schaback, R. (1988) On the expected sublinearity of the Boyer-Moore algorithm. SIAM J. Comput. 17 648–658.

[9] Yao, A. C-C. (1979) The complexity of pattern matching for a random string. SIAM J. Comput. 8 368–387.

[10] Shannon, C. E. (1948) A mathematical theory of communication. Bell Syst. Tech. J. 27 379–423, 623–656.

[11] Kim, J. Y. and Shawe-Taylor, J. S. (1994) Fast expected string matching using an n-gram algorithm. Software - Practice & Experience 24 79–88.

[12] Welsh, D. (1988) Codes and Cryptography. Oxford University Press.

[13] Billingsley, P. (1965) Ergodic Theory and Information. Wiley.

[14] Thomasian, A. J. (1960) An elementary proof of the AEP of information theory. Ann. Math. Statist. 31 452–456.

[15] Kim, J. Y. and Shawe-Taylor, J. S. (1992) An approximate string matching algorithm. Theor. Comput. Sci. 92 107–117.
