Stochastic Models for Type Counts in a Literary Text

Published online by Cambridge University Press:  05 September 2017

Abstract

This paper studies a Markov chain model for type counts {X_n} in a literary text. First, a homogeneous Markov chain in discrete time is considered. This is then embedded in a continuous time Poisson process; the probability generating function for the resulting continuous time Markov chain is obtained. Expectations and variances of type counts are found for different values of the token count and various sizes M of an author's vocabulary; these results are finally tested against known data for three of Shakespeare's plays.
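
The abstract does not state the chain's transition probabilities, so the following is only an illustrative sketch under a loudly stated assumption: every token is drawn uniformly at random from a vocabulary of M types, so that the type count moves from j to j + 1 with probability 1 - j/M and otherwise stays at j. This is the classical occupancy chain, not necessarily the model used in the paper. The function names (`expected_types`, `variance_types`, `simulate_types`) are hypothetical and chosen for this example only.

```python
import random


def expected_types(n, M):
    """Mean type count after n tokens, ASSUMING uniform draws from M types."""
    return M * (1.0 - (1.0 - 1.0 / M) ** n)


def variance_types(n, M):
    """Variance of the type count under the same uniform-draw assumption.

    Derived from the number of unused types Y = M - X_n, for which
    E[Y] = M (1 - 1/M)^n and E[Y (Y - 1)] = M (M - 1) (1 - 2/M)^n.
    """
    q1 = (1.0 - 1.0 / M) ** n
    q2 = (1.0 - 2.0 / M) ** n
    return M * q1 + M * (M - 1) * q2 - (M * q1) ** 2


def simulate_types(n, M, rng):
    """Run the assumed homogeneous chain for n tokens and return X_n."""
    x = 0
    for _ in range(n):
        # With probability 1 - x/M the next token is a type not yet seen.
        if rng.random() < 1.0 - x / M:
            x += 1
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    n, M = 1000, 5000  # illustrative values only, not taken from the paper
    runs = [simulate_types(n, M, rng) for _ in range(200)]
    print("simulated mean:", sum(runs) / len(runs))
    print("formula mean:  ", expected_types(n, M))
    print("formula var:   ", variance_types(n, M))
```

Under this assumption the mean and variance are available in closed form, so the simulation serves mainly as a check on the formulas for chosen values of the token count n and vocabulary size M.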

Type
Part VIII — Probability Models in the Humanities
Copyright
Copyright © 1975 Applied Probability Trust 
