Book contents
- Frontmatter
- Contents
- From the Editors
- Notes on Contributors
- 1 Introduction: Language Variation Studies and Computational Humanities
- 2 Panel Discussion on Computing and the Humanities
- 3 Making Sense of Strange Sounds: (Mutual) Intelligibility of Related Language Varieties. A Review
- 4 Phonetic and Lexical Predictors of Intelligibility
- 5 Linguistic Determinants of the Intelligibility of Swedish Words among Danes
- 6 Mutual Intelligibility of Standard and Regional Dutch Language Varieties
- 7 The Dutch-German Border: Relating Linguistic, Geographic and Social Distances
- 8 The Space of Tuscan Dialectal Variation: A Correlation Study
- 9 Recognising Groups among Dialects
- 10 Comparison of Component Models in Analysing the Distribution of Dialectal Features
- 11 Factor Analysis of Vowel Pronunciation in Swedish Dialects
- 12 Representing Tone in Levenshtein Distance
- 13 The Role of Concept Characteristics in Lexical Dialectometry
- 14 What Role does Dialect Knowledge Play in the Perception of Linguistic Distances?
- 15 Quantifying Dialect Similarity by Comparison of the Lexical Distribution of Phonemes
- 16 Corpus-based Dialectometry: Aggregate Morphosyntactic Variability in British English Dialects
12 - Representing Tone in Levenshtein Distance
Published online by Cambridge University Press: 12 September 2012
Summary
Levenshtein distance, also known as string edit distance, has been shown to correlate strongly with both perceived distance and intelligibility in various Indo-European languages (Gooskens and Heeringa, 2004; Gooskens, 2006). We apply Levenshtein distance to dialect data from Bai (Allen, 2004), a Sino-Tibetan language, and Hongshuihe (HSH) Zhuang (Castro and Hansen, accepted), a Tai language. In applying Levenshtein distance to languages with contour tone systems, we ask the following questions: 1) How much variation in intelligibility can tone alone explain? and 2) Which representation of tone results in the Levenshtein distance that shows the strongest correlation with intelligibility test results? This research evaluates six representations of tone: onset, contour and offset; onset and contour only; contour and offset only; target approximation (Xu & Wang, 2001); autosegments of H and L; and Chao's (1930) pitch numbers. For both languages, the more fully explicit onset-contour-offset and onset-contour representations showed significantly stronger inverse correlations with intelligibility. This suggests that, for cross-dialectal listeners, the optimal representation of tone in Levenshtein distance should be at a phonetically explicit level and include information on both onset and contour.
INTRODUCTION
The Levenshtein distance algorithm measures the phonetic distance between closely related language varieties by counting the cost of transforming the phonetic segment string of one cognate into that of another by means of insertions, deletions and substitutions. After Kessler (1995) first applied the algorithm to dialect data in Irish Gaelic, Heeringa (2004) showed that cluster analysis based on Levenshtein distances agreed remarkably well with expert consensus on Dutch dialect groupings.
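The edit-cost computation described above can be illustrated with a minimal sketch in Python. It assumes unit costs for insertion, deletion and substitution and treats each word as a list of segment tokens. The function name `levenshtein`, the cost parameters and the example words are hypothetical: the words are invented for illustration and are not taken from the Bai or Zhuang data, and the two tone tokenisations shown are not the chapter's six representations.

```python
def levenshtein(source, target, ins_cost=1, del_cost=1, sub_cost=1):
    """Return the minimum total cost of editing `source` into `target`.

    Both arguments are sequences of segment tokens (e.g. lists of strings).
    Unit costs are an assumption of this sketch.
    """
    # prev[j] = cost of turning the source prefix seen so far into target[:j]
    prev = [j * ins_cost for j in range(len(target) + 1)]
    for i, s in enumerate(source, start=1):
        curr = [i * del_cost]  # delete every source segment seen so far
        for j, t in enumerate(target, start=1):
            curr.append(min(
                prev[j] + del_cost,                       # delete s
                curr[j - 1] + ins_cost,                   # insert t
                prev[j - 1] + (0 if s == t else sub_cost),  # match or substitute
            ))
        prev = curr
    return prev[-1]


# Invented cognate pair with the tone written as one pitch-number token.
tone_as_one_token = levenshtein(["p", "a", "55"], ["b", "a", "31"])

# Same invented pair with the tone split into two separate pitch tokens.
tone_as_two_tokens = levenshtein(["p", "a", "5", "5"], ["b", "a", "3", "1"])

print(tone_as_one_token, tone_as_two_tokens)
```

In this toy pair the distance changes with the tokenisation alone, which is why the choice of tone representation can affect how well the resulting distances track intelligibility.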
- Type: Chapter
- Information: Computing and Language Variation, International Journal of Humanities and Arts Computing Volume 2, pp. 205-220
- Publisher: Edinburgh University Press
- Print publication year: 2009