Evaluating evidence for the reliability and validity of lexical diversity indices in L2 oral task responses

Published online by Cambridge University Press: 30 August 2023

Kristopher Kyle*
Affiliation:
Department of Linguistics, University of Oregon, Eugene, OR, USA
Hakyung Sung
Affiliation:
Department of Linguistics, University of Oregon, Eugene, OR, USA
Masaki Eguchi
Affiliation:
Department of Linguistics, University of Oregon, Eugene, OR, USA
Fred Zenker
Affiliation:
Department of Second Language Studies, University of Hawaii at Manoa, Honolulu, HI, USA
*Corresponding author: Kristopher Kyle; Email: kkyle2@uoregon.edu

Abstract

Although lexical diversity is often used as a measure of productive proficiency (e.g., as an aspect of lexical complexity) in SLA studies involving oral tasks, relatively little research has been conducted to support the reliability and/or validity of these indices in spoken contexts. Furthermore, SLA researchers commonly use indices of lexical diversity such as Root TTR (Guiraud’s index) and D (vocd-D and HD-D) that have been preliminarily shown to lack reliability in spoken L2 contexts and/or have been consistently shown to lack reliability in written L2 contexts. In this study, we empirically evaluate lexical diversity indices with respect to two aspects of reliability (text-length independence and across-task stability) and one aspect of validity (relationship with proficiency scores). The results indicated that neither Root TTR nor D is reliable across different text lengths. However, support for the reliability and validity of optimized versions of MATTR and MTLD was found.
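
To make the text-length issue concrete, the following is a minimal Python sketch of two of the indices named in the abstract: Root TTR (types divided by the square root of tokens) and MATTR (the mean type-token ratio over a fixed-size moving window). It is an illustration only, not the authors' implementation. The function names, the whitespace tokenization, the toy text, the window size, and the fallback to plain TTR for texts shorter than the window are all assumptions made for this example.

```python
from math import sqrt


def root_ttr(tokens):
    """Root TTR (Guiraud's index): number of types / sqrt(number of tokens)."""
    if not tokens:
        raise ValueError("tokens must be non-empty")
    return len(set(tokens)) / sqrt(len(tokens))


def mattr(tokens, window=50):
    """Moving-average TTR: mean TTR over every run of `window` consecutive tokens.

    The default window of 50 is an illustrative assumption.
    """
    if not tokens:
        raise ValueError("tokens must be non-empty")
    if len(tokens) < window:
        # Assumption for this sketch: fall back to plain TTR for short texts.
        return len(set(tokens)) / len(tokens)
    ttrs = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ttrs) / len(ttrs)


# Toy data (hypothetical): the same six-word sentence repeated, so the
# vocabulary stays fixed while the text grows longer.
long_text = "the cat sat on the mat".split() * 20   # 120 tokens
short_text = long_text[:60]                          # 60 tokens

print("Root TTR:", root_ttr(short_text), root_ttr(long_text))
print("MATTR:   ", mattr(short_text, window=6), mattr(long_text, window=6))
```

In this toy case Root TTR drops as the text doubles in length even though the vocabulary is unchanged, while MATTR with a window of six tokens gives the same value for both lengths. Real learner speech is not a repeated sentence, so this only shows the mechanism behind text-length dependence; it says nothing about the size of the effect in actual data.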

Type
Methods Forum
Copyright
© The Author(s), 2023. Published by Cambridge University Press
