Publisher: Cambridge University Press
Online publication date: July 2022
Print publication year: 2022
Online ISBN: 9781009070461
Subjects: Research Methods in Linguistics, Applied Linguistics, Language and Linguistics

Book description

This Element explores relationships between collocations, writing quality, and learner and contextual variables in a first-year composition (FYC) programme. Comprising three studies, the Element is anchored in understanding phraseological complexity and its sub-constructs of sophistication and diversity. First, the authors examine sophistication through association measures, using a cluster analysis to explore how these measures capture different types of information about collocation. Second, measures selected from this clustering are entered into a cumulative link model to establish relationships between writing quality scores and these measures, measures of diversity, measures of task, the language background of the writer, and individual writer variation. Third, a qualitative study of the statistically significant predictors helps explain how writers use collocations and why raters might favour or downgrade them. The Element concludes by considering the implications of this modelling for assessment.

