Language choice and gender in a Nordic social media corpus

Published online by Cambridge University Press: 15 July 2019

Steven Coats*
Affiliation:
English Philology, Faculty of Humanities, University of Oulu, 90014 Oulu, Finland.
*Email for correspondence: Steven.Coats@oulu.fi

Abstract

This study analyzes language choice, bi- and multilingualism, and gender in a corpus of over 22 million Twitter messages by almost 36,000 authors from the Nordic countries and territories. Author location, gender, and tweet language are identified using a novel method. Three principal findings are discussed: First, gendered preference for particular languages in the Nordics can be explained in part by patterns of gendered migration. Second, a distinct geographical pattern of female/male preference for the national languages of the region and for English is evident for users who are likely native users of a Nordic language: Females are more likely to use English, while males are more likely to use a Nordic language. Third, while high rates of bi- and multilingualism are found across the whole sample, males are more likely to use more than one language in all the Nordic countries/territories. The latter two findings are interpreted in light of sociolinguistic considerations as evidence for incipient language shift towards English for Nordic users on the Twitter platform.

Type
Research Article
Copyright
© Nordic Association of Linguistics 2019 
