Data visualization and analysis in second language research grew out of workshops developed and delivered by Guilherme D. Garcia at McGill, Concordia, and Ball State universities. These origins are evidenced by the conversational tone and exercise-based approach used throughout the book. Garcia aims to train readers to use the R programming language (through the RStudio interface) to construct robust statistical analyses while producing clear and informative plots and figures. Instead of traditional t-tests and ANOVAs, this book focuses on regression analyses, hierarchical models (a.k.a. mixed-effects models), and Bayesian statistics.
Part I, “Getting Ready”, includes a straightforward introduction detailing some of the author's rationale and offers a brief review of some fundamental notions: Garcia assumes his readers have some basic knowledge of statistics (e.g., sample vs. population means, p-values, effect sizes, confidence intervals, standard errors, t-tests, ANOVA) but no prior experience with R or the more advanced statistics detailed in later sections. The second chapter, “R Basics”, explains why this specific program is preferred over others and walks readers through the R/RStudio setup, including the installation of important packages and some initial calculations and script blocks, as well as more general concepts relating to data organization (e.g., tidy data, as in Wickham 2014).
Part II, “Visualizing the Data”, includes chapters 3–5: “Continuous Data”, “Categorical Data” and “Aesthetics: Optimizing Your Figures”. It focuses on the presentation of data through plots and figures, which Garcia rightly insists need to communicate clearly the focus of the research and should help inform decisions relevant to later analyses. In this section the author aspires to convince readers of “the numerous advantages of visualizing patterns before statistically analyzing them” (p. 239). Some of the technical programming aspects found within these chapters include transforming binary/categorical data into continuous (hence plottable) variables, and the use of facets (i.e., layers) that allow the combination of plot types and the improvement of figures’ explanatory potential.
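To make the binary-to-continuous step concrete, here is a minimal sketch of the general idea. It is written in Python rather than the book's R and uses invented data; the group labels and the function name `proportion_correct` are hypothetical and are not taken from Garcia's materials.

```python
from collections import defaultdict

# Hypothetical trial-level data: (group, correct), with correct coded as 0 or 1.
responses = [
    ("beginner", 1), ("beginner", 0), ("beginner", 0), ("beginner", 1),
    ("advanced", 1), ("advanced", 1), ("advanced", 0), ("advanced", 1),
]


def proportion_correct(rows):
    """Collapse 0/1 responses into one proportion per group."""
    totals = defaultdict(int)  # sum of correct responses per group
    counts = defaultdict(int)  # number of trials per group
    for group, correct in rows:
        totals[group] += correct
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in counts}


# Each group now has a single continuous value that can be placed on an axis.
proportions = proportion_correct(responses)
print(proportions)
```

Aggregating in this way trades trial-level detail for a value that can be plotted directly, which is why such a step suits exploratory figures rather than the later statistical models.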
Part III, “Analyzing the Data”, constitutes the densest and most demanding portion of the book. It begins with generalized linear models in chapters 6–8: “Linear Regression”, “Logistic Regression”, and “Ordinal Regression”. Here Part III runs parallel to Part II, first handling continuous data before adapting the relevant techniques to binary and ordinal data. Chapters 9 and 10 may seem to come late, considering that they represent the author's stated end goals for his readers: “Hierarchical Models” and “Going Bayesian”. However, by the time these are introduced, they flow quite naturally and convincingly from earlier chapters. As Garcia reassures readers, “Running hierarchical versions of our models in R is actually quite easy, i.e., the code is nearly identical to what we have already seen. The tricky part is not the code, but the statistical concept behind it” (p. 194). He works to explain that concept through comparisons and his usual walkthrough style. He does the same when bringing Bayesian statistics into the fold, insisting on the shortcomings of the frequentist statistics that constitute the norm, and on the intuitiveness of the alternative. Finally, the essence of the author's intent for this book is summarized in chapter 11, which contains brief “Final Remarks”.
The book packs quite a lot of information into its fewer than 300 pages. Garcia has a good sense of how much material is digestible at one time and leaves more in-depth explanations to others through several references to additional readings (e.g., Wickham and Grolemund 2016, for R; Sonderegger et al. 2018, or Winter 2019, ch. 14–15, for mixed-effects models; Kruschke 2015, or McElreath 2020, for Bayesian stats, etc.). Indeed, there exist other, more thorough (and voluminous) guides to R or statistics (or statistics using R), but this book's strength lies in giving readers just enough to enable them to quickly apply their newly acquired knowledge and skills to their own data in order to produce complex, journal-worthy analyses. The book is timely, with increasing expectations for more refined accounts of the diverse populations and intricate results stemming from studies of second language acquisition and bi/plurilingualism, as well as other fields of linguistic research. That said, the author's general disregard for standard reporting of p-values (his preference being towards confidence intervals and effect sizes) seems overly dismissive: though p-values have been misreported or misrepresented on occasion, they have not lost all usefulness.
Thorough and well-structured, the book encourages readers to adopt good organizational habits and includes extensive details on file management (summarized in Appendix D). Each chapter ends with a recap of its content and some extra exercises to try on one's own. Each section builds on the previous one, following a logical progression and assuming readers have processed what came before. This cumulative construction could, however, limit the book's relevance for those who might have wished to learn about specific subtopics; it can also lead to significant time spent leafing back and forth. For example, the link to access the book's associated data files (required to follow along with the explanations and exercises, and available through the Open Science Framework) is given in the preface, with the mention that said files will not be of use until chapter 2; the link is never repeated, nor is there a later reminder of where it can be found. Fortunately, references to previous material are generally clearly indicated.
While the data files, once located, are freely available for download, such is not the case for the R scripts and code blocks – nor can they be copy-pasted from the book's electronic version. The author acknowledges this as a conscious decision: “You should manually type the code in R every time: it can be a little time-consuming, but it's certainly the best way to learn R – make sure you double-check your code, as it's very easy to miss a symbol or two.” (p. xix). While arguably pedagogical, this choice may again reduce the book's audience to those who have the time and energy to fully commit to Garcia's approach.
The code transcriptions work as they should for those using the versions of R and RStudio indicated in the book, that is, the latest available at the time of publication. Users with previously downloaded versions will probably want to update theirs; otherwise, some commands may not operate exactly as stated. Future users of the book are also likely to encounter such issues: R evolves rather quickly, being open source and widely used. However, this should require no more than minor adjustments.
If some of these observations seem petty, it is because such irritants may aggravate what could already be an exercise in frustration for some users: R remains a programming language, the syntax of which is rarely straightforward for non-programmers. If we truly wish to strive for across-the-board better statistical analyses in linguistic research, more user-friendly tools will be required. While Garcia lauds the merits of R over, say, IBM's Statistical Package for the Social Sciences (SPSS), he does not mention options such as JASP (JASP Team 2021): a free, open-source program with a point-and-click interface that allows for Bayesian calculations and hierarchical modeling. While staunch R advocates will argue that the inability to personalize code in such programs means sacrificing too much control, they may suffice for a great number of users.
Still, though requiring the patience to tackle what may be an intimidating learning curve, an investment towards mastering R will pay off for the career academic. Garcia's book can assist with this process: the author succeeds in making accessible to a broader audience the coding of scripts yielding nicely designed plots and well-conceived statistical analyses. Although the hand-holding approach and informal style of writing may put off some more established researchers, the explicit instructions as well as the casual and supportive language (such as the frequent “don't worry”) are meant to comfort those daunted by the book's subject matter.
While directed at second language researchers and graduate students, with examples and sample data pertaining to their field, the material covered could easily be adapted to other specializations that also handle quantitative data. Data visualization and analysis in second language research would serve well as a main course book for a data analysis class or a specialized seminar, or for a lab or research team's reading group. The lone reader will also benefit, though may need to consult with the large, “active online community” (p. 15) of R users, one of the selling points for using this program. Overall, those who are relatively new to hierarchical models and/or Bayesian statistics, and are willing to take on R as long as they are guided through it, should seriously consider Garcia's workshops-cum-handbook.