INTRODUCTION
Beginning in infancy, humans share with other animals the ability to perceive objects, to chunk objects into arrays, and to discriminate these arrays on the basis of their approximate number (Feigenson, Dehaene & Spelke, 2004). However, unlike other animals, humans have repeatedly invented external symbolic systems for representing number through the course of history (Menninger, 1969; Ifrah, 2000). These systems – which include verbal count lists, body counts, written numerals, and physical calculators like the abacus – allow us to go well beyond the limits of perception to express and manipulate precise numerosities, and to describe mathematical relations. Why only humans create such systems – and how we do so – is a topic of intense debate, which bridges research in anthropology, comparative psychology, linguistics, philosophy, and human development. In developmental psychology, this debate has often focused on the role of natural language, and how evolutionarily ancient mechanisms might be exploited during language acquisition to represent exact number. According to some accounts, language might allow us to combine different types of representations that don't themselves express exact number to generate concepts that represent the positive integers (e.g. Spelke & Tsivkin, 2001). Others argue that the logic of number words is innate, and explained in part by a mapping between linguistic symbols and perceptual representations of number (Leslie, Gelman & Gallistel, 2008). Still others argue that children learn the logic of number via an inductive inference over relations between labels of perceptual sets – e.g. by mapping words like one, two, and three onto small sets, and noticing that each successive number differs in quantity by exactly 1 (Carey, 2009). In each case, core systems of number perception provide the primitive building blocks from which number word meanings are acquired.
In the present paper, I propose an alternative to this general approach, according to which number word meanings are not wholly innate, or derived from core systems of numerical perception. Instead, I will argue that perception provides humans with an explanatory problem that the creation of symbolic number systems is meant to solve. This problem, confronted by humans from the beginning of our shared cultural history, can be expressed as follows: whereas our perception of quantity is noisy and subject to error, our perception of individual things is not. Consequently, despite our noisy representation of number, we have a strong intuition that collections in the world are made up of distinct individuals, such that they must contain determinate numbers of things that are subject to precise measurement. We might know, for example, that a basket of fruit contains a specific number of individual pieces, even if our only means of comparing this quantity to other baskets of fruit is noisy and approximate, or based on a rough ratio of items in each set. Counting systems, I propose, were constructed by our ancestors to resolve this explanatory gap – to measure and keep track of the precise quantities that we knew to exist in the world, but were otherwise unable to precisely quantify. Whether as learned today by children or as created over historical time, counting systems do not get their content from perception, but instead arise to explain it.
To make this case, I focus on how children learn the meanings of number words in development. I argue that children's meanings for number words are not constructed from perceptual representations of number. Instead, drawing on evidence from the historical record, anthropology, and child development, I argue that number word meanings are defined by their logical role in blind counting procedures, which is inductively inferred by children through extensive use of counting, by around age six. The logic of large number word meanings is not constructed from knowledge of smaller numbers, contrary to constructivist accounts (Spelke & Tsivkin, 2001; Carey, 2009). Instead, small numbers and large numbers are learned by completely distinct mechanisms that are developmentally unrelated. Also, the meanings of large number words are not defined by their relations to the approximate number system (Gallistel & Gelman, 1992; Dehaene, 1997), or a domain-specific mathematical logic, contrary to extreme nativist views (Leslie, Gelman & Gallistel, 2008). The logic of counting is learned, without appeal to perception, from the counting procedure, and from logical representations that are domain general, and not specific to mathematics.
SOME EMPIRICAL FACTS
Most current accounts of number word learning seek to explain how children acquire knowledge, albeit implicit, of the logical principles which sit at the foundation of human mathematical knowledge. These principles are related to the axioms laid out by Peano, Dedekind, and contemporaries in an effort to explain the logical foundations of arithmetic (e.g. Frege, 1974 [1884]; Leibniz, 1886; inter alia). Below is a subset of these principles that are most relevant to our discussion:
1. 1 is a natural number.
2. All natural numbers exhibit logical equality (e.g. x = x; if x = y, then y = x, etc.).
3. For every natural number n, S(n) (the successor of n) is a natural number.
4. Every natural number has a successor.
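The four principles listed above can be made concrete with a minimal sketch. The names used here (`Nat`, `ONE`, `successor`, `to_int`) are hypothetical and chosen only for illustration; the sketch is not a model proposed by any of the theories under discussion. It shows only that a single starting point, a successor function, and exact equality suffice to generate a discrete sequence of numbers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Nat:
    """A natural number built only from a base case and the successor function."""
    pred: Optional["Nat"] = None  # None marks the base case, 1


# Principle 1: 1 is a natural number.
ONE = Nat()


def successor(n: Nat) -> Nat:
    """Principles 3 and 4: every natural number has a successor, which is itself a natural number."""
    return Nat(pred=n)


def to_int(n: Nat) -> int:
    """Convert to a Python int, for display only; not part of the number concept itself."""
    count = 1
    while n.pred is not None:
        n = n.pred
        count += 1
    return count


two = successor(ONE)
three = successor(two)

# Principle 2: equality is exact. The frozen dataclass compares by structure,
# so two numbers are equal only if they were built by the same number of steps.
print(successor(two) == three)  # structural equality check
print(to_int(three))
```

Nothing in this sketch refers to perception or magnitude: each number is defined entirely by its position in a chain of successor relations.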
In addition to explaining how knowledge of this logic arises, theories of how children acquire the positive integers also seek to explain how number words become associated to perceptual experience. Numerate humans readily assign approximate estimates to large quantities. For example, if shown an array of 16 dots on a computer screen, subjects assign a larger number word to this array than to an array of 8 or 12. Also, their estimates exhibit systematic error: on average, estimates exhibit a mean that approaches the target value, but the range of values exhibits greater variability for larger sets (Figure 1). These facts together have been taken as evidence that number words are associated to representations in what's been called the “Approximate Number System” or ANS (for review, see Dehaene, 1997), an evolutionarily ancient system found in non-human primates, pigeons, mice, fish, and in humans of all ages, including neonates (Whalen, Gallistel & Gelman, 1999; Brannon & Terrace, 2000; Xu & Spelke, 2000; Barth, Kanwisher & Spelke, 2003; Feigenson et al., 2004; Halberda & Feigenson, 2008; Halberda, Mazzocco & Feigenson, 2008).
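The estimation pattern just described (means near the target, with variability that grows with set size) can be illustrated with a small simulation. This is a sketch under stated assumptions: it models each estimate as a Gaussian draw whose standard deviation is proportional to the target (scalar variability), and the Weber fraction of 0.2 is an arbitrary illustrative value rather than an estimate taken from any of the studies cited. The function names are hypothetical.

```python
import random
import statistics
from typing import Tuple


def estimate(target: int, rng: random.Random, weber_fraction: float = 0.2) -> float:
    """One noisy estimate of `target`.

    Assumption: noise is Gaussian with standard deviation proportional to the
    target (scalar variability). The value 0.2 is illustrative only.
    """
    return rng.gauss(target, weber_fraction * target)


def simulate(target: int, trials: int = 5000, seed: int = 0) -> Tuple[float, float]:
    """Return the mean and standard deviation of many simulated estimates."""
    rng = random.Random(seed)
    samples = [estimate(target, rng) for _ in range(trials)]
    return statistics.mean(samples), statistics.stdev(samples)


for n in (8, 12, 16):
    mean, sd = simulate(n)
    print(f"target={n:2d}  mean={mean:5.2f}  sd={sd:4.2f}")
```

Under these assumptions the mean tracks the target while the spread roughly doubles from 8 to 16, which is the signature typically attributed to the ANS.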
Finally, theories of number word learning also seek to explain the stages by which learning transpires. Although early reports argued that children exhibit mastery of counting principles from very early in development – by two or three years of age (Gelman, 1972; Gelman & Gallistel, 1978; 1992; Greeno, Riley & Gelman, 1984) – later work has suggested a difficult and protracted learning sequence. These later studies have found that children typically begin by memorizing a partial count list – e.g. one, two, three, four, five, etc. – beginning sometime around the age of two in the US. As children acquire this list, they learn to recite it while pointing to objects, and to place number words in one-to-one correspondence with individual things (see Fuson & Hall, 1983; Briars & Siegler, 1984; Fuson, 1988). However, during this early stage, they generally have little to no knowledge of what the number words mean (Wynn, 1990, 1992; Le Corre & Carey, 2007). For example, using a test that has come to be known as the Give-a-Number task, Karen Wynn showed that many children who can recite a count list are nevertheless unable to reliably give one object when asked for one (Wynn, 1990, 1992). These children are often called ‘non-knowers’, since they appear to know little about the meanings of words in their count list. Eventually, however, children become able to give one object in response to requests for one, while not giving one for higher numbers as often, at which point they are called ‘one-knowers’. Between six and nine months later, children learn an exact meaning for two, and are called ‘two-knowers’, and then eventually learn three, at which point they are called ‘three-knowers’. Some children likely also pass through a ‘four-knower’ stage. Critically, however, there do not appear to be five-, six-, or seven-knowers – e.g. kids who give precisely five things when asked for five, but not for higher numbers that are within their productive count list.
In the process of learning one, two, and three, during which they are collectively called ‘subset-knowers’ (since they have meanings for only a subset of their number words), children exhibit strikingly poor understanding of counting. Generally, subset knowers, who range in age from around two to four years of age, do not attempt to count when asked to give a particular number of objects (Wynn, 1990, 1992; Le Corre, Van de Walle, Brannon & Carey, 2006). Moreover, when subset knowers do count, they make remarkable errors. For example, after correctly counting an array of six things, subset knowers who are immediately asked how many things there are either begin counting all over again, or instead utter a random number – generally not the number they just uttered at the end of their count (Schaeffer, Eggleston & Scott, 1974; Markman, 1979; Fuson, 1988, 1992; Frye, Braisby, Lowe, Maroudas & Nicholls, 1989; Wynn, 1990, 1992; Rittle-Johnson & Siegler, 1998; for discussion, see Greeno et al., 1984; Gelman, 1993). Further, even children who do respond correctly to the ‘how many’ question are unable to give this amount in the Give-a-Number task (e.g. Sarnecka & Carey, 2008). On the basis of such facts, most researchers have concluded that subset knowers deploy counting as a blind procedure, without much understanding of how it relates to cardinality, or an appreciation of the logic that relates numbers in the list.
Eventually, however, children appear to learn that counting can be used to construct large sets. At around the age of three-and-a-half or four, children in the US learn that, when asked to give a large number like seven, they can count items up to seven and give all objects implicated in their count (Wynn, 1990, 1992). At this point, these children are typically called ‘Cardinal Principle Knowers’ (CP-knowers), since they appear to know that the last word in a count labels the cardinality of the set as a whole. Beyond this, however, there is substantial controversy about what these so-called CP-knowers actually know. By some accounts, these children have mastered not only how to count, but also have learned the logic that relates numbers in their count list – e.g. that every natural number n has a successor, defined as n + 1 (e.g. Spelke & Tsivkin, 2001; Le Corre & Carey, 2007; Condry & Spelke, 2008; Sarnecka & Carey, 2008; Carey, 2009; etc.). Others, however, have argued that this logic emerges many years after children become CP-knowers, and that during this long delay, children deploy yet another blind tally procedure (Davidson, Eng & Barner, 2012; Wagner, Kimura, Cheung & Barner, 2015; Cheung, Rubenson & Barner, 2017). In the sections that follow I return to this issue.
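The counting procedure that CP-knowers deploy in the Give-a-Number task can be sketched as a purely mechanical routine. The code below is an illustration under stated assumptions, not a cognitive model: the function names (`how_many`, `give_a_number`) and the ten-word count list are hypothetical. The point of the sketch is that the routine succeeds by pairing memorized words with objects and stopping at the requested word, without representing anything about how the words relate to one another (e.g. that each is one more than the last).

```python
from typing import List

# A memorized count list; the ten-word cut-off is arbitrary and illustrative.
COUNT_LIST = ["one", "two", "three", "four", "five",
              "six", "seven", "eight", "nine", "ten"]


def how_many(items: List[str]) -> str:
    """Count a set by pairing one word with each item; the last word said labels the set."""
    if not items or len(items) > len(COUNT_LIST):
        raise ValueError("set is empty or exceeds the memorized count list")
    last_word = ""
    for word, _item in zip(COUNT_LIST, items):
        last_word = word  # one-to-one pairing, in stable order
    return last_word


def give_a_number(requested: str, pile: List[str]) -> List[str]:
    """Take items from the pile while reciting the list; stop at the requested word."""
    given: List[str] = []
    for word, item in zip(COUNT_LIST, pile):
        given.append(item)
        if word == requested:
            return given  # all objects implicated in the count
    raise ValueError("requested word was not reached (unknown word or pile too small)")


pile = [f"toy{i}" for i in range(10)]
set_of_seven = give_a_number("seven", pile)
print(len(set_of_seven), how_many(set_of_seven))
```

Note that the routine never consults arithmetic: the count list is treated as an ordered string of labels, which is why success on this task alone leaves open what CP-knowers understand about the successor relation.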
THEORIES OF NUMBER WORD LEARNING
Although many different accounts of number word learning have been described, here I will present two broad alternatives that adopt nativist and constructivist positions, respectively. In the interest of proceeding quickly to my own proposal, and because these theories have been well described elsewhere, I will review these alternatives briefly, with a focus on their core properties and differences.
Nativist accounts: approximate magnitudes and innate counting principles
Nativist accounts are perhaps the easiest to describe. Early nativist accounts, like that of Gelman and Gallistel (1978), argued for innate counting principles. According to this view, when children are exposed to a count list, they exhibit an innate predisposition to always count in the same order (stable order principle), apply only one label to each individual counted (one-to-one principle), and infer that the last word used in a count labels the cardinality of the set as a whole (the cardinal principle), inter alia. Later versions of their hypothesis focused on couching the content of number words in the ANS (Gallistel & Gelman, 1992). And more recent proposals from this group (Leslie et al., 2008) have argued that approximate number representations are supplemented by innate, domain-specific, logical knowledge, roughly equivalent to the principles described by Peano (for discussion of alternative nativist hypotheses, see Butterworth, Reeve, Reynolds & Lloyd, 2008; Rips, Bloomfield & Asmuth, 2008; Piantadosi, Tenenbaum & Goodman, 2012). For example, Leslie et al. (2008) propose that children have “an innately given recursive rule S(x) = x + ONE … also known as the successor function” (p. 216).
On the view described by Leslie et al. (2008), it is argued that this innate logic is not sufficient (although it exhausts the knowledge that most theories seek to explain), and that an additional appeal to the ANS is required. This hybrid view, while possibly providing all of the pieces that could explain the origin of the positive integers, unfortunately isn't well supported by available data. First, this particular nativist hypothesis, wherein the Peano axioms are innate, fails to explain attested stages of number word learning and why children learn small numbers in a protracted sequence without much understanding of counting, and why, even after learning the counting procedures (and becoming CP-knowers), they struggle for years to learn its logic (see Davidson et al., 2012; Wagner et al., 2015; Cheung et al., 2017; see also discussion of the successor principle, below). A further problem with the hybrid approach is that, once an innate logic is invoked, there's little reason left to also invoke the ANS. As computational models like Piantadosi et al. (2012) show quite convincingly, an appeal to the ANS is unnecessary once the successor function and notions like exact equality and ‘one’ are built in.
An alternative is to posit that the positive integers get their meaning directly from the ANS, similar to Gallistel and Gelman (1992). However, as the later proposal of Leslie et al. (2008) implicitly recognizes, the problem with this idea is that the ANS lacks the relevant content. It is difficult – if not impossible – to explain how analog, approximate representations could provide the content of discrete, precise number words. Critically, the problem is not simply that the ANS is noisy, unlike number words, as argued by Halberda (2016). Instead, it is that the ANS lacks the type of logical content that children must ultimately learn. Most models of the ANS assume that its representations are analog in nature, making it incapable of defining even the simplest of logical relations that children must ultimately acquire, like the successor function, which is defined in terms of discrete, whole numbers, and logical relations like ‘successor’. According to some proposals (Gallistel & Gelman, 2000; Halberda, 2016), the ANS represents the real numbers, which children then use to acquire the positive integers (a proposal which encounters very serious obstacles, as noted by Laurence & Margolis, 2005). A more recent proposal by Gallistel (2016) suggests that the ANS might actually represent number discretely, but that to explain extant empirical data the bit rate of the ANS would need to be finer than that of the positive integers, such that some additional transformation of these discrete bits would be required to package them into units differing by exactly 1. Very generally, if the ANS is invoked to explain the quantification of continuous amounts, as it often is (Pinel, Piazza, Le Bihan & Dehaene, 2004; Cantlon, Platt & Brannon, 2009; Lourenco & Longo, 2011), then a separate discretizing function must be required even if the ANS represents quantity in terms of bits (Gallistel, 2016). It is the origin of this function that generates discrete whole numbers which is the problem to be explained, and to which the ANS itself has nothing to add (for additional discussion of this discretizing function in the context of grammar and the mass–count distinction, see Bale & Barner, 2009; Barner & Snedeker, 2005; Barner, Li & Snedeker, 2010).
A further reason to believe that the ANS does not define the positive integers comes from studies of estimation, which test the strength and nature of associations between number words and representations in the ANS. First, although the facts are still emerging, our current knowledge of number word learning suggests that associations between number words and the ANS are slow to develop, and are weak even among children who are competent counters (e.g. Lipton & Spelke, 2005; Le Corre & Carey, 2007). This is important, because if children lack strong associations between number words and the ANS before they learn to count, then it is unlikely that the ANS could be the basis for learning the logic of counting. Relevant to this, Le Corre and Carey (2007) showed that, when shown random dot arrays between 5 and 10, many three- to five-year-old children who are competent counters (CP-knowers) do not provide larger verbal estimates for larger numbers, suggesting that they have not yet mapped their count list to the ANS. More recent studies have questioned this conclusion, arguing that associations may emerge earlier in development (Wagner & Johnson, 2011; Gunderson, Spaepen & Levine, 2015; Odic, Le Corre & Halberda, 2015). However, as argued by Wagner, Chu, and Barner (unpublished data), none of the studies which purport to show these earlier mappings to the ANS actually provide conclusive evidence (these studies either do not classify children according to standard knower levels, making comparison impossible, or fail to show evidence of ANS signatures – i.e. increasing error with larger set sizes – or they fail to correctly model the null hypothesis, leading to invalid statistical comparisons). Furthermore, there is strong evidence that even when children do map number words to approximate magnitudes, between the ages of five and seven years, these mappings are highly malleable, making them unsuitable for defining the positive integers (Sullivan & Barner, 2013, 2014).
For example, in a study by Sullivan and Barner (2013), adult subjects saw dot arrays flash on a screen and were told that the largest number of dots they would see would be either 75, 350, or 750, depending on the condition they were in. However, in all cases the maximum number was actually 350 (for a similar method, see Izard & Dehaene, 2008). What Sullivan found was that subjects across these conditions provided significantly different estimates not only for large numbers, but for all numbers right down to about 10 or 12, which seemed to be strongly associated with approximate magnitudes and resistant to calibration. Furthermore, she found that when subjects were provided a verbal label and asked to map it to one of two dot arrays that stood in either a 2:1 or 3:4 ratio, subjects were barely better than chance for many numbers. Both results were also replicated in five- to seven-year-old children (Sullivan & Barner, 2014), except that here Sullivan found even weaker associations between number words and approximate magnitudes. Children as old as seven years of age were completely at chance when asked to map number words larger than 12 to one of two dot arrays, and calibration shifted their estimates significantly for all numbers down to 5 or 6. Based on these results, Sullivan argued that subjects do not have rigid associations between magnitudes and most number words – as would be required for the ANS to define the positive integers. Although perceptual error in estimation predicts that estimates for a particular number should vary a little around a correct response (according to Weber's law), errors like those described by Sullivan suggest a more fundamental source of variability, and that estimates are ad hoc and constructed on the fly, not rooted in stable associations between individual words and specific magnitudes. Thus, even if mappings between number words and approximate magnitudes did emerge early, these mappings could not provide the kind of fixed semantic definitions required for learning number words (for further discussion of this general point, see Laurence & Margolis, 2005; Carey, 2009; Lyons, Nuerk & Ansari, 2015).
Constructivist accounts: objects and approximate magnitudes
One alternative to this nativist proposal, from Susan Carey (Carey, 2004, 2009), argues that children construct the concepts ‘one’, ‘two’, and ‘three’ from object representations, and then infer the logic of counting from these early meanings. Regarding one, two, and three, Carey appeals to evidence that, when humans track objects in a visual display, we are limited to tracking three or four things at a time. This evidence comes not only from adult studies of object tracking and object-based attention (for review see Pylyshyn, 1989; Kahneman, Treisman & Gibbs, 1992; Scholl, 2001), but also from evidence that human infants can keep track of up to three individuals when hidden from view (e.g. behind an occluder, in a bucket, or in a box; see Wynn, 1992a; Feigenson, Carey & Hauser, 2002; Feigenson & Carey, 2005; etc.). Following Gordon (2004), Carey calls this object tracking ability “parallel individuation” (or PI), since objects are individuated via distinct, parallel indexes in visual working memory. Critically, this system can represent objects and their properties but not sets per se, with the important consequence that it cannot represent the properties of sets, either, such as cardinality. Consequently, for Carey, learning the meanings of one, two, and three requires enriching PI (Le Corre & Carey, 2007) with set representations like those found in natural language, which include atomic individuals, plural sets composed of these atoms, and a logical language that describes relations between these sets. The meanings of one, two, and three are thus defined by associations between the words and different sets – i.e. those including either one, two, or three atomic individuals.
Having acquired meanings for one, two, and three in this way, the child becomes a CP-knower, on Carey's (2009) account, by noticing an isomorphism between the meanings of these numbers and the structure of the count list. Specifically, according to Le Corre and Carey (2007):
“… the child makes an analogy between two very different ordering relations: sequential order in the count list (e.g. “two” after “one” and “three” after “two”), and sets related by addition of a single individual ({i_x}, {i_x i_y}, {i_x i_y i_z}). This analogy then supports the induction that each numeral refers to a set that can be put into 1–1 correspondence with a set of a given cardinality, with cardinalities individuated by additional individuals. It also supports the induction that for each numeral on the list that refers to a set of cardinality n, the next numeral on the list refers to a set with cardinality n + 1.” (p. 432)
In some ways, this general framework resembles a much older proposal from John Stuart Mill. Though less refined in its assumptions about human cognition, Mill's (1884) idea is nevertheless similar to Carey's in assuming that small numbers can be learned by associating them to small sets, and that larger number words must be learned via inductive inference. According to Mill:
“… we may call, ‘Three is two and one,’ a definition of three; but the calculations which depend upon that proposition do not follow from the definition itself, but from an arithmetical theorem presupposed in it, namely, that collections of objects exist, which while they impress the senses thus, ∴, may be separated into two parts, thus, • • •. This proposition being granted, we term all such parcels Threes, after which the enunciation of the above-mentioned physical fact will serve also for a definition of the word Three.” (Mill, 1884, pp. 336–337).
From here, Mill (1884) argues that mathematical knowledge is “altogether inductive” and that two foundational aspects of number – i.e. exact equality and the successor principle – are known inductively from experience with things in the world. Thus, like modern constructivists, Mill believed that the logical meanings of larger number words were learned via an inductive inference rooted in perception, which begins with observations regarding small sets of objects (for a critical review of this hypothesis from Mill's time, see Frege, 1974 [1884]).
A second constructivist account, due to Liz Spelke (e.g. Spelke & Tsivkin, 2001), also rejects the idea that the logic of counting is innate, and, like Carey (2009), posits a role for object representations. However, unlike Carey, Spelke believes that the approximate number system must also play a role in early learning. Specifically, Spelke and colleagues (e.g. Spelke & Tsivkin, 2001) argue that while the object tracking system can explain why children's knower level stages are limited to 3–4, it can't explain how object representations are transformed into representations of sets, or how larger numbers get their content. To remedy this, Spelke argues that natural number emerges from a combination of parallel individuation – which provides the notion of precise number – and the approximate number system – which, unlike parallel individuation, can represent sets and properties like cardinality, and is not limited to small quantities (for discussion see Izard, Streri & Spelke, 2014). Thus, by combining the systems via the symbolic representations provided by natural language, the limitations of each are overcome. Specifically, according to Spelke and Tsivkin (2001), the child begins the learning process by mapping the words one through four onto corresponding representations in both parallel individuation and the ANS, thereby relating the two systems symbolically for each numeral learned. Next, the child notices that, for the numerals one through four, moving from one number word to the next corresponds to changes in the representations generated by both PI and the ANS. This observation then allows them to learn how verbal counting encodes number – i.e. that each individual step in the count list corresponds to a step from one number to its successor, where the successor of a number is one greater than its predecessor. Thus, much like Carey, Spelke proposes that the meanings of larger number words come about by an inductive inference over one, two, and three when children become CP-knowers. But unlike Carey, she believes that this inference is only possible if the content of one, two, and three is defined in terms of both parallel individuation and the ANS.
These two constructivist theories share two basic attributes. First, they argue that learning the meanings of one, two, and three involves the construction of new conceptual resources on the basis of perceptual representations that do not individually have this content. Second, they argue that the logic of counting is inductively inferred from knowledge of the small numbers, and thus that there is a strong causal link between learning small and large number words. Below, I will show that neither of these claims is empirically supported: that number word meanings are not rooted in perception, and that the logic of large numbers is not learned from small numbers.
LANGUAGE, PERCEPTION, AND LOGIC
At the core of my approach is a four-way distinction between levels of representation relevant to number word learning, and to language acquisition more generally. These four levels are as follows:
1. Perception
2. Verbal labels
3. The logical hypothesis space
4. Meanings defined in the logical hypothesis space
In this schema, ‘Perception’ refers to representations of individual objects and sets, and our ability to compare sets on the basis of their approximate magnitude. With only these data, we can notice rough differences in quantity, but lack the ability to make precise measurements or computations, or to keep accurate records in the service of trade. ‘Verbal Labels’ include the words that label the positive integers, like one, two, and three. Following Fodor (1980), I assume that these first two levels of representation are not alone sufficient to explain the origin of children's logical representations of number, since such a logic cannot be expressed in these levels. Also, I take number word learning to be in part an inductive process, and therefore assume that new logical resources cannot be constructed from a hypothesis space that does not already have the relevant representational power. For example, a quantificational logic (one that includes existential and universal quantifiers) cannot be built from a simple predicate logic, since any inductive inference that involves positing new symbols would need to include these symbols as inputs to learning (much like learning that a triangular object is called a blicket requires both the prior concept ‘triangular object’ and the label blicket).
On the basis of this, I therefore assume that any meaning which can be expressed must be definable in terms of a hypothesis space, which is distinct from both perception and the verbal labels. Thus, I make a distinction between the ‘Logical hypothesis space’ and the ‘Meanings defined in the logical hypothesis space’, and distinguish both of these from the perceptual phenomena in the world that they seek to describe and explain. Whereas the hypothesis space is populated by a collection of primitive representations (i.e. representations that cannot be further decomposed into smaller parts), actual meanings can take the form of either simple primitives, combinations of primitives, or learned relations between primitives and/or their combinations. Critically, primitive representations enter into logical propositions, which are not present in the perceptual data themselves. This is what differentiates meanings from the data that they explain. Although the data – whether characterized in terms of objects or magnitudes – can readily be described by many logics, this does not make them logical in and of themselves. To learn a logic of counting, a logical hypothesis space of some form is required above and beyond perception.
More specifically, I propose that the hypothesis space that supports number word learning is the same space which supports other aspects of language acquisition, like quantifier acquisition, and the learning of number morphology, which emerge both independent of number words, and often several months earlier. This simple logic is one that includes representations of atomic individuals and plural sets, as well as simple Boolean operators (like conjunction and disjunction), inter alia. However, I do not propose that the logic of the positive integers is innate. Instead, I propose that (i) the numbers one, two, and three map onto innate primitive concepts (plural sets of one, two, or three atomic individuals), and (ii) larger numbers are defined by learned relations between primitive concepts. Thus, I assume an innate logical hypothesis space – as I believe any coherent theory must ultimately do – but I propose nothing beyond what is already required for learning the fundamental components of natural language. Specific to mathematics is only the successor function and its induction to all possible numbers, both of which I argue are learned from the use of counting procedures.
In the two sections that follow, I first describe the evidence that one, two, and three are learned by mapping verbal labels onto concepts that are routinely encoded by natural language when children learn singular and plural morphology. These meanings are not constructed, and do not derive from perception of objects or approximate magnitudes. I then describe how children learn the logic of counting – and in particular the successor function – by drawing on years of experience with blind counting procedures, a process that is totally independent of small number word knowledge and perceptual systems.
FIRST PROPOSAL: ONE, TWO, AND THREE ARE ACQUIRED FROM INNATE CONCEPTS, INDEPENDENT OF COUNTING
The first component of my proposal is that learning one, two, and three is fundamentally a problem of mapping words to pre-existing concepts. Learning these words does not require constructing new domain-specific conceptual resources from perception of objects or approximate magnitudes. Also, their meanings are unrelated to counting or innate counting principles. Instead, the meanings of one, two, and three are grounded in the same conceptual resources that support the acquisition of quantifying expressions in natural language like singular and plural nouns, or quantifiers like several and many.
Cross-cultural and historical variability
Over human history, languages have routinely featured grammatical forms for expressing precise quantities up to three even in the absence of explicit counting systems. Some languages, like English, distinguish between singular and plural forms, which agree with numerals like one, two, etc.:
a. One red button is lying on the table.
b. Two red buttons are lying on the table.
c. Five red buttons are lying on the table.
Others, like Slovenian, Arabic, Hebrew, Sanskrit, and Ancient Greek, make a singular, dual, and plural distinction. And although less common, some languages, like Larike, make a distinction between singular, dual, trial, and plural (Corbett, Reference Corbett2000). No languages have grammatical markers for four or above (see Corbett, Reference Corbett2000, for review). However, there are many languages, including Japanese and Chinese, which have no obligatory singular–plural distinction, despite having numerals, and thus do not feature grammatical agreement with numerals. For example, the Japanese sentences describing one, two, or five buttons lying on a table differ only with respect to the numeral used, and otherwise are grammatically identical.
The historical record contains many instances of humans who can precisely express quantities up to three or four but who lack linguistic symbols for larger precise quantities. These include speakers of languages like Pirahã (Gordon, Reference Gordon2004; Frank, Everett, Fedorenko & Gibson, Reference Frank, Everett, Fedorenko and Gibson2008), Mundurucu (Pica, Lemer, Izard & Dehaene, Reference Pica, Lemer, Izard and Dehaene2004), Nicaraguan homesign (Spaepen, Coppola, Spelke, Carey & Goldin-Meadow, Reference Spaepen, Coppola, Spelke, Carey and Goldin-Meadow2011; Coppola, Spaepen & Goldin-Meadow, Reference Coppola, Spaepen and Goldin-Meadow2013; Spaepen, Flaherty, Coppola, Spelke & Goldin-Meadow, Reference Spaepen, Flaherty, Coppola, Spelke and Goldin-Meadow2013), Jarawara (Dixon, Reference Dixon2004), Krenak (Loukotka, Reference Loukotka1955), Warlpiri (Hale, Reference Hale1975), Aranda (Sommerfelt, Reference Sommerfelt1938), Botocudos (Tylor, Reference Tylor1871), etc. In such languages, small numbers are often represented as part of a morphological paradigm much like the singular–plural distinction in English. Just as often, however, they are instead independent word forms that are subject to grammatical recombination. For example, Haddon (Reference Haddon1890) reports Melanesian dialects spoken in the Torres Strait in which the word for ‘one’ is urapun, ‘two’ is okosa, and higher numbers are derived via combination – e.g. okosa-urapun (2 + 1), okosa-okosa (2 + 2), okosa-okosa-urapun (2 + 2 + 1), etc. Similarly, Donohue (Reference Donohue2008) describes the Melanesian language One, which has words for singleton (ara) and dual (plana) sets, and allows combination to describe larger quantities. Two facts about One are especially remarkable.
First, according to Donohue, speakers generally do not use their system to count beyond 6 (which, interestingly, is expressed as 3 + 3, rather than 2 + 2 + 2), presumably because they quickly lose track of where they are in the counting sequence as numbers grow larger. Second, despite this limitation, speakers of One apparently recognize that larger sets can be precisely quantified. According to Donohue, this limitation “does not mean that people are not capable of keeping careful track of precisely how much is owed to which parties in any transaction, with quantities reckoned routinely extending up to and beyond 50” (p. 424).
The prevalence of restricted number systems extends to accounts in popular culture, like the rabbit language Lapine, spoken by the fictional rabbits of the county of Hampshire, UK. According to Richard Adams, author of the popular (1972) children's novel Watership Down, “Rabbits can count up to four. Any number above four is hrair – ‘a lot,’ or ‘a thousand.’ Thus they say U Hrair – ‘The Thousand’ – to mean, collectively, all the enemies (or elil, as they call them) of rabbits – fox, stoat, weasel, cat, owl, man, etc.” (p. 19). According to Adams, the Lapine counting system explains the name of his book's chief protagonist, ‘Fiver’. On this topic, Adams notes “There were probably more than five rabbits in the litter when Fiver was born, but his name, Hrairoo, means ‘Little Thousand’ – i.e. the little one of a lot or, as they say of pigs, ‘the runt’” (Adams, Reference Adams1972, p. 19).
While many languages have restricted number word systems, it is not unusual for these systems to be supplemented by independent tally systems that make use of either the body, notches on wood, stones (‘calculi’), string, or other media to keep track of precise quantities (for a general overview, see Ifrah, Reference Ifrah, Vellos, Harding, Wood and Monk2000, and Menninger, Reference Menninger1969). Here I would like to observe two properties of these systems. First, consistent with my hypothesis that small and large number words are learned by distinct and independent mechanisms, tally systems are often completely independent of the language's quantificational system, and are constructed precisely with the goal of compensating for the limits of natural language quantification. Second, they speak to the origin of verbal counting systems. Although tally systems initially begin as distinct from natural language, labels for positions in tallies are sometimes co-opted and become used as verbal counting systems, independent of the original tally systems.
One especially informative case of this can be found in the Amazonian language group Nadahup, described by Epps (Reference Epps2006), in which there are three related groups, the Nadëb, Hup, and Dâw, each of whom uses a different, but related, number system. The Nadëb dialect has words akin to one (roughly ‘unity’), two (‘a couple’), and three, as well as words similar to several, all, and many (literally, ‘not one’). Nearby speakers of the Hup dialect have unrelated words for 1–4, which translate roughly as that (‘one’), eye quantity (‘two’), without sibling (‘three’), and with sibling (‘four’). Of relevance to the current discussion, the third language described by Epps, Dâw, uses the expressions ‘with sibling’ and ‘without sibling’ differently, to label body counts. In Dâw, the first three number words translate roughly as ‘unity’ (‘one’), ‘eye quantity’ (‘two’), and ‘rubber seed tree quantity’ (‘three’). To label a set of four, one hand is held up with fingers grouped into pairs, as the expression ‘with sibling’ is uttered. To label five, the same gesture is made, but with the thumb extended, accompanied by the expression ‘without a sibling’ (see Figure 2A). This system allows enumeration up to 10, where both hands form the gesture for 5 and are held up together, while ‘with a sibling’ is uttered in conjunction (see Figure 2B).
Dâw is particularly interesting for two reasons. First, it highlights the very common differentiation that cultures make between the enumeration of small and large quantities. When cultures have tally systems, these systems generally are used to extend a verbal system that is restricted to either a singular, dual type system, or a system that contains words for ‘one’, ‘two’, and ‘three’ that can be recombined in only very limited ways. Second, these tallies – whether gestures, body counts, or other – sometimes acquire labels of their own: labels for tallies in Dâw (has no sibling / has a sibling) have been co-opted in the neighboring Hup dialect to label the quantities ‘three’ and ‘four’, without requiring hand gestures to be used at the same time. Very generally, a common solution to the limited expressive power of natural language is to create external symbol systems that can be used to precisely tally individuals, and, on occasion, to extend the linguistic system by labeling the values in the physical system in order to create a verbal counting system.
A precise, but non-exact, semantics for ‘one’, ‘two’, and ‘three’
The point of this review is to notice that labels for ‘one’, ‘two’, and ‘three’ have routinely emerged in human history (and sometimes in rabbits) as linguistic expressions quite independent of full-fledged systems for counting, and likewise that tally systems often emerge as independent complements to small number morphology. Flowing from this, my hypothesis is that children in the US – and in other groups who are exposed to a counting system – initially analyze small numbers using the same logic that supports learning singular, dual, and trial forms. In particular, small number words can be treated as properties of pluralities (Krifka Reference Krifka and Turner1999; Landman, Reference Landman2004; Bale & Barner, Reference Bale and Barner2009; Chierchia Reference Chierchia2010), such that their denotations can be represented by join semi-lattices with minimal parts corresponding to countable individuals (as in Link, Reference Link, Portner and Partee2002). Consider, for example, the lattice structure in Figure 3.
In a context (or domain) that includes exactly four individuals, a, b, c, and d, each object made available by perception can be represented in the logical hypothesis space as a singleton atomic individual, and can be labeled either by singular nouns or by the numeral one. These individuals can also be composed into plural sets comprising two, three, or four individuals. Critically, these pluralities form a partial ordering. If we write a singleton set as a, then the set containing the singleton set – i.e. having one element – can be written as {a}, while a dual set will contain two elements, which might correspond to any of {ab}, {ac}, or {ad} in Figure 3. A set containing three atomic individuals can therefore be written as either abc, abd, acd, or bcd, creating a partial order. As I show later in the paper, together with simple predicates and logical operators like equality, universal and existential quantifiers, and conditionals, this partial order can be used as a hypothesis space for inferring the successor principle (for one demonstration of this, see Partee, ter Meulen & Wall, Reference Partee, ter Meulen and Wall2012).
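The structure of such a domain can be made concrete with a small sketch. The Python below is illustrative only and is not part of the proposal itself: the function names `pluralities`, `is_part_of`, and `join` are hypothetical, and pluralities are modeled simply as non-empty sets of atoms ordered by inclusion, which is one common way of rendering a join semi-lattice.

```python
from itertools import combinations

def pluralities(atoms):
    """Return every non-empty combination of atoms as a frozenset.

    Assumption: a plurality is modeled as a non-empty set of atomic
    individuals; singletons stand in for the atoms themselves.
    """
    atoms = sorted(atoms)
    return [frozenset(combo)
            for size in range(1, len(atoms) + 1)
            for combo in combinations(atoms, size)]

def is_part_of(x, y):
    """Partial order: x is part of y when every atom of x is also in y."""
    return x <= y

def join(x, y):
    """Join of two pluralities: the smallest plurality containing both."""
    return x | y

domain = {"a", "b", "c", "d"}
lattice = pluralities(domain)

# Group the elements of the lattice by how many atoms they contain.
by_size = {}
for p in lattice:
    by_size.setdefault(len(p), []).append("".join(sorted(p)))

for size, members in sorted(by_size.items()):
    print(size, members)
# 1 ['a', 'b', 'c', 'd']
# 2 ['ab', 'ac', 'ad', 'bc', 'bd', 'cd']
# 3 ['abc', 'abd', 'acd', 'bcd']
# 4 ['abcd']
```

The ordering is only partial because some pluralities are incomparable: ab is part of abc, but neither ab nor ac is part of the other, even though both contain two individuals.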
In English, pluralities like those shown in Figure 3 can all be described using plural nouns. However, in dual languages like Central Slovenian, pluralities containing two individuals can be labeled using expressions with dual agreement, leaving sets of three or more to be labeled using plural forms. Meanwhile, these same set representations can be used to represent the meanings of numerals like one, two, and three in an identical fashion: one corresponds to singleton sets, two to pluralities containing two individuals, and three to pluralities containing three (see Figure 3). For example, a sentence like “The rabbit has two carrots” might be represented using this approach as: ∃x[has(x)(rabbit) ∧ carrots(x) ∧ #(x) = 2]. As noted by Kennedy (Reference Kennedy, Caponigro and Cecchetto2013), this type of analysis assigns a precise meaning to each lexical item, one, two, and three, much as it would for singular, dual, and trial forms. This logic therefore proposes a uniform treatment of numerals and morphological forms, unlike alternative approaches, which posit distinct representations for these different linguistic forms. On this view, if these concepts are innate, they are not specific to integers, but instead are the same innate representations used to support the acquisition of natural language. And if they are constructed, they must be built before children begin learning integers, in order to explain the acquisition of number morphology.
On this proposed view, although numerals have precise meanings (one means ‘one individual’, two means ‘two individuals’), each word can nevertheless be interpreted as ‘lower bounded’ when used in an existentially quantified sentence and therefore isn't treated as ‘exact’ by default. For example, the sentences in (1a) and (1b) can each be written as ‘∃x[has(x)(rabbit) ∧ carrot(x)]’ and thus are both true in a context in which a rabbit has three carrots:
(1)
a. The rabbit has one carrot.
b. The rabbit has a carrot.
Although both the singular and one denote singletons, when used in sentences like those in (1) they do not rule out the existence of larger sets, since their semantics is satisfied by the existence of singleton sets. As argued by Barner and Bachrach (Reference Barner and Bachrach2010), children may arrive at upper bounded ‘exact’ interpretations of numerals via a type of pragmatic inference, called scalar implicature (Grice, Reference Grice1989), whereby they implicitly reason that a stronger statement, like the one in (2), must not be true, since if it were then a cooperative speaker would have uttered it instead.
(2) The rabbit has two carrots.
Reasoning in this way, the listener infers that (1a) is true but that stronger statements, including (2), are false, resulting in the inference in (3):
(3) The rabbit has one carrot, but not two (or three, four, etc.).
On this account, children's first meaning for one is identical to the meaning of the singular form, two is identical to a dual, and three is like a trial. In each case, the meaning is precise, in that it denotes a set containing a specific number of individuals. But the expressions are not exact (i.e. they don't logically rule out the existence of larger sets). To assign a number word, n, an exact meaning, children must first learn the meaning of its immediate successor, n + 1, which places an upper bound on the size of sets to which n can apply – e.g. to become a two-knower, they must learn the precise meaning of three. To understand this distinction between precise and exact meanings, consider how children interpret singular nouns. Although children as young as two years of age only give an experimenter one banana when asked to “put a banana in the circle”, they nevertheless respond “yes” when shown two bananas in the circle and asked, “Is there a banana in the circle?” (Barner, Chow, and Yang, Reference Barner, Chow and Yang2009). In contrast, as soon as children become one-knowers, they give one object for one and reject sets of two when asked, “Is there one banana in the circle?” These facts suggest that these children associate singular expressions with singleton sets, but nevertheless don't treat them as exact – i.e. they accept the description as true even if some other description might be more informative (e.g. there are “some bananas in the circle”). 
A similar failure to compute implicatures can be found for the contrast between some and all, in which children as old as nine or ten years of age accept sentences like, “Some of the horses jumped the fence” when all of them did, whereas adults judge such statements to be bad, since the speaker should have said all (Smith, Reference Smith1980; Noveck, Reference Noveck2001; Papafragou & Musolino, Reference Papafragou and Musolino2003; Barner, Brooks & Bale, Reference Barner, Brooks and Bale2011; Barner, Reference Barner2012; Hochstein, Bale, Fox & Barner, Reference Hochstein, Bale, Fox and Barner2016; etc.).
On analogy, my proposal is that children's first number words have precise meanings – like singular, dual, or trial – that become exact only when their successors are learned. Consistent with this, Barner and Bachrach (Reference Barner and Bachrach2010) showed that n-knowers (e.g. one-knowers) actually associate the numeral n + 1 with sets of n + 1 before treating it as exact. This result is true across multiple languages including English, Japanese, and Russian, and has now been replicated in two additional studies (Gunderson et al., Reference Gunderson, Spaepen and Levine2015; Wagner, Chu & Barner, unpublished data). On this analysis, children who are classified by Wynn's Give-a-Number task as one-knowers actually know the precise meaning of two, but aren't yet called two-knowers because they lack a meaning for three.
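The difference between a precise and an exact meaning can be sketched in a few lines of Python. This is a toy model under stated assumptions, not an implementation of any published analysis: the lexicon `NUMERALS` and the functions `lower_bounded` and `interpret` are hypothetical names, and the pragmatic step is reduced to a single rule, namely that a numeral is strengthened to an exact reading only if the learner already has a meaning for its successor.

```python
from itertools import combinations

# Hypothetical toy lexicon: the position in the list gives the set size.
NUMERALS = ["one", "two", "three"]

def pluralities(atoms):
    """Every non-empty combination of atoms, each as a frozenset."""
    atoms = sorted(atoms)
    return [frozenset(combo)
            for size in range(1, len(atoms) + 1)
            for combo in combinations(atoms, size)]

def lower_bounded(n, possessed):
    """Existential truth conditions: some plurality x among the possessed
    carrots has exactly n atoms. Larger sets are not ruled out."""
    return any(len(x) == n for x in pluralities(possessed))

def interpret(word, possessed, known_words):
    """Judge 'The rabbit has <word> carrot(s)' for a given learner.

    Assumption: the learner negates the stronger alternative (the
    successor numeral) only if that successor is in known_words.
    """
    n = NUMERALS.index(word) + 1
    if not lower_bounded(n, possessed):
        return False
    successor = NUMERALS[n] if n < len(NUMERALS) else None
    if successor in known_words:
        # Scalar implicature: the stronger statement must be false.
        return not lower_bounded(n + 1, possessed)
    return True

carrots = {"c1", "c2", "c3"}
print(interpret("one", carrots, {"one"}))         # True: precise, not exact
print(interpret("one", carrots, {"one", "two"}))  # False: exact reading
```

In this sketch a learner with meanings for both one and two rejects "one carrot" for a set of three, but would still accept "two carrots" for that set, because no meaning for three is yet available to bound two from above.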
Critically, abstract semantic representations of atomic individuals emerge well before number word learning begins in earnest. A now substantial literature suggests that preverbal infants can track not only small sets of individual objects (e.g. Wynn, Reference Wynn1992a; Feigenson & Carey, Reference Feigenson and Carey2005), but also non-objects like actions and collections (Starkey, Spelke & Gelman, Reference Starkey, Spelke and Gelman1983, Reference Starkey, Spelke and Gelman1990; Wynn, Reference Wynn1996; Wynn, Bloom & Chiang, Reference Wynn, Bloom and Chiang2002; Wood & Spelke, Reference Wood and Spelke2005; for a review, see Cantrell & Smith, Reference Cantrell and Smith2013). Further, children use number words to quantify jumps, sounds, holes, and other non-objects from the time they begin using number words (Bloom, Reference Bloom1996; Giralt & Bloom, Reference Giralt and Bloom2000). Also, sometime between the age of 20 and 24 months – before they learn the meaning of one – children begin to make use of and comprehend the singular–plural distinction in language, and to deploy this distinction in non-verbal object-tracking tasks (Cazden, Reference Cazden1968; Brown, Reference Brown1973; Mervis & Johnson, Reference Mervis and Johnson1991; Dale & Fenson, Reference Dale and Fenson1996; Kouider, Halberda, Wood & Carey, Reference Kouider, Halberda, Wood and Carey2006; Barner, Thalwitz, Wood, Yang & Carey, Reference Barner, Thalwitz, Wood, Yang and Carey2007; Li, Ogura, Barner, Yang & Carey, Reference Li, Ogura, Barner, Yang and Carey2009; Wood, Kouider & Carey, Reference Wood, Kouider and Carey2009). Finally, children begin to use quantifiers and logical expressions like some, all, and, or, no, etc. several months before they begin to learn number word meanings (Fenson et al., 1994; Barner, Libenson, Cheung & Takasaki, Reference Barner, Libenson, Cheung and Takasaki2009; Feiman, Reference Feiman2015).
The acquisition of these linguistic forms – like number morphology, logical connectives, and quantifiers – suggests that children have access to a rich hypothesis space of abstract individuals and sets prior to the onset of number word learning. My suggestion is that, rather than causing such concepts to emerge, number word learning may build on pre-existing set-relational concepts. Although it is possible that the logic expressed by natural language arose in humans due to natural language (and can only arise developmentally in children who learn a language), this remains an open question. It is just as likely that these logical representations emerge independent of language, are available even to preverbal infants, and act as a basis for learning number morphology and other forms. However, the nature and origin of infants’ preverbal logical representations remains a profound puzzle that we have only begun to explore, and the next great frontier in the language acquisition literature.
Evidence from syntactic bootstrapping
Thus far we have seen that languages often feature expressions for ‘one’, ‘two’, and ‘three’ that are independent of counting. We've also seen that children might plausibly acquire one, two, and three using the same semantics that supports singular, dual, and trial forms. This semantics – which emerges beginning in infancy – includes representations of atomic individuals and plural sets, and allows for distinctions between pluralities of different sizes.
Empirical evidence for this hypothesis comes from cross-cultural studies of syntactic bootstrapping. The basic idea of bootstrapping theories is that, when learning a language, children might acquire one type of knowledge – e.g. semantic knowledge – from knowledge of an entirely different form – e.g. syntactic knowledge. For example, semantic bootstrapping theories, which take many forms (e.g. Schlesinger, Reference Schlesinger and Slobin1971, Reference Schlesinger, Levy, Schlesinger and Braine1988; Grimshaw, Reference Grimshaw, Baker and McCarthy1981; Macnamara, Reference Macnamara1982; Pinker, Reference Pinker1984), posit that children might use innate categories like ‘object’ and ‘action’ to identify syntactic categories like ‘noun’ and ‘verb’. On nativist versions of semantic bootstrapping, children might assume that words which label objects belong to an innate category ‘noun’, and then go about learning the language-specific syntactic properties of these words, thereby acquiring the syntactic category noun (e.g. Grimshaw, Reference Grimshaw, Baker and McCarthy1981; Pinker, Reference Pinker1984). For constructivists, the semantic categories ‘object’ and ‘action’ form the basis for distributional learning, from which syntactic categories emerge. Regardless of which version is adopted, the common thread is that semantic representations are used to learn syntax. Syntactic bootstrapping theories (e.g. Brown, Reference Brown1958; Gleitman, Reference Gleitman1990) reverse this logic, and argue that children might make inferences about the meanings of expressions based on syntactic evidence.
In the case of number word learning, various versions of syntactic bootstrapping have been proposed. According to one early proposal by Bloom and Wynn (Reference Bloom and Wynn1997), children begin the process of acquiring number word meanings by inferring that this class of words encodes the properties of sets, rather than of individual things. On their view, children first notice that number words occur in similar syntactic contexts as other words that encode set-relational properties, like quantifiers. For example, they might notice that, like the words many and several, number words almost always modify count nouns (e.g. table, cup) rather than mass nouns (e.g. water, mud). As a preliminary test of this idea, two previous studies asked whether children's comprehension of quantifiers is predictive of their number knower level, using a task nearly identical to Give-a-Number, wherein children are asked to, e.g. “give an orange”, “give all of the bananas”, or “give some of the strawberries” from a larger set (Barner, Chow & Yang, Reference Barner, Chow and Yang2009; Barner, Libenson, Cheung & Takasaki, Reference Barner, Libenson, Cheung and Takasaki2009). These studies found that two- to five-year-old children's comprehension of quantifiers is correlated with their number knower level, even when controlling for age. The problem with this finding, however, is that this correlation might be explained by many factors, including individual differences in children's rate of learning, or in their exposure to language input. Consistent with this, even vocabulary size is predictive of number knower level (Negen & Sarnecka, Reference Negen and Sarnecka2012).
A more specific form of the bootstrapping hypothesis is proposed by Carey (Reference Carey2004, Reference Carey2009). According to this idea, children might learn the meanings of specific number words like one and two from specific morphological forms, like the singular and plural in English. On this hypothesis, if the words one, two, and three denote the same conceptual content as grammatical forms like singular, dual, and trial, then a child who has acquired the semantics of the grammatical forms, and who hears numerals used with grammatical agreement, might use this information to speed their learning of the numeral meanings. For example, a child learning English, and who has already acquired the semantics of the singular–plural distinction, might use this knowledge to infer that one cat refers to a singleton cat, whereas two cats and three cats refer to pluralities (i.e. sets larger than one). However, a child learning Japanese, who is not exposed to obligatory singular–plural morphology, would not benefit from this syntactic evidence, and thus might be slower to learn the difference between ‘one’ and larger numbers.
Consistent with this hypothesis, cross-linguistic studies of number word learning have found that two- to five-year-old children learning singular–plural languages, including English and Russian, are substantially faster to acquire the meaning of the word for ‘one’ than are children learning Japanese and Mandarin Chinese, which lack obligatory singular–plural marking (Sarnecka, Kamenskaya, Yamana, Ogura & Yudovina, Reference Sarnecka, Kamenskaya, Yamana, Ogura and Yudovina2007; Barner, Libenson, Cheung & Takasaki, Reference Barner, Libenson, Cheung and Takasaki2009). Critically, English-speaking children in the US begin to produce and comprehend the singular–plural distinction sometime between 20 and 24 months, several months before they begin to differentiate the meanings of the words one and two in English (Cazden, Reference Cazden1968; Brown, Reference Brown1973; Mervis & Johnson, Reference Mervis and Johnson1991; Fenson et al., 1994; Kouider et al., Reference Kouider, Halberda, Wood and Carey2006; Wood et al., Reference Wood, Kouider and Carey2009). Even more striking, additional studies have found that two- to five-year-old children learning Central Slovenian and Saudi Arabic (Almoammer, Sullivan, Donlan, Marušič, O'Donnell & Barner, Reference Almoammer, Sullivan, Donlan, Marušič, O'Donnell and Barner2013), both of which feature dual morphology, are faster to learn the meanings of words for ‘one’ and ‘two’ than children learning English – or any other language studied thus far. For example, whereas almost no English-speaking children in the US know the meanings of one or two by 24 months, roughly half of Slovenian children are either one- or two-knowers by this age. Remarkably, in this study, Slovenian children were faster to learn the meanings of words for ‘one’ and ‘two’ despite the fact that they had almost no knowledge of counting, and could barely count to 10 by the age of four (compared to counts of 30–40 in US children). 
Thus, their faster learning was clearly not attributable to greater exposure to number words or other correlates of number word learning, like vocabulary size. More remarkable still, not all Slovenian children are faster to learn words for one and two: although children in many regions of the country acquire dialects of Slovenian that feature dual marking, many children also learn non-dual dialects. According to Marušič, Plesničar, Razboršek, Sullivan & Barner (Reference Marušič, Plesničar, Razboršek, Sullivan and Barner2016), Slovenian children not exposed to dual morphology are no faster than English-speaking children to learn their first number words.
Together, these studies provide evidence that children can leverage singular, dual, and plural agreement to acquire the meanings of number words, a finding which is consistent with the hypothesis that these forms encode similar – if not identical – semantic content.
Learning ‘one’, ‘two’, and ‘three’ twice: evidence from bilingual learners
A final piece of evidence that one, two, and three label domain-general linguistic concepts, and are not specifically constructed in the process of number word learning, comes from bilingual language learners. Recall that, according to the proposals of Carey (Reference Carey2009) and Spelke (Spelke & Tsivkin, Reference Spelke, Tsivkin, Bowerman and Levinson2001), the meanings of one, two, and three are constructed either by combining object representations with linguistic sets, or by combining objects, sets, and approximate magnitudes. On each hypothesis, we might expect that though learning these words should be difficult the first time around – which indeed it appears to be – it should be substantially easier the second time around, when children learn a second language. Surprisingly, however, this is not the case.
In a recent study of French/English and Spanish/English bilinguals, Wagner et al. (Reference Wagner, Kimura, Cheung and Barner2015) tested two- to five-year-old bilingual children twice, once in English, and once in their second language. What they found was that children often had different knower levels in their two languages, so long as they were not CP-knowers in one of the languages. Subset knowers – i.e. children who knew the meanings of labels for up to ‘one’, ‘two’, or ‘three’ – were more likely to have a different knower level in their second language than to have the same knower level. Thus, for example, some children who were one-knowers in Spanish were three-knowers in English. In fact, statistical models that predicted children's L2 knower levels from age and counting ability (i.e. how high they could count), were unimproved when their L1 knower level was included as a predictor. In other words, there was no evidence of transfer among subset knowers. However, in contrast, almost 100% of children who were CP-knowers in one language were also CP-knowers in the second language. Knowledge of the counting procedure did appear to transfer.
What this study suggests is that the well-attested delays that children exhibit between learning the meanings of ‘one’, ‘two’, and ‘three’ are probably not due to the problem of constructing new concepts. More likely, children struggle with a more banal inductive problem – common to all word learning – of mapping labels onto existing concepts. Whereas learning how to use the counting procedure appears to be a real breakthrough that spreads from one language to the next (allowing children to skip knower levels in the weaker language), learning one, two, and three is every bit as hard if you've already learned un, deux, and trois. In this regard, number words do not differ from other quantity expressions in natural language, or from words like dog and cat: just as learning how to label cats in English is unlikely to speed the learning of chat in French, learning two is unlikely to speed deux. This conclusion is consistent with the broad idea that learning number words probably doesn't require a profound conceptual change, but instead relies on mapping words to existing conceptual structures, which also support the acquisition of grammatical forms like the singular, dual, and plural.
Why are subset knowers limited to learning ‘one’, ‘two’, and ‘three’ in the absence of counting?
The view proposed thus far eschews a role for domain-specific logic or core number systems in the acquisition of one, two, and three. The basic logical hypothesis space of individuals and sets is in place before number word learning begins, and is entirely sufficient to explain the meanings of one, two, and three without invoking additional structures like parallel individuation or the approximate number system. Parsimony therefore suggests that invoking these additional systems is not necessary to explain learning. However, one puzzle on this account is why children – and indeed languages more generally – are limited to words for up to three or four in the absence of counting.
On the constructivist accounts of Carey (Reference Carey2009) and Spelke (Spelke & Tsivkin, Reference Spelke, Tsivkin, Bowerman and Levinson2001), this set limit falls out of the fact that one, two, and three are constructed from representations in parallel individuation, which itself has a set limit. The problem with this logic, however, is that it conflates two distinct questions: (i) What are the primitive building blocks that make up the child's conceptual hypothesis space when learning? and (ii) What are the domain-general perceptual and processing limits that constrain learning? Stated simply, a limit on visual working memory does not itself provide evidence that object representations in visual working memory are the conceptual building blocks of number word learning. Indeed, any representation that interfaces with visual attention will exhibit such a limit. As already noted, preverbal children exhibit a set limit of three to four not only for objects, but also for non-objects like actions (Starkey et al., Reference Starkey, Spelke and Gelman1983, Reference Starkey, Spelke and Gelman1990; Wynn, Reference Wynn1996; Wood & Spelke, Reference Wood and Spelke2005), and older children use number words to quantify jumps, sounds, holes, and other non-objects from the time they begin using number words (Bloom, Reference Bloom1996; Giralt & Bloom, Reference Giralt and Bloom2000). Attentional limits affect all kinds of tasks that require attention – not just object tracking. Thus, the fact that a particular domain of content like object tracking exhibits a set limit in no way demonstrates that this domain provides the hypothesis space within which number words are learned.
My proposal is that object representations do not constitute the hypothesis space from which number words are constructed, but instead are the phenomena which number words describe and explain. Still, in order to be characterized by a logic, these perceptual representations must be accessible to it. Here is where perception exerts its effect on the learning of small number words: because attention limits humans to representing only three or four individuals in parallel, children are limited to constructing logical representations of small sets, even though the logic, in principle, can represent much larger quantities. This attentional limit should restrict number word learning whether the logical hypothesis space is parallel individuation, as Carey and Spelke argue, or any representational system downstream from perception which uses object representations as inputs. Consequently, insofar as a separate logic of individuals and sets is required to explain other phenomena, as argued by Carey (Reference Carey2009) and as I've argued here, there is no independent reason to also invoke parallel individuation as part of the child's hypothesis space from which meanings are constructed. Objects, collections, and their properties are the things described and explained by numbers, not the logic that constitutes their meanings.
SECOND PROPOSAL: LEARNING THE SUCCESSOR PRINCIPLE FROM BLIND TALLIES
Thus far we have reviewed how children might acquire one, two, and three without appeal to innate domain-specific principles or conceptual change. However, this discussion has remained neutral to the question of how children acquire the logic of the positive integers. Though nativist theories, which posit an innate logic, have little to say about the knower level stages, it remains possible that their theory is nevertheless right in the case of counting, and that learning to count involves relating the count list to an innate, domain-specific logic. Meanwhile, although I have argued that no special constructivist story is needed to explain how children learn one, two, and three, I also haven't ruled out their account of how children learn the logic of counting – that it is inductively inferred from children's knowledge of small number words.
The protracted emergence of the successor function
As already noted, across all cultures previously studied, children who are exposed to a count list begin by learning words for ‘one’, ‘two’, and ‘three’ in sequence before eventually figuring out that counting can be used to generate any countable set. Children who have figured this out are generally called ‘Cardinal Principle knowers’ on the premise that they have understood that the last numeral in a count represents the cardinality of the set taken as a whole.
According to constructivist theories of counting, this sudden change in children's counting ability suggests that they have made a wild inductive leap, using their knowledge of the meanings of the small number words to infer the logic of the count list. Specifically, these accounts propose that children notice that the meanings of the words one and two differ by exactly one, and that similarly two and three differ by one. Having noticed this, they then infer that all numbers in their count list differ by exactly one from the preceding number, such that for every number n the successor of n has a cardinality of n + 1. This, in turn, permits children to use counting to evaluate cardinality. In keeping with this, Sarnecka and Carey (Reference Sarnecka and Carey2008), argue:
“The cardinal principle is often informally described as stating that the last numeral used in counting tells how many things are in the whole set. If we interpret this literally, then the cardinal principle is a procedural rule about counting and answering the question ‘how many.’ … Alternatively, the cardinal principle can be viewed as something more profound – a principle stating that a numeral's cardinal meaning is determined by its ordinal position in the list. This means, for example, that the fifth numeral in any count list – spoken or written, in any language – must mean five. And the third numeral must mean three, and the ninety-eighth numeral must mean 98, and so on. If so, then knowing the cardinal principle means having some implicit knowledge of the successor function – some understanding that the cardinality for each numeral is generated by adding one to the cardinality for the previous numeral.” (p. 665)
What Sarnecka and Carey (Reference Sarnecka and Carey2008) propose is that a semantic generalization concerning the relations between number words is what causes children to become CP-knowers. By noticing that two is equal to one + 1, that three is equal to two + 1, and that four is equal to three + 1, the child is able to infer that five must be equal to four + 1, and more generally that for any number n its successor is equal to n + 1.
The appeal of this account is that it provides a story for how children might acquire the meanings of large number words – and the basic logical foundations of arithmetic – all via one powerful semantic induction. However, as noted by Davidson et al. (Reference Davidson, Eng and Barner2012), the actual evidence that Sarnecka and Carey (Reference Sarnecka and Carey2008) present in favor of this semantic induction hypothesis appears to favor the weaker alternative they describe, namely that children initially count and give large sets accurately using a blind procedure, and only much later learn its logic. In describing their account, Sarnecka and Carey reason that if children become competent counters by learning the successor relation between numbers – i.e. giving exactly five depends on understanding that five is equal to four + 1 – then they should be able to predict that adding one object to a set of four results in five, rather than some other number, like six. To test this, they presented children with a box and told them that it contained some amount (e.g. four frogs). Next, they added one object and asked children, “Are there five or six?” Consistent with their hypothesis, Sarnecka and Carey reported that children who were classified as Cardinal Principle Knowers using the Give-a-Number task performed better, on average, than children categorized as one-, two-, or three-knowers, who responded randomly on this task.
However, although CP-knowers performed well as a group, this relative success was driven by a small number of highly competent children; most children in Sarnecka and Carey's (Reference Sarnecka and Carey2008) dataset performed no better than subset knowers, and guessed randomly. Corroborating this, Davidson et al. (Reference Davidson, Eng and Barner2012) showed that most young children classified as CP-knowers fail to understand the + 1 rule for even very small numbers – e.g. five and six – until after they have acquired substantial experience counting. To show this, Davidson et al. analyzed children's data according to how high they could count. What they found is that the least experienced counters – who could count no higher than 19 – performed almost uniformly at chance for small numbers like four and five. Children who could count slightly higher – up to 30 – performed only slightly better with small numbers, and were at chance for numbers in the teens, still well within their productive counting range. Only children who could count past 30 succeeded at very small numbers, as a group, though they still struggled with numbers bigger than four or five, suggesting that they had not yet generalized the successor principle to all numbers in their count list, as an inductive inference account would predict. Further, these results were not merely due to children's inability to identify the successors of numbers. When asked in a separate task what number comes after four, for example, even low counters were at ceiling in choosing between five and six (for replications of this effect, see Wagner et al., Reference Wagner, Kimura, Cheung and Barner2015; Cheung et al., Reference Cheung, Rubenson and Barner2017; for similar results using different methods, see Fuson, Reference Fuson1988).
How do children infer the successor principle?
If the successor principle is not learned when children become CP-knowers, when does this logical foundation of natural number emerge, and how do children learn it? This question can be distilled into two sub-questions: (i) What evidence informs learning? and (ii) What is the nature and origin of representations that form the hypothesis space for learning?
Building on the work by Davidson et al. (Reference Davidson, Eng and Barner2012) and others, a recent study by Cheung et al. (Reference Cheung, Rubenson and Barner2017) explored these questions by asking when children appear to acquire more than simple item-based knowledge of successor relations – i.e. when they can reason about the successors of all numbers in their count list, as well as all possible numbers. Cheung et al., point out that most studies of counting state the successor principle in a way that deviates importantly from the Peano axioms. For example, according to Sarnecka and Carey (Reference Sarnecka and Carey2008), children know the successor principle if they understand that, for any number n, the successor of n equals n + 1. This, however, is substantially weaker than how the successor principle is described by the Peano axioms, which state that every natural number n has a successor defined as n + 1. This is stronger because, whereas the former definition is potentially consistent with the hypothesis that numbers are finite, the latter is not.
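The contrast between the two formulations can be made explicit. The following is only a sketch in notation that is assumed here for illustration and does not come from Cheung et al.: $L$ is the child's count list, $w_{\max}$ its last word (if it has one), $\mathrm{next}(w)$ the word that follows $w$ in $L$, $\lVert w \rVert$ the cardinality that $w$ denotes, and $S$ the successor function on the natural numbers $\mathbb{N}$.

```latex
% Weak (item-based) version: quantifies only over words in a possibly finite count list L
\forall w \in L \setminus \{w_{\max}\}:\quad \lVert \mathrm{next}(w) \rVert = \lVert w \rVert + 1

% Strong (Peano) version: every natural number has a successor, so no largest number exists
\forall n \in \mathbb{N}\ \exists m \in \mathbb{N}:\quad m = S(n) = n + 1
```

On this rendering, the weak version is satisfied by any list whose adjacent words differ by one, including a list that simply stops. Only the strong version rules out a highest number.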
Based on this, Cheung et al. (Reference Cheung, Rubenson and Barner2017) tested four- to six-year-old children with Sarnecka and Carey's (Reference Sarnecka and Carey2008) successor task on numbers ranging up to 100, and found that children do not exhibit knowledge of item-based successors for all numbers in their productive count list until around the age of five-and-a-half – a full two years after most children in the US become CP-knowers. Also, Cheung et al. (Reference Cheung, Rubenson and Barner2017) showed that this knowledge emerged at around the same time that children begin to show adult-like intuitions of infinity. Using questions adapted from earlier studies of infinity understanding, by Evans (Reference Evans1983) and Evans and Gelman (Reference Evans and Gelman1982), Cheung asked children if there exists a highest number, and also whether it is always possible to add 1 to any number. Like Evans and Gelman (Reference Evans and Gelman1982), she found that children exhibited adult-like intuitions – that numbers never end – only around the age of five-and-a-half or six, around the same time that they realize that for any number n, the successor of n is equal to n + 1 (see also Hartnett & Gelman, Reference Hartnett and Gelman1998).
Still, the data from Cheung et al. (Reference Cheung, Rubenson and Barner2017) do not address how children learn that every natural number has a successor. Given the lack of evidence for nativist options thus far, my lab has been exploring the possibility that children learn the successor principle according to an inductive inference like that proposed by Carey (Reference Carey2009) and others, but that the inputs to this inference extend well beyond the numbers one, two, and three. Our basic idea breaks down into three parts. First, we propose that children require substantial item-based experience regarding the relations between many large numbers before they can make the induction to all possible numbers. Second, we propose that the inductive inference is highly constrained by the child's discovery that the count list describes differences between large approximate magnitudes, and that these magnitudes are clearly ordered and unbounded in size. Third, and finally, we propose that children infer that the number words themselves are unbounded when they discover the structure of the count list, and that numbers are generated by a recursive base system.
A key observation guiding this proposal is that, to succeed at Sarnecka and Carey's (Reference Sarnecka and Carey2008) successor task, children must begin with a known cardinality – e.g. that a set is equal to four – and then compute a new cardinality after one object is added. This computation closely resembles what mathematics education researchers label ‘counting on’ (Groen & Resnick, Reference Groen and Resnick1977; Fuson, Reference Fuson, Carpenter, Moser and Romsberg1982; Fuson & Hall, Reference Fuson, Hall and Ginsburg1982; Secada, Fuson & Hall, Reference Secada, Fuson and Hall1983). Initially, when children are shown two sets (e.g. 5 and 3), and are told that the first set contains five, they nevertheless re-count the 5 items, followed by the remaining 3 – a strategy referred to as ‘counting all’. To ‘count on’, children can either begin by labeling the set of 5 as five and then continuing – e.g. six, seven, eight – or they can simply begin with six. Remarkably, this ability is acquired quite late – generally in first grade at the age of five or six – and is difficult to train, requiring multiple sessions over several weeks (see Secada et al., Reference Secada, Fuson and Hall1983). Not only does ‘counting on’ require the acquisition of new sub-routines – e.g. counting from arbitrary points in the count list – but it also requires suppressing the prepotent disposition to blindly count all, beginning from one. For this reason, training strategies which remove the possibility of counting all – e.g. by occluding the first set or providing only a verbal label for it – may be most effective (for some evidence for this, see Secada et al., Reference Secada, Fuson and Hall1983).
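The difference between the two strategies can be sketched procedurally. The Python below is an illustration only, not a model drawn from the studies cited: the names COUNT_LIST, count_all, and count_on are hypothetical, and the count list is truncated at ten for brevity.

```python
# Hypothetical, truncated count list used only for illustration.
COUNT_LIST = ["one", "two", "three", "four", "five",
              "six", "seven", "eight", "nine", "ten"]


def count_all(first_set_size, second_set_size):
    """Recount every item from 'one', ignoring any label given for the first set."""
    words_spoken = []
    for position in range(first_set_size + second_set_size):
        words_spoken.append(COUNT_LIST[position])
    return words_spoken


def count_on(first_set_label, second_set_size):
    """Start just after the known label of the first set and continue from there."""
    start = COUNT_LIST.index(first_set_label) + 1  # entry point inside the list
    return COUNT_LIST[start:start + second_set_size]


all_words = count_all(5, 3)      # eight words, beginning with "one"
on_words = count_on("five", 3)   # three words, beginning with "six"
```

Both procedures end on the same word for sets of 5 and 3, but counting on requires entering the list at an arbitrary point. On this sketch, the successor task corresponds to the special case count_on(label, 1).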
These facts suggest that children might learn that the successor of n is n + 1 by repeatedly trying to solve exactly this problem, over and over again, as part of learning formal rules of addition. Still, even if such learning did inform the discovery of the successor principle, substantial problems would remain. As Rips, Asmuth, and Bloomfield (Reference Rips, Asmuth and Bloomfield2006) point out, the type of data that children get regarding counting is not sufficiently strong to differentiate between different types of possible inductive inferences. For example, Rips et al., argue that, even given item-based knowledge of the successors for all numbers up to 50, children might still infer that the next number denotes a cardinality of 1 (much as a clock cyclically returns to 1 after 12). Nothing in a finite set of item-based math facts could alone justify an inductive inference that numbers continue to infinity. Likewise, knowledge of individual math facts – e.g. that fifty-two + one equals fifty-three – could not explain the origin of the logical vocabulary that could express more general principles like the successor function – that for every n, its successor is n + 1. This is because this logical vocabulary is more general – and thus more powerful – than the facts that it describes. No current inductive learning model can explain the origin of more powerful logical structures (Fodor, Reference Fodor and Piattelli-Palmarini1980).
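The underdetermination problem raised by Rips et al. can be illustrated with a small sketch. The two functions below are hypothetical stand-ins for competing hypotheses a learner might entertain, and the modulus of 50 is simply the example value from the text. They agree on every item-based fact the learner has observed, yet diverge on the first unobserved case.

```python
def unbounded_successor(n):
    """Hypothesis A: the next number always denotes n + 1."""
    return n + 1


def cyclic_successor(n, modulus=50):
    """Hypothesis B (clock-like): after `modulus`, the next number denotes 1."""
    return n % modulus + 1


# Item-based facts available to the learner: the successors of 1 through 49,
# which together cover every number up to 50.
observed = range(1, 50)
hypotheses_agree = all(
    unbounded_successor(n) == cyclic_successor(n) for n in observed
)

next_after_50_unbounded = unbounded_successor(50)
next_after_50_cyclic = cyclic_successor(50)
```

Because the hypotheses make identical predictions over the observed range, no quantity of such facts can by itself favor one over the other. Something beyond the item-based data must constrain the inference.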
Here, I return to the premise with which I began this paper, that counting was created by humans historically to account for an explanatory gap – an intuition that there exist precise sets of things in the world, composed of discrete individuals – as well as a noisy perceptual system for representing magnitudes of any size. Though there is dispute regarding when children begin to associate numerals with approximate magnitudes, it is uncontroversial that this has begun in earnest before the age of five (e.g. see Le Corre & Carey, Reference Le Corre and Carey2007; Wagner & Johnson, Reference Wagner and Johnson2011; Davidson et al., Reference Davidson, Eng and Barner2012; Gunderson et al., Reference Gunderson, Spaepen and Levine2015; Odic, Lisboa, Eisinger, Olivera, Maiche & Halberda, Reference Odic, Lisboa, Eisinger, Olivera, Maiche and Halberda2016; Wagner, Chu & Barner, unpublished data). Although it seems unlikely that approximate magnitudes could define the content of the positive integers – since the approximate number system lacks any logical symbols in which generalizations might be stated (Laurence & Margolis, Reference Laurence, Margolis, Carruthers, Laurence and Stich2005) – it remains possible that it could constrain inductive inference if children assume that the meanings of the numerals are in some way isomorphic to states of the approximate number system (as per Gallistel & Gelman, Reference Gallistel and Gelman1992). Specifically, if children know that the ANS represents a monotonically increasing set of magnitudes, and that the count list is meant to explain this ordered set, then they could restrict their hypotheses regarding the logic of counting to only those models that result in a monotonically increasing set of precise cardinalities.
Still, even given this important constraint, the problem remains of explaining how children might infer that numbers are infinite, and that every number has a successor. Children are only exposed to a finite count list, and in many languages the adult count list is in fact finite, such that no inductive inference of the successor principle is licensed. Further, as already noted, most children appear to initially assume that numbers are in fact finite, and explicitly say so when asked. These facts suggest that something else is required to explain children's insight – at around six years of age – that numbers are infinite. One possible source of this insight, which my lab is currently exploring, is children's growing familiarity with the recursive structure of counting. In English, as in many languages, counting observes a recursive base 10 structure, though many other base systems are also attested, from base 2 to vigesimal systems (i.e. base 20; Hammarström, Reference Hammarström, Wohlgemuth and Cysouw2010). Understanding that the count list is generated by a recursive system might itself provide evidence to children that it is, in principle, unbounded in size. Although the first twenty numbers in English provide little evidence for repeating structure, as children learn to count beyond 30 they gain increasing evidence for the base 10 system. After 100, they learn that the system is truly recursive, and that the entire count list from 1 to 100 can be recycled for labeling larger numbers. Currently our lab is attempting to understand how these forms of evidence might inform children's insight that numbers never end, though evidence from past work provides some preliminary clues. For example, Cheung et al.’s (Reference Cheung, Rubenson and Barner2017) study finds that children who perform at ceiling on the successor task and who know numbers are infinite can count to a minimum of 80 on average. 
If children's insight regarding the successor principle is driven by understanding the recursive structure of the count list, they may require evidence from multiple iterations of the decade structure to make this inference, though perhaps not evidence that the entire list from 1 to 99 can be recursively embedded under 100. Consistent with this, Yang (Reference Yang2016) reports that once children can count to approximately 70, they generally can count upward indefinitely, suggesting that they have identified the generative structure of the count list, and can apply it to generate ever-larger numbers.
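To make concrete what a generative structure for the count list might look like, here is a minimal sketch of a rule system for English numerals. The function names and the restriction to 1–999 are assumptions made for illustration, as is the American-style omission of "and" after "hundred". It is not a claim about the representation children actually acquire.

```python
UNITS = ["", "one", "two", "three", "four", "five",
         "six", "seven", "eight", "nine"]
# The first twenty words are largely irregular and must be memorized.
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]
DECADES = ["", "", "twenty", "thirty", "forty", "fifty",
           "sixty", "seventy", "eighty", "ninety"]


def under_hundred(n):
    """Numeral for 1 <= n <= 99: a decade word plus a recycled unit word."""
    if n < 10:
        return UNITS[n]
    if n < 20:
        return TEENS[n - 10]
    tens, units = divmod(n, 10)
    return DECADES[tens] + ("-" + UNITS[units] if units else "")


def numeral(n):
    """Numeral for 1 <= n <= 999, reusing the 1-99 list under 'hundred'."""
    if not 1 <= n <= 999:
        raise ValueError("this sketch only covers 1 to 999")
    if n < 100:
        return under_hundred(n)
    hundreds, rest = divmod(n, 100)
    head = numeral(hundreds) + " hundred"
    # The recursive call recycles the whole sub-100 list for the remainder.
    return head if rest == 0 else head + " " + numeral(rest)
```

In this sketch the decade rule already repeats from twenty onward, so several iterations of it are visible well before 100, while the fully recursive step appears only in the branch that handles numbers of 100 and above.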
Thus far I have addressed the types of experience that children might require to make an induction regarding the successor principle, but not the nature of the representations over which this induction is made (i.e. the logical hypothesis space). Here, I draw on the same logical representations invoked to explain the acquisition of one, two, and three – i.e. representations that are much more general than the domain-specific logics posited by nativist proposals like Leslie et al. (Reference Leslie, Gelman and Gallistel2008), but which nevertheless fully satisfy the requirements of learning the successor function. First, this logic – which we know children must master to learn quantifiers and grammatical marking of singular and plural forms – presupposes the existence of atomic individuals, sets, properties of sets, quantifiers, and set-relational operations like union and equality. Without this kind of logic, meanings for quantifiers like all and some or connectives like and could not get off the ground. Critically, beyond allowing us to state the basic rules of equality (e.g. that if the successors of two numbers are equal, then those two numbers are also equal), such a logic also provides a hypothesis space in which the successor function can be learned, such that it need not be innate, but can nevertheless be described by more general logical representations that are required to explain simpler linguistic phenomena.
To acquire the successor function, children must observe that for any number, n, the next number in the count list is equal to n + 1. To acquire this, the child would require (i) representations of sets composed of atomic individuals, and (ii) an ordering of these representations, as in Figure 3. As noted by, e.g. Partee et al. (Reference Partee, ter Meulen and Wall2012), this ordering of sets of individuals implicitly represents successor relations, since sets in the ordering differ by a quantity of exactly one individual (given that atomic individuals are primitive) and are partially ordered (as per Link, Reference Link, Portner and Partee2002). To represent this relation in natural language, the child would therefore need to make this relation explicit, by noticing how the ordered set of number words maps onto the ordered set of sets. This in no way means that the successor function is innate, since many different semantic functions can be described in a set-based semantics, including variants on the successor function (e.g. where each numeral denotes only even numbers, or only odd numbers). Significant inductive learning is still required. However, no new logical representations are required. Given this, it is possible to specify something less than a full-fledged innate, domain-specific, successor function without requiring that the logical resource required for counting be entirely constructed de novo. Instead, the very same logical resources that support other aspects of natural language – like singular–plural morphology – can be used as a hypothesis space for learning the semantic relations between numerals.
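The point that a set-based hypothesis space permits, but does not dictate, the successor mapping can be illustrated with a small sketch. Everything named below (ATOMS, chain, the two candidate mappings) is hypothetical and chosen for illustration. The chain is one path through an ordering of sets of atomic individuals, and the two dictionaries are two of the many word-to-set mappings a learner could in principle entertain.

```python
ATOMS = ["a", "b", "c", "d", "e", "f"]

# A chain of sets in which each set extends the previous one by one atom.
chain = [frozenset(ATOMS[:k]) for k in range(1, len(ATOMS) + 1)]


def differs_by_one(smaller, larger):
    """True if `larger` is `smaller` plus exactly one additional individual."""
    return smaller < larger and len(larger - smaller) == 1


WORDS = ["one", "two", "three"]

# Candidate 1: the i-th word denotes the i-th set in the chain.
successor_mapping = dict(zip(WORDS, chain))
# Candidate 2: the i-th word denotes a set of size 2i (even numbers only).
even_mapping = dict(zip(WORDS, chain[1::2]))


def respects_successor(mapping, words):
    """Check that adjacent words denote sets differing by exactly one atom."""
    return all(
        differs_by_one(mapping[a], mapping[b])
        for a, b in zip(words, words[1:])
    )


chain_is_stepwise = all(differs_by_one(s, t) for s, t in zip(chain, chain[1:]))
```

Both mappings are expressible with the same sets and the same subset relation, but only the first satisfies the successor check. The representational resources are shared, and choosing between the mappings is left to induction.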
CONCLUSION
In this paper I have proposed that, when children learn to count, they acquire a system that explains perception, but which is not composed of perceptual building blocks. As part of this, I have pushed against both nativist and constructivist theories of number word learning, each of which assumes that perceptual representations of some sort – whether objects or magnitudes – are building blocks of number word meanings.
A key piece of my argument has involved dissociating the acquisition of small number words from the acquisition of counting. These are two distinct problems. Against most constructivist accounts, the logic of counting is not inferred from knowledge of small number words. And against nativists, there is not a single innate logic that defines all number words from the beginning. Instead, I've argued that one, two, and three are learned using the same logic of atoms and pluralities that supports the acquisition of number morphology, and does not require conceptual change: these concepts are innate. Meanwhile, counting is learned as children acquire a series of blind procedures, which remain relatively blind until around the age of six, around the time that they receive formal training on ‘counting on’, and also have sufficient counting experience to know that the count list exhibits a recursive structure capable of generating an unbounded set of labels.
Historically, counting emerged from tally systems, which were designed to fill an explanatory gap. It makes sense to design a tally system only if the designer is able to recognize that precise quantities exist in the world in the first place and that these sets cannot be reliably enumerated via perception. Perception is not only imprecise, but is transient and subjective, making it a poor tool for tracking debts, where multiple parties are involved and disagreement is likely. Earlier generations of humans repeatedly recognized these shortcomings, and understood that beyond the noisy veil of perception existed a world of discrete individual things, worthy of precise enumeration. Historically, counting didn't emerge from this noisy veil, but in spite of it. Likewise, children do not learn the meanings of number words – or the logic of counting – from noisy perceptual systems. Instead, counting is learned first as a blind procedure, and only becomes reliably mapped to the perception of magnitudes late in childhood, when children learn to make analogical mappings between counting and magnitudes. In this way, counting provides a system for reasoning about magnitudes that otherwise would remain inscrutable to humans, and thus opens up a world of reasoning and discovery that is impossible with perception alone.