Perceptual (roots of) core knowledge

Brian J. Scholl

doi:10.1017/S0140525X23003023

Published online by Cambridge University Press: 27 June 2024

Affiliation: Department of Psychology, Yale University, New Haven, CT, USA; brian.scholl@yale.edu; https://perception.yale.edu/
Corresponding author: Brian J. Scholl

Abstract

Some core knowledge may be rooted in – or even identical to – well-characterized mechanisms of mid-level visual perception and attention. In the decades since it was first proposed, this possibility has inspired (and has been supported by) several discoveries in both infant cognition and adult perception, but it also faces several challenges. To what degree does What Babies Know reflect how babies see and attend?

Type: Open Peer Commentary
Information: Behavioral and Brain Sciences, Volume 47, 2024, e140

DOI: https://doi.org/10.1017/S0140525X23003023
Copyright: Copyright © The Author(s), 2024. Published by Cambridge University Press

Introduction: What babies see?

As the various subfields of cognitive science have become ever more distinct and specialized, the notion of core knowledge has acted as a sort of intellectual glue – synergizing research from its origins in developmental psychology to studies of animal cognition, adult visual perception, linguistic representation, computational modeling and AI, and beyond. Here I focus on one particular form of synergy, between What Babies Know (Elizabeth Spelke's brilliant and groundbreaking book summarizing one of the most productive research programs in all of science; Spelke, 2022; henceforth WBK) and the study of what and how we see (as explored in studies of adult visual perception and attention).

What kinds of mental representations and processes characterize core knowledge? Once upon a time, the answer was unambiguous: higher-level thought. As Spelke once suggested, “Humans come to know about an object's unity, boundaries, and persistence in ways like those by which we come to know about its material composition or its market value” (Spelke, 1988, p. 198). This view was inspired by a (now-obsolete) characterization of perception as relatively unsophisticated. Chapter 2 of WBK, for example, concerns discrete objects, but as Spelke once suggested: “Perceptual systems do not package the world into units…. The parsing of the world into things may point to the essence of thought and to its essential distinction from perception” (1988, p. 229). And since infants continue to represent objects that are not currently in view, the responsible mechanisms must therefore “carry infants beyond the world of immediate perception” (p. 172). By the 1990s, however, advances in the study of adult perception had made it clear that visual processing does in fact “package the world into units” on its own, independent of higher-level thought – into representations of both surfaces (for an early review, see Nakayama, He, & Shimojo, 1995) and objects (for an early review, see Scholl, 2001), which then persist through time, occlusion, and featural change (for a review, see Scholl, 2007).

These discoveries led to a proposal, first articulated in the late 1990s and early 2000s (Leslie, Xu, Tremoulet, & Scholl, 1998; Scholl, 2001, Section 7.2; Scholl & Leslie, 1999; see also Carey & Xu, 2001), that at least some types of core knowledge may be rooted in the mechanisms and representations of mid-level visual processing and object-based attention (henceforth mid-level vision). Infants may have expectations about the behaviors of objects not because of considered deliberation or conceptual theories, but because that is simply how they experience the world in the first place, in terms of their brute visual percepts. “[S]urprising parallels between recent results in cognitive developmental psychology and the study of object-based visuospatial attention suggest that the two areas of inquiry may have something to do with each other” (Scholl & Leslie, 1999, p. 60) – and although “visual processing in adults may seem relatively unrelated to the study of core knowledge in infant cognition, … recent work has suggested that these two seemingly different fields may in fact be studying the same underlying representations and constraints” (Strickland & Scholl, 2015, p. 571).

Progress: Sophisticated seeing!

The ultimate value of any theoretical proposal lies in the concrete progress it inspires. How has the proposal that core knowledge is rooted in mid-level vision fared in the decades since it was first introduced? Here are three examples of how this view has fueled new discoveries in both domains:

Cohesion and persistence: Many core knowledge principles apply to objects but not to non-solid substances, and early work showed how cohesion violations (failures to maintain rigid boundaries and internal connectedness) frustrate infants' object tracking (e.g., Huntley-Fenner, Carey, & Solimando, 2002; Spelke & Van de Walle, 1993). This led directly to the discoveries in adult vision that cohesion violations also frustrate attentional tracking (vanMarle & Scholl, 2003) and the maintenance of object-file representations – even when just viewing a single object split into two (Mitroff, Scholl, & Wynn, 2004). And this adult vision work then directly inspired the demonstration that even a single object (e.g., a cracker) splitting into two destroys infants' ability to track quantity (Cheries, Mitroff, Wynn, & Scholl, 2008).

Attentional prioritization: Categorizing a stimulus into a particular “event type” (such as occlusion or containment) biases infants to remember features that are especially diagnostic for that type (such as the width of an object, with a vertical container; e.g., Hespos & Baillargeon, 2001; Wang, Baillargeon, & Brueckner, 2004). This led directly to the discovery that such prioritization also occurs spontaneously in adults' visual working memory: While viewing dynamic containment (but not occlusion) events, change detection is better for those changes that affect whether objects will “fit” (Strickland & Scholl, 2015) – and the subtle details of this effect were subsequently also seen in infants' object tracking (Goldman & Wang, 2019).

Seeing agency: Chapter 7 of WBK reviews many studies showing how infants automatically treat certain motion patterns (e.g., involving pursuit) as cues to agency and intentionality – and how they expect agents to behave rationally, for example by following direct paths (e.g., Gergely, Nadasdy, Csibra, & Biro, 1995; Southgate & Csibra, 2009). This led directly to the discovery that adults' mid-level vision also spontaneously (and even irresistibly) extracts properties such as agency and goal-directedness when viewing “chasing” displays (Gao, Newman, & Scholl, 2009; Gao, McCarthy, & Scholl, 2010; van Buren, Uddenberg, & Scholl, 2016) – and that violations of rational action similarly destroy adults' ability to spontaneously see chasing (Gao & Scholl, 2011).

These examples demonstrate how taking connections between infant cognition and adult perception seriously can drive empirical progress – showing how these two domains employ similar representations (e.g., of agency), are constrained by similar principles (e.g., of cohesion), and have similar downstream consequences (e.g., of orienting attention). At the least, such parallels suggest that one domain may help to fuel the other – that core knowledge may be rooted in, and partially grow out of, mid-level vision. At the most extreme, such connections suggest that these two domains could be one and the same.

Challenges: Prosociality, language, and beyond

The essence of the progress reviewed above is a striking match between the results of experiments in infant cognition and adult perception. And such matches may go far beyond these three case studies (Bai, 2023), extending even into the nuances and mechanics of habituation itself (Turk-Browne, Scholl, & Chun, 2008). But just how close is this match? The biggest challenges to the view sketched above may lie in cases where the match is imperfect, in either direction.

The suggestion that core knowledge in infancy transcends mid-level vision in adults seems especially salient for at least two domains, each of which is the focus of a key chapter of WBK. First, as reviewed in Chapter 8, young infants may already have expectations and preferences related to prosociality – as when they observe one shape help another shape climb an incline, or hinder it from doing so (Hamlin, Wynn, & Bloom, 2007). But no work has yet suggested that visual processing itself directly extracts representations of helping, hindering, or prosociality in general (even though properties such as [im]morality in certain visual scenes may be correlated with lower-level cues; De Freitas & Alvarez, 2018). Second, as reviewed in Chapter 9, several aspects of core knowledge seem intimately related to language. Even infants' object tracking, for example, can depend on how people linguistically refer to the objects (Dewar & Xu, 2007; Xu, 2002). But mid-level vision seems largely encapsulated from linguistic processing, and vice versa (Firestone & Scholl, 2016) – and so if core knowledge reflects the operation of mid-level vision, then such linguistic connections may be rendered mysterious or inexplicable.

Potential mismatches may also loom large in the other direction – when adults' mid-level vision seems more sophisticated than infants' core knowledge. The studies reviewed in the previous section are all examples in which visual representations have been found to be especially sophisticated – encompassing properties and constraints (such as agency and cohesion) more closely associated with higher-level thought. But this trend in vision research goes far beyond the classical domains of core knowledge. Additional work, for example, has suggested that mid-level vision automatically and spontaneously extracts representations of causal history (i.e., of how objects came to look the way that they do; Chen & Scholl, 2016), soft-material intuitive physics (e.g., inferring the shape of objects under cloths; Wong, Bi, Soltani, Yildirim, & Scholl, 2023), and even unfinishedness (as when an object appears not to have ended its movement; Ongchoco, Wong, & Scholl, 2023). But competence involving such seemingly sophisticated domains is nowhere to be found in most characterizations of core knowledge.

On one hand, some of these challenges could be dissolved with further research. After all, when the current proposal was first articulated in the late 1990s, nobody yet suspected that mid-level vision might match infant cognition in the ways reviewed in the previous section. And so we might still discover that mid-level vision extracts representations of prosociality, or that infant core knowledge also encompasses representations of causal history. On the other hand, some of these challenges remain despite having been recognized long ago (e.g., Scholl & Leslie, 1999, Section 5.5) – and without a principled way to demarcate which results we expect to “match” and which we do not (e.g., only spatiotemporal processing, but not contact-mechanical processing; Cheries, Mitroff, Wynn, & Scholl, 2009; Scholl & Leslie, 1999), these challenges continue to be acute.

The state of the art

The view of core knowledge sketched here contrasts in some ways with that of WBK. On one hand, Spelke writes of the view that “the object representations [involved in core knowledge] are the products of perceptual processes” that “I believe there is truth to this view” (p. 78) – and throughout the book she masterfully reviews relevant work on adults' mid-level vision (including much of the work discussed here). She also notes that her views on these issues have changed over time: “I once proposed, wrongly, that objects are not grasped by a perceptual system but by … a system of central cognition…. Research … provided decisive evidence against this proposal…: Adults were found to share the representational system found in infants” (precis, sect. 1). As such, the view sketched here is meant as more of a friendly extension than a criticism – perhaps just placing a sharper focus on certain themes from WBK.

On the other hand, WBK also provides several additional arguments against this view, which seem less compelling. Spelke suggests that some aspects of core knowledge cannot have perceptual origins because (a) core knowledge representations are abstract (p. xxi) – but many aspects of mid-level vision also abstract over many surface variables (Scholl, 2007); (b) perception involves “detectable surfaces” rather than “the entities that those surfaces belong to” (p. 198) – but recent work in mid-level vision argues for exactly the opposite view (Wong et al., 2023); and (c) core knowledge representations have a time-course that transcends momentary perception (p. 78) – but at least some object-file representations in mid-level vision have been shown to persist throughout interruptions for at least 8 seconds, and possibly much longer (a result that was also explicitly motivated by connections to infant cognition; Noles, Scholl, & Mitroff, 2005).

Some of the most recent discussions of core knowledge also still seem to veer away from the possibility of substantive connections to mid-level vision in other ways. In WBK, for example, Spelke champions the notion of the infant mind as implementing a type of “physics engine” (Ullman, Spelke, Battaglia, & Tenenbaum, 2017) – but such frameworks do not necessarily abide by the constraints of encapsulated mid-level vision, as they also readily accommodate higher-level knowledge (e.g., about how the colors of blocks may arbitrarily signal their masses; Battaglia, Hamrick, & Tenenbaum, 2013). As a result, the physics-engine framework may be true and important – but also simply orthogonal to the distinction between mid-level vision and higher-level thought. And whereas WBK ultimately characterizes core knowledge (somewhat ambiguously) as occupying “a middle ground between perceptual systems and belief systems” (p. xxi), I hope that we might also continue to take seriously the more direct possibility that core knowledge is (rooted in) mid-level vision – and that What Babies Know may largely reflect how babies see and attend.

Financial support

This research received no specific grant from any funding agency, commercial, or not-for-profit sectors.

Competing interest

None.

References

Bai, D. (2023). Intuitive physics in visual perception. PhD dissertation, l'Ecole Normale Supérieure.

Battaglia, P. W., Hamrick, J. B., & Tenenbaum, J. B. (2013). Simulation as an engine of physical scene understanding. Proceedings of the National Academy of Sciences, 110, 18327–18332.

Carey, S., & Xu, F. (2001). Infants’ knowledge of objects: Beyond object-files and object tracking. Cognition, 80, 179–213.

Chen, Y.-C., & Scholl, B. J. (2016). The perception of history: Seeing causal history in static shapes induces illusory motion perception. Psychological Science, 27, 923–930.

Cheries, E. W., Mitroff, S. R., Wynn, K., & Scholl, B. J. (2008). Cohesion as a constraint on object persistence in infancy. Developmental Science, 11, 427–432.

Cheries, E. W., Mitroff, S. R., Wynn, K., & Scholl, B. J. (2009). Do the same principles constrain persisting object representations in infant cognition and adult perception?: The cases of continuity and cohesion. In Hood, B. & Santos, L. (Eds.), The origins of object knowledge (pp. 107–134). Oxford University Press.

De Freitas, J., & Alvarez, G. (2018). Your visual system provides all the information you need to make moral judgments about generic visual events. Cognition, 178, 133–146.

Dewar, K., & Xu, F. (2007). Do 9-month-old infants expect distinct words to refer to kinds? Developmental Psychology, 43, 1227–1238.

Firestone, C., & Scholl, B. J. (2016). Cognition does not affect perception: Evaluating the evidence for “top-down” effects. Behavioral and Brain Sciences, 39(E229), 1–77.

Gao, T., McCarthy, G., & Scholl, B. J. (2010). The wolfpack effect: Perception of animacy irresistibly influences interactive behavior. Psychological Science, 21, 1845–1853.

Gao, T., Newman, G. E., & Scholl, B. J. (2009). The psychophysics of chasing: A case study in the perception of animacy. Cognitive Psychology, 59, 154–179.

Gao, T., & Scholl, B. J. (2011). Chasing vs. stalking: Interrupting the perception of animacy. Journal of Experimental Psychology: Human Perception & Performance, 37, 669–684.

Gergely, G., Nadasdy, Z., Csibra, G., & Biro, S. (1995). Taking the intentional stance at 12 months of age. Cognition, 56, 165–193.

Goldman, E. J., & Wang, S.-h. (2019). Comparison facilitates the use of height information by 5-month-olds in containment events. Developmental Psychology, 55, 2475–2482.

Hamlin, J. K., Wynn, K., & Bloom, P. (2007). Social evaluation by preverbal infants. Nature, 450, 557–559.

Hespos, S. J., & Baillargeon, R. (2001). Infants’ knowledge about occlusion and containment events: A surprising discrepancy. Psychological Science, 12, 141–147.

Huntley-Fenner, G., Carey, S., & Solimando, A. (2002). Objects are individuals but stuff doesn't count: Perceived rigidity and cohesiveness influence infants’ representations of small numbers of discrete entities. Cognition, 85, 203–221.

Leslie, A. M., Xu, F., Tremoulet, P., & Scholl, B. J. (1998). Indexing and the object concept: Developing “what” and “where” systems. Trends in Cognitive Sciences, 2, 10–18.

Mitroff, S. R., Scholl, B. J., & Wynn, K. (2004). Divide and conquer: How object files adapt when a persisting object splits into two. Psychological Science, 15, 420–425.

Nakayama, K., He, Z. J., & Shimojo, S. (1995). Visual surface representation: A critical link between lower-level and higher-level vision. In Kosslyn, S. (Ed.), Visual cognition (pp. 1–70). Volume 2 of An Invitation to Cognitive Science, 2nd ed. MIT Press.

Noles, N. S., Scholl, B. J., & Mitroff, S. R. (2005). The persistence of object file representations. Perception & Psychophysics, 67, 324–334.

Ongchoco, J. D. K., Wong, K., & Scholl, B. J. (2023). The “unfinishedness” of dynamic events is spontaneously extracted in visual processing: A new “Visual Zeigarnik Effect”. Journal of Vision, 23, 4974.

Scholl, B. J. (2001). Objects and attention: The state of the art. Cognition, 80, 1–46.

Scholl, B. J. (2007). Object persistence in philosophy and psychology. Mind & Language, 22, 563–591.

Scholl, B. J., & Leslie, A. M. (1999). Explaining the infant's object concept: Beyond the perception/cognition dichotomy. In Lepore, E. & Pylyshyn, Z. (Eds.), What is cognitive science? (pp. 26–73). Blackwell.

Southgate, V., & Csibra, G. (2009). Inferring the outcome of an ongoing novel action at 13 months. Developmental Psychology, 45, 1794–1798.

Spelke, E. (1988). Where perceiving ends and thinking begins: The apprehension of objects in infancy. In Yonas, A. (Ed.), Perceptual development in infancy (pp. 197–234). Erlbaum.

Spelke, E. (2022). What babies know. Oxford University Press.

Spelke, E., & Van de Walle, G. (1993). Perceiving and reasoning about objects: Insights from infants. In Eilan, N., McCarthy, R. & Brewer, W. (Eds.), Spatial representation (pp. 132–161). Basil Blackwell.

Strickland, B., & Scholl, B. J. (2015). Visual perception involves “event type” representations: The case of containment vs. occlusion. Journal of Experimental Psychology: General, 144, 570–580.

Turk-Browne, N. B., Scholl, B. J., & Chun, M. M. (2008). Babies and brains: Habituation in infant cognition and functional neuroimaging. Frontiers in Human Neuroscience, 2, Article 16.

Ullman, T. D., Spelke, E. S., Battaglia, P., & Tenenbaum, J. B. (2017). Mind games: Game engines as an architecture for intuitive physics. Trends in Cognitive Sciences, 21, 649–665.

van Buren, B., Uddenberg, S., & Scholl, B. J. (2016). The automaticity of perceiving animacy: Goal-directed motion in simple shapes influences visuomotor behavior even when task-irrelevant. Psychonomic Bulletin & Review, 23, 797–802.

vanMarle, K., & Scholl, B. J. (2003). Attentive tracking of objects versus substances. Psychological Science, 14, 498–504.

Wang, S., Baillargeon, R., & Brueckner, L. (2004). Young infants’ reasoning about hidden objects: Evidence from violation-of-expectation tasks with test trials only. Cognition, 93, 167–198.

Wong, K. W., Bi, W., Soltani, A., Yildirim, I., & Scholl, B. J. (2023). Seeing soft materials draped over objects: A case study of intuitive physics in perception, attention, and memory. Psychological Science, 34, 111–119.