Why psychologists should embrace rather than abandon DNNs

Galit Yovel; Naphtali Abudarham

doi:10.1017/S0140525X2300167X

Published online by Cambridge University Press: 06 December 2023

Galit Yovel: School of Psychological Sciences and Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel; gality@tauex.tau.ac.il; https://people.socsci.tau.ac.il/mu/galityovel/
Naphtali Abudarham: School of Psychological Sciences, Tel Aviv University, Tel Aviv, Israel; naphtool@gmail.com

Abstract

Deep neural networks (DNNs) are powerful computational models, which generate complex, high-level representations that were missing in previous models of human cognition. By studying these high-level representations, psychologists can now gain new insights into the nature and origin of human high-level vision, which was not possible with traditional handcrafted models. Abandoning DNNs would be a huge oversight for psychological sciences.

Type: Open Peer Commentary
Information: Behavioral and Brain Sciences, Volume 46, 2023, e414

DOI: https://doi.org/10.1017/S0140525X2300167X
Copyright: © The Author(s), 2023. Published by Cambridge University Press

Computational modeling has long been used by psychologists to test hypotheses about human cognition and behavior. Prior to the recent rise of deep neural networks (DNNs), most computational models were handcrafted by scientists who determined their parameters and features. In vision sciences, these models were used to test hypotheses about the mechanisms that enable human object recognition. However, these handcrafted models used simple, engineer-designed features (e.g., Gabor wavelets), which produced low-level representations that did not account for human-level, view-invariant object recognition (Biederman & Kalocsai, 1997; Turk & Pentland, 1991). The main advantage of DNNs over these traditional models is not only that they reach human-level performance in object recognition, but that they do so through hierarchical processing of the visual input that generates high-level, view-invariant visual features. These high-level features are the “missing link” between the low-level representations of the handcrafted models and human-level object classification. They therefore offer psychologists an unprecedented opportunity to test hypotheses about the origin and nature of these high-level representations, which have not been available for exploration until now.

In the target article, Bowers et al. propose that psychologists should abandon DNNs as models of human vision, because they do not produce some of the perceptual effects that are found in humans. However, many of the listed perceptual effects that DNNs fail to produce are also not produced by the traditional handcrafted computational vision models, which have been widely used to model human vision. Furthermore, although current DNNs are primarily developed for engineering purposes (i.e., best performance), there are myriad ways in which they could and should be modified to better resemble the human mind. For example, current DNNs that are often used to model human face and object recognition (Khaligh-Razavi & Kriegeskorte, 2014; O'Toole & Castillo, 2021; Yamins & DiCarlo, 2016) are trained on static images (Cao, Shen, Xie, Parkhi, & Zisserman, 2018; Deng et al., 2009), whereas human face and object recognition are performed on a continuous stream of dynamic, multi-modal information. One way that has recently been suggested to close this gap is to train DNNs on movies that are generated by head-mounted cameras attached to infants’ foreheads (Fausey, Jayaraman, & Smith, 2016), to better model the development of human visual systems (Smith & Slone, 2017). Training DNNs initially on blurred images also provided insights into the potential advantage of the initial low acuity of infants’ vision (Vogelsang et al., 2018). These and many other modifications (e.g., multi-modal self-supervised image-language training, Radford et al., 2021) in the way DNNs are built and trained may generate perceptual effects that are more human-like (Shoham, Grosbard, Patashnik, Cohen-Or, & Yovel, 2022). Yet even current DNNs can advance our understanding of the nature of the high-level representations that are required for face and object recognition (Abudarham, Grosbard, & Yovel, 2021; Hill et al., 2019), which are still undefined in current neural and cognitive models. This significant computational achievement should not be dismissed.

Bowers et al. further claim that DNNs should be used to test hypotheses rather than solely to make predictions. We fully agree and further propose that psychologists are best suited to apply this approach by utilizing the same procedures they have used for decades to test hypotheses about the hidden representations of the human mind. Since the early days of psychological sciences, psychologists have developed a range of elegant experimental and stimulus manipulations to study human vision. The same procedures can now be used to explore the nature of DNNs’ high-level hidden representations as potential models of the human mind (Ma & Peters, 2020). For example, the face inversion effect is a robust, extensively studied, and well-established effect in human vision, which refers to the disproportionately large drop in performance that humans show for upside-down compared to upright faces (Cashon & Holt, 2015; Farah, Tanaka, & Drain, 1995; Yin, 1969). Because the low-level features extracted by handcrafted algorithms are similar for upright and inverted faces, these traditional models do not reproduce this effect. Interestingly, a human-like face inversion effect that is larger than an object inversion effect is found in DNNs (Dobs, Martinez, Yuhan, & Kanwisher, 2022; Jacob, Pramod, Katti, & Arun, 2021; Tian, Xie, Song, Hu, & Liu, 2022; Yovel, Grosbard, & Abudarham, 2023). Thus, we can now use the same stimulus and task manipulations that were used to study this effect in numerous human studies to test hypotheses about the mechanism that may underlie this perceptual effect. Moreover, by manipulating DNNs’ training diet, we can examine what type of experience is needed to generate this human-like perceptual effect, which is impossible to test in humans, over whose perceptual experience we have no control. Such an approach has recently been used to address a long-standing debate in cognitive sciences about the domain-specific versus the expertise hypothesis in face recognition (Kanwisher, Gupta, & Dobs, 2023; Yovel et al., 2023).
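To make the logic of such a test concrete, the following is a minimal sketch of how an inversion effect might be quantified from a network's embeddings. It is an illustration under stated assumptions, not the procedure of any of the cited studies: the same/different verification task, the cosine-similarity decision rule, the fixed threshold, and all function names are hypothetical choices, and the embeddings are assumed to have been extracted beforehand from some layer of a trained DNN.

```python
import numpy as np


def verification_accuracy(emb_a, emb_b, same, threshold=0.5):
    """Accuracy on a same/different task (hypothetical decision rule).

    emb_a, emb_b: arrays of shape (n_pairs, n_features) holding the
    network's embeddings for the two images in each pair.
    same: boolean array, True where a pair shows the same identity.
    A pair is judged "same" when its cosine similarity exceeds threshold.
    """
    a = np.asarray(emb_a, dtype=float)
    b = np.asarray(emb_b, dtype=float)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    similarity = np.sum(a * b, axis=1)
    predicted_same = similarity > threshold
    return float(np.mean(predicted_same == np.asarray(same, dtype=bool)))


def inversion_effect(acc_upright, acc_inverted):
    """Drop in accuracy from upright to inverted stimuli."""
    return acc_upright - acc_inverted


def disproportionate_inversion(face_up, face_inv, obj_up, obj_inv):
    """Face inversion effect minus object inversion effect.

    A positive value indicates a larger drop for faces than for objects.
    """
    return inversion_effect(face_up, face_inv) - inversion_effect(obj_up, obj_inv)
```

Running the same functions on networks trained with different image sets (e.g., faces only versus objects only) would be one way to compare how the training diet shapes the size of the effect.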

It was psychologists, not engineers, who first designed these neural networks to model human intelligence (McClelland, McNaughton, & O'Reilly, 1995; Rosenblatt, 1958; Rumelhart, Hinton, & Williams, 1986). It took more than 60 years after the psychologist Frank Rosenblatt published his report on the perceptron for technology to reach its present state, in which these hierarchically structured algorithms can be used to study the complexity of human vision. Abandoning DNNs would be a huge oversight for cognitive scientists, who can contribute considerably to the development of more human-like DNNs. It is therefore pertinent that psychologists join the artificial intelligence (AI) research community and study these models in collaboration with engineers and computer scientists. This is a unique time in the history of cognitive sciences, in which scientists from these different disciplines share an interest in the same type of computational models that can advance our understanding of human cognition. This opportunity should not be missed by psychological sciences.

Financial support

This study was funded by ISF grant 971/21 and joint NSFC-ISF grant 2383/18 to G. Y.

Competing interest

None.

References

Abudarham, N., Grosbard, I., & Yovel, G. (2021). Face recognition depends on specialized mechanisms tuned to view-invariant facial features: Insights from deep neural networks optimized for face or object recognition. Cognitive Science, 45(9), e13031. https://doi.org/10.1111/cogs

Biederman, I., & Kalocsai, P. (1997). Neurocomputational bases of object and face recognition. Philosophical Transactions of the Royal Society B: Biological Sciences, 352(1358), 1203–1219. https://doi.org/10.1098/rstb.1997.0103

Cao, Q., Shen, L., Xie, W., Parkhi, O. M., & Zisserman, A. (2018). VGGFace2: A dataset for recognising faces across pose and age. In Proceedings of the 13th IEEE international conference on automatic face and gesture recognition, FG 2018 (pp. 67–74). https://doi.org/10.1109/FG.2018.00020

Cashon, C. H., & Holt, N. A. (2015). Developmental origins of the face inversion effect. In Advances in child development and behavior (1st ed., Vol. 48, pp. 117–150). Elsevier. https://doi.org/10.1016/bs.acdb.2014.11.008

Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database (pp. 248–255). https://doi.org/10.1109/cvprw.2009.5206848

Dobs, K., Martinez, J., Yuhan, K., & Kanwisher, N. (2022). Behavioral signatures of face perception emerge in deep neural networks optimized for face recognition. Proceedings of the National Academy of Sciences, 120(32), e2220642120.

Farah, M. J., Tanaka, J. W., & Drain, H. M. (1995). What causes the face inversion effect? Journal of Experimental Psychology: Human Perception and Performance, 21(3), 628–634. https://doi.org/10.1037/0096-1523.21.3.628

Fausey, C. M., Jayaraman, S., & Smith, L. B. (2016). From faces to hands: Changing visual input in the first two years. Cognition, 152, 101–107. https://doi.org/10.1016/j.cognition.2016.03.005

Hill, M. Q., Parde, C. J., Castillo, C. D., Colón, Y. I., Ranjan, R., Chen, J.-C., … O'Toole, A. J. (2019). Deep convolutional neural networks in the face of caricature. Nature Machine Intelligence, 1(11), 522–529. https://doi.org/10.1038/s42256-019-0111-7

Jacob, G., Pramod, R. T., Katti, H., & Arun, S. P. (2021). Qualitative similarities and differences in visual object representations between brains and deep networks. Nature Communications, 12(1), 1–14. https://doi.org/10.1038/s41467-021-22078-3

Kanwisher, N., Gupta, P., & Dobs, K. (2023). CNNs reveal the computational implausibility of the expertise hypothesis. iScience, 26(2), 105976. https://doi.org/10.1016/j.isci.2023.105976

Khaligh-Razavi, S. M., & Kriegeskorte, N. (2014). Deep supervised, but not unsupervised, models may explain IT cortical representation. PLoS Computational Biology, 10(11), e1003915.

Ma, W. J., & Peters, B. (2020). A neural network walks into a lab: Towards using deep nets as models for human behavior (pp. 1–39).

McClelland, J. L., McNaughton, B. L., & O'Reilly, R. C. (1995). Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory. Psychological Review, 102(3), 419–457. https://doi.org/10.1037/0033-295X.102.3.419

O'Toole, A. J., & Castillo, C. D. (2021). Face recognition by humans and machines: Three fundamental advances from deep learning. Annual Review of Vision Science, 7, 543–570. https://doi.org/10.1146/annurev-vision-093019-111701

Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., … Sutskever, I. (2021). Learning transferable visual models from natural language supervision. In International conference on machine learning (pp. 8748–8763). PMLR.

Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6), 386–408.

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533–536.

Shoham, A., Grosbard, I., Patashnik, O., Cohen-Or, D., & Yovel, G. (2022). Deep learning algorithms reveal a new visual-semantic representation of familiar faces in human perception and memory. bioRxiv, 2022-10.

Smith, L. B., & Slone, L. K. (2017). A developmental approach to machine learning? Frontiers in Psychology, 8, 1–10. https://doi.org/10.3389/fpsyg.2017.02124

Tian, F., Xie, H., Song, Y., Hu, S., & Liu, J. (2022). The face inversion effect in deep convolutional neural networks. Frontiers in Computational Neuroscience, 16, 1–8. https://doi.org/10.3389/fncom.2022.854218

Turk, M., & Pentland, A. (1991). Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1), 71–86. https://doi.org/10.1162/jocn.1991.3.1.71

Vogelsang, L., Gilad-Gutnick, S., Ehrenberg, E., Yonas, A., Diamond, S., Held, R., & Sinha, P. (2018). Potential downside of high initial visual acuity. Proceedings of the National Academy of Sciences of the United States of America, 115(44), 11333–11338. https://doi.org/10.1073/pnas.1800901115

Yamins, D. L. K., & DiCarlo, J. J. (2016). Using goal-driven deep learning models to understand sensory cortex. Nature Neuroscience, 19(3), 356–365. https://doi.org/10.1038/nn.4244

Yin, R. K. (1969). Looking at upside-down faces. Journal of Experimental Psychology, 81(1), 141.

Yovel, G., Grosbard, I., & Abudarham, N. (2023). Deep learning models challenge the prevailing assumption that face-like effects for objects of expertise support domain-general mechanisms. Proceedings of the Royal Society B, 290(1998), 20230093.