
Bayes beyond the predictive distribution

Published online by Cambridge University Press: 23 September 2024

Anna Székely*
Affiliation:
Department of Computational Sciences, HUN-REN Wigner Research Centre for Physics, Budapest, Hungary; Department of Cognitive Science, Faculty of Natural Sciences, Budapest University of Technology and Economics, Budapest, Hungary. szekely.anna@wigner.hu; http://golab.wigner.mta.hu/people/anna-szekely/
Gergő Orbán
Affiliation:
Department of Computational Sciences, HUN-REN Wigner Research Centre for Physics, Budapest, Hungary. orban.gergo@wigner.mta.hu; http://golab.wigner.mta.hu/people/gergo-orban/
*Corresponding author.

Abstract

Binz et al. argue that meta-learned models offer a new paradigm to study human cognition. Meta-learned models are proposed as alternatives to Bayesian models based on their capability to learn identical posterior predictive distributions. In our commentary, we highlight several arguments that reach beyond a predictive distribution-based comparison, offering new perspectives to evaluate the advantages of these modeling paradigms.

Type
Open Peer Commentary
Copyright
Copyright © The Author(s), 2024. Published by Cambridge University Press

In their review, Binz et al. propose a framework for studying the adaptive nature of the mind. They propose that recent advances in machine learning empower meta-learning paradigms to be used as a flexible and general framework for studying the computations, the representations, and even the neuronal processes underlying learning. The authors put forward a number of arguments that provide support for such a paradigm. In this commentary, we aim to reflect on these arguments in order to better identify the advantages and limits of using meta-learned models instead of Bayesian ones.

The authors pit the meta-learning paradigm against Bayesian approaches. Bayesian models provide a framework for formulating learning problems that is as general as that of meta-learned models, but the two paradigms differ in the principles that guide model construction. In contrast with the primarily data-driven approach of meta-learned models, Bayesian approaches formulate the computational challenge humans face when performing task(s) through the definition of likelihood and priors, which summarize our assumptions about the relevant quantities of the computational challenge and our prior beliefs about these quantities. In other words, when constructing a Bayesian model, one needs to define a generative model of the task and also the relevant quantities that shape the learning procedure, which instantly provides a set of testable hypotheses and, thus, an opportunity to better understand cognition. The authors challenge the Bayesian approach by pointing out that in complex tasks, both defining and evaluating the likelihood can be impossible, and the function classes that Bayesian models rely on can be severely constrained. The authors argue that these challenges can be circumvented by using meta-learned models instead. To support the paradigm shift, the authors cite promising new studies that explore the equivalence of meta-learned models and Bayesian approaches. While these unifying views certainly contribute to a better understanding of learning, some aspects of these views deserve further consideration.

The authors argue that it is the posterior predictive distribution that a model ultimately learns, and thus, this quantity provides a platform to compare alternative approaches. The posterior predictive distribution is then used to establish the equivalence of Bayesian and meta-learned models. We would challenge this view based on two observations. First, it is important to point out that in its general form, the posterior predictive distribution is not a quantity that is invariant for a set of tasks, but it depends on the choice of the prior. This also means that the equivalence of the meta-learner and the Bayesian learner is constrained. This constraint can be illuminated by considering the contribution of the priors in Bayesian models. The effect of the prior is most pronounced when data are scarce. In such cases, the equivalence is hard to establish as it is unclear what sort of prior the meta-learner model implicitly assumes. When data are abundant, however, the contribution of the prior diminishes, and in such cases, it is easier to establish the equivalence of the two model classes. Second, comparing Bayesian models and deep networks based on predictive performance alone ignores the power of having a framework that permits combining structured knowledge representations with powerful inference (Griffiths, Chater, Kemp, Perfors, & Tenenbaum, 2010; Kemp, Perfors, & Tenenbaum, 2007; Kemp & Tenenbaum, 2008; Tenenbaum, Griffiths, & Kemp, 2006; Tenenbaum, Kemp, Griffiths, & Goodman, 2011). A key benefit of Bayesian modeling is the characterization of generative models that could plausibly account for the behavioral outcomes. Creating and testing hypotheses regarding these generative models enables us to better understand the computations that underlie cognition and give rise to the behavioral outcome.
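The first observation can be made concrete with a toy example. The sketch below assumes a Beta-Bernoulli model, which is chosen here purely for illustration and does not come from the target article; the two priors and the observation counts are likewise hypothetical. It computes the posterior predictive probability of the next outcome under each prior and shows that the two predictions disagree strongly after four observations but nearly coincide after four thousand.

```python
def posterior_predictive(successes, trials, alpha, beta):
    """P(next outcome = 1 | data) for a Bernoulli rate with a Beta(alpha, beta) prior."""
    return (alpha + successes) / (alpha + beta + trials)


# Two hypothetical priors (illustrative assumptions, not taken from the commentary):
# a uniform prior and a prior that strongly favours low rates.
PRIORS = {
    "uniform Beta(1, 1)": (1.0, 1.0),
    "skeptical Beta(2, 20)": (2.0, 20.0),
}


def prior_gap(successes, trials):
    """Absolute difference between the predictions made under the two priors."""
    preds = [posterior_predictive(successes, trials, a, b) for a, b in PRIORS.values()]
    return abs(preds[0] - preds[1])


# Same empirical rate (75%) observed in a scarce and in an abundant data regime.
gap_scarce = prior_gap(successes=3, trials=4)
gap_abundant = prior_gap(successes=3000, trials=4000)

print(f"gap with 4 observations:    {gap_scarce:.4f}")
print(f"gap with 4000 observations: {gap_abundant:.4f}")
```

In the scarce regime, a learner whose implicit prior is unknown cannot be matched to either Bayesian learner on the basis of the task alone, which is the constraint on equivalence described above.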

The authors refer to inductive biases that can be transparently captured by meta-learned models, some of which are not necessarily easy to capture in Bayesian models. While we agree that some forms of inductive biases are readily delivered by these meta-learned models, Bayesian models too are capable of investigating relevant inductive biases. These inductive biases might include assumptions about the function classes that learning operates on (Kemp & Tenenbaum, 2008) or assumptions about the computational complexity of the generative model (Csikor, Meszéna, & Orbán, 2023), both of which can be phrased through the definition of the likelihood. Such inductive biases can be explored by pitting them against alternatives and assessing the models' power to predict human learning. In summary, we argue that characterization of learning through the specification of the generative model, comprising the prior and the likelihood, makes it possible to explore the assumptions behind the models, assumptions that may remain hidden in meta-learned models.
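Pitting alternative inductive biases against each other can be sketched in the same toy setting. The example below is an illustration under stated assumptions, not the procedure used in any of the cited studies: each candidate bias is expressed as a hypothetical Beta prior in a Beta-Bernoulli model, the response sequence is invented for the example, and each candidate is scored by the summed log probability it assigns to each response one step ahead.

```python
import math


def log_predictive_score(responses, alpha, beta):
    """Sum of log one-step-ahead predictive probabilities under a Beta(alpha, beta) prior.

    For this conjugate model the sum equals the log marginal likelihood of the sequence.
    """
    score = 0.0
    ones = 0
    for t, r in enumerate(responses):
        # Predictive probability of a 1 given the first t responses.
        p_one = (alpha + ones) / (alpha + beta + t)
        score += math.log(p_one if r == 1 else 1.0 - p_one)
        ones += r
    return score


# Hypothetical binary responses (invented for illustration only).
observed = [1, 1, 0, 1, 1, 1, 0, 1]

# Two candidate inductive biases, expressed as hypothetical priors.
score_uniform = log_predictive_score(observed, alpha=1.0, beta=1.0)
score_skeptical = log_predictive_score(observed, alpha=2.0, beta=20.0)

print(f"uniform prior:   {score_uniform:.3f}")
print(f"skeptical prior: {score_skeptical:.3f}")
```

The candidate with the higher score predicts the sequence better. Because each candidate is defined by an explicit prior and likelihood, the assumption responsible for the difference can be read off directly from the model specification.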

Finally, it's important to clarify that we agree with the authors that more flexible tools provide unique opportunities to study a broader class of phenomena. However, recent advances in Bayesian models open new opportunities in this respect. For example, variational autoencoders (Nagy, Török, & Orbán, 2020; Spens & Burgess, 2024), non-parametric methods (Éltető, Nemeth, Janacsek, & Dayan, 2022; Heald, Lengyel, & Wolpert, 2021; Török et al., 2022), or probabilistic programming (Lake, Salakhutdinov, & Tenenbaum, 2015) might alleviate the need for the experimenter to meticulously define model architectures a priori and will complement the data-driven meta-learning approach proposed by the authors. In particular, the contribution of changing inductive biases to task performance in humans has been recently investigated in an implicit learning paradigm using a non-parametric Bayesian approach (Székely et al., 2024). In general, a combination of flexible nonlinear Bayesian models with structure learning is particularly appealing and has proven to be a valuable tool in continual learning (Achille et al., 2018; Rao et al., 2019).

Financial support

Supported by the European Union project RRF-2.3.1-21-2022-00004 within the framework of the Artificial Intelligence National Laboratory.

Competing interest

None.

References

Achille, A., Eccles, T., Matthey, L., Burgess, C. P., Watters, N., Lerchner, A., & Higgins, I. (2018). Life-long disentangled representation learning with cross-domain latent homologies. NeurIPS.
Csikor, F., Meszéna, B., & Orbán, G. (2023). Top-down perceptual inference shaping the activity of early visual cortex. BioRxiv. https://doi.org/10.1101/2023.11.29.569262
Éltető, N., Nemeth, D., Janacsek, K., & Dayan, P. (2022). Tracking human skill learning with a hierarchical Bayesian sequence model. PLoS Computational Biology, 18(11), e1009866. https://doi.org/10.1371/journal.pcbi.1009866
Griffiths, T. L., Chater, N., Kemp, C., Perfors, A., & Tenenbaum, J. B. (2010). Probabilistic models of cognition: Exploring representations and inductive biases. Trends in Cognitive Sciences, 14(8), 357–364. https://doi.org/10.1016/j.tics.2010.05.004
Heald, J. B., Lengyel, M., & Wolpert, D. M. (2021). Contextual inference underlies the learning of sensorimotor repertoires. Nature, 600, 489–493. https://doi.org/10.1038/s41586-021-04129-3
Kemp, C., Perfors, A., & Tenenbaum, J. B. (2007). Learning overhypotheses with hierarchical Bayesian models. Developmental Science, 10(3), 307–321. https://doi.org/10.1111/j.1467-7687.2007.00585.x
Kemp, C., & Tenenbaum, J. B. (2008). The discovery of structural form. PNAS, 105(31), 10687–10692. https://doi.org/10.1073/pnas.0802631105
Lake, B. M., Salakhutdinov, R., & Tenenbaum, J. B. (2015). Human-level concept learning through probabilistic program induction. Science, 350(6266), 1332–1338. Retrieved from www.sciencemag.org
Nagy, D. G., Török, B., & Orbán, G. (2020). Optimal forgetting: Semantic compression of episodic memories. PLoS Computational Biology, 16(10), e1008367. https://doi.org/10.1371/journal.pcbi.1008367
Rao, D., Visin, F., Rusu, A. A., Teh, Y. W., Pascanu, R., & Hadsell, R. (2019). Continual unsupervised representation learning. NeurIPS.
Spens, E., & Burgess, N. (2024). A generative model of memory construction and consolidation. Nature Human Behaviour, 8, 526–543. https://doi.org/10.1038/s41562-023-01799-z
Székely, A., Török, B., Kiss, M. M., Janacsek, K., Németh, D., & Orbán, G. (2024). Identifying transfer learning in the reshaping of inductive biases. PsyArxiv.
Tenenbaum, J. B., Griffiths, T. L., & Kemp, C. (2006). Theory-based Bayesian models of inductive learning and reasoning. Trends in Cognitive Sciences, 10(7), 309–318. https://doi.org/10.1016/j.tics.2006.05.009
Tenenbaum, J. B., Kemp, C., Griffiths, T. L., & Goodman, N. D. (2011). How to grow a mind: Statistics, structure, and abstraction. Science, 331(6022), 1279–1285. https://doi.org/10.1126/science.1192788
Török, B., Nagy, D. G., Kiss, M., Janacsek, K., Németh, D., & Orbán, G. (2022). Tracking the contribution of inductive bias to individualised internal models. PLoS Computational Biology, 18(6), e1010182. https://doi.org/10.1371/journal.pcbi.1010182