Calibration, Coherence, and Scoring Rules

Published online by Cambridge University Press: 01 April 2022

Teddy Seidenfeld*
Affiliation:
Department of Philosophy, Washington University in St. Louis

Abstract

Can there be good reasons for judging one set of probabilistic assertions more reliable than a second? There are many candidates for measuring “goodness” of probabilistic forecasts. Here, I focus on one such aspirant: calibration. Calibration requires an alignment of announced probabilities and observed relative frequency, e.g., 50 percent of forecasts made with the announced probability of .5 occur, 70 percent of forecasts made with probability .7 occur, etc.
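(As an illustration outside the original abstract, the following minimal Python sketch computes an empirical calibration curve of the kind just described, grouping forecasts by announced probability and comparing each group with its observed relative frequency; the function name and data are hypothetical.)

```python
# Illustrative sketch (not from the paper): an empirical calibration check.
# A forecaster is well calibrated when, for each announced probability p,
# the observed relative frequency of the forecast events is (close to) p.

from collections import defaultdict

def calibration_curve(announced, outcomes):
    """announced: probabilities in [0, 1]; outcomes: 1 if the event occurred, else 0."""
    groups = defaultdict(list)
    for p, x in zip(announced, outcomes):
        groups[p].append(x)
    # Map each announced probability to the observed relative frequency.
    return {p: sum(xs) / len(xs) for p, xs in sorted(groups.items())}

# Perfectly calibrated at .5 (2 of 4 occur) but miscalibrated at .7 (1 of 2).
print(calibration_curve([.5, .5, .5, .5, .7, .7], [1, 0, 1, 0, 1, 0]))
# -> {0.5: 0.5, 0.7: 0.5}
```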

To summarize the conclusions: (i) Surveys designed to display calibration curves, from which a recalibration is to be calculated, are useless without due consideration for the interconnections between questions (forecasts) in the survey. (ii) Subject to feedback, calibration in the long run is otiose. It gives no ground for validating one coherent opinion over another as each coherent forecaster is (almost) sure of his own long-run calibration. (iii) Calibration in the short run is an inducement to hedge forecasts. A calibration score, in the short run, is improper. It gives the forecaster reason to feign violation of total evidence by enticing him to use the more predictable frequencies in a larger finite reference class than that directly relevant.
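(Again as an illustration not found in the original text: conclusion (iii) contrasts an improper short-run calibration score with proper scoring rules such as Brier's (1950) quadratic rule, under which honest announcement minimizes expected score and hedging is not rewarded. A hedged numerical sketch, with an assumed true probability of .7:)

```python
# Illustrative sketch (not from the paper): the Brier score is proper,
# i.e., a forecaster who believes the event has probability p minimizes
# expected score by announcing q = p, so there is no incentive to hedge.

def expected_brier(q, p):
    # Expected quadratic penalty (q - outcome)^2 when the event
    # occurs with probability p.
    return p * (q - 1) ** 2 + (1 - p) * q ** 2

p = 0.7  # assumed true probability (hypothetical)
best_q = min((q / 100 for q in range(101)), key=lambda q: expected_brier(q, p))
print(best_q)  # -> 0.7: honest announcement is optimal
```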

Type
Research Article
Copyright
Copyright © 1985 by the Philosophy of Science Association

Footnotes

I thank Jay Kadane and Mark Schervish for helpful discussions about their important work on calibration, and Isaac Levi for his constructive criticism of this and earlier drafts. Also, I have benefited from conversations with M. De Groot and J. K. Ghosh.

Preliminary versions of this paper were delivered at the Meeting of the Society for Philosophy and Psychology, May 13–16, 1982, London, Ontario; and at Session TA10, “Modeling Uncertainty,” of the TIMS/ORSA conference, April 27, 1983, Chicago, Illinois.

Research for this work was sponsored by a Washington University Faculty Research Grant.

References

Alpert, M., and Raiffa, H. (1982), “A progress report on the training of probability assessors”, in Judgment under Uncertainty: Heuristics and Biases, Kahneman, D., Slovic, P., and Tversky, A. (eds.). Cambridge: Cambridge University Press, pp. 294–305. Hereafter, “Judgment under Uncertainty.”
Blackwell, D., and Girshick, M. (1954), Theory of Games and Statistical Decisions. London and New York: John Wiley.
Brier, G. W. (1950), “Verification of Forecasts Expressed in Terms of Probability”, Monthly Weather Review 78: 1–3.
Bross, I. D. J. (1953), Design for Decision. New York: Macmillan.
Chen, R. (1977), “On Almost Sure Convergence in a Finitely Additive Setting”, Z. Wahrscheinlichkeitstheorie 37: 341–56.
Dawid, A. P. (1982), “The Well Calibrated Bayesian”, Journal of the American Statistical Association 77: 605–10; discussion, 610–13.
De Groot, M., and Eriksson, E. (forthcoming), “Probability forecasting, stochastic dominance and the Lorenz curve”, in Proceedings of the Second International Meeting on Bayesian Statistics, Valencia, Spain, 1983.
De Groot, M., and Fienberg, S. E. (1981), “Assessing Probability Assessors: Calibration and Refinement”, Technical Report 105, Dept. of Statistics. Pittsburgh: Carnegie-Mellon University.
De Groot, M., and Fienberg, S. (1982), “The Comparison and Evaluation of Forecasters”, Technical Report 244, Dept. of Statistics. Pittsburgh: Carnegie-Mellon University.
Dubins, L. (1974), “On Lebesgue-like Extensions of Finitely Additive Measures”, Annals of Probability 2: 456–63.
Dubins, L. (1975), “Finitely Additive Conditional Probabilities, Conglomerability and Disintegrations”, Annals of Probability 3: 89–99.
Feller, W. (1966), An Introduction to Probability Theory and its Applications. Vol. 2. London and New York: John Wiley.
Finetti, B. de (1972), Probability, Induction and Statistics. London and New York: John Wiley.
Finetti, B. de (1974), Theory of Probability. Vol. 1. London and New York: John Wiley.
French, S. (forthcoming), “Group consensus probability distributions: a critical survey”, in Proceedings of the Second International Meeting on Bayesian Statistics, Valencia, Spain, 1983.
Gibbard, A. (1973), “Manipulation of Voting Schemes: A General Result”, Econometrica 41: 587–601.
Hoerl, A. E., and Fallin, H. K. (1974), “Reliability of Subjective Evaluations in a High Incentive Situation”, Journal of the Royal Statistical Society A 127: 227–30.
Horwich, P. (1982), Probability and Evidence. Cambridge: Cambridge University Press.
Kadane, J. B., and Lichtenstein, S. (1982), “A Subjectivist View of Calibration”, Technical Report 233, Dept. of Statistics. Pittsburgh: Carnegie-Mellon University.
Kyburg, H. E. (1974), The Logical Foundations of Statistical Inference. Dordrecht: D. Reidel.
Kyburg, H. E. (1978), “Subjective Probability: Considerations, Reflections, and Problems”, Journal of Philosophical Logic 7: 157–80.
Levi, I. (1980), The Enterprise of Knowledge. Cambridge: The MIT Press.
Levi, I. (1981), “Direct Inference and Confirmational Conditionalization”, Philosophy of Science 48: 532–52.
Lichtenstein, S., and Fischhoff, B. (1977), “Do Those Who Know More also Know More about How Much They Know?”, Organizational Behavior and Human Performance 20: 159–83.
Lichtenstein, S.; Fischhoff, B.; and Phillips, L. (1982), “Calibration of probabilities: The state of the art to 1980”, in Judgment under Uncertainty, Kahneman, D., Slovic, P., and Tversky, A. (eds.). Cambridge: Cambridge University Press, pp. 306–34.
Lindley, D. V. (1981), “Scoring Rules and the Inevitability of Probability”, unpublished report, ORC 81-1, Operations Research Center. Berkeley: University of California.
Lindley, D. V. (forthcoming), “Reconciliation of discrete probability distributions”, in Proceedings of the Second International Meeting on Bayesian Statistics, Valencia, Spain, 1983.
Lindley, D. V.; Tversky, A.; and Brown, R. V. (1979), “On the Reconciliation of Probability Assessments”, with discussion, Journal of the Royal Statistical Society A 142: 146–80.
Murphy, A. H. (1973a), “Hedging and Skill Scores for Probability Forecasts”, Journal of Applied Meteorology 12: 215–23.
Murphy, A. H. (1973b), “A New Vector Partition of the Probability Score”, Journal of Applied Meteorology 12: 595–600.
Murphy, A. H. (1974), “A Sample Skill Score for Probability Forecasts”, Monthly Weather Review 102: 48–55.
Murphy, A. H., and Epstein, E. S. (1967), “Verification of Probabilistic Predictions: A Brief Review”, Journal of Applied Meteorology 6: 748–55.
Murphy, A. H., and Winkler, R. L. (1977), “Reliability of Subjective Probability Forecasts of Precipitation and Temperature”, Applied Statistics 26: 41–47.
Pratt, J., and Schlaifer, R. (forthcoming), “Repetitive assessment of judgmental probability distributions: a case study”, in Proceedings of the Second International Meeting on Bayesian Statistics, Valencia, Spain, 1983.
Putnam, H. (1981), Reason, Truth and History. Cambridge: Cambridge University Press.
Rao, C. R. (1980), “Diversity and Dissimilarity Coefficients: A Unified Approach”, Technical Report 80-10, Institute for Statistics and Applications, Dept. of Mathematics and Statistics, University of Pittsburgh.
Sanders, F. (1958), “The evaluation of subjective probability forecasts”, Dept. of Meteorology, Contract AF 19(604)-1305, Scientific Report 5. Cambridge: MIT.
Savage, L. J. (1954), The Foundations of Statistics. New York: John Wiley.
Savage, L. J. (1971), “Elicitation of Personal Probabilities and Expectations”, Journal of the American Statistical Association 66: 783–801.
Schervish, M. J. (1983), “A General Method for Comparing Probability Assessors”, Technical Report 275, Dept. of Statistics. Pittsburgh: Carnegie-Mellon University.
Schervish, M.; Seidenfeld, T.; and Kadane, J. (1984), “The Extent of Non-conglomerability of Finitely Additive Probabilities”, Z. Wahrscheinlichkeitstheorie 66: 205–26.
Seidenfeld, T. (1978), “Direct Inference and Inverse Inference”, Journal of Philosophy 75: 709–30.
Seidenfeld, T., and Schervish, M. (1983), “A Conflict Between Finite Additivity and Avoiding Dutch Book”, Philosophy of Science 50: 398–412.
Shimony, A. (1955), “Coherence and the Axioms of Confirmation”, Journal of Symbolic Logic 20: 1–28.
Spielman, S. (1976), “Exchangeability and the Certainty of Objective Randomness”, Journal of Philosophical Logic 5: 399–406.
Winkler, R. L. (1967), “The Assessment of Prior Distributions in Bayesian Analysis”, Journal of the American Statistical Association 62: 776–800.
Zeckhauser, R. (1973), “Voting Systems, Honest Preferences and Pareto Optimality”, American Political Science Review 67: 934–46.