The Berkelmans–Pries dependency function: A generic measure of dependence between random variables

Guus Berkelmans; Sandjai Bhulai; Rob van der Mei; Joris Pries

doi:10.1017/jpr.2022.118

Part of: Foundations of probability theory; Multivariate analysis

Published online by Cambridge University Press: 23 March 2023

Guus Berkelmans
Affiliation: Centrum Wiskunde & Informatica
Postal address: Department of Stochastics, P.O. Box 94079, 1090 GB Amsterdam, Netherlands

Sandjai Bhulai
Affiliation: Vrije Universiteit (VU)
Postal address: Department of Mathematics, De Boelelaan 1111, 1081 HV Amsterdam, Netherlands. Email: s.bhulai@vu.nl

Rob van der Mei
Affiliation: Centrum Wiskunde & Informatica; Vrije Universiteit
Postal address: Department of Stochastics, Science Park 123, 1098 XG Amsterdam, Netherlands. Email: mei@cwi.nl

Joris Pries
Affiliation: Centrum Wiskunde & Informatica
Postal address: Department of Stochastics, P.O. Box 94079, 1090 GB Amsterdam, Netherlands

Abstract

Measuring and quantifying dependencies between random variables (RVs) can give critical insights into a dataset. Typical questions are: ‘Do underlying relationships exist?’, ‘Are some variables redundant?’, and ‘Is some target variable Y highly or weakly dependent on variable X?’ Despite the evident need for a general-purpose measure of dependency between RVs, most data analysts in practice use the Pearson correlation coefficient to quantify dependence, even though it is recognized that the correlation coefficient is essentially a measure of linear dependency only. Although many attempts have been made to define more generic dependency measures, there is no consensus yet on a standard, general-purpose dependency function. Several ideal properties of a dependency function have been proposed, but with little supporting argumentation. Motivated by this, we discuss and revise the list of desired properties and propose a new dependency function that meets all these requirements. This general-purpose dependency function gives data analysts a means to quantify the level of dependence between variables. To this end, we also provide Python code to determine the dependency function for use in practice.
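
The limitation of the Pearson correlation coefficient noted in the abstract can be illustrated with a small sketch. This is not the authors' Python code and it does not compute the Berkelmans–Pries dependency function; the helper `pearson` and the toy data are assumptions chosen purely for illustration. It shows a case where Y is a deterministic function of X, yet the correlation coefficient is zero.

```python
import math


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences.

    Hypothetical helper written for this illustration only.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy data (assumed): X is symmetric around zero.
xs = [-2, -1, 0, 1, 2]
linear = [3 * x + 1 for x in xs]   # Y depends linearly on X
quadratic = [x ** 2 for x in xs]   # Y is fully determined by X, but not linearly

r_linear = pearson(xs, linear)        # close to 1: linear dependence is detected
r_quadratic = pearson(xs, quadratic)  # close to 0: the dependence is missed

print(f"linear:    r = {r_linear:.3f}")
print(f"quadratic: r = {r_quadratic:.3f}")
```

In the quadratic case Y is completely determined by X, so any reasonable general-purpose dependency measure should report strong dependence, while the correlation coefficient reports none.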

Keywords

Probability theory; measure theory; distributions; association; correlation

MSC classification

Primary: 62H20: Measures of association (correlation, canonical correlation, etc.)

Secondary: 60A10: Probabilistic measure theory; 62H05: Characterization and structure theory

Type: Original Article
Information: Journal of Applied Probability, Volume 60, Issue 4, December 2023, pp. 1115–1135

DOI: https://doi.org/10.1017/jpr.2022.118
Copyright: © The Author(s), 2023. Published by Cambridge University Press on behalf of Applied Probability Trust
