The Berkelmans–Pries dependency function: A generic measure of dependence between random variables

Published online by Cambridge University Press: 23 March 2023

Guus Berkelmans*
Affiliation:
Centrum Wiskunde & Informatica
Sandjai Bhulai*
Affiliation:
Vrije Universiteit (VU)
Rob van der Mei*
Affiliation:
Centrum Wiskunde & Informatica; Vrije Universiteit
Joris Pries*
Affiliation:
Centrum Wiskunde & Informatica
*Postal address: Department of Stochastics, P.O. Box 94079, 1090 GB Amsterdam, Netherlands
***Postal address: Department of Mathematics, De Boelelaan 1111, 1081 HV Amsterdam, Netherlands. Email: s.bhulai@vu.nl
****Postal address: Department of Stochastics, Science Park 123, 1098 XG Amsterdam, Netherlands. Email: mei@cwi.nl

Abstract

Measuring and quantifying dependencies between random variables (RVs) can give critical insights into a dataset. Typical questions are: ‘Do underlying relationships exist?’, ‘Are some variables redundant?’, and ‘Is some target variable Y highly or weakly dependent on variable X?’ Despite the evident need for a general-purpose measure of dependency between RVs, most data analysts in practice use the Pearson correlation coefficient to quantify dependence, even though it is recognized that the correlation coefficient essentially measures linear dependency only. Although many attempts have been made to define more generic dependency measures, there is no consensus yet on a standard, general-purpose dependency function. Several ideal properties of a dependency function have been proposed, but with little supporting argumentation. Motivated by this, we discuss and revise the list of desired properties and propose a new dependency function that meets all these requirements. This general-purpose dependency function provides data analysts with a powerful means to quantify the level of dependence between variables. To this end, we also provide Python code to determine the dependency function for use in practice.

Type
Original Article
Copyright
© The Author(s), 2023. Published by Cambridge University Press on behalf of Applied Probability Trust
