Uncovering Sparsity and Heterogeneity in Firm-Level Return Predictability Using Machine Learning

Published online by Cambridge University Press:  13 September 2022

Theodoros Evgeniou
Affiliation: INSEAD Decision Sciences (theodoros.evgeniou@insead.edu)
Ahmed Guecioueur*
Affiliation: INSEAD Finance
Rodolfo Prieto
Affiliation: INSEAD Finance (rodolfo.prieto@insead.edu)
* Corresponding author: ahmed.guecioueur@insead.edu

Abstract

We develop an approach that combines the estimation of monthly firm-level expected returns with an assignment of firms to (possibly) latent groups, both based on observable characteristics, using machine learning principles with linear models. The best-performing methods are flexible two-stage sparse models that capture group-membership predictive relationships. Portfolios formed to exploit such group-varying predictions based on a parsimonious set of characteristics deliver economically meaningful returns with low turnover. We propose statistical tests based on nonparametric bootstrapping for our results, and detail how different characteristics may matter for different groups of firms, making comparisons to the existing literature.
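
To make the two-stage idea concrete, the following is a minimal sketch on simulated data, not the authors' implementation. It assumes k-means for the group-assignment stage and the Lasso for the sparse linear stage; the abstract names neither. The function names `kmeans` and `lasso_cd`, the simulated characteristics, and all parameter values are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        # Distance of every point to its nearest already-chosen centre.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def lasso_cd(X, y, alpha, n_iter=200):
    """Lasso by cyclic coordinate descent.

    Minimises (1 / 2n) * ||y - X b||^2 + alpha * ||b||_1 (no intercept).
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * b[j]   # add back feature j's contribution
            rho = X[:, j] @ resid / n
            # Soft-thresholding produces exact zeros for weak predictors.
            b[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            resid -= X[:, j] * b[j]
    return b

# Simulated cross-section (assumption): 400 firms, 10 characteristics,
# two groups that differ in the level of characteristic 0.
rng = np.random.default_rng(42)
n, p = 400, 10
X = rng.standard_normal((n, p))
true_group = (np.arange(n) >= n // 2).astype(int)
X[:, 0] += np.where(true_group == 1, 8.0, 0.0)

# A different single characteristic predicts returns in each group.
beta = np.zeros((2, p))
beta[0, 1] = 0.5
beta[1, 2] = -0.5
y = (X * beta[true_group]).sum(axis=1) + 0.1 * rng.standard_normal(n)

# Stage 1: assign firms to groups from observable characteristics.
labels = kmeans(X, k=2)

# Stage 2: fit a separate sparse linear model within each group.
supports = {}
for g in np.unique(labels):
    Xg = X[labels == g]
    yg = y[labels == g]
    Xg = Xg - Xg.mean(axis=0)   # demean so no intercept is needed
    yg = yg - yg.mean()
    b = lasso_cd(Xg, yg, alpha=0.05)
    supports[int(g)] = np.flatnonzero(b).tolist()

print(supports)  # indices of the characteristics selected in each group
```

In this toy setting each group's model should retain only the characteristic that actually drives returns in that group, which is the kind of group-varying sparsity the abstract describes. Real data would require out-of-sample tuning of the penalty and of the number of groups.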

Type
Research Article
Copyright
© The Author(s), 2022. Published by Cambridge University Press on behalf of the Michael G. Foster School of Business, University of Washington

Footnotes

We thank Jennifer Conrad (the editor) and Alberto Martín-Utrera (the referee) for their constructive comments. We are grateful to Panos Mavrokonstantis for excellent research assistance while he was a Senior Research Scientist at INSEAD. We also thank participants at the 13th Annual SoFiE Conference, the 3rd Future of Financial Information Conference, the inaugural Miami Herbert Winter Research Conference on ML and Business, the 2021 AFA PhD Poster Session, the 2020 European Winter Meetings of the Econometric Society, the 22nd INFER Annual Conference, the 9th Wharton-INSEAD Doctoral Consortium, and the INSEAD Accounting and Finance PhD seminar series, as well as Alex Chinco (discussant), Victor DeMiguel, Scott Murray (discussant), Joël Peress, Marcel Rindisbacher, Raman Uppal, Jinyuan Zhang, and Guofu Zhou for their helpful comments. A previous version of this article was circulated under the title “Modeling Heterogeneity in Firm-Level Return Predictability with Machine Learning.”

Supplementary material: PDF

Evgeniou et al. supplementary material
