Book contents
- Frontmatter
- Contents
- Preface
- 1 Introduction
- Part one Pattern Classification with Binary-Output Neural Networks
- Part two Pattern Classification with Real-Output Networks
- 9 Classification with Real-Valued Functions
- 10 Covering Numbers and Uniform Convergence
- 11 The Pseudo-Dimension and Fat-Shattering Dimension
- 12 Bounding Covering Numbers with Dimensions
- 13 The Sample Complexity of Classification Learning
- 14 The Dimensions of Neural Networks
- 15 Model Selection
- Part three Learning Real-Valued Functions
- Part four Algorithmics
- Appendix 1 Useful Results
- Bibliography
- Author index
- Subject index
14 - The Dimensions of Neural Networks
Published online by Cambridge University Press: 26 February 2010
Summary
Introduction
In this chapter we bound the pseudo-dimension and the fat-shattering dimension of the function classes computed by certain neural networks. The pseudo-dimension bounds follow easily from the VC-dimension bounds obtained earlier, so they shall not detain us for long. Of more importance are the bounds on the fat-shattering dimension, which we derive by bounding certain covering numbers. Later in the book, we shall use these covering number bounds directly.
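For reference, the central quantity here is the fat-shattering dimension introduced in Chapter 11. The following restates its definition in standard notation (the exact typography may differ slightly from the book's):

```latex
% Fat-shattering dimension (cf. Chapter 11): F gamma-shatters a set if
% witness values r_i exist realizing every sign pattern with margin gamma.
A set $\{x_1,\dots,x_d\}$ is \emph{$\gamma$-shattered} by a class $F$ if
there exist witnesses $r_1,\dots,r_d \in \mathbb{R}$ such that for every
$b \in \{0,1\}^d$ some $f \in F$ satisfies
\[
  f(x_i) \ge r_i + \gamma \quad \text{if } b_i = 1,
  \qquad
  f(x_i) \le r_i - \gamma \quad \text{if } b_i = 0 .
\]
The \emph{fat-shattering dimension} $\mathrm{fat}_F(\gamma)$ is the
cardinality of the largest $\gamma$-shattered set (or $\infty$ if
arbitrarily large sets are $\gamma$-shattered).
```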
We bound the covering numbers and fat-shattering dimensions for networks that are fully connected between adjacent layers, that have units with a bounded activation function satisfying a Lipschitz constraint, and that have all weights (or all weights in certain layers) constrained to be small. We give two main results on the covering numbers and fat-shattering dimensions of networks of this type. In Section 14.3 we give bounds in terms of the number of parameters in the network. In contrast, Section 14.4 gives bounds on the fat-shattering dimension that instead grow with the bound on the size of the parameters and, somewhat surprisingly, are independent of the number of parameters in the network. This result is consistent with the intuition we obtain by studying networks of linear units (units with the identity function as their activation function). For a network of this kind, no matter how large, the function computed by the network is a linear combination of the input variables, and so its pseudo-dimension does not increase with the number of parameters.
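The linear-unit observation is easy to verify numerically. The following minimal sketch (our own illustration, not from the text; the layer widths and random seed are arbitrary) shows that a network of linear units, however deep, computes exactly the function of a single weight matrix, namely the product of the layer matrices:

```python
import numpy as np

# A "deep" network of linear units: each layer applies a weight matrix,
# and the identity activation passes values through unchanged.
rng = np.random.default_rng(0)
layer_widths = [5, 8, 8, 8, 1]  # arbitrary widths; depth is immaterial
weights = [rng.standard_normal((m, n))
           for n, m in zip(layer_widths[:-1], layer_widths[1:])]

def deep_linear(x):
    """Propagate x through every layer with identity activations."""
    for W in weights:
        x = W @ x
    return x

# Collapsing all layers into one matrix gives the same function: the
# network is just a single linear map of the input variables.
W_collapsed = weights[0]
for W in weights[1:]:
    W_collapsed = W @ W_collapsed

x = rng.standard_normal(layer_widths[0])
assert np.allclose(deep_linear(x), W_collapsed @ x)
```

Since the collapsed map is linear in the inputs, the class of functions computed as the weights vary is just the class of linear functions on the input space, so its pseudo-dimension is fixed regardless of how many layers (and hence parameters) the network has.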
- Type: Chapter
- Information: Neural Network Learning: Theoretical Foundations, pp. 193–217
- Publisher: Cambridge University Press
- Print publication year: 1999