Book contents
- Frontmatter
- Contents
- Preface
- Notation
- 1 The Learning Methodology
- 2 Linear Learning Machines
- 3 Kernel-Induced Feature Spaces
- 4 Generalisation Theory
- 5 Optimisation Theory
- 6 Support Vector Machines
- 7 Implementation Techniques
- 8 Applications of Support Vector Machines
- A Pseudocode for the SMO Algorithm
- B Background Mathematics
- References
- Index
7 - Implementation Techniques
Published online by Cambridge University Press: 05 March 2013
Summary
In the previous chapter we showed how the training of a Support Vector Machine can be reduced to maximising a convex quadratic form subject to linear constraints. Such convex quadratic programmes have no local maxima and their solution can always be found efficiently. Furthermore this dual representation of the problem showed how the training could be successfully effected even in very high dimensional feature spaces. The problem of minimising differentiable functions of many variables has been widely studied, especially in the convex case, and most of the standard approaches can be directly applied to SVM training. However, in many cases specific techniques have been developed to exploit particular features of this problem. For example, the large size of the training sets typically used in applications is a formidable obstacle to a direct use of standard techniques, since just storing the kernel matrix requires a memory space that grows quadratically with the sample size, and hence exceeds hundreds of megabytes even when the sample size is just a few thousand points.
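The quadratic memory growth mentioned above is easy to make concrete. The sketch below is an illustrative calculation, not from the book; the function name and the 8-byte (double-precision) entry size are assumptions.

```python
def kernel_matrix_bytes(m, bytes_per_entry=8):
    """Memory needed to store a dense m x m kernel matrix,
    assuming 8-byte floating-point entries (an assumption,
    not specified in the text)."""
    return m * m * bytes_per_entry

# A sample of just 5,000 points already requires 200 MB:
mb = kernel_matrix_bytes(5000) / 1e6
print(f"{mb:.0f} MB")
```

Doubling the sample size quadruples the storage, which is why the decomposition and working-set methods discussed in this chapter avoid materialising the full kernel matrix.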
Such considerations have driven the design of specific algorithms for Support Vector Machines that can exploit the sparseness of the solution, the convexity of the optimisation problem, and the implicit mapping into feature space. All of these features help to create remarkable computational efficiency. The elegant mathematical characterisation of the solutions can be further exploited to provide stopping criteria and decomposition procedures for very large datasets.
In this chapter we will briefly review some of the most common approaches before describing in detail one particular algorithm, Sequential Minimal Optimisation (SMO), which has the advantage of being not only one of the most competitive but also simple to implement. As an exhaustive discussion of optimisation algorithms is not possible here, a number of pointers to the relevant literature and to on-line software are provided in Section 7.8.
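To give a flavour of SMO's core idea — analytically optimising one pair of Lagrange multipliers at a time while keeping the equality constraint satisfied — here is a minimal sketch of the *simplified* Platt-style variant with a linear kernel. It is not the full pseudocode of Appendix A (which uses heuristic second-choice selection rather than a random partner, and error caching); all names and parameter values here are illustrative assumptions.

```python
import random

def dot(x, y):
    """Linear kernel; a general kernel k(x, y) would replace this."""
    return sum(a * b for a, b in zip(x, y))

def smo_simplified(X, y, C=1.0, tol=1e-3, max_passes=10):
    """Simplified SMO sketch: repeatedly pick a pair of multipliers
    (alpha_i, alpha_j), optimise them analytically subject to the box
    constraints 0 <= alpha <= C and the equality constraint
    sum_k alpha_k * y_k = 0, and update the threshold b."""
    m = len(X)
    alpha, b = [0.0] * m, 0.0

    def f(x):  # current decision function
        return sum(alpha[k] * y[k] * dot(X[k], x) for k in range(m)) + b

    passes, sweeps = 0, 0
    while passes < max_passes and sweeps < 200:  # hard cap guarantees termination
        sweeps += 1
        changed = 0
        for i in range(m):
            Ei = f(X[i]) - y[i]
            # proceed only if alpha_i violates the KKT conditions (within tol)
            if (y[i] * Ei < -tol and alpha[i] < C) or (y[i] * Ei > tol and alpha[i] > 0):
                j = random.choice([k for k in range(m) if k != i])
                Ej = f(X[j]) - y[j]
                ai, aj = alpha[i], alpha[j]
                # clipping bounds that preserve sum_k alpha_k * y_k = 0
                if y[i] != y[j]:
                    L, H = max(0.0, aj - ai), min(C, C + aj - ai)
                else:
                    L, H = max(0.0, ai + aj - C), min(C, ai + aj)
                if L == H:
                    continue
                eta = 2 * dot(X[i], X[j]) - dot(X[i], X[i]) - dot(X[j], X[j])
                if eta >= 0:
                    continue
                # analytic unconstrained optimum for alpha_j, then clip to [L, H]
                alpha[j] = min(H, max(L, aj - y[j] * (Ei - Ej) / eta))
                if abs(alpha[j] - aj) < 1e-5:
                    continue
                alpha[i] = ai + y[i] * y[j] * (aj - alpha[j])
                # recompute threshold from whichever multiplier is unbound
                b1 = (b - Ei - y[i] * (alpha[i] - ai) * dot(X[i], X[i])
                      - y[j] * (alpha[j] - aj) * dot(X[i], X[j]))
                b2 = (b - Ej - y[i] * (alpha[i] - ai) * dot(X[i], X[j])
                      - y[j] * (alpha[j] - aj) * dot(X[j], X[j]))
                b = b1 if 0 < alpha[i] < C else b2 if 0 < alpha[j] < C else (b1 + b2) / 2
                changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alpha, b

# Tiny linearly separable example (hypothetical data)
random.seed(0)
X = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]]
y = [1, 1, -1, -1]
alpha, b = smo_simplified(X, y)
```

Because each pairwise step has a closed-form solution, no numerical QP library is needed — this is precisely the property that makes SMO simple to implement.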
- Publisher: Cambridge University Press
- Print publication year: 2000