Kernels for structured data: strings, trees, etc.

John Shawe-Taylor; Nello Cristianini

doi:10.1017/CBO9780511809682.012

Chapter 11, from Part III - Constructing kernels

Published online by Cambridge University Press: 29 March 2011

John Shawe-Taylor, University of Southampton
Nello Cristianini, University of California, Davis

Summary

Probably the most important data type after vectors and free text is that of symbol strings of varying lengths. This type of data is commonplace in bioinformatics applications, where it can be used to represent proteins as sequences of amino acids and genomic DNA as sequences of nucleotides, as well as promoters and other genomic structures. Partly for this reason, a great deal of research has been devoted to it in the last few years. Many other application domains also consider data in the form of sequences, and many of the techniques have a history of development within computer science, for example in stringology, the study of string algorithms.

Kernels have been developed to compute the inner product between images of strings in high-dimensional feature spaces using dynamic programming techniques. Although sequences can be regarded as a special case of a more general class of structures for which kernels have been designed, we will discuss them separately for most of the chapter in order to emphasise their importance in applications and to aid understanding of the computational methods. In the last part of the chapter, we will show how these concepts and techniques can be extended to cover more general data structures, including trees, arrays, graphs and so on.
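As an illustration of the dynamic-programming approach, here is a minimal Python sketch of one such kernel, the all-subsequences kernel. Its feature map counts, for every string u, the number of times u occurs as a (not necessarily contiguous) subsequence of s, and the kernel is the inner product K(s, t) = sum over u of phi_u(s) phi_u(t). The function name `all_subsequences_kernel` is hypothetical, and the choice of this kernel is ours for illustration; it is not presented as the chapter's exact formulation. The sketch relies on the recursion K(s, empty) = 1 and K(sa, t) = K(s, t) + sum over positions k with t_k = a of K(s, t_1...t_{k-1}): a common subsequence of sa and t either ignores the final a, or ends by matching that a to some occurrence of a in t.

```python
def all_subsequences_kernel(s: str, t: str) -> int:
    """Count pairs of matching subsequences of s and t (empty one included)."""
    n, m = len(s), len(t)
    # dp[i][j] holds K(s[:i], t[:j]); the first row and column are 1 because
    # only the empty subsequence is shared with an empty string.
    dp = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        a = s[i - 1]  # the last symbol of the prefix s[:i]
        # running = sum of dp[i-1][k-1] over positions k <= j where t[k-1] == a,
        # i.e. the inner sum of the recursion, accumulated left to right.
        running = 0
        for j in range(1, m + 1):
            if t[j - 1] == a:
                running += dp[i - 1][j - 1]
            # Either the final a of s[:i] is unused, or it is matched in t[:j].
            dp[i][j] = dp[i - 1][j] + running
    return dp[n][m]


if __name__ == "__main__":
    print(all_subsequences_kernel("ab", "ab"))
```

For example, `all_subsequences_kernel("ab", "ab")` returns 4, counting the shared subsequences: the empty string, a, b and ab. Keeping a running sum along each row brings the cost down to O(|s||t|), compared with O(|s||t|^2) if the inner sum were recomputed for every cell.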

Certain kernels for strings based on probabilistic modelling of the data-generating source will not be discussed here, since Chapter 12 is entirely devoted to these kinds of methods. There is, however, some overlap between the structure kernels presented here and those arising from probabilistic modelling covered in Chapter 12.

Information

Type: Chapter
Published in: Kernel Methods for Pattern Analysis, pp. 344-396

Publisher: Cambridge University Press

Print publication year: 2004
