3 - Memory and Similarity
Published online by Cambridge University Press: 22 September 2009
Summary
An MBLP system, as introduced in the previous chapters, has two components: a learning component which is memory-based, and a performance component which is similarity-based. The learning component is memory-based because it involves storing examples in memory (also called the instance base or case base) without abstraction, selection, or restructuring. In the performance component of an MBLP system, the stored examples are used as a basis for mapping input to output; input instances are classified by assigning them an output label. During classification, a previously unseen test instance is presented to the system. The class of this instance is determined on the basis of an extrapolation from the most similar example(s) in memory. There are different ways in which this approach can be operationalized. The goal of this chapter is twofold: to provide a clear definition of the operationalizations we have found to work well for NLP tasks, and to provide an introduction to TIMBL, a software package implementing all algorithms and metrics discussed in this book. The emphasis on hands-on use of software in a book such as this deserves some justification. Although our aims are mainly theoretical, in showing on the basis of argumentation and experiment that MBLP has the right bias for solving NLP tasks, we believe that the strengths and limitations of any algorithm can only be understood in sufficient depth by experimenting with that specific algorithm.
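The two components described in the summary can be illustrated with a minimal sketch. This is not TIMBL's implementation or interface: the class name, the method names, the choice of a simple overlap distance, and the toy part-of-speech data are all assumptions made for illustration. Learning only appends examples to memory; classification ranks the stored examples by distance to the test instance and lets the nearest ones vote.

```python
from collections import Counter


def overlap_distance(a, b):
    """Count the feature positions at which two instances differ."""
    return sum(x != y for x, y in zip(a, b))


class MemoryBasedClassifier:
    """Hypothetical illustration of memory-based learning (not TIMBL's API)."""

    def __init__(self, k=1):
        self.k = k
        self.memory = []  # the instance base: (features, label) pairs

    def learn(self, features, label):
        # Learning is pure storage: no abstraction, selection, or restructuring.
        self.memory.append((tuple(features), label))

    def classify(self, features):
        if not self.memory:
            raise ValueError("cannot classify with an empty instance base")
        # Rank stored examples by distance to the test instance.
        ranked = sorted(
            self.memory,
            key=lambda example: overlap_distance(example[0], features),
        )
        # Extrapolate from the k most similar examples by majority vote.
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]


# Toy data (invented): features are (previous word, focus word, next word).
clf = MemoryBasedClassifier(k=1)
clf.learn(("the", "dog", "barks"), "NOUN")
clf.learn(("a", "cat", "sleeps"), "NOUN")
clf.learn(("dog", "barks", "loudly"), "VERB")

predicted = clf.classify(("the", "cat", "barks"))
```

In this toy case the first stored example differs from the test instance in one position only, so with `k=1` it alone determines the label. Weighting features or votes would be separate design choices that this sketch does not attempt.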
- Type: Chapter
- Information: Memory-Based Language Processing, pp. 26-56
- Publisher: Cambridge University Press
- Print publication year: 2005