
Chapter 18: Matrix decompositions and latent semantic indexing

pp. 369-384

Authors

Christopher D. Manning, Stanford University, California; Prabhakar Raghavan, Google, Inc.; Hinrich Schütze, Universität Stuttgart

Summary

On page 113, we introduced the notion of a term-document matrix: an M × N matrix C, each of whose rows represents a term and each of whose columns represents a document in the collection. Even for a collection of modest size, the term-document matrix C is likely to have several tens of thousands of rows and columns. In Section 18.1.1, we first develop a class of operations from linear algebra, known as matrix decomposition. In Section 18.2, we use a special form of matrix decomposition to construct a low-rank approximation to the term-document matrix. In Section 18.3, we examine the application of such low-rank approximations to indexing and retrieving documents, a technique referred to as latent semantic indexing. Although latent semantic indexing has not been established as a significant force in scoring and ranking for information retrieval (IR), it remains an intriguing approach to clustering in a number of domains, including collections of text documents (Section 16.6, page 343). Understanding its full potential remains an area of active research.
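
To make these ideas concrete, the following minimal Python sketch (not taken from the chapter; the toy vocabulary, documents, and the choice of numpy are assumptions for illustration) builds a small term-document matrix of raw counts and computes a rank-2 approximation with the singular value decomposition, a special form of matrix decomposition of the kind used in Section 18.2.

    import numpy as np

    # Toy collection (invented for illustration): rows are terms, columns are documents.
    terms = ["ship", "boat", "ocean", "wood", "tree"]
    docs = ["ship boat ocean", "boat ocean wood", "wood tree"]

    # M x N term-document matrix C of raw term counts.
    C = np.array([[d.split().count(t) for d in docs] for t in terms], dtype=float)

    # Rank-k approximation via the singular value decomposition.
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    k = 2
    C_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    print("rank of C:", np.linalg.matrix_rank(C))
    print("Frobenius error of the rank-2 approximation:", np.linalg.norm(C - C_k))

For a real collection, M and N run to tens of thousands, so in practice one computes only the leading k singular values and vectors rather than the full decomposition; the full svd call above is only for this toy example.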

Readers who do not require a refresher on linear algebra may skip Section 18.1, although Example 18.1 is especially recommended as it highlights a property of eigenvalues that we exploit later in the chapter.

Linear algebra review

We briefly review some necessary background in linear algebra. Let C be an M × N matrix with real-valued entries; for a term-document matrix, all entries are in fact non-negative.
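
As a quick numerical check of one standard fact from this background (a minimal numpy sketch; the specific matrix is invented for illustration and does not appear in the book): because C^T C is symmetric, its eigenvalues are real and non-negative, and their square roots are the singular values of C. This is the kind of eigenvalue property that the decompositions later in the chapter build on.

    import numpy as np

    # Small real-valued matrix standing in for a (non-negative) term-document matrix;
    # the entries are invented for illustration.
    C = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0],
                  [1.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])

    # C^T C is symmetric, so its eigenvalues are real and non-negative.
    eigvals = np.linalg.eigvalsh(C.T @ C)        # ascending order
    sigma = np.linalg.svd(C, compute_uv=False)   # descending order

    # The square roots of those eigenvalues agree with the singular values of C,
    # up to floating-point noise.
    print(np.sqrt(np.maximum(eigvals[::-1], 0.0)))
    print(sigma)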
