
Chapter 18: Matrix decompositions and latent semantic indexing

pp. 369-384

Authors

Christopher D. Manning, Stanford University, California; Prabhakar Raghavan, Google, Inc.; Hinrich Schütze, Universität Stuttgart

Summary

On page 113, we introduced the notion of a term-document matrix: an M × N matrix C, each of whose rows represents a term and each of whose columns represents a document in the collection. Even for a collection of modest size, the term-document matrix C is likely to have several tens of thousands of rows and columns. In Section 18.1.1, we first develop a class of operations from linear algebra, known as matrix decomposition. In Section 18.2, we use a special form of matrix decomposition to construct a low-rank approximation to the term-document matrix. In Section 18.3, we examine the application of such low-rank approximations to indexing and retrieving documents, a technique referred to as latent semantic indexing. Although latent semantic indexing has not been established as a significant force in scoring and ranking for information retrieval (IR), it remains an intriguing approach to clustering in a number of domains, including collections of text documents (Section 16.6, page 343). Understanding its full potential remains an area of active research.
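
To make these ideas concrete, the following minimal Python sketch (not taken from the chapter; the toy vocabulary, documents, and the choice of numpy are assumptions for illustration) builds a small term-document matrix of raw counts and computes a rank-2 approximation with the singular value decomposition, a special form of matrix decomposition of the kind used in Section 18.2.

    import numpy as np

    # Toy collection (invented for illustration): rows are terms, columns are documents.
    terms = ["ship", "boat", "ocean", "wood", "tree"]
    docs = ["ship boat ocean", "boat ocean wood", "wood tree"]

    # M x N term-document matrix C of raw term counts.
    C = np.array([[d.split().count(t) for d in docs] for t in terms], dtype=float)

    # Rank-k approximation via the singular value decomposition.
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    k = 2
    C_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    print("rank of C:", np.linalg.matrix_rank(C))
    print("Frobenius error of the rank-2 approximation:", np.linalg.norm(C - C_k))

For a real collection, M and N run to tens of thousands, so in practice one computes only the leading k singular values and vectors rather than the full decomposition; the full svd call above is only for this toy example.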

Readers who do not require a refresher on linear algebra may skip Section 18.1, although Example 18.1 is especially recommended as it highlights a property of eigenvalues that we exploit later in the chapter.

Linear algebra review

We briefly review some necessary background in linear algebra. Let C be an M × N matrix with real-valued entries; for a term-document matrix, all entries are in fact non-negative.
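
As a quick numerical check of one standard fact from this background (a minimal numpy sketch; the specific matrix is invented for illustration and does not appear in the book): because C^T C is symmetric, its eigenvalues are real and non-negative, and their square roots are the singular values of C. This is the kind of eigenvalue property that the decompositions later in the chapter build on.

    import numpy as np

    # Small real-valued matrix standing in for a (non-negative) term-document matrix;
    # the entries are invented for illustration.
    C = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0],
                  [1.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])

    # C^T C is symmetric, so its eigenvalues are real and non-negative.
    eigvals = np.linalg.eigvalsh(C.T @ C)        # ascending order
    sigma = np.linalg.svd(C, compute_uv=False)   # descending order

    # The square roots of those eigenvalues agree with the singular values of C,
    # up to floating-point noise.
    print(np.sqrt(np.maximum(eigvals[::-1], 0.0)))
    print(sigma)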
