Chapter 6: Scoring, term weighting, and the vector space model

pp. 100-123

Authors

Stanford University, California; Google, Inc.; Universität Stuttgart

Summary

Thus far, we have dealt with indexes that support Boolean queries: A document either matches or does not match a query. In the case of large document collections, the resulting number of matching documents can far exceed the number a human user could possibly sift through. Accordingly, it is essential for a search engine to rank-order the documents matching a query. To do this, the search engine computes, for each matching document, a score with respect to the query at hand. In this chapter, we initiate the study of assigning a score to a (query, document) pair. This chapter consists of three main ideas.

  • We introduce parametric and zone indexes in Section 6.1, which serve two purposes. First, they allow us to index and retrieve documents by metadata, such as the language in which a document is written. Second, they give us a simple means for scoring (and thereby ranking) documents in response to a query.

  • Next, in Section 6.2 we develop the idea of weighting the importance of a term in a document, based on the statistics of occurrence of the term.

  • In Section 6.3, we show that by viewing each document as a vector of such weights, we can compute a score between a query and each document. This view is known as vector space scoring.

Section 6.4 develops several variants of term-weighting for the vector space model. Chapter 7 develops computational aspects of vector space scoring and related topics.
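
To make the second and third ideas concrete, the following is a minimal sketch of vector space scoring, not the chapter's own implementation. It assumes one common weighting choice, term frequency multiplied by log10(N / df), and cosine similarity as the score; the function names (`tf_idf_vectors`, `cosine`, `rank`), the whitespace tokenizer, and the three toy documents are illustrative assumptions.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return ({term: weight} per document, idf table), assuming weight = tf * log10(N / df)."""
    n_docs = len(docs)
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer (assumption)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log10(n_docs / count) for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Return (score, document index) pairs, highest score first."""
    vectors, idf = tf_idf_vectors(docs)
    # Weight the query like a document; terms unseen in the collection get weight 0.
    q_vec = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query.lower().split()).items()}
    scores = [(cosine(q_vec, vec), i) for i, vec in enumerate(vectors)]
    return sorted(scores, key=lambda pair: (-pair[0], pair[1]))


docs = ["car insurance auto insurance", "best car", "auto repair shop"]
ranking = rank("auto insurance", docs)
```

In this toy collection the first document shares both query terms and ranks first, the third shares only "auto" and ranks second, and the second shares no query term and scores zero. Unlike a Boolean match, every document receives a graded score, which is what makes rank-ordering possible.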
