Chapter 11: Probabilistic information retrieval

pp. 201-217

Authors

Stanford University, California; Google, Inc.; Universität Stuttgart

Summary

During the discussion of relevance feedback in Section 9.1.2, we observed that if we have some known relevant and nonrelevant documents, then we can straightforwardly start to estimate P(t|R = 1), the probability of a term t appearing in a relevant document, and that this estimate could be the basis of a classifier that decides whether documents are relevant or not. In this chapter, we introduce this probabilistic approach to information retrieval (IR) more systematically; it provides a different formal basis for a retrieval model and results in different techniques for setting term weights.
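As a rough illustration of the estimate mentioned above, the following minimal sketch computes P(t|R = 1) and P(t|R = 0) as the fraction of relevant (or nonrelevant) documents that contain term t. The function name `estimate_term_probs`, the toy documents, and the add-1/2 smoothing are assumptions made for this example and are not taken from the summary; documents are treated as sets of terms, so only presence or absence counts.

```python
from collections import Counter


def estimate_term_probs(docs, labels, smoothing=0.5):
    """Estimate (P(t|R=1), P(t|R=0)) for every term seen in docs.

    docs      -- list of iterables of terms (one per document)
    labels    -- list of 1 (relevant) or 0 (nonrelevant), same length as docs
    smoothing -- pseudo-count added to avoid zero probabilities
                 (an assumption of this sketch, not a value from the text)
    """
    n_docs = {1: 0, 0: 0}                  # number of documents per class
    doc_freq = {1: Counter(), 0: Counter()}  # documents containing t, per class
    vocab = set()

    for terms, r in zip(docs, labels):
        term_set = set(terms)              # presence/absence only
        vocab.update(term_set)
        n_docs[r] += 1
        doc_freq[r].update(term_set)

    probs = {}
    for t in vocab:
        p_rel = (doc_freq[1][t] + smoothing) / (n_docs[1] + 2 * smoothing)
        p_nonrel = (doc_freq[0][t] + smoothing) / (n_docs[0] + 2 * smoothing)
        probs[t] = (p_rel, p_nonrel)
    return probs


# Toy data: two documents judged relevant, one judged nonrelevant.
docs = [
    {"probabilistic", "retrieval"},
    {"retrieval", "model"},
    {"football", "score"},
]
labels = [1, 1, 0]

probs = estimate_term_probs(docs, labels)
for term in sorted(probs):
    p_rel, p_nonrel = probs[term]
    print(f"{term:15s} P(t|R=1)={p_rel:.3f}  P(t|R=0)={p_nonrel:.3f}")
```

A term such as "retrieval", which occurs in both relevant documents and in no nonrelevant one, receives a high P(t|R = 1) and a low P(t|R = 0); a classifier of the kind the summary describes could compare these two estimates across the terms of a new document.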

Users start with information needs, which they translate into query representations. Similarly, there are documents, which are converted into document representations (the latter differing at least by how text is tokenized, but perhaps containing fundamentally less information, as when a nonpositional index is used). Based on these two representations, a system tries to determine how well documents satisfy information needs. In the Boolean or vector space models of IR, matching is done in a formally defined but semantically imprecise calculus of index terms. Given only a query, an IR system has an uncertain understanding of the information need. Given the query and document representations, a system has an uncertain guess of whether a document has content relevant to the information need. Probability theory provides a principled foundation for such reasoning under uncertainty.
