Introduction to Information Retrieval

Christopher D. Manning; Prabhakar Raghavan; Hinrich Schütze

doi:10.1017/CBO9780511809071

Chapter 13: Text classification and Naive Bayes

pp. 234-265

Christopher D. Manning, Stanford University, California

Prabhakar Raghavan, Google, Inc.

Hinrich Schütze, Universität Stuttgart

Summary

Thus far, this book has mainly discussed the process of ad hoc retrieval, where users have transient information needs that they try to address by posing one or more queries to a search engine. However, many users have ongoing information needs. For example, you might need to track developments in multicore computer chips. One way of doing this is to issue the query multicore AND computer AND chip against an index of recent newswire articles each morning. In this and the following two chapters we examine the question: How can this repetitive task be automated? To this end, many systems support standing queries. A standing query is like any other query except that it is periodically executed on a collection to which new documents are incrementally added over time.

If your standing query is just multicore AND computer AND chip, you will tend to miss many relevant new articles that use other terms, such as multicore processors. To achieve good recall, standing queries thus have to be refined over time and can gradually become quite complex. In this example, using a Boolean search engine with stemming, you might end up with a query like (multicore OR multi-core) AND (chip OR processor OR microprocessor).
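As a rough illustration of the idea, the sketch below runs the refined Boolean query as a standing query over a growing list of documents, returning only the documents added since the previous run. It is a minimal sketch, not the book's implementation: the names tokenize, matches, and StandingQuery are hypothetical, the collection is assumed to be a plain Python list, and stripping a trailing "s" is only a crude stand-in for real stemming.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of terms.

    Internal hyphens are kept so that "multi-core" stays one term.
    Assumption: dropping a trailing "s" is a crude placeholder for a stemmer.
    """
    terms = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    return {t[:-1] if t.endswith("s") and len(t) > 3 else t for t in terms}


def matches(terms):
    """Boolean query: (multicore OR multi-core) AND (chip OR processor OR microprocessor)."""
    topic = {"multicore", "multi-core"}
    device = {"chip", "processor", "microprocessor"}
    return bool(terms & topic) and bool(terms & device)


class StandingQuery:
    """Re-runs a fixed predicate over documents added since the last run."""

    def __init__(self, predicate):
        self.predicate = predicate
        self.seen = 0  # number of documents already examined

    def run(self, collection):
        # Only look at the documents appended since the previous run.
        new_docs = collection[self.seen:]
        self.seen = len(collection)
        return [doc for doc in new_docs if self.predicate(tokenize(doc))]


# Usage: the collection grows over time; each run reports only new matches.
collection = ["New multicore processors announced", "Stock markets fall"]
query = StandingQuery(matches)
first_run = query.run(collection)

collection.append("Multi-core chip shipments rise")
second_run = query.run(collection)
```

Tracking a simple position in the list keeps the sketch short; a real system would more likely track document identifiers or timestamps in its index.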

To capture the generality and scope of the problem space to which standing queries belong, we now introduce the general notion of a classification problem. Given a set of classes, we seek to determine which class(es) a given object belongs to.

About the book

Chapter DOI https://doi.org/10.1017/CBO9780511809071.014
Book DOI https://doi.org/10.1017/CBO9780511809071
Subjects: Computer Science, Data Science, Databases, Data Mining, and Information Retrieval
Format: Hardback
- Publication date: 07 July 2008
- ISBN: 9780521865715
Format: Digital
- Publication date: 05 June 2012
- ISBN: 9780511809071