10 - Web retrieval, ranking and personalization

Published online by Cambridge University Press:  08 June 2018

Jaime Teevan, Microsoft Research
Susan Dumais, Microsoft Research

Summary

Web search is unique

Web search is different from other types of information retrieval. The scale and diversity of web content are several orders of magnitude larger than what is found in traditional information retrieval corpora. Web corpora contain billions of web pages with content ranging from images to blogs to technical articles. Similarly, the scale and variety of the people who issue web queries and the tasks that underlie those queries are immense. Web search tools are used for everything from simple navigational queries to complex research tasks that extend over time. Although there are some common motivations, strategies, tasks and information targets, there is a long tail of uncommon behavior that accounts for a significant portion of all web search interactions. We begin this chapter by looking more closely at what makes web search unique. We then show how these unique aspects have led to interesting approaches to ranking, including the use of machine learning and rich features. We describe how people interact with web search results, how rich behavioral data can be used to personalize the search experience, and the challenges and opportunities that these capabilities pose for evaluation. We conclude by looking at emerging trends in web search to give a glimpse of what web search tools of the future might look like.

Large-scale, diverse content

The web is very large; the number of documents it contains is estimated in the tens of billions, with many pages generated on the fly based on underlying databases or user interaction. Before the rise of the web, search engines typically dealt only with static corpora of, at most, millions of documents. The billions of web documents that search tools must contend with come in a larger variety of formats than is typically dealt with in traditional information retrieval, including blogs, online stores, government pages, forums and news sites. Text can be extracted from many of these document types (e.g. HTML, PDF, Word), but not necessarily from all (e.g. images, video). The content people search for within this large and diverse corpus is broad. In addition to documents, people look for particular images, answers to their questions, entities, templates and applications.

