5 - Link Analysis for the World Wide Web
from Part III - Graph-Based Information Retrieval
Published online by Cambridge University Press: 01 June 2011
Summary
This chapter addresses link-analysis methods used by search engines, such as PageRank and HITS, and covers topics relevant to their application, including method stability, the combination of link- and content-based models, topic-sensitive ranking, and query-dependent link analysis.
The Web as a Graph
The Web – a common abbreviation for the World Wide Web – consists of billions of interlinked hypertext pages. These pages contain text, images, videos, or sounds and are usually viewed using Web browsers, such as Firefox or Internet Explorer. Users can navigate the Web either by typing the address of a Web page (i.e., its URL) directly into a browser or by following the links that connect Web pages to one another.
The Web is a typical example of a graph, with Web pages corresponding to vertices in the graph and links between pages corresponding to directed edges. For instance, if the page http://www.unt.edu includes a link to the page http://www.cs.unt.edu and another to the page http://www.htsc.unt.edu, and the latter page in turn links to the page of the National Institutes of Health, http://www.nih.gov, and also back to the http://www.unt.edu page, then these four pages form a subgraph with four vertices and four directed edges, as illustrated in Figure 5.1.
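The four-page example above can be written down as a small directed graph. The following is a minimal sketch in Python, assuming an adjacency-set representation; the names `edges`, `build_graph`, `out_links`, and `in_links` are illustrative choices and do not come from the chapter.

```python
from collections import defaultdict

# Directed edges (source page, target page) taken from the example in the text.
edges = [
    ("http://www.unt.edu", "http://www.cs.unt.edu"),
    ("http://www.unt.edu", "http://www.htsc.unt.edu"),
    ("http://www.htsc.unt.edu", "http://www.nih.gov"),
    ("http://www.htsc.unt.edu", "http://www.unt.edu"),
]


def build_graph(edge_list):
    """Return (vertices, out_links, in_links) for a list of directed edges.

    Hypothetical helper: out_links maps a page to the pages it links to,
    and in_links maps a page to the pages that link to it.
    """
    vertices = set()
    out_links = defaultdict(set)
    in_links = defaultdict(set)
    for source, target in edge_list:
        vertices.update((source, target))
        out_links[source].add(target)
        in_links[target].add(source)
    return vertices, out_links, in_links


vertices, out_links, in_links = build_graph(edges)
edge_count = sum(len(targets) for targets in out_links.values())
print(len(vertices), "vertices,", edge_count, "edges")
```

Keeping both outgoing and incoming links is a design convenience: link-analysis methods such as PageRank and HITS reason about which pages point to a given page, so the reverse mapping is usually needed alongside the forward one.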
Although the size of the Web is generally considered to be unknown, there are various estimates concerning the size of the indexed Web – that is, the subset of the Web that is covered by search engines.
- Publisher: Cambridge University Press
- Print publication year: 2011