Putting into Practice: Full-Text Indexing with Lucene

doi:10.1017/CBO9780511998225.018

17 - Putting into Practice: Full-Text Indexing with Lucene

from Part 3 - Building Web Scale Applications

Published online by Cambridge University Press: 05 June 2012

Serge Abiteboul, Ioana Manolescu, Philippe Rigaux, Marie-Christine Rousset and Pierre Senellart

Nicolas Travers


Serge Abiteboul: INRIA Saclay – Île-de-France
Ioana Manolescu: INRIA Saclay – Île-de-France
Philippe Rigaux: Conservatoire National des Arts et Métiers, Paris
Marie-Christine Rousset: Université de Grenoble, France
Pierre Senellart: Télécom ParisTech, France

Summary

Lucene is an open-source, tunable indexing platform often used for full-text indexing of Web sites. It implements an inverted index, building a posting list for each term of the vocabulary. This chapter proposes some exercises for discovering the Lucene platform and testing its functionality through its Java API.
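To make the notion of posting list concrete, here is a minimal sketch of an inverted index in plain Java. It does not use the Lucene API and says nothing about how Lucene stores its index; the class name `InvertedIndexSketch`, its methods, and the naive tokenization (lowercasing, splitting on non-alphanumeric characters) are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not part of Lucene.
public class InvertedIndexSketch {
    // term -> posting list: ids of the documents that contain the term
    private final Map<String, List<Integer>> postings = new TreeMap<>();

    /** Indexes one document. Assumes documents are added in increasing id order. */
    public void add(int docId, String text) {
        // Naive tokenization: lowercase, split on anything that is not a letter or digit.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (token.isEmpty()) {
                continue; // split() yields an empty first token if the text starts with a separator
            }
            List<Integer> list = postings.computeIfAbsent(token, t -> new ArrayList<>());
            // Record each document at most once per term.
            if (list.isEmpty() || list.get(list.size() - 1) != docId) {
                list.add(docId);
            }
        }
    }

    /** Returns the posting list of a term, or an empty list if the term is unknown. */
    public List<Integer> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptyList());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Lucene is an indexing platform");
        index.add(2, "An inverted index maps terms to documents");
        index.add(3, "Lucene implements an inverted index");
        System.out.println(index.lookup("lucene"));   // prints [1, 3]
        System.out.println(index.lookup("inverted")); // prints [2, 3]
    }
}
```

A real engine additionally stores term frequencies and positions in each posting and applies a configurable analysis chain before indexing; the sketch keeps only document ids.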

PRELIMINARY: A LUCENE SANDBOX

We provide a simple graphical interface that lets you capture a collection of Web documents (from a given Web site), index it, and search for documents matching a keyword query. The tool is implemented with Lucene (surprise!) and helps to assess the impact of the search parameters, including ranking factors.

You can download the program from our Web site. It consists of a Java archive that can be executed right away (provided you have a working Java installation on your computer). Figure 17.1 shows a screenshot of the main page. It allows you to

Download a set of documents collected from a given URL (including local addresses),
Index and query those documents,
Consult the information used by Lucene to present ranked results.
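As a rough idea of what such ranking information can look like, the sketch below computes a textbook tf-idf weight: the term frequency in a document multiplied by the logarithm of the inverse document frequency. This is only an illustration of one classical ranking factor, not Lucene's actual scoring formula, which is configurable and involves further factors. The class `TfIdfSketch`, its method, and the figures used in `main` are hypothetical.

```java
// Hypothetical illustration class; not part of Lucene.
public class TfIdfSketch {
    /**
     * Textbook tf-idf weight of a term in one document: tf * ln(N / df).
     *
     * @param termFreq number of occurrences of the term in the document (tf)
     * @param docFreq  number of documents containing the term (df)
     * @param numDocs  total number of documents in the collection (N)
     */
    public static double tfIdf(int termFreq, int docFreq, int numDocs) {
        if (termFreq == 0 || docFreq == 0) {
            return 0.0; // the term does not occur: no contribution to the score
        }
        return termFreq * Math.log((double) numDocs / docFreq);
    }

    public static void main(String[] args) {
        // Assumed collection of 1000 documents; same term frequency, different rarity.
        double rare = tfIdf(3, 10, 1000);    // term found in 10 documents
        double common = tfIdf(3, 500, 1000); // term found in 500 documents
        System.out.println(rare > common);   // prints true: rarer terms weigh more
    }
}
```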

Use this tool as a first contact with full-text search and information retrieval. The projects proposed at the end of the chapter suggest ways to build a similar application.

INDEXING PLAIN TEXT WITH LUCENE – A FULL EXAMPLE

We now embark on a practical experiment with Lucene. First, download the Java packages from the Web site http://lucene.apache.org/java/docs/.

Type: Chapter
Information: Web Data Management , pp. 364 - 373

Publisher: Cambridge University Press

Print publication year: 2011
