
VI - Information Extraction

Published online by Cambridge University Press:  08 August 2009

Ronen Feldman, Bar-Ilan University, Israel
James Sanger, ABS Ventures, Boston, Massachusetts

Summary

INTRODUCTION TO INFORMATION EXTRACTION

A mature information extraction (IE) technology would allow extraction systems for new tasks to be created rapidly, with performance approaching human level. Nevertheless, even systems without near-perfect recall and precision can be of real value. In such cases, the output of the IE system would need to be fed into an auditing environment in which auditors correct the system's precision errors (an easy task) and recall errors (a much harder one). Such systems are also valuable when the volume of information is too large for users to read in full; even a partially correct IE system is then preferable to obtaining no potentially relevant information at all. In general, IE systems are useful if the following conditions are met:

  • The information to be extracted is specified explicitly and no further inference is needed.

  • A small number of templates are sufficient to summarize the relevant parts of the document.

  • The needed information is expressed relatively locally in the text (see Bagga and Biermann 2000).
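
The precision and recall errors that auditors are asked to fix can be made concrete with a small calculation. The following is a minimal sketch, not taken from the chapter: the company names, relation labels, and the `precision_recall` helper are all hypothetical. It also suggests one plausible reason precision errors are easier to fix: spurious items are visible in the system output, whereas missed items can only be found by going back to the source text.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) for a set of extracted facts.

    `extracted` is what the IE system produced; `gold` is what a human
    annotator says the documents actually contain (both hypothetical).
    """
    extracted, gold = set(extracted), set(gold)
    correct = extracted & gold
    precision = len(correct) / len(extracted) if extracted else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical gold-standard facts for a small document collection.
gold = {
    ("Acme", "JOINT_VENTURE", "Globex"),
    ("Initech", "ACQUIRES", "Hooli"),
    ("Umbrella", "JOINT_VENTURE", "Soylent"),
    ("Stark", "ACQUIRES", "Wayne"),
}

# Hypothetical system output: two correct facts and one spurious fact.
extracted = {
    ("Acme", "JOINT_VENTURE", "Globex"),
    ("Initech", "ACQUIRES", "Hooli"),
    ("Acme", "ACQUIRES", "Initech"),
}

precision, recall = precision_recall(extracted, gold)

# Precision errors: present in the output, so an auditor can see them.
spurious = extracted - gold
# Recall errors: absent from the output, so finding them means rereading the text.
missed = gold - extracted

print(f"precision={precision:.2f} recall={recall:.2f}")
print("spurious:", sorted(spurious))
print("missed:", sorted(missed))
```

Here two of the three extracted facts are correct and two of the four gold facts are found.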

As a first step in tagging documents for text mining systems, each document is processed to find (i.e., extract) entities and relationships that are likely to be meaningful and content-bearing. The term relationships here denotes facts or events involving certain entities.

By way of example, a possible event might be a company's entering into a joint venture to develop a new drug.
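
An event of this kind could be represented as a small template whose slots are filled from the text. The sketch below is illustrative only and assumes a single hand-written pattern; the `JointVentureEvent` class, its slot names, the regular expression, and the sample sentence are hypothetical and are not the method described in the chapter. It relies on the conditions listed above: the fact is stated explicitly and locally, within one sentence.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class JointVentureEvent:
    """Hypothetical template for a joint-venture event (slot names assumed)."""
    company_a: str
    company_b: str
    purpose: str


# One hand-written pattern; a real system would need many more, plus
# proper entity recognition instead of "a capitalized word".
PATTERN = re.compile(
    r"(?P<a>[A-Z]\w+) and (?P<b>[A-Z]\w+) "
    r"(?:formed|entered into) a joint venture to (?P<purpose>[^.]+)\."
)


def extract_joint_ventures(text):
    """Fill one template per sentence that matches the pattern."""
    return [
        JointVentureEvent(m.group("a"), m.group("b"), m.group("purpose"))
        for m in PATTERN.finditer(text)
    ]


text = (
    "Acme and Globex entered into a joint venture to develop a new drug. "
    "Shares of both companies rose."
)
events = extract_joint_ventures(text)
for event in events:
    print(event)
```

The second sentence yields no template because it states no event the pattern covers, which is the kind of gap that shows up as a recall error when the wording of a real event differs from the pattern.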

Type: Chapter
Information: The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data, pp. 94-130
Publisher: Cambridge University Press
Print publication year: 2006

Chapter DOI: https://doi.org/10.1017/CBO9780511546914.007