Book contents
- Frontmatter
- Contents
- Introduction
- Part 1 Modeling Web Data
- Part 2 Web Data Semantics and Integration
- 7 Ontologies, RDF, and OWL
- 8 Querying Data Through Ontologies
- 9 Data Integration
- 10 Putting into Practice: Wrappers and Data Extraction with XSLT
- 11 Putting into Practice: Ontologies in Practice
- 12 Putting into Practice: Mashups with Yahoo! Pipes and XProc
- Part 3 Building Web Scale Applications
- Bibliography
- Index
9 - Data Integration
from Part 2 - Web Data Semantics and Integration
Published online by Cambridge University Press: 05 June 2012
Summary
INTRODUCTION
The goal of data integration is to provide uniform access to a set of autonomous and possibly heterogeneous data sources in a particular application domain. This is typically what we need when, for instance, querying the deep web, which is composed of a plethora of databases accessible through Web forms. We would like to be able to find relevant data with a single query, no matter which database provides it.
A first issue for data integration (which we ignore here) is social: the owners of some data sets may be unwilling to fully share them and reluctant to participate in a data integration system. From a technical viewpoint, a further difficulty comes from the lack of interoperability between the data sources, which may use a variety of formats, have specific query-processing capabilities, and rely on different protocols. However, the real bottleneck for data integration is logical. It comes from the so-called semantic heterogeneity between the data sources: even within the same application domain, they typically organize data using different schemas. For instance, each university or educational institution may choose to model students and teaching programs in its own way. A French university may identify students by their social security number and use the attributes nom and prenom (French for last name and first name), whereas the Erasmus database about European students may use a European student number and the attributes firstname, lastname, and home university.
In this chapter, we study data integration in the mediator approach, in which data remain exclusively in the data sources and are retrieved only when the system is queried.
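To make the mediator approach concrete, here is a minimal Python sketch built on the student example. Everything in it is hypothetical: the global `Student` schema, the source field names (`ssn`, `nom`, `prenom`, `esn`, `firstname`, `lastname`, `home_university`), the wrapper functions, the `Mediator` class, and the toy rows. Each wrapper maps one source's schema onto the global schema. The mediator stores no data and contacts the sources only when a query arrives.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical global (mediated) schema that users query.
@dataclass(frozen=True)
class Student:
    first_name: str
    last_name: str
    student_id: str
    home_university: Optional[str]

# A source is modeled as a zero-argument function returning its rows (hypothetical).
Fetch = Callable[[], Iterable[dict]]
Wrapper = Callable[[Fetch], Iterable[Student]]

def french_wrapper(fetch: Fetch) -> Iterable[Student]:
    # French source schema (hypothetical): ssn, nom, prenom.
    for row in fetch():
        yield Student(row["prenom"], row["nom"], row["ssn"], None)

def erasmus_wrapper(fetch: Fetch) -> Iterable[Student]:
    # Erasmus source schema (hypothetical): esn, firstname, lastname, home_university.
    for row in fetch():
        yield Student(row["firstname"], row["lastname"], row["esn"], row["home_university"])

class Mediator:
    """Holds no data, only (wrapper, source) pairs mapping each source to the global schema."""

    def __init__(self, sources: List[Tuple[Wrapper, Fetch]]):
        self._sources = sources

    def query(self, pred: Callable[[Student], bool]) -> List[Student]:
        # Sources are contacted only here, when the mediator is queried.
        return [s for wrap, fetch in self._sources for s in wrap(fetch) if pred(s)]

# Toy rows standing in for two autonomous sources (made up for illustration).
FRENCH_ROWS = [{"ssn": "ssn-001", "nom": "Dupont", "prenom": "Marie"}]
ERASMUS_ROWS = [
    {"esn": "EU-0042", "firstname": "Marie", "lastname": "Dupont", "home_university": "Paris"},
    {"esn": "EU-0077", "firstname": "Jan", "lastname": "Novak", "home_university": "Prague"},
]

accesses: List[str] = []  # records when each source is actually read

def fetch_french() -> Iterable[dict]:
    accesses.append("french")
    return FRENCH_ROWS

def fetch_erasmus() -> Iterable[dict]:
    accesses.append("erasmus")
    return ERASMUS_ROWS

mediator = Mediator([(french_wrapper, fetch_french), (erasmus_wrapper, fetch_erasmus)])
duponts = mediator.query(lambda s: s.last_name == "Dupont")
print(duponts)
print(accesses)  # ['french', 'erasmus']
```

The sketch follows a global-as-view style: the global `Student` relation is defined as the union of one view per source. For simplicity it fetches every row and filters in the mediator; a practical mediator would typically rewrite the user query into queries each source can answer and push selections such as `last_name == "Dupont"` down to the sources. The two `Dupont` results carry different identifiers (a social security number and a European student number), which illustrates the semantic heterogeneity described above; the sketch makes no attempt to reconcile them.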
- Type: Chapter
- Information: Web Data Management, pp. 196-230. Publisher: Cambridge University Press. Print publication year: 2011