Data Integration

Serge Abiteboul; Ioana Manolescu; Philippe Rigaux; Marie-Christine Rousset; Pierre Senellart

doi:10.1017/CBO9780511998225.010

9 - Data Integration

from Part 2 - Web Data Semantics and Integration

Published online by Cambridge University Press: 05 June 2012


Serge Abiteboul: INRIA Saclay – Île-de-France
Ioana Manolescu: INRIA Saclay – Île-de-France
Philippe Rigaux: Conservatoire National des Arts et Métiers, Paris
Marie-Christine Rousset: Université de Grenoble, France
Pierre Senellart: Télécom ParisTech, France


Summary

INTRODUCTION

The goal of data integration is to provide uniform access to a set of autonomous and possibly heterogeneous data sources in a particular application domain. This is typically what we need when, for instance, querying the deep web, which is composed of a plethora of databases accessible through Web forms. With a single query, we would like to find relevant data no matter which database provides it.

A first issue for data integration (that will be ignored here) is social: The owners of a data set may be unwilling to fully share it and reluctant to participate in a data integration system. From a technical viewpoint, the difficulty comes from the lack of interoperability between the data sources, which may use a variety of formats, specific query-processing capabilities, and different protocols. However, the real bottleneck for data integration is logical. It comes from the so-called semantic heterogeneity between the data sources: they typically organize data using different schemas, even in the same application domain. For instance, each university or educational institution may choose to model students and teaching programs in its own way. A French university may use the social security number to identify students and the attributes nom, prenom, whereas the Erasmus database about European students may use a European student number and the attributes firstname, lastname, and home university.
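The schema mismatch in the student example can be made concrete with a small sketch. Everything in it is an illustrative assumption rather than material from the chapter: the record layouts, the field names `numero_secu`, `student_number`, and `home_university`, and the shared target schema with `id`, `first_name`, and `last_name`.

```python
# Hypothetical records: the same kind of entity modeled under two schemas.
french_record = {"numero_secu": "FR-001", "nom": "Durand", "prenom": "Alice"}
erasmus_record = {
    "student_number": "EU-42",
    "firstname": "Bob",
    "lastname": "Smith",
    "home_university": "Univ. X",
}

# Assumed attribute correspondences from each source schema to a shared one.
FRENCH_TO_GLOBAL = {"numero_secu": "id", "nom": "last_name", "prenom": "first_name"}
ERASMUS_TO_GLOBAL = {
    "student_number": "id",
    "firstname": "first_name",
    "lastname": "last_name",
    "home_university": "home_university",
}


def to_global(record, correspondences):
    """Rename the attributes of a source record using the given correspondences."""
    return {correspondences[k]: v for k, v in record.items() if k in correspondences}


unified = [
    to_global(french_record, FRENCH_TO_GLOBAL),
    to_global(erasmus_record, ERASMUS_TO_GLOBAL),
]
```

Simple attribute renaming is only the easiest case. The two identifiers here come from different numbering systems, so agreeing on a common attribute name does not by itself tell us when two records describe the same student.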

In this chapter, we study data integration in the mediator approach. In this approach, data remain exclusively in data sources and are obtained when the system is queried.
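As a rough illustration of the idea that data stay in the sources and are fetched only at query time, the following sketch shows a mediator that stores nothing itself and forwards each query to per-source wrappers. The `Mediator` class, the wrapper functions, and the in-memory source rows are hypothetical names and data invented for this example; this is not the formalism developed in the chapter.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]
# A wrapper takes a last name and returns matching records in a shared schema.
Wrapper = Callable[[str], Iterable[Record]]


class Mediator:
    """Holds no data itself; forwards each query to the registered sources."""

    def __init__(self) -> None:
        self._wrappers: List[Wrapper] = []

    def register(self, wrapper: Wrapper) -> None:
        self._wrappers.append(wrapper)

    def find_by_last_name(self, last_name: str) -> List[Record]:
        results: List[Record] = []
        for wrapper in self._wrappers:
            # Each source is contacted only now, when the query is posed.
            results.extend(wrapper(last_name))
        return results


# Hypothetical source contents, each kept in its own native schema.
_FRENCH_ROWS = [{"nom": "Durand", "prenom": "Alice"}]
_ERASMUS_ROWS = [
    {"lastname": "Durand", "firstname": "Carl"},
    {"lastname": "Smith", "firstname": "Bob"},
]


def french_wrapper(last_name: str) -> List[Record]:
    """Query the French source and translate its rows to the shared schema."""
    return [
        {"last_name": r["nom"], "first_name": r["prenom"]}
        for r in _FRENCH_ROWS
        if r["nom"] == last_name
    ]


def erasmus_wrapper(last_name: str) -> List[Record]:
    """Query the Erasmus source and translate its rows to the shared schema."""
    return [
        {"last_name": r["lastname"], "first_name": r["firstname"]}
        for r in _ERASMUS_ROWS
        if r["lastname"] == last_name
    ]


mediator = Mediator()
mediator.register(french_wrapper)
mediator.register(erasmus_wrapper)
matches = mediator.find_by_last_name("Durand")
```

Because the mediator keeps no copy of the data, every answer reflects the current state of the sources, at the cost of contacting them on each query.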

Type: Chapter
Information: Web Data Management, pp. 196-230

DOI: https://doi.org/10.1017/CBO9780511998225.010

Publisher: Cambridge University Press

Print publication year: 2011
