Research Themes
Integrating and sharing information across disparate sources entails
several significant challenges: relevant data objects are split across multiple
information sources, and often owned by different organizations. The sources
represent, maintain, and export the information using a variety of formats,
interfaces, and semantics. Over the last two decades, much research work
in the database community has been devoted to developing approaches and
systems that enable information integration in heterogeneous and distributed
environments.
Figure 1 – A mediation-based architecture
Currently, one of the mainstream approaches to information gathering
and analysis across several distributed and heterogeneous sources is the
use of a mediation-based architecture. In this architecture (cf. Figure 1),
native information sources are “wrapped” into logical views that provide
uniform interfaces to the underlying sources. An integrated
global schema is then constructed to provide a single entry point for querying the
available information sources. Although mediation architectures are certainly
very useful in many application contexts, they have fundamental limitations
in coping with the rapid growth of a highly dynamic information space such
as the Web. Clearly, a brute-force data integration approach, where the
development of an integrated schema requires the understanding of both
structure and semantics of all the sources to be integrated, is hardly applicable
because of the dynamic nature and size of the Web. Therefore, the effective
sharing of a potentially large number of distributed, heterogeneous, and dynamic
information sources requires more scalable and flexible data sharing and
querying techniques. To address this issue, there has been renewed interest
in leveraging traditional data integration techniques. In particular, an
emerging research direction built around the peer-to-peer (P2P)
model aims at developing a decentralized data sharing architecture that,
rather than requiring a single integrated schema to share data,
allows peers to define semantic mappings between pairs of peers. In
parallel, emerging semantic web efforts propose to employ machine-understandable
abstractions to represent the semantics of resources. In particular,
the semantic web promotes the use of ontologies as a tool for reconciling
semantic heterogeneity between web resources. Despite these early efforts,
many of the P2P and semantic web objectives, including improving the technology
to organise, search, integrate, and evolve large-scale web-accessible sources,
remain difficult to achieve.
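To make the mediation-based architecture described above more concrete, the following minimal Python sketch shows wrappers that expose native sources through a uniform view, and a mediator that acts as the single entry point. All class names, attribute names, and sample records are hypothetical and chosen purely for illustration; they do not describe any system used in the project.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


class Wrapper:
    """Exposes one native source as a logical view over the global schema."""

    def __init__(self, name: str, rows: List[Record], field_map: Dict[str, str]):
        self.name = name
        self._rows = rows            # native records, in the source's own vocabulary
        self._field_map = field_map  # global attribute -> native attribute

    def scan(self) -> Iterable[Record]:
        # Translate each native record into the global vocabulary.
        for row in self._rows:
            yield {g: row.get(n) for g, n in self._field_map.items()}


class Mediator:
    """Single entry point: evaluates a query against every wrapped source."""

    def __init__(self, wrappers: List[Wrapper]):
        self._wrappers = wrappers

    def query(self, predicate: Callable[[Record], bool]) -> List[Record]:
        results = []
        for wrapper in self._wrappers:
            for record in wrapper.scan():
                if predicate(record):
                    # Keep track of which source each answer came from.
                    results.append({**record, "source": wrapper.name})
        return results


# Two hypothetical sources that describe the same kind of object differently.
source_a = Wrapper("A", [{"station": "S1", "flow_m3s": 12.5}],
                   {"site": "station", "flow": "flow_m3s"})
source_b = Wrapper("B", [{"id": "S2", "debit": 3.0}],
                   {"site": "id", "flow": "debit"})

mediator = Mediator([source_a, source_b])
high_flow = mediator.query(lambda r: r["flow"] is not None and r["flow"] > 5)
```

Note that every wrapper must be written against the same global vocabulary (here `site` and `flow`), which is precisely the step that becomes hard to maintain when sources are numerous and change frequently.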
In this project, we plan to focus on three research issues:
(i) Flexibility of integration, mainly by investigating
approaches that enable the automatic discovery of schema mappings in order to facilitate
the integration task;
(ii) Flexibility and scalability of query processing: our
aim is to design and develop flexible and efficient algorithms that
handle query processing in the presence of a large number of information sources;
and
(iii) Flexibility at the semantic level, by investigating the
restructuring of the mapping graph and its impact on query optimization.
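As a rough illustration of what a mapping graph is, the sketch below treats pairwise mappings between peers as directed edges and rewrites the attributes of a query by following a path through that graph. This is only an assumed, simplified model (mappings reduced to attribute renamings, peers named `P1` to `P3`); the project's actual mapping formalism and algorithms are not specified here.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical pairwise mappings: (from_peer, to_peer) -> attribute renaming.
Mappings = Dict[Tuple[str, str], Dict[str, str]]


def find_path(mappings: Mappings, start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search for a shortest chain of mappings from start to goal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for (src, dst) in mappings:
            if src == path[-1] and dst not in seen:
                seen.add(dst)
                queue.append(path + [dst])
    return None  # no chain of mappings connects the two peers


def reformulate(attrs: List[str], path: List[str],
                mappings: Mappings) -> Optional[List[str]]:
    """Rewrite query attributes by composing the mappings along the path."""
    current = list(attrs)
    for src, dst in zip(path, path[1:]):
        renaming = mappings[(src, dst)]
        if any(a not in renaming for a in current):
            return None  # an attribute cannot be translated at this hop
        current = [renaming[a] for a in current]
    return current


mappings: Mappings = {
    ("P1", "P2"): {"site": "station", "flow": "discharge"},
    ("P2", "P3"): {"station": "id", "discharge": "debit"},
}

path = find_path(mappings, "P1", "P3")
rewritten = reformulate(["site", "flow"], path, mappings)
```

In this toy model, restructuring the graph (for example, adding a direct mapping from `P1` to `P3`) would shorten the path a query has to follow, which hints at why the shape of the mapping graph can matter for query optimization.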
The areas of expertise required for this project are:
Data mediation (LIRMM, LIRIS)
Knowledge representation (LIRIS, LIMOS)
Optimisation and restructuring (LIRMM, LIRIS)
Flexible Query Rewriting (LIMOS, IRISA/ENSST)
Application (CEMAGREF)