Research Themes



Integrating and sharing information across disparate sources entails several significant challenges: relevant data objects are split across multiple information sources, which are often owned by different organizations. The sources represent, maintain, and export the information using a variety of formats, interfaces and semantics. Over the last two decades, much research in the database community has been devoted to developing approaches and systems that enable information integration in heterogeneous and distributed environments.

Figure 1 – A mediation-based architecture

Currently, one of the mainstream approaches to information gathering and analysis across several distributed and heterogeneous sources is the use of a mediation-based architecture. In this architecture (see Figure 1), native information sources are "wrapped" into logical views that provide uniform interfaces to the underlying sources. An integrated global schema is then constructed to provide a single entry point for querying the available information sources. Although mediation architectures are certainly very useful in many application contexts, they have fundamental limitations when it comes to coping with the rapid growth of a highly dynamic information space such as the Web. Clearly, a brute-force data integration approach, where the development of an integrated schema requires understanding both the structure and the semantics of all the sources to be integrated, is hardly applicable given the dynamic nature and size of the Web. Therefore, the effective sharing of a potentially large number of distributed, heterogeneous and dynamic information sources requires more scalable and flexible data sharing and querying techniques.

To cope with this issue, there has been renewed interest in building on traditional data integration techniques. In particular, an emerging research direction based on the peer-to-peer (P2P) model aims to develop a decentralized data-sharing architecture that, rather than requiring a single integrated schema to share data, lets peers define semantic mappings between pairs of peers. In parallel, the emerging semantic web efforts propose to employ machine-understandable abstractions for representing the semantics of resources. In particular, the semantic web promotes the use of ontologies as a tool for reconciling semantic heterogeneity between web resources.
Despite these early efforts, many of the P2P and semantic web objectives, including improving the technology to organise, search, integrate, and evolve large-scale web-accessible sources, remain difficult to achieve.
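
To make the mediation-based architecture described above more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the design of any existing system: the class names `Wrapper` and `Mediator`, the attribute names, and the sample records are all hypothetical. Each wrapper renames the attributes of one native source to those of a global schema, and the mediator offers a single entry point that queries every wrapped source.

```python
from typing import Callable, Dict, Iterator, List

Record = Dict[str, object]


class Wrapper:
    """Hypothetical wrapper: exposes one native source as a logical view
    expressed with the attribute names of the global schema."""

    def __init__(self, rows: List[Record], mapping: Dict[str, str]) -> None:
        self.rows = rows
        self.mapping = mapping  # native attribute name -> global attribute name

    def scan(self) -> Iterator[Record]:
        # Rename native attributes; attributes without a mapping are dropped.
        for row in self.rows:
            yield {self.mapping[k]: v for k, v in row.items() if k in self.mapping}


class Mediator:
    """Hypothetical mediator: a single entry point over all wrapped sources."""

    def __init__(self, wrappers: List[Wrapper]) -> None:
        self.wrappers = wrappers

    def query(self, predicate: Callable[[Record], bool]) -> List[Record]:
        # Evaluate the predicate, written against the global schema, on every source.
        return [rec for w in self.wrappers for rec in w.scan() if predicate(rec)]


# Illustrative data only: two sources describing the same kind of object
# with different attribute names.
source_a = [{"nom": "site-1", "debit": 12}, {"nom": "site-2", "debit": 3}]
source_b = [{"station": "site-3", "flow": 9}]

mediator = Mediator([
    Wrapper(source_a, {"nom": "name", "debit": "flow"}),
    Wrapper(source_b, {"station": "name", "flow": "flow"}),
])

high_flow = mediator.query(lambda rec: rec["flow"] > 5)
```

Note that every wrapper mapping has to be written against the same global schema, which is exactly the step that becomes hard to maintain when sources are numerous and change often.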
In this project, we plan to focus on three research issues:

(i)    Flexibility of integration, mainly by investigating approaches that enable the automatic discovery of schema mappings in order to facilitate the integration task,
(ii)    Flexibility and scalability of query processing: our aim is to design and develop flexible and efficient algorithms for query processing in the presence of a large number of information sources, and
(iii)    Flexibility at the semantic level, by investigating the restructuring of mapping graphs and its impact on query optimization.
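
As a rough illustration of what a mapping graph between peers can look like, the sketch below stores pairwise attribute mappings as directed edges and translates an attribute name from one peer to another by following a path of mappings. It is a simplified assumption made for illustration, not the project's algorithm: the peer names, the `MAPPINGS` table, and the `translate` function are hypothetical, and real semantic mappings are usually richer than attribute renamings.

```python
from collections import deque
from typing import Dict, Optional, Tuple

# Hypothetical pairwise mappings: (source peer, target peer) -> attribute renaming.
MAPPINGS: Dict[Tuple[str, str], Dict[str, str]] = {
    ("P1", "P2"): {"title": "name"},
    ("P2", "P3"): {"name": "label"},
}


def translate(attr: str, source: str, target: str,
              mappings: Dict[Tuple[str, str], Dict[str, str]]) -> Optional[str]:
    """Breadth-first search over the mapping graph.

    Returns the name of `attr` as seen by `target`, or None when no chain
    of pairwise mappings carries the attribute from `source` to `target`.
    """
    queue = deque([(source, attr)])
    seen = {source}
    while queue:
        peer, name = queue.popleft()
        if peer == target:
            return name
        for (src, dst), attr_map in mappings.items():
            # Follow an edge only if it starts here and maps the current name.
            if src == peer and dst not in seen and name in attr_map:
                seen.add(dst)
                queue.append((dst, attr_map[name]))
    return None


translated = translate("title", "P1", "P3", MAPPINGS)
```

In this sketch no mapping links P1 and P3 directly, so the translation depends on the path through P2. Adding or removing edges changes which translations are possible and how long the paths are, which gives a simple picture of why the shape of the mapping graph matters for query processing.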

The areas of expertise required for this project are:

  • Data mediation (LIRMM, LIRIS)
  • Knowledge representation (LIRIS, LIMOS)
  • Optimisation and restructuring (LIRMM, LIRIS)
  • Flexible Query Rewriting (LIMOS, IRISA/ENSST)
  • Application (CEMAGREF)