Bases de Données Avancées 2011

Rabat, Morocco, 24-27 October 2011

Programme

Monday 24 October

08h15 - 09h00: Registration and welcome coffee
09h00 - 09h30: Opening session
09h30 - 10h30: Keynote 1: Data Management in the Cloud (Amr El Abbadi, University of California, Santa Barbara)
10h30 - 11h00: Break
11h00 - 12h30: Session 1: Information Streams on the Web
12h30 - 14h00: Lunch
14h00 - 16h00: Session 2: Semantics of Data and Services
16h30 - 18h00: Session 3: Data Protection and Privacy
18h30: Visit of the UIR (Université Internationale de Rabat) and cocktail reception (bus departs from the Sofitel hotel at 18h00)

Tuesday 25 October

08h30 - 09h00: Welcome coffee
09h00 - 10h00: Keynote 2: The Story of VectorWise, a database spin-off from CWI (Peter A. Boncz, CWI and Vrije Universiteit, Amsterdam)
10h00 - 10h30: Break
10h30 - 12h30: Session 4: Querying and Archiving Web Data
12h30 - 14h00: Lunch
14h00 - 15h45: Session 5: Peer-to-Peer and Performance
15h45 - 16h15: Break
16h15 - 17h45: Demos and posters
19h30: Banquet (bus departs from the Sofitel hotel at 19h00)

Wednesday 26 October

Day held at the UIR - departure at 8h30*
09h00 - 10h30: Session 6: Semi-Structured Data Processing
10h30 - 11h00: Break
11h00 - 12h30: Session 7: Potpourri
12h30 - 14h00: Lunch
14h00 - 16h00: Tutorial 1: Introduction to Probabilistic Data Management (Evgeny Kharlamov, Free University of Bozen-Bolzano & University of Oxford; Pierre Senellart, Télécom ParisTech)
16h00 - 16h30: Break
16h30 - 18h00: Tutorial 2: P2P Techniques for Decentralized Applications (Esther Pacitti, LIRMM & INRIA, University of Montpellier 2)

*Bus transport is provided. Details available on site.

Thursday 27 October

Day held at the UIR - departure at 8h30*
09h00 - 10h30: Tutorial 3: Mise en place d'un système décisionnel basé sur un SI Hospitalier (Samir Belfkih, Université Internationale de Rabat)
10h30 - 11h00: Break
11h00 - 12h30: Tutorial 4: New Challenges in Business Intelligence (Eric Simon, SAP)
12h30 - 14h00: Lunch
14h00 - 16h00: Tutorial 5: Découverte de motifs: Enumération, Programmation par Contraintes/SAT et Bases de données (L. Nourine, LIMOS, Clermont-Ferrand; J.M. Petit, LIRIS, Lyon; L. Saïs, CRIL, Lens)
16h00 - 16h30: Break
16h30 - 18h00: Tutorial 6: De l'organisation des droits d'accès au système. Ou, comment modéliser qui a droit à quoi ? (Romuald Thion, Université Claude Bernard, Lyon)

*Bus transport is provided. Details available on site.

Sessions

Session 1: Information Streams on the Web

Session 2: Semantics of Data and Services

Session 3: Data Protection and Privacy

Session 4: Querying and Archiving Web Data

Session 5: Peer-to-Peer and Performance

Session 6: Semi-Structured Data Processing

Session 7: Potpourri

Tutorials

Tutorial 1: Introduction to Probabilistic Data Management
Evgeny Kharlamov (Free University of Bozen-Bolzano & University of Oxford), Pierre Senellart (Télécom ParisTech)

Tutorial 2: P2P Techniques for Decentralized Applications
Esther Pacitti (LIRMM & INRIA, University of Montpellier 2)

Tutorial 3: Mise en place d'un système décisionnel basé sur un SI Hospitalier
Samir Belfkih (Université Internationale de Rabat)

Tutorial 4: New Challenges in Business Intelligence
Eric Simon (SAP)

Tutorial 5: Découverte de motifs: Enumération, Programmation par Contraintes/SAT et Bases de données
L. Nourine (LIMOS, Clermont-Ferrand), J.M. Petit (LIRIS, Lyon), L. Saïs (CRIL, Lens)

Tutorial 6: De l'organisation des droits d'accès au système. Ou, comment modéliser qui a droit à quoi ?
Romuald Thion (Université Claude Bernard, Lyon)

Keynotes

Keynote 1: Data Management in the Cloud
Amr El Abbadi (University of California, Santa Barbara)

Keynote 2: The Story of VectorWise, a database spin-off from CWI
Peter A. Boncz (CWI and Vrije Universiteit, Amsterdam)

Sessions

Session 1: Information Streams on the Web

  1. Optimizing large collections of continuous content-based RSS aggregation queries,
    Jordi Creus Tomàs (Univ. Pierre et Marie Curie), Bernd Amann (LIP6), Vassilis Christophides (FORTH), Dan Vodislav (ETIS/Univ. Cergy-Pontoise), Nicolas Travers (CNAM)
  2. Online Refresh Strategies for RSS Feed Crawlers,
    Roxana Horincar (LIP6, UPMC), Bernd Amann (LIP6), Thierry Artières (LIP6, UPMC)
  3. Everything you would like to know about RSS feeds and you are afraid to ask,
    Zeinab Hmedeh (CNAM), Nicolas Travers (CNAM), Nelly Vouzoukidou (FORTH), Vassilis Christophides (FORTH), Cédric du Mouza (CNAM), Michel Scholl (CNAM)

Session 2: Semantics of Data and Services

  1. Ontology Alignment at the Instance and Schema Level,
    Fabian Suchanek (INRIA Saclay), Serge Abiteboul (INRIA Saclay and ENS Cachan), Pierre Senellart (Telecom ParisTech)
  2. Query Evaluation with Asymmetric Web Services,
    Nicoleta Preda (PRiSM, UVSQ), Fabian Suchanek (INRIA Saclay), Wenjun Yuan (University of Hong Kong), Gerhard Weikum (Max-Planck Institute for Informatics)
  3. A Fuzzy Set Approach to Handle Preferences in Service Retrieval,
    Katia Abbaci (IRISA/ENSSAT, Univ. of Rennes 1), Fernando Lemos (PRiSM, UVSQ), Allel Hadjali (IRISA/ENSSAT, Univ. de Rennes 1), Daniela Grigori (PRiSM, UVSQ), Ludovic Liétard (IRISA/IUT Institute), Daniel Rocacher (IRISA/ENSSAT, Univ. de Rennes 1), Mokrane Bouzeghoub (PRiSM, UVSQ)
  4. XD-ER : un modèle conceptuel pour les environnements dynamiques,
    Nicolas Lumineau (LIRIS, Univ. Claude Bernard Lyon 1), Frédérique Laforest (LIRIS, INSA-Lyon), Yann Gripay (LIRIS, INSA-Lyon), Jean-Marc Petit (LIRIS, INSA-Lyon)

Session 3: Data Protection and Privacy

  1. Indexation spatiale et clustering pour la publication de données préservant la vie privée,
    Adeel Anjum (LINA, Univ. de Nantes), Guillaume Raschia (Polytech Nantes)
  2. Protection des données personnelles lors de la composition des services DaaS pour Mashup,
    Salah-Eddine Tbahriti (LIRIS, Univ. Claude Bernard Lyon 1), Mahmoud Barhamgi (LIRIS, Univ. Claude Bernard Lyon 1), Nabila Benharkat (LIRIS, INSA de Lyon), Chirine Ghedira (LIRIS, Univ. Claude Bernard Lyon 1), Djamal Benslimane (LIRIS, Univ. Claude Bernard Lyon 1), Michael Mrissa (LIRIS, Univ. Claude Bernard Lyon 1)
  3. SocioPath: In Whom You Trust? (short paper),
    Nagham Alhadad (LINA), Philippe Lamarre (LINA), Patricia Serrano-Alvarado (LINA, Univ. de Nantes), Yann Busnel (LINA), Marco Biazzini (LINA)
  4. Contrôle d'accès basé sur la provenance (short paper),
    Francois Lesueur (INSA de Lyon), Romuald Thion (LIRIS)

Session 4: Querying and Archiving Web Data

  1. Breadth-first strategies for top-k algorithms over web data sources,
    Mehdi Badr (ETIS/Univ. Cergy-Pontoise), Dan Vodislav (ETIS/Univ. Cergy-Pontoise)
  2. Algorithme top-k pour la recherche d’information dans les réseaux sociaux,
    Talel Abdessalem (Telecom ParisTech), Bogdan Cautis (Telecom ParisTech), Silviu Maniu (Telecom ParisTech)
  3. Archiving the Web using Page Changes Patterns: A Case Study,
    Myriam Ben Saad (LIP6), Stéphane Gancarski (LIP6)
  4. Coherence-oriented Crawling and Navigation for Web Archives using Patterns,
    Myriam Ben Saad (LIP6), Zeynep Pehlivan (UPMC-LIP6), Stéphane Gancarski (LIP6)

Session 5: Peer-to-Peer and Performance

  1. Seamless Distribution of Data Centric Applications through Declarative Overlays,
    Stephane Grumbach (INRIA), Eric Bellemon (INRIA), Ahmad Ahmad-Kassem (INRIA)
  2. To Replicate or Not To Replicate Queries in the Presence of Autonomous Participants?,
    Jorge-Arnulfo Quiane-Ruiz (Saarland University), Philippe Lamarre (LINA), Patrick Valduriez (INRIA and LIRMM, Montpellier)
  3. The Melodic Signature Index for Fast Content-based Retrieval of Symbolic Scores,
    Camelia Constantin (Univ. Paris 6), Cédric du Mouza (CNAM), Zoe Faget (Univ. Paris Dauphine), Philippe Rigaux (CNAM)
  4. Data Management in Forecasting Systems: Case Study - Performance Problems and Preliminary Results (short paper),
    Haitang Feng (LIRIS), Nicolas Lumineau (LIRIS, Univ. Claude Bernard Lyon 1), Mohand-Saïd Hacid (LIRIS, Univ. Claude Bernard Lyon 1), Richard Domps

Session 6: Semi-Structured Data Processing

  1. Optimal Probabilistic Generators for XML Corpora,
    Serge Abiteboul (INRIA Saclay and ENS Cachan), Yael Amsterdamer (INRIA and Tel Aviv University), Daniel Deutch (INRIA Saclay and ENS Cachan and Ben Gurion University), Tova Milo (Tel Aviv University), Pierre Senellart (Telecom ParisTech)
  2. Growing Triples on Trees: an XML-RDF Hybrid Model for Annotated Documents,
    François Goasdoué (LRI, Univ. Paris-Sud 11, CNRS, and INRIA Saclay), Konstantinos Karanasos (INRIA Saclay and LRI, Univ. Paris-Sud 11), Yannis Katsis (INRIA Saclay and ENS Cachan), Julien Leblay (INRIA Saclay and LRI, Univ. Paris-Sud 11), Ioana Manolescu (INRIA Saclay), Stamatis Zampetakis (University of Crete, INRIA Saclay and LRI, Univ. Paris-Sud 11)
  3. Maintenance efficace de documents XML volumineux,
    Mohamed-Amine Baazizi (LRI, Univ. Paris-Sud 11 and INRIA Saclay), Nicole Bidoit (LRI, Univ. Paris-Sud 11 and INRIA Saclay), Dario Colazzo (LRI, Univ. Paris-Sud 11 and INRIA Saclay)

Session 7: Potpourri

  1. Embedding User's Requirements in Data Warehouse Repositories,
    Selma Khouri (Ecole Supérieure d'Informatique), Ladjel Bellatreche (LISI ENSMA), Patrick Marcel (François-Rabelais University)
  2. A cluster-based matrix-factorization for online integration of new ratings,
    Modou Gueye (UCAD), Talel Abdessalem (Telecom ParisTech), Hubert Naacke (LIP6-UPMC Sorbonne Univ.)
  3. Efficient SUM Query Processing over Uncertain Data,
    Reza Akbarinia (INRIA and LIRMM, Montpellier), Patrick Valduriez (INRIA and LIRMM, Montpellier), Guillaume Verger (INRIA and LIRMM, Montpellier)

Demos

  • P2Prec: a Social-based P2P Recommendation System,
    Guillaume Verger (INRIA and LIRMM, Montpellier), Didier Parigot (INRIA), Fady Draidi (LIRMM), Esther Pacitti (LIRMM)
  • DomVision : Intergiciel de gestion de données pour l'environnement domestique,
    Loic Petit (France Telecom), Claudia Roncancio (Université de Grenoble), Cyril Labbé (Univ. de Grenoble), François-Gaël Ottogalli (France Telecom)
  • WebTribe: Dynamic Community Analysis from Online Forums - A tool for the Community Manager,
    Damien Leprovost (Le2i CNRS Lab), Lylia Abrouk (Le2i CNRS Lab)
  • Privacy Support for Sensitive Data Sharing in P2P Systems,
    Patricia Serrano-Alvarado (LINA-Univ. de Nantes), Mohamed Jawad (LINA-Univ. de Nantes), Patrick Valduriez (INRIA and LIRMM, Montpellier), Stéphane Drapeau (Obeo)
  • ProbDB: Efficient Execution of Aggregate Queries over Probabilistic Data,
    Guillaume Verger (INRIA and LIRMM, Montpellier), Reza Akbarinia (INRIA and LIRMM, Montpellier), Patrick Valduriez (INRIA and LIRMM, Montpellier)
  • RDFViewS: A Storage Tuning Wizard for RDF Applications,
    François Goasdoué (INRIA Saclay and LRI, Univ. Paris-Sud 11), Konstantinos Karanasos (INRIA Saclay and LRI, Univ. Paris-Sud 11), Julien Leblay (INRIA Saclay and LRI, Univ. Paris-Sud 11), Ioana Manolescu (INRIA Saclay)
  • Recherche et classement de compromis: Intégration des requêtes skylines et de l'analyse multicritère,
    Isma Sadoun (PRiSM, Univ. de Versailles Saint-Quentin), Karine Zeitouni (PRiSM, Univ. de Versailles Saint-Quentin)

Posters

  • Une approche multidimensionnelle pour la personnalisation des connaissances neurologiques dans un environnement mobile,
    Allioui Youssouf (USMBA), El Beqqali Omar (USMBA)
  • Product Lifecycle Management system architecture based on Agents technologies for collaboration,
    Boulaalam Abdelhak (USMBA), El Beqqali Omar (USMBA), Nfaoui El Habib (USMBA)
  • Annotation sémantique des ressources d’enseignement à distance à base d’agents intelligents,
    Oriche Aziz (FST TETOUAN), Chekry Abderrahman (LIROSA)
  • Vers une variabilité simplifiée pour l’adaptation des systèmes d’information pervasifs,
    Alaaeddine Yousfi (LRIT), Rajaa Saidi (UMV)
  • Query Expansion in Information Retrieval Using Associated Queries,
    Abderrahim El Qadi (GSCM-LRIT), Btihal El Ghali (GSCM-LRIT), Driss Aboutajddine (GSCM-LRIT)
  • Partage des objets pédagogiques dans une plate forme e-Learning,
    Chekry Abderrahman (LIROSA), Oriche Aziz (FST TETOUAN), Khaldi Mohamed (LIROSA)

Tutorials

Tutorial 1: Introduction to Probabilistic Data Management

Evgeny Kharlamov (Free University of Bozen-Bolzano & University of Oxford)
Pierre Senellart (Télécom ParisTech)

Abstract: This tutorial, accessible to students and researchers in the database area without any specific prerequisite, introduces probabilistic databases and how they are used to represent uncertain, imprecise, or incomplete information. We first give motivating examples of uncertainty in the real world (missing values, measurement imprecision, uncertain data integration, etc.) and derive from them requirements for uncertain data models. We next present a catalog of uncertain and probabilistic data models from the literature, from SQL NULLs to recent advances in probabilistic relational and XML databases, explaining the pros and cons of each. In the third part, we focus on query answering over some important models: query answering by lineage computation, complexity results, algorithms, and approximation techniques. We then discuss updates as a way to obtain probabilistic data, and address the issue of representation systems closed under updates. We conclude with a presentation of probabilistic database systems from both an implementer's and a user's perspective: what are the systems issues, and what kind of off-the-shelf system can be used to represent uncertain data? To illustrate, we give a short demonstration of the MayBMS extension to PostgreSQL.
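A toy illustration of the lineage-based query answering mentioned above may help. The Python sketch below is our own and is not material from the tutorial. It assumes the simplest model, a tuple-independent table in which each tuple is present independently with a given probability, and a Boolean query "is some person located in Rabat?". It computes the query probability twice: once by brute-force enumeration of possible worlds, and once from the query's lineage, which is a disjunction of independent tuple events. The table contents and function names are hypothetical.

```python
from itertools import product
from math import prod

# Hypothetical tuple-independent table: each (name, city) tuple is present
# independently with the given probability.
TUPLES = [
    (("Alice", "Rabat"), 0.9),
    (("Bob", "Lyon"), 0.5),
    (("Carol", "Rabat"), 0.4),
]


def possible_worlds(tuples):
    """Yield (world, probability) for every subset of the uncertain tuples."""
    for choice in product([True, False], repeat=len(tuples)):
        world = [t for (t, _), keep in zip(tuples, choice) if keep]
        p = prod(pt if keep else 1 - pt for (_, pt), keep in zip(tuples, choice))
        yield world, p


def prob_by_worlds(tuples, predicate):
    """P(some tuple satisfies predicate): sum over the worlds where the query holds."""
    return sum(p for world, p in possible_worlds(tuples)
               if any(predicate(t) for t in world))


def prob_by_lineage(tuples, predicate):
    """The lineage of this existential query is t1 OR t2 OR ... over the
    matching tuples; with independent tuples, P = 1 - prod(1 - p_i)."""
    return 1 - prod(1 - p for t, p in tuples if predicate(t))


def in_rabat(t):
    return t[1] == "Rabat"


print(prob_by_worlds(TUPLES, in_rabat), prob_by_lineage(TUPLES, in_rabat))
```

Both values are about 0.94, that is 1 - 0.1 × 0.6. The enumeration is exponential in the number of tuples, which is why lineage-based and approximation techniques matter. The closed formula only holds for this simple query shape. For some conjunctive queries, computing the exact probability is #P-hard.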

Bios: Evgeny Kharlamov is a (post-doctoral) research assistant at the Free University of Bozen-Bolzano (FUB), Italy, and a visiting researcher at the Computing Laboratory of Oxford University. Evgeny received a PhD in Computer Science from the FUB in April 2011, working with Werner Nutt and Pierre Senellart. He obtained his European M.Sc. in Computer Science from both Dresden University of Technology and FUB in 2006. Evgeny is an alumnus of Novosibirsk State University, a leading Russian research university, where he studied mathematics. His research interests focus on theoretical and algorithmic aspects of (i) database management systems, in particular the management of uncertain data, and (ii) the semantic side of the World Wide Web. Evgeny has published several papers in top-tier conferences (VLDB, ICDT, ISWC).

Pierre Senellart is an Associate Professor in the DBWeb team at Télécom ParisTech. He is an alumnus of the École normale supérieure and obtained his M.Sc. (2003) and his Ph.D. (2007) in computer science from Université Paris-Sud, under the supervision of Serge Abiteboul. He has published articles in internationally renowned conferences and journals (PODS, AAAI, VLDB Journal, Journal of the ACM, etc.). He has served on the program committees of various international conferences and workshops and has participated in organizing them (including WWW, CIKM, ICDE, VLDB, SIGMOD, ICDT). He is also the Information Director of the Journal of the ACM. His research interests focus on theoretical aspects of database management systems and the World Wide Web, and more specifically on the intensional indexing of the deep Web, probabilistic XML databases, and graph mining.

Tutorial 2: P2P Techniques for Decentralized Applications

Esther Pacitti, LIRMM & INRIA, University of Montpellier 2

Abstract: As an alternative to traditional client-server systems, Peer-to-Peer (P2P) systems provide major advantages in terms of scalability, autonomy and dynamic behavior of peers, and decentralization of control. They are therefore well suited to large-scale data sharing in distributed environments. Most existing P2P approaches to data sharing rely on structured networks (e.g. DHTs) for efficient indexing, on unstructured networks for ease of deployment, or on some combination of the two. However, these approaches have limitations. DHTs leave little freedom for data placement, and unstructured networks suffer from high latency and high network traffic. Gossip protocols, which are easy to deploy and scale well, can be exploited to address these limitations. In this tutorial, I will give an overview of these P2P techniques and architectures, discuss their trade-offs, and illustrate their use for decentralizing several large-scale data sharing applications such as:

  1. Content distribution networks for web content caching, with the goal of reducing the load on web servers. The idea is that peers keep the content they retrieve and later serve it to other peers that are close to them in a given locality. I will present how DHTs can be combined with gossip protocols to let users share data by locality and to handle queries on specific URLs efficiently.
  2. Recommendation systems for document retrieval: the motivation is to facilitate document sharing for online communities that are not willing to move their data to centralized servers. Given a keyword query, the goal is to find relevant peers that can recommend high-quality documents relevant to the query. I will present the use of gossip protocols to disseminate relevant information about document topics. Based on the gossip views and taking user similarities into account, I will show how keyword queries are routed in a top-k approach.
  3. Collaborative text editing: the goal here is to enable multi-master replication for text editing in the context of a P2P wiki system. I will show the use of DHTs as a support for logging and reconciliation.
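The tutorial builds on DHTs for indexing, for example to route queries on specific URLs to the peer responsible for them. The Python sketch below is a hedged illustration of that underlying idea only. It implements Chord-style consistent hashing: peers and URLs are hashed onto the same ring, and each URL is assigned to the first peer at or after its position. The peer names, the use of SHA-1 and the 32-bit ring size are our assumptions. A real DHT also needs per-peer routing tables, replication and churn handling, none of which is shown.

```python
import bisect
import hashlib


def ring_position(key: str, bits: int = 32) -> int:
    """Hash a key (peer id or URL) onto a ring of 2**bits positions."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


class ConsistentHashRing:
    """Chord-style key placement: a key is owned by the first peer whose ring
    position is >= the key's position, wrapping around at the end of the ring."""

    def __init__(self, peers):
        self._ring = sorted((ring_position(p), p) for p in peers)
        self._positions = [pos for pos, _ in self._ring]

    def lookup(self, key: str) -> str:
        """Return the peer responsible for `key`."""
        idx = bisect.bisect_left(self._positions, ring_position(key))
        return self._ring[idx % len(self._ring)][1]

    def add_peer(self, peer: str) -> None:
        """Insert a new peer; only keys between it and its predecessor move."""
        bisect.insort(self._ring, (ring_position(peer), peer))
        self._positions = [pos for pos, _ in self._ring]


ring = ConsistentHashRing(["peer-a", "peer-b", "peer-c", "peer-d"])
url = "http://example.org/index.html"
print(url, "->", ring.lookup(url))
```

This scheme suits a P2P setting because adding a peer only moves the keys that now fall between the new peer and its predecessor. Every other URL keeps its owner.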

Finally, I will introduce current research directions in P2P data management.

Bio: Esther Pacitti is a professor of computer science at University of Montpellier 2 and head of the Zenith team at LIRMM, pursuing research in distributed data management. Her research interests include data replication and query processing in large-scale distributed systems (Cloud, P2P) and scientific data management. She has published more than 80 technical papers in international journals and conferences, co-edited several proceedings and written several book chapters. She has served as a program committee member of major international conferences including SIGMOD, VLDB, EDBT, ICDCS and Euro-Par.

Tutorial 3: Mise en place d'un système décisionnel basé sur un SI Hospitalier

Samir Belfkih

Abstract: This tutorial is accessible to all students and faculty interested in computing in the medical sector. Those in charge of hospital information systems should also find this presentation relevant. The main objective is to show what is at stake in a hospital information system (IS) and why it is worthwhile to set up a decision-support system in regional hospital centres.

Bio: Pr. Samir BELFKIH is Research Director in Information Systems, Decision Support, Medical Imaging and Teleradiology at the Université Internationale de Rabat. He previously worked as a Hospital-University Assistant with a dual affiliation. He was a teacher-researcher at the Centre des Etudes et de Recherche en Informatique Médicale (CERIM) of Université Lille II. He was also a computer engineer at the Département d'Information Médicale (DIM) of the CHRU de Lille, where he was in charge of setting up a decision-support system based on the hospital information system. He also leads numerous projects on information system urbanization (enterprise architecture). As a speaker, he has presented his research at several international conferences.

Tutorial 4: New Challenges in Business Intelligence

Eric Simon

Abstract: This talk will review some new challenges arising in the field of business intelligence (BI). These challenges result from organizations' increasing need to improve the efficiency of their business. Organizations want to give every "operational worker" the insights needed to make better operational decisions, and to align day-to-day operations with strategic goals. A first challenge, known as "operational BI", is to embed analytics and reporting information into business workflow applications so that users have all the information required to make good decisions. This raises several data integration issues. Another challenge, called "real-time analytics", is to enable BI applications to execute against operational (or production) data without having to build the usual data layers such as ODS, EDW and data marts. This would increase the timeliness of analytics and reduce the cost of ownership for organizations. It raises new database issues in data storage and in query and update processing. Finally, a third challenge, often referred to as "big data", is to analyze very large amounts of data that usually come from web logs or similar automatically generated streams of semi-structured data. This again raises new issues in architecture and data management. The talk will illustrate these challenges with examples and give an overview of the technical issues that need to be addressed.

Bio: Eric Simon is currently the chief architect of SAP's "Information Management" product division. Before that, he was director of R&D for the Data Access and Data Federator products at SAP Business Objects. Eric was a founder and CEO of Medience, a French start-up acquired by Business Objects in 2005, which developed the Data Federator technology. Previously, Eric was a tenured research scientist at INRIA (Institut national de Recherche en Informatique et Automatique). There he created the research project "Le Select", which produced the innovative solutions later transferred to Medience. Eric received a PhD in computer science from University of Paris VI in 1986. His areas of interest have been advanced database systems, query optimization, data integration methods and algorithms, and business intelligence. He has co-authored more than 70 research papers published in international conferences and journals. His research work received best paper awards at both the VLDB and ACM OOPSLA international conferences. He co-authored several patents at Bell Labs, Medience, Business Objects, and SAP. Eric has regularly served on the program committees of international research conferences (IEEE Data Engineering, VLDB, ACM SIGMOD, ...).

Tutorial 5: Découverte de motifs: Enumération, Programmation par Contraintes/SAT et Bases de données

L. Nourine, LIMOS, Clermont-Ferrand
J.M. Petit, LIRIS, Lyon
L. Saïs, CRIL, Lens

Abstract: This course offers an overview of the problems of discovering interesting patterns in large masses of data. After a brief introduction to the underlying applications, we will present their main characteristics. We will then show how they can benefit from three fields of computer science: enumeration algorithms, constraint programming/SAT, and databases.
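Frequent itemset mining is one classical instance of the pattern-discovery problems mentioned above and a natural example of an enumeration algorithm. The Python sketch below is ours and is not taken from the course. It enumerates frequent itemsets level by level in the style of Apriori. Candidates are pruned using the anti-monotonicity of support, which means every subset of a frequent itemset is itself frequent. The transactions are hypothetical. The constraint programming/SAT and database formulations discussed in the course are not shown.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) enumeration of all itemsets contained in at
    least `min_support` transactions. Returns a dict {frozenset: support}."""
    db = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in db if itemset <= t)

    items = sorted({i for t in db for i in t})
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    result = {s: support(s) for s in level}
    k = 2
    while level:
        previous = set(level)
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in previous for s in combinations(c, k - 1))}
        level = [c for c in candidates if support(c) >= min_support]
        result.update({c: support(c) for c in level})
        k += 1
    return result


baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
for itemset, sup in sorted(frequent_itemsets(baskets, 3).items(),
                           key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

With a minimum support of 3, every single item and every pair is reported. The itemset {a, b, c} is not reported because it occurs in only two baskets.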

Bios: Jean-Marc Petit has been Professor of Computer Science at INSA Lyon since 2005. INSA Lyon belongs to the University of Lyon and is one of the top engineering schools in France. Since 2008 he has led the database group at the LIRIS laboratory (UMR 5205 CNRS), and since 2007 he has directed the research master's program in Computer Science. He co-chaired the organizing committee of VLDB 2009, held in Lyon, France. His main research interest is the cross-fertilization between databases and data mining.

Tutorial 6: De l'organisation des droits d'accès au système. Ou, comment modéliser qui a droit à quoi ?

Romuald Thion (Université Claude Bernard, Lyon)

Abstract: Data security, classically defined as the triad of integrity, confidentiality and availability, is one of the major challenges of modern computing. Commercial data, intellectual property and personal data all require mechanisms that improve, if not guarantee, their confidentiality. Among the existing mechanisms (including encryption), one family is ubiquitous in systems because it is unavoidable whenever a system is to be secured: access control, or authorization. These mechanisms rely on monitors placed at the system's entry points to ensure that only authorized parties access the data. The goal of authorization mechanisms is always the same: to determine who has the right to do what in order to make the access decision. However, the organization of rights can differ greatly from one system to another. It can vary with the type of objects considered (files, tuples, tables, directory entries), but also with the purpose of the system and the organization of the business (military, banking, commercial and ERP systems). In this tutorial, we will present the main concepts of access control and describe various models for organizing rights, both existing ones and those proposed in research work. We will focus in particular on role-based organization, on model design, and on distributed access control models.
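The following sketch makes "who has the right to do what" concrete. It implements minimal role-based access control in the spirit of the standard RBAC model with a role hierarchy. Users are assigned roles, and roles are granted (action, object) permissions. A senior role inherits the permissions of its junior roles. This is our own Python illustration, not the relational formalization presented in the tutorial. The users, roles and objects are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RBACPolicy:
    """Minimal hierarchical RBAC: user -> roles, role -> (action, object)
    permissions, and role -> junior roles whose permissions it inherits."""
    user_roles: dict[str, set[str]] = field(default_factory=dict)
    role_perms: dict[str, set[tuple[str, str]]] = field(default_factory=dict)
    juniors: dict[str, set[str]] = field(default_factory=dict)

    def effective_roles(self, user: str) -> set[str]:
        """Roles assigned to the user plus every role reachable by inheritance."""
        stack = list(self.user_roles.get(user, set()))
        seen: set[str] = set()
        while stack:
            role = stack.pop()
            if role not in seen:
                seen.add(role)
                stack.extend(self.juniors.get(role, set()))
        return seen

    def check(self, user: str, action: str, obj: str) -> bool:
        """Access decision taken by the monitor: is (action, obj) granted
        to any of the user's effective roles?"""
        return any((action, obj) in self.role_perms.get(r, set())
                   for r in self.effective_roles(user))


policy = RBACPolicy(
    user_roles={"alice": {"doctor"}, "bob": {"nurse"}},
    role_perms={"nurse": {("read", "patient_record")},
                "doctor": {("write", "patient_record")}},
    juniors={"doctor": {"nurse"}},  # doctor inherits the nurse's permissions
)
print(policy.check("alice", "read", "patient_record"))  # True (inherited from nurse)
print(policy.check("bob", "write", "patient_record"))   # False
```

Granting permissions to roles rather than to individual users keeps the policy small and easy to audit. For example, changing what every doctor may do is a single update to `role_perms`.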

Bio: Romuald THION, aged 30, holds a PhD from INSA de Lyon. Since 1 September 2010 he has been an associate professor (maître de conférences) in the Computer Science department of Université Claude Bernard Lyon 1. He carries out his research at LIRIS (Laboratoire d'InfoRmatique en Image et Systèmes d'information) in the "Bases de Données" (Databases) team, where he works on data protection. From 2004 to 2008, Romuald THION worked on the organization and formal modeling of access control policies. Using tools from databases, namely integrity constraints, he proposed a relational organization of access rights to systems. He also developed tools to check the consistency of these rights, represent them graphically and reverse-engineer them. From 2008 to 2010, he worked in the LICIT exploratory action of INRIA Grenoble Rhône-Alpes on the formal modeling of legal obligations, in particular those concerning privacy and personal data protection.

Keynotes

Keynote 1: Data Management in the Cloud

Amr El Abbadi

Abstract: Over the past two decades, database and systems researchers have made significant advances in the development of algorithms and techniques for data management solutions. These solutions carefully balance the three major requirements when dealing with critical data: high availability, reliability, and data consistency. However, over the past few years, Internet-scale enterprises that provide services to millions of users have had unprecedented requirements in terms of data availability and system scalability. Cloud computing has emerged as an extremely successful paradigm for deploying Internet and Web-based applications. Scalability, elasticity, pay-per-use pricing, and autonomic control of large-scale operations are the major reasons for the widespread adoption of cloud infrastructures. Currently proposed solutions to scalable data management are driven primarily by prevalent application requirements. They significantly downplay data consistency and instead focus on high scalability and resource elasticity to support data-rich applications for millions to tens of millions of users. However, several developments have opened up the challenge of designing data management systems that provide consistency guarantees at a granularity beyond single rows and keys. These developments are the growing popularity of "cloud computing", the resulting shift of a large number of Internet applications to the cloud, and the quest to provide data management services in the cloud. In this talk, we analyze the design choices that allowed modern scalable data management systems to achieve scalability orders of magnitude higher than traditional databases. With this understanding, we highlight some design principles for data management systems that can be used to augment existing databases with new cloud features such as scalability, elasticity, and autonomy. We then present two systems that leverage these principles.
The first system, G-Store, provides transactional guarantees on data granules formed on demand while being efficient and scalable. The second system, ElasTraS, provides elastically scalable transaction processing using logically contained database partitions. Finally, we present two techniques for on-demand live database migration. Live migration is a primitive operation critical to providing lightweight elasticity as a first-class notion in the next generation of database systems. The first technique, Albatross, supports live migration in a multitenant database serving OLTP-style workloads, where the persistent database image is stored in network-attached storage. The second technique, Zephyr, efficiently migrates live databases in a shared-nothing transactional database architecture.

Bio: Amr El Abbadi is currently Professor and Chair of the Computer Science Department at the University of California, Santa Barbara. He received his B. Eng. in Computer Science from Alexandria University, Egypt, and his Ph.D. in Computer Science from Cornell University in August 1987. Prof. El Abbadi is an ACM Fellow. He has served as an editor of several database journals, currently including The VLDB Journal. He has been Program Chair of multiple database and distributed systems conferences, most recently SIGSPATIAL GIS 2010 and the ACM Symposium on Cloud Computing (SoCC) 2011. He also served as a board member of the VLDB Endowment from 2002 to 2008. In 2007, Prof. El Abbadi received the UCSB Senate Outstanding Mentorship Award for his excellence in mentoring graduate students. He has published over 250 articles in databases and distributed systems.

Keynote 2: The Story of VectorWise, a database spin-off from CWI

Peter A. Boncz

Abstract: VectorWise is a new entrant in the analytical database marketplace whose technology comes straight from innovations in the database research community in recent years. In particular, VectorWise is a spin-off from the MonetDB research group. I will describe its technical highlights and their roots in research, including the vectorized execution model, architecture-conscious query processing, updatable column stores, and advanced disk scheduling and table clustering. Apart from the technical innovations, I will tell the story of how VectorWise happened, as one case of how researchers can bridge the gap between research and business. Having now participated in two such adventures, I will also comment on what I learned about the non-research aspects of spinning out research results.
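The vectorized execution model mentioned in the abstract has each operator call process a batch (vector) of column values instead of a single tuple. This amortizes interpretation overhead and lets primitives run as tight loops over cache-resident arrays. The Python sketch below was written for illustration and is not derived from VectorWise code. It evaluates a filtered sum over two columns vector-at-a-time using a selection vector. The vector size, column names and function names are assumptions. In Python the loops themselves are not fast, so the sketch shows only the data flow, not the performance benefit.

```python
from typing import Iterator, List

VECTOR_SIZE = 1024  # assumed batch size, meant to keep a vector in CPU cache


def scan(column: List[float], size: int = VECTOR_SIZE) -> Iterator[List[float]]:
    """Column scan that yields fixed-size vectors instead of single values."""
    for start in range(0, len(column), size):
        yield column[start:start + size]


def select_greater(vec: List[float], threshold: float) -> List[int]:
    """Selection primitive: return a selection vector of qualifying positions."""
    return [i for i, v in enumerate(vec) if v > threshold]


def sum_selected(vec: List[float], sel: List[int]) -> float:
    """Aggregation primitive: sum only the positions in the selection vector."""
    return sum(vec[i] for i in sel)


def sum_where_greater(prices: List[float], quantities: List[float],
                      threshold: float) -> float:
    """SUM(quantity) WHERE price > threshold, evaluated one vector at a time
    over two aligned columns, as in a column store."""
    total = 0.0
    for price_vec, qty_vec in zip(scan(prices), scan(quantities)):
        sel = select_greater(price_vec, threshold)
        total += sum_selected(qty_vec, sel)
    return total


prices = [float(i % 100) for i in range(5000)]
quantities = [1.0] * 5000
print(sum_where_greater(prices, quantities, 49.5))  # 2500.0
```

The selection vector lets the aggregation primitive skip non-qualifying positions without copying data. The result equals that of a tuple-at-a-time loop. Only the unit of work passed between operators changes.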

Bio: Peter A. Boncz (CWI and Vrije Universiteit, Amsterdam) has been a researcher in the database architecture research group (INS1) of CWI since 2002. Since 2009 he has also held a part-time position at the Vrije Universiteit in Amsterdam. He obtained his Ph.D. at the University of Amsterdam in 2002 with research on architecture-conscious column stores that resulted in the MonetDB system. His VLDB paper on this topic won the 2009 VLDB 10-year Best Paper Award. His research interests center on high-performance database architecture for relational, XML and graph data models. Peter Boncz also co-started the DaMoN workshop series, which has brought together architecture-conscious researchers at the last five editions of SIGMOD/PODS. He was also a co-founder of Data Distilleries BV, which used MonetDB in commercial data mining technology and was acquired by SPSS in 2002. Following work on the MonetDB/X100 project, in 2008 he founded a new CWI spin-off called VectorWise, which was acquired by Ingres in 2010. VectorWise created the analytical DBMS that currently tops the TPC-H charts for single-node systems.