A few words about me
Currently in charge of Data Quality for the Clinical Data Warehouse at the AP-HP, I am passionate about data management and knowledge mining. After 12 years in both public research and the private sector, I'm driven by any challenge involving massive data management, data engineering, and supervised or unsupervised learning.
Skills and experiences
This section presents my previous positions and related skills.
For more detailed information, please refer to the corresponding tabs.
Résumé
Professional Experiences
Data Quality Project Leader
since 2021
Lead Data Scientist
2018 – 2020
Software Engineer – Data Engineer
2017 – 2018
Post-doctoral researcher in Knowledge Engineering
2014 – 2017
Post-doctoral researcher in Machine Learning
2013 – 2014
Internship in Web development
2009
Graduate studies
PhD in Computer Science
2009 – 2012
Le2i lab, University of Burgundy – Dijon, France
Master’s degree in Business Management
2009 – 2012
University of Burgundy – Dijon, France
Master’s degree in Computer Science
2007 – 2009
University of Burgundy – Dijon, France
Projects
DataGlue
2018
Personal Project – Tweag
DataGlue is a Haskell project that offers high-level functions hiding the implementation complexity of Haskell libraries from the data scientist. It also aims to align the various Haskell packages useful for data science into a coherent, fully compatible, and directly usable data science ecosystem. DataGlue is built on JupyterLab to provide the same integrated environment and notebook capabilities that R or Python data scientists are used to.
The DataGlue initiative is still in its early stages and is part of the dataHaskell open source project.
Functional development, Data Science.
SAGE: “StorAGe for Exascale Data Centric Computing”
2017 – 2018
H2020 Project – European Union
Supervised by Dr. Arnaud Spiwack.
The SAGE system aims to implement an infrastructure capable of Big Data/Extreme Computing (BDEC) and High Performance Data Analytics (HPDA), suitable for extreme scales, including exascale and beyond. The SAGE storage system will be capable of efficiently storing and retrieving immense volumes of data at extreme scales, with the ability to accept and perform user-defined computations integral to the storage system.
Functional development.
C3-Cloud
2016 – 2017
H2020 Project – European Union
Supervised by Pr. Theodoros N. Arvanitis and Pr. Marie-Christine Jaulent.
The C3-Cloud project will establish an ICT infrastructure for a collaborative care and cure cloud, enabling continuous coordination of patient-centred care activities by a multidisciplinary care team and patients/informal caregivers.
Fusion of multimodal patient and provider data is achieved via an interoperability middleware for seamless integration with existing information systems. An Integrated Terminology Server with advanced semantic functions enables meaningful analysis of multimodal data and clinical rules. Active patient involvement and treatment adherence are achieved through a Patient Empowerment Platform, which ensures that patient needs are respected in decision making and that preferences and psychosocial aspects are taken into account. Co-design and a 4-layered, multi-method, multi-stakeholder evaluation will lead to a user-friendly solution.
Project management, Knowledge engineering, Interoperability Middleware.
Vigi4MED
2014 – 2016
ANSM Project – Inserm
Supervised by Dr. Cédric Bousquet and Pr. Marie-Christine Jaulent.
Side effects that are likely to be related to drugs are a major public health problem. Pharmacovigilance is the discipline that aims to analyze these adverse effects, explain them, and implement measures for their prevention. Vigi4MED is a project, funded by the French drug safety agency (ANSM), to detect adverse drug reactions (ADRs) from social networks.
Messages are evaluated from the point of view of their potential interest and quality. The likelihood of adverse reaction is verified by taking into account patient language and context. The evaluation is made (1) retrospectively on medications withdrawn from the market, and (2) prospectively to identify new drugs that may have serious and unexpected side effects not seen in clinical trials due to their rarity.
Twitter parsing, Knowledge engineering.
FIORA: “Semantic inference engine for custom consulting”
2013 – 2014
FUI Project – Inria
Supervised by Pr. Yves Lechevalier.
The FIORA project aims to design and develop a personalization engine that proposes the most appropriate content and advice for a given user, with maximum reliability.
Scientifically, a key objective is to merge, within a single engine, case-based reasoning, collaborative filtering recommendation techniques, and validation of recommendations by data mining. A decisive step is the construction of a formal interaction layer where information and knowledge are handled and processed. In addition, the FIORA project developed scalable personalization techniques, coupled with Big Data approaches and knowledge representation, to set up a recommendation system that is as unintrusive as possible, scalable, and distributed.
Data Mining, Knowledge engineering, XML Parsers.
WebTribe
2009 – 2012
Thesis project – Le2i lab
WebTribe is a tool for community discovery based on the analysis of discussion forums, tweets, e-mails, and more generally of any communication situation between users.
In this tool, communications from various sources are tracked in real time, analyzed according to a reference ontology, and a summary of user activity is built in a continuous and incremental way. Communities are identified and updated depending on the semantics and structure of communications between users.
See the YouTube demonstration.
Data Mining, Semantic clustering, Unsupervised learning.
Education
Graduate studies
PhD in Computer Science
2009 – 2012
Le2i lab, University of Burgundy – Dijon, France
Entitled “Community discovery and analysis through an online semantic approach: the WebTribe tool” and defended on November 30, 2012, this thesis focuses on the notion of online communities, their discoverability by unsupervised learning methods and the definition and computation of their key features.
Data Mining, Semantic clustering, Unsupervised learning.
Master’s degree in Business Management
2009 – 2012
University of Burgundy – Dijon, France
Completed in parallel with the PhD, this master’s degree provides strong skills in team, financial, project, and business management. It prepares young PhDs for career advancement to senior management and executive positions.
Management, Law, Economics.
Master’s degree in Computer Science
2007 – 2009
University of Burgundy – Dijon, France
Master’s degree with a Multimedia specialization. Internship on the automatic and supervised generation of Web content, supervised by Pr. David Gross-Amblard.
Development, Multimedia, Web Mining.
Certifications
Applied Data Science with Python Specialization
2018
University of Michigan (MOOC)
A five-course specialization on applying data science methods and techniques and acquiring analysis skills. The courses highlight the Python libraries dedicated to data science (pandas, matplotlib, scikit-learn, nltk, networkx) and their practical use on real data.
Data Science, MapReduce, Pig, Spark, Python.
Data Science at Scale Specialization
2017
University of Washington (MOOC)
A four-course specialization focused on scalable data management solutions, data mining algorithms, and practical statistical and machine learning concepts. The specialization also covers data visualization, communication of results, and the legal and ethical issues that arise when working with big data.
Data Science, NoSQL, R.
France Grilles Certification
2016
Renater / Inserm – Paris, France
Training and certification to work on France Grilles, the French representation within EGI, the European Grid Infrastructure. The grid is a portfolio of services on a distributed e-infrastructure for the storage and analysis of scientific data.
Distributed computing, Cloud management.
Skills
The following skills come from the different experiences listed in the previous tabs. Each of these skills is assessed on a completely subjective scale.
Data Science
Theoretical bases
Supervised learning
Unsupervised learning
Knowledge engineering
Semantic clustering
Data Quality
Data Ethics
Python
R
Big Data
Theoretical bases
Flow parsing
MapReduce / Hadoop
Spark
NoSQL
Distributed computing
AWS
Programming
Theoretical bases
Java
C/C++
Haskell
Bash
Web techs
Business
Team Management
Business Management
Project management
Quality management
Agile software dev.
Systems & Software
Linux
Windows
LaTeX
LibreOffice
G Suite
Git / GitHub / BitBucket
Communication
Teaching
Popular science
French
English
Other activities
Hobbies
- Photography
  Look here, a gallery! (not really up to date)
- Astronomy
  Often combined with the previous one.
- Sports
  Skiing, hiking, sometimes jogging.
- Cycling
  Not a sport, a lifestyle.
- Travels
  Because hiking and photography have to be done somewhere.
Volunteering
Teaching
For all my students (and others), you will find here all the materials related to the lectures I have given. Only the courses that I both authored and was in charge of are listed here. For any other teaching, please refer to the website of the teacher in charge.
Contact
Do you have a question? If you need any additional information, have a request, or just want to say hello, send me an e-mail at: