REDEEM: Resilient, Decentralized and Privacy-Preserving Machine
Learning
Introduction
Basic info:
- Programme: PEPR IA
- Call: Programmes et équipements prioritaires de recherche / PEPR « Intelligence artificielle »
- Project ID: ANR-23-PEIA-0005
Abstract.
Classical learning paradigms usually involve a single entity that uses a data-generation
process or a static dataset to learn a mathematical model through successive steps of a specialized
optimization algorithm. This approach has substantial drawbacks, namely the existence of a
unique model owner, and the need, for that owner, to gather potentially sensitive data during
the training process. Arising from complex biological systems or societies, collective behaviors
and distributed intelligence have been mathematically modeled and studied using various agent
behaviors and communication protocols, e.g. in multi-agent systems. Similarly, several arguments
suggest that decentralized learning can give rise to distributed artificial intelligence. Recent
trends, e.g. the Web3 decentralized web initiative or European regulation on data privacy and
Artificial Intelligence, support the need to develop innovative, fully decentralized learning
approaches. In particular, this would pave the way for user empowerment regarding crucial
machine learning-powered services currently provided by a few American companies and
relying on the exploitation of sensitive user data.
Since Machine Learning approaches place data at the heart of system performance, this
decentralized vision has the advantage of keeping the data needed for contextual adaptation
local, which both guarantees its confidentiality and avoids its transmission. Rather than sharing
data, learning amounts to designing a specific algorithm to drive the evolution of the models,
which should encode the knowledge provided by the local context. Local adaptation to the context
is covered by approaches grouped under the term « incremental learning », which includes
transfer learning (learning a new task), fine-tuning (specializing a task to a specific context), active
learning (exploiting the intervention of an operator) and semi-supervised learning (exploiting partial
annotations). These subjects are of course important and deserve attention, but in this project
we will focus our efforts on how the learning stage is distributed and the produced knowledge
is shared, while questioning the security and sovereignty of these processes.
The concept of federated AI proposes a strategy of centralizing the local evolution of models into a
global model, which is then redistributed to all the systems. This vision allows centralized systems
to benefit from the efforts of local actors and to remain in control of the final model and its
redistribution. A more collaborative approach, allowing a wider and more equitable distribution
of power, consists of peer-to-peer knowledge sharing (gossip protocols). This vision of distributed
AI is attractive because it contributes to user empowerment by confining personal and confidential
information to a single node of the network, and it makes systems independent of a superior
authority that would decide what is good for everyone. On the other hand, it opens up major
issues of security and robustness: how can we guarantee the compliance
of a model learned in another context? How can we protect our AI network from the introduction of
biased knowledge, malicious or not, or even "backdoor" functions? If the pooling consists of
simultaneous optimisation, how can we ensure the validity of contributions that are not always
explainable?
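The two aggregation patterns contrasted above can be sketched in a few lines. This is purely an illustrative toy (not part of the project's codebase, and with models reduced to parameter vectors): a federated server averages all local models at once, while a gossip round lets a random pair of peers average their models directly; both preserve the network-wide mean, which repeated gossip approaches without any central node.

```python
import random

def federated_average(local_models):
    """Central server averages all local model parameters (federated-style)."""
    n = len(local_models)
    return [sum(params) / n for params in zip(*local_models)]

def gossip_step(models, rng):
    """One gossip round: two randomly chosen peers average their models pairwise."""
    i, j = rng.sample(range(len(models)), 2)
    avg = [(a + b) / 2 for a, b in zip(models[i], models[j])]
    models[i], models[j] = avg, list(avg)

rng = random.Random(0)
# Toy 2-parameter "models" held by 3 peers.
models = [[1.0, 0.0], [0.0, 1.0], [4.0, 4.0]]
central = federated_average(models)   # server-side aggregate of all peers
for _ in range(200):                  # repeated gossip drives every peer toward that same mean
    gossip_step(models, rng)
```

The contrast is the point: `federated_average` needs every model at one place, whereas `gossip_step` only ever exchanges information between two neighbours, which is what raises the compliance and robustness questions discussed above.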
This action on the theme of distributed AI therefore lies at the confluence of the topics
Embedded and Frugality (distributed systems are frequently low-resource embedded systems
such as phones, vehicles or autonomous robots) and Trust, as the issues of security, reliability
and robustness are cast in a new light by collaborative AI.
This project brings together a consortium of complementary teams and researchers with primary
expertise in machine learning, distributed optimization, consensus algorithms and game theory. It
also spans a unique spectrum of research orientations, from highly theoretical work on the
convergence of distributed learning algorithms to extensive experience with practical and
efficient implementations, as well as innovative dissemination activities.
Partners
- École polytechnique
- CEA
- INRIA
- CNRS
Jobs
We are looking for PhD students, engineers and post-doctoral researchers.
There are also open positions on related projects.
Project website
to be created