REDEEM: Resilient, Decentralized and Privacy-Preserving Machine Learning

Introduction


Abstract.

Classical learning paradigms usually involve a single entity that uses a data-generation process or a static dataset to learn a mathematical model through successive steps of a specialized optimization algorithm. This approach has substantial drawbacks: the model has a unique owner, and that owner must gather potentially sensitive data during training. Collective behaviors and distributed intelligence, arising in complex biological systems or societies, have been mathematically modeled and studied through various agent behaviors and communication protocols, e.g. in multi-agent systems. Similarly, there are arguments suggesting that decentralized learning can give rise to distributed artificial intelligence. Recent trends, e.g. the Web3 decentralized web initiative or European regulation on data privacy and Artificial Intelligence, support the need to develop innovative, fully decentralized learning approaches. In particular, this would pave the way for user empowerment regarding crucial machine learning-powered services currently provided by a few American companies and relying on the exploitation of sensitive user data.

Since Machine Learning approaches place data at the heart of system performance, this decentralized vision has the advantage of keeping the data needed for contextual adaptation local, which both guarantees its confidentiality and avoids its transmission. Rather than sharing data, learning amounts to designing a specific algorithm that drives the evolution of the models, which should encode the knowledge provided by the local context. Local adaptation to the context is covered by approaches grouped under the term "incremental learning", which includes transfer learning (learning a new task), fine-tuning (specializing a task to a specific context), active learning (exploiting the intervention of an operator), semi-supervised learning (exploiting partial annotations), etc. These subjects are certainly important and deserve attention, but in this project we focus our efforts on how the learning stage is distributed and how the produced knowledge is shared, questioning the security and sovereignty of these processes.

The concept of federated AI proposes centralizing the local evolutions of models into a global model, which is then redistributed to all the systems. This vision allows centralized systems to benefit from the efforts of local actors while remaining in control of the final model and its redistribution. A more collaborative approach, allowing a wider and more equitable distribution of power, consists of peer-to-peer knowledge sharing (gossip protocols). This vision of distributed AI is attractive: it contributes to user empowerment by confining personal and confidential information to a single node of the network, and it makes systems independent of a higher authority that would decide what is good for everyone. On the other hand, it opens up major issues of security and robustness: how can we guarantee the compliance of a model learned in another context? How can we protect our AI network from the introduction of biased knowledge, malicious or not, or even "backdoor" functions? If the pooling consists of a simultaneous optimization, how can we ensure the validity of contributions that are not always explainable?
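The contrast between the two modes of knowledge sharing mentioned above can be illustrated with a toy numerical sketch. This is a hypothetical illustration, not the project's algorithm: each node's model is reduced to a single scalar parameter, and the function names are invented for this example.

```python
import random

def federated_round(local_models):
    """Federated averaging: a central server averages all local models
    and redistributes the resulting global model to every node."""
    global_model = sum(local_models) / len(local_models)
    return [global_model] * len(local_models)

def gossip_round(local_models, rng=random):
    """One gossip round: each node averages its model with one randomly
    chosen peer. No central server; only pairwise exchanges."""
    models = list(local_models)
    n = len(models)
    for i in range(n):
        j = rng.randrange(n)
        if j != i:
            # Pairwise averaging preserves the network-wide sum,
            # so repeated rounds drive all nodes toward consensus.
            avg = (models[i] + models[j]) / 2
            models[i] = models[j] = avg
    return models
```

In the federated case, one entity computes and owns the global model; in the gossip case, models converge toward the network average through local exchanges only, which is what raises the security questions above: any peer can inject a biased or malicious contribution into the averaging process.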

The work carried out on the theme of distributed AI therefore lies at the confluence of the topics Embedded and Frugality (distributed systems are frequently low-resource embedded systems such as phones, vehicles or autonomous robots) and Trust, as the issues of security, reliability and robustness are cast in a new light in collaborative AI.

This project brings together a consortium of complementary teams and researchers, with primary expertise in machine learning, distributed optimization, consensus algorithms and game theory. It also covers a unique spectrum of research orientations, from highly theoretical work on the convergence of distributed learning algorithms to extensive experience with practical and efficient implementations, as well as innovative dissemination activities.

Partners

Jobs

We are looking for PhD students, engineers and post-doctoral researchers. There are also open positions on related projects.

Project website

to be created