Distributed Analysis of Medical Data
Distributed Analysis of Medical Data is a project carried out by Dr. Marco Milanesio (MSI Data Science engineer) within the Epione team at Inria Sophia Antipolis Méditerranée.
Project Description:
The major drawback of distributed frameworks such as Spark is their actual definition of “distributed”, which usually means “locally distributed”: the framework is typically deployed on local clusters, relying on distributed file systems that connect machines operating within the same “cloud”. The current Big Data paradigm is therefore being rethought so that data that are still heterogeneous and distributed can be exploited at a geographic scale. This new definition raises multiple challenges, from both an algorithmic and a system design perspective.
In collaboration with Marco Lorenzi, we are currently investigating these subjects, with particular emphasis on how to achieve federated learning at scale. This raises a number of challenges:
- Ubiquitous computing: how to manage data stored at different geographic locations, and how to combine the intermediate results computed at each of them.
- Data transfer: only selected intermediate results should be moved. Which ones can be moved, and how often?
- Fault tolerance: what happens when part of the computation fails? Recovery mechanisms must be put in place.
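To make the first three challenges more concrete, here is a minimal sketch, assuming the simplest possible federated computation: estimating a global mean and variance across sites. Each site ships only an aggregate summary (sample count, sum, sum of squares) instead of its raw records, the coordinator combines whatever summaries arrive, and a failed site (represented as `None`) is simply skipped. The names `SiteSummary`, `local_summary` and `combine` are hypothetical and illustrative only; this is not the project's actual method, and skipping failed sites is a crude stand-in for a real recovery mechanism.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class SiteSummary:
    """Hypothetical intermediate result sent by one site: aggregates only, no raw records."""
    n: int           # number of local samples
    total: float     # sum of the local values
    total_sq: float  # sum of the squared local values


def local_summary(values: Sequence[float]) -> SiteSummary:
    """Computed at each site; the raw values never leave it."""
    return SiteSummary(n=len(values), total=sum(values), total_sq=sum(v * v for v in values))


def combine(summaries: Sequence[Optional[SiteSummary]]) -> Tuple[float, float, int]:
    """Combine site summaries into a global mean and population variance.

    A site whose computation failed is passed as None and skipped, so the
    result is computed over the sites that did respond. Returns
    (mean, variance, number_of_sites_used).
    """
    ok = [s for s in summaries if s is not None]
    n = sum(s.n for s in ok)
    if n == 0:
        raise ValueError("no site returned a usable result")
    mean = sum(s.total for s in ok) / n
    # E[x^2] - E[x]^2: simple, but numerically fragile for large values.
    variance = sum(s.total_sq for s in ok) / n - mean * mean
    return mean, variance, len(ok)


if __name__ == "__main__":
    site_a = local_summary([1.0, 2.0, 3.0])
    site_b = local_summary([4.0, 5.0])
    # The third site failed and contributed nothing.
    print(combine([site_a, site_b, None]))  # (3.0, 2.0, 2)
```

Only three numbers per site cross the network, whatever the local data size, which is the kind of trade-off the data transfer question is about. A real deployment would also have to decide whether results computed without a failed site are acceptable, or whether that site must be retried.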