[IPOL announce] new article: Incidence of the Sample Size Distribution on One-Shot Federated Learning
announcements about the IPOL journal
announce at list.ipol.im
Sun Feb 12 23:53:40 CET 2023
A new article is available in IPOL: http://www.ipol.im/pub/art/2023/440/
Marie Garin and Gonzalo Iñaki Quintana,
Incidence of the Sample Size Distribution on One-Shot Federated Learning,
Image Processing On Line, 13 (2023), pp. 57–64.
https://doi.org/10.5201/ipol.2023.440
Abstract
Federated Learning (FL) is a learning paradigm where multiple nodes
collaboratively train a model by only exchanging updates or parameters.
This makes it possible to keep the data local, thereby enhancing
privacy (a claim that requires nuance, e.g. language models are known
to memorize training data). Depending on the application, the number
of samples held by each node can vary widely, which can affect both
training and final performance. This work studies the impact of the
per-node sample size distribution on the mean squared error (MSE) of the
one-shot federated estimator. We focus on one-shot aggregation of
statistical estimations made across disjoint, independent and
identically distributed (i.i.d.) data sources, in the context of
empirical risk minimization. In distributed learning, it is well
known that, with m nodes in total, each node should contain at least
m samples to match the performance of centralized training. In a
federated scenario this result still holds, but now applies to the
mean of the per-node sample size distribution. The demo makes it
possible to visualize this
effect, as well as to compare the behavior of the FESC (Federated
Estimation with Statistical Correction) algorithm, a weighting scheme
that depends on the local sample size, with the classical federated
estimator and the centralized one, for a large collection of
distributions, numbers of nodes, and feature space dimensions.
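The abstract contrasts the classical one-shot federated estimator (a uniform average of local estimates) with a weighting that depends on the local sample size. A minimal sketch of this contrast for mean estimation follows; it is illustrative only, assuming Gaussian data, the particular `sizes` list, and simple n_i/N weights (not the FESC correction itself):

```python
# Illustrative sketch (not the authors' code): one-shot federated
# estimation of a mean across nodes with heterogeneous sample sizes.
import random
from statistics import mean

random.seed(0)
TRUE_MEAN = 2.0
sizes = [2, 3, 5, 50, 200]  # assumed heterogeneous per-node sample sizes
data = [[random.gauss(TRUE_MEAN, 1.0) for _ in range(n)] for n in sizes]

local = [mean(d) for d in data]  # each node's local estimate
N = sum(sizes)

# Classical one-shot federated estimator: uniform average of local estimates.
uniform = mean(local)
# Sample-size-aware weighting: weight each node by n_i / N.
weighted = sum(n / N * est for n, est in zip(sizes, local))
# Centralized estimator: pool all samples.
centralized = mean([x for d in data for x in d])
```

For the sample mean, the n_i/N weighting recovers the centralized estimate exactly, while the uniform average gives small nodes the same influence as large ones, inflating the MSE when the sample size distribution is skewed; the FESC algorithm studied in the article applies a more elaborate sample-size-dependent weighting.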
More information about the announce mailing list