Literature Database Entry
tao2025trustworthy
Youming Tao, "Trustworthy Collaborative Machine Learning for Edge AI: Privacy, Unlearning, and Robustness," PhD Thesis, School of Electrical Engineering and Computer Science (EECS), TU Berlin (TUB), October 2025. (Advisor: Falko Dressler; Referees: Falko Dressler, Isabel Wagner and Pavel Laskov)
Abstract
Collaborative Machine Learning (CML) enables edge devices to train models jointly without sharing raw data, offering a promising solution for privacy-sensitive and latency-critical applications in areas such as mobile computing, healthcare, and industrial automation. Among the various CML frameworks, Federated Learning (FL) has become the most widely adopted paradigm. However, deploying FL in real-world edge environments raises fundamental trust-related challenges, each linked to one of the three core components of CML: data, model, and collaboration mechanism. Specifically, at the data level, strong privacy guarantees are difficult to achieve due to the risk of gradient leakage. At the model level, machine unlearning becomes essential to remove outdated, erroneous, or sensitive data from trained models. At the collaboration level, robustness against adversarial or faulty clients is critical to ensure reliable training in fully open environments. This dissertation systematically addresses these challenges by developing trustworthy FL algorithms with formal theoretical guarantees on convergence, differential privacy, machine unlearning, and Byzantine robustness.

First, to mitigate privacy risks in wireless edge learning, the dissertation explores FL over analog over-the-air (OTA) aggregation channels, where inherent wireless noise and participant-injected perturbations jointly contribute to privacy amplification. A privacy-aware OTA-FL algorithm is proposed that jointly leverages wireless noise and structured perturbation to achieve user-level local differential privacy, even under malicious server-side manipulation of channel state information (CSI). A bandwidth-adaptive gradient compression scheme is integrated to reduce communication overhead and accommodate limited bandwidth. Theoretical guarantees on privacy and convergence are established, and experimental results demonstrate robust protection against CSI attacks, achieving improved trade-offs between privacy, accuracy, and communication efficiency compared to state-of-the-art OTA-FL baselines.
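As a rough illustration of the clip-and-perturb pattern that underlies such noise-based differential privacy guarantees, the sketch below (in Python, with hypothetical names such as clip_norm, sigma_local, and sigma_channel) shows how client-side perturbation and additive channel noise both enter the aggregated update. It is a simplified sketch under these assumptions, not the dissertation's OTA-FL algorithm, which additionally models analog aggregation, CSI, and bandwidth-adaptive compression.

# Minimal sketch (not the dissertation's algorithm): clip each client's
# update, add local Gaussian noise, and let a simplified noisy channel
# further perturb the aggregate. All parameter names are illustrative.
import numpy as np

def privatize_update(update, clip_norm, sigma_local, rng):
    # Clip the client's update to a bounded norm, then add local Gaussian noise.
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
    return clipped + rng.normal(0.0, sigma_local * clip_norm, size=update.shape)

def aggregate_over_noisy_channel(updates, clip_norm, sigma_local, sigma_channel, rng):
    # Average the privatized updates; the (simplified) channel adds further
    # noise, which is the effect that contributes to privacy amplification.
    privatized = [privatize_update(u, clip_norm, sigma_local, rng) for u in updates]
    mean = np.mean(privatized, axis=0)
    return mean + rng.normal(0.0, sigma_channel, size=mean.shape)

rng = np.random.default_rng(0)
client_updates = [rng.standard_normal(10) for _ in range(8)]
global_update = aggregate_over_noisy_channel(client_updates, clip_norm=1.0,
                                             sigma_local=0.5, sigma_channel=0.1, rng=rng)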
Second, to address the need for data forgetting, the dissertation introduces a provably exact federated unlearning framework. Exact unlearning is defined as the requirement that the distribution of the output model and all intermediate states of the unlearning algorithm be statistically indistinguishable from those of a retraining process conducted from scratch without the deleted data. By establishing a novel connection between federated unlearning and an optimal transport problem, the work identifies a sufficient condition for exact unlearning: the underlying FL algorithm must satisfy Total Variation (TV) stability. Guided by this insight, a TV-stable FL algorithm called FATS is proposed, based on local SGD with periodic averaging and structured sub-sampling. Efficient unlearning protocols are further developed for both sample-level and client-level deletion. Theoretically, FATS is proven to guarantee exact unlearning with favorable convergence rates. Empirical evaluations across multiple benchmarks confirm that FATS enables fast and thorough unlearning with significantly reduced computation and communication overhead compared to prior methods.

Third, to ensure robustness against unreliable or malicious devices, the dissertation investigates FL under heavy-tailed gradient noise. It first considers scenarios where gradients follow coordinate-wise heavy-tailed distributions with finite variance, and proposes a local soft-truncation-based gradient processing strategy that, when combined with appropriate global aggregation, achieves optimal Byzantine resilience in homogeneous data settings. The work is then extended to more realistic conditions involving infinite-variance noise and statistical heterogeneity. Two robust algorithms are developed using gradient and momentum clipping techniques. Additionally, a random-projection-based approximate neighbor mixing method is introduced to reduce aggregation overhead in high-dimensional settings. On the theoretical side, the dissertation establishes the first high-probability convergence bounds under these relaxed assumptions. Empirically, the proposed algorithms demonstrate strong performance in adversarial and heterogeneous environments where previous approaches fail (an illustrative sketch of this clipping-based aggregation pattern follows the abstract).

Although the dissertation focuses on the FL framework, the underlying algorithmic principles and theoretical insights generalize to other CML paradigms such as split learning, swarm learning, and peer-to-peer learning. This work thus lays a cohesive foundation for trustworthy CML at the edge, supporting the secure and reliable deployment of intelligent edge systems.
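As a similarly rough illustration of the "clip locally, aggregate robustly" principle summarized in the third contribution, the sketch below pairs per-client gradient clipping with a coordinate-wise median. The threshold and the choice of median aggregator are illustrative assumptions; they are not the dissertation's algorithms or the aggregation rules shown there to be optimal.

# Illustrative sketch: bound heavy-tailed client gradients by norm clipping,
# then aggregate with a coordinate-wise median so that a minority of
# Byzantine clients cannot dominate the global update.
import numpy as np

def clip_gradient(grad, threshold):
    # Shrink a (possibly heavy-tailed) gradient to at most `threshold` in Euclidean norm.
    norm = np.linalg.norm(grad)
    return grad if norm <= threshold else grad * (threshold / norm)

def robust_aggregate(client_grads, threshold):
    # Clip each client's gradient, then take a coordinate-wise median.
    clipped = np.stack([clip_gradient(g, threshold) for g in client_grads])
    return np.median(clipped, axis=0)

rng = np.random.default_rng(1)
honest = [rng.standard_t(df=2, size=10) for _ in range(7)]  # heavy-tailed gradients
byzantine = [np.full(10, 1e6) for _ in range(2)]            # adversarial outliers
update = robust_aggregate(honest + byzantine, threshold=5.0)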
BibTeX reference
@phdthesis{tao2025trustworthy,
author = {Tao, Youming},
title = {{Trustworthy Collaborative Machine Learning for Edge AI: Privacy, Unlearning, and Robustness}},
advisor = {Dressler, Falko},
institution = {School of Electrical Engineering and Computer Science (EECS)},
location = {Berlin, Germany},
month = {10},
referee = {Dressler, Falko and Wagner, Isabel and Laskov, Pavel},
school = {TU Berlin (TUB)},
type = {PhD Thesis},
year = {2025},
}
Copyright notice
Links to final or draft versions of papers are presented here to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted or distributed for commercial purposes without the explicit permission of the copyright holder.
This page was automatically generated using BibDB and bib2web.