Literature Database Entry

zhang2026efficient


Ruirui Zhang, Yifei Zou, Peng Li, Fahao Chen, Yupeng Li, Xiuzhen Cheng, Falko Dressler and Dongxiao Yu, "Efficient Mixture-of-Experts Model Inference at the Edge via Adaptive Expert Merging," IEEE Transactions on Networking, June 2026. (online first)


Abstract

This paper studies the Mixture-of-Experts (MoE) model inference problem at the edge via the adaptive expert merging technique. To fill the gap between the large size of MoE models and the limited hardware resources at the edge, existing works primarily focus on static merging policies and overlook the dynamic and heterogeneous nature of edge environments. This paper addresses these gaps by proposing a flexible framework for adaptive expert merging and online inference task offloading on edge servers. Our contributions include (1) a novel approach allowing each edge server to adaptively merge experts based on its resources and task preferences, (2) an online inference task offloading algorithm with a bounded competitive ratio for pre-determined merging policies, and (3) an online algorithm for joint optimization of task offloading and expert merging, which ensures timely model updates and assignment of tasks to specialized servers. Experimental results on five datasets demonstrate that our approach reduces the overall weighted cost by at least 16% compared to baseline methods.

Quick access

Original Version: DOI 10.1109/TON.2026.3704584 (at publisher's website)
BibTeX

Contact

Ruirui Zhang
Yifei Zou
Peng Li
Fahao Chen
Yupeng Li
Xiuzhen Cheng
Falko Dressler
Dongxiao Yu

BibTeX reference

@article{zhang2026efficient,
    author = {Zhang, Ruirui and Zou, Yifei and Li, Peng and Chen, Fahao and Li, Yupeng and Cheng, Xiuzhen and Dressler, Falko and Yu, Dongxiao},
    doi = {10.1109/TON.2026.3704584},
    note = {to appear},
    title = {{Efficient Mixture-of-Experts Model Inference at the Edge via Adaptive Expert Merging}},
    journal = {IEEE Transactions on Networking},
    issn = {2998-4157},
    publisher = {IEEE},
    month = {6},
    year = {2026},
}

Last modified: 2026-06-19