L-MoE: End-to-End Training of a Lightweight Mixture of Low-Rank Adaptation Experts

📄 Abstract

The Mixture of Experts (MoE) architecture enables the scaling of Large Language Models (LLMs) to trillions of parameters by activating a sparse subset of weights for each input, maintaining constant computational cost during inference. Concurrently, Low-Rank Adaptation (LoRA) has emerged as a dominant technique for the parameter-efficient fine-tuning of LLMs on specialized tasks. In this work, we unify these two paradigms into a novel, end-to-end trainable framework named L-MoE: a Lightweight Mixture of LoRA Experts. L-MoE redefines MoE experts not as dense feed-forward networks, but as a collection of task-specialized, low-rank adapters. A lightweight gating network, trained jointly with the experts, learns to dynamically compose these LoRA adapters by computing a weighted average of their parameters for each input token. This composition is fully differentiable, allowing gradients from a standard auto-regressive language modeling objective to flow back through the entire architecture, simultaneously refining both the expert adapters and the routing strategy. The result is a highly parameter-efficient MoE model that is modular by design, supports dynamic skill composition, and is trainable end to end. We present the formal mathematical framework for L-MoE, detailing the differentiable routing mechanism and the joint optimization objective, thereby providing a new path toward building more efficient, scalable, and specialized language models.
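
The composition step described above can be illustrated with a short sketch. The PyTorch module below is an assumption-based illustration, not the authors' reference implementation; names such as `LoRAMoELinear`, `num_experts`, and `rank` are invented for the example. It wraps a frozen linear layer with several LoRA experts and a per-token softmax gate. Because each adapter is linear, gate-weighting the expert outputs is mathematically equivalent to the gate-weighted average of adapter parameters described in the abstract.

```python
# Hypothetical sketch of an L-MoE layer (shapes and names are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRAMoELinear(nn.Module):
    """A frozen base linear layer augmented with a mixture of LoRA experts.

    A lightweight gating network produces per-token weights over the experts;
    the expert low-rank updates are composed as a weighted average, so the
    whole layer stays differentiable end to end.
    """

    def __init__(self, d_in, d_out, num_experts=4, rank=8):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)  # base weights stay frozen

        # One low-rank adapter (A_i, B_i) per expert: delta_W_i = B_i @ A_i
        self.lora_A = nn.Parameter(torch.randn(num_experts, rank, d_in) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(num_experts, d_out, rank))

        # Lightweight gating network: token hidden state -> expert logits
        self.gate = nn.Linear(d_in, num_experts)

    def forward(self, x):  # x: (batch, seq, d_in)
        gate_weights = F.softmax(self.gate(x), dim=-1)              # (b, s, E)

        # Per-expert low-rank path: x -> A_i -> B_i
        h = torch.einsum("bsd,erd->bser", x, self.lora_A)           # (b, s, E, r)
        expert_out = torch.einsum("bser,eor->bseo", h, self.lora_B) # (b, s, E, d_out)

        # Differentiable composition: gate-weighted average of expert updates
        lora_out = torch.einsum("bse,bseo->bso", gate_weights, expert_out)
        return self.base(x) + lora_out
```

Because the gate output enters the forward pass only through this weighted average, gradients from the language-modeling loss reach both the adapters and the routing network in a single backward pass.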
Authors: Shihao Ji, Zihui Song
Submitted: October 19, 2025
arXiv Category: cs.LG

Key Contributions

Introduces L-MoE, a novel end-to-end trainable framework that unifies Mixture of Experts (MoE) and Low-Rank Adaptation (LoRA) for LLMs. L-MoE redefines MoE experts as LoRA adapters that a lightweight gating network composes dynamically per token, enabling parameter-efficient scaling and modular specialization. A minimal training sketch follows below.
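
To make the end-to-end training claim concrete, a single optimization step might look like the sketch below. The model interface (a callable returning next-token logits) and the optimizer setup are assumptions for illustration; the key point is that only the gate and the LoRA experts receive gradients from the standard auto-regressive language-modeling loss, while the base weights remain frozen.

```python
# Hypothetical joint-training step (interfaces are assumptions, not the paper's code).
import torch.nn.functional as F


def train_step(model, batch, optimizer):
    """One end-to-end step updating both the LoRA experts and the gating network."""
    input_ids = batch["input_ids"]        # (batch, seq)
    logits = model(input_ids)             # (batch, seq, vocab), assumed model output

    # Shifted auto-regressive language-modeling objective
    lm_loss = F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        input_ids[:, 1:].reshape(-1),
    )

    optimizer.zero_grad()
    lm_loss.backward()   # gradients flow through gate weights and LoRA adapters
    optimizer.step()
    return lm_loss.item()
```

In this setup the optimizer would be constructed over only the trainable parameters (the adapters and the gate), which keeps the per-step update cost small relative to full fine-tuning.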

Business Value

Facilitates the creation of highly efficient and adaptable LLMs, enabling faster fine-tuning for diverse tasks and potentially reducing the computational cost of deploying large models.