CoSense-LLM: Semantics at the Edge with Cost- and Uncertainty-Aware Cloud-Edge Cooperation

📄 Abstract

We present CoSense-LLM, an edge-first framework that turns continuous multimodal sensor streams (for example Wi-Fi CSI, IMU, audio, RFID, and lightweight vision) into compact, verifiable semantic tokens and coordinates with large language models under explicit latency, energy, bandwidth, and privacy constraints. CoSense-LLM has four parts: (i) SenseFusion, a lightweight encoder that aligns sensor embeddings with language and compresses them into short discrete code sequences; (ii) Edge-RAG, a local hybrid retrieval layer that grounds generation in site-specific policies and notes; (iii) PromptRouter, a cost- and uncertainty-aware policy that selects edge-only generation, edge plus retrieval, or compact cloud escalation; and (iv) Secure Execution, an auditable redaction path that enforces data minimization so raw waveforms never leave the device. The system works with modern serving optimizations, including paged or streaming KV caches, FlashAttention-style kernels, speculative decoding, and quantized LoRA adapters, and supports on-device personalization and federated updates under non-IID drift. Across home, office, and clinic deployments, CoSense-LLM delivers grounded explanations while meeting tight service-level objectives: it sustains sub-second (p95) end-to-end latency on edge-dominant paths, reduces inter-tier token and bandwidth costs by preferring local retrieval-grounded responses, and preserves privacy by transmitting only discrete codes and redacted metadata. Ablations show that Edge-RAG improves factual consistency and reduces contradictions, calibrated uncertainty enables selective abstention and controlled escalations, and KV-cache and decoding accelerators lower energy per decision. The results support an edge-first design that treats semantics, privacy, and predictable latency as co-equal goals for large-model deployments in interference-prone environments.
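
To make the routing idea concrete, below is a minimal, illustrative sketch of a cost- and uncertainty-aware policy in the spirit of PromptRouter, choosing among edge-only generation, edge plus local retrieval, compact cloud escalation, or abstention. The thresholds, cost model, and names are assumptions for illustration, not the paper's implementation.

```python
# Illustrative sketch (not the authors' code): a cost- and uncertainty-aware
# router that picks edge-only generation, edge + local retrieval (Edge-RAG),
# or compact cloud escalation. Thresholds and the cost budget are assumptions.
from dataclasses import dataclass
from enum import Enum, auto


class Route(Enum):
    EDGE_ONLY = auto()       # answer locally from the semantic codes alone
    EDGE_RAG = auto()        # ground locally against site-specific notes/policies
    CLOUD_ESCALATE = auto()  # send compact codes + redacted metadata upstream
    ABSTAIN = auto()         # calibrated uncertainty too high even to escalate


@dataclass
class RouterConfig:
    edge_conf_threshold: float = 0.85   # confidence needed to stay edge-only
    rag_conf_threshold: float = 0.65    # confidence needed after local retrieval
    max_escalation_cost: float = 1.0    # budget (e.g. token price + latency penalty)


def route(edge_confidence: float,
          rag_confidence: float,
          escalation_cost: float,
          cfg: RouterConfig = RouterConfig()) -> Route:
    """Pick the cheapest path whose calibrated confidence meets its threshold."""
    if edge_confidence >= cfg.edge_conf_threshold:
        return Route.EDGE_ONLY
    if rag_confidence >= cfg.rag_conf_threshold:
        return Route.EDGE_RAG
    if escalation_cost <= cfg.max_escalation_cost:
        return Route.CLOUD_ESCALATE
    return Route.ABSTAIN


# Example: not confident enough locally, better after retrieval, so stay on-edge.
print(route(edge_confidence=0.7, rag_confidence=0.72, escalation_cost=0.4))
# -> Route.EDGE_RAG
```

The ABSTAIN branch mirrors the abstract's point that calibrated uncertainty enables selective abstention as well as controlled escalations.
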
Authors (5)
Hasan Akgul
Mari Eplik
Javier Rojas
Aina Binti Abdullah
Pieter van der Merwe
Submitted
October 22, 2025
arXiv Category
cs.CL

Key Contributions

CoSense-LLM is an edge-first framework that converts continuous multimodal sensor streams into compact semantic tokens and coordinates with LLMs under strict latency, energy, bandwidth, and privacy constraints. It comprises SenseFusion for encoding and compression, Edge-RAG for local grounding, PromptRouter for cost- and uncertainty-aware escalation, and Secure Execution for data minimization, enabling capable AI at the edge while preserving privacy.
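
As a rough illustration of the SenseFusion idea of emitting short discrete code sequences instead of raw waveforms, the sketch below quantizes a fused sensor-window embedding against a small codebook (product-quantization style). The codebook size, dimensions, and function names are assumptions, not details from the paper.

```python
# Illustrative sketch (assumed, not the paper's implementation): compressing a
# fused sensor-window embedding into a short sequence of discrete codebook
# indices, in the spirit of SenseFusion's "short discrete code sequences".
import numpy as np

rng = np.random.default_rng(0)

CODEBOOK_SIZE = 256   # assumed vocabulary of semantic codes
CODE_DIM = 32         # assumed per-code embedding dimension
NUM_CODES = 8         # length of the emitted code sequence per sensor window

codebook = rng.normal(size=(CODEBOOK_SIZE, CODE_DIM)).astype(np.float32)


def encode_window(embedding: np.ndarray) -> np.ndarray:
    """Split a fused sensor embedding into NUM_CODES chunks and quantize each
    chunk to its nearest codebook entry."""
    chunks = embedding.reshape(NUM_CODES, CODE_DIM)
    # Squared Euclidean distance of every chunk to every codebook vector.
    dists = ((chunks[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1).astype(np.uint8)  # compact, transmittable codes


# A fused embedding for one sensor window (e.g., CSI + IMU + audio features).
window_embedding = rng.normal(size=NUM_CODES * CODE_DIM).astype(np.float32)
codes = encode_window(window_embedding)
print(codes)  # e.g., 8 small integers instead of raw waveforms
```

Transmitting only such integer codes, alongside redacted metadata, is what keeps raw sensor waveforms on the device in the framework's data-minimization path.
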

Business Value

Enables the deployment of intelligent, context-aware applications on edge devices, reducing cloud costs, improving response times, and enhancing user privacy in IoT and smart environments.