CoSense-LLM: Semantics at the Edge with Cost- and Uncertainty-Aware Cloud-Edge Cooperation

📄 Abstract

We present CoSense-LLM, an edge-first framework that turns continuous multimodal sensor streams (for example Wi-Fi CSI, IMU, audio, RFID, and lightweight vision) into compact, verifiable semantic tokens and coordinates with large language models under explicit latency, energy, bandwidth, and privacy constraints. CoSense-LLM has four parts: (i) SenseFusion, a lightweight encoder that aligns sensor embeddings with language and compresses them into short discrete code sequences; (ii) Edge-RAG, a local hybrid retrieval layer that grounds generation in site-specific policies and notes; (iii) PromptRouter, a cost- and uncertainty-aware policy that selects edge-only generation, edge plus retrieval, or compact cloud escalation; and (iv) Secure Execution, an auditable redaction path that enforces data minimization so raw waveforms never leave the device. The system works with modern serving optimizations, including paged or streaming KV caches, FlashAttention-style kernels, speculative decoding, and quantized LoRA adapters, and supports on-device personalization and federated updates under non-IID drift. Across home, office, and clinic deployments, CoSense-LLM delivers grounded explanations while meeting tight service-level objectives: it sustains sub-second (p95) end-to-end latency on edge-dominant paths, reduces inter-tier token and bandwidth costs by preferring local retrieval-grounded responses, and preserves privacy by transmitting only discrete codes and redacted metadata. Ablations show that Edge-RAG improves factual consistency and reduces contradictions, calibrated uncertainty enables selective abstention and controlled escalations, and KV-cache and decoding accelerators lower energy per decision. The results support an edge-first design that treats semantics, privacy, and predictable latency as co-equal goals for large-model deployments in interference-prone environments.
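
To make the routing idea concrete, below is a minimal, illustrative sketch of a cost- and uncertainty-aware policy in the spirit of PromptRouter, choosing among edge-only generation, edge plus local retrieval, compact cloud escalation, or abstention. The thresholds, cost model, and names are assumptions for illustration, not the paper's implementation.

```python
# Illustrative sketch (not the authors' code): a cost- and uncertainty-aware
# router that picks edge-only generation, edge + local retrieval (Edge-RAG),
# or compact cloud escalation. Thresholds and the cost budget are assumptions.
from dataclasses import dataclass
from enum import Enum, auto


class Route(Enum):
    EDGE_ONLY = auto()       # answer locally from the semantic codes alone
    EDGE_RAG = auto()        # ground locally against site-specific notes/policies
    CLOUD_ESCALATE = auto()  # send compact codes + redacted metadata upstream
    ABSTAIN = auto()         # calibrated uncertainty too high even to escalate


@dataclass
class RouterConfig:
    edge_conf_threshold: float = 0.85   # confidence needed to stay edge-only
    rag_conf_threshold: float = 0.65    # confidence needed after local retrieval
    max_escalation_cost: float = 1.0    # budget (e.g. token price + latency penalty)


def route(edge_confidence: float,
          rag_confidence: float,
          escalation_cost: float,
          cfg: RouterConfig = RouterConfig()) -> Route:
    """Pick the cheapest path whose calibrated confidence meets its threshold."""
    if edge_confidence >= cfg.edge_conf_threshold:
        return Route.EDGE_ONLY
    if rag_confidence >= cfg.rag_conf_threshold:
        return Route.EDGE_RAG
    if escalation_cost <= cfg.max_escalation_cost:
        return Route.CLOUD_ESCALATE
    return Route.ABSTAIN


# Example: not confident enough locally, better after retrieval, so stay on-edge.
print(route(edge_confidence=0.7, rag_confidence=0.72, escalation_cost=0.4))
# -> Route.EDGE_RAG
```

The ABSTAIN branch mirrors the abstract's point that calibrated uncertainty enables selective abstention as well as controlled escalations.
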
Authors (5)
Hasan Akgul
Mari Eplik
Javier Rojas
Aina Binti Abdullah
Pieter van der Merwe
Submitted
October 22, 2025
arXiv Category
cs.CL

Key Contributions

CoSense-LLM is an edge-first framework that converts continuous multimodal sensor streams into compact semantic tokens and coordinates with LLMs under strict latency, energy, bandwidth, and privacy constraints. It comprises SenseFusion for encoding and compression, Edge-RAG for local grounding, PromptRouter for cost- and uncertainty-aware escalation, and Secure Execution for data minimization, enabling capable AI at the edge while preserving privacy.
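
As a rough illustration of the SenseFusion idea of emitting short discrete code sequences instead of raw waveforms, the sketch below quantizes a fused sensor-window embedding against a small codebook (product-quantization style). The codebook size, dimensions, and function names are assumptions, not details from the paper.

```python
# Illustrative sketch (assumed, not the paper's implementation): compressing a
# fused sensor-window embedding into a short sequence of discrete codebook
# indices, in the spirit of SenseFusion's "short discrete code sequences".
import numpy as np

rng = np.random.default_rng(0)

CODEBOOK_SIZE = 256   # assumed vocabulary of semantic codes
CODE_DIM = 32         # assumed per-code embedding dimension
NUM_CODES = 8         # length of the emitted code sequence per sensor window

codebook = rng.normal(size=(CODEBOOK_SIZE, CODE_DIM)).astype(np.float32)


def encode_window(embedding: np.ndarray) -> np.ndarray:
    """Split a fused sensor embedding into NUM_CODES chunks and quantize each
    chunk to its nearest codebook entry."""
    chunks = embedding.reshape(NUM_CODES, CODE_DIM)
    # Squared Euclidean distance of every chunk to every codebook vector.
    dists = ((chunks[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1).astype(np.uint8)  # compact, transmittable codes


# A fused embedding for one sensor window (e.g., CSI + IMU + audio features).
window_embedding = rng.normal(size=NUM_CODES * CODE_DIM).astype(np.float32)
codes = encode_window(window_embedding)
print(codes)  # e.g., 8 small integers instead of raw waveforms
```

Transmitting only such integer codes, alongside redacted metadata, is what keeps raw sensor waveforms on the device in the framework's data-minimization path.
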

Business Value

Enables the deployment of intelligent, context-aware applications on edge devices, reducing cloud costs, improving response times, and enhancing user privacy in IoT and smart environments.