Research Paper (arxiv_cv). For: Researchers in event-based vision, robotics engineers, developers of high-speed perception systems

Spiking Patches: Asynchronous, Sparse, and Efficient Tokens for Event Cameras

📄 Abstract

We propose tokenization of events and present a tokenizer, Spiking Patches, specifically designed for event cameras. Given a stream of asynchronous and spatially sparse events, our goal is to discover an event representation that preserves these properties. Prior works have represented events as frames or as voxels. However, while these representations yield high accuracy, both frames and voxels are synchronous and decrease the spatial sparsity. Spiking Patches gives the means to preserve the unique properties of event cameras and we show in our experiments that this comes without sacrificing accuracy. We evaluate our tokenizer using a GNN, PCN, and a Transformer on gesture recognition and object detection. Tokens from Spiking Patches yield inference times that are up to 3.4x faster than voxel-based tokens and up to 10.4x faster than frames. We achieve this while matching their accuracy and even surpassing in some cases with absolute improvements up to 3.8 for gesture recognition and up to 1.4 for object detection. Thus, tokenization constitutes a novel direction in event-based vision and marks a step towards methods that preserve the properties of event cameras.
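The abstract does not spell out how tokens are produced, so the following is only a toy sketch of what an asynchronous, sparse patch tokenizer could look like. It assumes a hypothetical rule: each spatial patch counts incoming events and emits ("spikes") a token once a threshold is reached, then resets. The names `Event`, `Token`, `tokenize`, `patch_size`, and `threshold` are illustrative and are not taken from the paper.

```python
from collections import defaultdict
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    x: int          # pixel column
    y: int          # pixel row
    t: float        # timestamp in seconds
    polarity: int   # +1 brightness increase, -1 decrease


class Token(NamedTuple):
    patch_x: int    # patch column index
    patch_y: int    # patch row index
    t: float        # timestamp of the event that triggered the token
    events: tuple   # the events accumulated in the patch


def tokenize(events: Iterable[Event], patch_size: int = 4,
             threshold: int = 3) -> Iterator[Token]:
    """Yield a token whenever a patch has accumulated `threshold` events.

    Assumed rule for illustration only, not the paper's algorithm.
    """
    buffers = defaultdict(list)  # only patches that received events are stored
    for ev in events:
        key = (ev.x // patch_size, ev.y // patch_size)
        buffers[key].append(ev)
        if len(buffers[key]) >= threshold:
            # Emit immediately and reset the patch by dropping its buffer.
            yield Token(key[0], key[1], ev.t, tuple(buffers.pop(key)))


stream = [
    Event(0, 1, 0.001, +1),
    Event(9, 9, 0.002, -1),
    Event(1, 0, 0.003, +1),
    Event(2, 3, 0.004, -1),   # third event in patch (0, 0): token emitted here
    Event(8, 10, 0.005, +1),
]
tokens = list(tokenize(stream))
```

In this sketch a token appears as soon as its patch fills, rather than at a fixed frame boundary, and patches that never receive events produce no tokens. That is the sense in which such a representation could stay asynchronous and spatially sparse.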

Authors (3)

Christoffer Koo Øhrstrøm

Ronja Güldenring

Lazaros Nalpantidis

Submitted

October 30, 2025

arXiv Category

cs.CV

Key Contributions

Proposes Spiking Patches, a tokenization method for event camera data that preserves the asynchronous and spatially sparse properties of events, unlike frame or voxel representations. Inference with these tokens is up to 3.4x faster than with voxel-based tokens and up to 10.4x faster than with frames, without sacrificing accuracy on gesture recognition and object detection, as evaluated with a GNN, a PCN, and a Transformer.

Business Value

Enables the development of faster, more efficient AI systems for applications using event cameras, such as low-latency robotics, AR/VR, and high-speed tracking, potentially reducing hardware costs and power consumption.

Paper Metadata

Innovation Type

Data Representation / Preprocessing

Deployment Feasibility

High for systems utilizing event cameras. Requires integration into existing event-based vision pipelines. The speed improvements are a significant advantage.

Limitations Addressed

Synchronous and dense representations (frames, voxels) lose event camera properties; high accuracy often comes with slow inference speeds; inefficiency in processing sparse, asynchronous event streams
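To make the first limitation concrete, here is a minimal sketch of accumulating events into a dense frame. The 16x8 sensor size, the three events, and the helper name `events_to_frame` are hypothetical and chosen only for illustration. The frame stores every pixel regardless of activity and discards per-event timestamps.

```python
def events_to_frame(events, width, height):
    """Accumulate (x, y, t, polarity) events into a dense height x width grid."""
    frame = [[0] * width for _ in range(height)]
    for x, y, _t, polarity in events:
        frame[y][x] += polarity  # the per-event timestamp is discarded here
    return frame


# Three hypothetical events on a 16x8 sensor; two of them hit the same pixel.
events = [(3, 2, 0.001, 1), (3, 2, 0.004, 1), (10, 7, 0.006, -1)]
frame = events_to_frame(events, width=16, height=8)

dense_cells = sum(len(row) for row in frame)              # cells a dense model must process
active_cells = sum(v != 0 for row in frame for v in row)  # cells that actually saw events
```

Here a downstream model that consumes the frame touches all 128 cells even though only 2 carry information, and the three distinct timestamps collapse into a single frame time.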

Performance Gains

Up to 3.4x faster than voxel-based tokens; up to 10.4x faster than frames; absolute improvements of up to 3.8 for gesture recognition and up to 1.4 for object detection

Technical Tags

event cameras, spiking patches, asynchronous, sparse tokens, event representation, GNN, Transformer, gesture recognition, object detection, inference speed, event-based vision

Research Topics

Event-Based Vision, Sensor Fusion, Deep Learning, Computer Vision, Efficient AI

Methods & Architectures

Spiking Patches tokenizer, Graph Neural Network (GNN), Transformer

Applications & Tasks

Robotics, Autonomous Systems, Augmented Reality, High-speed Tracking, Data Representation, Efficient Processing, Event Data Analysis, Gesture Recognition, Object Detection, Processing asynchronous event data

Related Fields

Robotics, Embedded Systems, Neuromorphic Computing, Sensor Technology

Keywords

Event Cameras, Spiking Patches, Asynchronous, Sparse, Tokens, Event Representation, GNN, Transformer, Gesture Recognition, Object Detection, Inference Speed, Event-Based Vision, Neuromorphic

Academic Context

#Event-Based Vision, #Sensor Fusion, #Deep Learning, #Computer Vision, #Efficient AI

Commercial Potential

Potential Products

Efficient perception modules for drones, low-latency tracking systems, AR/VR headsets with faster response times

Target Industries

Robotics, Automotive (ADAS), Consumer Electronics, Aerospace

Use Case Examples

Fast object tracking for autonomous drones. Real-time gesture control for AR/VR devices. High-speed visual servoing in robotics.

Competitive Edge

Offers a more efficient data representation for event cameras compared to traditional frame/voxel methods, enabling faster inference while maintaining competitive accuracy, making event-based systems more practical.

Market Opportunity

Growing, driven by advancements in event camera technology and demand for high-speed, low-power vision systems.

Revenue Models

Licensing of the technology; integration into hardware/software solutions.

Resource Requirements

Compute Needs

Lower inference compute requirements compared to frame/voxel methods due to sparser data representation and faster processing.

Data Requirements

Requires datasets specifically captured by event cameras.

Deployment Constraints

Requires specialized event camera hardware; event data can be noisy and requires careful handling

Scalability

The tokenization method itself is scalable, and its efficiency benefits downstream model scalability.

Regulatory Considerations

Low.

Production Readiness

Maturity Level

Research

Time to Market

1-3 years, for integration into commercial event-based vision systems.

Patent Potential

Moderate, for the Spiking Patches tokenization algorithm.
