Abstract
We propose tokenization of events and present a tokenizer, Spiking Patches,
specifically designed for event cameras. Given a stream of asynchronous and
spatially sparse events, our goal is to discover an event representation that
preserves these properties. Prior works have represented events as frames or as
voxels. However, while these representations yield high accuracy, both frames
and voxels are synchronous and decrease the spatial sparsity. Spiking Patches
gives the means to preserve the unique properties of event cameras and we show
in our experiments that this comes without sacrificing accuracy. We evaluate
our tokenizer using a GNN, PCN, and a Transformer on gesture recognition and
object detection. Tokens from Spiking Patches yield inference times that are up
to 3.4x faster than voxel-based tokens and up to 10.4x faster than frames. We
achieve this while matching their accuracy, and in some cases surpassing it,
with absolute improvements of up to 3.8 for gesture recognition and up to 1.4 for
object detection. Thus, tokenization constitutes a novel direction in
event-based vision and marks a step towards methods that preserve the
properties of event cameras.
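
The abstract's point about frames reducing spatial sparsity can be illustrated with a small sketch. This is not the Spiking Patches tokenizer, whose mechanism the abstract does not describe. The event layout (x, y, timestamp, polarity), the sensor size, the patch size, and the helper names `to_frame` and `active_patches` are all assumptions made for illustration. The sketch only contrasts a dense frame, which stores every pixel, with a representation that keeps just the patches where events occurred.

```python
import numpy as np

# Hypothetical toy event stream: columns are (x, y, timestamp_us, polarity).
events = np.array([
    [3, 2, 100, 1],
    [3, 3, 250, 0],
    [40, 17, 900, 1],
    [41, 17, 1300, 1],
])
HEIGHT, WIDTH, PATCH = 32, 64, 8  # assumed sensor and patch sizes


def to_frame(events, height, width):
    """Accumulate all events into one dense count image (a frame)."""
    frame = np.zeros((height, width), dtype=np.int32)
    # np.add.at handles repeated pixel coordinates correctly.
    np.add.at(frame, (events[:, 1], events[:, 0]), 1)
    return frame


def active_patches(events, patch):
    """Return sorted (row, col) indices of patches containing an event."""
    rows = events[:, 1] // patch
    cols = events[:, 0] // patch
    return sorted(set(zip(rows.tolist(), cols.tolist())))


frame = to_frame(events, HEIGHT, WIDTH)
patches = active_patches(events, PATCH)
total_patches = (HEIGHT // PATCH) * (WIDTH // PATCH)

# The frame stores 2048 pixels although only 4 are non-zero.
print(frame.size, int(np.count_nonzero(frame)))
# Only 2 of the 32 patches contain any events.
print(len(patches), total_patches)
```

In this toy case a frame-based model would process all 2048 pixels, whereas a sparse representation would only need to touch the 2 occupied patches. The frame also discards the individual timestamps, which is the sense in which the abstract calls frames synchronous.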
Authors (3)
Christoffer Koo Øhrstrøm
Ronja Güldenring
Lazaros Nalpantidis
Submitted
October 30, 2025
Key Contributions
Proposes Spiking Patches, a tokenizer for event camera data that preserves the asynchronous and spatially sparse nature of events, unlike frame or voxel representations. Evaluated with a GNN, a PCN, and a Transformer on gesture recognition and object detection, its tokens yield inference up to 3.4x faster than voxel-based tokens and up to 10.4x faster than frames, while matching or exceeding their accuracy.
Business Value
Enables the development of faster, more efficient AI systems for applications using event cameras, such as low-latency robotics, AR/VR, and high-speed tracking, potentially reducing hardware costs and power consumption.