📄 Abstract
Retrieving graphs from a large corpus that contain a subgraph isomorphic to
a given query graph, is a core operation in many real-world applications. While
recent multi-vector graph representations and scores based on set alignment and
containment can provide accurate subgraph isomorphism tests, their use in
retrieval remains limited by their need to score corpus graphs exhaustively. We
introduce CORGII (Contextual Representation of Graphs for Inverted Indexing), a
graph indexing framework in which, starting with a contextual dense graph
representation, a differentiable discretization module computes sparse binary
codes over a learned latent vocabulary. This text document-like representation
allows us to leverage classic, highly optimized inverted indices, while
supporting soft (vector) set containment scores. Pushing this paradigm further,
we replace the classical, fixed impact weight of a 'token' on a graph (such as
TF-IDF or BM25) with a data-driven, trainable impact weight. Finally, we explore
token expansion to support multi-probing the index for smoother
accuracy-efficiency trade-offs. To our knowledge, CORGII is the first indexer of
dense graph representations using discrete tokens mapping to efficient inverted
lists. Extensive experiments show that CORGII provides better trade-offs
between accuracy and efficiency than several baselines.
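
The inverted-index idea described in the abstract can be illustrated with a minimal sketch. This is not the CORGII implementation: the token sets and impact weights are hard-coded here, whereas the paper learns them, and the names `build_index`, `retrieve`, `corpus_tokens`, and `impact` are hypothetical. The sketch only shows how discrete tokens map to posting lists, and how summing per-token impact weights scores the graphs that share a token with the query, without touching the rest of the corpus.

```python
from collections import defaultdict

def build_index(corpus_tokens, impact):
    """Build posting lists: token id -> list of (graph id, impact weight).

    corpus_tokens: dict mapping graph id -> set of token ids (its sparse binary code).
    impact: dict mapping (token id, graph id) -> weight; missing pairs default to 1.0.
    Both inputs are illustrative stand-ins for learned quantities.
    """
    index = defaultdict(list)
    for gid, tokens in corpus_tokens.items():
        for t in sorted(tokens):
            index[t].append((gid, impact.get((t, gid), 1.0)))
    return index

def retrieve(index, query_tokens, k):
    """Score only graphs found in the posting lists of the query's tokens."""
    scores = defaultdict(float)
    for t in query_tokens:
        for gid, w in index.get(t, []):
            scores[gid] += w
    # Highest score first; ties broken by graph id for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]

# Toy corpus: three graphs, each reduced to a small set of token ids.
corpus_tokens = {"g1": {0, 3, 5}, "g2": {3, 5, 7}, "g3": {1, 2}}
# Assumed impact weights for two (token, graph) pairs; all others default to 1.0.
impact = {(3, "g1"): 2.0, (5, "g2"): 0.5}

index = build_index(corpus_tokens, impact)
top = retrieve(index, {3, 5}, k=2)
print(top)
```

In this toy example, graph `g3` shares no token with the query and is never scored, which is the source of the efficiency gain over exhaustive scoring.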
Authors (4)
Pritish Chakraborty
Indradyumna Roy
Soumen Chakrabarti
Abir De
Submitted
October 26, 2025
NeurIPS 2025 paper