Redirecting to original paper in 30 seconds...
Click below to go immediately or wait for automatic redirect
📄 Abstract
Abstract: Higher-order $U$-statistics abound in fields such as statistics, machine
learning, and computer science, but are known to be highly time-consuming to
compute in practice. Despite their widespread appearance, a comprehensive study
of their computational complexity is surprisingly lacking. This paper aims to
fill that gap by presenting several results related to the computational aspect
of $U$-statistics. First, we derive a useful decomposition from an $m$-th order
$U$-statistic to a linear combination of $V$-statistics with orders not
exceeding $m$, which are generally more feasible to compute. Second, we explore
the connection between exactly computing $V$-statistics and Einstein summation,
a tool often used in computational mathematics, quantum computing, and quantum
information sciences for accelerating tensor computations. Third, we provide an
optimistic estimate of the time complexity for exactly computing
$U$-statistics, based on the treewidth of a particular graph associated with
the $U$-statistic kernel. The above ingredients lead to a new, much more
runtime-efficient algorithm of exactly computing general higher-order
$U$-statistics. We also wrap our new algorithm into an open-source Python
package called $\texttt{u-stats}$. We demonstrate via three statistical
applications that $\texttt{u-stats}$ achieves impressive runtime performance
compared to existing benchmarks. This paper aspires to achieve two goals: (1)
to capture the interest of researchers in both statistics and other related
areas further to advance the algorithmic development of $U$-statistics, and (2)
to offer the package $\texttt{u-stats}$ as a valuable tool for practitioners,
making the implementation of methods based on higher-order $U$-statistics a
more delightful experience.
Key Contributions
This paper addresses the computational complexity of higher-order U-statistics by deriving a decomposition to more feasible V-statistics and exploring connections to Einstein summation for tensor computations. It provides an optimistic estimate of U-statistic computation time based on graph treewidth, filling a gap in the comprehensive study of their computational complexity.
Business Value
Improved efficiency in statistical analysis and machine learning algorithms that rely on U-statistics, potentially leading to faster model training and inference in complex data analysis tasks.