# Academic Research Intelligence
Deep dive into AI research papers for researchers and academics
---
## Executive Summary
- 1. Erasing 'Ugly' from the Internet: Propagation of the Beauty Myth in Text-Image Models
Investigates how generative AI models encode 'beauty' norms and erase 'ugliness'. Studies the propagation of Western beauty myths in text-image models and discusses societal implications, particularly concerning negative self-image and body dysmorphia.
- 2. Reevaluating Self-Consistency Scaling in Multi-Agent Systems
Reevaluates self-consistency scaling in multi-agent systems using Gemini 2.5 models. Examines trade-offs of increasing sampled reasoning paths, comparing pooled outputs to a single chain-of-thought, and revisiting earlier findings under current model conditions.
- 3. Complex QA and language models hybrid architectures, Survey
Surveys complex question-answering strategies using hybrid LLM architectures. Reviews methods for addressing specific, complex questions beyond chatbot capabilities, with applications to domains such as power generation and climate change, enabling more sophisticated AI problem-solving.
- 4. ToM: Leveraging Tree-oriented MapReduce for Long-Context Reasoning in Large Language Models
Introduces ToM, a framework leveraging Tree-oriented MapReduce for long-context reasoning in LLMs. It improves logical coherence over RAG and divide-and-conquer methods by optimizing graph traversal for knowledge retrieval and reasoning, enabling better performance on complex tasks.
- 5. Adversarial Déjà Vu: Jailbreak Dictionary Learning for Stronger Generalization to Unseen Attacks
Proposes dictionary learning for adversarial training to defend LLMs against jailbreak attacks. Aims to improve generalization to unseen attacks by creating more robust safety guardrails, addressing a critical challenge in AI safety.
- 6. Spatial Knowledge Graph-Guided Multimodal Synthesis
Proposes a framework for generating spatially coherent multimodal data by integrating spatial knowledge graphs with MLLMs. It addresses spatial perception limitations in MLLMs, enabling the creation of more realistic and contextually accurate visual content.
- 7. PADBen: A Comprehensive Benchmark for Evaluating AI Text Detectors Against Paraphrase Attacks
Introduces PADBen, a benchmark for evaluating AI text detectors against paraphrase attacks. Reveals that iterative paraphrasing evades current detectors by creating an intermediate laundering region, demonstrating limitations in AI-generated text identification.
- 8. When, What, and How: Rethinking Retrieval-Enhanced Speculative Decoding
Introduces ReSpec, a retrieval-enhanced speculative decoding framework for LLM acceleration. It optimizes cache scheduling as a graph problem using Lexicographic Minimax Path Optimization to minimize global errors and improve content quality.
- 9. MARS-SQL: A multi-agent reinforcement learning framework for Text-to-SQL
Introduces MARS-SQL, a multi-agent RL framework for complex Text-to-SQL tasks. It decomposes the problem into specialized agents for grounding, generation, and validation, improving accuracy and handling of intricate queries.
- 10. Diversity-Aware Policy Optimization for Large Language Model Reasoning
Presents a systematic investigation into diversity's impact on LLM reasoning via RL. Proposes a diversity-aware policy optimization framework to enhance reasoning capabilities and stability, addressing limitations in current RL training methods.
- 11. A note on large deviations for interacting particle dynamics for finding mixed Nash equilibria with applications to GANs
Considers a method for finding mixed Nash equilibria in two-layer zero-sum games using entropic regularization. Applies interacting particle dynamics and large deviations theory to problems in GAN training and reinforcement learning.
- 12. SEPS: Semantic-enhanced Patch Slimming Framework for fine-grained cross-modal alignment
Proposes SEPS, a semantic-enhanced patch slimming framework for fine-grained cross-modal alignment. It addresses patch redundancy and ambiguity in MLLMs by optimizing patch selection for improved vision-language correspondence.
- 13. Assessing LLM Reasoning Steps via Principal Knowledge Grounding
Introduces a framework to assess LLM reasoning's knowledge grounding by collecting principal knowledge and evaluating intermediate reasoning steps. It comprises knowledge collection, grounding assessment, and fine-grained analysis to ensure factual alignment.
- 14. On the Variance, Admissibility, and Stability of Empirical Risk Minimization
Proves that suboptimality of Empirical Risk Minimization (ERM) is due to large bias, with variance bounded by the minimax rate. Provides an elementary proof in the fixed design setting and extends it to the random design.
- 15. Low-Rank Adaptation for Foundation Models: A Comprehensive Review
Provides a comprehensive review of Low-Rank Adaptation (LoRA) for foundation models. It analyzes LoRA's effectiveness in adapting large models to downstream tasks, addressing parameter efficiency challenges and exploring various techniques.
- 16. Leveraging the Cross-Domain & Cross-Linguistic Corpus for Low Resource NMT: A Case Study On Bhili-Hindi-English Parallel Corpus
Introduces the Bhili-Hindi-English Parallel Corpus (BHEPC), the largest of its kind. Leverages cross-domain and cross-linguistic data to address low-resource Neural Machine Translation challenges for underrepresented tribal languages like Bhili.
- 17. Efficiency vs. Alignment: Investigating Safety and Fairness Risks in Parameter-Efficient Fine-Tuning of LLMs
Investigates safety and fairness risks in parameter-efficient fine-tuning (PEFT) of LLMs. Compares four PEFT methods (LoRA, DoRA, ICL, Prompt Tuning) to assess trade-offs between efficiency and alignment, providing insights into responsible adaptation.
- 18. Contextual Tokenization for Graph Inverted Indices
Introduces CORGII, a graph indexing framework for efficient subgraph isomorphism retrieval. Uses contextual graph representations and inverted indices to overcome limitations of exhaustive scoring in large graph corpora.
- 19. DTS: Enhancing Large Reasoning Models via Decoding Tree Sketching
Introduces DTS, a framework for enhancing large reasoning models by pruning over-long chain-of-thought traces. It uses decoding tree sketching to identify short, accurate reasoning paths, reducing inference cost and improving correctness.
- 20. Bayesian Additive Main Effects and Multiplicative Interaction Models using Tensor Regression for Multi-environmental Trials
Proposes a Bayesian tensor regression model for phenotype prediction across multiple factors. Incorporates spike-and-slab structures to identify relevant interactions and uses prior distributions to resolve parameter identifiability.
## AI for Science
- 1. A note on large deviations for interacting particle dynamics for finding mixed Nash equilibria with applications to GANs
Proposes a method for finding mixed Nash equilibria in two-layer zero-sum games using entropic regularization. This approach connects to training generative adversarial networks and reinforcement learning, offering insights into equilibrium point computation.
- 2. GeneFlow: Translation of Single-cell Gene Expression to Histopathological Images via Rectified Flow
Introduces GeneFlow, a framework mapping transcriptomics to cellular images using attention and rectified flow. Generates high-resolution images with different staining, enabling biomolecular discovery by aligning transcriptomes with histopathological morphology.
- 3. Bayesian Additive Main Effects and Multiplicative Interaction Models using Tensor Regression for Multi-environmental Trials
Proposes a Bayesian tensor regression model for phenotype prediction across multiple factors and environments. Incorporates a spike-and-slab structure to identify relevant interactions, outperforming existing methods in simulation experiments.
- 4. Applying Medical Imaging Tractography Techniques to Painterly Rendering of Images
Explores using medical imaging tractography algorithms, like those from DTI, for painterly image rendering. Mimics human artist brush strokes by tracking fibers, analogous to tissue structure visualization, offering novel artistic rendering techniques.
- 5. Contextual Tokenization for Graph Inverted Indices
Introduces CORGII, a graph indexing framework for efficient retrieval of graphs containing specific subgraphs. Addresses limitations of exhaustive scoring by enabling faster retrieval from large graph corpora.
- 6. Investigating Label Bias and Representational Sources of Age-Related Disparities in Medical Segmentation
Investigates algorithmic bias in medical segmentation, specifically age-related disparities in breast cancer segmentation. Audits the MAMA-MIA dataset to establish quality and uncover sources of bias, contributing to fairer medical AI.
- 7. Leveraging the Cross-Domain & Cross-Linguistic Corpus for Low Resource NMT: A Case Study On Bhili-Hindi-English Parallel Corpus
Introduces the Bhili-Hindi-English Parallel Corpus (BHEPC), the largest such corpus, to address machine translation challenges for low-resource languages. Facilitates improved translation for underrepresented tribal languages.
- 8. Three-dimensional narrow volume reconstruction method with unconditional stability based on a phase-field Lagrange multiplier approach
Presents an algorithm for 3D reconstruction using a phase-field model with Lagrange multipliers. Reconstructs narrow shells from scattered data points by solving a governing equation enhanced with edge detection, ensuring unconditional stability.
- 9. Complex QA and language models hybrid architectures, Survey
Surveys state-of-the-art LLM architectures and strategies for complex question-answering, focusing on hybrid approaches. Addresses limitations of current LLMs for specific, complex queries requiring specialized knowledge.
- 10. SciTextures: Collecting and Connecting Visual Patterns, Models, and Code Across Science and Art
Introduces SciTextures, a large-scale dataset of visual patterns from science and art, linked to generative models and code. Facilitates deep visual understanding by connecting patterns with their underlying formation mechanisms.
- 11. On the Variance, Admissibility, and Stability of Empirical Risk Minimization
Proves that suboptimality in Empirical Risk Minimization (ERM) is primarily due to large bias, not variance. Establishes that the variance error term is bounded by the minimax rate under mild assumptions.
- 12. Image-based ground distance detection for crop-residue-covered soil
Proposes an image-based method for ground distance detection in conservation agriculture. Addresses the challenge of precise seeding depth control on residue-covered soil where traditional sensors fail, enabling better agricultural practices.
## AI Safety & Ethics
- 1. Adversarial Déjà Vu: Jailbreak Dictionary Learning for Stronger Generalization to Unseen Attacks
Proposes dictionary learning for adversarial training against LLM jailbreaks. This method improves robustness to unseen attacks by learning representations of adversarial perturbations, enhancing safety guardrails against harmful outputs.
- 2. Investigating Label Bias and Representational Sources of Age-Related Disparities in Medical Segmentation
Investigates label bias and representational sources of age-related disparities in medical segmentation. Establishes a quantitative audit of the MAMA-MIA dataset to understand performance differences, aiming to improve fairness in AI applications.
- 3. Erasing 'Ugly' from the Internet: Propagation of the Beauty Myth in Text-Image Models
Investigates how generative AI models encode beauty norms and erase 'ugliness.' Explores the societal implications of these models exaggerating Western beauty standards, potentially harming users' self-image and mental health.
- 4. Learning to Seek Evidence: A Verifiable Reasoning Agent with Causal Faithfulness Analysis
Proposes an interactive agent that generates verifiable explanations through an auditable action sequence. Optimizes a policy using reinforcement learning to strategically seek external visual evidence for diagnostic reasoning.
- 5. PADBen: A Comprehensive Benchmark for Evaluating AI Text Detectors Against Paraphrase Attacks
Introduces PADBen, a benchmark for AI text detectors against paraphrase attacks. Reveals that iterative paraphrasing creates an intermediate region evading detection, demonstrating significant weaknesses in current AI-generated text identification systems.
- 6. Epistemic Uncertainty for Generated Image Detection
Introduces a framework for detecting AI-generated images using epistemic uncertainty. Leverages distributional discrepancies between training and testing data to identify generated images.
- 7. Targeted Attack Improves Protection against Unauthorized Diffusion Customization
Proposes a targeted attack to improve protection against unauthorized diffusion customization. Improves on prior protections that relied on untargeted attacks, which add watermark-like perturbations to poison diffusion customization.
- 8. On the Variance, Admissibility, and Stability of Empirical Risk Minimization
Analyzes Empirical Risk Minimization (ERM), proving that suboptimality stems from large bias, not variance, under mild assumptions. Provides theoretical insights into the stability and performance limitations of ERM in machine learning.
- 9. Been There, Scanned That: Nostalgia-Driven LiDAR Compression for Self-Driving Cars
Introduces DejaView, a LiDAR compression framework for self-driving cars. Reduces network and storage costs by leveraging inter-frame redundancies and a nostalgia-driven approach.
- 10. Complex QA and language models hybrid architectures, Survey
Surveys state-of-the-art LLM architectures and strategies for complex question-answering. Focuses on hybrid architectures that address limitations of standard LLMs in solving specific, complex problems requiring specialized knowledge.
- 11. FIRE: Robust Detection of Diffusion-Generated Images via Frequency-Guided Reconstruction Error
Observes that diffusion models struggle to reconstruct mid-band frequency information, using this as a cue for detection. Proposes FIRE for robust detection via frequency-guided reconstruction error.
- 12. Reevaluating Self-Consistency Scaling in Multi-Agent Systems
Reevaluates self-consistency scaling for modern LLMs on reasoning tasks. Compares pooling outputs from varying sampled reasoning paths against single chains, examining trade-offs in performance and efficiency for multi-agent systems.
- 13. Risk-adaptive Activation Steering for Safe Multimodal Large Language Models
Proposes risk-adaptive activation steering for safe multimodal LLMs. Addresses safety alignment by steering activations at inference time to mitigate risks from multimodal queries.
- 14. Bayesian Additive Main Effects and Multiplicative Interaction Models using Tensor Regression for Multi-environmental Trials
Proposes a Bayesian tensor regression model for phenotype prediction across multiple factors. Incorporates spike-and-slab structures to identify relevant interactions, improving prediction accuracy in complex multi-environmental trials.
- 15. A Generative Adversarial Approach to Adversarial Attacks Guided by Contrastive Language-Image Pre-trained Model
Proposes a generative adversarial attack method using CLIP to create effective adversarial perturbations. Leverages CLIP's alignment ability to create visually imperceptible adversarial examples.
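Item 15 above builds adversarial perturbations with CLIP guidance; for context, the classic gradient-sign attack on a toy logistic model illustrates the underlying idea of perturbing an input uphill in the loss. Everything here (the toy classifier, the loss, the step size) is illustrative, not the paper's method:

```python
import numpy as np

rng = np.random.default_rng(3)
w = rng.normal(size=10)   # toy linear classifier's weights
x = rng.normal(size=10)   # clean input, assumed true label y = 1
y = 1.0

def input_grad(w, x, y):
    """Gradient of the logistic loss with respect to the input x."""
    p = 1.0 / (1.0 + np.exp(-(w @ x)))  # sigmoid prediction
    return (p - y) * w

eps = 0.1
# One gradient-sign step: move the input in the direction that increases
# the loss, while keeping the perturbation bounded by eps per coordinate.
x_adv = x + eps * np.sign(input_grad(w, x, y))
```

The perturbation is imperceptibly small in the infinity norm yet is guaranteed (for this convex toy loss) to increase the classifier's loss, which is the core mechanism generative and CLIP-guided attacks elaborate on.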
## AI Theory & Foundations
- 1. A note on large deviations for interacting particle dynamics for finding mixed Nash equilibria with applications to GANs
Proposes a method for finding mixed Nash equilibria in two-layer zero-sum games using entropic regularization and interacting particle dynamics. Addresses equilibrium and robustness issues relevant to training generative adversarial networks and reinforcement learning.
- 2. Reasoning Beyond Language: A Comprehensive Survey on Latent Chain-of-Thought Reasoning
Surveys latent Chain-of-Thought (CoT) reasoning in LLMs, exploring methods that embed reasoning in latent spaces, decoupling it from explicit verbalization. Discusses broader applicability for abstract reasoning tasks beyond language.
- 3. On the Variance, Admissibility, and Stability of Empirical Risk Minimization
Proves that suboptimality of Empirical Risk Minimization (ERM) is due to large bias, not variance, under mild assumptions. Offers an elementary proof for fixed design settings and extends it to the random design setting.
- 4. Adversarial Déjà Vu: Jailbreak Dictionary Learning for Stronger Generalization to Unseen Attacks
Investigates dictionary learning for jailbreak attacks on large language models to improve robustness against unseen attacks. Addresses optimization challenges and threat model definition issues inherent in adversarial training paradigms.
- 5. Large Stepsizes Accelerate Gradient Descent for Regularized Logistic Regression
Shows large, constant stepsizes accelerate gradient descent for $\ell_2$-regularized logistic regression. Achieves $\widetilde{\mathcal{O}}(\sqrt{\kappa})$ convergence, theoretically explaining accelerated optimization compared to classical small stepsizes.
- 6. Characterization and Learning of Causal Graphs from Hard Interventions
Characterizes causal graph learning from multiple hard interventions and observational data. Links conditional independence invariances to graphical constraints via d-separation, enabling deconfounding and causal discovery.
- 7. Reevaluating Self-Consistency Scaling in Multi-Agent Systems
Examines self-consistency scaling in large language models by analyzing trade-offs of increasing sampled reasoning paths. Reevaluates earlier research claims using current models on HotpotQA and Math-500 datasets.
- 8. Efficient Neural SDE Training using Wiener-Space Cubature
Introduces efficient training for neural SDEs using Wiener-Space Cubature. Optimizes neural network parameters for objective functionals on path-space via path-wise gradients.
- 9. Contextual Tokenization for Graph Inverted Indices
Introduces CORGII, a graph indexing framework for efficient subgraph isomorphism retrieval. Addresses limitations of current multi-vector graph representations by enabling retrieval without exhaustive corpus graph scoring.
- 10. Bayesian Additive Main Effects and Multiplicative Interaction Models using Tensor Regression for Multi-environmental Trials
Proposes a Bayesian tensor regression model for phenotype prediction considering multiple factors. Incorporates priors for identifiability and a spike-and-slab structure to identify relevant interactions.
- 11. A Free Probabilistic Framework for Denoising Diffusion Models: Entropy, Transport, and Reverse Processes
Develops a probabilistic framework for denoising diffusion models using free entropy and transport. Formulates processes with operator-valued dynamics whose spectral measures evolve via additive convolution.
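Item 5 above concerns stepsize choice in gradient descent for $\ell_2$-regularized logistic regression. As context, a minimal sketch of constant-stepsize gradient descent on that objective; the synthetic data, stepsize, and iteration count are illustrative and this is not the paper's large-stepsize analysis:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, lam = 200, 5, 0.1                  # samples, features, l2 strength

X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true > 0).astype(float)       # labels in {0, 1}

def loss(w):
    z = X @ w
    # logaddexp(0, z) - y*z is the per-sample negative log-likelihood
    return np.mean(np.logaddexp(0.0, z) - y * z) + 0.5 * lam * w @ w

def grad(w):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))   # sigmoid predictions
    return X.T @ (p - y) / n + lam * w

w = np.zeros(d)
eta = 1.0                                # constant stepsize (illustrative)
for _ in range(500):
    w -= eta * grad(w)
```

Regularization makes the objective strongly convex, so constant-stepsize descent converges; the paper's contribution is characterizing how *large* stepsizes accelerate this to a $\widetilde{\mathcal{O}}(\sqrt{\kappa})$ rate.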
## Computer Vision
- 1. Contextual Tokenization for Graph Inverted Indices
Introduces CORGII, a graph indexing framework for efficient subgraph isomorphism retrieval. It leverages contextual tokenization and multi-vector graph representations to enable fast querying of large graph corpora, applicable to various real-world graph retrieval tasks.
- 2. TIR-Bench: A Comprehensive Benchmark for Agentic Thinking-with-Images Reasoning
Introduces TIR-Bench, a new benchmark for agentic thinking-with-images reasoning. It evaluates advanced capabilities beyond basic operations, enabling comprehensive assessment of current thinking-with-images methods.
- 3. Erasing 'Ugly' from the Internet: Propagation of the Beauty Myth in Text-Image Models
Investigates how generative AI models encode beauty norms and erase 'ugliness', analyzing their societal implications. Creates a method to study these encoded biases, aiming to understand and mitigate the propagation of harmful beauty standards in generated content.
- 4. Fast-SmartWay: Panoramic-Free End-to-End Zero-Shot Vision-and-Language Navigation
Proposes Fast-SmartWay, an end-to-end zero-shot framework for Vision-and-Language Navigation. Eliminates panoramic views and two-stage pipelines, enabling faster, real-world applicable navigation.
- 5. A note on large deviations for interacting particle dynamics for finding mixed Nash equilibria with applications to GANs
Proposes a method for finding mixed equilibria in two-layer zero-sum games using entropic regularization and interacting particle dynamics. This approach is relevant for training generative adversarial networks and reinforcement learning agents in complex game theory settings.
- 6. Applying Medical Imaging Tractography Techniques to Painterly Rendering of Images
Explores using medical imaging tractography algorithms for painterly image rendering. Mimics human artistic processes by placing brush strokes analogously to fiber tracking in DTI.
- 7. Bayesian Additive Main Effects and Multiplicative Interaction Models using Tensor Regression for Multi-environmental Trials
Proposes a Bayesian tensor regression model for phenotype prediction across multiple environments. Incorporates spike-and-slab priors to identify relevant interactions, outperforming existing methods in simulation experiments for complex trait analysis.
- 8. Been There, Scanned That: Nostalgia-Driven LiDAR Compression for Self-Driving Cars
Introduces DejaView, a LiDAR compression framework for self-driving cars. Leverages interframe redundancies and nostalgia-driven techniques to reduce network and storage costs.
- 9. PADBen: A Comprehensive Benchmark for Evaluating AI Text Detectors Against Paraphrase Attacks
Introduces PADBen, a benchmark for evaluating AI text detectors against paraphrasing attacks. Reveals that iterative paraphrasing evades current detectors by creating an intermediate 'laundering region', highlighting a critical vulnerability in AI-generated text detection.
- 10. GDROS: A Geometry-Guided Dense Registration Framework for Optical-SAR Images under Large Geometric Transformations
Proposes GDROS, a geometry-guided framework for dense registration of optical-SAR images. Handles large geometric transformations and modal discrepancies effectively for fusion and navigation tasks.
- 11. Leveraging the Cross-Domain & Cross-Linguistic Corpus for Low Resource NMT: A Case Study On Bhili-Hindi-English Parallel Corpus
Introduces the Bhili-Hindi-English Parallel Corpus (BHEPC), the largest parallel corpus for Bhili. This resource addresses low-resource machine translation challenges, enabling improved translation systems for underrepresented languages like Bhili by leveraging cross-domain and cross-linguistic data.
- 12. Image-based ground distance detection for crop-residue-covered soil
Introduces an image-based method for detecting ground distance on crop-residue-covered soil. Addresses the challenge of precise seeding depth control in conservation agriculture where sensors fail.
- 13. Adversarial Déjà Vu: Jailbreak Dictionary Learning for Stronger Generalization to Unseen Attacks
Proposes dictionary learning for adversarial training to improve LLM robustness against unseen jailbreak attacks. Aims to create models that generalize better to novel adversarial prompts, enhancing AI safety guardrails against harmful content generation.
- 14. Three-dimensional narrow volume reconstruction method with unconditional stability based on a phase-field Lagrange multiplier approach
Presents an algorithm for 3D narrow volume reconstruction using a phase-field Lagrange multiplier approach. Solves governing equations enhanced with edge detection for stable reconstruction from point clouds.
- 15. Complex QA and language models hybrid architectures, Survey
Surveys state-of-the-art LLM architectures and strategies for complex question-answering, focusing on hybrid approaches. Addresses limitations of standard LLMs for intricate queries and explores methods to enhance their capabilities in specialized domains.
- 16. Investigating Label Bias and Representational Sources of Age-Related Disparities in Medical Segmentation
Investigates label bias and representational sources of age-related disparities in medical segmentation. Audits datasets to establish quantitative fairness measures, addressing performance gaps for younger patients.
- 17. On the Variance, Admissibility, and Stability of Empirical Risk Minimization
Analyzes the suboptimality of Empirical Risk Minimization (ERM), proving that variance error is bounded by the minimax rate and suboptimality stems from bias. Provides theoretical insights into the limitations and performance characteristics of ERM.
- 18. SonarSweep: Fusing Sonar and Vision for Robust 3D Reconstruction via Plane Sweeping
Introduces SonarSweep, a framework fusing sonar and vision for 3D reconstruction in underwater environments. Addresses limitations of single-modality approaches and flawed fusion techniques.
- 19. Reevaluating Self-Consistency Scaling in Multi-Agent Systems
Reevaluates self-consistency scaling in multi-agent systems using modern LLMs like Gemini 2.5. Compares pooled outputs from varying sampled reasoning paths against single-chain-of-thought, examining trade-offs in performance and efficiency for complex reasoning tasks.
- 20. GeneFlow: Translation of Single-cell Gene Expression to Histopathological Images via Rectified Flow
Proposes GeneFlow, a framework mapping transcriptomics to cellular images via rectified flow. Combines RNA encoder and conditional UNet to generate high-resolution images for biomolecular discovery.
## Efficient AI
- 1. Contextual Tokenization for Graph Inverted Indices
Introduces CORGII, a graph indexing framework for efficient subgraph isomorphism retrieval. It leverages contextual tokenization and multi-vector graph representations to enable faster searching of large graph corpora, enabling practical retrieval applications.
- 2. Been There, Scanned That: Nostalgia-Driven LiDAR Compression for Self-Driving Cars
Introduces DejaView, a LiDAR compression framework for autonomous vehicles that leverages interframe redundancies. It reduces network and storage costs by effectively compressing 3D point cloud data, enabling more efficient data transfer and storage for ML training and analysis.
- 3. A note on large deviations for interacting particle dynamics for finding mixed Nash equilibria with applications to GANs
Proposes a method for finding mixed Nash equilibria in two-layer zero-sum games using entropic regularization and interacting particle dynamics. This addresses key problems in training GANs and reinforcement learning, offering a path to more stable equilibrium finding.
- 4. FastBoost: Progressive Attention with Dynamic Scaling for Efficient Deep Learning
Presents FastBoost, a parameter-efficient neural architecture achieving state-of-the-art CIFAR performance via Dynamically Scaled Progressive Attention. It sets new efficiency frontiers with fewer parameters, demonstrating adaptive fusion and efficient attention mechanisms.
- 5. Reevaluating Self-Consistency Scaling in Multi-Agent Systems
Re-examines self-consistency scaling in LLMs using modern models like Gemini 2.5 on HotpotQA and Math-500. It investigates the trade-offs of increasing sampled reasoning paths, providing updated insights on its effectiveness for improving reasoning performance.
- 6. FIPER: Factorized Features for Robust Image Super-Resolution and Compression
Proposes Factorized Features, a unified representation for Image Super-Resolution and Compression. It addresses shared principles between tasks by recovering and preserving fine image details, offering a robust approach for both tasks.
- 7. Bayesian Additive Main Effects and Multiplicative Interaction Models using Tensor Regression for Multi-environmental Trials
Proposes a Bayesian tensor regression model for phenotype prediction considering multiple factors. It incorporates spike-and-slab priors to identify relevant interactions, outperforming existing methods in simulation experiments for complex trait analysis.
- 8. Scalable Autoregressive Image Generation with Mamba
Introduces AiM, an autoregressive image generative model based on Mamba architecture. It replaces Transformers for AR image generation, aiming for superior quality and enhanced inference speed with linear time complexity for long sequences.
- 9. Complex QA and language models hybrid architectures, Survey
Surveys state-of-the-art LLM architectures and strategies for complex question-answering, focusing on hybrid approaches. It addresses the limitations of current LLMs for complex queries and explores methods to enhance their capabilities in this domain.
- 10. MVSMamba: Multi-View Stereo with State Space Model
Proposes MVSMamba, leveraging Mamba for Multi-View Stereo feature representation. It captures long-range dependencies with linear complexity, addressing challenges of Transformer-based MVS methods to balance performance and efficiency.
- 11. Leveraging the Cross-Domain & Cross-Linguistic Corpus for Low Resource NMT: A Case Study On Bhili-Hindi-English Parallel Corpus
Introduces the Bhili-Hindi-English Parallel Corpus (BHEPC), the first large parallel corpus for low-resource Bhili. It addresses machine translation challenges in linguistically diverse regions by providing crucial resources for underrepresented languages.
- 12. Efficiently Training A Flat Neural Network Before It has been Quantizated
Investigates efficient training of model-agnostic neural networks for quantization. Discovers the relationship between well-trained networks and quantized models, aiming to reduce quantization error for post-training quantization methods.
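Item 12 above concerns post-training quantization. For context, a minimal sketch of the standard symmetric uniform quantize-dequantize round trip on a weight tensor; this is a generic baseline, not the reviewed paper's method:

```python
import numpy as np

def quantize_dequantize(w, num_bits=8):
    """Symmetric uniform post-training quantization of a weight tensor."""
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 127 for int8
    scale = np.max(np.abs(w)) / qmax
    if scale == 0.0:
        return np.zeros_like(w)             # all-zero tensor maps to itself
    q = np.clip(np.round(w / scale), -qmax, qmax)  # integer grid
    return q * scale                        # dequantized approximation

rng = np.random.default_rng(2)
w = rng.normal(size=1000)
w_hat = quantize_dequantize(w, num_bits=8)
# Round-to-nearest keeps the per-weight error within half a step (scale / 2).
print(np.max(np.abs(w - w_hat)))
```

Post-training quantization applies a map like this after training; the paper's question is how to train the full-precision network so this round-trip error stays small.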
## Generative AI
- 1. PADBen: A Comprehensive Benchmark for Evaluating AI Text Detectors Against Paraphrase Attacks
Introduces PADBen, a benchmark for evaluating AI text detectors against paraphrase attacks. Reveals that iteratively-paraphrased text evades detection by creating an intermediate laundering region, demonstrating a critical vulnerability in current AIGT identification systems.
- 2. A note on large deviations for interacting particle dynamics for finding mixed Nash equilibria with applications to GANs
Proposes a method for finding mixed equilibria in two-layer zero-sum games using entropic regularization. This work addresses key problems in machine learning, particularly the training of generative adversarial networks, by focusing on mixed equilibrium points.
- 3. Unified Diffusion VLA: Vision-Language-Action Model via Joint Discrete Denoising Diffusion Process
Introduces Unified Diffusion VLA, a vision-language-action model using a joint discrete denoising diffusion process. It enables unified understanding, generation, and action by reading text/images and producing future images/actions, addressing modality unification challenges.
- 4. Erasing 'Ugly' from the Internet: Propagation of the Beauty Myth in Text-Image Models
Investigates how generative AI models may encode 'beauty' norms and erase 'ugliness'. Studies the societal implications of artificially generated content exacerbating Western beauty standards, particularly impacting women and girls and contributing to negative self-image.
- 5. Scalable Autoregressive Image Generation with Mamba
Introduces AiM, an autoregressive image generative model based on Mamba architecture. It utilizes Mamba's linear complexity for long-sequence modeling to enhance generation quality and inference speed compared to Transformers.
- 6. Adversarial Déjà Vu: Jailbreak Dictionary Learning for Stronger Generalization to Unseen Attacks
Develops adversarial dictionary learning to improve LLM robustness against unseen jailbreak attacks. This approach aims to create stronger generalization for safety guardrails, addressing the critical challenge of defending against novel attacks that bypass safety mechanisms.
- 7. ReviveDiff: A Universal Diffusion Model for Restoring Images in Adverse Weather Conditions
Proposes ReviveDiff, a universal diffusion model for restoring images degraded by adverse weather. It overcomes limitations of task-specific solutions by effectively restoring images from various challenging conditions.
- 8. Complex QA and language models hybrid architectures, Survey
Surveys state-of-the-art LLM architectures and hybrid strategies for complex question-answering. Addresses the limitations of LLMs for complex queries and explores techniques to enhance their capabilities beyond common problem-solving.
- 9. Applying Medical Imaging Tractography Techniques to Painterly Rendering of Images
Explores applying diffusion tensor imaging (DTI) and tractography techniques to painterly image rendering. Uses a tractography algorithm to place brush strokes mimicking human artists' painting process.
- 10. Reevaluating Self-Consistency Scaling in Multi-Agent Systems
Reevaluates self-consistency scaling in modern LLMs like Gemini 2.5 on complex tasks. Examines the trade-offs of increasing sampled reasoning paths, comparing pooled outputs against a single chain-of-thought and identifying where performance gains plateau.
- 11. Wonder3D++: Cross-domain Diffusion for High-fidelity 3D Generation from a Single Image
Introduces Wonder3D++, a method for high-fidelity textured mesh generation from single images using cross-domain diffusion. It overcomes time-consuming optimization and inconsistent geometry issues of previous Score Distillation Sampling methods.
- 12. On the Variance, Admissibility, and Stability of Empirical Risk Minimization
Proves that suboptimality in Empirical Risk Minimization (ERM) is due to large bias, with variance bounded by the minimax rate. Provides proofs for fixed design settings and extends to more general scenarios.
- 13. Multi-scale Latent Point Consistency Models for 3D Shape Generation
Proposes Multi-scale Latent Point Consistency Models (MLPCM) for 3D shape generation using latent diffusion. MLPCM introduces hierarchical latent representations from point-level to super-point levels for accelerated sampling.
- 14. Bayesian Additive Main Effects and Multiplicative Interaction Models using Tensor Regression for Multi-environmental Trials
Proposes a Bayesian tensor regression model for phenotype prediction considering multiple factors. Incorporates prior distributions and spike-and-slab structures to identify relevant interactions, outperforming existing methods in simulation experiments.
- 15. Diffusion Classifiers Understand Compositionality, but Conditions Apply
Investigates diffusion classifiers' compositional understanding capabilities. Shows that while they excel at synthesizing complex scenes, their discriminative performance depends on specific conditions and prompt formulation.
- 16. Contextual Tokenization for Graph Inverted Indices
Introduces CORGII, a graph indexing framework for efficient retrieval of graphs containing query subgraphs. Leverages contextual representations and inverted indexing to overcome limitations of exhaustive scoring in graph retrieval applications.
- 17. FIRE: Robust Detection of Diffusion-Generated Images via Frequency-Guided Reconstruction Error
Proposes FIRE, a frequency-guided reconstruction error method for robust detection of diffusion-generated images. It leverages the observation that diffusion models struggle to reconstruct mid-band frequencies accurately.
- 18. Leveraging the Cross-Domain & Cross-Linguistic Corpus for Low Resource NMT: A Case Study On Bhili-Hindi-English Parallel Corpus
Introduces the Bhili-Hindi-English Parallel Corpus (BHEPC), the largest corpus for Bhili low-resource NMT. Addresses machine translation challenges for underrepresented languages using expertly curated parallel sentences.
- 19. Reg-DPO: SFT-Regularized Direct Preference Optimization with GT-Pair for Improving Video Generation
Introduces Reg-DPO, a SFT-regularized Direct Preference Optimization method for video generation. It addresses challenges of video tasks like data construction and training stability, improving generation quality.
- 20. Diff4Splat: Controllable 4D Scene Generation with Latent Dynamic Reconstruction Models
Introduces Diff4Splat, a feed-forward method for controllable 4D scene generation from a single image. It unifies video diffusion model priors with geometry and motion constraints learned from 4D data for direct synthesis.
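The frequency-guided detection idea behind FIRE (item 17 above) boils down to measuring reconstruction error only in a mid-band of spatial frequencies. A minimal sketch of that measurement; the function name, the radial band cutoffs, and the plain RMS aggregation are illustrative assumptions, not the paper's actual pipeline:

```python
import numpy as np

def midband_error(image, reconstruction, lo=0.1, hi=0.5):
    """Reconstruction error restricted to mid-band spatial frequencies.

    `lo` and `hi` are radial cutoffs as fractions of the Nyquist
    frequency; the band FIRE actually uses may differ.
    """
    h, w = image.shape
    # Radial frequency grid, normalized so the Nyquist frequency is 1.
    fy = np.fft.fftfreq(h)[:, None] * 2
    fx = np.fft.fftfreq(w)[None, :] * 2
    radius = np.sqrt(fx**2 + fy**2)
    band = (radius >= lo) & (radius < hi)

    # Difference spectrum, keeping only mid-band components.
    diff = np.fft.fft2(image - reconstruction)
    return float(np.sqrt(np.mean(np.abs(diff * band) ** 2)))
```

In FIRE's setting, `reconstruction` would come from running the input through a diffusion model; per the summary above, how faithfully this band is reconstructed separates diffusion-generated images from real ones.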
Graph Neural Networks
- 1. Contextual Tokenization for Graph Inverted Indices
Introduces CORGII, a graph indexing framework using contextual tokenization. It enables efficient retrieval of graphs containing query subgraphs without exhaustively scoring every corpus graph, speeding up subgraph isomorphism testing in large-scale applications.
- 2. CMI-MTL: Cross-Mamba interaction based multi-task learning for medical visual question answering
Introduces CMI-MTL, a medical visual question answering framework using cross-Mamba interactions for multi-task learning. It improves cross-modal alignment over self-attention methods and adapts to free-form answers, enabling better clinical decision support.
- 3. A note on large deviations for interacting particle dynamics for finding mixed Nash equilibria with applications to GANs
Proposes a method for finding mixed Nash equilibria in two-layer zero-sum games using entropic regularization and interacting particle dynamics. This addresses key problems in training generative adversarial networks and reinforcement learning.
- 4. GraphGeo: Multi-Agent Debate Framework for Visual Geo-localization with Heterogeneous Graph Neural Networks
Introduces GraphGeo, a multi-agent debate framework for visual geo-localization using heterogeneous Graph Neural Networks. It enhances reasoning by allowing agents to collaborate and refine location predictions, improving accuracy in diverse geographic and scene complexities.
- 5. Bayesian Additive Main Effects and Multiplicative Interaction Models using Tensor Regression for Multi-environmental Trials
Proposes a Bayesian tensor regression model for phenotype prediction across multiple factors and environments. It incorporates prior distributions to resolve identifiability issues and uses a spike-and-slab structure to identify relevant interactions.
- 6. Scalable Multi-Task Learning for Particle Collision Event Reconstruction with Heterogeneous Graph Neural Networks
Proposes scalable multi-task learning for particle collision event reconstruction using heterogeneous Graph Neural Networks. Addresses challenges of increased particle multiplicities and vertex misassociations by leveraging graph structure for holistic reconstruction.
- 7. Can Classic GNNs Be Strong Baselines for Graph-level Tasks? Simple Architectures Meet Excellence
Investigates classic GNNs as baselines for graph-level tasks, comparing them against Graph Transformers. Finds GNNs can achieve competitive performance, challenging the notion that Transformers are always superior by employing simple, efficient architectures.
- 8. Leveraging the Cross-Domain & Cross-Linguistic Corpus for Low Resource NMT: A Case Study On Bhili-Hindi-English Parallel Corpus
Introduces the Bhili-Hindi-English Parallel Corpus (BHEPC), the largest parallel corpus for the underrepresented Bhili language. It facilitates low-resource Neural Machine Translation by providing 110,000 curated sentences across three languages.
- 9. On the Variance, Admissibility, and Stability of Empirical Risk Minimization
Analyzes Empirical Risk Minimization (ERM), proving that its suboptimality stems from large bias, with variance bounded by the minimax rate. Provides elementary proofs in fixed design settings and extends to random designs.
- 10. GraphTeam: Facilitating Large Language Model-based Graph Analysis via Multi-Agent Collaboration
Introduces GraphTeam, a multi-agent framework for LLM-based graph analysis. It combines GNNs for specific tasks with LLMs' reasoning, enabling more effective analysis of relational data by leveraging agent collaboration for improved performance.
- 11. Leveraging Compact Satellite Embeddings and Graph Neural Networks for Large-Scale Poverty Mapping
Proposes a graph-based approach using compact satellite embeddings and Graph Neural Networks for large-scale poverty mapping. Models spatial relations between clusters to predict wealth indices, improving poverty estimation in data-scarce regions.
- 12. Adversarial Déjà Vu: Jailbreak Dictionary Learning for Stronger Generalization to Unseen Attacks
Investigates dictionary learning for improving large language model robustness against unseen jailbreak attacks. Aims to create models that generalize better to novel adversarial prompts bypassing safety guardrails.
- 13. DynBERG: Dynamic BERT-based Graph neural network for financial fraud detection
Introduces DynBERG, a dynamic BERT-based Graph Neural Network for financial fraud detection in cryptocurrency networks. Mitigates over-smoothing issues using a Transformer architecture for more robust analysis of static and dynamic graph data.
- 14. Reevaluating Self-Consistency Scaling in Multi-Agent Systems
Reevaluates self-consistency scaling in modern LLMs using Gemini 2.5 on HotpotQA and Math-500. Compares pooled outputs from varying sampled reasoning paths against single chains-of-thought to examine trade-offs.
- 15. Scaling Graph Chain-of-Thought Reasoning: A Multi-Agent Framework with Efficient LLM Serving
Presents GLM, a multi-agent Graph Chain-of-Thought system for LLM reasoning over graphs. Decomposes reasoning into specialized agents and optimizes LLM serving for improved accuracy, reduced token usage, lower latency, and higher throughput.
- 16. PADBen: A Comprehensive Benchmark for Evaluating AI Text Detectors Against Paraphrase Attacks
Introduces PADBen, a benchmark for evaluating AI text detectors against iterative paraphrasing. Reveals that such attacks create an intermediate laundering region evading detection systems designed for direct AI-generated text.
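CORGII's inverted-index retrieval (item 1 above) can be illustrated with a deliberately crude surrogate. Here a graph's tokens are just its node labels, so "covers every query token" is only a necessary condition for subgraph containment; CORGII's contribution is replacing these labels with learned contextual discrete codes for much sharper pruning:

```python
from collections import defaultdict

def build_index(corpus):
    """Inverted index: token -> ids of corpus graphs containing it.

    `corpus` maps graph id -> set of node labels (a toy stand-in for
    CORGII's learned contextual tokens).
    """
    index = defaultdict(set)
    for gid, labels in corpus.items():
        for token in labels:
            index[token].add(gid)
    return index

def candidates(index, query_labels, all_ids):
    """Posting-list intersection: prune graphs that cannot contain the query."""
    ids = set(all_ids)
    for token in query_labels:
        ids &= index.get(token, set())
    return ids
```

Surviving candidates would still go through an exact subgraph-isomorphism check; the index only avoids scoring every corpus graph.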
Large Language Models
- 1. Reevaluating Self-Consistency Scaling in Multi-Agent Systems
Reevaluates self-consistency scaling for modern LLMs on HotpotQA and Math-500. Compares pooled outputs from varying sampled reasoning paths against single chains-of-thought to understand trade-offs of scaling reasoning.
- 2. Complex QA and language models hybrid architectures, Survey
Surveys state-of-the-art LLM architectures and hybrid strategies for complex question-answering. Addresses limitations of standard LLMs for specific, complex queries, highlighting needs for specialized approaches beyond common chatbot capabilities.
- 3. Measuring Chain of Thought Faithfulness by Unlearning Reasoning Steps
Introduces Faithfulness by Unlearning Reasoning steps (FUR) to measure parametric faithfulness of LLM reasoning. Demonstrates a framework to assess if verbalized reasoning aligns with model beliefs, enabling more reliable CoT outputs.
- 4. Adversarial Déjà Vu: Jailbreak Dictionary Learning for Stronger Generalization to Unseen Attacks
Proposes dictionary learning for adversarial training to improve LLM robustness against unseen jailbreak attacks. Aims to enhance AI safety by making models more resilient to methods bypassing safety guardrails.
- 5. PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts
Introduces PolyMath, a multilingual mathematical reasoning benchmark across 18 languages and 4 difficulty levels. Evaluates LLMs, finding even advanced models achieve limited performance, highlighting challenges in multilingual math reasoning.
- 6. A Closer Look at Bias and Chain-of-Thought Faithfulness of Large (Vision) Language Models
Presents the first comprehensive study of Chain-of-Thought faithfulness in large vision-language models. Investigates how text and image biases affect reasoning and bias articulation using a novel fine-grained evaluation pipeline.
- 7. PADBen: A Comprehensive Benchmark for Evaluating AI Text Detectors Against Paraphrase Attacks
Introduces PADBen, a benchmark for evaluating AI text detectors against paraphrase attacks. Reveals iterative paraphrasing evades detection by creating an intermediate region, impacting AIGT identification.
- 8. Reasoning Beyond Language: A Comprehensive Survey on Latent Chain-of-Thought Reasoning
Surveys latent Chain-of-Thought (CoT) reasoning, where reasoning is embedded in latent spaces. Decouples reasoning from explicit verbalization, enabling broader applicability to abstract reasoning tasks beyond language.
- 9. A note on large deviations for interacting particle dynamics for finding mixed Nash equilibria with applications to GANs
Considers a method for finding mixed Nash equilibria in two-layer zero-sum games using entropic regularization. Applies to training generative adversarial networks and reinforcement learning by addressing equilibrium point challenges.
- 10. Enhancing Reasoning Abilities of Small LLMs with Cognitive Alignment
Enhances reasoning capabilities of small LLMs with cognitive alignment. Addresses the challenge of small models having different reasoning capacities compared to large ones, enabling more effective distillation.
- 11. On the Variance, Admissibility, and Stability of Empirical Risk Minimization
Proves that suboptimality of Empirical Risk Minimization (ERM) is due to large bias, not variance. Establishes that ERM's variance error is bounded by the minimax rate under mild assumptions.
- 12. The Curse of CoT: On the Limitations of Chain-of-Thought in In-Context Learning
Investigates limitations of Chain-of-Thought (CoT) prompting in in-context learning. Demonstrates CoT consistently underperforms direct prediction in pattern-based ICL across 16 LLMs and nine datasets.
- 13. Erasing 'Ugly' from the Internet: Propagation of the Beauty Myth in Text-Image Models
Investigates how generative AI models encode 'beauty' and erase 'ugliness', potentially exaggerating Western beauty norms. Discusses implications for society and negative self-image due to AI-generated content.
- 14. Do LLM Evaluators Prefer Themselves for a Reason?
Examines LLM self-preference bias in automatic evaluation. Investigates whether LLMs favoring their own outputs reflects genuine quality or harmful bias, providing insights into LLM evaluator reliability.
- 15. Contextual Tokenization for Graph Inverted Indices
Introduces CORGII, a graph indexing framework using contextual tokenization. Enables efficient retrieval of graphs containing subgraphs isomorphic to a query graph, overcoming limitations of exhaustive scoring.
- 16. Self-Adaptive Cognitive Debiasing for Large Language Models in Decision-Making
Proposes self-adaptive cognitive debiasing for LLMs in decision-making. Addresses inherent cognitive biases by adapting debiasing strategies to LLM outputs, enhancing decision-making reliability.
- 17. Bayesian Additive Main Effects and Multiplicative Interaction Models using Tensor Regression for Multi-environmental Trials
Proposes a Bayesian tensor regression model for phenotype prediction across multiple factors. Incorporates spike-and-slab structures to identify relevant interactions, outperforming existing methods in simulation experiments.
- 18. XIFBench: Evaluating Large Language Models on Multilingual Instruction Following
Introduces XIFBench, a constraint-based benchmark for evaluating multilingual instruction-following LLMs. Assesses performance across 558 instructions in multiple languages, providing fine-grained analysis.
- 19. Leveraging the Cross-Domain & Cross-Linguistic Corpus for Low Resource NMT: A Case Study On Bhili-Hindi-English Parallel Corpus
Introduces the Bhili-Hindi-English Parallel Corpus (BHEPC), the first large parallel corpus for low-resource Bhili. Addresses machine translation challenges for underrepresented Indian languages.
- 20. Medical Hallucinations in Foundation Models and Their Impact on Healthcare
Defines and evaluates 'medical hallucination' in foundation models. Analyzes how autoregressive training fosters overconfidence, potentially altering clinical decisions, and assesses 11 models for this issue.
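The self-consistency procedure reevaluated in item 1 reduces to sampling several chains of thought and majority-voting their final answers. A minimal sketch, where `sample_fn` is a hypothetical stand-in for one LLM call:

```python
from collections import Counter

def self_consistency(sample_fn, question, k=5):
    """Sample k independent chains of thought and majority-vote the
    final answers. `sample_fn(question)` stands in for one stochastic
    LLM call returning a (reasoning, answer) pair.
    """
    answers = [sample_fn(question)[1] for _ in range(k)]
    return Counter(answers).most_common(1)[0][0]
```

The question the papers above study is how accuracy behaves as `k` grows: pooled answers improve over a single chain at first, then plateau while sampling cost keeps rising.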
Multimodal Learning
- 1. Complex QA and language models hybrid architectures, Survey
Surveys state-of-the-art large language model architectures and strategies for complex question-answering, focusing on hybrid architectures. Addresses limitations of LLMs for specific, complex queries by exploring specialized approaches and potential integrations.
- 2. Fast-SmartWay: Panoramic-Free End-to-End Zero-Shot Vision-and-Language Navigation
Introduces Fast-SmartWay, an end-to-end zero-shot VLN-CE framework eliminating panoramic views. Achieves faster, more practical navigation by leveraging MLLMs without waypoint predictors, enabling efficient real-world deployment.
- 3. Contextual Tokenization for Graph Inverted Indices
Introduces CORGII, a graph indexing framework for efficient retrieval of graphs containing subgraphs. Leverages contextual representations and inverted indexing to overcome limitations of exhaustive scoring, enabling faster subgraph isomorphism tests in large corpora.
- 4. MARS: Multi-Agent Robotic System with Multimodal Large Language Models for Assistive Intelligence
Presents MARS, a multi-agent robotic system powered by MLLMs for assistive intelligence in smart homes. Addresses risk-aware planning and grounding language plans into skills, enhancing robotic support for people with disabilities.
- 5. Leveraging the Cross-Domain & Cross-Linguistic Corpus for Low Resource NMT: A Case Study On Bhili-Hindi-English Parallel Corpus
Introduces the Bhili-Hindi-English Parallel Corpus (BHEPC), the largest such resource, to address low-resource machine translation challenges. Demonstrates leveraging cross-domain and cross-linguistic data for improved translation quality in underrepresented languages.
- 6. Unified Diffusion VLA: Vision-Language-Action Model via Joint Discrete Denoising Diffusion Process
Introduces Unified Diffusion VLA, a vision-language-action model using joint discrete denoising diffusion. Unifies understanding, generation, and action by processing text and images to produce future images and actions.
- 7. A note on large deviations for interacting particle dynamics for finding mixed Nash equilibria with applications to GANs
Considers a method for finding mixed Nash equilibria in two-layer zero-sum games using entropic regularization and interacting particle dynamics. Explores connections to training generative adversarial networks and reinforcement learning.
- 8. SonarSweep: Fusing Sonar and Vision for Robust 3D Reconstruction via Plane Sweeping
Presents SonarSweep, fusing sonar and vision for robust 3D reconstruction in underwater environments. Addresses limitations of single-modality approaches and flawed fusion techniques for improved accuracy.
- 9. Adversarial Déjà Vu: Jailbreak Dictionary Learning for Stronger Generalization to Unseen Attacks
Proposes jailbreak dictionary learning to enhance LLM robustness against unseen jailbreak attacks. Aims to improve generalization of defenses by learning adversarial perturbations, addressing a critical challenge in AI safety.
- 10. GDROS: A Geometry-Guided Dense Registration Framework for Optical-SAR Images under Large Geometric Transformations
Introduces GDROS, a geometry-guided framework for dense registration of optical-SAR images. Addresses challenges of modal discrepancy and large geometric transformations for improved image fusion and navigation.
- 11. Erasing 'Ugly' from the Internet: Propagation of the Beauty Myth in Text-Image Models
Investigates how generative AI models encode 'beauty' norms and erase 'ugliness'. Studies the implications of these encoded biases, particularly concerning the propagation of Western beauty standards in text-image generation.
- 12. TIR-Bench: A Comprehensive Benchmark for Agentic Thinking-with-Images Reasoning
Presents TIR-Bench, a benchmark for agentic thinking-with-images reasoning. Captures advanced capabilities of models creating and operating tools to transform images for problem-solving, going beyond basic operations.
- 13. Reevaluating Self-Consistency Scaling in Multi-Agent Systems
Reevaluates self-consistency scaling in modern large language models for multi-agent systems. Examines trade-offs of increasing sampled reasoning paths on benchmarks like HotpotQA and Math-500.
- 14. SciTextures: Collecting and Connecting Visual Patterns, Models, and Code Across Science and Art
Introduces SciTextures, a dataset connecting visual patterns with generative models and code across science and art. Enables deeper visual understanding by linking patterns to their underlying generative mechanisms.
- 15. PADBen: A Comprehensive Benchmark for Evaluating AI Text Detectors Against Paraphrase Attacks
Introduces PADBen, a benchmark for evaluating AI text detectors against paraphrase attacks. Reveals that iterative paraphrasing creates an intermediate region evading detection by exploiting semantic displacement while preserving generation patterns.
- 16. GeneFlow: Translation of Single-cell Gene Expression to Histopathological Images via Rectified Flow
Presents GeneFlow, a framework translating single-cell gene expression to histopathological images using rectified flow. Generates high-resolution images with different staining methods to highlight cellular structures.
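GeneFlow's rectified-flow backbone (item 16) is, at sampling time, just an ODE integration from noise to data. A minimal Euler sketch, with `velocity` standing in for the trained network (all names here are illustrative, not GeneFlow's API):

```python
import numpy as np

def rectified_flow_sample(velocity, x0, steps=100):
    """Integrate the flow ODE dx/dt = v(x, t) from noise x0 at t=0
    to a sample at t=1 with plain Euler steps.
    """
    x, dt = np.asarray(x0, dtype=float), 1.0 / steps
    for i in range(steps):
        x = x + dt * velocity(x, i * dt)
    return x
```

Rectified flow trains `velocity` so that trajectories are nearly straight, which is why few Euler steps suffice in practice.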
Natural Language Processing
- 1. Leveraging the Cross-Domain & Cross-Linguistic Corpus for Low Resource NMT: A Case Study On Bhili-Hindi-English Parallel Corpus
Introduces the Bhili-Hindi-English Parallel Corpus (BHEPC), the first large parallel corpus for Bhili, a low-resource language. Leverages cross-domain and cross-linguistic data to improve low-resource Neural Machine Translation (NMT), addressing linguistic diversity challenges in India.
- 2. Complex QA and language models hybrid architectures, Survey
Surveys state-of-the-art large language model (LLM) architectures and strategies for complex question-answering, focusing on hybrid approaches. Addresses LLM limitations for specific, complex queries by exploring architectures capable of handling more challenging information retrieval and synthesis tasks.
- 3. MARS: Multi-Agent Robotic System with Multimodal Large Language Models for Assistive Intelligence
Introduces MARS, a Multi-Agent Robotic System leveraging MLLMs for assistive intelligence in smart homes. Addresses risk-aware planning, user personalization, and grounding language plans into executable skills for people with disabilities.
- 4. PADBen: A Comprehensive Benchmark for Evaluating AI Text Detectors Against Paraphrase Attacks
Introduces PADBen, a benchmark for evaluating AI text detectors against paraphrase attacks. Investigates why iteratively paraphrased AI-generated text evades detection, revealing a 'laundering region' characterized by semantic displacement and preserved generation patterns, enabling development of more robust detectors.
- 5. Applying Medical Imaging Tractography Techniques to Painterly Rendering of Images
Explores applying diffusion tensor imaging (DTI) and tractography techniques to painterly image rendering. Uses a tractography algorithm to place brush strokes mimicking human artists, analogous to fiber tracking in DTI.
- 6. Reevaluating Self-Consistency Scaling in Multi-Agent Systems
Reevaluates self-consistency scaling in multi-agent systems using modern LLMs like Gemini 2.5 on HotpotQA and Math-500. Examines trade-offs of increasing sampled reasoning paths, comparing pooled outputs to single chain-of-thought, to understand performance gains and plateaus in current models.
- 7. Learning to Seek Evidence: A Verifiable Reasoning Agent with Causal Faithfulness Analysis
Proposes an interactive agent that produces verifiable explanations through an auditable action sequence. Learns a policy to strategically seek external visual evidence for diagnostic reasoning, optimized via reinforcement learning.
- 8. Adversarial Déjà Vu: Jailbreak Dictionary Learning for Stronger Generalization to Unseen Attacks
Proposes dictionary learning over known jailbreak attacks to improve LLM generalization to unseen ones. Addresses the critical challenge of defending against novel jailbreaks that bypass safety guardrails and elicit harmful outputs.
- 9. Three-dimensional narrow volume reconstruction method with unconditional stability based on a phase-field Lagrange multiplier approach
Presents an algorithm for reconstructing narrow shells using a phase-field model and Lagrange multipliers. Solves the governing equation enhanced with an edge detection function derived from unsigned distance functions for stability.
- 10. A note on large deviations for interacting particle dynamics for finding mixed Nash equilibria with applications to GANs
Considers a method for finding mixed Nash equilibria in two-layer zero-sum games using entropic regularization. Addresses the key problem of finding equilibrium points in continuous minmax games, with applications to training generative adversarial networks (GANs) and reinforcement learning.
- 11. GDROS: A Geometry-Guided Dense Registration Framework for Optical-SAR Images under Large Geometric Transformations
Introduces GDROS, a geometry-guided dense registration framework for optical-SAR images. Addresses challenges of modal discrepancy and large geometric transformations, improving registration reliability.
- 12. Erasing 'Ugly' from the Internet: Propagation of the Beauty Myth in Text-Image Models
Studies how generative AI models encode 'beauty' and erase 'ugliness', investigating implications for societal norms. Examines the propagation of Western beauty standards in text-image generation, addressing concerns about AI exacerbating negative self-image and body dysmorphia.
- 13. PROPEX-RAG: Enhanced GraphRAG using Prompt-Driven Prompt Execution
Presents PROPEX-RAG, a prompt-driven GraphRAG framework that underscores prompt formulation's significance. Enhances LLMs with external knowledge for intricate reasoning, focusing on prompt design's influence on retrieval and reasoning.
- 14. Contextual Tokenization for Graph Inverted Indices
Introduces CORGII (Contextual Representation of Graphs for Inverted Indexing), a graph indexing framework. Enables efficient retrieval of graphs containing query subgraphs by developing contextual tokenization for graph representations, overcoming limitations of exhaustive scoring in current retrieval systems.
- 15. Fast-SmartWay: Panoramic-Free End-to-End Zero-Shot Vision-and-Language Navigation
Proposes Fast-SmartWay, an end-to-end zero-shot Vision-and-Language Navigation framework eliminating panoramic views. Achieves zero-shot navigation using multimodal LLMs, improving efficiency and real-world applicability by avoiding multi-stage pipelines.
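The paraphrase-attack evaluation PADBen runs (item 4) amounts to tracking a detector's score as text is laundered through repeated paraphrasing. A sketch with `paraphrase` and `detect` as hypothetical stand-ins for a paraphrasing model and an AIGT detector:

```python
def detection_trajectory(text, paraphrase, detect, rounds=5):
    """Record a detector's AI-probability after each paraphrase round.

    PADBen's finding is that scores often drop into an intermediate
    'laundering region' after a few rounds, evading detectors tuned
    for direct AI-generated text.
    """
    scores = [detect(text)]
    for _ in range(rounds):
        text = paraphrase(text)
        scores.append(detect(text))
    return scores
```

Plotting such trajectories per detector is one way to locate the laundering region the benchmark describes.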
Reinforcement Learning
- 1. Contextual Tokenization for Graph Inverted Indices
Introduces CORGII, a graph indexing framework using contextual representations. This enables efficient retrieval of graphs containing specific subgraphs by overcoming limitations of exhaustive scoring in traditional methods for graph databases.
- 2. Bayesian Additive Main Effects and Multiplicative Interaction Models using Tensor Regression for Multi-environmental Trials
Proposes a Bayesian tensor regression model for phenotype prediction considering multiple factors. Incorporates spike-and-slab structures to identify relevant interactions, outperforming existing methods in simulation experiments for multi-environmental trials.
- 3. Complex QA and language models hybrid architectures, Survey
Surveys state-of-the-art LLM architectures and strategies for complex question-answering, focusing on hybrid approaches. Addresses limitations of current LLMs for specific, complex queries by exploring integrated solutions.
- 4. On the Variance, Admissibility, and Stability of Empirical Risk Minimization
Proves that suboptimality in Empirical Risk Minimization (ERM) is due to large bias, not variance. Demonstrates that the variance error term is bounded by the minimax rate under mild assumptions, offering theoretical insights into ERM performance.
- 5. PADBen: A Comprehensive Benchmark for Evaluating AI Text Detectors Against Paraphrase Attacks
Introduces PADBen, a benchmark for evaluating AI text detectors against paraphrasing attacks. Reveals that iterative paraphrasing creates an evasion region, causing detectors to fail catastrophically against AI-generated text.
- 6. Leveraging the Cross-Domain & Cross-Linguistic Corpus for Low Resource NMT: A Case Study On Bhili-Hindi-English Parallel Corpus
Introduces the Bhili-Hindi-English Parallel Corpus (BHEPC), the first large parallel corpus for the low-resource Bhili language. This resource aims to significantly improve machine translation capabilities for underrepresented tribal languages.
- 7. Reevaluating Self-Consistency Scaling in Multi-Agent Systems
Examines the impact of increasing sampled reasoning paths in self-consistency for LLMs on tasks like HotpotQA and Math-500. Reevaluates claims from earlier research regarding performance plateaus and trade-offs in multi-agent systems.
- 8. Breaking the Performance Ceiling in Reinforcement Learning requires Inference Strategies
Explores how inference strategies can overcome performance ceilings in complex RL tasks. Demonstrates that specific strategies can improve zero-shot inference on unseen tasks, enabling more robust real-world applications.
- 9. A note on large deviations for interacting particle dynamics for finding mixed Nash equilibria with applications to GANs
Proposes a method for finding mixed Nash equilibria in two-layer zero-sum games using entropic regularization. This approach connects to training generative adversarial networks and reinforcement learning, addressing equilibrium point finding in continuous minmax games.
- 10. RobustVLA: Robustness-Aware Reinforcement Post-Training for Vision-Language-Action Models
Proposes RobustVLA, an RL post-training framework for VLA models to improve robustness against real-world disturbances. Demonstrates enhanced generalization and resilience in safety-critical robotic manipulation tasks.
- 11. Diversity-Aware Policy Optimization for Large Language Model Reasoning
Investigates the impact of diversity in RL-based training for LLM reasoning. Proposes diversity-aware policy optimization to improve LLM reasoning capabilities and stability.
- 12. Bellman Diffusion Models
Explores using diffusion models to represent the successor state measure of a policy. Enforces Bellman flow constraints, yielding a simple Bellman update on the diffusion step distribution and advancing RL policy modeling.
- 13. MARS-SQL: A multi-agent reinforcement learning framework for Text-to-SQL
Introduces MARS-SQL, a multi-agent RL framework for Text-to-SQL. Combines task decomposition and interactive RL with specialized agents to improve complex query generation and validation.
- 14. Neuro-Symbolic Imitation Learning: Discovering Symbolic Abstractions for Skill Learning
Proposes a neuro-symbolic imitation learning framework for long, multi-step tasks. Discovers symbolic abstractions to sequence skills, enabling robots to learn extended behaviors beyond simple actions.
- 15. Single-agent Reinforcement Learning Model for Regional Adaptive Traffic Signal Control
Introduces a single-agent RL framework for regional adaptive traffic signal control. Addresses scalability challenges of multi-agent systems and improves robustness and generalization for traffic management.
- 16. GrowthHacker: Automated Off-Policy Evaluation Optimization Using Code-Modifying LLM Agents
Presents GrowthHacker, a framework using LLM agents for automated off-policy evaluation. Optimizes experiments using logged data, enabling efficient assessment of technologies without live A/B testing.
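Off-policy evaluation, which GrowthHacker automates (item 16), is classically done with inverse propensity scoring over logged interactions. A textbook IPS sketch under that assumption, not GrowthHacker's implementation:

```python
def ips_estimate(logs, target_policy):
    """Inverse-propensity-scoring estimate of a target policy's value.

    Each log entry is (context, action, reward, logging_prob);
    `target_policy(context, action)` returns the target policy's
    probability of taking that action in that context.
    """
    total = 0.0
    for context, action, reward, logged_p in logs:
        # Reweight each logged reward by the likelihood ratio between
        # the target and logging policies.
        total += (target_policy(context, action) / logged_p) * reward
    return total / len(logs)
```

This is the primitive that lets logged data substitute for a live A/B test; the variance of the estimator is what practical OPE systems then fight to control.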
Robotics & Embodied AI
- 1. A note on large deviations for interacting particle dynamics for finding mixed Nash equilibria with applications to GANs
Proposes a method for finding mixed Nash equilibria in two-layer zero-sum games using entropic regularization. Connects interacting particle dynamics to finding equilibrium points, relevant for GAN training and reinforcement learning.
- 2. Fast-SmartWay: Panoramic-Free End-to-End Zero-Shot Vision-and-Language Navigation
Proposes Fast-SmartWay, an end-to-end zero-shot framework for Vision-and-Language Navigation that eliminates panoramic views and two-stage pipelines. Achieves faster, more practical navigation for embodied agents in continuous environments.
- 3. Contextual Tokenization for Graph Inverted Indices
Introduces CORGII, a graph indexing framework for efficient subgraph isomorphism retrieval. Leverages contextual representation and graph-based inverted indices to enable faster retrieval from large graph corpora.
- 4. MARS: Multi-Agent Robotic System with Multimodal Large Language Models for Assistive Intelligence
Introduces MARS, a Multi-Agent Robotic System powered by MLLMs for assistive intelligence in smart homes. Focuses on risk-aware planning, user personalization, and grounding language plans into executable skills for robots.
- 5. On the Variance, Admissibility, and Stability of Empirical Risk Minimization
Proves that suboptimality of ERM is due to large bias, with variance bounded by the minimax rate. Provides an elementary proof using the probabilistic method for fixed design settings.
- 6. Kinematify: Open-Vocabulary Synthesis of High-DoF Articulated Objects
Presents Kinematify for open-vocabulary synthesis of high-DoF articulated objects. Addresses the challenge of modeling complex systems like robots or objects with many degrees of freedom for physical simulation and motion planning.
- 7. Reevaluating Self-Consistency Scaling in Multi-Agent Systems
Reevaluates self-consistency scaling in multi-agent systems using Gemini 2.5 models. Compares pooling outputs from varying sampled reasoning paths against single chains-of-thought to understand trade-offs in current LLMs.
- 8. SonarSweep: Fusing Sonar and Vision for Robust 3D Reconstruction via Plane Sweeping
Introduces SonarSweep, a framework fusing sonar and vision for robust 3D reconstruction via plane sweeping. Addresses challenges in visually-degraded underwater environments where single-modality approaches fail.
- 9. Leveraging the Cross-Domain & Cross-Linguistic Corpus for Low Resource NMT: A Case Study On Bhili-Hindi-English Parallel Corpus
Introduces the Bhili-Hindi-English Parallel Corpus (BHEPC), the first large parallel corpus for Bhili. Addresses machine translation challenges for underrepresented languages using cross-domain and cross-linguistic data.
- 10. Been There, Scanned That: Nostalgia-Driven LiDAR Compression for Self-Driving Cars
Introduces DejaView, a "nostalgia-driven" LiDAR compression scheme for self-driving cars. Reduces network and storage costs by exploiting interframe redundancy in the terabytes of sensor data generated daily.
- 11. Bayesian Additive Main Effects and Multiplicative Interaction Models using Tensor Regression for Multi-environmental Trials
Proposes a Bayesian tensor regression model for phenotype prediction considering multiple factors. Incorporates spike-and-slab structures to identify relevant interactions for inclusion in the linear predictor.
- 12. LiDAR-VGGT: Cross-Modal Coarse-to-Fine Fusion for Globally Consistent and Metric-Scale Dense Mapping
Proposes LiDAR-VGGT, a framework for globally consistent and metric-scale dense mapping. Fuses LiDAR and vision data using coarse-to-fine techniques to overcome limitations of existing LiDAR-IV systems and 3D vision models.
- 13. PADBen: A Comprehensive Benchmark for Evaluating AI Text Detectors Against Paraphrase Attacks
Introduces PADBen, a benchmark for evaluating AI text detectors against paraphrase attacks. Reveals that iterative paraphrasing creates an intermediate region that evades detection systems designed for directly AI-generated text (AIGT).
- 14. Unified Diffusion VLA: Vision-Language-Action Model via Joint Discrete Denoising Diffusion Process
Presents Unified Diffusion VLA, a vision-language-action model using joint discrete denoising diffusion. Aims to unify understanding, generation, and action for embodied agents, overcoming reliance on external experts or separate image generation/action processes.
- 15. Adversarial Déjà Vu: Jailbreak Dictionary Learning for Stronger Generalization to Unseen Attacks
Investigates dictionary learning for jailbreak attacks on LLMs to improve generalization to unseen attacks. Aims to create more robust defenses against safety guardrail bypasses for harmful outputs.
- 16. Image-based ground distance detection for crop-residue-covered soil
Proposes an image-based ground distance detection method for crop-residue-covered soil. Addresses the challenge of precisely controlling seeding depth by providing ground distance information, overcoming limitations of current sensors.
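The self-consistency setup reevaluated in item 7 pools answers from independently sampled reasoning chains and keeps the majority answer. A minimal sketch of that pooling step, with the chain-of-thought sampler left as an abstract placeholder (not an API from the paper):

```python
from collections import Counter

def self_consistency(sample_answer, n_paths):
    """Sample n_paths reasoning chains and return the majority answer.

    `sample_answer` stands in for one stochastic chain-of-thought run
    that ends in a final answer; it is a placeholder, not an interface
    defined by the paper.
    """
    answers = [sample_answer() for _ in range(n_paths)]
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / n_paths  # majority answer plus its vote share

# Toy sampler replaying a fixed list of final answers.
fixed = iter(["42", "41", "42", "42", "41"])
answer, share = self_consistency(lambda: next(fixed), 5)
print(answer, share)  # 42 0.6
```

The trade-off the paper revisits is exactly how the vote share (and accuracy) behaves as `n_paths` grows, compared to a single chain-of-thought.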
Speech & Audio
- 1. Leveraging the Cross-Domain & Cross-Linguistic Corpus for Low Resource NMT: A Case Study On Bhili-Hindi-English Parallel Corpus
Introduces Bhili-Hindi-English Parallel Corpus (BHEPC), the first large parallel corpus for low-resource Bhili. Enables improved machine translation for underrepresented Indian languages by providing 110,000 curated sentence pairs across three languages.
- 2. As Good as It KAN Get: High-Fidelity Audio Representation
Introduces Kolmogorov-Arnold Networks (KANs) as Implicit Neural Representations for audio. Achieves state-of-the-art fidelity with lower Log-Spectral Distance and higher Perceptual Evaluation of Speech Quality (PESQ) scores, enabling more efficient audio representation.
- 3. A note on large deviations for interacting particle dynamics for finding mixed Nash equilibria with applications to GANs
Proposes a method for finding mixed equilibria in two-layer zero-sum games using entropic regularization. Addresses key problems in training generative adversarial networks and reinforcement learning by focusing on mixed equilibrium points.
- 4. A Topology-Aware Graph Convolutional Network for Human Pose Similarity and Action Quality Assessment
Proposes a topology-aware Graph Convolutional Network (GCN) for pose similarity and action quality assessment. Uses a Siamese architecture with contrastive regression, outperforming coordinate-based baselines for skeletal joint analysis.
- 5. Contextual Tokenization for Graph Inverted Indices
Introduces CORGII, a graph indexing framework for efficient subgraph isomorphism retrieval. Enables faster retrieval from large graph corpora by using contextual graph representations and inverted indices, overcoming limitations of exhaustive scoring.
- 6. SegDebias: Test-Time Bias Mitigation for ViT-Based CLIP via Segmentation
Introduces SegDebias, a test-time bias mitigation method for Vision-Language Models using segmentation. Avoids training data and group labels, offering practical debiasing for real-world applications.
- 7. PADBen: A Comprehensive Benchmark for Evaluating AI Text Detectors Against Paraphrase Attacks
Proposes PADBen, a benchmark to evaluate AI text detectors against paraphrase attacks. Reveals that iterative paraphrasing creates an intermediate laundering region that evades detection, highlighting a critical weakness in current AI-generated text identification systems.
- 8. CoT-Saliency: Unified Chain-of-Thought Reasoning for Heterogeneous Saliency Tasks
Presents the first unified framework for Salient Object Detection, Co-Salient Object Detection, and Salient Instance Segmentation using Chain-of-Thought reasoning in Vision-Language Models.
- 9. On the Variance, Admissibility, and Stability of Empirical Risk Minimization
Proves that suboptimality in Empirical Risk Minimization (ERM) is primarily due to large bias, not variance. Provides theoretical grounding for understanding ERM's limitations and offers insights into its fixed design and random design settings.
- 10. AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement
Introduces AnyEnhance, a unified generative model for voice enhancement (speech and singing). Handles denoising, dereverberation, and more via prompt-guidance and self-critic, enabling flexible voice processing.
- 11. Multi-head Temporal Latent Attention
Proposes Multi-head Temporal Latent Attention (MTLA), which compresses the KV cache along the temporal dimension during LLM inference, significantly reducing the memory footprint and improving decoding efficiency.
- 12. Complex QA and language models hybrid architectures, Survey
Surveys state-of-the-art LLM architectures and strategies for complex question-answering using hybrid approaches. Addresses the limitations of current LLMs for specific, complex queries by reviewing methods designed to enhance their problem-solving capabilities.
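The Log-Spectral Distance reported in item 2 is a standard audio fidelity metric: the RMS difference, in dB, between the log power spectra of reference and reconstructed signals, averaged over frames. A minimal NumPy sketch over precomputed magnitude spectrograms (the STFT step is omitted):

```python
import numpy as np

def log_spectral_distance(ref_mag, est_mag, eps=1e-10):
    """Log-Spectral Distance in dB between two magnitude spectrograms.

    ref_mag, est_mag: arrays of shape (frames, bins) holding STFT
    magnitudes. Per frame, take the RMS of the log-power difference
    in dB, then average over frames. eps guards against log(0).
    """
    ref_db = 10.0 * np.log10(ref_mag**2 + eps)
    est_db = 10.0 * np.log10(est_mag**2 + eps)
    per_frame = np.sqrt(np.mean((ref_db - est_db) ** 2, axis=1))
    return float(np.mean(per_frame))

# Identical spectra give distance 0; a uniform 2x magnitude error is a
# constant 20*log10(2) ≈ 6.02 dB offset.
ref = np.abs(np.random.default_rng(0).standard_normal((4, 257))) + 0.1
print(log_spectral_distance(ref, ref))                # 0.0
print(round(log_spectral_distance(ref, 2 * ref), 2))  # 6.02
```

Lower values mean the reconstructed spectrum tracks the reference more closely, which is the sense in which the KAN representation in item 2 improves fidelity.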