# Academic Research Intelligence
Deep dive into AI research papers for researchers and academics
---
Executive Summary
- 1. Can LLMs subtract numbers?
Investigates subtraction accuracy in eight LLMs, finding it lags behind addition. Errors in (a-b) are consistently related to errors in (b-a), suggesting models struggle with non-commutativity. This highlights a limitation in basic arithmetic reasoning for LLMs.
- 2. Enhancing Federated Learning Privacy with QUBO
Proposes a QUBO formulation to enhance privacy in federated learning by bounding the risk of membership inference attacks. This method aims to improve data protection while maintaining model utility in distributed training scenarios.
- 3. Fast, Private, and Protected: Safeguarding Data Privacy and Defending Against Model Poisoning Attacks in Federated Learning
Introduces Fast, Private, and Protected (FPP), a novel approach for federated learning that safeguards data privacy and defends against model poisoning attacks. It aims to ensure secure and robust distributed model training.
- 4. IG-Pruning: Input-Guided Block Pruning for Large Language Models
Proposes IG-Pruning, a novel input-aware method for pruning transformer layers in LLMs. It dynamically removes layers based on input, reducing computational costs for efficient inference without significant performance degradation.
- 5. LTD-Bench: Evaluating Large Language Models by Letting Them Draw
Introduces LTD-Bench, a benchmark for evaluating LLMs' spatial reasoning capabilities through drawing. It addresses the limitations of opaque numerical metrics by providing an intuitive understanding of model abilities for physical world applications.
- 6. In Good GRACEs: Principled Teacher Selection for Knowledge Distillation
Proposes GRACE, a lightweight score to quantify teacher model effectiveness for student model distillation. It measures distributional properties of student gradients without a verifier, enabling principled teacher selection for efficient knowledge transfer.
- 7. SEAL - A Symmetry EncourAging Loss for High Energy Physics
Introduces SEAL, a symmetry-encouraging loss function for high energy physics. It improves robustness and data efficiency of machine learning models by explicitly respecting physical symmetries, even with experimental imperfections.
- 8. Optimizing Attention on GPUs by Exploiting GPU Architectural NUMA Effects
Addresses GPU NUMA effects in large-scale attention workloads by proposing Swizzle, a novel kernel scheduling strategy. It exploits NUMA-aware locality to optimize attention performance, mitigating memory latency and bandwidth variations.
- 9. Eliminating Multi-GPU Performance Taxes: A Systems Approach to Efficient Distributed LLMs
Introduces the 'Three Taxes' framework to analyze performance inefficiencies in distributed LLMs. Proposes moving beyond BSP to achieve efficient multi-GPU inference by addressing bulk synchronous, locality, and kernel launch overheads.
- 10. PrivGNN: High-Performance Secure Inference for Cryptographic Graph Neural Networks
Proposes PrivGNN, a high-performance secure inference protocol for graph neural networks. It addresses the challenge of securing GNNs and graph data in privacy-critical cloud environments, enabling secure analysis of graph-structured data.
- 11. An Automated Framework for Strategy Discovery, Retrieval, and Evolution in LLM Jailbreak Attacks
Reveals a jailbreak strategy that evades defenses by extracting information from failed attacks and evolving itself. It provides an automated framework for discovering, retrieving, and evolving strategies to probe LLM vulnerabilities.
- 12. AutoAdv: Automated Adversarial Prompting for Multi-Turn Jailbreaking of Large Language Models
Presents AutoAdv, a training-free framework for automated multi-turn jailbreaking of LLMs. It achieves high attack success rates by combining adaptive adversarial prompting and prompt refinement, improving LLM security analysis.
- 13. AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench
Formalizes AI research agents as search policies navigating solution spaces using operators. Focuses on improving agent performance in MLE-bench by enhancing search, exploration, and generalization for solving real-world ML problems.
- 14. Multi-Personality Generation of LLMs at Decoding-time
Proposes a novel Multi-Personality Generation (MPG) framework for LLMs at decoding time. It flexibly controls multiple personalities without retraining, enhancing adaptability and robustness for user-specific applications.
- 15. Rethinking LLM Human Simulation: When a Graph is What You Need
Identifies a class of simulation problems where Graph Neural Networks (GNNs) outperform LLMs. Introduces Graph-based Models (GEMs) that match or surpass LLM baselines for human simulation despite being smaller.
- 16. The Sequential Edge: Inverse-Entropy Voting Beats Parallel Self-Consistency at Matched Compute
Compares sequential and parallel self-consistency for LLM reasoning, finding sequential voting with inverse entropy outperforms parallel methods at equal compute. This demonstrates a more efficient scaling strategy for reasoning tasks.
- 17. Path-Consistency with Prefix Enhancement for Efficient Inference in LLMs
Introduces path-consistency, leveraging confidence of earlier answers to guide generation and enhance LLM inference efficiency. It identifies promising prefixes to reduce computational cost and time compared to standard self-consistency.
- 18. On Extending Direct Preference Optimization to Accommodate Ties
Derives and investigates two DPO variants that explicitly model ties in pairwise comparisons. Experiments show explicit tie handling can be added without performance degradation, improving DPO's robustness in preference learning.
- 19. Regularization Through Reasoning: Systematic Improvements in Language Model Classification via Explanation-Enhanced Fine-Tuning
Fine-tunes LLMs for classification by attaching explanations to labels, systematically improving naturalness, comprehensiveness, and adherence. This explanation-enhanced approach yields better conversational response quality across diverse datasets.
- 20. ExplicitLM: Decoupling Knowledge from Parameters via Explicit Memory Banks
Proposes ExplicitLM, a novel architecture with a million-scale external memory bank storing human-readable knowledge. This decouples knowledge from parameters, enabling direct inspection and modification for improved interpretability and updates.
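
To make the inverse-entropy voting idea in item 16 concrete, here is a minimal sketch rather than the paper's implementation. It assumes each sampled reasoning chain yields a final answer plus a mean token entropy, and it weights each vote by the reciprocal of that entropy. The function name `inverse_entropy_vote`, the `eps` smoothing term, and the sample values are all hypothetical.

```python
from collections import defaultdict


def inverse_entropy_vote(samples, eps=1e-6):
    """Return the answer with the largest total inverse-entropy weight.

    samples: iterable of (answer, mean_token_entropy) pairs.
    eps: hypothetical smoothing term that avoids division by zero.
    """
    weights = defaultdict(float)
    for answer, entropy in samples:
        if entropy < 0:
            raise ValueError("entropy must be non-negative")
        # Low-entropy (more confident) chains contribute larger votes.
        weights[answer] += 1.0 / (entropy + eps)
    if not weights:
        raise ValueError("no samples given")
    return max(weights, key=weights.get)


# Illustrative values only: two uncertain votes for "41", one confident vote for "42".
samples = [("41", 2.0), ("41", 2.5), ("42", 0.4)]
winner = inverse_entropy_vote(samples)
```

With these made-up numbers a plain majority vote would pick "41", while the entropy-weighted vote favours the single confident chain.
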
AI for Science
- 1. Are Foundational Atomistic Models Reliable for Finite-Temperature Molecular Dynamics?
Evaluates foundational atomistic models for molecular dynamics simulations, comparing their reliability to quantum-mechanical accuracy and classical efficiency. Adopts a practitioner's viewpoint to assess universal force fields across the periodic table.
- 2. CytoNet: A Foundation Model for the Human Cerebral Cortex
Introduces CytoNet, a foundation model encoding high-resolution microscopic cerebral cortex images into expressive features via self-supervised learning. Enforces spatial proximity as a training signal to enable comprehensive brain analyses.
- 3. Scalable and Cost-Efficient de Novo Template-Based Molecular Generation
Proposes Recursive Cost Guidance for template-based GFlowNets, addressing challenges in minimizing synthesis cost, scaling to large libraries, and utilizing small fragment sets. Aims for efficient molecular generation in drug design.
- 4. Causal Graph Neural Networks for Healthcare
Combines graph-based representations of biomedical data with causal inference to address distribution shift, discrimination, and inscrutability in healthcare AI. Aims to learn causal mechanisms rather than just statistical associations.
- 5. Learning phases with Quantum Monte Carlo simulation cell
Proposes using Quantum Monte Carlo (QMC) simulation cells as machine learning input data. Demonstrates effectiveness in capturing phase transitions and enabling regression tasks using supervised ML.
- 6. Cross-modal Diffusion Modelling for Super-resolved Spatial Transcriptomics
Proposes a cross-modal diffusion model for super-resolving spatial transcriptomics data by integrating histology images and gene expressions. Addresses low resolution and restoration uncertainty in ST platforms.
AI Safety & Ethics
- 1. An Automated Framework for Strategy Discovery, Retrieval, and Evolution in LLM Jailbreak Attacks
Proposes an automated framework that discovers, retrieves, and evolves jailbreak strategies for LLMs by extracting information from failed attacks. Demonstrates a strategy that evades current defenses and self-evolves, enhancing LLM security research.
- 2. The Realignment Problem: When Right becomes Wrong in LLMs
Identifies and addresses the 'Alignment-Reality Gap' in LLMs, where models misalign with evolving norms. Proposes a framework to update LLMs efficiently without costly re-annotation, aiming for more reliable long-term use.
- 3. ValueCompass: A Framework for Measuring Contextual Value Alignment Between Human and LLMs
Introduces ValueCompass, a framework grounded in psychological theory for measuring contextual value alignment between humans and LLMs. Enables systematic assessment of AI alignment with diverse individual and societal values.
- 4. I Want to Break Free! Persuasion and Anti-Social Behavior of LLMs in Multi-Agent Settings with Social Hierarchy
Analyzes persuasion and anti-social behavior of LLM agents in hierarchical multi-agent settings. Investigates emergent phenomena and potential risks through simulated interactions, offering insights into AI agent behavior.
- 5. CytoNet: A Foundation Model for the Human Cerebral Cortex
Introduces CytoNet, a foundation model encoding high-resolution cerebral cortex images into expressive features using self-supervised learning. Enables comprehensive brain analyses by capturing cellular architecture and spatial proximity.
- 6. Feature compression is the root cause of adversarial fragility in neural network classifiers
Explains adversarial fragility in neural networks by identifying feature compression as the root cause. Provides a matrix-theoretic explanation showing how robustness degrades with input compression.
- 7. MammoClean: Toward Reproducible and Bias-Aware AI in Mammography through Dataset Harmonization
Presents MammoClean, a framework for standardizing mammography datasets and quantifying biases. Addresses heterogeneity in data quality and metadata to improve generalizability and clinical deployment of AI models.
- 8. LiveSecBench: A Dynamic and Culturally-Relevant AI Safety Benchmark for LLMs in Chinese Context
Proposes LiveSecBench, a dynamic benchmark for Chinese-context LLM safety evaluation. Covers legality, ethics, factuality, privacy, adversarial robustness, and reasoning safety rooted in Chinese frameworks.
AI Theory & Foundations
- 1. Training Convolutional Neural Networks with the Forward-Forward algorithm
Proposes the Forward-Forward (FF) algorithm as a biologically inspired alternative to backpropagation for training CNNs. Demonstrates FF's effectiveness by extending the paradigm to CNNs, offering a novel approach to deep learning training.
- 2. A Kullback-Leibler divergence method for input-system-state identification
Introduces a Kullback-Leibler divergence method within the Kalman filter framework for input-system-state identification. Addresses uncertainty by using information from prior to posterior distributions, improving parameter-state estimation.
- 3. An unscented Kalman filter method for real time input-parameter-state estimation
Examines an unscented Kalman filter method for real-time input-parameter-state estimation in linear and nonlinear systems. Estimates unknown inputs in two stages, demonstrating effectiveness using perturbation analysis.
- 4. Limit Theorems for Stochastic Gradient Descent in High-Dimensional Single-Layer Networks
Studies high-dimensional scaling limits of online stochastic gradient descent (SGD) for single-layer networks. Focuses on critical step-size regimes, revealing new correction terms beyond ballistic limits.
- 5. Improving Bayesian inference in PTA data analysis: importance nested sampling with Normalizing Flows
Enhances Bayesian inference for pulsar timing array data using importance nested sampling with normalizing flows. Achieves accurate posteriors and reliable evidence estimates, with substantial gains in computational scaling and stability.
- 6. Stability of mixed-state phases under weak decoherence
Proves Gibbs states of classical and commuting-Pauli Hamiltonians are stable under weak local decoherence. Shows decoherence effects can be locally reversed, applying to critical points and ordered phases.
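
Items 2 and 3 above both build on Kalman-style updates. As a rough illustration of measuring the information gained between prior and posterior, the sketch below runs a scalar Kalman measurement update and evaluates the closed-form KL divergence between the Gaussian posterior and prior. This is a textbook example, not either paper's method; the function names and numeric values are hypothetical.

```python
import math


def kalman_update(prior_mean, prior_var, z, meas_var):
    """Scalar Kalman measurement update for a direct observation z = x + noise."""
    gain = prior_var / (prior_var + meas_var)
    post_mean = prior_mean + gain * (z - prior_mean)
    post_var = (1.0 - gain) * prior_var
    return post_mean, post_var


def gaussian_kl(mean_p, var_p, mean_q, var_q):
    """KL(p || q) for univariate Gaussians p and q."""
    return 0.5 * (
        math.log(var_q / var_p)
        + (var_p + (mean_p - mean_q) ** 2) / var_q
        - 1.0
    )


# Illustrative values only.
prior_mean, prior_var = 0.0, 4.0
post_mean, post_var = kalman_update(prior_mean, prior_var, z=2.0, meas_var=1.0)

# Information gained by the update, measured as KL(posterior || prior).
info_gain = gaussian_kl(post_mean, post_var, prior_mean, prior_var)
```
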
Computer Vision
- 1. RoMA: Scaling up Mamba-based Foundation Models for Remote Sensing
Introduces RoMA, scaling Mamba-based foundation models for remote sensing with linear complexity. Addresses scalability barriers of Vision Transformers for large models and high-resolution images in supervised tasks.
- 2. HAGI++: Head-Assisted Gaze Imputation and Generation
Proposes HAGI++, a diffusion-based multimodal approach for imputing and generating missing gaze data in real-world and XR environments. Addresses challenges like blinks and tracking errors, enabling better behavioral research and HCI applications.
- 3. Resource-efficient Automatic Refinement of Segmentations via Weak Supervision from Light Feedback
Introduces a resource-efficient method for automatic segmentation refinement using weak supervision from light feedback. Addresses limitations of foundation models in medical imaging by improving performance with less labor-intensive annotation.
- 4. Training Convolutional Neural Networks with the Forward-Forward algorithm
Extends the Forward-Forward (FF) algorithm for training Convolutional Neural Networks (CNNs). Proposes a biologically inspired alternative to backpropagation, enabling CNN training with locally defined goodness functions.
- 5. Deep Fourier-embedded Network for RGB and Thermal Salient Object Detection
Proposes FreqSal, a purely Fourier Transform-based model for RGB-T salient object detection. Overcomes quadratic complexity limitations of Transformer models, enabling efficient bimodal feature fusion for high-resolution images.
- 6. Mobile Robotic Multi-View Photometric Stereo
Introduces a new mobile robotic system for Multi-View Photometric Stereo (MVPS) 3D acquisition. Enables MVPS benefits on movable platforms, expanding 3D acquisition capabilities for mobile robotics applications.
- 7. Prompt to Restore, Restore to Prompt: Cyclic Prompting for Universal Adverse Weather Removal
Proposes CyclicPrompt, a cyclic prompting approach for Universal Adverse Weather Removal (UAWR). Enhances effectiveness, adaptability, and generalizability of weather-free image restoration using prompt learning with vision-language models.
- 8. Breaking Down Monocular Ambiguity: Exploiting Temporal Evolution for 3D Lane Detection
Proposes a Geometry-aware Temporal Aggregation Network to address monocular ambiguity in 3D lane detection. Exploits temporal evolution information to improve geometric predictions and lane integrity, especially for distant lanes.
- 9. GeoSDF: Plane Geometry Diagram Synthesis via Signed Distance Field
Proposes GeoSDF, a framework for synthesizing plane geometry diagrams by representing them with Signed Distance Fields (SDFs).
- 10. Crucial-Diff: A Unified Diffusion Model for Crucial Image and Annotation Synthesis in Data-scarce Scenarios
Proposes Crucial-Diff, a unified diffusion model for synthesizing crucial images and annotations in data-scarce scenarios. Addresses model overfitting and dataset imbalance by generating targeted training samples to improve detection and segmentation.
Efficient AI
- 1. From the Laboratory to Real-World Application: Evaluating Zero-Shot Scene Interpretation on Edge Devices for Mobile Robotics
Evaluates zero-shot scene interpretation on edge devices for mobile robotics using LLMs and VLMs. Demonstrates capabilities in perception, reasoning, and zero-shot tasks, enabling domain-specific applications with potential for real-world deployment.
- 2. IG-Pruning: Input-Guided Block Pruning for Large Language Models
Proposes IG-Pruning, an input-aware block pruning method for LLMs to reduce computational costs. Achieves significant reduction in computational demands while maintaining performance across different tasks and inputs, enabling more efficient inference.
- 3. Path-Consistency with Prefix Enhancement for Efficient Inference in LLMs
Introduces path-consistency for efficient LLM inference by leveraging prefix confidence to guide generation. Reduces computational expense and time compared to self-consistency by identifying promising prefixes, enabling faster and more efficient reasoning.
- 4. Eliminating Multi-GPU Performance Taxes: A Systems Approach to Efficient Distributed LLMs
Proposes a systems approach to efficient distributed LLM inference across multiple GPUs by moving beyond the bulk synchronous parallel model. Addresses performance inefficiencies and bottlenecks for improved scalability and resource utilization.
- 5. Flashlight: PyTorch Compiler Extensions to Accelerate Attention Variants
Introduces Flashlight, PyTorch compiler extensions for efficient attention variants. Leverages tiling and kernel fusion to optimize attention mechanisms, supporting new variants for enhanced model quality and efficiency in LLMs.
- 6. Apriel-H1: Towards Efficient Enterprise Reasoning Models
Develops Apriel-H1 for efficient enterprise reasoning models by addressing transformer complexity. Reduces quadratic time/memory complexity and caching overheads to improve throughput and scalability for agentic tasks and high request loads.
Generative AI
- 1. HAGI++: Head-Assisted Gaze Imputation and Generation
Proposes HAGI++, a multimodal diffusion-based approach for gaze data imputation, addressing missing values due to blinks and errors. Achieves better imputation and generation for mobile eye tracking applications.
- 2. 3DBonsai: Structure-Aware Bonsai Modeling Using Conditioned 3D Gaussian Splatting
Introduces 3DBonsai, a novel text-to-3D framework for generating complex bonsai structures using conditioned 3D Gaussian Splatting. Addresses limitations of prior methods lacking detailed structural information.
- 3. Crucial-Diff: A Unified Diffusion Model for Crucial Image and Annotation Synthesis in Data-scarce Scenarios
Proposes Crucial-Diff, a unified diffusion model for synthesizing crucial images and annotations in data-scarce scenarios. Addresses repetitive and simplistic synthetic samples by targeting downstream model weaknesses.
- 4. A Practical Investigation of Spatially-Controlled Image Generation with Transformers
Investigates spatially-controlled image generation using transformers, focusing on practical aspects. Provides a detailed and fair scientific comparison of methods for fine-grained image control.
- 5. Light Future: Multimodal Action Frame Prediction via InstructPix2Pix
Proposes Light Future, an efficient approach for robot action prediction by adapting InstructPix2Pix. Offers reduced computational cost and latency compared to conventional video prediction models.
- 6. Prompt to Restore, Restore to Prompt: Cyclic Prompting for Universal Adverse Weather Removal
Introduces CyclicPrompt, an innovative cyclic prompting approach for universal adverse weather removal. Enhances effectiveness, adaptability, and generalizability of prompt learning for image restoration.
- 7. Robust Identity Perceptual Watermark Against Deepfake Face Swapping
Proposes a robust identity perceptual watermark to combat deepfake face swapping. Addresses performance damping and generalizability issues faced by passive detection models.
- 8. Training Convolutional Neural Networks with the Forward-Forward algorithm
Extends the Forward-Forward (FF) algorithm to Convolutional Neural Networks (CNNs). Explores a biologically inspired alternative to backpropagation for training CNNs.
- 9. MediQ-GAN: Quantum-Inspired GAN for High Resolution Medical Image Generation
Presents MediQ-GAN, a quantum-inspired Generative Adversarial Network for high-resolution medical image generation. Addresses computational and sample resource demands of classical generative models.
- 10. GeoSDF: Plane Geometry Diagram Synthesis via Signed Distance Field
Proposes GeoSDF, a framework for synthesizing plane geometry diagrams via Signed Distance Fields (SDFs).
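
As background for the signed distance field named in GeoSDF's title, here is a minimal sketch of distance functions for two plane-geometry primitives, a circle and a line segment. It only illustrates the representation (for the circle: negative inside, zero on the boundary, positive outside) and is not GeoSDF's pipeline; the function names are hypothetical.

```python
import math


def sdf_circle(px, py, cx, cy, r):
    """Signed distance from (px, py) to a circle: negative inside, positive outside."""
    return math.hypot(px - cx, py - cy) - r


def dist_segment(px, py, ax, ay, bx, by):
    """Unsigned distance from (px, py) to segment AB (a segment has no interior)."""
    abx, aby = bx - ax, by - ay
    length_sq = abx * abx + aby * aby
    if length_sq == 0.0:
        # Degenerate segment: A and B coincide.
        return math.hypot(px - ax, py - ay)
    # Project the point onto the line through A and B, then clamp to the segment.
    t = ((px - ax) * abx + (py - ay) * aby) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * abx), py - (ay + t * aby))


inside = sdf_circle(0.0, 0.0, 0.0, 0.0, 1.0)    # centre of a unit circle
on_edge = sdf_circle(1.0, 0.0, 0.0, 0.0, 1.0)   # point on the boundary
above_mid = dist_segment(1.0, 1.0, 0.0, 0.0, 2.0, 0.0)
```
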
Graph Neural Networks
- 1. Rethinking LLM Human Simulation: When a Graph is What You Need
Proposes a Graph Neural Network (GNN) approach for human simulation, matching or surpassing LLM baselines on choice-among-discrete-options tasks. Demonstrates GNNs can be orders of magnitude smaller than LLMs while achieving comparable performance.
- 2. Link prediction Graph Neural Networks for structure recognition of Handwritten Mathematical Expressions
Proposes a GNN-based approach for Handwritten Mathematical Expression recognition by modeling expressions as graphs. Uses a deep BLSTM for initial graph formation, refined by GNN link prediction for structure recognition.
- 3. Causal Graph Neural Networks for Healthcare
Combines causal inference with graph neural networks to address brittleness and discrimination in healthcare AI. Aims to learn causal mechanisms rather than just statistical associations for improved reliability and interpretability.
- 4. Predicting Microbial Interactions Using Graph Neural Networks
Utilizes Graph Neural Networks to predict interspecies interactions in microbial communities using monoculture growth capabilities, interactions, and phylogeny data. Aims to capture critical factors for community structure and activity.
- 5. PrivGNN: High-Performance Secure Inference for Cryptographic Graph Neural Networks
Develops secure inference protocols for Graph Neural Networks (GNNs) and graph-structured data in privacy-critical environments. Addresses the underexplored challenge of securing GNNs, focusing on high-performance protocols.
- 6. Influence-aware Causal Autoencoder Network for Node Importance Ranking in Complex Networks
Proposes an influence-aware causal autoencoder network for node importance ranking in complex networks. Aims to rank nodes without relying directly on network topology, addressing privacy concerns and improving generalization.
- 7. UFGraphFR: Graph Federation Recommendation System based on User Text description features
Derives insights from user text descriptions to build a federated recommendation system based on graph structures. Addresses data localization challenges in federated learning by capturing global user relationships.
- 8. IVGAE-TAMA-BO: A novel temporal dynamic variational graph model for link prediction in global food trade networks with momentum structural memory and Bayesian optimization
Introduces a novel dynamic variational graph model for link prediction in global food trade networks. Captures temporal patterns using momentum structural memory and Bayesian optimization to model evolving network structures.
Large Language Models
- 1. TPS-Bench: Evaluating AI Agents' Tool Planning & Scheduling Abilities in Compounding Tasks
Introduces TPS-Bench, a benchmark for evaluating AI agents' tool planning and scheduling for compounding tasks. Assesses LLM agents' ability to select and order tools for efficient real-world problem-solving.
- 2. ExplicitLM: Decoupling Knowledge from Parameters via Explicit Memory Banks
Proposes ExplicitLM, an architecture with explicit external memory banks for human-readable knowledge. Enables direct inspection and modification of knowledge, improving LLM interpretability and updateability.
- 3. Regularization Through Reasoning: Systematic Improvements in Language Model Classification via Explanation-Enhanced Fine-Tuning
Proposes regularization through reasoning by attaching explanations to labels during LLM fine-tuning. Achieves systematic improvements in classification performance and naturalness across diverse datasets.
- 4. Can LLMs subtract numbers?
Conducts a systematic study on LLM subtraction capabilities, revealing significantly lower accuracy compared to addition. Identifies systematic errors and proposes insights into LLM arithmetic limitations.
- 5. Repetitions are not all alike: distinct mechanisms sustain repetition in language models
Investigates distinct mechanisms behind LLM repetition, contrasting conditions eliciting repetitive loops. Reveals that repetitions arise from different underlying causes, offering insights into LLM behavior and training.
- 6. GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution
Introduces GeoLLaVA-8K, a multimodal LLM for remote sensing, trained on novel high-resolution datasets. Achieves state-of-the-art performance on VQA tasks, enabling detailed Earth observation analysis.
- 7. LTD-Bench: Evaluating Large Language Models by Letting Them Draw
Proposes LTD-Bench, a new benchmark for evaluating LLM spatial reasoning by requiring them to draw. Demonstrates current LLMs struggle with spatial tasks, highlighting a critical evaluation gap for physical world understanding.
- 8. IG-Pruning: Input-Guided Block Pruning for Large Language Models
Introduces IG-Pruning, an input-aware method for pruning LLM layers to improve efficiency. Achieves significant reductions in computation while maintaining performance across tasks, enabling practical LLM deployment.
- 9. Multi-Personality Generation of LLMs at Decoding-time
Proposes a decoding-time framework for multi-personality LLM generation without retraining. Achieves flexible control over multiple attributes, enhancing LLM adaptability and user experience.
- 10. Large Language Models are Unreliable for Cyber Threat Intelligence
Evaluates LLM reliability for Cyber Threat Intelligence, quantifying consistency and confidence. Finds LLMs are unreliable for CTI tasks, highlighting limitations in practical application and the need for robust evaluation.
Multimodal Learning
- 1. GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution
Introduces GeoLLaVA-8K, a multimodal LLM adapted for 8K resolution remote sensing data. Proposes SuperRS-VQA and HighRS-VQA datasets to address data scarcity, enabling improved Earth observation analysis.
- 2. SigmaCollab: An Application-Driven Dataset for Physically Situated Collaboration
Introduces SigmaCollab, a multimodal dataset for physically situated human-AI collaboration. Captures audio, egocentric video, depth, and tracking data from 85 sessions to enable research on assistive AI agents.
- 3. Talk2Event: Grounded Understanding of Dynamic Scenes from Event Cameras
Introduces Talk2Event, a benchmark for language-driven object grounding in event-based perception. Provides over 30,000 validated referring expressions with grounding attributes for dynamic scene understanding.
- 4. Prompt to Restore, Restore to Prompt: Cyclic Prompting for Universal Adverse Weather Removal
Proposes CyclicPrompt, a cyclic prompting approach for universal adverse weather removal. Leverages vision-language models with degradation-aware prompts to enhance image restoration effectiveness and generalizability.
- 5. HAGI++: Head-Assisted Gaze Imputation and Generation
Introduces HAGI++, a multimodal diffusion-based approach for gaze imputation and generation. Addresses missing eye-tracking data due to blinks or errors, enhancing analysis for XR and behavioral research.
- 6. Deep Fourier-embedded Network for RGB and Thermal Salient Object Detection
Proposes Deep Fourier-embedded Network (FreqSal), a Fourier Transform-based model for RGB-T salient object detection. Addresses memory limitations of Transformer models by leveraging Fourier transforms for bimodal feature fusion.
- 7. NMCSE: Noise-Robust Multi-Modal Coupling Signal Estimation Method via Optimal Transport for Cardiovascular Disease Detection
Introduces NMCSE, a noise-robust multi-modal method using optimal transport for coupling signal estimation from ECG and PCG. Encodes electrophysiological and hemodynamic interplay for enhanced cardiac function representation.
- 8. DetectiumFire: A Comprehensive Multi-modal Dataset Bridging Vision and Language for Fire Understanding
Introduces DetectiumFire, a large-scale multimodal dataset for fire understanding. Comprises 22.5k images and 2.5k videos, bridging vision and language to address the lack of annotated fire domain data for multimodal models.
Natural Language Processing
- 1. GraphChain: Large Language Models for Large-scale Graph Analysis via Tool Chaining
Introduces GraphChain, a framework enabling LLMs to analyze large graphs via dynamic tool sequences. Achieves balanced tool usage and graph distillation, mimicking human exploratory intelligence for complex graph analysis.
- 2. Deciphering Personalization: Towards Fine-Grained Explainability in Natural Language for Personalized Image Generation Models
Proposes a fine-grained natural language explainability method for personalized image generation models. Enables users to understand how personalization affects image generation, addressing limitations of current coarse-grained explanations.
- 3. H-NeiFi: Non-Invasive and Consensus-Efficient Multi-Agent Opinion Guidance
Introduces H-NeiFi, a non-invasive framework for multi-agent opinion guidance towards consensus. Utilizes opinion guidance without direct intervention, promoting global consensus efficiently and preserving user autonomy.
- 4. GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution
Introduces GeoLLaVA-8K, scaling remote-sensing multimodal LLMs to 8K resolution. Presents SuperRS-VQA and HighRS-VQA datasets to address data scarcity and token explosion, enabling improved Earth observation analysis.
- 5. Hybrid Retrieval-Augmented Generation Agent for Trustworthy Legal Question Answering in Judicial Forensics
Proposes a hybrid RAG agent for trustworthy legal QA in judicial forensics. Integrates retrieval with LLM reasoning to combat hallucination and provide verifiable, up-to-date legal information, ensuring reliable consultation.
- 6. Regularization Through Reasoning: Systematic Improvements in Language Model Classification via Explanation-Enhanced Fine-Tuning
Investigates using explanations attached to labels during LLM fine-tuning for classification. Demonstrates systematic improvements in conversational response quality across various datasets and tasks.
- 7. Speech-DRAME: A Framework for Human-Aligned Benchmarks in Speech Role-Play
Introduces Speech-DRAME, a framework for human-aligned speech role-play benchmarks. Addresses the limitations of audio LLMs (ALLMs) by incorporating paralinguistic cues and real-world role context for more accurate evaluation.
- 8. Modulation of temporal decision-making in a deep reinforcement learning agent under the dual-task paradigm
Explores temporal processing interference in a dual-task paradigm using DRL agents. Investigates how concurrent tasks modulate time production and number comparison, providing insights into AI decision-making mechanisms.
Reinforcement Learning
- 1. Natural-gas storage modelling by deep reinforcement learning
Introduces GasRL, a simulator coupling a calibrated natural gas market with deep reinforcement learning-trained storage policies. Achieves superior performance with SAC, optimizing stockpile management to affect equilibrium prices and market dynamics.
- 2. Reinforcement learning based data assimilation for unknown state model
Proposes a reinforcement learning-based approach for data assimilation with unknown system dynamics. Leverages RL to construct a surrogate state transition model, overcoming reliance on pre-computed, noise-free training datasets.
- 3. Optimizing Multi-Lane Intersection Performance in Mixed Autonomy Environments
Presents a novel traffic signal control framework combining Graph Attention Networks with Soft Actor-Critic RL. Models dynamic graph-structured traffic flow to optimize coordination between human-driven and autonomous vehicles.
- 4. Interpretable end-to-end Neurosymbolic Reinforcement Learning agents
Instantiates the SCoBots framework for interpretable neurosymbolic RL agents. Decomposes RL tasks into interpretable representations, addressing deep RL's shortcut learning and generalization issues from raw pixel states.
- 5. Path-Coordinated Continual Learning with Neural Tangent Kernel-Justified Plasticity: A Theoretical Framework with Near State-of-the-Art Performance
Proposes a path-coordinated continual learning framework combining Neural Tangent Kernel theory, statistical validation, and multiple path quality metrics. Addresses catastrophic forgetting by justifying plasticity bounds for improved performance.
- 6. SAIL-RL: Guiding MLLMs in When and How to Think via Dual-Reward RL Tuning
Introduces SAIL-RL, a reinforcement learning post-training framework to enhance multimodal LLM reasoning. Teaches models when and how to think using dual-reward RL tuning, addressing outcome-only supervision and uniform thinking strategies.
- 7. Automata-Conditioned Cooperative Multi-Agent Reinforcement Learning
Presents Automata-Conditioned Cooperative Multi-Agent Reinforcement Learning for multi-task cooperative objectives. Uses automata to decompose tasks for agents, improving sample efficiency and enabling multi-task learning.
- 8. Large-scale automatic carbon ion treatment planning for head and neck cancers via parallel multi-agent reinforcement learning
Develops a large-scale automatic treatment planning system for carbon ion therapy using parallel multi-agent reinforcement learning. Optimizes numerous treatment planning parameters to improve dose conformity and organ-at-risk (OAR) sparing.
Robotics & Embodied AI
- 1. SigmaCollab: An Application-Driven Dataset for Physically Situated Collaboration
Introduces SigmaCollab, a dataset for human-AI collaboration research featuring multimodal streams like egocentric video and tracking data. Enables study of AI agents guiding humans in physical tasks, advancing embodied AI and human-robot interaction.
- 2. Mobile Robotic Multi-View Photometric Stereo
Proposes a mobile robotic multi-view photometric stereo (MVPS) system for 3D acquisition. Addresses the limitations of fixed setups by adapting MVPS to movable platforms, enabling 3D reconstruction in mobile robotics applications.
- 3. Learning Interactive World Model for Object-Centric Reinforcement Learning
Introduces FIOC-WM, a framework learning structured representations of objects and interactions for world models. Captures environment dynamics with disentangled modules, improving robustness and transferability for object-centric RL agents.
- 4. Maestro: Orchestrating Robotics Modules with Vision-Language Models for Zero-Shot Generalist Robots
Proposes Maestro, a framework augmenting off-the-shelf VLMs with robotics modules for generalist robots. Leverages VLM capabilities and curated modules to reduce data needs, enabling zero-shot control in physical tasks.
- 5. Interpretable end-to-end Neurosymbolic Reinforcement Learning agents
Instantiates the SCoBots framework for interpretable, neurosymbolic RL agents addressing shortcut learning. Decomposes tasks into interpretable representations, enabling generalization from raw pixel states for robotics and embodied AI.
- 6. Endowing GPT-4 with a Humanoid Body: Building the Bridge Between Off-the-Shelf VLMs and the Physical World
Enables off-the-shelf VLMs like GPT-4 to control humanoid agents by pairing their generalization ability with specialized modules. Addresses data scarcity for humanoid robotics, bridging the gap between language models and physical interaction.
- 7. URDF-Anything: Constructing Articulated Objects with 3D Multimodal Language Model
Proposes URDF-Anything, an automatic framework for reconstructing articulated objects using 3D multimodal LLMs. Creates digital twins for robotic simulation by generating URDF models from point clouds and text inputs.
- 8. GenDexHand: Generative Simulation for Dexterous Hands
Addresses data scarcity for dexterous manipulation by generating feasible and trainable tasks. Proposes GenDexHand for creating specialized simulation environments, overcoming limitations of existing methods for embodied AI.
Speech & Audio
- 1. Condition-Invariant fMRI Decoding of Speech Intelligibility with Deep State Space Model
Introduces a deep state space model for condition-invariant fMRI decoding of speech intelligibility. Achieves state-of-the-art performance, demonstrating condition-invariant neural codes across diverse listening environments for speech processing.
- 2. Data-driven Learning of Interaction Laws in Multispecies Particle Systems with Gaussian Processes: Convergence Theory and Applications
Develops a Gaussian process framework to learn interaction kernels in multispecies particle systems from trajectory data. Establishes convergence theory for single-species systems and extends to second-order models, enabling better multiscale modeling.
- 3. EchoLSTM: A Self-Reflective Recurrent Network for Stabilizing Long-Range Memory
Proposes EchoLSTM, a self-reflective recurrent network using output-conditioned gating to stabilize long-range memory. Enhances memory retention in sequences with noisy or misleading information, improving performance over standard LSTMs.
- 4. Interpretable end-to-end Neurosymbolic Reinforcement Learning agents
Instantiates the SCoBots framework for interpretable neurosymbolic RL agents, decomposing tasks into interpretable representations. Addresses shortcut learning in deep RL by using object-centric states, improving generalization.
- 5. ProtoTSNet: Interpretable Multivariate Time Series Classification With Prototypical Parts
Presents ProtoTSNet for interpretable multivariate time series classification using prototypical parts. Enhances ProtoPNet for critical domains like industry and medicine, providing accurate and understandable decisions.
- 6. CFL: On the Use of Characteristic Function Loss for Domain Alignment in Machine Learning
Introduces Characteristic Function Loss (CFL) for domain alignment in machine learning. Addresses distribution shift by learning models that perform well in real-world scenarios, improving robustness.