Your AI Papers Research Assistant

Today's Top Large Language Models Research Papers

Wednesday, November 5, 2025
Introduces TPS-Bench, a benchmark for evaluating AI agents' tool planning and scheduling for compounding tasks. Assesses LLM agents' ability to select and order tools for efficient real-world problem-solving.
Proposes ExplicitLM, an architecture with explicit external memory banks for human-readable knowledge. Enables direct inspection and modification of knowledge, improving LLM interpretability and updateability.
Proposes regularization through reasoning by attaching explanations to labels during LLM fine-tuning. Achieves systematic improvements in classification performance and naturalness across diverse datasets.
Conducts a systematic study of LLM subtraction capabilities, revealing significantly lower accuracy than on addition. Identifies systematic error patterns and offers insights into LLMs' arithmetic limitations.
Investigates distinct mechanisms behind LLM repetition, contrasting conditions eliciting repetitive loops. Reveals that repetitions arise from different underlying causes, offering insights into LLM behavior and training.
Introduces GeoLLaVA-8K, a multimodal LLM for remote sensing, trained on novel high-resolution datasets. Achieves state-of-the-art performance on VQA tasks, enabling detailed Earth observation analysis.
Proposes LTD-Bench, a new benchmark for evaluating LLM spatial reasoning by requiring them to draw. Demonstrates current LLMs struggle with spatial tasks, highlighting a critical evaluation gap for physical world understanding.
Introduces IG-Pruning, an input-aware method for pruning LLM layers to improve efficiency. Achieves significant reductions in computation while maintaining performance across tasks, enabling practical LLM deployment.
Proposes a decoding-time framework for multi-personality LLM generation without retraining. Achieves flexible control over multiple attributes, enhancing LLM adaptability and user experience.
Evaluates LLM reliability for Cyber Threat Intelligence, quantifying consistency and confidence. Finds LLMs are unreliable for CTI tasks, highlighting limitations in practical application and the need for robust evaluation.
arxiv_cv

Training Convolutional Neural Networks with the Forward-Forward algorithm

Abstract: Recent successes in image analysis with deep neural networks are achieved almost exclusively with Convolutional Neural Networks (CNNs), typically trained using the backpropagation (BP) algorithm. In a 2022 preprint, Geoffrey Hinton proposed...
#Deep Learning Training Methods#Alternative Learning Algorithms#Neural Network Architectures#Computational Neuroscience#Image Analysis
17 hours ago
70%
arxiv_cv

GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution

Abstract: Ultra-high-resolution (UHR) remote sensing (RS) imagery offers valuable data for Earth observation but poses challenges for existing multimodal foundation models due to two key bottlenecks: (1) limited availability of UHR training data, and ...
#Multimodal AI#Large Language Models#Remote Sensing#Computer Vision#Earth Observation
17 hours ago
96%
arxiv_cl

On Extending Direct Preference Optimization to Accommodate Ties

Abstract: We derive and investigate two DPO variants that explicitly model the possibility of declaring a tie in pair-wise comparisons. We replace the Bradley-Terry model in DPO with two well-known modeling extensions, by Rao and Kupper and by Davids...
#Reinforcement Learning from Human Feedback (RLHF)#AI Alignment#Preference Learning#Natural Language Generation#Model Optimization
17 hours ago
95%
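For context, the two tie-aware extensions of the Bradley-Terry model named in the abstract are usually written as follows, with item strengths $\pi_i$ (these are the classical formulations; the exact parameterization used in the paper may differ). Rao and Kupper add a tie threshold $\theta \ge 1$, while Davidson adds a tie parameter $\nu \ge 0$:

```latex
% Rao-Kupper (1967): tie threshold theta >= 1
P(i \succ j) = \frac{\pi_i}{\pi_i + \theta\,\pi_j}, \qquad
P(i \sim j) = \frac{(\theta^2 - 1)\,\pi_i\,\pi_j}{(\pi_i + \theta\,\pi_j)(\pi_j + \theta\,\pi_i)}

% Davidson (1970): tie parameter nu >= 0
P(i \succ j) = \frac{\pi_i}{\pi_i + \pi_j + \nu\sqrt{\pi_i\,\pi_j}}, \qquad
P(i \sim j) = \frac{\nu\sqrt{\pi_i\,\pi_j}}{\pi_i + \pi_j + \nu\sqrt{\pi_i\,\pi_j}}
```

Both reduce to the plain Bradley-Terry model when the tie parameter is at its boundary ($\theta = 1$, $\nu = 0$).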
arxiv_ml

Repetitions are not all alike: distinct mechanisms sustain repetition in language models

Abstract: Large Language Models (LLMs) can sometimes degrade into repetitive loops, persistently generating identical word sequences. Because repetition is rare in natural human language, its frequent occurrence across diverse tasks and contexts in L...
#LLM Behavior Analysis#Model Interpretability#Natural Language Generation Issues#Machine Learning Training Dynamics#Attention Mechanisms
17 hours ago
95%
arxiv_cl

MemSearcher: Training LLMs to Reason, Search and Manage Memory via End-to-End Reinforcement Learning

Abstract: Typical search agents concatenate the entire interaction history into the LLM context, preserving information integrity but producing long, noisy contexts, resulting in high computation and memory costs. In contrast, using only the current ...
#Agent Systems#Memory in AI#Reinforcement Learning#Natural Language Processing#Search Algorithms
17 hours ago
90%
arxiv_cv

Language-Enhanced Generative Modeling for PET Synthesis from MRI and Blood Biomarkers

Abstract: Background: Alzheimer's disease (AD) diagnosis heavily relies on amyloid-beta positron emission tomography (Abeta-PET), which is limited by high cost and limited accessibility. This study explores whether Abeta-PET spatial patterns can be p...
#Medical Imaging#Generative AI#Alzheimer's Disease#Machine Learning#Multimodal Learning#Biomarkers
17 hours ago
93%
arxiv_ml

Beyond Contrastive Learning: Synthetic Data Enables List-wise Training with Multiple Levels of Relevance

Abstract: Although synthetic data has changed various aspects of information retrieval (IR) pipelines, the main training paradigm remains: contrastive learning with binary relevance labels, where one positive document is compared against several nega...
#Information Retrieval#Learning to Rank#Synthetic Data Generation#Deep Learning for IR#LLM Applications
17 hours ago
90%
arxiv_cv

ChartM$^3$: A Multi-Stage Code-Driven Pipeline for Constructing Multi-Dimensional and Multi-Step Visual Reasoning Data in Chart Comprehension

Abstract: Complex chart understanding tasks demand advanced visual recognition and reasoning capabilities from multimodal large language models (MLLMs). However, current research provides limited coverage of complex chart scenarios and computation-in...
#Natural Language Processing#Computer Vision#Multimodal AI#Large Language Models#Data Generation#Visual Reasoning
17 hours ago
95%
arxiv_cl

Multi-Personality Generation of LLMs at Decoding-time

Abstract: Multi-personality generation for LLMs, enabling simultaneous embodiment of multiple personalization attributes, is a fundamental challenge. Existing retraining-based approaches are costly and poorly scalable, while decoding-time methods oft...
#LLM Personalization#Conditional Text Generation#Efficient Inference#Decoding Strategies#Controllable Generation
17 hours ago
90%
arxiv_cl

TwT: Thinking without Tokens by Habitual Reasoning Distillation with Multi-Teachers' Guidance

Abstract: Large Language Models (LLMs) have made significant strides in problem-solving by incorporating reasoning processes. However, this enhanced reasoning capability results in an increased number of output tokens during inference, leading to hig...
#Large Language Models#Efficient AI#Machine Learning Optimization#Knowledge Distillation#Reasoning in AI
17 hours ago
95%
arxiv_cl

An Evaluation of Interleaved Instruction Tuning on Semantic Reasoning Performance in an Audio MLLM

Abstract: Standard training for Multi-modal Large Language Models (MLLMs) involves concatenating non-textual information, like vision or audio, with a text prompt. This approach may not encourage deep integration of modalities, limiting the model's a...
#Multimodal Learning#Instruction Tuning#Semantic Reasoning#Model Evaluation#Audio Processing
17 hours ago
95%
arxiv_cl

Identifying Aspects in Peer Reviews

Abstract: Peer review is central to academic publishing, but the growing volume of submissions is straining the process. This motivates the development of computational approaches to support peer review. While each review is tailored to a specific pa...
#Natural Language Processing#Information Extraction#Academic Publishing#Machine Learning Applications#Text Mining
17 hours ago
85%
arxiv_cl

IG-Pruning: Input-Guided Block Pruning for Large Language Models

Abstract: With the growing computational demands of large language models (LLMs), efficient inference has become increasingly critical for practical deployment. Depth pruning has emerged as a promising approach for reducing the computational costs of...
#Model Compression#Efficient Deep Learning#LLM Inference#Neural Network Architecture Search#Hardware Acceleration
17 hours ago
90%
arxiv_cl

How Teachers Can Use Large Language Models and Bloom's Taxonomy to Create Educational Quizzes

Abstract: Question generation (QG) is a natural language processing task with an abundance of potential benefits and use cases in the educational domain. In order for this potential to be realized, QG systems must be designed and validated with pedag...
#Educational AI#Natural Language Generation#Pedagogy and AI#LLM Applications#Automated Assessment
17 hours ago
85%
arxiv_cl

Path-Consistency with Prefix Enhancement for Efficient Inference in LLMs

Abstract: To enhance the reasoning capabilities of large language models (LLMs), self-consistency has become a popular approach, combining multiple samplings with majority voting. However, current methods are computationally expensive and time-consum...
#Large Language Models#Efficient AI#Machine Learning Optimization#Reasoning in AI#Inference Techniques
17 hours ago
95%
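As background, the self-consistency baseline the abstract refers to is simple to sketch: sample several answers independently and majority-vote the results. A minimal illustration follows; `toy_sampler` is a hypothetical stand-in for a stochastic LLM sampler, not the paper's method:

```python
import random
from collections import Counter

def self_consistency(sample_fn, prompt, k=5):
    """Draw k independent samples and majority-vote the final answers."""
    answers = [sample_fn(prompt) for _ in range(k)]
    best, _ = Counter(answers).most_common(1)[0]
    return best

# Hypothetical stand-in for a stochastic LLM answer sampler.
def toy_sampler(prompt):
    return random.choice(["42", "42", "42", "41", "43"])

random.seed(0)
print(self_consistency(toy_sampler, "What is 6 * 7?", k=25))
```

The cost the paper targets is visible here: the baseline pays for k full generations per query.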
arxiv_cl

Understanding and Optimizing Agentic Workflows via Shapley value

Abstract: Agentic workflows have become the dominant paradigm for building complex AI systems, orchestrating specialized components, such as planning, reasoning, action execution, and reflection, to tackle sophisticated real-world tasks. However, sys...
#AI System Design#Agent-based Systems#Explainable AI (XAI)#Optimization#Machine Learning Theory
17 hours ago
90%
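For readers unfamiliar with the attribution tool in the title: the Shapley value credits each workflow component with its average marginal contribution across all orderings of the components. A minimal exact computation on a toy coalition-value table; the component names and scores are hypothetical, not taken from the paper:

```python
from itertools import permutations

def shapley_values(components, value):
    """Exact Shapley values: average each component's marginal
    contribution over all orderings of the components."""
    shap = {c: 0.0 for c in components}
    orders = list(permutations(components))
    for order in orders:
        coalition = set()
        for c in order:
            before = value(frozenset(coalition))
            coalition.add(c)
            shap[c] += value(frozenset(coalition)) - before
    return {c: v / len(orders) for c, v in shap.items()}

# Hypothetical task-success score for each coalition of components.
SCORES = {
    frozenset(): 0.0,
    frozenset({"plan"}): 0.2,
    frozenset({"act"}): 0.1,
    frozenset({"reflect"}): 0.0,
    frozenset({"plan", "act"}): 0.7,
    frozenset({"plan", "reflect"}): 0.3,
    frozenset({"act", "reflect"}): 0.2,
    frozenset({"plan", "act", "reflect"}): 0.8,
}
print(shapley_values(["plan", "act", "reflect"], SCORES.__getitem__))
```

The efficiency property makes the output easy to check: the values sum to the full system's score minus the empty coalition's score.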
arxiv_cv

Can Visual Input Be Compressed? A Visual Token Compression Benchmark for Large Multimodal Models

Abstract: Large multimodal models (LMMs) often suffer from severe inference inefficiency due to the large number of visual tokens introduced by image encoders. While recent token compression methods, such as pruning and merging, have shown promise in...
#Multimodal AI#Large Language Models#Model Compression#Inference Optimization#AI Benchmarking
17 hours ago
95%
arxiv_cl

Beyond the Link: Assessing LLMs' ability to Classify Political Content across Global Media

Abstract: The use of large language models (LLMs) is becoming common in political science and digital media research. While LLMs have demonstrated ability in labelling tasks, their effectiveness in classifying Political Content (PC) from URLs remains un...
#Political Science#Digital Media Analysis#Natural Language Processing#LLM Evaluation#Computational Social Science
17 hours ago
93%
arxiv_cl

Beyond Single Embeddings: Capturing Diverse Targets with Multi-Query Retrieval

Abstract: Most text retrievers generate one query vector to retrieve relevant documents. Yet, the conditional distribution of relevant documents for the query may be multimodal, e.g., representing different interpretations of the query. We fir...
#Information Retrieval#Natural Language Processing#Machine Learning Architectures#Vector Search#Representation Learning
17 hours ago
90%
arxiv_cl

AyurParam: A State-of-the-Art Bilingual Language Model for Ayurveda

Abstract: Current large language models excel at broad, general-purpose tasks, but consistently underperform when exposed to highly specialized domains that require deep cultural, linguistic, and subject-matter expertise. In particular, traditional m...
#Specialized AI Models#Computational Linguistics#Medical AI#Digital Health#Cross-lingual NLP
17 hours ago
95%
arxiv_cl

Visual Program Distillation with Template-Based Augmentation

Abstract: Adapting visual programming or prompting large language models (LLMs) to generate executable code for visual tasks like visual question answering (VQA) for specialized tasks or domains remains challenging due to high annotation and inferenc...
#Visual Reasoning#Program Synthesis#LLM Adaptation#Efficient AI#Data Augmentation
17 hours ago
90%
arxiv_cl

Merging Continual Pretraining Models for Domain-Specialized LLMs: A Case Study in Finance

Abstract: While LLMs excel at general tasks, they struggle in specialized domains like finance, requiring diverse skills in domain knowledge, mathematical reasoning, and multilingual processing. Merging domain-specific Continual Pre-training (CPT) "e...
#Large Language Models#Model Compression and Merging#Domain Specialization#Transfer Learning#Natural Language Processing
17 hours ago
95%
arxiv_ml

Evolutionary Machine Learning meets Self-Supervised Learning: a comprehensive survey

Abstract: The number of studies that combine Evolutionary Machine Learning and self-supervised learning has been growing steadily in recent years. Evolutionary Machine Learning has been shown to help automate the design of machine learning algorithms...
#Machine Learning Automation#Representation Learning#Data Efficiency#Algorithm Design#Survey of ML Techniques
17 hours ago
90%
arxiv_cl

Controlling Performance and Budget of a Centralized Multi-agent LLM System with Reinforcement Learning

Abstract: Large language models (LLMs) exhibit complementary strengths across domains and come with varying inference costs, motivating the design of multi-agent LLM systems where specialized models collaborate efficiently. Existing approaches predom...
#Multi-Agent Systems#Large Language Models#Reinforcement Learning#Resource Management#AI System Optimization
17 hours ago
95%
arxiv_cl

LTD-Bench: Evaluating Large Language Models by Letting Them Draw

Abstract: Current evaluation paradigms for large language models (LLMs) represent a critical blind spot in AI research--relying on opaque numerical metrics that conceal fundamental limitations in spatial reasoning while providing no intuitive underst...
#LLM Evaluation#Spatial Reasoning#Multimodal AI#AI Benchmarking#Human-AI Interaction
17 hours ago
90%
arxiv_cl

CostBench: Evaluating Multi-Turn Cost-Optimal Planning and Adaptation in Dynamic Environments for LLM Tool-Use Agents

Abstract: Current evaluations of Large Language Model (LLM) agents primarily emphasize task completion, often overlooking resource efficiency and adaptability. This neglects a crucial capability: agents' ability to devise and adjust cost-optimal plan...
#AI Agents#Planning and Reasoning#LLM Tool Use#Resource Optimization#Benchmarking
17 hours ago
90%
arxiv_cl

Prompting for Policy: Forecasting Macroeconomic Scenarios with Synthetic LLM Personas

Abstract: We evaluate whether persona-based prompting improves Large Language Model (LLM) performance on macroeconomic forecasting tasks. Using 2,368 economics-related personas from the PersonaHub corpus, we prompt GPT-4o to replicate the ECB Survey ...
#Economics#Econometrics#Artificial Intelligence#Large Language Models#Forecasting Methods
17 hours ago
95%
arxiv_cl

Oolong: Evaluating Long Context Reasoning and Aggregation Capabilities

Abstract: As model context lengths continue to grow, concerns about whether models effectively use the full context length have persisted. While several carefully designed long-context evaluations have recently been released, these evaluations tend t...
#Large Language Models#Context Window Management#Reasoning Capabilities#Benchmark Design#Natural Language Understanding
17 hours ago
95%
arxiv_cv

Adapting General-Purpose Foundation Models for X-ray Ptychography in Low-Data Regimes

Abstract: The automation of workflows in advanced microscopy is a key goal where foundation models like Large Language Models (LLMs) and Vision-Language Models (VLMs) show great potential. However, adapting these general-purpose models for specialized scie...
#Machine Learning#Foundation Models#Domain Adaptation#Scientific AI#Microscopy#Ptychography#Low-Data Learning
17 hours ago
90%
arxiv_cl

Rethinking the Relationship between the Power Law and Hierarchical Structures

Abstract: Statistical analysis of corpora provides an approach to quantitatively investigate natural languages. This approach has revealed that several power laws consistently emerge across different corpora and languages, suggesting universal mechan...
#Linguistics#Computational Linguistics#Statistical Language Modeling#Cognitive Science#Information Theory
17 hours ago
75%
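As background only (the truncated abstract does not say which power laws the paper treats): the best-known corpus power law is Zipf's law, relating a word's frequency $f$ to its frequency rank $r$,

```latex
f(r) \propto r^{-\alpha}, \qquad \alpha \approx 1
\quad\Longleftrightarrow\quad
\log f(r) \approx C - \alpha \log r
```

so rank-frequency plots are approximately straight lines on log-log axes.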
