Today's Top Large Language Model Research Papers

Wednesday, November 5, 2025
Introduces TPS-Bench, a benchmark for evaluating AI agents' tool planning and scheduling for compounding tasks. Assesses LLM agents' ability to select and order tools for efficient real-world problem-solving.
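As a rough illustration of the scheduling problem such a benchmark probes, the toy sketch below compares a strictly sequential tool plan with one that runs independent calls concurrently. The tool names, latencies, and dependencies are invented for the example and are not drawn from TPS-Bench's actual tasks.

```python
# Toy illustration of tool planning and scheduling (not TPS-Bench code):
# given tool calls with estimated latencies and prerequisites, an efficient
# agent overlaps independent calls instead of running everything in sequence.

TOOLS = {                       # name -> (latency in seconds, prerequisites)
    "search_flights": (4.0, []),
    "search_hotels": (3.0, []),
    "convert_currency": (1.0, []),
    "build_itinerary": (2.0, ["search_flights", "search_hotels", "convert_currency"]),
}


def sequential_makespan(tools: dict) -> float:
    # Naive plan: every tool call waits for the previous one to finish.
    return sum(latency for latency, _ in tools.values())


def scheduled_makespan(tools: dict) -> float:
    # Finish time of each tool = its latency + latest finishing prerequisite.
    finish: dict[str, float] = {}

    def finish_time(name: str) -> float:
        if name not in finish:
            latency, deps = tools[name]
            finish[name] = latency + max((finish_time(d) for d in deps), default=0.0)
        return finish[name]

    return max(finish_time(name) for name in tools)


print("Sequential plan:", sequential_makespan(TOOLS), "s")   # 10.0 s
print("Scheduled plan: ", scheduled_makespan(TOOLS), "s")    # 6.0 s
```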
Proposes ExplicitLM, an architecture with explicit external memory banks that store knowledge in human-readable form. Enables direct inspection and modification of that knowledge, improving LLM interpretability and updatability.
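The sketch below illustrates only the general idea of an explicit, human-readable memory bank that can be inspected and edited directly; the class name, the word-overlap retrieval, and the prompt format are assumptions made for the example, not ExplicitLM's architecture.

```python
# Illustrative sketch of an explicit, human-readable memory bank.
# Knowledge lives as plain-text entries that can be inspected and edited
# directly; at query time the most relevant entries are retrieved and
# prepended to the prompt the model sees.

from dataclasses import dataclass, field


@dataclass
class ExplicitMemoryBank:
    entries: list[str] = field(default_factory=list)   # human-readable facts

    def add(self, fact: str) -> None:
        self.entries.append(fact)

    def edit(self, index: int, new_fact: str) -> None:
        # Direct knowledge update: overwrite one entry, no retraining involved.
        self.entries[index] = new_fact

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        # Toy relevance score: word overlap between query and entry.
        q = set(query.lower().split())
        scored = sorted(self.entries,
                        key=lambda e: len(q & set(e.lower().split())),
                        reverse=True)
        return scored[:k]


if __name__ == "__main__":
    bank = ExplicitMemoryBank()
    bank.add("The Eiffel Tower is located in Paris.")
    bank.add("Water boils at 100 degrees Celsius at sea level.")

    query = "Where is the Eiffel Tower?"
    context = "\n".join(bank.retrieve(query))
    prompt = f"Facts:\n{context}\n\nQuestion: {query}"
    print(prompt)   # what a downstream LLM would receive alongside its weights
```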
Proposes regularization through reasoning by attaching explanations to labels during LLM fine-tuning. Achieves systematic improvements in classification performance and naturalness across diverse datasets.
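A minimal sketch of the general recipe, assuming a prompt/completion fine-tuning format: each training target carries the label together with a short explanation. The field names and record layout here are illustrative, not the paper's exact setup.

```python
# Illustrative sketch of attaching explanations to labels when building
# fine-tuning targets (a generic recipe, not the paper's exact format).

def build_target(label: str, explanation: str) -> str:
    # The model is trained to produce the label followed by its rationale,
    # so the explanation acts as a regularizer on what the label must justify.
    return f"Label: {label}\nExplanation: {explanation}"


examples = [
    {"text": "The battery died after two days.",
     "label": "negative",
     "explanation": "The review reports a product failure."},
    {"text": "Setup took five minutes and it just works.",
     "label": "positive",
     "explanation": "The review praises ease of use."},
]

finetune_records = [
    {"prompt": f"Classify the review:\n{ex['text']}",
     "completion": build_target(ex["label"], ex["explanation"])}
    for ex in examples
]

for rec in finetune_records:
    print(rec["prompt"], "->", rec["completion"], sep="\n", end="\n\n")
```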
Conducts a systematic study of LLM subtraction capabilities, revealing significantly lower accuracy than on addition. Identifies systematic error patterns and offers insights into LLM arithmetic limitations.
Investigates the distinct mechanisms behind LLM repetition, contrasting the conditions that elicit repetitive loops. Reveals that repetitions arise from different underlying causes, offering insights into LLM behavior and training.
Introduces GeoLLaVA-8K, a multimodal LLM for remote sensing, trained on novel high-resolution datasets. Achieves state-of-the-art performance on VQA tasks, enabling detailed Earth observation analysis.
Proposes LTD-Bench, a new benchmark that evaluates LLM spatial reasoning by requiring models to draw. Demonstrates that current LLMs struggle with spatial tasks, highlighting a critical evaluation gap in physical-world understanding.
Introduces IG-Pruning, an input-aware method for pruning LLM layers to improve efficiency. Achieves significant reductions in computation while maintaining performance across tasks, enabling practical LLM deployment.
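The toy below sketches input-aware layer skipping in general terms: a lightweight gate scores each layer for the current input, and low-scoring layers are bypassed at inference. The random linear "layers", the gating rule, and the threshold are stand-ins chosen for the example, not IG-Pruning's method.

```python
# Illustrative sketch of input-aware layer skipping (a generic mechanism):
# a per-layer gate is evaluated on the input, and only layers whose score
# clears a threshold are executed, so different inputs run different depths.

import numpy as np

rng = np.random.default_rng(0)
d_model, n_layers = 16, 6

# Stand-ins for transformer layers: here, just small random linear maps.
layers = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(n_layers)]
# A per-layer gating vector; in practice this would be learned.
gate_weights = rng.standard_normal((n_layers, d_model))


def forward(x: np.ndarray, keep_threshold: float = 0.0) -> tuple[np.ndarray, list[int]]:
    kept = []
    h = x
    # Gate scores depend on the input representation, so the executed
    # subset of layers changes from input to input.
    scores = gate_weights @ x
    for i, layer in enumerate(layers):
        if scores[i] > keep_threshold:
            h = h + np.tanh(layer @ h)   # residual block stand-in
            kept.append(i)
    return h, kept


x = rng.standard_normal(d_model)
_, used_layers = forward(x)
print(f"Executed {len(used_layers)}/{n_layers} layers for this input: {used_layers}")
```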
Proposes a decoding-time framework for multi-personality LLM generation without retraining. Achieves flexible control over multiple attributes, enhancing LLM adaptability and user experience.
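One common training-free way to obtain this kind of control is to blend next-token logits from several attribute-conditioned prompts at decode time. The sketch below shows that generic recipe with a fake logit function standing in for a real model; it is not claimed to be the paper's framework, and the weights and prompts are invented for the example.

```python
# Illustrative sketch of decoding-time attribute control via logit mixing:
# next-token logits from several attribute-conditioned prompts are blended
# with user-chosen weights, so no retraining is needed.

import numpy as np

VOCAB = ["hello", "greetings", "yo", "good", "day"]


def fake_logits(prompt: str) -> np.ndarray:
    # Stand-in for a language model call; deterministic per prompt string.
    seed = sum(map(ord, prompt)) % (2**32)
    return np.random.default_rng(seed).standard_normal(len(VOCAB))


def blended_next_token(base_prompt: str, attribute_prompts: dict[str, float]) -> str:
    logits = fake_logits(base_prompt)
    for attr_prompt, weight in attribute_prompts.items():
        # Positive weights pull generation toward the attribute-conditioned
        # distribution; larger weights mean stronger control.
        logits = logits + weight * (fake_logits(base_prompt + attr_prompt) - logits)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return VOCAB[int(np.argmax(probs))]


token = blended_next_token(
    "Reply to the user: ",
    {"Respond formally. ": 0.7, "Respond playfully. ": 0.3},
)
print("Chosen next token:", token)
```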
Evaluates LLM reliability for Cyber Threat Intelligence, quantifying consistency and confidence. Finds LLMs are unreliable for CTI tasks, highlighting limitations in practical application and the need for robust evaluation.