Today's Top Large Language Model Research Papers

Wednesday, November 5, 2025
Introduces TPS-Bench, a benchmark for evaluating AI agents' tool planning and scheduling for compounding tasks. Assesses LLM agents' ability to select and order tools for efficient real-world problem-solving.
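As a rough illustration of the scheduling problem such a benchmark probes, the toy sketch below compares a strictly sequential tool plan with one that runs independent calls concurrently. The tool names, latencies, and dependencies are invented for the example and are not drawn from TPS-Bench's actual tasks.

```python
# Toy illustration of tool planning and scheduling (not TPS-Bench code):
# given tool calls with estimated latencies and prerequisites, an efficient
# agent overlaps independent calls instead of running everything in sequence.

TOOLS = {                       # name -> (latency in seconds, prerequisites)
    "search_flights": (4.0, []),
    "search_hotels": (3.0, []),
    "convert_currency": (1.0, []),
    "build_itinerary": (2.0, ["search_flights", "search_hotels", "convert_currency"]),
}


def sequential_makespan(tools: dict) -> float:
    # Naive plan: every tool call waits for the previous one to finish.
    return sum(latency for latency, _ in tools.values())


def scheduled_makespan(tools: dict) -> float:
    # Finish time of each tool = its latency + latest finishing prerequisite.
    finish: dict[str, float] = {}

    def finish_time(name: str) -> float:
        if name not in finish:
            latency, deps = tools[name]
            finish[name] = latency + max((finish_time(d) for d in deps), default=0.0)
        return finish[name]

    return max(finish_time(name) for name in tools)


print("Sequential plan:", sequential_makespan(TOOLS), "s")   # 10.0 s
print("Scheduled plan: ", scheduled_makespan(TOOLS), "s")    # 6.0 s
```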
Proposes ExplicitLM, an architecture with explicit external memory banks that store knowledge in human-readable form. Enables direct inspection and modification of that knowledge, improving LLM interpretability and updatability.
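The sketch below illustrates only the general idea of an explicit, human-readable memory bank that can be inspected and edited directly; the class name, the word-overlap retrieval, and the prompt format are assumptions made for the example, not ExplicitLM's architecture.

```python
# Illustrative sketch of an explicit, human-readable memory bank.
# Knowledge lives as plain-text entries that can be inspected and edited
# directly; at query time the most relevant entries are retrieved and
# prepended to the prompt the model sees.

from dataclasses import dataclass, field


@dataclass
class ExplicitMemoryBank:
    entries: list[str] = field(default_factory=list)   # human-readable facts

    def add(self, fact: str) -> None:
        self.entries.append(fact)

    def edit(self, index: int, new_fact: str) -> None:
        # Direct knowledge update: overwrite one entry, no retraining involved.
        self.entries[index] = new_fact

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        # Toy relevance score: word overlap between query and entry.
        q = set(query.lower().split())
        scored = sorted(self.entries,
                        key=lambda e: len(q & set(e.lower().split())),
                        reverse=True)
        return scored[:k]


if __name__ == "__main__":
    bank = ExplicitMemoryBank()
    bank.add("The Eiffel Tower is located in Paris.")
    bank.add("Water boils at 100 degrees Celsius at sea level.")

    query = "Where is the Eiffel Tower?"
    context = "\n".join(bank.retrieve(query))
    prompt = f"Facts:\n{context}\n\nQuestion: {query}"
    print(prompt)   # what a downstream LLM would receive alongside its weights
```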
Proposes regularization through reasoning by attaching explanations to labels during LLM fine-tuning. Achieves systematic improvements in classification performance and naturalness across diverse datasets.
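A minimal sketch of the general recipe, assuming a prompt/completion fine-tuning format: each training target carries the label together with a short explanation. The field names and record layout here are illustrative, not the paper's exact setup.

```python
# Illustrative sketch of attaching explanations to labels when building
# fine-tuning targets (a generic recipe, not the paper's exact format).

def build_target(label: str, explanation: str) -> str:
    # The model is trained to produce the label followed by its rationale,
    # so the explanation acts as a regularizer on what the label must justify.
    return f"Label: {label}\nExplanation: {explanation}"


examples = [
    {"text": "The battery died after two days.",
     "label": "negative",
     "explanation": "The review reports a product failure."},
    {"text": "Setup took five minutes and it just works.",
     "label": "positive",
     "explanation": "The review praises ease of use."},
]

finetune_records = [
    {"prompt": f"Classify the review:\n{ex['text']}",
     "completion": build_target(ex["label"], ex["explanation"])}
    for ex in examples
]

for rec in finetune_records:
    print(rec["prompt"], "->", rec["completion"], sep="\n", end="\n\n")
```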
Conducts a systematic study of LLM subtraction capabilities, revealing significantly lower accuracy than on addition. Identifies systematic error patterns and offers insights into LLM arithmetic limitations.
Investigates the distinct mechanisms behind LLM repetition, contrasting the conditions that elicit repetitive loops. Reveals that repetitions arise from different underlying causes, offering insights into LLM behavior and training.
Introduces GeoLLaVA-8K, a multimodal LLM for remote sensing, trained on novel high-resolution datasets. Achieves state-of-the-art performance on VQA tasks, enabling detailed Earth observation analysis.
Proposes LTD-Bench, a new benchmark that evaluates LLM spatial reasoning by requiring models to draw. Demonstrates that current LLMs struggle with spatial tasks, highlighting a critical evaluation gap in physical-world understanding.
Introduces IG-Pruning, an input-aware method for pruning LLM layers to improve efficiency. Achieves significant reductions in computation while maintaining performance across tasks, enabling practical LLM deployment.
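The toy below sketches input-aware layer skipping in general terms: a lightweight gate scores each layer for the current input, and low-scoring layers are bypassed at inference. The random linear "layers", the gating rule, and the threshold are stand-ins chosen for the example, not IG-Pruning's method.

```python
# Illustrative sketch of input-aware layer skipping (a generic mechanism):
# a per-layer gate is evaluated on the input, and only layers whose score
# clears a threshold are executed, so different inputs run different depths.

import numpy as np

rng = np.random.default_rng(0)
d_model, n_layers = 16, 6

# Stand-ins for transformer layers: here, just small random linear maps.
layers = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(n_layers)]
# A per-layer gating vector; in practice this would be learned.
gate_weights = rng.standard_normal((n_layers, d_model))


def forward(x: np.ndarray, keep_threshold: float = 0.0) -> tuple[np.ndarray, list[int]]:
    kept = []
    h = x
    # Gate scores depend on the input representation, so the executed
    # subset of layers changes from input to input.
    scores = gate_weights @ x
    for i, layer in enumerate(layers):
        if scores[i] > keep_threshold:
            h = h + np.tanh(layer @ h)   # residual block stand-in
            kept.append(i)
    return h, kept


x = rng.standard_normal(d_model)
_, used_layers = forward(x)
print(f"Executed {len(used_layers)}/{n_layers} layers for this input: {used_layers}")
```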
Proposes a decoding-time framework for multi-personality LLM generation without retraining. Achieves flexible control over multiple attributes, enhancing LLM adaptability and user experience.
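One common training-free way to obtain this kind of control is to blend next-token logits from several attribute-conditioned prompts at decode time. The sketch below shows that generic recipe with a fake logit function standing in for a real model; it is not claimed to be the paper's framework, and the weights and prompts are invented for the example.

```python
# Illustrative sketch of decoding-time attribute control via logit mixing:
# next-token logits from several attribute-conditioned prompts are blended
# with user-chosen weights, so no retraining is needed.

import numpy as np

VOCAB = ["hello", "greetings", "yo", "good", "day"]


def fake_logits(prompt: str) -> np.ndarray:
    # Stand-in for a language model call; deterministic per prompt string.
    seed = sum(map(ord, prompt)) % (2**32)
    return np.random.default_rng(seed).standard_normal(len(VOCAB))


def blended_next_token(base_prompt: str, attribute_prompts: dict[str, float]) -> str:
    logits = fake_logits(base_prompt)
    for attr_prompt, weight in attribute_prompts.items():
        # Positive weights pull generation toward the attribute-conditioned
        # distribution; larger weights mean stronger control.
        logits = logits + weight * (fake_logits(base_prompt + attr_prompt) - logits)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return VOCAB[int(np.argmax(probs))]


token = blended_next_token(
    "Reply to the user: ",
    {"Respond formally. ": 0.7, "Respond playfully. ": 0.3},
)
print("Chosen next token:", token)
```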
Evaluates LLM reliability for Cyber Threat Intelligence, quantifying consistency and confidence. Finds LLMs are unreliable for CTI tasks, highlighting limitations in practical application and the need for robust evaluation.