Abstract
Large language models (LLMs) have demonstrated high performance on tasks
expressed in natural language, particularly in zero- or few-shot settings.
These are typically framed as supervised (e.g., classification) or unsupervised
(e.g., clustering) problems. However, limited work evaluates LLMs as agents in
reinforcement learning (RL) tasks (e.g., playing games), where learning occurs
through interaction with an environment and a reward signal. While prior work
has focused on tasks that rely on a natural-language representation, we study
structured, non-linguistic reasoning, such as interpreting positions in a grid
world. We therefore introduce PARL (Prompt-based Agent for Reinforcement
Learning), a method that uses LLMs as RL agents through prompting, without any
fine-tuning. PARL encodes actions, states, and rewards in the prompt, enabling
the model to learn through trial-and-error interaction. We evaluate PARL on
three standard RL tasks that do not entirely rely on natural language. We show
that it can match or outperform traditional RL agents in simple environments by
leveraging pretrained knowledge. However, we identify performance limitations
in tasks that require complex mathematical operations or decoding states and
actions.
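The abstract does not specify PARL's exact prompt format or model interface, so the following is only a minimal sketch of a prompt-based RL loop in the spirit it describes: past states, actions, and rewards are serialized into the prompt, and the model's reply is parsed as the next action. The grid world, the prompt wording, and the `query_llm` function are all hypothetical; `query_llm` is stubbed with a random policy so the sketch runs standalone and would be replaced by a real model call.

```python
import random

ACTIONS = ["up", "down", "left", "right"]
GRID = 4            # toy 4x4 grid world (assumption, not from the paper)
GOAL = (3, 3)

def step(state, action):
    """Toy grid-world transition: move within the board, reward 1 at the goal."""
    x, y = state
    if action == "up":
        y = min(GRID - 1, y + 1)
    elif action == "down":
        y = max(0, y - 1)
    elif action == "left":
        x = max(0, x - 1)
    elif action == "right":
        x = min(GRID - 1, x + 1)
    reward = 1.0 if (x, y) == GOAL else 0.0
    return (x, y), reward

def build_prompt(history, state):
    """Encode past (state, action, reward) triples and the current state in text."""
    lines = ["You control an agent on a 4x4 grid. Reach the goal.",
             "Past interactions:"]
    for s, a, r in history:
        lines.append(f"state={s} action={a} reward={r}")
    lines.append(f"Current state: {state}")
    lines.append(f"Choose one action from {ACTIONS}. Answer with the action only.")
    return "\n".join(lines)

def query_llm(prompt):
    """Hypothetical LLM call; replace with a real API request.
    Returns a random valid action so the sketch is self-contained."""
    return random.choice(ACTIONS)

def run_episode(max_steps=20):
    history, state = [], (0, 0)
    for _ in range(max_steps):
        action = query_llm(build_prompt(history, state))
        if action not in ACTIONS:          # guard against malformed model output
            action = random.choice(ACTIONS)
        next_state, reward = step(state, action)
        history.append((state, action, reward))  # in-context trial-and-error signal
        state = next_state
        if reward > 0:
            break
    return history

if __name__ == "__main__":
    for s, a, r in run_episode():
        print(s, a, r)
```

Note that no weights are updated anywhere: all "learning" in this setup happens in context, through the growing interaction history in the prompt, which is the core idea the abstract attributes to PARL.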
Authors
Yarik Menchaca Resendiz
Roman Klinger
Submitted
October 24, 2025
Key Contributions
Introduces PARL (Prompt-based Agent for Reinforcement Learning), a method that enables LLMs to act as RL agents through prompting without fine-tuning. PARL encodes states, actions, and rewards into prompts, allowing LLMs to learn via trial-and-error interaction on structured, non-linguistic tasks.
Business Value
Enables the development of more versatile AI agents that can learn complex tasks through interaction, potentially reducing the need for extensive task-specific training data and fine-tuning.