arxiv_ml 70% Match Research Paper Reinforcement Learning Researchers,Robotics Engineers,Healthcare AI Developers 3 weeks ago

Active Measuring in Reinforcement Learning With Delayed Negative Effects

reinforcement-learning › robotics-rl

📄 Abstract

Abstract: Measuring states in reinforcement learning (RL) can be costly in real-world settings and may negatively influence future outcomes. We introduce the Actively Observable Markov Decision Process (AOMDP), where an agent not only selects control actions but also decides whether to measure the latent state. The measurement action reveals the true latent state but may have a negative delayed effect on the environment. We show that this reduced uncertainty may provably improve sample efficiency and increase the value of the optimal policy despite these costs. We formulate an AOMDP as a periodic partially observable MDP and propose an online RL algorithm based on belief states. To approximate the belief states, we further propose a sequential Monte Carlo method to jointly approximate the posterior of unknown static environment parameters and unobserved latent states. We evaluate the proposed algorithm in a digital health application, where the agent decides when to deliver digital interventions and when to assess users' health status through surveys.

Authors (5)

Daiqi Gao

Ziping Xu

Aseel Rawashdeh

Predrag Klasnja

Susan A. Murphy

Submitted

October 16, 2025

arXiv Category

cs.LG

arXiv PDF

Key Contributions

Introduces the Actively Observable Markov Decision Process (AOMDP) where agents can choose to measure the state, which reveals it but incurs delayed negative effects. Proposes an online RL algorithm using belief states and sequential Monte Carlo for approximation, showing provable improvements in sample efficiency and policy value despite measurement costs.

Business Value

Enables more efficient and effective decision-making in real-world scenarios where observing the state is costly or has side effects, such as in healthcare or industrial control systems.

Paper Metadata

Innovation Type

Algorithmic

Deployment Feasibility

Moderate. Requires careful modeling of measurement costs and effects, and computational resources for belief state approximation.

Limitations Addressed

Costly state observation in real-world RL settings and negative delayed effects of measurement actions.

Technical Tags

Reinforcement LearningMarkov Decision ProcessBelief StatesSequential Monte CarloOnline LearningDelayed EffectsState MeasurementSample EfficiencyDigital Health

Research Topics

Reinforcement Learning TheoryDecision Making Under UncertaintyState EstimationOnline AlgorithmsApplication of RL

Methods & Architectures

Actively Observable Markov Decision Process (AOMDP)Periodic Partially Observable MDP formulationOnline RL algorithm based on belief statesSequential Monte Carlo method Partially Observable MDP

Applications & Tasks

Digital Health Healthcare Decision making with costly state observationLearning with delayed negative effectsState estimation in dynamic environments Optimal policy learningState measurement optimizationParameter and state estimation

Related Fields

Control TheoryOperations ResearchProbabilistic Robotics

Keywords

Reinforcement LearningMarkov Decision ProcessState MeasurementDelayed EffectsUncertaintyBelief StatesOnline LearningSample EfficiencyDigital HealthDecision MakingControlEstimation

Academic Context

#Reinforcement Learning Theory#Decision Making Under Uncertainty#State Estimation#Online Algorithms#Application of RL

Commercial Potential

Potential Products

Intelligent control systemsPersonalized treatment recommendation systems

Target Industries

HealthcareManufacturingEnergy

Use Case Examples

Optimizing patient treatment deliveryAdaptive control in industrial processes

Competitive Edge

Offers a principled way to handle state observation costs and delayed negative effects, which are often simplified or ignored in standard RL formulations.

Market Opportunity

Growing market for AI in healthcare and industrial automation.

Revenue Models

Licensing of algorithmsconsulting services.

Resource Requirements

Compute Needs

Moderate to High (for belief state approximation)

Data Requirements

Requires simulation environments or real-world data with state transition dynamics and measurement costs.

Deployment Constraints

Accurate modeling of measurement costs and effects,Real-time computation for belief state updates

Scalability

Scalability depends on the complexity of the state space and the efficiency of the belief state approximation method.

Regulatory Considerations

High for healthcare applications (e.g.FDA approval).

Production Readiness

Maturity Level

Research

Time to Market

2-5 years

Patent Potential

Low

View Full Paper Back to Papers