Overview
Real-world applications of Reinforcement Learning (RL) must balance reward maximization with safety constraints. While safety reachability analysis is a promising alternative to unstable min–max optimization, most reachability-based methods address only hard safety constraints rather than cumulative cost constraints.
We propose Budget-Conditioned Reachability RL (BCRL), a novel offline safe RL algorithm that learns a safe policy from a fixed dataset, with no environment interaction. BCRL uses dynamic budgets to enforce safety constraints, which removes the need for adversarial optimization.
Main Contributions
- Budget-Conditioned Reachability: A framework that applies reachability analysis to continuous domains with cumulative cost constraints. It uses dynamic budgets to estimate persistently safe state–action sets.
- Stabilized Learning: Enforces safety constraints directly through dynamic budgets, without relying on unstable min–max or Lagrangian optimization.
- Seamless Integration: Compatible with existing offline RL algorithms (IQL, XQL, SparseQL), requires no generative models or online rollouts, and generalizes to any budget constraint.
- Empirical Success: Matches or outperforms state-of-the-art baselines on standard offline safe-RL benchmarks and real-world maritime navigation tasks.
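
To make the idea of a dynamic budget under a cumulative cost constraint concrete, the sketch below tracks how much budget remains along a single trajectory. It is an illustration only, not the BCRL update rule: the helper names `remaining_budgets` and `stays_within_budget` are hypothetical, and the undiscounted rule "next budget = current budget minus incurred cost" is an assumption made here for simplicity.

```python
from typing import List, Sequence


def remaining_budgets(costs: Sequence[float], budget: float) -> List[float]:
    """Return the budget available before each step, plus the final remainder.

    Assumption (not from the source): the budget shrinks by the cost
    incurred at each step, with no discounting.
    """
    remaining = [budget]
    for cost in costs:
        remaining.append(remaining[-1] - cost)
    return remaining


def stays_within_budget(costs: Sequence[float], budget: float) -> bool:
    """True if the cumulative cost never exceeds the initial budget."""
    return all(b >= 0.0 for b in remaining_budgets(costs, budget))


# Hypothetical per-step costs along one trajectory.
costs = [0.0, 1.0, 0.0, 2.0]

print(remaining_budgets(costs, 4.0))    # [4.0, 4.0, 3.0, 3.0, 1.0]
print(stays_within_budget(costs, 4.0))  # True
print(stays_within_budget(costs, 2.0))  # False
```

The same trajectory is acceptable under a budget of 4 and unacceptable under a budget of 2. A hard safety constraint corresponds to the special case of a zero budget, where any positive cost is a violation, whereas a cumulative constraint tolerates some cost as long as the running total stays within the budget.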