Beyond Hard Constraints: Budget-Conditioned Reachability For Safe Offline Reinforcement Learning

Janaka Chathuranga Brahmanage, Akshat Kumar

Overview

Real-world applications of Reinforcement Learning (RL) must balance reward maximization with safety constraints. While safety reachability analysis is a promising alternative to unstable min–max optimization, most reachability-based methods address only hard safety constraints rather than cumulative cost constraints.
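To make the distinction concrete, the two constraint types can be written in standard constrained-MDP notation. The symbols below (policy $\pi$, reward $r$, cost $c$, discount $\gamma$, budget $d$) are assumptions introduced for illustration and are not taken from the text above; the paper's own notation may differ.

```latex
% Cumulative cost constraint: expected discounted cost must stay within a budget d
\max_{\pi} \; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right]
\quad \text{s.t.} \quad
\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\right] \le d

% Hard constraint: no cost may be incurred at any step
c(s_t, a_t) = 0 \quad \text{for all } t \ge 0
```

Under this reading, a hard constraint is the special case in which no cost at all is tolerated, while the cumulative form allows some cost as long as the total stays within $d$.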

We propose Budget-Conditioned Reachability RL (BCRL), a novel offline safe RL algorithm that learns a safe policy from a fixed dataset, without environment interaction. BCRL uses dynamic budgets to enforce safety constraints and therefore needs no adversarial optimization.

Main Contributions

Budget-Conditioned Reachability: A framework that applies reachability analysis to continuous domains with cumulative cost constraints. It uses dynamic budgets to estimate persistently safe state–action sets.
Stabilized Learning: Enforces safety constraints natively without relying on unstable min–max or Lagrangian optimization.
Seamless Integration: Compatible with existing offline RL algorithms (IQL, XQL, SparseQL), requires no generative models or online rollouts, and generalizes to any budget constraint.
Empirical Success: Matches or outperforms state-of-the-art baselines on standard offline safe-RL benchmarks and real-world maritime navigation tasks.
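The idea of a dynamic budget can be illustrated with a small sketch. It assumes one common way of tracking a remaining discounted cost budget along a trajectory, namely the recursion b_{t+1} = (b_t - c_t) / gamma; this update rule, the class name BudgetTracker, and the helper discounted_cost are hypothetical and are not taken from the paper, whose exact formulation may differ.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BudgetTracker:
    """Illustrative only: tracks the remaining cost budget along one trajectory."""

    budget: float        # remaining discounted cost budget (assumed non-negative at start)
    gamma: float = 0.99  # discount factor (assumed value)

    def step(self, cost: float) -> float:
        # Assumed update: solve  budget = cost + gamma * next_budget  for next_budget.
        self.budget = (self.budget - cost) / self.gamma
        return self.budget

    def is_feasible(self) -> bool:
        # A negative remaining budget means the cumulative constraint is violated.
        return self.budget >= 0.0


def discounted_cost(costs: Sequence[float], gamma: float) -> float:
    """Discounted sum of per-step costs, used here only as a cross-check."""
    return sum((gamma ** t) * c for t, c in enumerate(costs))


def trajectory_within_budget(costs: Sequence[float], budget: float, gamma: float) -> bool:
    """Run the tracker over a cost sequence and report whether the budget held."""
    tracker = BudgetTracker(budget=budget, gamma=gamma)
    for c in costs:
        tracker.step(c)
        if not tracker.is_feasible():
            return False
    return True
```

With non-negative costs, the tracker stays feasible exactly when the discounted cost of the trajectory does not exceed the initial budget, so a policy conditioned on the current budget always sees how much cost it can still afford.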