Budget Conditioned Action Pruning for Safe Offline Reinforcement Learning

Abstract: Reinforcement learning (RL) has achieved remarkable success, yet real-world applications often require balancing optimality and safety, which remains challenging. Existing safe offline RL methods, such as Lagrangian approaches, struggle to optimize a min-max objective and require separate policies for different safety thresholds, which limits their generalization. To address this, we propose Budget Conditioned Action Pruning (BCAP), a new safe offline RL algorithm. BCAP introduces a step-wise, time-varying budget that guides both unsafe-action pruning and value estimation, yielding cautious yet adaptive policies. Our method integrates seamlessly with existing offline RL algorithms such as IQL and SparseQL, and never queries out-of-distribution actions outside the offline dataset. Moreover, BCAP does not require retraining for each new cost budget; a single trained policy generalizes across budgets. Extensive experiments on standard safe offline RL benchmarks show that BCAP consistently improves both safety and performance over state-of-the-art baselines.
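Since the full project details are not yet released, the following is only a minimal, illustrative sketch of what budget-conditioned action pruning could look like: candidate actions from the dataset whose estimated cumulative cost exceeds the remaining step-wise budget are masked out, and the agent acts greedily by reward value among the surviving actions. All names here (`prune_actions`, `act`, `reward_q`, `cost_q`, and the simple budget update) are hypothetical and are not taken from the BCAP paper.

```python
# Illustrative sketch only: hypothetical names, not the official BCAP implementation.
import numpy as np

def prune_actions(candidate_actions, cost_q_values, remaining_budget):
    """Keep only candidate actions whose estimated future cost fits the
    remaining budget; fall back to the cheapest action if none fit."""
    feasible = cost_q_values <= remaining_budget
    if not feasible.any():
        # Conservative fallback: allow only the lowest-cost candidate.
        feasible = cost_q_values == cost_q_values.min()
    return candidate_actions[feasible], feasible

def act(candidate_actions, reward_q_values, cost_q_values, remaining_budget):
    """Greedy action selection restricted to the budget-feasible set."""
    pruned, mask = prune_actions(candidate_actions, cost_q_values, remaining_budget)
    best = np.argmax(reward_q_values[mask])
    return pruned[best]

# Toy usage: four dataset actions and a remaining budget of 5.0.
actions = np.array([0, 1, 2, 3])
reward_q = np.array([1.0, 2.5, 3.0, 0.5])   # higher is better
cost_q = np.array([1.0, 4.0, 9.0, 0.2])     # estimated cumulative cost
budget = 5.0
a = act(actions, reward_q, cost_q, budget)   # selects action 1 (cost 4.0 <= 5.0)
print("selected action:", a)

# One plausible step-wise, time-varying budget update after observing cost c_t:
# budget <- budget - c_t
```

The fallback branch reflects the "cautious yet adaptive" intent described in the abstract: even when no action fits the remaining budget, the sketch degrades gracefully to the least costly in-distribution action rather than querying actions outside the offline dataset.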

Project Details Coming Soon…!