Bellman
An open-source, experimental framework exploring the integration of classical reinforcement learning methodologies with large language models. Named after Richard Bellman, this project investigates how bandits, tabular RL, and policy gradients can be systematically combined with LLMs to create more robust agentic systems.
About the Framework
Named after Richard Bellman, the pioneering mathematician whose principle of optimality revolutionized reinforcement learning, the Bellman framework is an educational experiment in integrating classical RL methodologies with large language models. Its central question is: how has the classical RL framework evolved with modern agents? Rather than treating LLMs as standalone agents, this open-source platform investigates how bandits, tabular RL, and policy gradients can be systematically combined with language models to create more robust agentic systems.
This experimental project emphasizes building to learn, allowing AI researchers and practitioners to discover which combinations of techniques actually help agents make better decisions across diverse environments.
The Foundational Layers
At the core of the Bellman experiment lies its integration layer: components designed to test different methods of combining classical reinforcement learning techniques with the reasoning capabilities of large language models:
1. Bandit Integration Agents
Experimental testbeds for multi-armed bandit algorithms working alongside LLMs, exploring different approaches to balancing exploration and exploitation when language models suggest multiple potential actions.
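As a rough sketch of what one such testbed might look like, the snippet below runs a UCB1 bandit over a fixed set of candidate actions; `llm_propose_actions` and the random reward signal are hypothetical placeholders, not part of the framework's actual API.

```python
import math
import random

def llm_propose_actions(prompt):
    """Hypothetical stand-in for an LLM call that proposes candidate actions."""
    return ["summarize", "ask_clarifying_question", "search_docs"]

def ucb1_select(counts, values, t, c=2.0):
    """Pick the action with the highest upper confidence bound."""
    for action, n in counts.items():
        if n == 0:
            return action  # try every suggestion at least once
    return max(counts, key=lambda a: values[a] + math.sqrt(c * math.log(t) / counts[a]))

actions = llm_propose_actions("How should the agent respond?")
counts = {a: 0 for a in actions}
values = {a: 0.0 for a in actions}

for t in range(1, 101):
    chosen = ucb1_select(counts, values, t)
    reward = random.random()  # placeholder for real environment feedback
    counts[chosen] += 1
    values[chosen] += (reward - values[chosen]) / counts[chosen]  # incremental mean
```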
2. Tabular RL Agents
Serve as the framework's structured memory components, testing approaches to maintaining and updating state-action value tables that can be referenced by language models.
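As a minimal illustration, assuming a toy four-state chain environment invented for the example, the sketch below fills a Q-table with one-step Q-learning and then renders it as text that a language model could be prompted with.

```python
from collections import defaultdict
import random

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
ACTIONS = ["left", "right"]
q_table = defaultdict(float)  # maps (state, action) -> estimated return

def step(state, action):
    """Toy chain environment: reach state 3 for a reward of 1."""
    next_state = min(state + 1, 3) if action == "right" else max(state - 1, 0)
    return next_state, (1.0 if next_state == 3 else 0.0)

state = 0
for _ in range(500):
    if random.random() < EPSILON:
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: q_table[(state, a)])
    next_state, reward = step(state, action)
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    # One-step Q-learning update toward the Bellman target.
    q_table[(state, action)] += ALPHA * (reward + GAMMA * best_next - q_table[(state, action)])
    state = 0 if next_state == 3 else next_state

# Render the table as plain text an LLM could reference in its prompt.
table_text = "\n".join(f"state={s} action={a}: value={q_table[(s, a)]:.2f}"
                       for s in range(4) for a in ACTIONS)
```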
3. Policy Gradient Agents
Explore how to effectively train neural policies that can either complement or constrain LLM outputs, testing methods for learning optimal behaviors through direct experience.
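For intuition, here is a bare-bones REINFORCE loop over two candidate actions with a softmax policy; the reward function is a made-up stand-in, and in a hybrid setup the learned action probabilities could be used to re-rank or gate LLM suggestions.

```python
import numpy as np

rng = np.random.default_rng(0)
logits = np.zeros(2)  # policy parameters over two candidate actions
LEARNING_RATE = 0.1

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def reward(action):
    """Illustrative reward: action 1 is better on average."""
    return rng.normal(0.8 if action == 1 else 0.2, 0.1)

for _ in range(2000):
    probs = softmax(logits)
    action = rng.choice(2, p=probs)
    ret = reward(action)
    # REINFORCE: the gradient of log pi(a) under a softmax policy is one_hot(a) - probs.
    grad = -probs
    grad[action] += 1.0
    logits += LEARNING_RATE * ret * grad

print(softmax(logits))  # the policy should now prefer action 1
```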
4. Sequential Processing Agents
Test approaches where classical RL components and language models operate in a pipeline, exploring different sequences from RL-first to LLM-first approaches.
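A toy sketch of both orderings, assuming hypothetical `llm_rank` and value-table helpers: the RL-first pipeline prunes candidates by learned value before the language model orders them, while the LLM-first pipeline lets the model order everything and then picks the highest-value result.

```python
def llm_rank(candidates, context):
    """Hypothetical LLM call: returns candidates ordered by model preference."""
    return sorted(candidates)  # stand-in for a model-generated ranking

def rl_filter(candidates, q_values, threshold=0.0):
    """Keep only candidates whose learned value clears a threshold."""
    return [c for c in candidates if q_values.get(c, 0.0) >= threshold]

q_values = {"retry": 0.4, "escalate": -0.2, "respond": 0.7}
candidates = ["retry", "escalate", "respond"]

# RL-first: prune by value estimates, then let the LLM order the survivors.
rl_first = llm_rank(rl_filter(candidates, q_values), context="support request")

# LLM-first: let the LLM propose an ordering, then pick the highest-value item.
llm_first = max(llm_rank(candidates, context="support request"),
                key=lambda c: q_values.get(c, 0.0))
```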
5. Parallel Processing Agents
Implement experimental architectures where RL and LLM components operate simultaneously, testing different approaches to weighing or combining outputs.
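One simple way to picture this is a weighted blend of scores produced independently by each component; the score sources suggested below (LLM preference scores, a learned critic) are illustrative assumptions.

```python
def combine_scores(llm_scores, rl_values, weight=0.5):
    """Blend LLM preference scores with RL value estimates for each action."""
    actions = set(llm_scores) | set(rl_values)
    return {a: weight * llm_scores.get(a, 0.0) + (1 - weight) * rl_values.get(a, 0.0)
            for a in actions}

llm_scores = {"plan_a": 0.9, "plan_b": 0.6}  # e.g. normalized preference scores (hypothetical)
rl_values = {"plan_a": 0.2, "plan_b": 0.8}   # e.g. from a learned critic (hypothetical)

blended = combine_scores(llm_scores, rl_values, weight=0.4)
best_action = max(blended, key=blended.get)  # "plan_b" under this weighting
```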
6. Hierarchical Integration Agents
Explore methodologies for creating multi-level decision systems where different techniques operate at different levels of abstraction.
The Bellman Core
Value-Guided Reasoning
Tests approaches to constraining LLM outputs based on value functions, ensuring actions align with learned value estimates.
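A minimal sketch of the idea, with an invented value table: the LLM's chosen action is accepted unless some candidate beats it by more than a margin under the learned values, in which case the value-optimal action is substituted.

```python
def value_guided_select(llm_choice, value_fn, candidates, margin=0.1):
    """Accept the LLM's pick unless a candidate beats it by more than `margin`."""
    best = max(candidates, key=value_fn)
    if value_fn(best) - value_fn(llm_choice) > margin:
        return best        # override: the pick conflicts with learned values
    return llm_choice      # the pick is close enough to the value-optimal action

values = {"apologize": 0.3, "offer_refund": 0.9, "ignore": -0.5}
chosen = value_guided_select("apologize", values.get, list(values))
# chosen == "offer_refund": the override fires because the value gap exceeds the margin
```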
Dynamic Consistency Verification
Explores approaches to checking whether LLM-generated plans satisfy the Bellman equation, redirecting reasoning when inconsistencies are detected.
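Concretely, one such check could compute the one-step Bellman residual for every step of a plan and flag the plan when any residual exceeds a tolerance; the plan format, value table, and thresholds below are invented for the example.

```python
GAMMA = 0.95
TOLERANCE = 0.05

def bellman_residuals(plan, value_fn):
    """Bellman error |V(s) - (r + gamma * V(s'))| for each (state, reward, next_state) step."""
    return [abs(value_fn(s) - (r + GAMMA * value_fn(s_next)))
            for s, r, s_next in plan]

def is_consistent(plan, value_fn):
    return all(err <= TOLERANCE for err in bellman_residuals(plan, value_fn))

values = {"start": 0.90, "middle": 0.95, "goal": 1.0}
plan = [("start", 0.0, "middle"), ("middle", 0.0, "goal")]  # e.g. parsed from an LLM plan

if not is_consistent(plan, values.get):
    print("inconsistent plan")  # the hybrid system would redirect reasoning here
```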
Temporal Abstraction Integration
Experiments with combining hierarchical RL approaches like options or skills with language model planning, breaking down tasks across multiple time scales.
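As one way to make this concrete, the sketch below models an option as a low-level policy plus a termination condition, executed to completion once a higher-level decision-maker (possibly an LLM choosing options by name) selects it; all names are illustrative.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Option:
    """A temporally extended action: an intra-option policy plus a termination test."""
    name: str
    policy: Callable[[int], str]       # state -> primitive action
    terminates: Callable[[int], bool]  # state -> should the option stop?

def run_option(option, state, step_fn, max_steps=20):
    """Execute one option until it terminates, returning the final state."""
    for _ in range(max_steps):
        if option.terminates(state):
            break
        state, _ = step_fn(state, option.policy(state))
    return state

reach_goal = Option("reach_goal", policy=lambda s: "right", terminates=lambda s: s >= 3)
final_state = run_option(reach_goal, state=0,
                         step_fn=lambda s, a: (s + 1 if a == "right" else s - 1, 0.0))
```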
Uncertainty-Aware Decision Optimization
Explores methodologies for explicitly representing and reasoning about uncertainty in hybrid systems, balancing exploration and exploitation.
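A small, self-contained example of one such method is Thompson sampling with Beta-Bernoulli posteriors: each candidate action keeps a posterior over its success probability, and the agent acts greedily on posterior draws. The action names and reward probabilities are made up for the example.

```python
import random

class BetaArm:
    """Beta-Bernoulli posterior over an action's success probability."""
    def __init__(self):
        self.successes, self.failures = 1, 1  # uniform prior

    def sample(self):
        return random.betavariate(self.successes, self.failures)

    def update(self, reward):
        self.successes += reward
        self.failures += 1 - reward

arms = {"tool_call": BetaArm(), "direct_answer": BetaArm()}

for _ in range(200):
    # Thompson sampling: draw from each posterior and act greedily on the draws.
    choice = max(arms, key=lambda a: arms[a].sample())
    reward = 1 if random.random() < (0.7 if choice == "tool_call" else 0.4) else 0
    arms[choice].update(reward)
```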
The Bellman framework was designed with a clear educational philosophy: to build systems that help us learn what actually works when combining classical reinforcement learning with large language models. The project explicitly embraces experimentation and discovery rather than assuming LLMs alone represent the optimal approach to agentic AI.
Key Research Areas
RL-LLM Integration Architectures
Exploring sequential, parallel, and hierarchical approaches to combining reinforcement learning algorithms with language model reasoning.
Value-Guided Language Generation
Investigating methods to align LLM outputs with value functions learned through reinforcement learning experiences.
Cross-Technique Knowledge Transfer
Developing approaches for bidirectional knowledge sharing between the classical RL components and the language model components.
Hybrid System Evaluation
Creating benchmarks and metrics to measure the performance of integrated systems against pure RL and pure LLM approaches.
Implementation Features
Modular Architecture
Flexible component design that enables rapid experimentation with different integration approaches and techniques.
Standardized Environments
Consistent testing environments spanning simple bandit problems to complex sequential decision tasks for comparative evaluation.
Extensible Integration Interfaces
Well-defined APIs for connecting different RL algorithms with various language models and integration approaches.
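The authoritative interfaces live in the codebase, but a hypothetical sketch of the kind of contract such APIs might define could look like the following.

```python
from abc import ABC, abstractmethod

class RLComponent(ABC):
    """Hypothetical interface an RL algorithm would implement to join the hybrid loop."""

    @abstractmethod
    def score_actions(self, state, candidates):
        """Return a value estimate for each candidate action."""

    @abstractmethod
    def update(self, state, action, reward, next_state):
        """Incorporate one transition of experience."""

class LanguageModelComponent(ABC):
    """Hypothetical interface wrapping any LLM backend behind a common call."""

    @abstractmethod
    def propose_actions(self, state, k=3):
        """Return up to k candidate actions described in natural language."""
```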
Comprehensive Evaluation Tools
Metrics, visualizations, and analysis tools specifically designed for hybrid system assessment and comparison.
Get Started
Bellman provides a comprehensive framework for experimenting with hybrid RL-LLM systems. Explore the codebase, run example integrations, or contribute to this educational research project.