Bellman
An open-source, experimental framework exploring the integration of classical reinforcement learning methodologies with large language models. Named after Richard Bellman, this project investigates how bandits, tabular RL, and policy gradients can be systematically combined with LLMs to create more robust agentic systems.
About the Framework
Named after Richard Bellman, the pioneering mathematician whose principle of optimality revolutionized reinforcement learning, the Bellman framework is an educational experiment in integrating classical RL methodologies with large language models. Its central question is: how has the classical RL framework evolved with modern agents? Rather than treating LLMs as standalone agents, this open-source platform investigates how bandits, tabular RL, and policy gradients can be systematically combined with language models to create more robust agentic systems.
This experimental project emphasizes building to learn, allowing AI researchers and practitioners to discover which combinations of techniques actually help agents make better decisions across diverse environments.
The Foundational Layers
At the core of the Bellman experiment lies its integration layer: components designed to test different methods of combining classical reinforcement learning techniques with the reasoning capabilities of large language models:
1. Bandit Integration Agents
Experimental testbeds for multi-armed bandit algorithms working alongside LLMs, exploring different approaches to balancing exploration and exploitation when language models suggest multiple potential actions.
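As a rough sketch of what one such testbed might look like, the snippet below runs a UCB1 bandit over a fixed set of candidate actions; `llm_propose_actions` and the random reward signal are hypothetical placeholders, not part of the framework's actual API.

```python
import math
import random

def llm_propose_actions(prompt):
    """Hypothetical stand-in for an LLM call that proposes candidate actions."""
    return ["summarize", "ask_clarifying_question", "search_docs"]

def ucb1_select(counts, values, t, c=2.0):
    """Pick the action with the highest upper confidence bound."""
    for action, n in counts.items():
        if n == 0:
            return action  # try every suggestion at least once
    return max(counts, key=lambda a: values[a] + math.sqrt(c * math.log(t) / counts[a]))

actions = llm_propose_actions("How should the agent respond?")
counts = {a: 0 for a in actions}
values = {a: 0.0 for a in actions}

for t in range(1, 101):
    chosen = ucb1_select(counts, values, t)
    reward = random.random()  # placeholder for real environment feedback
    counts[chosen] += 1
    values[chosen] += (reward - values[chosen]) / counts[chosen]  # incremental mean
```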
2. Tabular RL Agents
Serve as the framework's structured memory components, testing approaches to maintaining and updating state-action value tables that can be referenced by language models.
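As a minimal illustration, assuming a toy four-state chain environment invented for the example, the sketch below fills a Q-table with one-step Q-learning and then renders it as text that a language model could be prompted with.

```python
from collections import defaultdict
import random

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
ACTIONS = ["left", "right"]
q_table = defaultdict(float)  # maps (state, action) -> estimated return

def step(state, action):
    """Toy chain environment: reach state 3 for a reward of 1."""
    next_state = min(state + 1, 3) if action == "right" else max(state - 1, 0)
    return next_state, (1.0 if next_state == 3 else 0.0)

state = 0
for _ in range(500):
    if random.random() < EPSILON:
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: q_table[(state, a)])
    next_state, reward = step(state, action)
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    # One-step Q-learning update toward the Bellman target.
    q_table[(state, action)] += ALPHA * (reward + GAMMA * best_next - q_table[(state, action)])
    state = 0 if next_state == 3 else next_state

# Render the table as plain text an LLM could reference in its prompt.
table_text = "\n".join(f"state={s} action={a}: value={q_table[(s, a)]:.2f}"
                       for s in range(4) for a in ACTIONS)
```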
3. Policy Gradient Agents
Explore how to effectively train neural policies that can either complement or constrain LLM outputs, testing methods for learning optimal behaviors through direct experience.
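For intuition, here is a bare-bones REINFORCE loop over two candidate actions with a softmax policy; the reward function is a made-up stand-in, and in a hybrid setup the learned action probabilities could be used to re-rank or gate LLM suggestions.

```python
import numpy as np

rng = np.random.default_rng(0)
logits = np.zeros(2)  # policy parameters over two candidate actions
LEARNING_RATE = 0.1

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def reward(action):
    """Illustrative reward: action 1 is better on average."""
    return rng.normal(0.8 if action == 1 else 0.2, 0.1)

for _ in range(2000):
    probs = softmax(logits)
    action = rng.choice(2, p=probs)
    ret = reward(action)
    # REINFORCE: the gradient of log pi(a) under a softmax policy is one_hot(a) - probs.
    grad = -probs
    grad[action] += 1.0
    logits += LEARNING_RATE * ret * grad

print(softmax(logits))  # the policy should now prefer action 1
```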
4. Sequential Processing Agents
Test approaches where classical RL components and language models operate in a pipeline, exploring different sequences from RL-first to LLM-first approaches.
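A toy sketch of both orderings, assuming hypothetical `llm_rank` and value-table helpers: the RL-first pipeline prunes candidates by learned value before the language model orders them, while the LLM-first pipeline lets the model order everything and then picks the highest-value result.

```python
def llm_rank(candidates, context):
    """Hypothetical LLM call: returns candidates ordered by model preference."""
    return sorted(candidates)  # stand-in for a model-generated ranking

def rl_filter(candidates, q_values, threshold=0.0):
    """Keep only candidates whose learned value clears a threshold."""
    return [c for c in candidates if q_values.get(c, 0.0) >= threshold]

q_values = {"retry": 0.4, "escalate": -0.2, "respond": 0.7}
candidates = ["retry", "escalate", "respond"]

# RL-first: prune by value estimates, then let the LLM order the survivors.
rl_first = llm_rank(rl_filter(candidates, q_values), context="support request")

# LLM-first: let the LLM propose an ordering, then pick the highest-value item.
llm_first = max(llm_rank(candidates, context="support request"),
                key=lambda c: q_values.get(c, 0.0))
```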
5. Parallel Processing Agents
Implement experimental architectures where RL and LLM components operate simultaneously, testing different approaches to weighing or combining outputs.
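One simple way to picture this is a weighted blend of scores produced independently by each component; the score sources suggested below (LLM preference scores, a learned critic) are illustrative assumptions.

```python
def combine_scores(llm_scores, rl_values, weight=0.5):
    """Blend LLM preference scores with RL value estimates for each action."""
    actions = set(llm_scores) | set(rl_values)
    return {a: weight * llm_scores.get(a, 0.0) + (1 - weight) * rl_values.get(a, 0.0)
            for a in actions}

llm_scores = {"plan_a": 0.9, "plan_b": 0.6}  # e.g. normalized preference scores (hypothetical)
rl_values = {"plan_a": 0.2, "plan_b": 0.8}   # e.g. from a learned critic (hypothetical)

blended = combine_scores(llm_scores, rl_values, weight=0.4)
best_action = max(blended, key=blended.get)  # "plan_b" under this weighting
```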
6. Hierarchical Integration Agents
Explore methodologies for creating multi-level decision systems where different techniques operate at different levels of abstraction.
The Bellman Core
Value-Guided Reasoning
Tests approaches to constraining LLM outputs based on value functions, ensuring actions align with learned value estimates.
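A minimal sketch of the idea, with an invented value table: the LLM's chosen action is accepted unless some candidate beats it by more than a margin under the learned values, in which case the value-optimal action is substituted.

```python
def value_guided_select(llm_choice, value_fn, candidates, margin=0.1):
    """Accept the LLM's pick unless a candidate beats it by more than `margin`."""
    best = max(candidates, key=value_fn)
    if value_fn(best) - value_fn(llm_choice) > margin:
        return best        # override: the pick conflicts with learned values
    return llm_choice      # the pick is close enough to the value-optimal action

values = {"apologize": 0.3, "offer_refund": 0.9, "ignore": -0.5}
chosen = value_guided_select("apologize", values.get, list(values))
# chosen == "offer_refund": the override fires because the value gap exceeds the margin
```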
Dynamic Consistency Verification
Explores approaches to checking whether LLM-generated plans satisfy the Bellman equation, redirecting reasoning when inconsistencies are detected.
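Concretely, one such check could compute the one-step Bellman residual for every step of a plan and flag the plan when any residual exceeds a tolerance; the plan format, value table, and thresholds below are invented for the example.

```python
GAMMA = 0.95
TOLERANCE = 0.05

def bellman_residuals(plan, value_fn):
    """Bellman error |V(s) - (r + gamma * V(s'))| for each (state, reward, next_state) step."""
    return [abs(value_fn(s) - (r + GAMMA * value_fn(s_next)))
            for s, r, s_next in plan]

def is_consistent(plan, value_fn):
    return all(err <= TOLERANCE for err in bellman_residuals(plan, value_fn))

values = {"start": 0.90, "middle": 0.95, "goal": 1.0}
plan = [("start", 0.0, "middle"), ("middle", 0.0, "goal")]  # e.g. parsed from an LLM plan

if not is_consistent(plan, values.get):
    print("inconsistent plan")  # the hybrid system would redirect reasoning here
```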
Temporal Abstraction Integration
Experiments with combining hierarchical RL approaches like options or skills with language model planning, breaking down tasks across multiple time scales.
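As one way to make this concrete, the sketch below models an option as a low-level policy plus a termination condition, executed to completion once a higher-level decision-maker (possibly an LLM choosing options by name) selects it; all names are illustrative.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Option:
    """A temporally extended action: an intra-option policy plus a termination test."""
    name: str
    policy: Callable[[int], str]       # state -> primitive action
    terminates: Callable[[int], bool]  # state -> should the option stop?

def run_option(option, state, step_fn, max_steps=20):
    """Execute one option until it terminates, returning the final state."""
    for _ in range(max_steps):
        if option.terminates(state):
            break
        state, _ = step_fn(state, option.policy(state))
    return state

reach_goal = Option("reach_goal", policy=lambda s: "right", terminates=lambda s: s >= 3)
final_state = run_option(reach_goal, state=0,
                         step_fn=lambda s, a: (s + 1 if a == "right" else s - 1, 0.0))
```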
Uncertainty-Aware Decision Optimization
Explores methodologies for explicitly representing and reasoning about uncertainty in hybrid systems, balancing exploration and exploitation.
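A small, self-contained example of one such method is Thompson sampling with Beta-Bernoulli posteriors: each candidate action keeps a posterior over its success probability, and the agent acts greedily on posterior draws. The action names and reward probabilities are made up for the example.

```python
import random

class BetaArm:
    """Beta-Bernoulli posterior over an action's success probability."""
    def __init__(self):
        self.successes, self.failures = 1, 1  # uniform prior

    def sample(self):
        return random.betavariate(self.successes, self.failures)

    def update(self, reward):
        self.successes += reward
        self.failures += 1 - reward

arms = {"tool_call": BetaArm(), "direct_answer": BetaArm()}

for _ in range(200):
    # Thompson sampling: draw from each posterior and act greedily on the draws.
    choice = max(arms, key=lambda a: arms[a].sample())
    reward = 1 if random.random() < (0.7 if choice == "tool_call" else 0.4) else 0
    arms[choice].update(reward)
```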
The Bellman framework was designed with a clear educational philosophy: to build systems that help us learn what actually works when combining classical reinforcement learning with large language models. The project explicitly embraces experimentation and discovery rather than assuming LLMs alone represent the optimal approach to agentic AI.
Key Research Areas
RL-LLM Integration Architectures
Exploring sequential, parallel, and hierarchical approaches to combining reinforcement learning algorithms with language model reasoning.
Value-Guided Language Generation
Investigating methods to align LLM outputs with value functions learned through reinforcement learning experiences.
Cross-Technique Knowledge Transfer
Developing approaches for bidirectional knowledge sharing between the classical RL components and the language model components.
Hybrid System Evaluation
Creating benchmarks and metrics to measure the performance of integrated systems against pure RL and pure LLM approaches.
Implementation Features
Modular Architecture
Flexible component design that enables rapid experimentation with different integration approaches and techniques.
Standardized Environments
Consistent testing environments spanning simple bandit problems to complex sequential decision tasks for comparative evaluation.
Extensible Integration Interfaces
Well-defined APIs for connecting different RL algorithms with various language models and integration approaches.
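The authoritative interfaces live in the codebase, but a hypothetical sketch of the kind of contract such APIs might define could look like the following.

```python
from abc import ABC, abstractmethod

class RLComponent(ABC):
    """Hypothetical interface an RL algorithm would implement to join the hybrid loop."""

    @abstractmethod
    def score_actions(self, state, candidates):
        """Return a value estimate for each candidate action."""

    @abstractmethod
    def update(self, state, action, reward, next_state):
        """Incorporate one transition of experience."""

class LanguageModelComponent(ABC):
    """Hypothetical interface wrapping any LLM backend behind a common call."""

    @abstractmethod
    def propose_actions(self, state, k=3):
        """Return up to k candidate actions described in natural language."""
```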
Comprehensive Evaluation Tools
Metrics, visualizations, and analysis tools specifically designed for hybrid system assessment and comparison.
Get Started
Bellman provides a comprehensive framework for experimenting with hybrid RL-LLM systems. Explore the codebase, run example integrations, or contribute to this educational research project.