Agentic Development at the Frontier

Agenda

  • Environments are foundational for agentic AI development
  • Showcase tools, systems, and simulators needed for RL training
  • Demonstrate how environments integrate with RL training and LM post-training
  • Workshop jointly organized with Reflection AI, Hugging Face, Unsloth, and the PyTorch Foundation

PyTorch RL

  • Building agents requires a new infrastructure stack
    • Infrastructure: getting infra out of the way to focus on algorithms
    • Data: exposing models to as many environments and skills as possible
  • Key challenges in distributed RL systems
    • Orchestration - heterogeneous compute scheduling across different resource types
    • Programming model - torch.distributed was built for data parallelism, not RL workflows
    • Performance - dual producer-consumer problems with replay buffer and parameter server bottlenecks
  • Monarch: PyTorch-native distributed programming framework
    • Actor-based system with imperative Python API
    • RDMA transfers as first-class citizens
    • Enables fault tolerance and heterogeneous scaling
  • TorchForge, built on Monarch
    • Control plane: service abstraction for routing and load balancing
    • Data plane: in-memory storage with automatic resharding
    • Reduces RL code from thousands of lines to simple, readable pseudocode
    • Supports both synchronous and asynchronous RL training
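The producer-consumer structure of asynchronous RL described above can be sketched with nothing but the standard library. This is a toy illustration, not Monarch or TorchForge code: `rollout_worker`, `trainer`, the `policy` dict, and the bounded `asyncio.Queue` standing in for a replay buffer are all hypothetical names chosen for this example, and the "gradient step" is just a version counter.

```python
import asyncio
import random


async def rollout_worker(worker_id, buffer, policy, num_episodes):
    """Producer: generates trajectories with whatever policy version is current."""
    for _ in range(num_episodes):
        await asyncio.sleep(0)  # yield control; stands in for environment latency
        trajectory = {
            "worker": worker_id,
            "policy_version": policy["version"],  # may be stale by training time
            "reward": random.random(),
        }
        await buffer.put(trajectory)  # blocks when the buffer is full


async def trainer(buffer, policy, total, batch_size):
    """Consumer: pulls batches from the buffer and publishes new policy versions."""
    consumed = 0
    while consumed < total:
        batch = [await buffer.get() for _ in range(batch_size)]
        consumed += len(batch)
        policy["version"] += 1  # stands in for a gradient step plus weight sync
    return consumed


async def main(num_workers=4, episodes_per_worker=8, batch_size=4):
    buffer = asyncio.Queue(maxsize=16)  # bounded, so producers wait if the trainer lags
    policy = {"version": 0}
    total = num_workers * episodes_per_worker
    workers = [
        rollout_worker(i, buffer, policy, episodes_per_worker)
        for i in range(num_workers)
    ]
    results = await asyncio.gather(trainer(buffer, policy, total, batch_size), *workers)
    return results[0], policy["version"]


consumed, final_version = asyncio.run(main())
```

Workers never wait for the trainer to finish an update, so trajectories can carry an older `policy_version`. That staleness is the trade-off asynchronous training accepts in exchange for keeping generation and training hardware busy.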

OpenEnv - Hugging Face

  • Problem: training on static datasets plateaus in reward, and models degrade when deployed in the real world
  • Solution: unified, partially observable MDP interface for RL environments
    • Gymnasium-style API with step and reset functions
    • Can host environments locally or scale via Hugging Face Spaces API
  • DeepSeek-V3.2 trained on 1,800 distinct environments with 85,000+ complex prompts
  • Available environments
    • Coding environments for software engineering agents
    • BrowserGym environments
    • Wordle puzzles, games, and more
  • CLI tool works like the uv package manager
    • openenv init creates project skeleton
    • openenv push deploys to Hugging Face Hub
  • Type-safe by design using Python dataclasses
    • Action, observation, and state classes prevent tensor mismatch errors
    • No external dependencies required
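To make the Gymnasium-style, dataclass-typed interface concrete, here is a minimal sketch of a number-guessing environment. It is not OpenEnv's actual base classes or API: `GuessAction`, `GuessObservation`, `GuessState`, and `GuessEnvironment` are hypothetical names invented for this example. It only illustrates the pattern of typed actions, observations, and state behind `reset` and `step`.

```python
from dataclasses import dataclass


@dataclass
class GuessAction:
    guess: int


@dataclass
class GuessObservation:
    hint: str      # "start", "higher", "lower", or "correct"
    reward: float
    done: bool


@dataclass
class GuessState:
    target: int    # hidden from the agent: the environment is partially observable
    step_count: int = 0


class GuessEnvironment:
    def __init__(self, target: int, max_steps: int = 10):
        self._target = target
        self._max_steps = max_steps
        self._state = GuessState(target=target)

    @property
    def state(self) -> GuessState:
        return self._state

    def reset(self) -> GuessObservation:
        self._state = GuessState(target=self._target)
        return GuessObservation(hint="start", reward=0.0, done=False)

    def step(self, action: GuessAction) -> GuessObservation:
        # Rejecting untyped input here is what catches shape/type mistakes early.
        if not isinstance(action, GuessAction):
            raise TypeError(f"expected GuessAction, got {type(action).__name__}")
        self._state.step_count += 1
        if action.guess == self._state.target:
            return GuessObservation(hint="correct", reward=1.0, done=True)
        hint = "higher" if action.guess < self._state.target else "lower"
        done = self._state.step_count >= self._max_steps
        return GuessObservation(hint=hint, reward=0.0, done=done)


# Usage: a binary-search "agent" that only sees observations, never the state.
env = GuessEnvironment(target=37)
obs = env.reset()
low, high = 0, 100
while not obs.done:
    guess = (low + high) // 2
    obs = env.step(GuessAction(guess))
    if obs.hint == "higher":
        low = guess + 1
    elif obs.hint == "lower":
        high = guess - 1
```

Because the observation carries only a hint, a reward, and a done flag, the agent has to infer the hidden target from interaction, which is the partially observable setting the notes describe.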

Unsloth PyTorch OpenEnv

  • Colab notebook demo: https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/OpenEnv_gpt_oss_(20B)_Reinforcement_Learning_2048_Game.ipynb
  • 2048 game reinforcement learning example
    • Goal: train language model to generate winning strategies
    • Reward good actions, penalize bad actions over many iterations
    • Only requires one prompt and one environment - no additional training data needed
  • Key RL principle: “more good, less bad” repeated over thousands of steps
  • Integration with Hugging Face TRL framework
  • Memory optimization: 50% reduction in memory usage with faster processing
  • Provides 2,000+ RL environments through Unsloth integration
  • Anti-cheating mechanisms to prevent strategy exploitation
  • GitHub: https://github.com/unslothai/unsloth
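As a rough illustration of "reward good actions, penalize bad actions" in the 2048 setting, the sketch below implements the left-slide rule and a simple reward. The reward scheme (merged tile value for a useful move, a fixed penalty for a move that changes nothing) is an assumption made for this example and is not taken from the Unsloth notebook. The function names are hypothetical, and new-tile spawning is omitted.

```python
def merge_row_left(row):
    """Slide one row left, merging equal neighbours once. Returns (new_row, points)."""
    tiles = [t for t in row if t != 0]  # drop empty cells
    merged, gained, i = [], 0, 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged.append(tiles[i] * 2)
            gained += tiles[i] * 2
            i += 2  # both tiles are consumed by the merge
        else:
            merged.append(tiles[i])
            i += 1
    merged += [0] * (len(row) - len(merged))  # pad back to the original width
    return merged, gained


def move_left(board):
    """Apply the left move to every row. Returns (new_board, total_points)."""
    new_board, total = [], 0
    for row in board:
        new_row, gained = merge_row_left(row)
        new_board.append(new_row)
        total += gained
    return new_board, total


def reward_for_left_move(board, noop_penalty=-1.0):
    """Assumed reward: points from merges, or a penalty if the board does not change."""
    new_board, gained = move_left(board)
    if new_board == board:
        return noop_penalty  # "less bad": discourage moves that do nothing
    return float(gained)     # "more good": encourage moves that merge tiles
```

For example, `merge_row_left([2, 2, 4, 0])` merges the two 2s once and leaves the 4 alone, while a row such as `[2, 4, 8, 16]` cannot move left and draws the penalty.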

Reflection

  • Evolution from static benchmarks to agentic benchmarks requiring dynamic problem-solving
  • Task horizon trends
    • AI models completing hour-long tasks 50% of the time
    • Task length doubling every 7 months
    • Progress on software engineering tasks is even faster
  • Required LLM capabilities for autonomous agents
    • Multi-step reasoning and planning
    • Tool use (already working well)
    • Self-correction and recovery from failed trajectories
    • Long horizon task coherence
  • Coding agents as primary success case
    • Deterministic, text-based, and verifiable
    • Easy verification through unit tests and tool-assisted feedback
    • Evolution from autocomplete to autonomous coding across multiple files
  • Pre-training vs post-training focus
    • Pre-training: reasoning ability and long horizon task solving
    • Post-training: environments provide both data and evaluation
    • Need diverse, challenging tasks at edge of current model capabilities
  • Reflection AI is hiring to develop frontier open agentic models
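The task-horizon trend noted above (hour-long tasks at 50% success, doubling every 7 months) is simple exponential growth. The sketch below works through the arithmetic. The one-hour starting point and seven-month doubling time come from the notes. The function names are hypothetical, and the projection assumes the trend simply continues, which is an extrapolation rather than a result.

```python
import math


def projected_horizon_hours(months_from_now, current_hours=1.0, doubling_months=7.0):
    """Task length (at 50% success) after a given number of months, if doubling continues."""
    return current_hours * 2 ** (months_from_now / doubling_months)


def months_until(target_hours, current_hours=1.0, doubling_months=7.0):
    """Months needed for the horizon to grow from current_hours to target_hours."""
    return doubling_months * math.log2(target_hours / current_hours)


two_years_out = projected_horizon_hours(24)   # horizon after 24 months
months_to_workday = months_until(8)           # time to reach an 8-hour task
```

Under these assumptions, an 8-hour task is three doublings away (21 months), and 28 months gives four doublings, or a 16-hour horizon.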