Invited Talk: The Art of (Artificial) Reasoning
Era of Smarter Scaling
- Brute-force scaling era ending, smarter scaling beginning
- Compute keeps growing, but data is not growing fast enough to match
- Three approaches to data saturation:
  - Learn better/faster with limited data (human-like efficiency)
  - Synthesize more data artificially
  - Reason beyond what’s in the training data
- 2025: the year of Large Reasoning Models (LRMs), as distinct from Large Language Models (LLMs)
Reinforcement Learning Research Findings
- Mixed evidence on RL effectiveness in reasoning
- Papers show RL can improve performance, but questions remain about whether it produces true reasoning or merely shifts probability mass
- Pass@1 performance improves after RL, but Pass@K performance may worsen
- Models show growing homogeneity across different LLMs, especially after post-training
- ProRL (Prolonged RL) results:
  - A 1.5B-parameter model pushed to compete with 7B models
  - Key insight: entropy management is crucial; the clipping boundaries matter significantly
  - Goldilocks zone: keep entropy low, but not so low that the policy collapses
- RL as Pre-training (RLP):
  - Information-gain reward: how much better the model predicts next tokens with a generated thought vs without one
  - Performance gains survive subsequent post-training (SFT + RL)
  - Works even with controlled compute / fewer tokens
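The Pass@1 vs Pass@K tension noted above can be made concrete with the standard unbiased pass@k estimator (the formula comes from the Codex evaluation literature, not from the talk itself):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimator of pass@k: the probability that at least one of k
    samples (drawn without replacement from n attempts, c of them correct)
    solves the problem."""
    if n - c < k:
        return 1.0  # fewer than k incorrect samples: some draw must be correct
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative numbers (not from the talk): RL often sharpens the policy
# onto one solution path, so on hard problems a diverse base model that
# occasionally succeeds can beat an RL model that never finds the answer.
base = pass_at_k(n=100, c=5, k=32)  # base model: 5 hits in 100 tries
rl = pass_at_k(n=100, c=0, k=32)    # sharpened model: 0 hits in 100 tries
```

With many samples allowed (large K), the diverse base model's pass@k exceeds the sharpened model's, even though the RL model may win at pass@1 on easier problems.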
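The information-gain reward above can be sketched as a log-likelihood ratio: how much more probable the observed next token becomes when the model first generates a thought. This is a minimal illustration with toy numbers; the exact RLP formulation may differ in details.

```python
import math

def info_gain_reward(logp_with_thought: float, logp_without: float) -> float:
    """RLP-style reward: the gain in log-likelihood of the observed next
    token when conditioning on a generated thought. A positive reward means
    the thought was genuinely informative for prediction."""
    return logp_with_thought - logp_without

# Toy example (illustrative): the thought raises the next-token
# probability from 0.10 to 0.40.
r = info_gain_reward(math.log(0.40), math.log(0.10))
# r = log(0.40 / 0.10) = log 4 ≈ 1.386 nats of information gain
```

Because the reward is computed from next-token prediction on ordinary text, it needs no verifier or human labels, which is what makes it usable at the pre-training stage.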
Synthetic Data Innovation
- Prismatic Synthesis approach challenges conventional wisdom
- Intentionally used a weaker teacher model (32B rather than the largest available)
- Key innovations:
  - Gradient-based data representation for measuring diversity
  - G-Vendi score in gradient space, which correlates with out-of-distribution performance
  - Aggressive filtering (70-90% of generated data removed)
  - Fully synthetic problems and solutions
- Results: outperformed models distilled from 20x larger teacher models, with zero human-labeled data
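The gradient-space diversity measurement can be sketched as a Vendi-style score over per-example gradient embeddings. This is a minimal NumPy illustration under the assumption that each example is summarized by one gradient vector; the paper's actual G-Vendi computation may differ in its kernel and embedding choices.

```python
import numpy as np

def vendi_score(grads: np.ndarray) -> float:
    """Vendi-style diversity score of a set of gradient embeddings:
    exp(entropy of the normalized kernel eigenvalues). Ranges from 1
    (all examples identical) to n (all orthogonal) -- an 'effective
    number of distinct examples'."""
    X = grads / np.linalg.norm(grads, axis=1, keepdims=True)  # unit vectors
    K = X @ X.T / len(X)          # cosine-similarity kernel, trace = 1
    eig = np.linalg.eigvalsh(K)
    eig = eig[eig > 1e-12]        # drop numerically-zero eigenvalues
    return float(np.exp(-np.sum(eig * np.log(eig))))

rng = np.random.default_rng(0)
identical = np.tile(rng.normal(size=(1, 8)), (4, 1))  # zero diversity
orthogonal = np.eye(4, 8)                             # maximal diversity
# vendi_score(identical) ≈ 1.0, vendi_score(orthogonal) ≈ 4.0
```

A filter that keeps only examples raising this score would discard redundant generations, consistent with the aggressive 70-90% filtering rate mentioned above.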
Democratizing AI Philosophy
- AI should be “of humans, by humans, for humans”
  - Ownership: reflects the values of all humanity, not just a few countries/companies
  - Creation: developed by people worldwide, not just those who can afford it
  - Beneficiary: serves all humans, not just some (and not AI serving AI)
- Unconventional collaboration example: the OpenThoughts project
  - Multi-institutional team spanning universities and startups
  - Achieved remarkable results through careful, effortful SFT, competing with RL-trained models
- Current AI relies heavily on human intelligence and massive human annotation efforts
Open Research Questions
- Need new theories of intelligence (plural); LLMs may be one approach among many
- How to reach “dark matter of human knowledge” that current data doesn’t cover
- The human brain runs on roughly the power of a light bulb, versus the massive energy demands of large-scale compute
- Small working memory might be an architectural advantage, in contrast to million-token context windows
- Robotics lacks internet-scale data and requires entirely different approaches