Artificial intelligence is advancing rapidly. Large language models, autonomous agents, and recursive self-improvement are no longer science fiction. This raises what many researchers consider the most important question of the 21st century: how do we ensure that AI systems far more capable than humans remain beneficial?
The alignment problem — ensuring an AI system's goals match human intentions — remains unsolved as a technical problem. Key challenges include the intelligence explosion (recursive self-improvement leading to rapid capability gain), Goodhart's Law (optimizing a proxy objective that diverges from the true goal), mesa-optimization (learned optimizers with their own emergent goals), and the coordination problem (multiple actors racing to deploy powerful AI without adequate safety measures).
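One of these dynamics, Goodhart's Law, can be sketched with a toy model. Everything here — the utility functions, the constants, the hill-climbing loop — is an illustrative assumption, not part of any particular simulation: an agent optimizes a proxy that correlates with the true objective in the regime where the proxy was validated, then keeps pushing past it.

```python
import math

def true_utility(x, tau=3.0):
    # True objective: rises with x at first, then collapses once x is
    # pushed far beyond the regime where the proxy was validated.
    return x * math.exp(-x / tau)

def proxy_utility(x):
    # Proxy: monotone in x, well-correlated with the true goal for small x.
    return x

# Optimize the proxy by naive hill climbing: its gradient is always
# positive, so the optimizer never stops pushing.
x = 0.0
history = []
for step in range(60):
    x += 0.2
    history.append((x, proxy_utility(x), true_utility(x)))

peak_x = max(history, key=lambda t: t[2])[0]
final_x, final_proxy, final_true = history[-1]
print(f"true utility peaks near x = {peak_x:.1f}")
print(f"final: proxy = {final_proxy:.1f}, true = {final_true:.3f}")
```

The proxy score climbs monotonically while the true utility peaks early and then decays toward zero — the measure keeps improving long after it has ceased to be a good target.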
These simulations model the core dynamics of AI risk using mathematical frameworks from economics, game theory, and decision theory. Explore how different assumptions about the nature of intelligence growth lead to radically different outcomes, and why the alignment problem is so difficult to solve.
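The sensitivity to growth assumptions can be sketched with a toy differential equation, dI/dt = k·I^α, where capability I feeds back into its own growth. The model, constants, and function name below are illustrative assumptions, not the equations of any specific simulation; the point is only that the exponent α flips the qualitative outcome:

```python
def simulate_growth(alpha, k=0.1, i0=1.0, dt=0.01, t_max=50.0, cap=1e9):
    """Euler-integrate dI/dt = k * I**alpha until t_max or blow-up."""
    i, t = i0, 0.0
    while t < t_max:
        i += k * i**alpha * dt
        t += dt
        if i > cap:
            return t, i  # effectively a finite-time "explosion"
    return t, i

for alpha, label in [(0.5, "sub-linear (diminishing returns)"),
                     (1.0, "linear (steady exponential)"),
                     (1.5, "super-linear (intelligence explosion)")]:
    t, i = simulate_growth(alpha)
    print(f"alpha={alpha}: {label:<38} I({t:5.1f}) = {i:.3g}")
```

With α < 1 capability grows only polynomially, α = 1 gives ordinary exponential growth, and α > 1 produces a finite-time singularity (for these constants, the closed-form solution blows up at t = 20) — three radically different futures from a one-parameter change in assumptions.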