Best Chatbot Arena Model in July
Predictions for 2025
IMO Gold
LLM gets IMO gold
Humanity's Last Exam
Predicted top score
Codeforces
FrontierMath
Predicted top score
Pokemon
LLM becomes a Pokemon Master with minimal assistance
OpenAI Claims AGI
OpenAI claims to have achieved AGI by the end of 2025
Hacking
Probability of AI compromising systems by end of 2025
Long Term Predictions
ARC-AGI Grand Prize before 2030
Chance of claiming the ARC-AGI grand prize
Turing Test (Long Bets) before 2030
Chance of passing the Long Bets variation of the Turing Test
Millennium Prize before 2030
Chance of solving a million-dollar math problem
AI Blackmail
Risk of AI being used for automated blackmail by 2028
AI Romantic Companions
At least 1 in 1,000 Americans talks weekly with one by 2028
Fully AI-generated Movie
AI generates a high-quality movie with a single prompt by 2028
Reliable Household Robot
Reliable general household robot available by 2030
Discontinuous Change in Economic Variables
Break in trend for GDP growth, GDP/capita, productivity, or unemployment by 2028
AI Politically Relevant
AI as big a political issue as abortion by 2028
Zero-shot Human-level Game Performance
AI plays a random computer game at human-level by 2028
Self-play Human-level Game Performance
AI plays a random computer game as well as a human after self-play by 2028