Benchmarks of Progress vs Benchmarks of Peril: The State of Dangerous Capability Evaluations
December 27, 2024
An exploration of how we evaluate and benchmark AI capabilities, and what those evaluations imply for both safety and progress in the field.