AI Glossary

What is Humanity's Last Exam?

Overview

Humanity's Last Exam is a difficult expert-level benchmark for testing frontier AI systems across broad academic and professional knowledge. It matters because many standard benchmarks are saturated, so labs use harder exams like HLE to show whether models can answer questions that still challenge specialists.

Why it matters

As frontier models approach older benchmark ceilings, harder expert tests help separate memorized competence from robust reasoning.

Where it appears in AI research

Frontier model launch reports
AI evaluation research
Reasoning benchmark comparisons
Safety and capability discussions

Related terms

GPQA MMLU Agent Evaluation

Related DeepSignal articles

The Decoder·Matthias Bastian

1w ago

FeaturedOriginal

Meta's Muse Spark 1.1 API pricing squeezes OpenAI and Anthropic as the AI price war heats up

AI Summary

Meta's Muse Spark 1.1 API, priced at $1.25 per million input tokens, undercuts competitors like OpenAI and Anthropic, intensifying the AI price war. The model excels in orchestration and coding tasks, leading benchmarks such as Atlas and , while also promising significant cost efficiency for developers.

#Agent #AI Coding #Open Source #Funding

4