Model Leaderboard
Compare AI models by capability and cost-effectiveness
Popular Comparisons
Programming & Development
54/253 models
LiveCodeBench: Real-world coding tasks
Use Cases: Code completion, debugging, code review, script generation
Logical Reasoning
63/253 models
HLE: Complex reasoning and problem-solving
Use Cases: Complex decision-making, multi-step analysis, logical reasoning
Knowledge Q&A
58/253 models
MMLU Pro: Broad knowledge assessment
Use Cases: Expert Q&A, fact-checking, educational tutoring
Scientific Research
69/253 models
GPQA: Graduate-level science questions
Use Cases: Academic research, scientific writing, experiment design
Mathematical Computation
49/253 models
AIME: Competition-level math problems
Use Cases: Financial analysis, data computation, statistical reasoning
AI Agent
49/253 models
Tau2: Autonomous task completion
Use Cases: Automated workflows, multi-tool invocation, complex task decomposition
SciCode
53/253 models
SciCode: Scientific coding challenges
Use Cases: Scientific computing, research code, data analysis scripts
Terminal
54/253 models
Terminal-Bench: Command-line operations
Use Cases: Shell scripting, system administration, DevOps automation
Instruction
44/253 models
IFEval: Instruction-following accuracy
Use Cases: Precise task execution, format compliance, constraint adherence