AI Arenas
Discover platforms dedicated to testing, comparing, and ranking different AI models. These arenas provide insights into model performance through various benchmarks and evaluation methods.
Chatbot Arena
Compare and rank different LLMs through side-by-side conversations
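Arena-style leaderboards aggregate crowdsourced side-by-side votes into a single rating per model. The exact scoring pipeline varies (Chatbot Arena has used Elo- and Bradley–Terry-style models), but as a rough, hypothetical illustration, an Elo-style update over pairwise "battles" looks like this sketch (model names and votes are made up):

```python
# Minimal Elo-style rating sketch for pairwise "battles" between models.
# Illustrates the general idea behind arena rankings, not Chatbot Arena's
# exact implementation; the names and votes below are hypothetical.

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update_elo(ratings: dict, winner: str, loser: str, k: float = 32.0) -> None:
    """Update both models' ratings after a single head-to-head vote."""
    e_w = expected_score(ratings[winner], ratings[loser])
    ratings[winner] += k * (1.0 - e_w)
    ratings[loser] -= k * (1.0 - e_w)

# Hypothetical battle log: (winner, loser) pairs from side-by-side votes.
battles = [("model-a", "model-b"), ("model-b", "model-c"), ("model-a", "model-c")]

ratings = {"model-a": 1000.0, "model-b": 1000.0, "model-c": 1000.0}
for winner, loser in battles:
    update_elo(ratings, winner, loser)

for name, score in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.1f}")
```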
HuggingFace Open LLM Leaderboard
Comprehensive leaderboard evaluating open-source language models on various benchmarks
AlpacaEval Leaderboard
Evaluates language models using automated benchmarks focused on instruction-following capabilities
MT-Bench Leaderboard
Multi-turn benchmark for evaluating chatbot performance across complex conversations
C-Eval Leaderboard
Comprehensive Chinese language model evaluation benchmark
Artificial Analysis Leaderboard
Compares and ranks over 30 AI models (LLMs) across key metrics including quality, price, speed (output speed in tokens per second and latency as time to first token, TTFT), context window, and others.
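The speed metrics listed here can be measured directly against any streaming chat API. The following is a rough sketch, assuming the OpenAI Python SDK (v1-style client) with an API key in the environment; streamed chunks are used only as a loose proxy for tokens, and the model name is just an example:

```python
# Rough sketch of measuring TTFT (time to first token) and output speed
# (tokens per second) for a streaming chat completion. Assumes the OpenAI
# Python SDK v1-style client; chunk counts approximate token counts.
import time
from openai import OpenAI

client = OpenAI()

start = time.perf_counter()
first_token_at = None
chunks = 0

stream = client.chat.completions.create(
    model="gpt-4o-mini",  # example model; any streaming chat model works
    messages=[{"role": "user", "content": "Explain TTFT in one sentence."}],
    stream=True,
)

for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        if first_token_at is None:
            first_token_at = time.perf_counter()
        chunks += 1

end = time.perf_counter()
ttft = (first_token_at or end) - start
generation_time = end - (first_token_at or end)
print(f"TTFT: {ttft:.2f}s")
if generation_time > 0:
    print(f"Output speed: {chunks / generation_time:.1f} chunks/s (~tokens/s)")
```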
OpenRouter LLM Rankings
Ranks models by usage: the sum of prompt and completion tokens per model, normalized using the GPT-4 tokenizer. Stats are updated every 10 minutes.
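The normalization described here can be approximated locally with the tiktoken library, which exposes the GPT-4 tokenizer. A minimal sketch (the prompt and completion strings are made up):

```python
# Minimal sketch of normalizing usage with the GPT-4 tokenizer, in the
# spirit of OpenRouter's stat (sum of prompt + completion tokens).
# Uses the tiktoken library; the example strings are hypothetical.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")

prompt = "Summarize the plot of Hamlet in two sentences."
completion = "Prince Hamlet seeks revenge for his father's murder..."

prompt_tokens = len(enc.encode(prompt))
completion_tokens = len(enc.encode(completion))
total_tokens = prompt_tokens + completion_tokens

print(f"prompt={prompt_tokens}, completion={completion_tokens}, total={total_tokens}")
```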