AI Arenas

Discover platforms dedicated to testing, comparing, and ranking different AI models. These arenas provide insights into model performance through various benchmarks and evaluation methods.

Chatbot Arena

Compare and rank different LLMs through side-by-side conversations
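Rankings of this kind are typically derived from pairwise human votes. A minimal illustrative sketch of an Elo-style update after each side-by-side vote; the K-factor and starting rating are assumptions for illustration, not Chatbot Arena's actual parameters:

```python
# Minimal Elo-style update for pairwise model comparisons.
# K = 32 and the initial rating of 1000 are illustrative assumptions,
# not the parameters Chatbot Arena actually uses.
from collections import defaultdict

K = 32
ratings = defaultdict(lambda: 1000.0)

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def record_vote(model_a: str, model_b: str, winner: str) -> None:
    """Update both ratings after one side-by-side vote ('a', 'b', or 'tie')."""
    e_a = expected_score(ratings[model_a], ratings[model_b])
    s_a = 1.0 if winner == "a" else 0.0 if winner == "b" else 0.5
    ratings[model_a] += K * (s_a - e_a)
    ratings[model_b] += K * ((1.0 - s_a) - (1.0 - e_a))

# Hypothetical vote: "model-x" wins one head-to-head comparison.
record_vote("model-x", "model-y", "a")
print(sorted(ratings.items(), key=lambda kv: -kv[1]))
```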

HuggingFace Open LLM Leaderboard

Comprehensive leaderboard evaluating open-source language models on various benchmarks

AlpacaEval Leaderboard

Evaluates language models using automated benchmarks focused on instruction-following capabilities

MT-Bench Leaderboard

Multi-turn benchmark for evaluating chatbot performance across complex conversations

C-Eval Leaderboard

Comprehensive Chinese language model evaluation benchmark

Artificial Analysis Leaderboard

Compares and ranks the performance of over 30 AI models (LLMs) across key metrics including quality, price, and speed (output speed in tokens per second and latency as time to first token, TTFT), context window, and others.
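The two speed metrics reduce to simple arithmetic over request timestamps. A minimal sketch of computing TTFT and output speed from a streamed completion; the timestamps and token count below are hypothetical placeholders, not Artificial Analysis's methodology or data:

```python
# Hypothetical timing measurements for one streamed completion;
# the values are placeholders, not real benchmark data.
request_sent_at = 0.00   # seconds
first_token_at = 0.35    # seconds
last_token_at = 4.85     # seconds
completion_tokens = 450

# Time to first token (TTFT): latency until the model starts responding.
ttft = first_token_at - request_sent_at

# Output speed: completion tokens divided by generation time (tokens/second).
output_speed = completion_tokens / (last_token_at - first_token_at)

print(f"TTFT: {ttft:.2f}s, output speed: {output_speed:.1f} tokens/s")
```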

OpenRouter LLM Rankings

Ranks models by the sum of prompt and completion tokens processed per model, normalized using the GPT-4 tokenizer. Stats are updated every 10 minutes.
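Normalizing with a single tokenizer makes token totals comparable across models that use different native tokenizers. A minimal sketch of that re-counting step using the tiktoken library's GPT-4 encoding; the model names and sample texts are made-up placeholders, not OpenRouter's implementation:

```python
# Re-count prompt + completion tokens with the GPT-4 tokenizer (via tiktoken)
# so per-model totals are comparable regardless of each model's own tokenizer.
# Requires: pip install tiktoken. The sample requests below are made up.
import tiktoken
from collections import defaultdict

enc = tiktoken.encoding_for_model("gpt-4")

requests = [
    {"model": "model-x", "prompt": "Explain transformers.", "completion": "Transformers are..."},
    {"model": "model-y", "prompt": "Write a haiku.", "completion": "Autumn leaves falling..."},
]

totals = defaultdict(int)
for r in requests:
    totals[r["model"]] += len(enc.encode(r["prompt"])) + len(enc.encode(r["completion"]))

for model, tokens in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(model, tokens)
```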