Blazing fast inference for 100+ models
Instantly run popular and specialized models, including Llama3, Mixtral, and Stable Diffusion, optimized for low latency, high throughput, and long context length. FireAttention, Fireworks' custom CUDA kernel, serves models up to four times faster than vLLM without compromising quality.
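As a minimal sketch of what "instantly running" a model can look like, the snippet below builds a chat-completion request against an OpenAI-compatible endpoint. The endpoint URL, model identifier, and header names here are illustrative assumptions, not confirmed details of the service; consult the official API docs for the exact values.

```python
import json
import urllib.request

# Hypothetical endpoint and model ID -- assumptions for illustration only.
API_URL = "https://api.fireworks.ai/inference/v1/chat/completions"
MODEL_ID = "accounts/fireworks/models/llama-v3-8b-instruct"


def build_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Construct (but do not send) an OpenAI-style chat-completion request."""
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_request(MODEL_ID, "Summarize FireAttention in one sentence.", "YOUR_API_KEY")
# Sending is one line: urllib.request.urlopen(req) -- requires a valid key.
```

The same request shape works for any of the hosted chat models; only the `model` field changes.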