AutoArena
Open-source tool for automated head-to-head evaluation of GenAI systems using LLM judges.
AutoArena Introduction
What is AutoArena?
AutoArena is an open-source tool that automates head-to-head evaluations of GenAI systems using LLM judges. It lets users quickly and accurately generate leaderboards comparing different LLMs, RAG setups, or prompt variations, and users can fine-tune custom judges to fit their specific needs. The aim is trustworthy evaluation of LLMs, RAG systems, and generative AI applications.
How to use AutoArena?
Install AutoArena locally with `pip install autoarena`. Collect the inputs (user prompts) and outputs (model responses) from each GenAI system you want to compare. AutoArena then runs head-to-head evaluations with LLM judges and ranks the systems. To collaborate with team members, use AutoArena Cloud at autoarena.app.
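At its core, a head-to-head evaluation asks a judge, for each prompt, which of two systems' responses is better. The sketch below is plain Python and is not AutoArena's API: the `judge` function is a hypothetical stand-in for an LLM call (it just prefers the longer response so the code runs offline), and the data layout (model name mapped to a dict of prompt to response) is an assumption. It randomizes which response the judge sees first, one common way to reduce position bias, and uses an `asyncio.Semaphore` to cap concurrent judge calls as a simple form of rate limiting.

```python
import asyncio
import itertools
import random
from dataclasses import dataclass


@dataclass
class Vote:
    prompt: str
    model_a: str
    model_b: str
    winner: str  # "A", "B", or "-" for a tie


async def judge(prompt: str, first: str, second: str) -> str:
    """Hypothetical stand-in for an LLM judge call.

    Returns "first", "second", or "tie". This placeholder prefers the
    longer response; replace it with a real model call.
    """
    await asyncio.sleep(0)  # placeholder for network latency
    if len(first) == len(second):
        return "tie"
    return "first" if len(first) > len(second) else "second"


async def compare(prompt, name_a, resp_a, name_b, resp_b, sem, rng):
    # Randomize presentation order so the judge's position bias averages out.
    swapped = rng.random() < 0.5
    first, second = (resp_b, resp_a) if swapped else (resp_a, resp_b)
    async with sem:  # limit the number of in-flight judge calls
        verdict = await judge(prompt, first, second)
    if verdict == "tie":
        winner = "-"
    else:
        # Undo the swap to map "first"/"second" back to A/B.
        winner = "A" if (verdict == "first") != swapped else "B"
    return Vote(prompt, name_a, name_b, winner)


async def run_arena(outputs, max_concurrency=4, seed=0):
    """outputs maps model name -> {prompt: response} (assumed layout)."""
    sem = asyncio.Semaphore(max_concurrency)
    rng = random.Random(seed)
    tasks = []
    # Compare every pair of models on the prompts they both answered.
    for name_a, name_b in itertools.combinations(sorted(outputs), 2):
        shared = outputs[name_a].keys() & outputs[name_b].keys()
        for prompt in sorted(shared):
            tasks.append(compare(prompt, name_a, outputs[name_a][prompt],
                                 name_b, outputs[name_b][prompt], sem, rng))
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    outputs = {
        "model-x": {"What is 2+2?": "4", "Name a prime.": "7"},
        "model-y": {"What is 2+2?": "The answer is 4.", "Name a prime.": "2"},
    }
    for vote in asyncio.run(run_arena(outputs)):
        print(vote)
```

The resulting votes are the raw material for a leaderboard; swapping the placeholder `judge` for a real model call is the only change needed to make the loop meaningful.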
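Why Choose AutoArena?
Choose AutoArena if you want to rank LLMs, RAG setups, or prompt variations without grading outputs by hand. LLM judges vote on head-to-head pairs, and the votes are aggregated into a leaderboard with Elo scores and confidence intervals. The core application is open source under Apache-2.0 and can be self-hosted.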
AutoArena Features
AI Developer Tools
- ✓ Automated head-to-head evaluation using LLM judges
- ✓ Leaderboard generation for comparing LLMs, RAG setups, and prompt variations
- ✓ Fine-tuning of custom judges
- ✓ Elo score and confidence interval computation
- ✓ Integration with GitHub for CI/CD
- ✓ Handling of parallelization, randomization, and rate limiting
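The Elo and confidence-interval feature can be illustrated with a small sketch. The exact procedure AutoArena uses is not described here; the code below assumes one common approach: replay head-to-head votes through online Elo updates, then estimate a confidence interval with a percentile bootstrap (resample votes with replacement, recompute ratings, take percentiles). The vote format `(model_a, model_b, winner)`, the K-factor of 4, the starting rating of 1000, and the 95% interval are illustrative assumptions.

```python
import random
from collections import defaultdict


def compute_elo(votes, k=4.0, initial=1000.0, scale=400.0):
    """votes: iterable of (model_a, model_b, winner), winner in {"A", "B", "-"}."""
    ratings = defaultdict(lambda: initial)
    for model_a, model_b, winner in votes:
        ra, rb = ratings[model_a], ratings[model_b]
        # Expected score of A under the logistic Elo model.
        expected_a = 1.0 / (1.0 + 10 ** ((rb - ra) / scale))
        score_a = {"A": 1.0, "B": 0.0, "-": 0.5}[winner]
        # Zero-sum update: whatever A gains, B loses.
        ratings[model_a] = ra + k * (score_a - expected_a)
        ratings[model_b] = rb - k * (score_a - expected_a)
    return dict(ratings)


def bootstrap_ci(votes, n_rounds=200, alpha=0.05, seed=0):
    """Percentile bootstrap: resample votes with replacement, recompute Elo."""
    rng = random.Random(seed)
    votes = list(votes)
    samples = defaultdict(list)
    for _ in range(n_rounds):
        resampled = rng.choices(votes, k=len(votes))
        for model, rating in compute_elo(resampled).items():
            samples[model].append(rating)
    intervals = {}
    for model, values in samples.items():
        values.sort()
        lo = values[int((alpha / 2) * (len(values) - 1))]
        hi = values[int((1 - alpha / 2) * (len(values) - 1))]
        intervals[model] = (lo, hi)
    return intervals


def leaderboard(votes, n_rounds=200, seed=0):
    """Rows of (model, elo, ci_low, ci_high), sorted by Elo descending."""
    ratings = compute_elo(votes)
    cis = bootstrap_ci(votes, n_rounds=n_rounds, seed=seed)
    rows = []
    for model, rating in ratings.items():
        # A model absent from every resample gets a degenerate interval.
        lo, hi = cis.get(model, (rating, rating))
        rows.append((model, rating, lo, hi))
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    votes = [("model-x", "model-y", "B")] * 30 + [("model-x", "model-y", "A")] * 10
    for model, rating, lo, hi in leaderboard(votes):
        print(f"{model:10s} {rating:7.1f}  [{lo:.1f}, {hi:.1f}]")
```

Online Elo depends on the order in which votes arrive, and bootstrap resampling also shuffles that order, so the point estimate need not sit at the center of its interval. A Bradley-Terry fit is an order-independent alternative if that matters for your leaderboard.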
Pricing
Open-Source
Unrestricted access to the Apache-2.0 licensed AutoArena application. Intended for students, researchers, hobbyists, and non-profits. Self-hosted.
Professional
Everything in Open-Source. Team collaboration on the cloud-hosted autoarena.app. Access to fine-tuned judge models with >10% more accurate preference votes than base foundation model APIs. Two-week free trial.
Enterprise
Everything in Professional. Private on-premise deployment on your AWS, GCP, Azure, or internal infrastructure. SSO and enterprise access controls. Prioritized feature requests, bug fixes, and product roadmap collaboration. Enterprise invoice and payment options.


