LLM Translation Arena Results: Gemini 2.5 Flash Tops Rankings (29th Sep 2025)

We ran an LLM Arena-style head-to-head comparison, collecting over 300 human votes, to rank current closed-source models on translation quality. The results support two of our intuitions. First, Gemini 2.5 Flash is currently the best zero-shot translator: it takes first place in the Chinese rankings (thinking variant) and in the English rankings. Second, the translation quality of recent Claude releases appears to be declining relative to competitors.

Claude 4 series models (in the Chinese rankings) and DeepSeek models were among the weakest performers and did not make the top 10. In the English rankings, claude-sonnet-4-20250514 placed seventh.
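The post does not describe how votes are turned into scores. Arena-style leaderboards usually report Elo-style ratings. The three Chinese-ranking entries sitting at exactly 1000 are consistent with a 1000-point starting rating. The code below is a minimal sketch that assumes plain sequential Elo with K = 32. The constants, the `Arena` class and the demo model names are all hypothetical. None of them is the leaderboard's actual implementation.

```python
from dataclasses import dataclass, field

# Assumed constants (not stated in the post): a 1000-point starting rating
# and the classic Elo K-factor of 32.
INITIAL_RATING = 1000.0
K_FACTOR = 32.0


def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected probability that A is preferred over B (standard Elo, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


@dataclass
class Arena:
    """Hypothetical pairwise-vote leaderboard; model names are free-form strings."""

    ratings: dict[str, float] = field(default_factory=dict)

    def rating(self, model: str) -> float:
        # A model enters the leaderboard at the initial rating the first time it is seen.
        return self.ratings.setdefault(model, INITIAL_RATING)

    def record_vote(self, winner: str, loser: str) -> None:
        """Apply one human vote in which `winner`'s translation was preferred."""
        r_winner = self.rating(winner)
        r_loser = self.rating(loser)
        # The less expected the win, the larger the rating transfer.
        delta = K_FACTOR * (1.0 - expected_score(r_winner, r_loser))
        self.ratings[winner] = r_winner + delta
        self.ratings[loser] = r_loser - delta

    def leaderboard(self) -> list[tuple[str, float]]:
        """Models sorted from highest to lowest rating."""
        return sorted(self.ratings.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    arena = Arena()
    # Toy votes with placeholder model names, not data from this arena.
    votes = [("model-a", "model-b"), ("model-a", "model-c"), ("model-c", "model-b")]
    for winner, loser in votes:
        arena.record_vote(winner, loser)
    for rank, (name, score) in enumerate(arena.leaderboard(), start=1):
        print(f"{rank} {name} {score:.0f}")
```

Each vote moves the winner up and the loser down by the same amount, so the total rating stays constant. Sequential Elo updates also depend on the order in which votes arrive. Some arena leaderboards avoid this by fitting a Bradley-Terry model over all votes at once.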

Rankings

Top 10 Models (Chinese Translation)

| Rank | Model Name                                  | Score |
|-----:|---------------------------------------------|------:|
|    1 | google:gemini-2.5-flash-thinking            |  1091 |
|    2 | anthropic:claude-3-5-sonnet-20241022        |  1050 |
|    3 | google:gemini-2.5-flash                     |  1043 |
|    4 | google:gemini-2.5-pro-thinking              |  1037 |
|    5 | anthropic:claude-3-7-sonnet-latest-thinking |  1023 |
|    6 | anthropic:claude-3-7-sonnet-latest          |  1017 |
|    7 | google:gemini-1.5-flash                     |  1000 |
|    8 | google:gemini-2.5-pro                       |  1000 |
|    9 | google:gemini-1.5-pro                       |  1000 |
|   10 | anthropic:claude-3-opus-20240229            |   981 |

Top 10 Models (English Translation)

| Rank | Model Name                                  | Score |
|-----:|---------------------------------------------|------:|
|    1 | google:gemini-2.5-flash                     |  1097 |
|    2 | anthropic:claude-3-7-sonnet-latest-thinking |  1095 |
|    3 | google:gemini-2.5-pro-thinking              |  1067 |
|    4 | anthropic:claude-3-5-sonnet-20241022        |  1027 |
|    5 | google:gemini-2.5-pro                       |  1008 |
|    6 | google:gemini-1.5-pro                       |  1008 |
|    7 | anthropic:claude-sonnet-4-20250514          |  1006 |
|    8 | anthropic:claude-3-opus-20240229            |  1002 |
|    9 | google:gemini-2.5-flash-thinking            |   992 |
|   10 | google:gemini-2.0-flash                     |   981 |
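The post does not say what scale these scores use. Assuming they are Elo-style ratings on the usual 400-point scale, the gap between two models converts into the expected share of head-to-head votes the higher-rated model would win. The calculation for the top two entries in each table is:

```latex
E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}

\text{Chinese: } R_A = 1091,\; R_B = 1050 \;\Rightarrow\; E_A = \frac{1}{1 + 10^{-41/400}} \approx 0.559

\text{English: } R_A = 1097,\; R_B = 1095 \;\Rightarrow\; E_A = \frac{1}{1 + 10^{-2/400}} \approx 0.503
```

Under that assumption, google:gemini-2.5-flash-thinking would be preferred over anthropic:claude-3-5-sonnet-20241022 in roughly 56% of Chinese comparisons. In the English table, google:gemini-2.5-flash and anthropic:claude-3-7-sonnet-latest-thinking are close to even.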

Next Steps: Optimizing for Tibetan Buddhist Translation

Based on these results, we have chosen Gemini 2.5 Flash as our zero-shot baseline. We are now using new tools to compare its zero-shot output against various workflows and templates. The goal is to find the model and template combination that gives the highest-fidelity Tibetan Buddhist translation.
