We ran an LLM Arena-style head-to-head comparison, collecting over 300 human votes, to rank current closed-source models on translation quality. The results support two intuitions we already held: Gemini 2.5 Flash is the strongest zero-shot translator overall (first for English in its standard mode, first for Chinese in its thinking mode), and recent Claude releases appear to be losing ground to competitors on translation quality.
The Claude 4 series and DeepSeek were among the weakest performers: no Claude 4 model reached the top 10 for Chinese (Claude Sonnet 4 does place seventh for English), and no DeepSeek model appears in either top 10.
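
The post does not say how the scores were computed. Because they cluster around 1000, the sketch below assumes an Elo-style rating of the kind commonly used for arena leaderboards. The starting rating of 1000, the K-factor of 32, the function names, and the sample votes are all illustrative assumptions, not details of our actual pipeline.

```python
from collections import defaultdict

INITIAL_RATING = 1000.0  # assumed starting score for every model
K_FACTOR = 32.0          # assumed update step size


def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(ratings, winner, loser, k=K_FACTOR):
    """Apply one human vote: the winner gains exactly what the loser gives up."""
    delta = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += delta
    ratings[loser] -= delta


def rank_models(votes):
    """Replay (winner, loser) votes in order and return models sorted by rating."""
    ratings = defaultdict(lambda: INITIAL_RATING)
    for winner, loser in votes:
        update_ratings(ratings, winner, loser)
    return sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical votes, each recorded as (preferred model, other model).
votes = [
    ("model-a", "model-b"),
    ("model-a", "model-c"),
    ("model-b", "model-c"),
]
leaderboard = rank_models(votes)
for name, rating in leaderboard:
    print(f"{name}: {rating:.0f}")
```

With sequential Elo updates the final ratings depend on the order of the votes, so with only a few hundred votes small score gaps should be read cautiously.
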
Full Rankings
Top 10 Models (Chinese Translation)
| Rank | Model Name | Score | 
|---|---|---|
| 1 | google:gemini-2.5-flash-thinking | 1091 | 
| 2 | anthropic:claude-3-5-sonnet-20241022 | 1050 | 
| 3 | google:gemini-2.5-flash | 1043 | 
| 4 | google:gemini-2.5-pro-thinking | 1037 | 
| 5 | anthropic:claude-3-7-sonnet-latest-thinking | 1023 | 
| 6 | anthropic:claude-3-7-sonnet-latest | 1017 | 
| 7 | google:gemini-1.5-flash | 1000 | 
| 8 | google:gemini-2.5-pro | 1000 | 
| 9 | google:gemini-1.5-pro | 1000 | 
| 10 | anthropic:claude-3-opus-20240229 | 981 | 
Top 10 Models (English Translation)
| Rank | Model Name | Score | 
|---|---|---|
| 1 | google:gemini-2.5-flash | 1097 | 
| 2 | anthropic:claude-3-7-sonnet-latest-thinking | 1095 | 
| 3 | google:gemini-2.5-pro-thinking | 1067 | 
| 4 | anthropic:claude-3-5-sonnet-20241022 | 1027 | 
| 5 | google:gemini-2.5-pro | 1008 | 
| 6 | google:gemini-1.5-pro | 1008 | 
| 7 | anthropic:claude-sonnet-4-20250514 | 1006 | 
| 8 | anthropic:claude-3-opus-20240229 | 1002 | 
| 9 | google:gemini-2.5-flash-thinking | 992 | 
| 10 | google:gemini-2.0-flash | 981 | 
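
One rough way to read the two tables together is to average each model's score across both languages, keeping only models that appear in both top 10 lists. The sketch below does that with the scores copied from the tables above. Averaging scores from two separately scored leaderboards is a heuristic we assume here for illustration, not part of the arena methodology.

```python
# Scores copied from the two top 10 tables above.
chinese = {
    "google:gemini-2.5-flash-thinking": 1091,
    "anthropic:claude-3-5-sonnet-20241022": 1050,
    "google:gemini-2.5-flash": 1043,
    "google:gemini-2.5-pro-thinking": 1037,
    "anthropic:claude-3-7-sonnet-latest-thinking": 1023,
    "anthropic:claude-3-7-sonnet-latest": 1017,
    "google:gemini-1.5-flash": 1000,
    "google:gemini-2.5-pro": 1000,
    "google:gemini-1.5-pro": 1000,
    "anthropic:claude-3-opus-20240229": 981,
}
english = {
    "google:gemini-2.5-flash": 1097,
    "anthropic:claude-3-7-sonnet-latest-thinking": 1095,
    "google:gemini-2.5-pro-thinking": 1067,
    "anthropic:claude-3-5-sonnet-20241022": 1027,
    "google:gemini-2.5-pro": 1008,
    "google:gemini-1.5-pro": 1008,
    "anthropic:claude-sonnet-4-20250514": 1006,
    "anthropic:claude-3-opus-20240229": 1002,
    "google:gemini-2.5-flash-thinking": 992,
    "google:gemini-2.0-flash": 981,
}

# Only models present in both top 10 lists have two scores to average.
shared = chinese.keys() & english.keys()

# Sort by mean score (descending), breaking ties alphabetically by model name.
combined = sorted(
    ((model, (chinese[model] + english[model]) / 2) for model in shared),
    key=lambda kv: (-kv[1], kv[0]),
)

for model, mean_score in combined:
    print(f"{model}: {mean_score:.1f}")
```

Eight models appear in both lists. Under this heuristic `google:gemini-2.5-flash` has the highest mean (1070.0), followed by `anthropic:claude-3-7-sonnet-latest-thinking` (1059.0).
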
Next Steps: Optimizing for Tibetan Buddhist Translation
With Gemini 2.5 Flash established as our zero-shot baseline, we are now using new tools to compare its zero-shot output against various workflows and templates. The goal is to find the model and template combination that produces the highest-fidelity Tibetan Buddhist translation.