Through our work with Melong, we’ve found that breaking down machine translation into smaller, reasoning-driven tasks significantly enhances output quality compared to direct translation. Interestingly, this method mirrors how human translators approach their work, making the process more intuitive and aligned with human thought patterns. Encouraging large language models (LLMs) to “think before translating” has emerged as a promising strategy.
A compelling demonstration of this idea came from Hyung Won Chung at OpenAI. He illustrated reasoning-driven translation with a corrupted Korean sentence: extra consonants had been inserted, which confused AI models but left the sentence decipherable to native speakers. While GPT-4 struggled with the task, OpenAI's newer o1 model excelled by taking a step-by-step approach: first deciphering the corrupted text, then reasoning through the necessary corrections, and finally translating the corrected version. This structured process produced an accurate translation, underscoring how reasoning can handle even tricky translation challenges.
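To make the idea concrete, here is a minimal sketch of a "think before translating" prompt. The prompt wording and the `o1-preview` model name are illustrative assumptions, not the exact setup used in the demonstration.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative prompt: the model is asked to restore the corrupted source,
# explain its corrections, and only then translate the cleaned-up sentence.
REASONING_TEMPLATE = """You are given a Korean sentence whose spelling has been
corrupted with extra consonants.

1. Restore the original sentence and list the corrections you made.
2. Explain any ambiguous choices.
3. Translate the restored sentence into English.

Corrupted sentence:
{source}
"""

def reason_then_translate(source: str) -> str:
    """Run a single reasoning-then-translation call (model name is an assumption)."""
    response = client.chat.completions.create(
        model="o1-preview",
        messages=[{"role": "user", "content": REASONING_TEMPLATE.format(source=source)}],
    )
    return response.choices[0].message.content
```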
The advantages of this approach extend beyond accuracy. By integrating reasoning into translation, we gain visibility into the model’s thought process. This interpretability is invaluable, offering insights into how the model understands and processes language—a feature our users consistently value.
For those interested in how I applied this method, detailed documentation is available on OpenPecha's project board.
If you want to try out the new model, here's the link: DEMO. Give it a go!
Before you dive in, though, keep in mind that this is a proof of concept. There are still some issues to iron out, and like every LLM, it might occasionally start repeating itself. If that happens, just click the stop button.
Results:
We used BLEURT as our primary metric for both evaluation and ranking. Here are the results:
- TPO outperforms Claude.
- For Claude direct, we used a plain prompt similar to TPO's ("Please translate the following …").
- For Claude technical, we used a prompt asking Claude to act as an expert Buddhist translator.
- Lastly, for Claude COT, we instructed it to act as an expert Buddhist translator using chain-of-thought reasoning. (A minimal sketch of these three prompt styles follows this list.)
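For reference, here is a rough sketch of the three Claude baselines described above. The exact prompt wording and the model identifier are assumptions for illustration; only the structure (direct, expert persona, expert persona with chain of thought) reflects our setup.

```python
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set

# Illustrative prompt variants corresponding to the three baselines above.
PROMPTS = {
    "claude_direct": "Please translate the following text into English:\n\n{source}",
    "claude_technical": (
        "You are an expert Buddhist translator. "
        "Translate the following text into English:\n\n{source}"
    ),
    "claude_cot": (
        "You are an expert Buddhist translator. "
        "First reason step by step about the terminology, grammar, and context, "
        "then give your final English translation of the following text:\n\n{source}"
    ),
}

def translate(source: str, variant: str) -> str:
    """Query one of the baseline variants (model name is an assumption)."""
    message = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": PROMPTS[variant].format(source=source)}],
    )
    return message.content[0].text
```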
BLEU is also reported, since it indicates whether the key technical terms appear in the output, which is crucial for domain-specific translation.
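For reference, here is a minimal sketch of how per-sentence BLEURT and corpus-level BLEU scores can be computed with the Hugging Face `evaluate` library; the checkpoint choice and data layout are assumptions, not our exact evaluation script.

```python
import evaluate

# BLEURT requires Google's bleurt package to be installed; the default
# checkpoint is used here, which may differ from the one in our evaluation.
bleurt = evaluate.load("bleurt")
sacrebleu = evaluate.load("sacrebleu")

def score(candidates: list[str], references: list[str]) -> dict:
    """Return per-sentence BLEURT scores and a corpus-level BLEU score."""
    bleurt_scores = bleurt.compute(predictions=candidates, references=references)["scores"]
    bleu = sacrebleu.compute(predictions=candidates, references=[[r] for r in references])["score"]
    return {"bleurt_per_sentence": bleurt_scores, "corpus_bleu": bleu}
```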
We compared the win rate of each model:
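Win rate here means the fraction of test sentences on which one system receives a higher BLEURT score than another; the tie-handling rule in this small sketch is an assumption.

```python
def win_rate(scores_a: list[float], scores_b: list[float]) -> float:
    """Fraction of sentences where system A scores strictly higher than system B.

    Assumes both lists hold per-sentence BLEURT scores over the same test set;
    ties count as half a win for each side.
    """
    assert len(scores_a) == len(scores_b)
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a, b in zip(scores_a, scores_b))
    return wins / len(scores_a)
```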
In conclusion, we found that using TPO for translation offers its own advantages. Compared to approaches without reasoning, TPO currently performs slightly worse. This is consistent with the findings in the TPO paper, since we ran only two iterations of DPO in our experiments; with additional DPO iterations, we expect TPO to overtake the non-reasoning approaches.
Another promising direction is online DPO, particularly with a model like Claude serving as the reward model. Continuously refining the model's reasoning and translation in this way holds significant potential. Beyond that, training a dedicated reward model tailored to this task could open up new possibilities, making RLHF (reinforcement learning from human feedback) an even more compelling option. At present, the reasoning the system generates is relatively concise; extending the reasoning steps is likely to bring further gains in performance and accuracy.
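To make the online-DPO idea more concrete, here is a rough sketch of how preference pairs might be collected with Claude acting as the judge. The judging prompt, model identifier, and record format are all illustrative assumptions, not something we have implemented.

```python
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set

# Illustrative judging prompt; the wording is an assumption.
JUDGE_PROMPT = """You are an expert Buddhist translator acting as a judge.
Source text:
{source}

Translation A:
{a}

Translation B:
{b}

Which translation is more accurate and faithful? Answer with exactly "A" or "B"."""

def build_preference_pair(source: str, candidate_a: str, candidate_b: str) -> dict:
    """Ask the judge model which candidate is better and return a DPO-style record."""
    reply = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=5,
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(source=source, a=candidate_a, b=candidate_b)}],
    )
    verdict = reply.content[0].text.strip()
    chosen, rejected = (candidate_a, candidate_b) if verdict.startswith("A") else (candidate_b, candidate_a)
    # Records in this (prompt, chosen, rejected) format can then feed the next
    # online DPO iteration, e.g. via a trainer such as TRL's DPOTrainer.
    return {"prompt": source, "chosen": chosen, "rejected": rejected}
```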
Looking ahead, I believe reasoning agents could represent the ultimate solution. An agent capable of reasoning can not only retrieve translations and commentaries but also perform searches across relevant sources, consult dictionaries, and take optimal steps to ensure high-quality translations. This ability to reason, search, and integrate information would be incredibly valuable in advancing translation systems to a whole new level.