PRD: Tibetan to English Translation Workflow Pilot
| Owning Group | Garchen Rinpoche SIG |
| Status | Draft |
| GitHub Project | Link to GitHub Project Board |
| Last Updated | 2025-09-10 |
1. Overview
This project is a pilot program to establish and validate an efficient workflow for translating Garchen Rinpoche’s spoken Tibetan teachings into high-quality, accurate English text. The core problem it addresses is that these valuable teachings are not yet accessible to a global, English-speaking audience.
This pilot focuses on a batch of six audio files that have already undergone a rigorous human transcription and review process. The primary experiment is to compare two translation pathways: (1) translating directly from the spoken Tibetan transcript to English and (2) translating from a polished, written-form Tibetan transcript to English. The outcome will determine the most effective and scalable workflow for all future translation efforts.
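The two pathways can be sketched as a small comparison harness. This is only an illustration: `run_pathways`, `to_written`, and `translate` are hypothetical names, the two callables stand in for whatever spoken-to-written conversion and LLM translation steps the pilot actually uses, and processing the transcript line by line is an assumption.

```python
from typing import Callable


def run_pathways(
    spoken_lines: list[str],
    to_written: Callable[[str], str],
    translate: Callable[[str], str],
) -> dict[str, list[str]]:
    """Produce English output for both pathways from the same spoken transcript.

    to_written and translate are placeholders (assumptions) for the real
    spoken-to-written conversion and Tibetan-to-English translation steps.
    """
    # Pathway 1: translate the spoken transcript directly.
    spoken_to_english = [translate(line) for line in spoken_lines]

    # Pathway 2: convert to written-form Tibetan first, then translate.
    written_lines = [to_written(line) for line in spoken_lines]
    written_to_english = [translate(line) for line in written_lines]

    return {
        "spoken_to_english": spoken_to_english,
        "written_to_english": written_to_english,
    }
```

Keeping both pathways behind the same `translate` callable means any difference in output should come from the conversion step rather than from the translation setup.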
2. Strategy & Research
Links to foundational research and planning documents that inform this project.
3. Goals & Success Metrics
Primary Goals:
- Define an Optimal Translation Workflow: Determine whether the ‘Spoken-to-Written’ conversion step is necessary for producing high-quality English translations.
- Evaluate LLM Translation Quality: Assess the performance of LLMs (e.g., Gemini Pro) in translating nuanced Tibetan Buddhist teachings into English for both spoken and written source transcripts.
- Produce High-Quality Translations: Deliver finalized, reviewed English SRT files for the six pilot audio files.
- Establish a Scalable Process: Create a documented, repeatable workflow that can be applied to the larger corpus of Garchen Rinpoche’s teachings.
Success Metrics:
- A definitive, data-backed decision from the review team on the preferred translation pathway (spoken-to-English vs. written-to-English).
- Qualitative review scores from David’s team on the accuracy, readability, and fidelity of the translations from both pathways.
- Successful generation of two sets of English SRT files for all six audio files.
- A finalized and approved PRD and workflow document for future translation projects based on the pilot’s findings.
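To support a data-backed decision, the qualitative review scores could be averaged per pathway and per criterion. The sketch below assumes each review is recorded as a dictionary with a `pathway` label and a numeric score for accuracy, readability, and fidelity; the record layout, the score scale, and the name `summarize_reviews` are assumptions, not part of the pilot plan.

```python
from collections import defaultdict
from statistics import mean

# Criteria named in the success metrics above.
CRITERIA = ("accuracy", "readability", "fidelity")


def summarize_reviews(reviews: list[dict]) -> dict[str, dict[str, float]]:
    """Average each criterion's score per pathway.

    Each review is assumed to look like:
    {"pathway": "spoken", "accuracy": 4, "readability": 3, "fidelity": 4}
    """
    collected = defaultdict(lambda: defaultdict(list))
    for review in reviews:
        for criterion in CRITERIA:
            collected[review["pathway"]][criterion].append(review[criterion])

    return {
        pathway: {criterion: mean(scores[criterion]) for criterion in CRITERIA}
        for pathway, scores in collected.items()
    }
```

The summary is meant as input to the review team's decision; it does not pick a pathway on its own.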
4. Timeline & Milestones
This schedule starts once the final human-reviewed transcripts are available, with work planned to begin on September 11, 2025. Dates in the table are in DD/MM format.
| Activity | Est. Duration | Manpower | 11/09 | 12/09 | 13/09 | 14/09 | 15/09 | 16/09 | 17/09 | 18/09 | 19/09 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Prerequisites | |||||||||||
| 1. Transcribing | - | - | |||||||||
| Pilot Execution | |||||||||||
| 2. Reviewing | 1 Day | 1 | |||||||||
| 2.5 Khenpo Review | 1 Day (TBD) | 1 | |||||||||
| 3. Spoken to Written | 3 Days | 1 | |||||||||
| 4. Written to English | 1 Day | 1 | |||||||||
| 5. Translation Review | 2 Days (TBD) | 1 | |||||||||
| Post-Pilot | |||||||||||
| Final Decision & Doc | 1 Day | - | |||||||||
5. Scope & Features
What is included (In Scope):
- Processing for the six specified audio files (134-139).
- Conversion of spoken Tibetan transcripts to written Tibetan.
- Generation of English translations from both spoken and written transcripts into SRT format.
- A comparative review of the two resulting English translation sets.
- A final report and recommendation for the future workflow.
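For the SRT output, a minimal serializer might look like the following. It assumes each translated segment carries start and end times in seconds; the `Segment` class and function names are hypothetical and only illustrate the standard SRT layout (cue index, `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, text, blank line).

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One subtitle cue: start/end in seconds plus the translated text."""
    start: float
    end: float
    text: str


def format_timestamp(seconds: float) -> str:
    """Render seconds as the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Serialize segments as SRT cues numbered from 1."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        time_range = f"{format_timestamp(seg.start)} --> {format_timestamp(seg.end)}"
        blocks.append(f"{index}\n{time_range}\n{seg.text.strip()}\n")
    # Joining with a newline leaves one blank line between cues.
    return "\n".join(blocks)
```

Reusing the source transcript's timings for both English sets would keep the two outputs aligned cue by cue, which should make side-by-side review easier.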
What is not included (Out of Scope):
- Fine-tuning a custom Speech-to-Text (STT) model for Garchen Rinpoche’s voice (this was part of a previous project phase).
- Transcription of the raw audio files (this is a prerequisite).
- Translation of any content other than Garchen Rinpoche’s speech (e.g., audience questions, recitations).
- Building a new API or user-facing application for translation.
6. Dependencies
- Khenpo Review: The entire pilot is dependent on the final, authoritative Tibetan transcripts reviewed by the Khenpo.
- David’s Review Team: Availability of the review team is critical for step 5 (Translation Review), where the translations are evaluated and the feedback needed to complete the pilot is provided.
- LLM APIs: Access to and performance of third-party LLM APIs (e.g., Google Gemini).
7. Acceptance Criteria
This pilot project will be considered “done” when:
- Two complete sets of English SRT files (one from the spoken transcript, one from the written) have been generated for all six pilot audio files.
- The review team has completed their qualitative analysis of both sets of translations.
- A final decision has been made and documented regarding the necessity of the ‘Spoken-to-Written’ conversion step for future projects.
- A final workflow document is created and approved by the stakeholders, outlining the process for future translation work.
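The first criterion can be checked mechanically. The sketch below assumes a hypothetical layout of one directory per pathway (`spoken/` and `written/`) with files named by audio ID, such as `spoken/134.srt`; the actual naming convention is not specified in this document.

```python
from pathlib import Path

AUDIO_IDS = range(134, 140)       # the six pilot files, 134-139
PATHWAYS = ("spoken", "written")  # hypothetical directory names


def missing_srt_files(root: Path) -> list[Path]:
    """Return the expected SRT paths under root that do not exist yet."""
    expected = [
        root / pathway / f"{audio_id}.srt"
        for pathway in PATHWAYS
        for audio_id in AUDIO_IDS
    ]
    return [path for path in expected if not path.is_file()]
```

An empty result means both sets are complete for all six files; the remaining criteria depend on review and sign-off rather than on file checks.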