PRD: Tibetan to English Translation Workflow Pilot

Owning Group: Garchen Rinpoche SIG
Status: Draft
GitHub Project: Link to GitHub Project Board
Last Updated: 2025-09-10

1. Overview

This project is a pilot program to establish and validate an efficient workflow for translating Garchen Rinpoche’s spoken Tibetan teachings into high-quality, accurate English text. The core problem this project addresses is making these valuable teachings accessible to a global, English-speaking audience.

This pilot focuses on a batch of six audio files that have already undergone a rigorous human transcription and review process. The primary experiment is to compare two translation pathways: (1) translating directly from the spoken Tibetan transcript to English and (2) translating from a polished, written-form Tibetan transcript to English. The outcome will determine the most effective and scalable workflow for all future translation efforts.
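The two pathways can be sketched in a few lines of Python. This is only an illustrative outline, not the pilot's actual tooling: the `Segment` type, the `run_pathways` function, and the two injected callables (`to_written` for the spoken-to-written conversion, `to_english` for the LLM translation call) are hypothetical names, and the sketch assumes translation happens segment by segment so that subtitle timings carry over unchanged.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Segment:
    index: int
    start: str  # SRT-style timestamp, e.g. "00:00:01,000"
    end: str
    text: str


# A text-to-text step; in practice this would wrap an LLM call (assumption).
Transform = Callable[[str], str]


def run_pathways(
    spoken: List[Segment],
    to_written: Transform,
    to_english: Transform,
) -> Dict[str, List[Segment]]:
    """Produce both English translation sets from one spoken transcript."""

    def apply(step: Transform, segments: List[Segment]) -> List[Segment]:
        # Keep index and timing, replace only the text.
        return [Segment(s.index, s.start, s.end, step(s.text)) for s in segments]

    written = apply(to_written, spoken)
    return {
        "spoken_to_english": apply(to_english, spoken),
        "written_to_english": apply(to_english, written),
    }
```

Passing the two steps in as callables keeps the comparison fair: both pathways use the same `to_english` step, so any difference in output comes from the spoken-to-written conversion alone.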

2. Strategy & Research

Links to foundational research and planning documents that inform this project.

3. Goals & Success Metrics

Primary Goals:

  1. Define an Optimal Translation Workflow: Determine whether the ‘Spoken-to-Written’ conversion step is necessary for producing high-quality English translations.
  2. Evaluate LLM Translation Quality: Assess the performance of LLMs (e.g., Gemini Pro) in translating nuanced Tibetan Buddhist teachings into English for both spoken and written source transcripts.
  3. Produce High-Quality Translations: Deliver finalized, reviewed English SRT files for the six pilot audio files.
  4. Establish a Scalable Process: Create a documented, repeatable workflow that can be applied to the larger corpus of Garchen Rinpoche’s teachings.

Success Metrics:

  • A definitive, data-backed decision from the review team on the preferred translation pathway (spoken-to-English vs. written-to-English).
  • Qualitative review scores from David’s team on the accuracy, readability, and fidelity of the translations from both pathways.
  • Successful generation of two sets of English SRT files for all six audio files.
  • A finalized and approved PRD and workflow document for future translation projects based on the pilot’s findings.
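One possible way to turn the qualitative review scores into a data-backed comparison is to average them per pathway and per criterion. The sketch below assumes each review is recorded as a dictionary with a `pathway` label and a numeric score for accuracy, readability, and fidelity; the record layout, the scoring scale, and the `summarize_scores` name are assumptions, since the PRD does not define a scoring format.

```python
from collections import defaultdict
from statistics import mean

# Criteria named in the success metrics above.
CRITERIA = ("accuracy", "readability", "fidelity")


def summarize_scores(reviews):
    """Average each criterion per pathway.

    `reviews` is a list of dicts such as
    {"pathway": "spoken", "accuracy": 4, "readability": 3, "fidelity": 4}
    (hypothetical record shape).
    """
    collected = defaultdict(lambda: {c: [] for c in CRITERIA})
    for review in reviews:
        for criterion in CRITERIA:
            collected[review["pathway"]][criterion].append(review[criterion])
    return {
        pathway: {c: mean(values) for c, values in scores.items()}
        for pathway, scores in collected.items()
    }
```

The averages are meant to support the review team's decision rather than replace it; with only six audio files, reviewer comments will likely carry as much weight as the numbers.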

4. Timeline & Milestones

The schedule starts once the final human-reviewed transcripts are available. The work plan begins on September 11, 2025; dates in the table below are given as day/month.

Calendar columns: 11/09 12/09 13/09 14/09 15/09 16/09 17/09 18/09 19/09

Activity                  Est. Duration   Manpower
Prerequisites
  1. Transcribing         -               -
Pilot Execution
  2. Reviewing            1 Day           1
  2.5 Khenpo Review       1 Day (TBD)     1
  3. Spoken to Written    3 Days          1
  4. Written to English   1 Day           1
  5. Translation Review   2 Days (TBD)    1
Post-Pilot
  Final Decision & Doc    1 Day           -

5. Scope & Features

What is included (In Scope):

  • Processing for the six specified audio files (134-139).
  • Conversion of spoken Tibetan transcripts to written Tibetan.
  • Generation of English translations from both spoken and written transcripts into SRT format.
  • A comparative review of the two resulting English translation sets.
  • A final report and recommendation for the future workflow.
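For reference, an SRT file is a sequence of numbered cues, each with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line followed by the subtitle text and a blank line. A minimal sketch of writing translated segments in that format follows; the function names and the `(start_ms, end_ms, text)` tuple layout are assumptions made for illustration, not the pilot's actual code.

```python
def format_timestamp(ms: int) -> str:
    """Convert milliseconds to the SRT timestamp form HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(cues) -> str:
    """Render (start_ms, end_ms, text) tuples as SRT text.

    Cues are numbered from 1 in the order given.
    """
    blocks = []
    for number, (start_ms, end_ms, text) in enumerate(cues, start=1):
        timing = f"{format_timestamp(start_ms)} --> {format_timestamp(end_ms)}"
        blocks.append(f"{number}\n{timing}\n{text}\n")
    # Joining with a newline leaves one blank line between cues.
    return "\n".join(blocks)
```

If the source transcripts are already in SRT form, reusing their cue numbers and timings for the English output keeps the two translation sets aligned for side-by-side review.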

What is not included (Out of Scope):

  • Fine-tuning a custom Speech-to-Text (STT) model for Garchen Rinpoche’s voice (this was part of a previous project phase).
  • Transcription of the raw audio files (this is a prerequisite).
  • Translation of any content other than Garchen Rinpoche’s speech (e.g., audience questions, recitations).
  • Building a new API or user-facing application for translation.

6. Dependencies

  • Khenpo Review: The entire pilot is dependent on the final, authoritative Tibetan transcripts reviewed by the Khenpo.
  • David’s Review Team: Availability of the review team is critical for step 5 (Translation Review) to evaluate the translations and provide the feedback needed to complete the pilot.
  • LLM APIs: Access to and performance of third-party LLM APIs (e.g., Google Gemini).

7. Acceptance Criteria

This pilot project will be considered “done” when:

  1. Two complete sets of English SRT files (one from the spoken transcript, one from the written) have been generated for all six pilot audio files.
  2. The review team has completed their qualitative analysis of both sets of translations.
  3. A final decision has been made and documented regarding the necessity of the ‘Spoken-to-Written’ conversion step for future projects.
  4. A final workflow document is created and approved by the stakeholders, outlining the process for future translation work.
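The first criterion lends itself to a simple automated check. The sketch below lists which SRT files are still missing, assuming a hypothetical naming scheme of `<file id>_<pathway>_en.srt` for audio files 134 to 139; the actual file names used in the pilot may differ.

```python
# Audio file IDs from the pilot scope (134-139).
PILOT_IDS = range(134, 140)
# One English set per source transcript type.
PATHWAYS = ("spoken", "written")


def missing_srt_files(existing):
    """Return expected SRT file names that are not in `existing`.

    `existing` is any collection of file names, for example the
    result of os.listdir() on the output folder. The naming pattern
    is an assumption made for this sketch.
    """
    expected = [f"{file_id}_{pathway}_en.srt"
                for file_id in PILOT_IDS
                for pathway in PATHWAYS]
    return [name for name in expected if name not in existing]
```

An empty result means all twelve files (six audio files, two pathways each) are present; it says nothing about translation quality, which remains the review team's call under criteria 2 and 3.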