PRD: Tibetan to English Translation Workflow Pilot

Ganga_Gyatso · September 10, 2025, 11:08am


Owning Group	Garchen Rinpoche SIG
Status	Draft
GitHub Project	Link to GitHub Project Board
Last Updated	2025-09-10

1. Overview

This project is a pilot program to establish and validate an efficient workflow for translating Garchen Rinpoche’s spoken Tibetan teachings into high-quality, accurate English text. The core problem it addresses is that these teachings are not yet accessible to a global, English-speaking audience.

This pilot focuses on a batch of six audio files that have already undergone a rigorous human transcription and review process. The primary experiment is to compare two translation pathways: (1) translating directly from the spoken Tibetan transcript to English and (2) translating from a polished, written-form Tibetan transcript to English. The outcome will determine the most effective and scalable workflow for all future translation efforts.

2. Strategy & Research

Links to foundational research and planning documents that inform this project.

3. Goals & Success Metrics

Primary Goals:

Define an Optimal Translation Workflow: Determine whether the ‘Spoken-to-Written’ conversion step is necessary for producing high-quality English translations.
Evaluate LLM Translation Quality: Assess the performance of LLMs (e.g., Gemini Pro) in translating nuanced Tibetan Buddhist teachings into English for both spoken and written source transcripts.
Produce High-Quality Translations: Deliver finalized, reviewed English SRT files for the six pilot audio files.
Establish a Scalable Process: Create a documented, repeatable workflow that can be applied to the larger corpus of Garchen Rinpoche’s teachings.

Success Metrics:

A definitive, data-backed decision from the review team on the preferred translation pathway (Spoken-to-English vs. Written-to-English).
Qualitative review scores from David’s team on the accuracy, readability, and fidelity of the translations from both pathways.
Successful generation of two sets of English SRT files for all six audio files.
A finalized and approved PRD and workflow document for future translation projects based on the pilot’s findings.
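
One way the qualitative review scores could feed a data-backed decision is to average each criterion per pathway. The sketch below is only an illustration: the PRD does not define a scoring scale or a data format, so the record layout, the `pathway` labels, and the `summarize` helper are all assumptions.

```python
from collections import defaultdict
from statistics import mean

# The three criteria named in the success metrics.
CRITERIA = ("accuracy", "readability", "fidelity")


def summarize(reviews: list[dict]) -> dict[str, dict[str, float]]:
    """Average each criterion per pathway.

    Each review is assumed (hypothetically) to look like:
    {"pathway": "spoken", "accuracy": 4, "readability": 3, "fidelity": 5}
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for review in reviews:
        for criterion in CRITERIA:
            grouped[review["pathway"]][criterion].append(review[criterion])
    return {
        pathway: {c: mean(scores[c]) for c in CRITERIA}
        for pathway, scores in grouped.items()
    }
```

Averages alone would not settle the question; reviewer comments on specific passages would still need to be read alongside them.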

4. Timeline & Milestones

This schedule assumes the final human-reviewed transcripts are already available. Work begins on September 11, 2025.

Activity	Est. Duration	Manpower
Prerequisites
1. Transcribing	-	-
Pilot Execution
2. Reviewing	1 Day	1
2.5 Khenpo Review	1 Day (TBD)	1
3. Spoken to Written	3 Days	1
4. Written to English	1 Day	1
5. Translation Review	2 Days (TBD)	1
Post-Pilot
Final Decision & Doc	1 Day	-

5. Scope & Features

What is included (In Scope):

Processing for the six specified audio files (134-139).
Conversion of spoken Tibetan transcripts to written Tibetan.
Generation of English translations from both spoken and written transcripts into SRT format.
A comparative review of the two resulting English translation sets.
A final report and recommendation for the future workflow.
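
As a rough illustration of the SRT output step, the sketch below turns timed translation segments into SRT text (a cue number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, the text, then a blank line). The `Segment` type and both function names are assumptions made for this example; the PRD does not describe how segments are stored.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start_ms: int  # segment start, in milliseconds
    end_ms: int    # segment end, in milliseconds
    text: str      # English translation for this segment


def format_timestamp(ms: int) -> str:
    """Render milliseconds as the SRT timestamp HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Build SRT text: numbered cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{format_timestamp(seg.start_ms)} --> {format_timestamp(seg.end_ms)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(cues)
```

Because both pathways would reuse the timings of the same reviewed transcript, the two SRT sets for a given audio file should differ only in their text lines, which makes side-by-side review easier.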

What is not included (Out of Scope):

Fine-tuning a custom Speech-to-Text (STT) model for Garchen Rinpoche’s voice (this was part of a previous project phase).
Transcription of the raw audio files (this is a prerequisite).
Translation of any content other than Garchen Rinpoche’s speech (e.g., audience questions, recitations).
Building a new API or user-facing application for translation.

6. Dependencies

Khenpo Review: The entire pilot is dependent on the final, authoritative Tibetan transcripts reviewed by the Khenpo.
David’s Review Team: Availability of the review team is critical for step 5 (Translation Review), where the translations are evaluated and the feedback needed to complete the pilot is provided.
LLM APIs: Access to and performance of third-party LLM APIs (e.g., Google Gemini).

7. Acceptance Criteria

This pilot project will be considered “done” when:

Two complete sets of English SRT files (one from the spoken transcript, one from the written) have been generated for all six pilot audio files.
The review team has completed their qualitative analysis of both sets of translations.
A final decision has been made and documented regarding the necessity of the ‘Spoken-to-Written’ conversion step for future projects.
A final workflow document is created and approved by the stakeholders, outlining the process for future translation work.
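
The first criterion can be checked mechanically. A minimal sketch, assuming a hypothetical layout of one directory per pathway (`spoken/`, `written/`) with one `<id>.srt` file per audio file; the actual file naming is not specified in this PRD.

```python
from pathlib import Path

FILE_IDS = range(134, 140)        # the six pilot audio files, 134-139
PATHWAYS = ("spoken", "written")  # hypothetical directory names


def missing_srt_files(root: Path) -> list[Path]:
    """Return every expected SRT path that does not exist under root."""
    expected = [
        root / pathway / f"{file_id}.srt"
        for pathway in PATHWAYS
        for file_id in FILE_IDS
    ]
    return [path for path in expected if not path.is_file()]
```

An empty result means all twelve files (two pathways times six audio files) are present; it says nothing about their quality, which remains the review team's call.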
