Meeting: Next Steps for Garchen Rinpoche Project

Lhakpa_Wangyal · September 4, 2025, 2:11pm

Attendees: @DavidYesheNyima - Champion, Zimmerman - Project Manager (Garchen Rinpoche’s Archive), @Ganga_Gyatso - developer, Mr. Jake Moore @Tenzin_Gayche @Lhakpa_Wangyal - coordinator

Date: September 04, 2025

Agenda:

Next Steps for Garchen Rinpoche Project:
1. Translation
2. Script Conversion
Developing an efficient AI-powered pipeline to transcribe and translate Garchen Rinpoche’s teachings.
Challenges in processing spoken Tibetan.

Key Discussion Points

Mr. David Newman began the meeting by introducing Garchen Rinpoche’s Archive, a digital platform designed to host Garchen Rinpoche’s audio and video teachings, featuring a fully searchable library and a flexible video player with multilingual transcripts and subtitles. To achieve this, he proposed an AI-powered pipeline that extracts the audio, generates transcripts with the Garchen Rinpoche speech-to-text model, cleans and formats the text, and translates it into multiple languages.
On ‘Translating Spoken Tibetan’, the team noted that current AI models struggle to translate spoken Tibetan directly because of its fillers, pauses, and informal, repetitive nature. They suggested that spoken Tibetan must first be converted into clean written Tibetan as a necessary step for accurate machine translation.
On maintaining accuracy in translation, the team discussed that transcripts need to be cleaned for translation while preserving Garchen Rinpoche’s lively and unique speaking style. This requires a balance between the linguistic formality the translation model needs and fidelity to the original spoken teaching for viewers.
On timestamp alignment, the team noted that after translation the text must be re-aligned with the original audio timestamps so that it can serve as an accurate transcript.
The team suggested building a high-quality benchmark dataset, refining the ASR model, and using human evaluation to select the best translation model. They recommended that long recordings be managed through chunking, and that timestamps be preserved with simple tags for accurate re-alignment.
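As an illustration of the timestamp-tag idea, here is a minimal Python sketch. It assumes a `[[n]]` tag format, a `Segment` structure, and the helper names `tag_segments` and `realign`, none of which were specified in the meeting. Each transcript segment is prefixed with an index tag before translation, and the tags are used afterwards to reattach the original timestamps. It also assumes the translation model copies the tags through unchanged, which would need to be tested.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the beginning of the recording
    end: float
    text: str


# Hypothetical tag format: [[0]], [[1]], ... (an assumption, not a team decision).
TAG = re.compile(r"\[\[(\d+)\]\]")


def tag_segments(segments):
    """Join segments into one string, prefixing each with its index tag."""
    return " ".join(f"[[{i}]] {seg.text}" for i, seg in enumerate(segments))


def realign(translated, segments):
    """Split translated text on the tags and reattach the source timestamps."""
    # re.split with a capturing group yields [prefix, idx, text, idx, text, ...]
    parts = TAG.split(translated)
    aligned = []
    for idx_str, text in zip(parts[1::2], parts[2::2]):
        idx = int(idx_str)
        if 0 <= idx < len(segments):  # ignore tags the model may have invented
            src = segments[idx]
            aligned.append(Segment(src.start, src.end, text.strip()))
    return aligned
```

In use, the output of `tag_segments` would be sent to the translation model and the model's reply passed to `realign`. Any tag that goes missing in translation shows up as a missing segment, which gives a simple check when testing this approach.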

Action Items

Create a 1-to-2-hour benchmark dataset with human-verified transcripts and translations.
Use the benchmark to evaluate and compare translation models.
Experiment with different chunking methods for long recordings.
Test timestamp tagging for accurate re-alignment after translation.
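
One chunking method that could be tried for the experiment above is to group timestamped segments until a duration limit is reached, cutting only at segment boundaries. The sketch below is an assumption for illustration: the function name `chunk_by_duration`, the `(start, end, text)` tuple layout, and the 30-second default are hypothetical and were not discussed in the meeting.

```python
def chunk_by_duration(segments, max_seconds=30.0):
    """Group (start, end, text) segments into chunks of at most max_seconds.

    A chunk is closed when adding the next segment would make it span more
    than max_seconds. A single segment longer than the limit is kept whole.
    """
    chunks, current = [], []
    for start, end, text in segments:
        # current[0][0] is the start time of the chunk being built
        if current and end - current[0][0] > max_seconds:
            chunks.append(current)
            current = []
        current.append((start, end, text))
    if current:
        chunks.append(current)
    return chunks
```

Other methods worth comparing include cutting at silences or at sentence boundaries. Cutting at segment boundaries keeps every chunk compatible with per-segment timestamp tags.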

Decisions Made

It was a general discussion, and no final decisions were made.
