Custom Speech-to-Text Model for Garchen Rinpoche’s Teachings

:compass: Purpose and Demographic

This Speech-to-Text (STT) project focuses on developing a custom, high-accuracy transcription model tailored specifically for the voice of Garchen Rinpoche—a revered Tibetan Buddhist master. His extensive archive of oral teachings represents a rich cultural and spiritual heritage, yet remains largely inaccessible in text form due to limitations in manual transcription capacity and the challenges general-purpose STT models face with his speech.

By applying specialized AI modeling, this project aims to preserve, digitize, and make searchable Garchen Rinpoche’s spoken wisdom, improving access for scholars, practitioners, and future generations. Through a combination of fine-tuned machine learning and human review, the project reduces the time and cost of transcription while safeguarding the accuracy and integrity of these sacred teachings.

✦ Mission Statement

To build a high-quality, speaker-specific STT model that can accurately transcribe Garchen Rinpoche’s teachings, making them accessible, searchable, and preserved for generations to come.

✦ Target Demographic

  • Garchen Buddhist Institutes and Dharma Centers
  • Students and practitioners of Garchen Rinpoche
  • Digital archivists working to preserve Tibetan oral teachings
  • Scholars and translators working on Garchen Rinpoche’s lineage
  • Accessibility advocates supporting the deaf and hard-of-hearing community

✦ Problem Statement

Much of Garchen Rinpoche’s spiritual legacy exists only in audio or video formats. These teachings are difficult to access, search, or translate without high-quality transcripts. Manual transcription is time-consuming and expensive, while general-purpose STT models fail to capture the nuances of Garchen Rinpoche’s speech patterns, intonation, and specialized terminology. A dedicated STT model is urgently needed to bridge this gap.


:bullseye: Product Objectives

✦ Core Objectives

  • Develop a Garchen-specific STT model with Character Error Rate (CER) below 5%
  • Build a repeatable, scalable workflow for transcribing Rinpoche’s past and future recordings
  • Reduce manual transcription effort by 50% or more
  • Enable near real-time subtitle generation for live teachings or events
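
The CER target above can be made concrete. Below is a minimal sketch of how CER is typically computed: character-level Levenshtein edit distance divided by reference length. The function and variable names are illustrative only, not part of the project codebase:

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance / reference length."""
    # Standard dynamic-programming Levenshtein distance, one row at a time.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[len(hypothesis)] / max(1, len(reference))
```

A CER below 0.05 (5%) means fewer than one character in twenty differs from the reviewed reference transcript.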

✦ Non-Goals

  • This product won’t replace human transcribers entirely; it will augment their work

  • The model won’t focus on general-purpose speech recognition

  • We won’t build translation capabilities into this version of the product

  • We won’t implement speaker diarization (identifying who is speaking) in the initial release

✦ Impact Areas

  • Preservation of Garchen Rinpoche’s teachings in digital form
  • Easier access for students, scholars, and archivists
  • Improved inclusion for deaf/hard-of-hearing individuals
  • Support for creating searchable audio/video archives
  • Contribution to Tibetan linguistic research and cultural continuity

:light_bulb: Example Use Cases

✦ Use Case: Garchen Institute Archivist

Digitize and transcribe hundreds of hours of legacy teachings from Garchen Rinpoche’s personal archive with minimal human correction required.

✦ Use Case: Translator

Extract clean transcripts from teaching sessions to create translated versions for international audiences.

✦ Use Case: Online Retreat Staff

Use the model to generate subtitles and transcripts of Garchen Rinpoche’s live online teachings in near real-time, supporting global accessibility.

:building_construction: Architectural Considerations

✦ Tech Stack

  • Programming Languages: Python

  • ML Frameworks: Hugging Face Transformers, PyTorch

  • Audio Processing: pyannote.audio or Silero

  • Base Models: Wav2Vec2 (300M parameters), Whisper (280M parameters)

  • Data Management: AWS S3, CSV files, DBeaver

  • Web Interface: Basic web application for model inference
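
As a rough sketch, the stack above could be pinned in a requirements file like the following (the version bounds are illustrative assumptions, not the project’s actual pins):

```
torch>=2.1
transformers>=4.38
pyannote.audio>=3.1
boto3>=1.34
```

Pinning minimum versions keeps training runs reproducible across environments.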

✦ System Diagram

The system follows a five-phase workflow:

  1. Cataloging audio/video sources

  2. Filtering and splitting audio

  3. Transcription and review

  4. Data cleaning and organization

  5. Model training and evaluation
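
Phase 2 (filtering and splitting) cuts long recordings into model-sized windows. The sketch below assumes simple fixed-length 30-second windows, a common input length for Whisper-family models; in practice, segmentation would follow voice-activity boundaries from a tool such as pyannote.audio rather than hard cuts:

```python
def split_segments(total_sec: float, max_len: float = 30.0):
    """Return (start, end) windows covering a recording, each at most max_len seconds."""
    segments, start = [], 0.0
    while start < total_sec:
        end = min(start + max_len, total_sec)
        segments.append((start, end))
        start = end
    return segments
```

For a 75-second clip this yields two full 30 s windows plus a 15 s tail.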

✦ Security & Privacy

  • All audio data stored in secure AWS S3 buckets with appropriate access controls

  • Transcriptions reviewed and approved before use in training

  • Personal or sensitive information will be flagged and, where necessary, redacted during the review process

  • User permissions system to control access to different parts of the platform

✦ Dependencies

  • AWS S3 for audio storage

  • Hugging Face Hub for model and dataset hosting

  • GPU infrastructure for model training

  • Pecha Tools for transcription review and corrections, with transcription data managed in a database accessed through DBeaver

  • fast-antx library for aligning transcriptions
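
The alignment step can be illustrated with a generic stand-in. The sketch below uses Python’s stdlib difflib rather than fast-antx’s actual API (not shown here) to convey the idea: aligning a raw ASR transcript against a human-corrected one to locate agreements and corrections:

```python
import difflib

def align(auto: str, corrected: str):
    """Yield (op, auto_span, corrected_span) triples showing where the ASR
    output and the reviewed transcript agree ('equal') or differ."""
    sm = difflib.SequenceMatcher(a=auto, b=corrected, autojunk=False)
    for op, a1, a2, b1, b2 in sm.get_opcodes():
        yield op, auto[a1:a2], corrected[b1:b2]
```

Aligned spans make it easy to count corrections per segment and to feed only reviewed text into training.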

✦ Scalability & Maintenance

  • Modular design allows adding new speakers without rebuilding entire system

  • Training pipeline designed to accommodate incremental data additions

  • Models versioned and stored on Hugging Face for reproducibility

  • Regular evaluation against benchmark test sets to track performance over time


:busts_in_silhouette: Participants

✦ Working Group Members


✦ Stakeholders

David Yeshe Nyima (Garchen STT)

✦ Point of Contact

Ganga Gyatso


:vertical_traffic_light: Project Status

✦ Current Phase

  • Preparing for the first training run of Garchen Rinpoche’s custom STT model
  • Targeting 5 hours of clean, annotated training data to initiate training
  • Dataset curation, segmentation, and transcription alignment are actively ongoing
  • Benchmark subset design in progress to ensure well-distributed evaluation samples
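
Benchmark subset design of this kind is often done with stratified sampling over metadata, so that every category of teaching is represented in evaluation. A minimal sketch, assuming each recording carries a metadata field such as topic (the field name and group sizes are hypothetical):

```python
import random
from collections import defaultdict

def stratified_sample(items, key, per_group=2, seed=0):
    """Reproducibly pick up to per_group items from each metadata group."""
    rng = random.Random(seed)          # fixed seed keeps the benchmark stable
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    sample = []
    for group in sorted(groups):       # sorted for deterministic group order
        members = list(groups[group])
        rng.shuffle(members)
        sample.extend(members[:per_group])
    return sample
```

Holding the seed fixed means the same benchmark subset is regenerated on every run, which keeps evaluation results comparable across model versions.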

✦ Milestones

  • :white_check_mark: Workflow and tooling setup completed
  • :white_check_mark: Public audio archive identified and segmented
  • :counterclockwise_arrows_button: Training data collection ongoing (goal: 5 hours within 4 weeks)
  • :hourglass_not_done: First model training run will begin after data goal is met
  • :counterclockwise_arrows_button: Benchmark test set preparation using diverse metadata samples

✦ Roadmap

  • Week 1–4: Collect and annotate at least 5 hours of training data
  • Week 5: Launch the first fine-tuning run for the Garchen STT model
  • Week 6–7: Evaluate the initial model on benchmark test sets
  • Q3 2025: Refine the model and aim for <5% CER
  • Q4 2025: Release v1 public demo and continue expanding the dataset
  • Q1 2026: Explore a real-time transcription pipeline for live events

:spiral_calendar: Meeting Times


✦ Regular Schedule

E.g., Every Thursday at 5PM IST via Zoom

✦ Meeting Notes

Link to running minutes, past discussions, or decisions.


:hammer_and_wrench: What We’re Working On

We maintain a public task board with all active issues and discussions.

:right_arrow: View GitHub Project Board