Purpose and Demographic
This Speech-to-Text (STT) project focuses on developing a custom, high-accuracy transcription model tailored specifically for the voice of Garchen Rinpoche—a revered Tibetan Buddhist master. His extensive archive of oral teachings represents a rich cultural and spiritual heritage, yet remains largely inaccessible in text form due to limitations in manual transcription capacity and the challenges general-purpose STT models face with his speech.
By applying specialized AI modeling, this project aims to preserve, digitize, and make searchable Garchen Rinpoche’s spoken wisdom, improving access for scholars, practitioners, and future generations. Through a combination of fine-tuned machine learning and human review, the project reduces the time and cost of transcription while safeguarding the accuracy and integrity of these sacred teachings.
✦ Mission Statement
To build a high-quality, speaker-specific STT model that can accurately transcribe Garchen Rinpoche’s teachings, making them accessible, searchable, and preserved for generations to come.
✦ Target Demographic
- Garchen Buddhist Institutes and Dharma Centers
- Students and practitioners of Garchen Rinpoche
- Digital archivists working to preserve Tibetan oral teachings
- Scholars and translators working on Garchen Rinpoche’s lineage
- Accessibility advocates supporting the deaf and hard-of-hearing community
✦ Problem Statement
Much of Garchen Rinpoche’s spiritual legacy exists only in audio or video formats. These teachings are difficult to access, search, or translate without high-quality transcripts. Manual transcription is time-consuming and expensive, while general-purpose STT models fail to capture the nuances of Garchen Rinpoche’s speech patterns, intonation, and specialized terminology. A dedicated STT model is urgently needed to bridge this gap.
Product Objectives
✦ Core Objectives
- Develop a Garchen-specific STT model with Character Error Rate (CER) below 5% (see the evaluation sketch after this list)
- Build a repeatable, scalable workflow for transcribing Rinpoche’s past and future recordings
- Reduce manual transcription effort by 50% or more
- Enable near real-time subtitle generation for live teachings or events
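To make the <5% CER target concrete, below is a minimal sketch of how a transcript pair could be scored. It assumes the `jiwer` package and uses placeholder Tibetan strings; the project's actual evaluation harness may differ.

```python
# Minimal CER check, assuming the `jiwer` package is used for scoring
# (an assumption; the project may use another metric implementation).
from jiwer import cer

reference = "བཀྲ་ཤིས་བདེ་ལེགས།"   # human-approved transcript (placeholder)
hypothesis = "བཀྲ་ཤིས་བདེ་ལེག"    # model output for the same segment (placeholder)

error_rate = cer(reference, hypothesis)  # character edits / reference length
print(f"CER: {error_rate:.2%}")          # target: below 5% on the benchmark set
```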
✦ Non-Goals
- This product won't replace human transcribers entirely, but will augment their work
- The model won't focus on general-purpose speech recognition
- We won't build translation capabilities into this version of the product
- We won't implement speaker diarization (identifying who is speaking) in the initial release
✦ Impact Areas
- Preservation of Garchen Rinpoche’s teachings in digital form
- Easier access for students, scholars, and archivists
- Improved inclusion for deaf/hard-of-hearing individuals
- Support for creating searchable audio/video archives
- Contribution to Tibetan linguistic research and cultural continuity
Example Use Cases
✦ Use Case: Garchen Institute Archivist
Digitize and transcribe hundreds of hours of legacy teachings from Garchen Rinpoche’s personal archive with minimal human correction required.
✦ Use Case: Translator
Extract clean transcripts from teaching sessions to create translated versions for international audiences.
✦ Use Case: Online Retreat Staff
Use the model to generate subtitles and transcripts of Garchen Rinpoche’s live online teachings in near real-time, supporting global accessibility.
Architectural Considerations
✦ Tech Stack
- Programming Languages: Python
- ML Frameworks: Hugging Face Transformers, PyTorch
- Audio Processing: pyannote.audio or Silero
- Base Models: Wav2Vec2 (300M parameters), Whisper (280M parameters); see the inference sketch after this list
- Data Management: AWS S3, CSV files, DBeaver
- Web Interface: Basic web application for model inference
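As a concrete illustration of the stack, here is a hedged sketch of running inference with a base Whisper checkpoint through Hugging Face Transformers. The checkpoint id and audio filename are assumptions; once fine-tuned, the Garchen-specific model would be substituted.

```python
# Hedged inference sketch with Hugging Face Transformers + PyTorch.
# "openai/whisper-small" and "teaching_segment.wav" are assumptions, not
# project-confirmed names; swap in the fine-tuned Garchen checkpoint later.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-small",   # assumed base checkpoint
    chunk_length_s=30,              # chunk long teachings into 30-second windows
)

result = asr("teaching_segment.wav")  # path to a mono audio segment
print(result["text"])
```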
✦ System Diagram
The system follows a five-phase workflow:
1. Cataloging audio/video sources
2. Filtering and splitting audio (see the VAD sketch after this list)
3. Transcription and review
4. Data cleaning and organization
5. Model training and evaluation
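For phase 2 (filtering and splitting audio), the sketch below uses Silero VAD loaded via torch.hub. This assumes Silero is the voice-activity tool named in the tech stack; pyannote.audio could fill the same role, and the file path is a placeholder.

```python
# Hedged sketch of phase 2: detect speech regions in a raw recording so it
# can be cut into clean training clips. File path is a placeholder.
import torch

model, utils = torch.hub.load("snakers4/silero-vad", "silero_vad", trust_repo=True)
get_speech_timestamps, _, read_audio, _, _ = utils

wav = read_audio("raw_teaching.wav", sampling_rate=16000)   # placeholder path
segments = get_speech_timestamps(wav, model, sampling_rate=16000)

# Each entry gives sample offsets of one speech region to extract as a clip.
for seg in segments:
    print(seg["start"], seg["end"])
```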
✦ Security & Privacy
- All audio data stored in secure AWS S3 buckets with appropriate access controls; see the upload sketch after this list
- Transcriptions reviewed and approved before use in training
- Personal or sensitive information flagged and, where necessary, redacted during the review process
- User permissions system to control access to different parts of the platform
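As one hedged illustration of the storage controls above, the sketch below uploads a clip to S3 with server-side encryption via boto3. The bucket and key names are hypothetical, and the actual access control is enforced by IAM policies and bucket settings rather than by this code.

```python
# Hedged sketch of an encrypted upload to the project's S3 bucket.
# Bucket and key names are hypothetical placeholders.
import boto3

s3 = boto3.client("s3")
s3.upload_file(
    Filename="segments/teaching_0001.wav",
    Bucket="garchen-stt-audio",                    # hypothetical bucket name
    Key="raw/teaching_0001.wav",
    ExtraArgs={"ServerSideEncryption": "AES256"},  # encrypt at rest
)
```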
✦ Dependencies
- AWS S3 for audio storage
- Hugging Face Hub for model and dataset hosting; see the dataset sketch after this list
- GPU infrastructure for model training
- Pecha tools for transcription review and corrections, with transcription data managed in a database accessed via DBeaver
- fast-antx library for aligning transcriptions
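For the Hugging Face Hub dependency, a hedged sketch of pulling a hosted audio dataset is shown below. The repository id and column names are hypothetical placeholders for the project's actual dataset.

```python
# Hedged sketch of loading hosted training data from the Hugging Face Hub.
# The repo id "openpecha/garchen-stt-data" and column names are hypothetical.
from datasets import Audio, load_dataset

ds = load_dataset("openpecha/garchen-stt-data", split="train")  # hypothetical repo id
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))       # decode/resample to 16 kHz

sample = ds[0]
print(sample["audio"]["array"].shape, sample["transcript"])     # hypothetical column name
```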
✦ Scalability & Maintenance
- Modular design allows adding new speakers without rebuilding the entire system
- Training pipeline designed to accommodate incremental data additions
- Models versioned and stored on the Hugging Face Hub for reproducibility; see the versioning sketch after this list
- Regular evaluation against benchmark test sets to track performance over time
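To illustrate the versioning point above, here is a hedged sketch of pushing a fine-tuned checkpoint to the Hugging Face Hub. The local checkpoint path and repository id are hypothetical.

```python
# Hedged sketch of versioning a fine-tuned checkpoint on the Hugging Face Hub.
# Each push is recorded as a commit, giving a reproducible revision to
# evaluate against the benchmark sets. Paths and repo id are hypothetical.
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model = WhisperForConditionalGeneration.from_pretrained("./garchen-stt-checkpoint")
processor = WhisperProcessor.from_pretrained("./garchen-stt-checkpoint")

model.push_to_hub("openpecha/garchen-stt", commit_message="v0.1: first fine-tuning run")
processor.push_to_hub("openpecha/garchen-stt", commit_message="v0.1: first fine-tuning run")
```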
Participants
✦ Working Group Members
✦ Stakeholders
David Yeshe Nyima (Garchen STT)
✦ Point of Contact
Project Status
✦ Current Phase
- Preparing for the first training run of Garchen Rinpoche’s custom STT model
- Targeting 5 hours of clean, annotated training data to initiate training
- Dataset curation, segmentation, and transcription alignment are actively ongoing
- Benchmark subset design in progress to ensure well-distributed evaluation samples
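One hedged approach to the benchmark subset design is stratified sampling over the catalog CSV, so evaluation clips span the archive's variety rather than a single teaching series. The file name and the metadata columns used below are assumptions.

```python
# Hedged sketch of drawing a well-distributed benchmark subset from the
# catalog CSV. File name and the "topic"/"recording_year" columns are assumptions.
import pandas as pd

catalog = pd.read_csv("audio_catalog.csv")   # hypothetical catalog file

# Sample a few segments from every (topic, year) group so the benchmark
# covers the archive's variety.
benchmark = (
    catalog.groupby(["topic", "recording_year"], group_keys=False)
           .apply(lambda g: g.sample(n=min(len(g), 3), random_state=42))
)
benchmark.to_csv("benchmark_subset.csv", index=False)
```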
✦ Milestones
- Workflow and tooling setup completed
- Public audio archive identified and segmented
- Training data collection ongoing (goal: 5 hours within 4 weeks)
- First model training run will begin after the data goal is met
- Benchmark test set preparation using diverse metadata samples
✦ Roadmap
| Timeline | Milestone |
|---|---|
| Week 1–4 | Collect and annotate at least 5 hours of training data |
| Week 5 | Launch first fine-tuning run for Garchen STT model |
| Week 6–7 | Evaluate initial model on benchmark test sets |
| Q3 2025 | Refine model and aim for <5% CER |
| Q4 2025 | Release v1 public demo + continue expanding dataset |
| Q1 2026 | Explore real-time transcription pipeline for live events |
Meeting Times
✦ Regular Schedule
E.g., Every Thursday at 5PM IST via Zoom
✦ Meeting Notes
Link to running minutes, past discussions, or decisions.
What We’re Working On
We maintain a public task board with all active issues and discussions.
