Purpose and Demographic
This project creates a custom AI model to accurately transcribe the teachings of Garchen Rinpoche, a Tibetan Buddhist master. It aims to preserve and digitise his extensive audio archive, making his wisdom searchable and accessible to a wider audience through efficient and accurate transcription.
✦ Mission Statement
Build an STT model to accurately transcribe and preserve Garchen Rinpoche’s teachings for accessibility and searchability.
✦ Target Demographic
- Garchen Buddhist Institutes and Dharma Centres
- Students and practitioners of Garchen Rinpoche
- Digital archivists working to preserve Tibetan oral teachings
- Scholars and translators working on Garchen Rinpoche’s lineage
- Accessibility advocates supporting the deaf and hard-of-hearing community
✦ Problem Statement
Garchen Rinpoche’s teachings remain largely inaccessible because they exist only as audio and video, with no text versions. Standard speech-to-text (STT) systems fail to transcribe his distinctive speech accurately, creating an urgent need for a specialised STT solution.
Product Objectives
✦ Core Objectives
- Develop a Garchen-specific STT model with a Character Error Rate (CER) below 5% (a minimal CER check is sketched after this list)
- Build a repeatable, scalable workflow for transcribing Rinpoche’s past and future recordings
- Reduce manual transcription effort by 50% or more
- Enable near real-time subtitle generation for live teachings or events
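The CER target above is straightforward to monitor during review. Below is a minimal sketch of such a check; it assumes the jiwer package (not specified in this document) and uses placeholder strings rather than real Tibetan transcripts.

```python
# Minimal CER check: compare a model transcript against a reviewed reference.
# Assumes `pip install jiwer`; any character-level edit-distance would do.
import jiwer

reference = "om mani padme hum"    # reviewed ground-truth transcript (placeholder)
hypothesis = "om mani padme hung"  # model output for the same segment (placeholder)

cer = jiwer.cer(reference, hypothesis)  # (substitutions + insertions + deletions) / reference length
print(f"CER: {cer:.2%}")  # project target: below 5%
```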
✦ Non-Goals
- This product won’t replace human transcribers entirely, but will augment their work
- The model won’t focus on general-purpose speech recognition
- We won’t build translation capabilities into this version of the product
- We won’t implement speaker diarization (identifying who is speaking) in the initial release
✦ Impact Areas
- Preservation of Garchen Rinpoche’s teachings in digital form
- Easier access for students, scholars, and archivists
- Improved inclusion for deaf/hard-of-hearing individuals
- Support for creating searchable audio/video archives
- Contribution to Tibetan linguistic research and cultural continuity
Example Use Cases
✦ Use Case: Garchen Institute Archivist
Digitise and transcribe hundreds of hours of legacy teachings from Garchen Rinpoche’s personal archive with minimal human correction required.
✦ Use Case: Translator
Extract clean transcripts from teaching sessions to create translated versions for international audiences.
✦ Use Case: Online Retreat Staff
Use the model to generate subtitles and transcripts of Garchen Rinpoche’s live online teachings in near real-time, supporting global accessibility.
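As a rough illustration of how such subtitles could be generated, here is a minimal sketch using the Hugging Face Transformers ASR pipeline with chunked decoding. The model id is a hypothetical placeholder for the fine-tuned Garchen STT checkpoint, and a genuinely live setup would stream short audio chunks rather than read a finished file.

```python
# Sketch: produce timestamped subtitle lines from a recorded teaching.
# "openpecha/garchen-stt" is a hypothetical model id standing in for the
# fine-tuned checkpoint; a live pipeline would feed audio chunks as they arrive.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openpecha/garchen-stt",  # placeholder model id
    chunk_length_s=30,              # decode long audio in 30-second windows
    return_timestamps=True,
)

result = asr("teaching_session.wav")  # placeholder file name
for chunk in result["chunks"]:
    start, end = chunk["timestamp"]
    print(f"[{start} -> {end}] {chunk['text'].strip()}")
```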
Architectural Considerations
✦ Tech Stack
- Programming Languages: Python
- ML Frameworks: Hugging Face Transformers, PyTorch
- Audio Processing: pyannote.audio or Silero
- Base Models: Wav2Vec2 (300M parameters), Whisper (280M parameters); a loading sketch follows this list
- Data Management: AWS S3, CSV files, DBeaver
- Web Interface: Basic web application for model inference
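As a sketch of how the base models above might be loaded with the listed frameworks, the snippet below pulls one candidate checkpoint of each kind from the Hugging Face Hub. The exact checkpoint names are assumptions; the document only gives approximate parameter counts, and a Tibetan vocabulary/tokenizer would still need to be prepared before CTC fine-tuning.

```python
# Sketch: load one candidate checkpoint of each architecture for fine-tuning.
# Checkpoint names are assumptions chosen to roughly match the sizes listed above.
from transformers import Wav2Vec2ForCTC, WhisperForConditionalGeneration, WhisperProcessor

# CTC-style candidate (XLS-R, ~300M parameters); a Tibetan character vocabulary
# must be supplied before its CTC head is useful.
wav2vec2 = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-xls-r-300m")

# Encoder-decoder candidate (Whisper small)
whisper_processor = WhisperProcessor.from_pretrained("openai/whisper-small")
whisper = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
```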
✦ System Diagram
The system follows a five-phase workflow:
- Cataloguing audio/video sources
- Filtering and splitting audio (see the VAD sketch after this list)
- Transcription and review
- Data cleaning and organisation
- Model training and evaluation
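For the filtering and splitting phase, a voice-activity detector can locate speech regions before segments are cut. Below is a minimal sketch using Silero VAD via torch.hub, following that project's documented entry point; the file name is a placeholder, and pyannote.audio could be substituted.

```python
# Sketch: find speech regions in a raw recording with Silero VAD, so the
# audio can be split into clean segments for transcription.
import torch

model, utils = torch.hub.load("snakers4/silero-vad", "silero_vad")
get_speech_timestamps, _, read_audio, _, _ = utils

wav = read_audio("raw_teaching.wav", sampling_rate=16000)  # placeholder file
speech = get_speech_timestamps(wav, model, sampling_rate=16000)
for region in speech:
    print(region["start"], region["end"])  # sample offsets of speech to keep
```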
✦ Security & Privacy
- All audio data stored in secure AWS S3 buckets with appropriate access controls (see the upload sketch after this list)
- Transcriptions reviewed and approved before use in training
- Personal or sensitive information flagged and potentially redacted during the review process
- User permissions system to control access to different parts of the platform
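As an illustration of the S3 storage point above, the sketch below uploads one segment with server-side encryption enabled. Bucket, key, and file names are placeholders; the actual bucket policy and IAM roles would carry the real access controls.

```python
# Sketch: upload an audio segment to the project S3 bucket with
# server-side encryption. All names below are placeholders.
import boto3

s3 = boto3.client("s3")
s3.upload_file(
    Filename="segments/teaching_001.wav",
    Bucket="garchen-stt-audio",                    # placeholder bucket
    Key="raw/teaching_001.wav",
    ExtraArgs={"ServerSideEncryption": "AES256"},  # encrypt at rest
)
```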
✦ Dependencies
- AWS S3 for audio storage
- Hugging Face Hub for model and dataset hosting (an upload sketch follows this list)
- GPU infrastructure for model training
- Pecha tools for transcription review and correction, with transcription data managed in a database accessed via DBeaver
- fast-antx library for aligning transcriptions
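For the Hugging Face Hub dependency, a curated dataset folder can be pushed with the huggingface_hub client, as in the minimal sketch below. The repo id and folder path are placeholders, and the sketch assumes the user has already authenticated with `huggingface-cli login`.

```python
# Sketch: push a curated dataset folder to the Hugging Face Hub.
# Repo id and folder path are placeholders; assumes prior authentication.
from huggingface_hub import HfApi

api = HfApi()
api.upload_folder(
    folder_path="data/garchen_stt_v1",
    repo_id="openpecha/garchen-stt-data",  # placeholder dataset repo
    repo_type="dataset",
)
```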
✦ Scalability & Maintenance
- Modular design allows adding new speakers without rebuilding entire system
- Training pipeline designed to accommodate incremental data additions
- Models versioned and stored on Hugging Face for reproducibility (see the tagging sketch after this list)
- Regular evaluation against benchmark test sets to track performance over time
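To make each training run reproducible, the current revision of the model repo can be tagged on the Hub, as in the short sketch below; the repo id and tag name are placeholders.

```python
# Sketch: tag the current revision of the model repo so the exact checkpoint
# used in an evaluation can always be recovered. Names are placeholders.
from huggingface_hub import HfApi

api = HfApi()
api.create_tag("openpecha/garchen-stt", tag="run-001", repo_type="model")
```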
Participants
✦ Working Group Members
✦ Stakeholders
David Yeshe Nyima (Garchen STT)
✦ Point of Contact
Ganga Gyatso
Lhakpa Wangyal
Project Status
✦ Current Phase
- Preparing for the first training run of Garchen Rinpoche’s custom STT model
- Targeting 5 hours of clean, annotated training data to initiate training
- Dataset curation, segmentation, and transcription alignment are actively ongoing
- Benchmark subset design in progress to ensure well-distributed evaluation samples
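The benchmark subset mentioned above can be drawn by stratifying the catalogue on its metadata columns so evaluation samples stay well distributed. The sketch below assumes a catalogue CSV with hypothetical column names ("topic", "recording_year"); the real columns would come from the cataloguing phase.

```python
# Sketch: draw a metadata-stratified benchmark subset from the segment catalogue.
# File and column names are hypothetical placeholders.
import pandas as pd

catalogue = pd.read_csv("segments_catalogue.csv")

benchmark = (
    catalogue.groupby(["topic", "recording_year"], group_keys=False)
    .apply(lambda g: g.sample(n=min(len(g), 3), random_state=42))  # up to 3 segments per stratum
)
benchmark.to_csv("benchmark_subset.csv", index=False)
```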
✦ Milestones
- Workflow and tooling setup completed
- Public audio archive identified and segmented
- Training data collection ongoing (goal: 5 hours within 4 weeks)
- First model training run will begin after the data goal is met
- Benchmark test set preparation using diverse metadata samples
✦ Roadmap
| Timeline | Milestone |
|---|---|
| Week 1–4 | Collect and annotate at least 5 hours of training data |
| Week 5 | Launch first fine-tuning run for Garchen STT model |
| Week 6–7 | Evaluate initial model on benchmark test sets |
| Q3 2025 | Refine model and aim for <5% CER |
| Q4 2025 | Release v1 public demo + continue expanding dataset |
| Q1 2026 | Explore real-time transcription pipeline for live events |
Meeting Times
When does the group meet?
✦ Regular Schedule
✦ Meeting Notes
Link to running minutes, past discussions, or decisions.
What We’re Working On
We maintain a public task board with all active issues and discussions.