🕉️ Garchen Rinpoche Speech SIG Proposal

AI-powered transcription of Garchen Rinpoche’s speech.

SUMMARY
Version: 1.1
Purpose: Build a custom Speech-to-Text model and its training dataset to accurately transcribe Garchen Rinpoche’s oral teachings, making them accessible and searchable.
Champion(s) @DavidYesheNyima @Gade
Communication - Discord Channel - Github Team - Google Calendar
Documentation - PRD (Requirements) - Github Project Board - Visuals - Meeting Minutes

Proposal Details

1. Problem Statement / Motivation

Train a custom Speech-to-Text (STT) model that can automatically turn Rinpoche’s spoken teachings into subtitle-style transcripts (WebVTT). This will help preserve, search, and translate his teachings, making them accessible and searchable in multiple languages.
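To illustrate the target output format, the sketch below serialises timestamped STT segments into a WebVTT document. The segment data and function names are hypothetical, used only to show the subtitle-style structure the proposal aims for:

```python
# Illustrative sketch: serialise (start, end, text) segments from an STT
# model into WebVTT. The segment contents here are placeholders, not a
# real transcript.

def to_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{int(hours):02d}:{int(minutes):02d}:{secs:06.3f}"

def segments_to_vtt(segments) -> str:
    """Build a WebVTT document from (start, end, text) tuples."""
    lines = ["WEBVTT", ""]
    for start, end, text in segments:
        lines.append(f"{to_timestamp(start)} --> {to_timestamp(end)}")
        lines.append(text)
        lines.append("")  # blank line ends each cue
    return "\n".join(lines)

segments = [(0.0, 4.2, "First teaching segment"),
            (4.2, 9.5, "Second segment")]
print(segments_to_vtt(segments))
```

A file produced this way can be loaded directly by most video players and web `<track>` elements, which is what makes the subtitle-style format convenient for review and correction.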

Creating accurate transcripts is difficult for the following reasons:

Rinpoche’s unique voice: His age, speech style, and Tibetan dialect make his speech hard to understand for machines, and even for professional, experienced transcribers.

Tibetan language issues: Little training data is available for Tibetan speech, especially for religious content.

Audio problems: Some recordings are noisy or unclear.

Special vocabulary: Buddhist terms used by Rinpoche are complex and unfamiliar to general-purpose speech recognition software.

2. Previous work

  1. Audio Analysis: organised and analysed Rinpoche’s audio files to understand their scope, quality, and content.
  2. Model Selection: After testing three options, a general Tibetan model was chosen as our base because it surprisingly outperformed specialised religious speech models.
  3. Data Collection: Created a custom training dataset by transcribing over five hours of Rinpoche’s teachings to capture his unique speech.
  4. Custom Training: Fine-tuned the selected model with this custom data, which significantly reduced its error rate and improved transcription accuracy.
  5. Proven Efficiency: A final test showed that the fine-tuned model saved human transcribers 38% of their time, while the unadapted base model sometimes made the process slower.
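The error-rate figures referenced above and in the scope below can be made concrete with a small sketch. This is a generic edit-distance error rate over tokens, not the project's actual evaluation code; for Tibetan, tokenising on syllable boundaries may be more appropriate than the whitespace split used here:

```python
# Illustrative sketch: compute an edit-distance-based error rate between
# a reference transcript and a model hypothesis.
# (Whitespace tokenisation is used only to keep the example simple;
# Tibetan evaluation would more likely tokenise on syllable marks.)

def edit_distance(ref_tokens, hyp_tokens):
    """Levenshtein distance between two token sequences."""
    m, n = len(ref_tokens), len(hyp_tokens)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref_tokens[i - 1] == hyp_tokens[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[m][n]

def error_rate(reference: str, hypothesis: str) -> float:
    """Token error rate: edits needed, divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)

# One substituted token out of four reference tokens -> 0.25
print(error_rate("the quick brown fox", "the quick brow fox"))
```

Under this metric, the proposal's 15% target means that, on average, fewer than 15 of every 100 reference tokens need an insertion, deletion, or substitution to match the model output.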

3. Scope

  • In Scope:
    • Ideal Goal: 5% error rate in the text.
    • Realistic Goal: Around 15% error rate, which is still very useful
    • Main Focus: Make sure the transcript is good enough for translation and searching
    • Detecting only Rinpoche’s speech
    • Training and improving the speech model
    • Creating subtitle files with time stamps
    • Reducing background noise and fillers
    • A custom STT model for Garchen Rinpoche’s speech
  • Out of Scope:
    • Translating the teachings
    • Transcribing other speakers

4. Potential Deliverables

  • A custom model with an error rate below 15% on Rinpoche’s clean speech
  • Subtitle-style transcripts that are easy to read and need very little correction
  • A semi-automated system that helps with future transcriptions
  • A public tool where users can upload audio and get automatic subtitles

5. Team

Members:

Annotators:

  • Kunchok Gawa @kunchok73

  • Karma Tsepak @Karmatsep

  • Jampa Lobsang @jamluv227

  • Kalsang Thardoe @K-Thardoe

SIG Meeting Calendar

Related SIGs and Groups

Useful Links