The Benefits of Custom Tokenization for Machine Translation
|
|
0
|
23
|
February 20, 2025
|
A Pipeline for Tibetan Language Text Clustering
|
|
0
|
34
|
February 18, 2025
|
Aggregating Publicly Available Tibetan-English Parallel Corpora
|
|
0
|
39
|
February 8, 2025
|
Think and Translate: Enhancing Machine Translation with Thinking LLMs
|
|
0
|
63
|
January 23, 2025
|
Customizing Speech-to-Text: Fine-Tuning a Model for Kyabje Dilgo Khyentse Rinpoche’s Unique Voice
|
|
0
|
25
|
January 17, 2025
|
Customizing Speech-to-Text: Fine-Tuning a Model for Tai Situ Rinpoche’s Unique Voice
|
|
0
|
38
|
January 16, 2025
|
Sentence Length Proportions As Data Cleaning Heuristic
|
|
3
|
50
|
January 7, 2025
|
A custom ASR model to transcribe the speech of Kyabje Dilgo Khyentse Rinpoche
|
|
4
|
65
|
January 6, 2025
|
Validating Retrieval Augmented Translation With T5
|
|
0
|
29
|
January 5, 2025
|
Fine-Tuning a Multi-Dialect Speech Recognition Model for Tibetan Languages
|
|
0
|
54
|
January 3, 2025
|
Domain Tagging With Unsupervised Clustering for Retrieval Augmented Translation
|
|
0
|
26
|
December 25, 2024
|
A custom ASR model to transcribe the speech of Tai Situ Rinpoche
|
|
0
|
53
|
December 19, 2024
|
Translate the Bodhicharyavatara with RAT (Retrieval-Augmented Translation)
|
|
1
|
70
|
December 20, 2024
|
Exploring BDRC’s Tibetan OCR: Training and Evaluation Repository Deep Dive
|
|
0
|
80
|
December 16, 2024
|
Guided by Guru Gemma2: Exploring Tibetan's Language Relatives
|
|
5
|
103
|
December 13, 2024
|
Validating Data Cleaning for Translation Model Training
|
|
0
|
44
|
December 7, 2024
|
Creating openpecha/cleaned_MT_v1.0.3
|
|
0
|
32
|
December 5, 2024
|
Training OCR Models for Tibetan Pecha: Challenges and Solutions
|
|
1
|
92
|
December 4, 2024
|
Enhancing Tibetan OCR with Fonts Created from Tibetan Pecha
|
|
0
|
32
|
November 13, 2024
|
Toward a Cleaner Translation Dataset
|
|
0
|
70
|
November 3, 2024
|
Hyperparameter Optimization for Topic Modeling
|
|
0
|
50
|
November 23, 2024
|
Topic Modeling Buddhist Material in the Translation Dataset
|
|
2
|
41
|
November 28, 2024
|
Modeling the Full Translation Dataset
|
|
2
|
38
|
November 28, 2024
|
A First Look at Topic Modeling for the Translation Dataset
|
|
2
|
54
|
November 28, 2024
|
མཐོང་མྱི།
|
|
0
|
16
|
September 10, 2024
|
ཟ་མྱི།
|
|
0
|
3
|
September 10, 2024
|
འགྲོ་མྱི་
|
|
0
|
12
|
August 31, 2024
|
བསྡད་མྱི།
|
|
0
|
3
|
August 31, 2024
|
ད་གིན།
|
|
0
|
5
|
August 31, 2024
|
མེད་འགྲོ།
|
|
0
|
6
|
August 30, 2024
|