BDRC Etext Corpus Project
Mission: To publish a foundational dataset of Tibetan Buddhist Literature for AI and Research by collecting, OCRing, cataloging, cleaning, and aligning etexts.
Quick Actions
- Join our Discord channel
- Active Code Sprint / Active Data Sprint
- Excalidraw+
- All minutes
WG Meeting Calendar
- Annual: Strategic Planning (Board & Leads) to set the organizational vision and objectives.
- Bi-monthly: OKR & Epic Drafting Workshop; OKR and Epic Setting; Strategy & Roadmap Review
- Sprint (Bi-Weekly): A 14-day cycle comprising Community Hubs Review, Sprint Planning, Contributor Syncs, Demo & Vote.
- Daily: Standup meetings
Strategic Roadmap (Active Epics)
Major initiatives we are prioritizing this quarter.
| Status | Epic Title | Spec | Label |
|---|---|---|---|
| | Establish Gold Standard Catalog: *Collection of the most accurate (manually transcribed) digital versions of Buddhist texts.* | View Spec | |
| | Develop OCR Evaluation Framework: *description* | View Spec | View Label |
| | Refine OCR Models & Training Data Framework: *description* | View Spec | View Label |
| | Launch Modern Text Acquisition: *description* | View Spec | View Label |
| | Build Cataloging & Outlining Tools: *description* | View Spec | View Label |
| | Initiate Text Boundary Annotation: *description* | View Spec | View Label |
Members and Roles
Voters:
- @Elie_Roux – BDRC
- @Trinley – OpenPecha / Dharmaduta
Non-voting advisors:
- @Tenz_Kuns - Full-stack & AI
- @gade - annotation specialist
Contributors:
- @Elie_Roux - Tech Lead
- Gabor
- @Tashi_Tsering - OPS / AI engineer
- Arihant - OPS / AI
- @Kaldan - data / AI engineer
- @Ganga_Gyatso - data / AI engineer
- Tsethar (Cataloger)
- Sonam_Gyaltso (Cataloger)
- Tenzin_Norbu (Cataloger)
- lhujam_tashi789 (Cataloger)
- @Tashi_Dhondup - Data Collection
- Text Alignment
- Transcription
Contribution Zone
Ready to help? Pick a task that fits the time you have available.
- New Contributors: Start with our “Good First Issues”.
- Bug Hunters: Help us squash bugs in the Issue Tracker.
- Developers: Review the Setup & Contribution Guide.
Library & Governance
- Team Roster: Who are the Chair, the Maintainers, and the active members?
- Meeting Minutes: Archive of past decisions (tag: wg-name+minutes).
- Document Store: All PRDs, Technical Specs, and RFCs.