1. Overview
What is this project? What problem does it solve for whom?
This project aims to significantly increase the digital footprint of authentic Buddhist knowledge and languages, with an initial focus on Tibetan. Because few authoritative resources have been digitized, two problems have emerged: 1) NLP researchers lack the data needed to advance research and development for Buddhist languages, and 2) modern AI models (LLMs) lack access to authentic sources, so their answers are often not grounded in verifiable Buddhist teachings.
The project addresses these problems by systematically publishing core Tibetan Buddhist literature on Wikimedia platforms (Wikisource, Wikipedia, and Wikidata). This creates a foundational, open-access, and verifiable digital knowledge base for two key audiences:
- AI/NLP Researchers: Providing them with a rich, authenticated corpus for training and developing language models for Tibetan and other Buddhist languages.
- General Public & AI Users: Enabling AI models to generate more accurate, authentic, and source-based answers to queries about Buddhism, thereby improving the quality of information available to everyone.
2. Goals & Success Metrics
What are the primary goals? How will we measure success?
Primary Goals:
- Digitize & Preserve: Create a comprehensive digital library of core Tibetan Buddhist texts on open-access platforms to ensure their preservation and accessibility.
- Build Community: Cultivate a sustainable, global community of volunteers and monastic scholars dedicated to contributing and maintaining this digital knowledge base.
- Enable Research: Produce a high-quality, structured dataset that can serve as a cornerstone for NLP and AI research in Buddhist languages.
- Improve AI Accuracy: Provide verifiable source material for LLMs, improving the authenticity and reliability of AI-generated content on Buddhist topics.
Success Metrics:
- Content Volume:
  - Number of new texts successfully uploaded and proofread on Tibetan Wikisource.
  - Number of new and substantially improved Tibetan Wikipedia articles on Buddhist topics.
  - Number of new structured data entries on Wikidata related to Buddhist texts, figures, and concepts.
- Community Engagement:
  - Number of new, active contributors onboarded from workshops and online outreach.
  - Number of attendees at wiki workshops and edit-a-thons.
  - Retention rate of new contributors after 3 months.
- Project Impact:
  - Qualitative feedback from monastic partners and the academic community.
  - Long-term: Citation of the project’s Wikimedia resources in academic papers or use in NLP projects.
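The three-month retention metric needs a precise definition before it can be reported. The sketch below shows one possible reading, not an agreed definition: a contributor counts as retained if they make at least one edit 90 or more days after their first edit. The 90-day window, the function names, and the sample data are all assumptions made for illustration.

```python
from datetime import date, timedelta


def is_retained(edit_dates, window_days=90):
    """True if any edit falls on or after (first edit + window_days)."""
    if not edit_dates:
        return False
    cutoff = min(edit_dates) + timedelta(days=window_days)
    return any(d >= cutoff for d in edit_dates)


def retention_rate(contributors, window_days=90):
    """contributors maps a username to a list of edit dates."""
    if not contributors:
        return 0.0
    retained = sum(is_retained(dates, window_days) for dates in contributors.values())
    return retained / len(contributors)


# Hypothetical sample data, not real project figures.
sample = {
    "editor_a": [date(2025, 10, 1), date(2026, 1, 15)],   # still editing after 90 days
    "editor_b": [date(2025, 10, 1), date(2025, 10, 20)],  # stopped within the first month
    "editor_c": [date(2025, 11, 5)],                      # single edit only
    "editor_d": [date(2025, 10, 10), date(2026, 1, 20)],  # still editing after 90 days
}
rate = retention_rate(sample)
print(f"Retention rate: {rate:.0%}")
```

Contributors who joined fewer than 90 days before the reporting date cannot yet be retained under this definition, so they would need to be left out of the input when the metric is reported.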
3. Timeline & Quarterly Milestones
A high-level schedule for the project, broken down by quarter. This should align with the main Project Roadmap.
- Q2 2025 (Completed):
  - Initial community onboarding and training program launched.
  - Core group of initial contributors established.
- Q3 2025 (Completed):
  - Milestone 1: Uploaded over 30 foundational texts to Tibetan Wikisource.
  - Milestone 2: Created and/or significantly expanded over 30 Tibetan Wikipedia articles.
  - Milestone 3: Attended Wikimania to network with the global Wikimedia community and build partnerships.
- Q4 2025 (In Progress):
  - Milestone 1: Plan and execute three Wiki awareness workshops and edit-a-thons in major monasteries to recruit and train new editors.
  - Milestone 2: Refine and deploy automation tools for text formatting, cleanup, and upload processes to increase efficiency.
  - Milestone 3: Onboard a minimum of 15 new, active editors from the monastic workshops.
- Target for Next Phase: Q1 2026
4. Scope & Features / Data Schema
What is included? What is not included?
IN SCOPE:
- Uploading proofread Tibetan-language Buddhist texts to Tibetan Wikisource.
- Creating and improving Tibetan Wikipedia articles on Buddhist concepts, texts, and historical figures.
- Linking these resources through structured data on Wikidata (e.g., linking an author’s page to the texts they composed).
- Developing automation scripts (bots) to assist with repetitive tasks like formatting, data import, and quality checks.
- Conducting community outreach and training, specifically workshops and edit-a-thons in monastic institutions.
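As an illustration of the kind of formatting and cleanup task an automation script could take on, here is a minimal sketch. It is not the project's actual toolchain. The cleanup rules (removing byte-order marks and zero-width spaces, collapsing runs of spaces, limiting consecutive blank lines) and the name `clean_text` are assumptions; the real rules would need to be agreed with the proofreaders.

```python
import re

# Hypothetical cleanup rules; the project's real rules may differ.
INVISIBLE = re.compile("[\u200b\ufeff]")   # zero-width space, byte-order mark
SPACE_RUN = re.compile(r"[ \t\u00a0]+")    # runs of spaces, tabs, no-break spaces
BLANK_RUN = re.compile(r"\n{3,}")          # more than one blank line in a row


def clean_text(raw: str) -> str:
    """Normalize whitespace without touching Tibetan letters or punctuation."""
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    text = INVISIBLE.sub("", text)
    lines = [SPACE_RUN.sub(" ", line).strip() for line in text.split("\n")]
    text = BLANK_RUN.sub("\n\n", "\n".join(lines))
    return text.strip()


# Hypothetical sample: stray BOM, doubled spaces, Windows line endings.
sample = "\ufeffབཀྲ་ཤིས།   བདེ་ལེགས། \r\n\r\n\r\n\r\nཐུགས་རྗེ་ཆེ།\u200b"
print(clean_text(sample))
```

The function deliberately leaves the tsheg and shad marks and all letters unchanged, so a proofreader can still compare the cleaned text against the source line by line.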
OUT OF SCOPE (for this phase):
- Translation: The project is focused on digitizing original Tibetan texts, not translating them into other languages.
- Building a New Platform: All work will be done on existing, open Wikimedia infrastructure (Wikipedia, Wikisource, Wikidata). We are not building a separate website or library.
- Developing NLP Models: This project’s goal is to create the foundational dataset; we will not be building or training our own proprietary NLP models.
- Digitizing non-Tibetan Texts: This phase is exclusively focused on the Tibetan Buddhist canon. Other Buddhist languages (like Pali or Sanskrit) may be considered in future phases.
5. Dependencies
What other groups, projects, or resources does this work depend on?
- Wikimedia Foundation & Community: Heavily dependent on the stability and availability of Wikipedia, Wikisource, and Wikidata infrastructure. Requires alignment with community policies and guidelines.
- Monastic Institutions: Critical for access to authentic texts, subject matter experts for verification, and for hosting community workshops. Their partnership is essential for project success.
- Technical Volunteers/Team: The development of automation tools depends on contributors with technical skills in Python, API integration, and bot development.
6. Acceptance Criteria
How will we know when this project phase is “done”?
This phase of the project will be considered complete when the following criteria are met:
- All three planned Q4 workshops have been successfully conducted, with attendance and new editor metrics reported.
- The automation toolchain is documented, functional, and demonstrably reduces the manual effort for uploading texts by at least 30%.
- The list of target texts for 2025 is fully uploaded and has passed the initial proofreading stage on Wikisource.
- A stable, active community of at least 25 editors is established and making regular contributions.
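The 30% effort-reduction criterion can only be demonstrated against a measured baseline. The sketch below shows one way the check could be expressed. It assumes effort is measured as average minutes of manual work per uploaded text, before and after automation. The function names and sample timings are hypothetical.

```python
def effort_reduction(baseline_minutes: float, automated_minutes: float) -> float:
    """Fraction of manual effort saved relative to the baseline."""
    if baseline_minutes <= 0:
        raise ValueError("baseline_minutes must be positive")
    return (baseline_minutes - automated_minutes) / baseline_minutes


def meets_target(baseline_minutes: float, automated_minutes: float,
                 target: float = 0.30) -> bool:
    """True if the measured reduction reaches the acceptance threshold."""
    return effort_reduction(baseline_minutes, automated_minutes) >= target


# Hypothetical timings: 60 minutes per text by hand, 40 with the tools.
print(f"Reduction: {effort_reduction(60, 40):.0%}")
print("Meets 30% target:", meets_target(60, 40))
```

Recording the baseline before the tools are deployed keeps the comparison honest, since it cannot be reconstructed reliably afterwards.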