PRD - Manuscript & Text Cataloguing Tool

Kaldan · June 20, 2025, 10:27am

Purpose and Demographic

A structured, authority-aware cataloguing system for recording bibliographic metadata of Buddhist pechas and texts. The tool supports accurate, consistent, and interconnected metadata for use in digital preservation, scholarly research, and AI applications.

✦ Mission Statement

To build a reliable and scalable cataloguing tool that enables precise recording of bibliographic metadata—including title, author, lineage, volume info, physical condition, and intertextual relationships—while supporting authority linking with trusted sources like BDRC and Wikidata.

✦ Target Demographic

Buddhist scholars and textual historians
Metadata specialists and digital archivists
Monastic libraries and cultural preservation groups
OpenPecha’s internal text processing team
Contributors validating or enriching bibliographic records
Technologists building search, retrieval, and AI services on Buddhist texts

✦ Problem Statement

Many Buddhist manuscripts and digital texts remain uncatalogued, poorly catalogued, or inconsistently linked to authority sources. This hinders discoverability, limits interoperability, and weakens downstream applications such as AI-based translation, search, and summarization. A centralized, structured, and collaborative metadata tool is essential to fill this gap.

Product Objectives

✦ Core Objectives

Record complete bibliographic metadata for both digitized manuscripts and born-digital texts
Allow users to edit and enrich existing records
Support authority linking (e.g., BDRC IDs, Wikidata Q-numbers)
Provide structure for volume-level and collection-level metadata
Include fields for physical condition and notes for preservation workflows
Enable relationship mapping between derivative and related texts
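
To make the objectives above more concrete, here is a minimal sketch of what a single catalogue record might look like. All class names, field names, and the identifier patterns are assumptions for illustration (the BDRC pattern in particular is a guess); the actual schema is the one defined by the working group.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed identifier shapes, for illustration only.
WIKIDATA_QID = re.compile(r"^Q[1-9][0-9]*$")
BDRC_ID = re.compile(r"^[A-Z]+[0-9][A-Z0-9_]*$")  # guess; real BDRC rules may differ


@dataclass
class AuthorityLink:
    source: str       # "bdrc" or "wikidata"
    identifier: str   # e.g. a Wikidata Q-number

    def is_well_formed(self) -> bool:
        """Check only the shape of the identifier, not that it exists."""
        pattern = {"bdrc": BDRC_ID, "wikidata": WIKIDATA_QID}.get(self.source)
        return bool(pattern and pattern.match(self.identifier))


@dataclass
class TextRecord:
    title: str
    author: str = ""
    lineage: str = ""
    volume: Optional[int] = None       # volume-level metadata
    collection: str = ""               # collection-level metadata
    physical_condition: str = ""
    preservation_notes: str = ""
    authority_links: List[AuthorityLink] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Return the names of required fields that are still empty."""
        required = {"title": self.title, "author": self.author}
        return [name for name, value in required.items() if not value.strip()]
```

A record with a title but no author would report `["author"]` from `missing_fields()`, which an editing form could use to prompt the contributor.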

✦ Non-Goals

Will not support image-based transcription or OCR workflows
Will not automatically infer metadata without user review
Will not serve as a public discovery interface (such as a search portal)

✦ Impact Areas

Improves data integrity and provenance across OpenPecha and partner systems
Supports AI and RAG applications with high-quality metadata and source linkage
Preserves the authenticity and structure of Buddhist manuscript traditions
Facilitates collaboration with global archives and scholars through shared standards

Example Use Cases

✦ Use Case: Lobsang – Archivist at a Monastery Library

Lobsang scans a physical pecha and uses the tool to create a cataloguing entry, including BDRC authority links and physical condition.
He connects multiple volumes in a series and flags some folios as damaged, which triggers preservation follow-up.

✦ Use Case: Dechen – Scholar Enriching Metadata for NLP Training

Dechen reviews inconsistently catalogued texts and updates lineage and author info with Wikidata references.
She maps a root text and its commentary to reflect citation relationships for use in translation alignment.

Architectural Considerations

✦ Tech Stack

Frontend: Vue.js or React
Backend: FastAPI (Python)
Database: PostgreSQL with support for graph-like relationships (e.g., via SQLAlchemy or Django ORM)
Authentication: Auth0 or another OpenID Connect provider
Storage: S3 for associated files, structured metadata in JSON or YAML formats
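
One possible way to get graph-like relationships out of a relational database is an edge table queried with a recursive CTE. The sketch below uses Python's built-in `sqlite3` as a stand-in for PostgreSQL (both support `WITH RECURSIVE`); the table name, relation names, and text IDs are hypothetical.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory edge table with a few sample relations."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE text_relation (
            source_id TEXT NOT NULL,
            relation  TEXT NOT NULL,
            target_id TEXT NOT NULL,
            PRIMARY KEY (source_id, relation, target_id)
        )
        """
    )
    conn.executemany(
        "INSERT INTO text_relation VALUES (?, ?, ?)",
        [
            ("T002", "commentary_on", "T001"),
            ("T003", "commentary_on", "T002"),   # sub-commentary
            ("T004", "translation_of", "T001"),
        ],
    )
    return conn


def derived_from(conn: sqlite3.Connection, root_id: str) -> list:
    """Return every text that derives, directly or indirectly, from root_id."""
    rows = conn.execute(
        """
        WITH RECURSIVE derived(id) AS (
            SELECT source_id FROM text_relation WHERE target_id = ?
            UNION
            SELECT r.source_id
            FROM text_relation r JOIN derived d ON r.target_id = d.id
        )
        SELECT id FROM derived ORDER BY id
        """,
        (root_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

With the sample rows, `derived_from(conn, "T001")` walks from the root text to its commentary, its sub-commentary, and its translation. `UNION` (rather than `UNION ALL`) keeps the traversal from looping if the data ever contains a cycle.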

✦ System Diagram

Metadata Input ➝ Authority Linking ➝ Validation Layer ➝ Metadata DB ➝ Export to OpenPecha & Partners
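
The flow above could be sketched as a chain of small functions. This is only an illustration under stated assumptions: records are plain dicts, the authority lookup is a local dictionary rather than a call to BDRC or Wikidata, the "database" is a list, and all function and field names are hypothetical.

```python
import json


def link_authorities(record: dict, lookup: dict) -> dict:
    """Attach an authority ID for the author, if the lookup knows one."""
    linked = dict(record)
    linked["author_id"] = lookup.get(record.get("author", ""))
    return linked


def validate(record: dict) -> list:
    """Return a list of human-readable validation errors (empty if valid)."""
    errors = []
    if not record.get("title"):
        errors.append("title is required")
    if record.get("author") and not record.get("author_id"):
        errors.append("author has no authority link")
    return errors


def run_pipeline(records: list, lookup: dict):
    """Input -> authority linking -> validation -> store -> export."""
    stored, rejected = [], []
    for raw in records:
        linked = link_authorities(raw, lookup)
        errors = validate(linked)
        if errors:
            rejected.append((linked, errors))   # held back for user review
        else:
            stored.append(linked)               # stand-in for the metadata DB
    exported = json.dumps(stored, ensure_ascii=False)
    return exported, rejected
```

Rejected records are returned with their errors instead of being silently fixed, in line with the non-goal of not inferring metadata without user review.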

✦ Security & Privacy

Role-based access for metadata creation, editing, and approval
Audit logs for changes to authority links
Secure storage and versioned changesets
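
A minimal sketch of how role-based access and audit logging might fit together is shown below. The role names, the permission sets, and the log entry shape are assumptions for illustration, not the project's actual access model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Set

# Hypothetical roles mapped to the actions they may perform.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "viewer": set(),
    "contributor": {"create", "edit"},
    "reviewer": {"create", "edit", "approve"},
}


@dataclass
class AuditLog:
    entries: List[dict] = field(default_factory=list)

    def record(self, user: str, action: str, record_id: str, allowed: bool) -> None:
        """Append one entry; denied attempts are logged as well."""
        self.entries.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "action": action,
            "record_id": record_id,
            "allowed": allowed,
        })


def authorize(log: AuditLog, user: str, role: str, action: str, record_id: str) -> bool:
    """Check the action against the role and log the decision either way."""
    allowed = action in ROLE_PERMISSIONS.get(role, set())
    log.record(user, action, record_id, allowed)
    return allowed
```

Unknown roles fall back to an empty permission set, so the default is to deny.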

✦ Dependencies

BDRC API for authority metadata
Wikidata SPARQL for author/lineage resolution
OpenPecha text ID system
Optional: Zotero-style reference plugins or metadata vocabularies
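
For the Wikidata dependency, author resolution could start from a query like the one built below. This is a sketch only: it constructs the SPARQL text and request URL for the public Wikidata Query Service but does not send the request, the function names are hypothetical, and matching on an exact label is a simplifying assumption (real author names will need alias and transliteration handling).

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_person_query(name: str, language: str = "en", limit: int = 5) -> str:
    """Build a SPARQL query for humans (wd:Q5) whose label equals `name`."""
    # Escape backslashes and double quotes so the name is a safe string literal.
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?person ?personLabel WHERE {\n"
        f'  ?person wdt:P31 wd:Q5 ; rdfs:label "{escaped}"@{language} .\n'
        f'  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{language}". }}\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request_url(query: str) -> str:
    """Return a GET URL asking the query service for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
```

Candidate matches returned by such a query would still be confirmed by a contributor before the Q-number is written to the record.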

✦ Scalability & Maintenance

Extensible metadata schema
Admin dashboard for tracking uncatalogued or flagged records
Scheduled consistency checks
Export compatibility with JSON-LD or MARC-like formats for integration
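
As an illustration of the JSON-LD export option, the sketch below serializes one record using the schema.org vocabulary. The choice of schema.org and of the `Book` type is an assumption made for this example, as is the function name; the working group may settle on a different vocabulary.

```python
import json
from typing import Optional


def to_json_ld(title: str, author_name: str,
               author_qid: Optional[str] = None,
               bdrc_id: Optional[str] = None) -> str:
    """Serialize a record as a JSON-LD document (schema.org, assumed)."""
    author = {"@type": "Person", "name": author_name}
    if author_qid:
        # Wikidata entity URIs follow this pattern.
        author["sameAs"] = f"http://www.wikidata.org/entity/{author_qid}"
    doc = {
        "@context": "https://schema.org",
        "@type": "Book",
        "name": title,
        "author": author,
    }
    if bdrc_id:
        doc["identifier"] = bdrc_id
    # ensure_ascii=False keeps Tibetan script readable in the output.
    return json.dumps(doc, ensure_ascii=False, indent=2)
```

Authority links appear as `sameAs` and `identifier` values, so partner systems can follow them without knowing the tool's internal schema.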

Participants

✦ Working Group Members

Tashi (Tech Lead) – Architecture & implementation
Sonam (Product Manager) – Requirements & workflows
Ngawang (Metadata Expert) – Metadata schema, BDRC integration
Lhamo (Frontend Developer) – UI/UX for metadata forms
Jampa (QA & Reviewer) – Metadata validation and error reporting

✦ Stakeholders

OpenPecha Ecosystem Team
BDRC – Partner archive for canonical IDs
Esukhia Digitization Unit – Text providers
Translation AI Team – Downstream users of metadata

✦ Point of Contact

Sonam (PM) – sonam@openpecha.org

Project Status

✦ Current Phase

Planning + Wireframing

✦ Milestones

Schema Definition – Complete
UI/UX Prototype – In Progress
MVP Alpha – July 2025
Internal Review – August 2025
Beta Release – September 2025

✦ Roadmap

Month	Task
June	UI prototype, schema review
July	MVP dev + authority integration
August	Internal QA & test deployment
September	Beta with feedback loop
October	Public launch & contributor docs

Meeting Times

✦ Regular Schedule

Weekly team check-ins: Wednesdays at 4:30PM IST
Metadata policy review: Every other Friday at 5PM IST

✦ Meeting Notes

View Notes on Notion

What We’re Working On

We maintain a public task board with all active issues and discussions.

View GitHub Project Board
