PRD - Manuscript & Text Cataloguing Tool

:compass: Purpose and Demographic

A structured, authority-aware cataloguing system for recording bibliographic metadata of Buddhist pechas and texts. The tool supports accurate, consistent, and interconnected metadata for use in digital preservation, scholarly research, and AI applications.

✦ Mission Statement

To build a reliable and scalable cataloguing tool that enables precise recording of bibliographic metadata—including title, author, lineage, volume info, physical condition, and intertextual relationships—while supporting authority linking with trusted sources like BDRC and Wikidata.

✦ Target Demographic

  • Buddhist scholars and textual historians
  • Metadata specialists and digital archivists
  • Monastic libraries and cultural preservation groups
  • OpenPecha’s internal text processing team
  • Contributors validating or enriching bibliographic records
  • Technologists building search, retrieval, and AI services on Buddhist texts

✦ Problem Statement

Many Buddhist manuscripts and digital texts remain uncatalogued, poorly catalogued, or inconsistently linked to authority sources. This hinders discoverability, limits interoperability, and weakens downstream applications such as AI-based translation, search, and summarization. A centralized, structured, and collaborative metadata tool is essential to fill this gap.


:bullseye: Product Objectives

✦ Core Objectives

  • :white_check_mark: Record complete bibliographic metadata for both digitized manuscripts and born-digital texts
  • :white_check_mark: Allow users to edit and enrich existing records
  • :white_check_mark: Support authority linking (e.g., BDRC IDs, Wikidata Q-numbers)
  • :white_check_mark: Provide structure for volume-level and collection-level metadata
  • :white_check_mark: Include fields for physical condition and notes for preservation workflows
  • :white_check_mark: Enable relationship mapping between derivative and related texts
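To make the core objectives concrete, here is a minimal sketch of one possible record shape using Python dataclasses. All class and field names (`CatalogueRecord`, `AuthorityLink`, `Relationship`, the relationship labels) are hypothetical and do not reflect the finalized schema. Only the Wikidata Q-number format is checked, because BDRC identifier formats are not specified in this PRD.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from enum import Enum


class AuthorityScheme(Enum):
    BDRC = "bdrc"
    WIKIDATA = "wikidata"


# Wikidata item IDs are "Q" followed by digits (e.g. Q42).
_WIKIDATA_QID = re.compile(r"^Q[1-9]\d*$")


@dataclass(frozen=True)
class AuthorityLink:
    scheme: AuthorityScheme
    identifier: str  # a BDRC resource ID or a Wikidata Q-number

    def __post_init__(self) -> None:
        # Only Wikidata IDs are format-checked; BDRC checks are left to
        # a (hypothetical) validation layer.
        if self.scheme is AuthorityScheme.WIKIDATA and not _WIKIDATA_QID.match(self.identifier):
            raise ValueError(f"not a Wikidata Q-number: {self.identifier!r}")


@dataclass
class Relationship:
    kind: str       # hypothetical labels, e.g. "commentary_on", "translation_of"
    target_id: str  # record_id of the related text


@dataclass
class CatalogueRecord:
    record_id: str
    title: str
    author: str | None = None
    lineage: str | None = None
    collection_id: str | None = None    # collection-level grouping
    volume_number: int | None = None    # position within a multi-volume series
    physical_condition: str | None = None
    notes: str = ""                     # free-text notes for preservation workflows
    born_digital: bool = False          # False for digitized manuscripts
    authority_links: list[AuthorityLink] = field(default_factory=list)
    relationships: list[Relationship] = field(default_factory=list)
```

Keeping authority links and relationships as lists of small typed objects lets one record carry several BDRC and Wikidata references and several outgoing relations without schema changes.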

✦ Non-Goals

  • :cross_mark: Will not support image-based transcription or OCR workflows
  • :cross_mark: Will not automatically infer metadata without user review
  • :cross_mark: Will not serve as a public discovery interface (e.g., a search portal)

✦ Impact Areas

  • :books: Improves data integrity and provenance across OpenPecha and partner systems
  • :link: Supports AI and RAG applications with high-quality metadata and source linkage
  • :person_in_lotus_position: Preserves the authenticity and structure of Buddhist manuscript traditions
  • :handshake: Facilitates collaboration with global archives and scholars through shared standards

:light_bulb: Example Use Cases

✦ Use Case: Lobsang – Archivist at a Monastery Library

  • Lobsang scans a physical pecha and uses the tool to create a cataloguing entry, including BDRC authority links and physical condition.
  • He connects multiple volumes in a series and flags some folios as damaged, which triggers preservation follow-up.
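Lobsang's follow-up could be implemented as a simple rule: flagging folios as damaged on a volume opens a preservation task, but only for folios that were not already flagged. This is a sketch under assumed in-memory types. `Volume`, `PreservationTask`, and `flag_damaged` are hypothetical names, and a real system would persist the task and notify reviewers instead of appending to a list.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PreservationTask:
    volume_id: str
    folios: list[int]
    status: str = "open"


@dataclass
class Volume:
    volume_id: str
    series_id: str  # links the volumes of one multi-volume series
    damaged_folios: set[int] = field(default_factory=set)


def flag_damaged(volume: Volume, folios: list[int], queue: list[PreservationTask]) -> PreservationTask | None:
    """Record damaged folios and open a preservation task for any newly flagged ones."""
    new = sorted(set(folios) - volume.damaged_folios)
    if not new:
        return None  # everything was already flagged; no duplicate task
    volume.damaged_folios.update(new)
    task = PreservationTask(volume_id=volume.volume_id, folios=new)
    queue.append(task)
    return task
```

Deduplicating against already-flagged folios avoids opening a second task when the same damage is reported twice.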

✦ Use Case: Dechen – Scholar Enriching Metadata for NLP Training

  • Dechen reviews inconsistently catalogued texts and updates lineage and author info with Wikidata references.
  • She maps a root text and its commentary to reflect citation relationships for use in translation alignment.

:building_construction: Architectural Considerations

✦ Tech Stack

  • Frontend: Vue.js or React
  • Backend: FastAPI (Python)
  • Database: PostgreSQL, with graph-like relationships between texts modelled through an ORM layer (e.g., SQLAlchemy, which pairs naturally with FastAPI, or Django ORM)
  • Authentication: Auth0 or another OpenID Connect (OIDC) provider
  • Storage: S3 for associated files, structured metadata in JSON or YAML formats
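One way to get graph-like relationships out of a relational database is an edge table queried with a recursive common table expression. The sketch below uses Python's built-in `sqlite3` so it runs without a server. `WITH RECURSIVE` has the same shape in PostgreSQL, though parameter placeholders differ by driver. Table and column names are hypothetical, and the depth cap is an assumed guard against accidental cycles.

```python
from __future__ import annotations

import sqlite3

SCHEMA = """
CREATE TABLE text (id TEXT PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE text_relation (
    source_id TEXT NOT NULL REFERENCES text(id),
    target_id TEXT NOT NULL REFERENCES text(id),
    kind      TEXT NOT NULL,  -- e.g. 'commentary_on'
    PRIMARY KEY (source_id, target_id, kind)
);
"""

# Every text derived from a root, directly or through intermediate texts
# (e.g. a sub-commentary on a commentary), with its shortest distance.
DERIVED_QUERY = """
WITH RECURSIVE derived(id, depth) AS (
    SELECT source_id, 1 FROM text_relation WHERE target_id = ?
    UNION
    SELECT r.source_id, d.depth + 1
    FROM text_relation AS r JOIN derived AS d ON r.target_id = d.id
    WHERE d.depth < 20  -- stop runaway recursion if relations form a cycle
)
SELECT id, MIN(depth) FROM derived GROUP BY id ORDER BY MIN(depth), id;
"""


def derived_texts(conn: sqlite3.Connection, root_id: str) -> list[tuple[str, int]]:
    return conn.execute(DERIVED_QUERY, (root_id,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO text VALUES (?, ?)", [
        ("root", "Root text"), ("comm", "Commentary"), ("sub", "Sub-commentary"),
    ])
    conn.executemany("INSERT INTO text_relation VALUES (?, ?, ?)", [
        ("comm", "root", "commentary_on"), ("sub", "comm", "commentary_on"),
    ])
    print(derived_texts(conn, "root"))  # [('comm', 1), ('sub', 2)]
```

An edge table keeps relationship types open-ended (new `kind` values need no migration), at the cost of recursive queries for multi-hop traversal.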

✦ System Diagram

Metadata Input ➝ Authority Linking ➝ Validation Layer ➝ Metadata DB ➝ Export to OpenPecha & Partners
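The diagram's stages can be read as a chain of functions, each taking and returning a record. This is a minimal sketch with hypothetical stage names, a plain `dict` as the record, and a `dict` standing in for the metadata DB. The authority lookup is a stub: a real implementation would query BDRC or Wikidata, and `PLACEHOLDER-ID` is not a real identifier.

```python
from __future__ import annotations

import json
from typing import Callable

Record = dict
Stage = Callable[[Record], Record]

# Stub standing in for a BDRC/Wikidata lookup; "PLACEHOLDER-ID" is not a real ID.
AUTHORITY_INDEX = {"Example Author": "PLACEHOLDER-ID"}


def link_authorities(record: Record) -> Record:
    """Attach an authority ID when the author name is known to the index."""
    author = record.get("author")
    if author in AUTHORITY_INDEX:
        return {**record, "author_authority": AUTHORITY_INDEX[author]}
    return record


def validate(record: Record) -> Record:
    """Reject records missing required fields before they reach the database."""
    missing = [f for f in ("record_id", "title") if not record.get(f)]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return record


def make_store(db: dict[str, Record]) -> Stage:
    """Return a stage that writes the record into a dict-backed stand-in for the DB."""
    def store(record: Record) -> Record:
        db[record["record_id"]] = record
        return record
    return store


def run_pipeline(record: Record, stages: list[Stage]) -> str:
    """Run each stage in order, then export the stored record as JSON."""
    for stage in stages:
        record = stage(record)
    return json.dumps(record, ensure_ascii=False, sort_keys=True)
```

Placing validation after authority linking and before storage means an invalid record raises before anything is written, so the DB only ever holds records that passed validation.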

✦ Security & Privacy

  • Role-based access for metadata creation, editing, and approval
  • Authority control audit logs
  • Secure storage and versioned changesets
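The three security bullets can be sketched together: a role-to-permission check gates each action, every accepted change appends a new version instead of overwriting, and each action is written to an audit log. The role names, permission strings, and `MetadataStore` class are hypothetical, and storage is in memory only.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical role -> permission mapping.
ROLE_PERMISSIONS = {
    "contributor": {"create", "edit"},
    "reviewer": {"create", "edit", "approve"},
    "viewer": set(),
}


@dataclass
class AuditEntry:
    user: str
    action: str
    record_id: str
    version: int
    at: datetime


@dataclass
class MetadataStore:
    versions: dict[str, list[dict]] = field(default_factory=dict)
    audit_log: list[AuditEntry] = field(default_factory=list)

    def _check(self, role: str, action: str) -> None:
        if action not in ROLE_PERMISSIONS.get(role, set()):
            raise PermissionError(f"role {role!r} may not {action}")

    def _log(self, user: str, action: str, record_id: str, version: int) -> None:
        self.audit_log.append(AuditEntry(user, action, record_id, version, datetime.now(timezone.utc)))

    def save(self, user: str, role: str, record_id: str, data: dict) -> int:
        """Append a new version (create or edit) and log it; returns the version number."""
        action = "edit" if self.versions.get(record_id) else "create"
        self._check(role, action)  # checked before anything is written
        history = self.versions.setdefault(record_id, [])
        history.append(dict(data))  # copy, so the caller cannot mutate stored history
        self._log(user, action, record_id, len(history))
        return len(history)

    def approve(self, user: str, role: str, record_id: str) -> int:
        """Approve the latest version of an existing record and log the approval."""
        self._check(role, "approve")
        version = len(self.versions[record_id])
        self._log(user, "approve", record_id, version)
        return version
```

Because earlier versions are never overwritten, any changeset can be reviewed or rolled back by reading `versions[record_id]` alongside the matching audit entries.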

✦ Dependencies

  • BDRC API for authority metadata
  • Wikidata SPARQL for author/lineage resolution
  • OpenPecha text ID system
  • Optional: Zotero-style reference plugins or metadata vocabularies
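For the Wikidata dependency, a resolver could send SPARQL to the public Wikidata Query Service at `https://query.wikidata.org/sparql`. The sketch below only builds the request URL and makes no network call. The helper name `wikidata_person_query_url` and the choice of properties (P569 date of birth, P570 date of death, P1066 "student of" for lineage) are illustrative assumptions, not a settled design.

```python
import re
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"
_QID = re.compile(r"^Q[1-9]\d*$")

# Doubled braces are literal braces for str.format.
QUERY_TEMPLATE = """\
SELECT ?personLabel ?birth ?death ?teacherLabel WHERE {{
  VALUES ?person {{ wd:{qid} }}
  OPTIONAL {{ ?person wdt:P569 ?birth . }}     # date of birth
  OPTIONAL {{ ?person wdt:P570 ?death . }}     # date of death
  OPTIONAL {{ ?person wdt:P1066 ?teacher . }}  # "student of" (lineage)
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "bo,en". }}
}}"""


def wikidata_person_query_url(qid: str) -> str:
    """Build a WDQS GET URL returning JSON results for one person item."""
    if not _QID.match(qid):  # also prevents injecting arbitrary SPARQL
        raise ValueError(f"not a Wikidata Q-number: {qid!r}")
    query = QUERY_TEMPLATE.format(qid=qid)
    return f"{WDQS_ENDPOINT}?{urlencode({'query': query, 'format': 'json'})}"
```

When actually fetching the URL (e.g. with `urllib.request`), Wikimedia's User-Agent policy asks clients to send a descriptive `User-Agent` header identifying the tool and a contact address.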

✦ Scalability & Maintenance

  • Extensible metadata schema
  • Admin dashboard for tracking uncatalogued or flagged records
  • Scheduled consistency checks
  • Export compatibility with JSON-LD or MARC-like formats for integration
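For JSON-LD export, one option is to map records onto schema.org terms. This is a minimal sketch that assumes that vocabulary (`CreativeWork`, `name`, `author`, `sameAs`, `isPartOf`). This PRD does not fix the target vocabulary or any MARC-style mapping, and the `https://example.org/records/` base URI is a placeholder.

```python
import json

BASE_URI = "https://example.org/records/"  # placeholder namespace, not a real one


def to_jsonld(record: dict) -> str:
    """Serialize a catalogue record as schema.org JSON-LD."""
    doc = {
        "@context": "https://schema.org",
        "@type": "CreativeWork",
        "@id": BASE_URI + record["record_id"],
        "name": record["title"],
    }
    if record.get("author"):
        doc["author"] = {"@type": "Person", "name": record["author"]}
    if record.get("authority_uris"):
        # e.g. Wikidata entity URIs such as http://www.wikidata.org/entity/Q...
        doc["sameAs"] = list(record["authority_uris"])
    if record.get("collection_id"):
        doc["isPartOf"] = {"@id": BASE_URI + record["collection_id"]}
    # ensure_ascii=False keeps Tibetan-script titles readable in the output
    return json.dumps(doc, ensure_ascii=False, indent=2)
```

Mapping authority links to `sameAs` lets downstream consumers merge OpenPecha records with BDRC or Wikidata entities without a custom vocabulary.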

:busts_in_silhouette: Participants

✦ Working Group Members

  • Tashi (Tech Lead) – Architecture & implementation
  • Sonam (Product Manager) – Requirements & workflows
  • Ngawang (Metadata Expert) – Metadata schema, BDRC integration
  • Lhamo (Frontend Developer) – UI/UX for metadata forms
  • Jampa (QA & Reviewer) – Metadata validation and error reporting

✦ Stakeholders

  • OpenPecha Ecosystem Team
  • BDRC – Partner archive for canonical IDs
  • Esukhia Digitization Unit – Text providers
  • Translation AI Team – Downstream users of metadata

✦ Point of Contact

  • Sonam (PM) – sonam@openpecha.org

:vertical_traffic_light: Project Status

✦ Current Phase

:yellow_circle: Planning + Wireframing

✦ Milestones

  • :white_check_mark: Schema Definition – Complete
  • :yellow_circle: UI/UX Prototype – In Progress
  • :soon_arrow: MVP Alpha – July 2025
  • :soon_arrow: Internal Review – August 2025
  • :soon_arrow: Beta Release – September 2025

✦ Roadmap

  • June – UI prototype, schema review
  • July – MVP development + authority integration
  • August – Internal QA & test deployment
  • September – Beta with feedback loop
  • October – Public launch & contributor docs

:spiral_calendar: Meeting Times

✦ Regular Schedule

  • Weekly team check-ins: Wednesdays at 4:30 PM IST
  • Metadata policy review: every other Friday at 5:00 PM IST

✦ Meeting Notes

  • :link: View Notes on Notion

:hammer_and_wrench: What We’re Working On

We maintain a public task board with all active issues and discussions.

:right_arrow: View GitHub Project Board