PRD: MQM Evaluation Tool

Owning Group: Pecha AI Studio App Team
Status: Complete
GitHub Project: Pecha AI Studio WG
Last Updated: 2025-09-22

1. Overview

This project is the development of a web-based evaluation tool built on the Multidimensional Quality Metrics (MQM) framework. It is designed specifically for the Buddhist translator community, which faces the critical challenge of reliably and consistently evaluating the quality of machine-translated texts. Translators currently lack a standardized tool for assessing AI model outputs, which makes it difficult to select the best models and improve translation workflows. This tool will provide a structured, guideline-based interface in which human evaluators score translations, moving beyond unreliable or biased metrics.

2. Goals & Success Metrics

Primary Goals:

  1. Standardize Evaluation: To provide the Buddhist translator community with a consistent, reliable, and standardized tool for assessing the quality of machine translations based on the MQM framework.

  2. Improve Efficiency: To streamline and accelerate the translation evaluation process compared to ad-hoc manual methods.

  3. Facilitate Model Selection: To empower translators and organizations to make data-driven decisions when choosing which LLM or AI model to use for specific translation tasks.

Success Metrics:

  • Adoption: At least 30 active users or 5 translator groups are using the tool for evaluation projects within two quarters of launch.

  • User Satisfaction: Achieve a satisfaction score of 8/10 or higher from beta tester feedback surveys.

  • Efficiency Gains: Users report a minimum 25% reduction in time spent on evaluation tasks compared to their previous methods.

  • Task Completion Rate: 95% of users are able to successfully complete an evaluation from start to finish without needing direct support.

3. Timeline & Quarterly Milestones

The high-level project schedule, broken down by quarter, is shown below; it aligns with the main Project Roadmap.

  • Q2 2025: Research & Feasibility (Completed)

      • Researched MQM guidelines and standards.

      • Formulated a strategy for converting the MQM guidelines into a software tool.

  • Q3 2025: Prototyping & MVP

      • Milestone 1: Design and finalize UI/UX mockups for the evaluation interface.

      • Milestone 2: Define and implement the core data schema for projects, documents, and evaluations.

      • Milestone 3: Develop and deploy a Minimum Viable Product (MVP) with core features for internal review.

  • Q4 2025: Testing & Refinement

      • Milestone 4: Write and finalize a comprehensive user manual and documentation.

      • Milestone 5: Onboard a selected group of beta testers for product testing.

      • Milestone 6: Gather feedback, identify bugs, and implement refinements based on tester reports.

  • Target Launch: Q1 2026

4. Scope & Features / Data Schema

Included Features:

  • Project Management Dashboard: Users can create, view, and manage different evaluation projects.

  • Side-by-Side Text View: An interface displaying the source text and the target (machine-translated) text in parallel for easy comparison.

  • MQM Annotation System: Users can highlight segments of the target text and assign error categories based on a pre-defined MQM typology (e.g., Accuracy, Fluency, Terminology, Style).

  • Severity Levels: Each annotated error can be assigned a severity level (e.g., Minor, Major, Critical).

  • Automated MQM Scoring: The tool will automatically calculate a final quality score based on the number and severity of annotated errors; a sketch illustrating one possible calculation follows this list.

  • Exportable Reports: Users can export a summary of the evaluation, including the final score and a breakdown of error types.
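
One possible way to realize the automated scoring and report export is sketched below. The per-word normalization and the severity weights of 1 (Minor), 5 (Major), and 10 (Critical) are commonly used MQM defaults but are assumptions here; the final weights, error typology, and normalization rules will be fixed during implementation, and all names in the code are illustrative.

```python
from collections import Counter
from dataclasses import dataclass

# Illustrative severity weights; MQM lets each project configure its own penalties.
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}

@dataclass
class ErrorAnnotation:
    category: str   # e.g. "Accuracy", "Fluency", "Terminology", "Style"
    severity: str   # "minor", "major", or "critical"
    start: int      # character offsets of the highlighted span in the target text
    end: int

def mqm_report(annotations: list[ErrorAnnotation], word_count: int) -> dict:
    """Return a per-word-normalized quality score plus an error-type breakdown."""
    total_penalty = sum(SEVERITY_WEIGHTS[a.severity] for a in annotations)
    breakdown = Counter(f"{a.category}/{a.severity}" for a in annotations)
    score = max(0.0, 100.0 * (1 - total_penalty / word_count))
    return {
        "score": round(score, 2),
        "total_penalty": total_penalty,
        "error_breakdown": dict(breakdown),
    }

# One major Accuracy error and two minor Fluency errors in a 200-word segment:
# penalty = 5 + 1 + 1 = 7, score = 100 * (1 - 7/200) = 96.5
report = mqm_report(
    [
        ErrorAnnotation("Accuracy", "major", 10, 42),
        ErrorAnnotation("Fluency", "minor", 50, 58),
        ErrorAnnotation("Fluency", "minor", 90, 97),
    ],
    word_count=200,
)
```

The dictionary returned by this sketch doubles as the kind of summary an exportable report could be built from: a final score plus a per-category error breakdown.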

Not Included (Out of Scope for V1):

  • Multi-user collaboration on a single evaluation in real-time.
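
Although the section title mentions a data schema, the schema itself is deferred to Milestone 2. The sketch below shows one possible minimal shape for the core entities, assuming a simple project → document → evaluation hierarchy; every class and field name is illustrative rather than decided.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

# Illustrative entity shapes only; class and field names are placeholders
# until the schema is defined in Milestone 2.

@dataclass
class Project:
    id: str
    name: str
    owner: str
    created_at: datetime

@dataclass
class Document:
    id: str
    project_id: str      # each document belongs to one project
    source_text: str     # original text
    target_text: str     # machine-translated text
    model_name: str      # which LLM/AI model produced the translation

@dataclass
class Evaluation:
    id: str
    document_id: str
    evaluator: str
    annotations: list = field(default_factory=list)  # ErrorAnnotation records (see scoring sketch)
    score: Optional[float] = None                     # filled in by the automated scorer
```

A layout along these lines maps directly onto the project management dashboard, the annotation system, and the exportable reports listed above.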

5. Dependencies

  • Buddhist Translation Experts: The project requires a dedicated group of experienced translators to act as beta testers and provide domain-specific feedback.

  • MQM Framework Guidelines: The tool’s logic is entirely dependent on the established MQM standards; any changes to these standards may require updates to the tool.

  • Hosting Infrastructure: A reliable cloud server environment (e.g., AWS, Google Cloud) will be required for deploying the web application.

6. Acceptance Criteria

The project will be considered “done” and ready for its V1 launch when:

  • All features listed in the “Included Features” section are implemented and function as specified.

  • The beta testing program has been completed, and all identified “critical” and “major” bugs have been resolved.

  • The final MQM score calculation is verified and confirmed to be accurate according to the framework’s methodology.

  • The user manual is complete and available to all users.

  • The application is successfully deployed to a stable production environment and is accessible to the target community.