Watson: Building an Accountability Layer for AI

2025-09-18T00:00:00Z · Updated: 2025-09-18T14:48:23Z

Watson: Not for Vibes, for Verification

The final pillar of the Oceanheart ecosystem is perhaps the most critical: Watson. In a world rushing to deploy Large Language Models (LLMs) into every facet of our lives, one question looms large: how do we ensure they're accurate, safe, and accountable, especially in high-stakes fields like healthcare?

The Readme for Watson states its mission in no uncertain terms: "Not for vibes, for verification." This isn't a tool for generating creative text; it's a rigorous clinical curation platform designed to catch AI mistakes before they can cause harm. It's built on the pragmatic, explicit acknowledgment that today's AIs are fallible. They will make mistakes, hallucinate facts, and have subtle omissions. Watson is the system designed to catch them.

Therapy with Receipts

The core of Watson is a simple yet powerful workflow. A clinician—the human expert—is presented with an AI-generated summary. Using a clean TipTap editor, they can directly correct the text. But more importantly, they can label specific issues: "hallucination," "missing risk," "imprecise language."

When they submit their review, Watson generates a detailed diff, showing exactly what was changed, both at the token and structural level. The entire corrected record, complete with labels and diffs, is then exportable as a research-ready JSONL dataset. A friend called this "therapy with receipts," and the name stuck. It creates an invaluable, auditable trail of both AI performance and human oversight.

This feedback loop is everything. It turns every clinical review into structured data that can be used to retrain, refine, and improve the underlying AI models. It's how you teach an LLM to stop making up uncles.

The Right Tools for a Critical Job

The tech stack for Watson was chosen with care. Django and its REST Framework (DRF) provide a robust, secure, and well-tested foundation for the backend. On the front end, I chose HTMX. In a critical verification system like this, I wanted to minimize the complexity of the client-side code. HTMX allows me to build a highly interactive, responsive interface without the overhead and potential surface area for bugs that comes with a heavy JavaScript framework. It keeps the focus on simplicity, maintainability, and auditability.

The ultimate vision for Watson is to build a future where AI output in clinical settings is never taken at face value. It is always checked, always annotated, and always audited by a human expert. Watson is the mechanism for that. It's not just a feature; it's an ethical imperative. It's how we build trust in these powerful new systems, not by pretending they're perfect, but by creating robust frameworks for human accountability.