Our Methodology

How We Build Trusted Intelligence

Every disease report on Kisho is assembled from authoritative data sources, synthesized by AI, validated through a multi-tier quality control pipeline, and monitored for ongoing accuracy. This page explains how.

Transparency by Design

Most health platforms tell you what they publish. Few explain how their content is created, checked, and maintained. We believe that in rare disease — where information is scarce and misinformation can be harmful — you deserve to see inside the machine.

This page describes our methodology in enough detail for regulatory review, clinical scrutiny, and enterprise due diligence. We share the architecture, the data sources, the safety rules, and the quantitative indicators. We do not share proprietary scoring formulas, prompt engineering details, or internal thresholds — those remain our competitive advantage.

We call this the glass floor: you can see deep enough to trust us, but not deep enough to replicate us.

From Data to Trusted Report

Every disease report passes through four stages before publication.

1. Knowledge Assembly

We assemble a comprehensive knowledge packet for each disease by pulling structured data from 10+ authoritative sources — MONDO, HPO, Orphanet, ClinicalTrials.gov, FDA, HGNC, PubMed, and more. This packet includes genes, phenotypes, prevalence, clinical trials, FDA-approved treatments, orphan drug designations, newborn screening status, and cross-references. No information is invented; every data point traces to a specific source.

2. AI Synthesis

AI synthesizes the knowledge packet into a patient-friendly report following consistent editorial guidelines. The AI does not search the internet or invent claims — it works exclusively from the assembled evidence packet. Reports are structured for clarity: overview, symptoms, causes, diagnosis, treatment, outlook, and research. The system adapts report format to data availability, ensuring even data-sparse diseases receive actionable guidance rather than a dead end.

3. Multi-Tier Quality Control

Before publication, every report passes through automated quality control with 15+ validation rules organized into three severity tiers. Critical safety violations trigger full regeneration. Clinical accuracy issues trigger targeted section regeneration. Editorial suggestions are logged for improvement. A circuit breaker quarantines reports that cannot pass validation after multiple attempts, flagging them for human review.

4. Publication & Monitoring

Published reports are continuously monitored for freshness. Domain-specific contracts define how quickly each data type should be refreshed — from hours for breaking news to months for prevalence data. When source data changes or content ages beyond its threshold, reports are automatically regenerated in the background. A separate AI quality review system periodically re-examines published content, cross-referencing claims against current database records.

Grounded in Authoritative Sources

Every claim on Kisho traces back to a named, verifiable data source. Each source is refreshed on its own schedule.

MONDO Ontology (updated with releases): Disease classification backbone — 23,529+ diseases

HPO (monthly): Clinical phenotypes with frequency, onset, and sex data

Orphanet (quarterly): Prevalence estimates and epidemiological data

ClinicalTrials.gov (weekly): Active trial counts per disease, phase, and location

FDA Drugs@FDA (weekly): Approved treatments including gene and cell therapies

FDA Orphan Drug (weekly): Orphan drug designations and approval status

HGNC (reference database): Gene symbol validation — 44,748 approved symbols

PubMed / NIH (on demand): Research publications and citation enrichment

25+ News Sources (multiple times daily): BioSpace, STAT, Fierce Pharma, FDA alerts, and more

Congress.gov / LegiScan (daily): Federal and state rare disease legislation

Quality Assurance

Three-Tier Validation

Every generated report is checked against 15+ automated rules before it reaches you.

Critical Safety (hard stop → full regeneration)

Detects patient safety risks: unauthorized gene associations, false FDA approval claims, incorrect mechanism of action statements, or dangerous treatment misinformation. Any critical violation triggers complete report regeneration. No exceptions.

Clinical Accuracy (targeted fix → section regeneration)

Identifies clinical accuracy issues within specific sections: missing variability language, unsupported temporal claims, or inconsistent inheritance patterns. These issues trigger targeted section regeneration rather than starting from scratch.

Editorial Quality (logged for improvement → advisory only)

Catches editorial issues that do not affect safety or accuracy: word count drift, formatting inconsistencies, or stylistic preferences. These are logged and tracked but do not block publication.

If a report cannot pass critical validation after multiple attempts, a circuit breaker quarantines the content for manual human review. We would rather show nothing than show something wrong.

44,748 approved gene symbols; >99% hallucination detection

Gene Validation Backed by HGNC

Every gene mentioned in a disease report is validated against the HUGO Gene Nomenclature Committee (HGNC) database of 44,748 approved human gene symbols and 58,538 known aliases. This eliminates gene hallucinations — a common failure mode for AI systems generating medical content — with greater than 99% detection accuracy and less than 1% false positive rate.

Disease-specific acronyms (like “ALS” for amyotrophic lateral sclerosis) are validated contextually against disease synonyms, preventing false positives while maintaining strict gene accuracy.

Evidence Base Indicators

Every report displays its evidence strength, so you always know how well-documented a disease is.

Evidence base: Strong

Based on extensive clinical, genetic, and treatment data from multiple authoritative sources. Diseases with deep research history, known genes, FDA-approved treatments, and established clinical phenotypes.

Typical for well-studied diseases like Duchenne muscular dystrophy or cystic fibrosis.

Evidence base: Developing

Reflects available clinical information, with some areas still under active research. The report draws from real data but may have gaps in certain dimensions — such as prevalence or phenotype frequency.

Common for diseases with active research pipelines but incomplete epidemiological data.

Evidence base: Early

Limited structured data is available. The report provides what is known from authoritative sources and clearly identifies information gaps. These reports prioritize actionable guidance — connecting patients to support organizations, relevant clinical trials, and next steps.

Applies to ultra-rare diseases where even basic epidemiological data may not yet exist.

Evidence base indicators are computed automatically from the structured data available for each disease — not from subjective judgment. When data changes, the indicator updates accordingly.
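Because the indicator is derived from structured data rather than judgment, it reduces to counting which evidence dimensions are populated. The field names and cut-offs below are assumptions for illustration, not Kisho's scoring formula (which the page explicitly does not disclose).

```python
# Hypothetical evidence-base indicator computed from structured data
# availability; field names and thresholds are illustrative only.
def evidence_indicator(packet: dict) -> str:
    signals = sum([
        bool(packet.get("genes")),
        bool(packet.get("approved_treatments")),
        bool(packet.get("prevalence")),
        bool(packet.get("phenotypes")),
    ])
    if signals >= 3:
        return "Strong"
    if signals >= 1:
        return "Developing"
    return "Early"
```

When a source refresh adds, say, new prevalence data, the next computation can promote the indicator automatically, which is what "when data changes, the indicator updates accordingly" implies.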

Human Oversight

Expert Disease Stewardship

Automated QC catches errors. Named experts provide judgment. Stewardship is the human layer that elevates content from “QC Validated” to “Expert Validated.”

Named Accountability

Unlike anonymous health content, stewards are verified professionals — clinicians, researchers, genetic counselors — who claim specific disease reports, review them for accuracy and tone, and stand behind their validation publicly.

Collaborative Teams

Stewards can build teams of qualified collaborators — editors who refine content and reviewers who vet community contributions. Ownership, editing, and review roles ensure clear accountability at every level.

AI Safety Valve

Even expert edits pass through automated safety checks before publication. The system catches clinically dangerous changes regardless of who makes them — protecting both patients and stewards.

Versioned Audit Trail

Every steward review, edit, and validation creates an auditable record — what changed, who changed it, when, and why. Version history enables full rollback and regulatory traceability.

Validation Status Progression

AI Draft → QC Validated → Community Reviewed → Expert Validated

Every report displays its current validation status. You always know how verified the information is.
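The status ladder is naturally modeled as an ordered enumeration in which reports only move forward. A minimal sketch, assuming Python; the forward-only rule is inferred from the progression described above, and a reset to draft would presumably accompany regeneration.

```python
from enum import IntEnum

# Sketch of the validation status ladder; ordering matters because a
# report should only be promoted forward through the progression.
class ValidationStatus(IntEnum):
    AI_DRAFT = 0
    QC_VALIDATED = 1
    COMMUNITY_REVIEWED = 2
    EXPERT_VALIDATED = 3

def promote(current: ValidationStatus, target: ValidationStatus) -> ValidationStatus:
    """Allow forward promotion only; backward moves need explicit regeneration."""
    if target <= current:
        raise ValueError("status can only move forward")
    return target
```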

Are you a clinician, researcher, or genetic counselor? Your expertise can shape how patients understand their condition.

Citation Integrity

Disease reports are enriched with peer-reviewed literature from PubMed. Our citation pipeline uses a hybrid approach: AI generates targeted search queries, and we retrieve articles directly from the PubMed API. This means every PubMed ID in our reports comes directly from the National Library of Medicine — not from AI memory or generation.

Zero hallucination risk on citations. Every PubMed ID is retrieved directly from the PubMed E-utilities API, not generated by AI. The AI helps find the right papers; the database confirms they exist.

Citations are matched to specific claims within reports — for example, linking a treatment efficacy statement to the pivotal clinical trial that supports it. This claim-level citation linking goes beyond generic “further reading” lists to provide traceable evidence for individual assertions.
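The retrieval step above hinges on PMIDs coming from the live API rather than from the model. A minimal sketch, assuming Python and the public NCBI E-utilities esearch endpoint; the function names are invented, and a production pipeline would add an API key, rate limiting, and error handling.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint (real URL); everything else
# in this sketch is an illustrative simplification.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_pubmed_query(query: str, retmax: int = 5) -> str:
    """Build an esearch URL for an AI-generated query string.

    PMIDs come from the API response, never from model output.
    """
    params = {"db": "pubmed", "term": query, "retmode": "json", "retmax": retmax}
    return f"{EUTILS}?{urlencode(params)}"

def extract_pmids(esearch_json: dict) -> list[str]:
    """Pull the PMID list out of an esearch JSON payload."""
    return esearch_json.get("esearchresult", {}).get("idlist", [])
```

Because `extract_pmids` only ever returns identifiers present in the API response, a hallucinated citation cannot enter the pipeline at this stage.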

News Intelligence Pipeline

25+ sources. AI-classified. Disease-linked. Updated multiple times daily.

1. Aggregate

News is collected from 25+ authoritative sources across three tiers: structured APIs (FDA, NIH, ClinicalTrials.gov), curated RSS feeds (STAT, BioSpace, Fierce Pharma), and discovery search for emerging stories.

2. Classify

Every article is classified into one of five executive categories and scored for significance. Importance labels decay over time — preventing stale urgency from misleading readers.
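Importance decay over time is commonly implemented as an exponential half-life. The sketch below assumes Python; the seven-day half-life is a made-up parameter for illustration, not Kisho's internal setting.

```python
import math
from datetime import datetime, timedelta, timezone

# Illustrative exponential decay of an article's importance score;
# the half-life is an assumed parameter, not Kisho's actual value.
HALF_LIFE = timedelta(days=7)

def decayed_importance(score: float, published: datetime, now: datetime) -> float:
    """Halve an article's importance for every half-life elapsed since publication."""
    age_in_half_lives = (now - published) / HALF_LIFE
    return score * math.pow(0.5, age_in_half_lives)
```

Under this scheme a week-old "high importance" story carries half its original weight, which is one way to keep stale urgency from misleading readers.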

3. Connect

Articles are automatically linked to diseases they mention using fuzzy matching against 15,964+ disease names and synonyms, with confidence scoring to prevent false associations.

Five Executive Categories

Pipeline & Approvals
Policy & Access
Funding & Deals
Science & Discovery
Community & Advocacy

Each article receives a category, an importance score, an executive summary, and entity extraction (companies, drugs, genes) — enabling filtering, search, and alerting.

Continuous Quality Monitoring

Publishing a report is not the end of our quality process — it is the beginning of ongoing monitoring.

Automated quality reviews

An independent AI system periodically re-examines published content, cross-referencing claims against current gene associations, prevalence data, and treatment records.

Freshness contracts

Each data domain has its own freshness policy. When data ages beyond its threshold, reports are automatically queued for regeneration.

News-triggered updates

When a news article is linked to a disease, the content system checks if the disease report needs updating — ensuring reports stay current with breaking developments.

Community feedback loop

Every disease page includes a feedback mechanism. Reported inaccuracies are reviewed and used to improve content and QC rules.

Complete audit trail

Every generation, validation, and update is logged with full provenance: what model was used, what constraints were applied, what safety checks passed, and what triggered the generation.

By the Numbers

Quantitative indicators of our methodology's rigor.

10+ authoritative data sources

44,748 HGNC-validated gene symbols

15+ automated validation rules

3 validation severity tiers

25+ curated news sources

>99% gene hallucination detection

<1% gene validation false positives

7 report content sections

Trust Is Earned, Not Claimed

This methodology represents our current system as of February 2026. We continuously improve our pipeline, expand our data sources, and refine our validation rules. If you have questions about our methodology or need additional technical detail for regulatory review, we welcome the conversation.