On 10 March 2026, Singapore's Ministry of Health and Health Sciences Authority published AIHGle 2.0 — the most comprehensive national guideline yet for artificial intelligence in healthcare. Buried in Chapter 8.2, under a section on Generative AI, is a sentence that should stop every healthcare AI developer in their tracks.
"Perhaps the most promising recent technique is to compute 'certainty' of the outputs of LLMs. This works by examining the falloff of probabilities in the 'logits' in the output layer of the LLM."
The guideline goes further. It distinguishes between aleatoric uncertainty — noise inherent in data collection — and epistemic uncertainty, which it describes as shortcomings of the model itself, "often due to improper or incomplete summarisation of the knowledge which it is trying to represent."
In other words: Singapore's regulators are now explicitly saying that measuring whether an AI knows what it doesn't know is the most promising frontier in healthcare AI safety.
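The logit-based approach the guideline describes can be made concrete with a small sketch. This is an illustrative reading of "examining the falloff of probabilities", not the guideline's (or any vendor's) prescribed method: here, certainty is scored as the average margin between the top two token probabilities at each generation step — a sharp falloff from the top token signals high model certainty.

```python
def logit_certainty(token_probs: list[list[float]]) -> float:
    """Score certainty as the mean top-token probability margin over a
    generated sequence. Each inner list is one step's output-layer
    probability distribution (post-softmax). Illustrative only."""
    margins = []
    for dist in token_probs:
        top, runner_up = sorted(dist, reverse=True)[:2]
        margins.append(top - runner_up)  # sharp falloff -> high certainty
    return sum(margins) / len(margins)

# Two generation steps: one confident, one close call.
score = logit_certainty([[0.9, 0.05, 0.05], [0.6, 0.3, 0.1]])  # 0.575
```

Other summary statistics (per-step entropy, minimum token probability) serve the same purpose; the point is that the signal comes from a single model's own output distribution.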
We agree. Because we've been building exactly this — and have the data to prove it works.
What AIHGle 2.0 Gets Right
Let's be clear: this is an exceptional document. It moves beyond the usual "be responsible with AI" language and gets specific about the real dangers.
The guideline identifies four amplified risks of Generative AI in healthcare: hallucination, where models present fictitious content as fact; undesirable content that may compromise clinical outcomes; data disclosure through inadvertent leaking of sensitive information; and vulnerability to adversarial prompts that manipulate model behaviour.
It also establishes a three-tier human oversight framework — human-in-the-loop (active control), human-over-the-loop (supervisory monitoring), and human-out-of-the-loop (autonomous) — and makes a critical declaration: autonomous AI should not make clinical decisions without human oversight.
Most importantly, the guideline explicitly calls for developers to implement epistemic uncertainty measurement as a core safety mechanism, and recommends techniques like Retrieval-Augmented Generation (RAG), red teaming, and source-citing architectures.
This is significant. It means the regulatory direction is no longer about "don't use AI in healthcare." It's about "measure what the AI doesn't know, and build systems that catch failure before it reaches the patient."
Where AIHGle 2.0 Stops — and Where EBP Begins
The guideline describes what needs to happen. It does not prescribe how. This is appropriate for a governance document — regulators should set the target, not mandate the engineering.
But someone has to build the engineering. And the gap between "compute certainty" and a working system that actually does it in clinical contexts is enormous.
The Epistemic Bridge Protocol (EBP) was designed specifically to close this gap.
AIHGle 2.0 recommends examining logit probabilities from a single LLM. EBP goes further: instead of trusting one model's self-reported confidence, it measures consensus across multiple independent models and uses the agreement signal (σ) to classify epistemic states — producing a verifiable, reproducible certainty score without requiring access to any model's internals.
This matters because logit-based confidence and actual factual reliability are only loosely correlated. A model can be confidently wrong. EBP's multi-model consensus avoids this failure mode entirely: if four independently-trained models agree on a clinical assessment, the probability that all four share the same hallucination is vanishingly small.
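A minimal sketch of the cross-model consensus idea, with a hypothetical helper (the function name and pairwise-agreement formula are our illustration, not EBP's published internals): σ is taken as the fraction of model pairs whose answers agree, so it needs no access to any model's logits.

```python
from itertools import combinations

def consensus_sigma(answers: list[str]) -> float:
    """Fraction of model pairs that agree on the same answer — a simple
    proxy for the consensus signal sigma (0 = total disagreement,
    1 = unanimity). Illustrative, model-agnostic: only outputs needed."""
    if len(answers) < 2:
        raise ValueError("need at least two model outputs")
    pairs = list(combinations(answers, 2))
    agree = sum(a.strip().lower() == b.strip().lower() for a, b in pairs)
    return agree / len(pairs)

# Four independently queried models triaging the same vignette:
votes = ["emergency", "emergency", "emergency", "urgent"]
sigma = consensus_sigma(votes)  # 3 of 6 pairs agree -> 0.5
```

In practice the comparison would be semantic rather than exact string matching, but the structure is the same: disagreement among independently trained models is itself the uncertainty signal.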
| Dimension | AIHGle 2.0 Recommends | EBP / eptim.health Delivers |
|---|---|---|
| Uncertainty measurement | "Compute certainty via logit probabilities" | Multi-model consensus signal (σ) — model-agnostic, no logit access required |
| Epistemic vs. aleatoric | Distinguishes the two conceptually | Formally modelled: P(hallucination) = (1−σ) × η, validated across 13,728 responses |
| Hallucination mitigation | RAG, red teaming, source citation | All of the above + multi-model verification achieving 0% emergency under-triage |
| Human oversight model | Three tiers: in-the-loop, over-the-loop, out-of-the-loop | Three epistemic states: EXPLORE → PROVISIONAL → COMMIT with mandatory doctor validation at COMMIT |
| Clinical incompleteness | Not addressed as distinct from hallucination | 4-way outcome taxonomy separating hallucination from clinically insufficient responses |
| Guardrails | "Rule-based constraints to filter inappropriate input and output" | 8 Clinical Consistency Rules (CR-1 → CR-8) catching physiologically impossible outputs |
| DTC safety | "Should not generate outputs requiring clinical expertise to interpret" | Epistemic state labelling ensures users always see confidence level; escalation to doctor built in |
| Continuous learning risk | Monitor for model drift post-deployment | Drift detection via consensus divergence — patented (USPTO provisional) |
| Validation scale | No benchmark specified | 13,728 responses, 4 models, 3 domains; 1,248 clinical vignettes replicating GPT-Health-Eval |
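The table's hallucination model and epistemic-state ladder can be sketched as follows. Only the formula P(hallucination) = (1−σ) × η comes from the source; the threshold values and the base-rate figure in the example are illustrative assumptions, not EBP's calibrated parameters.

```python
def hallucination_probability(sigma: float, eta: float) -> float:
    """P(hallucination) = (1 - sigma) * eta, where sigma is the
    consensus signal and eta a domain-calibrated base rate."""
    return (1.0 - sigma) * eta

def epistemic_state(sigma: float,
                    provisional: float = 0.5,
                    commit: float = 0.9) -> str:
    """Map consensus to the EXPLORE -> PROVISIONAL -> COMMIT ladder.
    Threshold values here are illustrative, not the published ones."""
    if sigma >= commit:
        return "COMMIT"       # still subject to mandatory doctor validation
    if sigma >= provisional:
        return "PROVISIONAL"
    return "EXPLORE"

# At sigma = 0.5 with an assumed base rate eta = 0.5:
p = hallucination_probability(0.5, 0.5)   # 0.25
state = epistemic_state(0.5)              # "PROVISIONAL"
```

The key design choice mirrors the guideline's oversight tiers: even a COMMIT-state response is gated by human validation, so rising machine confidence never removes the clinician from the loop.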
The Numbers That Matter
When we stress-tested EBP against 1,248 clinical vignettes from the GPT-Health-Eval benchmark — the same scenarios used to evaluate ChatGPT's clinical safety — the results were unambiguous.
The headline result: a 0% emergency under-triage rate.
In the broader Epistemic Field Theory (EFT) validation study across 13,728 AI responses, hallucination rates dropped from approximately 51.9% at low model consensus to 5.9% at perfect consensus — a clean monotonic relationship that held across medical, legal, and technical domains.
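A monotonic check of this kind reduces to binning labelled responses by their consensus score and computing a per-bin hallucination rate. The sketch below shows the computation on generic (σ, hallucinated) records; the function name and binning scheme are our illustration, not the study's published analysis code.

```python
from collections import defaultdict

def hallucination_rate_by_consensus(records, n_bins=5):
    """Bin (sigma, hallucinated) records by consensus level and return
    the hallucination rate per bin, keyed by the bin's lower edge."""
    bins = defaultdict(list)
    for sigma, hallucinated in records:
        idx = min(int(sigma * n_bins), n_bins - 1)  # clamp sigma = 1.0
        bins[idx].append(hallucinated)
    return {i / n_bins: sum(v) / len(v) for i, v in sorted(bins.items())}

# On labelled data, a strictly declining sequence of per-bin rates is
# what "monotonic relationship" means operationally.
rates = hallucination_rate_by_consensus(
    [(0.10, 1), (0.15, 0), (0.95, 0), (0.99, 0)]
)  # {0.0: 0.5, 0.8: 0.0}
```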
Critically, we also identified a class of failure that AIHGle 2.0 does not yet distinguish: clinical incompleteness. An AI that responds "administer IV fluids" for sepsis is not hallucinating — but without specifying the 30 mL/kg bolus within a 3-hour window, the response is clinically dangerous. In our dataset, incomplete responses were 7× more common than hallucinations. No existing safety framework catches them. EBP does.
The Timeline Tells the Story
The convergence between regulatory direction and our technical work isn't coincidental. It reflects a shared recognition that the fundamental problem of healthcare AI safety isn't about blocking bad outputs — it's about measuring epistemic reliability.
We didn't build EBP in response to AIHGle 2.0. We built it because the problem was obvious to anyone deploying AI in clinical contexts. The fact that Singapore's regulators have now arrived at the same conclusion independently is validation — not of our specific solution, but of the problem space itself.
What This Means for ASEAN Healthcare AI
Singapore's regulatory framework carries outsized influence in the region. When MOH and HSA set direction, Malaysia's NAIO, Thailand's MOPH, and Indonesia's regulatory bodies take notice. AIHGle 2.0 is explicitly aligned with ASEAN's Guide on AI Governance and Ethics, and its emphasis on epistemic uncertainty measurement signals where regional regulation is heading.
For healthcare AI developers across ASEAN, the message is clear: uncertainty quantification is no longer optional. It is becoming a regulatory expectation.
Three Implications for the Sector
1. "We use RAG" is no longer sufficient. AIHGle 2.0 treats RAG as one technique among many. The guideline's emphasis on computing certainty of outputs suggests regulators expect something more fundamental — an architectural commitment to measuring what the AI doesn't know.
2. Clinical incompleteness needs its own category. The guideline's distinction between hallucination and epistemic uncertainty is a start, but the 7× prevalence gap we found between incomplete and hallucinated responses suggests regulators will eventually need to address this explicitly.
3. Multi-model consensus is the natural architecture. If the goal is model-agnostic certainty measurement that doesn't depend on any single vendor's logit access, cross-model verification is the scalable solution. EBP demonstrates this is technically feasible and clinically effective today.
The Regulatory Sandbox Opportunity
AIHGle 2.0 introduces regulatory sandboxes for healthcare AI — a mechanism for testing AI solutions in real-world clinical settings with reduced licensing requirements. HSA launched the AI-MD Exemption Sandbox in February 2026, specifically for low-to-moderately low-risk AI medical devices.
This is precisely the kind of environment where EBP's capabilities can be demonstrated at scale. A system that can verifiably measure its own uncertainty, that escalates appropriately, and that achieves 0% emergency under-triage in benchmark testing is exactly what a regulatory sandbox should be evaluating.
The infrastructure is ready. The evidence base exists. The regulatory direction is aligned.
We welcome AIHGle 2.0 as a validation of the problem space that EBP was designed to address. Singapore has set the standard for what healthcare AI governance should demand. Now the question is: who will build the systems that meet it?
We believe we already have.
The Epistemic Bridge Protocol (EBP) and Epistemic Field Theory (EFT) are developed by Eptim.ai Sdn. Bhd. (Malaysia). The EBP paper is under review at Discover Artificial Intelligence (Springer Nature). The EFT manuscript is in preparation for Nature Communications. eptim.health is currently in beta.
For enquiries on partnership, regulatory consultation, or the EBP architecture, contact us at eptim.ai.
The future of healthcare AI is verifiable trust
If you're building, deploying, or regulating AI in healthcare — let's talk about how epistemic verification changes the equation.
Learn more about EBP