Power Platform AI Week Day 6: Human-in-the-Loop Approvals with AI Confidence Scoring


This article explains how to combine Power Platform’s AI capabilities (AI Builder, Copilot/Intelligent Approvals, and Power Automate) with human reviewers so that routine approvals can be automated while uncertain or risky items are routed to people. You’ll get the why, the how, practical patterns for thresholds and routing, governance considerations, and a simple implementation blueprint you can adapt. Key Microsoft docs and community guidance are used as source material.


Introduction — Why human-in-the-loop (HITL) matters for approvals

Organizations adopt automation to speed decisions, cut manual work, and reduce human error. But approvals frequently carry risk: contracts, invoices, purchase orders, HR actions, and compliance checks all have nuance that pure automation may mishandle. A practical middle path is HITL: let AI handle routine, high-confidence items and route the uncertain ones to humans for review. This preserves speed while keeping oversight where it matters.

HITL approval flows are not just a safety net — they’re a way to operationalize trust. When AI systems provide a confidence score for each prediction or extraction, makers can define deterministic routing rules based on that score. High-confidence outputs flow automatically; mid-confidence outputs go to a human for quick validation; low-confidence outputs follow an exception path. Microsoft’s AI Builder and Power Automate natively expose confidence values you can use in flows, making this pattern straightforward to implement.

Beyond routing, the HITL pattern creates useful telemetry: which items were auto-approved, which needed human review, and how humans corrected the model. That telemetry feeds a feedback loop so your models improve over time — turning a one-off automation pilot into a continuously improving capability. Microsoft documents explicit feedback-loop support for document models so you can capture reviewer edits and feed them back into retraining.

Finally, HITL is about governance and auditability as much as accuracy. Approvals touch financial, legal, and compliance domains; embedding humans where risk is higher and recording the model’s confidence and decision rationale creates an auditable trail that stakeholders and auditors will want. Later sections cover how to capture those audit traces cleanly.


How AI confidence scoring works in Power Platform (practical view)

AI Builder and related Power Platform AI features typically return, along with predictions, a numeric confidence (or probability) for each predicted value or classification. For example, a document-processing model that extracts “Invoice Total” will usually include a confidence percentage indicating how sure the model is about that extraction. You can read and use those confidence values directly in cloud flows. This makes confidence values first-class routing signals in approval workflows.

Confidence scores are model-specific — they depend on the model type and training. They’re not raw “truth” but useful heuristics: a high number means the model’s internal scoring function considers the output reliable given what it has seen before. Because scoring methods differ between model types, makers should validate what score ranges mean for their dataset (e.g., determine empirically whether a 0.85 score truly corresponds to acceptable accuracy for your use case). The Microsoft docs recommend using quick tests and cloud-flow predictions to inspect confidence behavior before building production routing rules.

Confidence scores work at multiple granularities. For document processing models you often get a global confidence and per-field or per-cell confidences. That allows targeted human review: you might auto-approve an invoice when the total is high-confidence but still flag the vendor name when its confidence is low. Use per-field thresholds where the business impact differs between fields.
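
To make that concrete, here is a minimal Python sketch of per-field threshold checks. The extraction shape and field names are simplified stand-ins (real AI Builder action outputs vary by model type), and the threshold values are placeholders you would tune empirically:

```python
# Per-field confidence checks against per-field thresholds.
# The "extraction" shape is a simplified stand-in for an AI Builder
# document-processing result; real output shapes vary by model.

FIELD_THRESHOLDS = {
    "invoice_total": 0.90,  # high business impact -> strict threshold
    "vendor_name": 0.85,
    "invoice_date": 0.75,   # lower impact -> more lenient
}

def fields_needing_review(extraction: dict) -> list[str]:
    """Return the fields whose confidence falls below their own threshold."""
    return [
        field
        for field, threshold in FIELD_THRESHOLDS.items()
        if extraction.get(field, {}).get("confidence", 0.0) < threshold
    ]

# Example: the total is confident, but the vendor name is not.
sample = {
    "invoice_total": {"value": "1834.50", "confidence": 0.97},
    "vendor_name":   {"value": "Contoso Ltd", "confidence": 0.62},
    "invoice_date":  {"value": "2025-11-28", "confidence": 0.88},
}
print(fields_needing_review(sample))  # -> ['vendor_name']
```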

Remember that confidence doesn’t capture everything. It’s blind to business context that wasn’t encoded in training data (e.g., suspicious vendor patterns, unusual payment terms). Combine confidence scoring with business rules (amount limits, vendor blacklists, policy checks) and, where available, model explanations or rationale produced by Intelligent Approvals to form a safer decision surface.


Designing HITL approval flows in Power Automate — patterns and thresholds

Start with a simple tiered routing pattern; a minimal code sketch follows the list. Many organizations adopt a three-tier approach:

 

  1. Auto-approve: confidence ≥ high threshold and passes business rules.
  2. Human review: confidence between high and low thresholds (the gray band).
  3. Manual escalation: confidence ≤ low threshold or failed business rules — route to an expert or exception queue.
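
Here is the sketch promised above: the three tiers expressed as a routing function, with business rules (an amount limit and a vendor block check) taking precedence over confidence. The thresholds and rules are illustrative placeholders, not recommendations:

```python
# The three-tier routing decision as a function. Thresholds and business
# rules are illustrative placeholders to be tuned against real data.

HIGH_THRESHOLD = 0.92
LOW_THRESHOLD = 0.70

def route(confidence: float, amount: float, vendor_blocked: bool) -> str:
    # Business rules veto automation regardless of confidence.
    if vendor_blocked or amount >= 10_000:
        return "manual_escalation"
    if confidence >= HIGH_THRESHOLD:
        return "auto_approve"
    if confidence > LOW_THRESHOLD:
        return "human_review"  # the gray band
    return "manual_escalation"

print(route(0.95, 420.00, vendor_blocked=False))     # auto_approve
print(route(0.81, 420.00, vendor_blocked=False))     # human_review
print(route(0.95, 25_000.00, vendor_blocked=False))  # manual_escalation
```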

 

This pattern balances speed and safety. Set thresholds based on empirical validation: run a sample set of documents through the model, capture confidences, and measure real error rates at different cutoffs. That way your “high” threshold reflects acceptable error tolerance for your business. Several community writeups and Microsoft guidance recommend this empirical, metrics-driven thresholding approach.
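A sketch of that empirical approach, assuming you have run a labeled validation sample through the model and recorded each item’s confidence alongside whether the prediction was actually correct (the sample data below is invented for illustration):

```python
# Measure the error rate auto-approval would incur at candidate cutoffs,
# given (confidence, was_correct) pairs from a labeled validation run.
# The sample data is invented for illustration.

def sweep(results: list[tuple[float, bool]], cutoffs: list[float]) -> None:
    for cutoff in cutoffs:
        auto = [correct for conf, correct in results if conf >= cutoff]
        if not auto:
            print(f"cutoff {cutoff:.2f}: nothing would auto-approve")
            continue
        coverage = len(auto) / len(results)
        error_rate = 1 - sum(auto) / len(auto)
        print(f"cutoff {cutoff:.2f}: coverage {coverage:.0%}, "
              f"auto-approval error rate {error_rate:.1%}")

validation_run = [(0.98, True), (0.95, True), (0.91, True), (0.90, False),
                  (0.84, True), (0.81, False), (0.75, False), (0.60, False)]
sweep(validation_run, cutoffs=[0.80, 0.90, 0.95])
```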

Use the confidence score as a discriminator in your cloud flow: after the AI Builder action (e.g., Extract information from documents), add Condition steps that check the confidence value(s). For per-field logic, build composite conditions — for example, auto-approve only if Total_Confidence > 0.90 AND Vendor_Confidence > 0.85. For complex or contextual rules, encapsulate checks in a Power Automate child flow or an Azure function to keep the main flow readable.

Make the human review step lightweight. Present extracted values with the confidence next to each field, and give reviewers quick actions: approve, edit value, escalate. Use Adaptive Cards or Power Apps forms for compact, mobile-friendly reviewer experiences. The less friction reviewers face, the cheaper the HITL solution becomes in operational cost. Community tutorials show how to display AI Builder outputs and confidence metadata in emails or cards for fast human validation.
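
For illustration, here is one way such a card payload might be assembled. The structure below follows the standard Adaptive Card schema; the field names, card text, and action wiring are hypothetical, and in a real flow you would generate or paste this JSON into the step that posts the card:

```python
import json

# Build an Adaptive Card payload showing each extracted value beside its
# confidence, with quick reviewer actions. Field names and the card text
# are placeholders; the structure follows the standard Adaptive Card schema.

def review_card(fields: dict) -> str:
    facts = [
        {"title": name, "value": f"{f['value']}  ({f['confidence']:.0%})"}
        for name, f in fields.items()
    ]
    card = {
        "type": "AdaptiveCard",
        "$schema": "http://adaptivecards.io/schemas/adaptive-card.json",
        "version": "1.4",
        "body": [
            {"type": "TextBlock", "text": "Invoice review", "weight": "Bolder"},
            {"type": "FactSet", "facts": facts},
        ],
        "actions": [
            {"type": "Action.Submit", "title": "Approve", "data": {"action": "approve"}},
            {"type": "Action.Submit", "title": "Edit values", "data": {"action": "edit"}},
            {"type": "Action.Submit", "title": "Escalate", "data": {"action": "escalate"}},
        ],
    }
    return json.dumps(card, indent=2)

fields = {"Invoice total": {"value": "1834.50", "confidence": 0.97},
          "Vendor name":   {"value": "Contoso Ltd", "confidence": 0.62}}
print(review_card(fields))
```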


Practical implementation walkthrough (example: invoice approvals)

 

  1. Modeling and testing: Train an AI Builder document processing model (or use prebuilt models where appropriate) on a representative corpus of invoices. Use the quick test functionality to inspect per-field confidences and understand where the model struggles (line items, OCR on poor scans, unusual formats). Microsoft docs explain how to run quick tests and retrieve confidence values during predict operations.
  2. Flow skeleton: Create a Power Automate flow triggered when a new invoice is uploaded to SharePoint. Add the AI Builder Extract information from documents action. Immediately after, check the confidence outputs and run a Condition: if Total_Confidence >= 0.92 and Vendor_Confidence >= 0.88 and amount < routing limit, create an approval request with the “approved automatically” tag and complete the flow. If confidence falls in the gray band, post an adaptive card to an approvals team with the extracted fields and confidence values for quick validation; if below the low threshold, route to a specialist. The PowerApps911 blog and other community guides show practical examples of reading confidence metadata inside flows.
  3. Reviewer UX: Use Adaptive Cards or a small Power App embedded as the approval task. Show each extracted field, original image preview, and the confidence number. Allow inline editing so reviewers can fix mistakes — then capture the reviewed values and outcome (approve/reject/needs more info) as flow outputs. Store those reviewed records in Dataverse or a SharePoint list for auditing and for use in model retraining.
  4. Feedback loop: Log each human correction and, where appropriate, push corrected documents into AI Builder’s feedback loop so the model can be retrained on real mistakes. Microsoft added explicit feedback-loop workflows for document models, which let you collect human corrections directly from flows and surface them as candidate training data. This closes the improvement cycle and reduces future reviewer load; a sketch of such a correction record follows this list.
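
The correction record referenced in step 4 might look like the following sketch: one row per reviewed field, capturing what the model predicted, how confident it was, and what the reviewer kept or changed. The column names are illustrative, not a real Dataverse schema:

```python
from datetime import datetime, timezone

# One correction record per reviewed field. Column names are illustrative,
# not a real Dataverse schema.

def correction_record(doc_id: str, field: str, predicted: str,
                      confidence: float, reviewed: str, reviewer: str) -> dict:
    return {
        "document_id": doc_id,
        "field": field,
        "predicted_value": predicted,
        "model_confidence": confidence,
        "reviewed_value": reviewed,
        "was_corrected": predicted != reviewed,
        "reviewer": reviewer,
        "reviewed_at": datetime.now(timezone.utc).isoformat(),
    }

record = correction_record("INV-0042", "vendor_name", "Contoso Ltd.",
                           0.62, "Contoso Ltd", "reviewer@contoso.com")
print(record["was_corrected"])  # True -> candidate training data
```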

 


Governance, auditing and compliance considerations

Approvals often live in regulated spaces; design your HITL system for auditability from day one. Persist the following for every decision: raw input (document image), extracted values, per-field confidence scores, AI rationale (if available), human reviewer identity, edits made by the reviewer, timestamps and the final decision. This trail supports investigations and demonstrates that human oversight was applied where needed.
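
As a sketch, one auditable decision record covering those fields might be assembled like this (illustrative names under assumed conventions, not a prescribed schema):

```python
from datetime import datetime, timezone
from typing import Optional

# One auditable record per decision, mirroring the fields listed above.
# Store one row per decision in Dataverse or a SharePoint list.

def audit_record(doc_url: str, extracted: dict, rationale: Optional[str],
                 reviewer: Optional[str], edits: dict, decision: str) -> dict:
    return {
        "source_document": doc_url,     # link to the raw input (image/PDF)
        "extracted_fields": extracted,  # values with per-field confidences
        "ai_rationale": rationale,      # if the model or stage provides one
        "reviewer": reviewer,           # None when fully auto-approved
        "reviewer_edits": edits,        # empty dict when nothing changed
        "decision": decision,           # e.g. auto_approved / approved / rejected
        "decided_at": datetime.now(timezone.utc).isoformat(),
    }
```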

Set and enforce SLAs for human review queues so automation doesn’t create new bottlenecks. For example, items in the gray band should have an SLA of a few hours; low-confidence escalations to experts can have longer targets but must be monitored. Use Power Platform monitoring tools or Dataverse dashboards to surface queue length, average review time, and auto-approval rate.
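
Those queue-health metrics are straightforward to derive from the decision records. A minimal sketch, assuming each record carries a decision status and, for reviewed items, a review duration (both field names hypothetical):

```python
# Derive queue-health metrics from decision records. Each record is assumed
# to carry a "decision" status and, for reviewed items, "review_minutes";
# both field names are hypothetical.

def queue_metrics(records: list[dict]) -> dict:
    total = len(records)
    auto = sum(1 for r in records if r["decision"] == "auto_approved")
    pending = sum(1 for r in records if r["decision"] == "pending_review")
    durations = [r["review_minutes"] for r in records if "review_minutes" in r]
    return {
        "auto_approval_rate": auto / total,
        "review_queue_length": pending,
        "avg_review_minutes": sum(durations) / len(durations) if durations else None,
    }

log = [{"decision": "auto_approved"},
       {"decision": "approved", "review_minutes": 6},
       {"decision": "pending_review"},
       {"decision": "rejected", "review_minutes": 14}]
print(queue_metrics(log))
# {'auto_approval_rate': 0.25, 'review_queue_length': 1, 'avg_review_minutes': 10.0}
```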

Be explicit about model limitations and disclose them to stakeholders. If a model has known weak spots (handwritten notes, poor scans from certain vendors), surface that in your approval UI and routing logic. Also ensure data residency and retention policies are respected when storing documents and model training data.

Finally, consider regulatory requirements: financial controls, GDPR (data subject access and right to be forgotten), or sectoral rules will affect what you store and for how long. Design retention and deletion flows (Power Automate plus Dataverse retention policies) to satisfy those constraints. When in doubt, involve compliance early and bake compliance checks into the workflow design.


Best practices, pitfalls to avoid, and next steps

Best practices

 

  • Validate thresholds with real data. Don’t guess a “good” confidence cutoff — measure error rates at candidate thresholds.
  • Use per-field thresholds for multi-field documents so critical fields get human attention even when others are confident.
  • Keep the reviewer UX minimal — short, focused actions reduce review time and cost.

 

Common pitfalls

 

  • Treating confidence as absolute truth. Confidence helps but doesn’t replace business rules or secondary checks. Combine signals.
  • Ignoring edge cases. Rare document types or vendor formats will break models; detect and route them explicitly.
  • Not instrumenting the feedback loop. If reviewer corrections aren’t captured and used to retrain, your model will stagnate.

 

Next steps for teams

 

  • Run a small pilot on a representative subset of approvals (e.g., invoices from three vendors). Capture metrics: auto-approval rate, error rate on auto-approvals, average review time, and cost per approval. Use that data to tune thresholds and forecast ROI.
  • Automate monitoring and alerts for spikes in low-confidence items; such spikes often indicate data drift or upstream changes (new invoice formats, scanning changes). A minimal drift-alert sketch follows this list.
  • As your confidence and governance mature, evaluate intelligent stages (like Microsoft’s Intelligent Approvals), which can use LLMs and richer context to make explainable decisions in a controlled stage of the approval flow.
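
The drift-alert sketch referenced above, in minimal form; the low-confidence cutoff and spike factor are illustrative starting points to tune against your baseline:

```python
# Flag a spike in low-confidence items against a known baseline rate.
# The cutoff and spike factor are illustrative starting points.

def drift_alert(recent_confidences: list[float], baseline_low_rate: float,
                low_cutoff: float = 0.70, spike_factor: float = 1.5) -> bool:
    low_rate = (sum(1 for c in recent_confidences if c < low_cutoff)
                / len(recent_confidences))
    return low_rate > baseline_low_rate * spike_factor

this_week = [0.95, 0.91, 0.64, 0.55, 0.88, 0.61, 0.97, 0.58]
print(drift_alert(this_week, baseline_low_rate=0.15))  # True -> investigate upstream
```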

 


Closing — the practical payoff

Human-in-the-loop approvals with confidence scoring let teams get the best of both worlds: automation for routine, low-risk work and human judgment where it matters. The immediate payoffs are speed, reduced manual effort, and clearer audit trails. The long-term payoff — continual model improvement via feedback loops — reduces operational costs and increases trust in automation.

Natural next steps to build on this pattern include:

  • a concrete Power Automate flow JSON skeleton you can import (with the AI Builder action, condition checks, and adaptive card step),
  • a one-page training checklist for reviewers, and
  • a sample set of threshold experiments with a recommended A/B test plan to tune your cutoffs.

 

This article draws on Microsoft docs and community guidance, including Microsoft’s guidance on interpreting confidence scores and feedback loops, the Intelligent Approvals release notes, and community articles showing how to read and route on AI Builder confidence metadata.

