What does human-in-the-loop mean for an AI agent?

It means a person reviews and approves a specific high-risk action before the agent executes it. For agents with real tools — email, deploys, payments, databases — a useful human-in-the-loop step pauses the exact action, shows a person what will run, and continues only on an explicit approval that is tied to that action.

Is a Slack confirmation button enough for human-in-the-loop?

A button records that someone clicked, but the click is not bound to what actually runs. If the agent changes the recipient, amount, or command after the click, nothing detects it, and prompt injection can talk an agent past a soft confirmation. Binding the approval to a hash of the exact payload closes the drift gap: a payload that changes after approval no longer matches what was signed.

How do you make an approval bound to the exact action?

Cosignet uses a WebAuthn passkey signature over a challenge of the form nonce ‖ SHA-256(payload). The approver signs the exact action with a device passkey. If any field of the action changes afterward, the signature no longer matches, so the approval is evidence about one specific operation.


Human-in-the-loop for AI agents: a practical guide

Once an AI agent can send email, deploy to production, or move money, the dangerous part stops being what it generates and becomes what it executes. Human-in-the-loop is how you keep a person on the irreversible steps — but only if the approval is bound to the action, not just a click.

What human-in-the-loop actually means

“Human-in-the-loop” (HITL) means a person reviews and approves a step before the system commits to it. For an LLM that only writes text, the loop is editorial. For an agent with tools — GitHub, billing, cloud APIs, a database — the loop has to sit in front of the real side effect: the deploy, the transfer, the deletion, the secret rotation, the admin command.

The useful version of HITL has three properties:

It pauses the exact action, not a vague summary. The approver sees the real recipient, amount, endpoint, or command.
It fails closed. No approval, a timeout, or a denial means the action does not run.
It produces evidence. Afterward you can show who approved what, and prove the action that ran is the action that was approved.

Why a confirmation click is not a decision

The common first attempt is a Slack message with an Approve / Reject button. It feels like human-in-the-loop, but it records only that someone clicked a button. The click is not cryptographically tied to what the agent actually executes. Two failure modes follow:

Drift between approval and execution. The agent can change the recipient, amount, endpoint, or command between the click and the call. Nothing detects the change.
Prompt injection. A malicious instruction hidden in a web page, email, or document can talk an agent into reframing a dangerous action as routine — and a soft “Are you sure?” is easy to walk past.

A click is a UI event. For high-risk actions you want a decision bound to the operation.

Binding the approval to the action

The fix is to make the approval inseparable from the exact action. Cosignet does this with a standard WebAuthn passkey signature over a challenge built from the action itself:

challenge = nonce ‖ SHA-256(payload)

The approver confirms with a device passkey — Face ID, Touch ID, Windows Hello, or a security key — and the signature covers that exact payload. If the agent changes any field afterward, the signature no longer matches the operation, so the approval is real evidence about one specific action rather than reassurance about an intent. Keys never leave the approver’s device, and user verification (biometric or PIN) is required. The details are in the security model.
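To make the binding concrete, here is a minimal TypeScript sketch of building and re-checking a challenge of that form. It is not Cosignet's implementation: the sorted-key JSON serialization in `canonicalize`, the 32-byte nonce, and the helper names `buildChallenge` and `challengeMatches` are assumptions made for illustration, and the real payload encoding is whatever the service defines.

```typescript
import { createHash, randomBytes, timingSafeEqual } from 'node:crypto';

// Assumption: payloads are plain JSON values, serialized with sorted object
// keys so the same logical action always produces the same bytes.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return '[' + value.map((v) => canonicalize(v)).join(',') + ']';
  }
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .filter((k) => obj[k] !== undefined)
      .sort()
      .map((k) => JSON.stringify(k) + ':' + canonicalize(obj[k]));
    return '{' + entries.join(',') + '}';
  }
  return JSON.stringify(value);
}

// challenge = nonce ‖ SHA-256(payload)
function buildChallenge(nonce: Buffer, payload: unknown): Buffer {
  const digest = createHash('sha256').update(canonicalize(payload), 'utf8').digest();
  return Buffer.concat([nonce, digest]);
}

// Re-derive the challenge from the payload about to run and compare it with
// the challenge that was signed. Any changed field changes the digest.
function challengeMatches(challenge: Buffer, nonce: Buffer, payload: unknown): boolean {
  const expected = buildChallenge(nonce, payload);
  return challenge.length === expected.length && timingSafeEqual(challenge, expected);
}

const nonce = randomBytes(32);
const approved = { service: 'api-gateway', env: 'production', commit: 'a1b2c3d' };
const challenge = buildChallenge(nonce, approved);
const drifted = { ...approved, commit: 'deadbee' };

console.log(challengeMatches(challenge, nonce, approved)); // true
console.log(challengeMatches(challenge, nonce, drifted)); // false
```

The nonce makes each challenge single-use, and the digest ties it to one payload. Checking the passkey signature over that challenge is a separate step handled by the WebAuthn verification itself.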

Where this helps — and where it doesn’t

Being honest about scope matters. Cosignet is an approval and evidence layer, not an executor or a policy engine. It does not run your action and it does not decide policy for you. It pauses a step, gets a real human signature bound to the payload, and hands you back a verifiable decision; you still execute.

It helps most when:

The action is high-risk and irreversible — deploys, fund transfers, data deletion, secret rotation, admin commands.
You need an audit trail that holds up later, not just a log line.
The caller is behind NAT or a firewall (a CLI, a CI job, a locked-down VPC) and can’t expose an inbound webhook.

It is not a replacement for least-privilege scoping, input validation, or sandboxing. Human approval is the last gate on the actions you deliberately choose to gate — use it together with those controls, not instead of them.

How to add a human step to an agent

Call Cosignet right before the risky tool runs. It exposes an MCP server for agents and a REST API for scripts, CI/CD, and backends. Both long-poll for the human decision over your own outbound connection, so there is no inbound port to open.

npm install @cosignet/sdk

import { Cosignet } from '@cosignet/sdk';

const cosignet = new Cosignet({ apiKey: process.env.COSIGNET_API_KEY });

// Before the agent runs the dangerous tool:
const decision = await cosignet.requestApproval({
  username: 'alex',
  action: 'Deploy api-gateway to production',
  payload: { service: 'api-gateway', env: 'production', commit: 'a1b2c3d' },
  notify: 'telegram_or_email',
});

if (decision.status === 'approved') {
  // proceed — decision.rawAssertion is the signed, payload-bound proof
} else {
  // 'rejected' | 'expired' | 'pending' (timed out) → do NOT run the action
}

The agent proceeds only on an explicit approved status. Anything else — a rejection, an expiry, or a timeout — fails closed. For a framework-specific walkthrough, see adding human approval to a LangChain agent.
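The fail-closed rule can be wrapped in a small helper so individual tool calls cannot forget it. The sketch below is independent of the SDK: `runIfApproved`, the `Decision` type, and the local `timeoutMs` guard are hypothetical names introduced here, and only the `status` field and the `'approved'` value come from the example above.

```typescript
// Hypothetical minimal shape; only `status` is taken from the example above.
type Decision = { status: string };

type GateResult<T> = { ran: true; result: T } | { ran: false; reason: string };

// Runs `action` only when `requestApproval` resolves with status 'approved'.
// Any other status, a thrown error, or a local timeout counts as a denial.
async function runIfApproved<T>(
  requestApproval: () => Promise<Decision>,
  action: () => Promise<T>,
  timeoutMs: number,
): Promise<GateResult<T>> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<Decision>((resolve) => {
    timer = setTimeout(() => resolve({ status: 'timeout' }), timeoutMs);
  });

  const decision = await Promise.race([
    Promise.resolve().then(requestApproval), // also catches synchronous throws
    timeout,
  ])
    .catch((): Decision => ({ status: 'error' }))
    .finally(() => clearTimeout(timer));

  if (decision.status !== 'approved') {
    return { ran: false, reason: decision.status };
  }
  return { ran: true, result: await action() };
}
```

A caller would pass a closure around its own approval request and another around the risky tool call. The design choice is that the helper compares against the single allowed value rather than enumerating denial states, so an unexpected status also blocks the action.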

Evidence you can verify without trusting the vendor

Every real approval is appended to a public, append-only transparency log (an RFC 6962 Merkle tree with Ed25519-signed tree heads). Anyone can independently recompute the leaf hash, check the inclusion proof, and verify the approver’s passkey signature with open, dependency-free verifiers, so the audit trail does not depend on trusting Cosignet. You can check a live example on the verify page.
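The inclusion-proof part of that check can be sketched in a few lines. This is a generic RFC 6962-style verifier, not Cosignet's published one: the leaf encoding, the function names, and the three-leaf example tree are assumptions for illustration, and verifying the Ed25519 signature on the tree head is left out.

```typescript
import { createHash } from 'node:crypto';

const sha256 = (...parts: Uint8Array[]): Buffer => {
  const h = createHash('sha256');
  for (const p of parts) h.update(p);
  return h.digest();
};

// RFC 6962 domain separation: 0x00 prefix for leaves, 0x01 for interior nodes.
const leafHash = (leaf: Uint8Array): Buffer => sha256(Buffer.from([0x00]), leaf);
const nodeHash = (left: Uint8Array, right: Uint8Array): Buffer =>
  sha256(Buffer.from([0x01]), left, right);

// Folds the audit path up from the leaf and compares the result with the root.
// Assumes treeSize fits in 32 bits.
function verifyInclusion(
  leafIndex: number,
  treeSize: number,
  leaf: Buffer, // already a leaf hash
  path: Buffer[],
  root: Buffer,
): boolean {
  if (leafIndex < 0 || leafIndex >= treeSize) return false;
  let fn = leafIndex;
  let sn = treeSize - 1;
  let r = leaf;
  for (const p of path) {
    if (sn === 0) return false; // path is longer than the tree is deep
    if ((fn & 1) === 1 || fn === sn) {
      r = nodeHash(p, r); // sibling is on the left
      while ((fn & 1) === 0 && fn !== 0) {
        fn >>>= 1;
        sn >>>= 1;
      }
    } else {
      r = nodeHash(r, p); // sibling is on the right
    }
    fn >>>= 1;
    sn >>>= 1;
  }
  return sn === 0 && r.equals(root);
}

// Three-leaf example tree: root = H(H(la, lb), lc).
const la = leafHash(Buffer.from('approval-0'));
const lb = leafHash(Buffer.from('approval-1'));
const lc = leafHash(Buffer.from('approval-2'));
const root = nodeHash(nodeHash(la, lb), lc);

console.log(verifyInclusion(2, 3, lc, [nodeHash(la, lb)], root)); // true
console.log(verifyInclusion(2, 3, leafHash(Buffer.from('tampered')), [nodeHash(la, lb)], root)); // false
```

Because the verifier needs only SHA-256 and the published proof, it can run anywhere without contacting the log operator, which is the property the paragraph above relies on.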

Takeaways

Human-in-the-loop for agents must sit in front of the exact side effect, fail closed, and leave evidence.
A confirmation click is a UI event; a payload-bound passkey signature is a decision about a specific action.
Use it as the last gate on actions you choose to gate — alongside least-privilege, validation, and sandboxing, not instead of them.
