Stop Calling Workflows ‘Agents’ – A Guide To Real Agentic AI

By Jake Jones, Flank.

Legal tech has a new addiction: slapping ‘agentic’ on anything with an LLM and a few integrations. It’s sloppy, it confuses buyers, and it slows the industry down. If your product can’t run unattended, can’t re-plan when the world pushes back, and requires a bespoke UI to babysit every click, then it’s not an agent. It’s software with delusions of grandeur.

This piece draws a bright line between genuine agentic systems and ‘workflow theatre’ dressed up as autonomy. Expect some toes to be stepped on.

A simple definition:

‘Agentic AI is a system that can pursue goals autonomously within constraints’.

Concretely, that means it can:

1. Hold a goal (e.g., ‘execute this NDA within policy’).

2. Form and revise a plan over multiple steps.

3. Choose and compose tools (email, e-signature, CLM, CRM, calendars, knowledge bases) without you telling it which to use, when, or how.

4. Act in external systems, observe results, and adapt when reality deviates.

5. Handle obstacles (OOO replies, missing fields, blocked permissions) by re-planning, escalating, or negotiating alternatives.

6. Operate via existing channels (email, Slack, Teams) rather than requiring you to live inside a new interface.

7. Respect policy and risk tolerances via a rules/policy engine and auditable logs.

8. Finish the job (or stop safely) without human micro-orchestration.

If any of those are missing, we’re not in ‘agent’ territory.
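As a rough illustration of points 2, 4, 5 and 8, here is a minimal sketch of a plan, act, observe, re-plan loop. Every name in it (`Outcome`, `AgentRun`, `run_agent`, the step names) is a hypothetical stand-in, and the "tools" are toy functions rather than real integrations; a production planner would be far richer.

```python
from dataclasses import dataclass, field

# Hypothetical names throughout: Outcome, AgentRun, run_agent and the step
# names are illustrations, not any real product's API.

@dataclass
class Outcome:
    ok: bool
    detail: str = ""

@dataclass
class AgentRun:
    goal: str
    status: str = "running"
    log: list = field(default_factory=list)  # (step, ok, detail) tuples

def run_agent(goal, plan, tools, replan, max_steps=10):
    """Act, observe, and re-plan until the plan is empty or a ceiling is hit."""
    run = AgentRun(goal=goal)
    queue = list(plan)
    steps_taken = 0
    while queue:
        if steps_taken >= max_steps:
            run.status = "stopped_safely"   # stop condition, not a crash
            return run
        step = queue.pop(0)
        steps_taken += 1
        outcome = tools[step]()             # act in an external system
        run.log.append((step, outcome.ok, outcome.detail))  # observe
        if not outcome.ok:
            alternative = replan(step, outcome)  # re-plan around the blocker
            if alternative is None:
                run.status = "escalated"    # hand over with the log as context
                return run
            queue = list(alternative) + queue
    run.status = "completed"
    return run

# Toy tools: the first signature request bounces off an out-of-office reply.
tools = {
    "draft_nda": lambda: Outcome(True),
    "send_for_signature": lambda: Outcome(False, "signer is out of office"),
    "send_to_delegate": lambda: Outcome(True),
    "update_clm": lambda: Outcome(True),
}

def replan(step, outcome):
    # Only one recovery path is known in this toy example.
    return ["send_to_delegate"] if step == "send_for_signature" else None

run = run_agent("execute this NDA within policy",
                ["draft_nda", "send_for_signature", "update_clm"], tools, replan)
```

The path is not fixed in advance: the delegate step only appears because the loop observed a blocker and revised its plan. An if/then workflow would need that branch pre-built.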

What agentic AI is not:

Not an if/then workflow. Pre-baked branches are brittle. Agents plan, act, observe, and re-plan.

Not a generative UI with tool buttons. A pretty panel of integrations you have to click through is still software.

Not a ‘copilot’ that drafts suggestions you must accept step-by-step. That’s assistive AI, not autonomy.

Not dependent on a dedicated interface. Real agents meet you where you already work.

Not a synonym for ‘we wrote lots of integrations.’ Integration count ≠ autonomy.

Vendor bingo: the most common ‘fake agent’ patterns:

1. Workflow Wrappers

A rigid business process flow with LLM prompts glued in. Impressive demo; collapses the first time Finance changes a form.

2. Integration Theatre

‘We’re agentic, we integrate with 47 tools.’ The system still needs you to select Tool X, step 3, option B. That’s a remote control, not an agent.

3. Wizard Cosplay

A five-step UI that asks you everything the agent should infer. If the human must drive the path, it’s not autonomous.

4. Play-Acting Copilots

Drafts clauses and comments, but can’t chase signatures, update the tracker, or re-route around an OOO. That’s assistive drafting.

5. LLM-as-Form-Filler

Auto-completes fields in your CLM but can’t negotiate timelines, chase counterparties, or book a call when stuck.

If you recognise your product in any of these, stop calling it ‘agentic.’ Sell it proudly as assistive AI or automated workflow. Both are useful; neither is an agent.

The autonomy ladder (use this with your buyers)

Level 0 – Automated Workflow: Deterministic sequences. Reliable, brittle, cheap.

Level 1 – Assistive AI: Drafts, classifies, extracts. Human drives the process.

Level 2 – Supervised Agent: Plans and acts across tools; human approves key steps or exceptions.

Level 3 – Constrained Autonomy: Operates unattended within policy and risk bounds; escalates only on edge cases.

Most ‘agentic’ legal tech on the market is Level 1 masquerading as Level 3.
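To make the ladder easier to apply in a buyer conversation, it can be written down as a small classifier. This is only a sketch under stated assumptions: the three yes/no questions in `ProductProfile` are my own simplification of the level descriptions above, and all names are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum

class AutonomyLevel(IntEnum):
    AUTOMATED_WORKFLOW = 0    # deterministic sequences
    ASSISTIVE_AI = 1          # drafts, classifies, extracts; human drives
    SUPERVISED_AGENT = 2      # plans and acts; human approves key steps
    CONSTRAINED_AUTONOMY = 3  # unattended within policy and risk bounds

@dataclass(frozen=True)
class ProductProfile:
    # Hypothetical yes/no questions a buyer might ask of a vendor.
    uses_ai: bool
    plans_and_acts_across_tools: bool
    runs_unattended_within_policy: bool

def classify(profile):
    """Return the highest rung whose requirements are all met."""
    if not profile.uses_ai:
        return AutonomyLevel.AUTOMATED_WORKFLOW
    if not profile.plans_and_acts_across_tools:
        return AutonomyLevel.ASSISTIVE_AI
    if not profile.runs_unattended_within_policy:
        return AutonomyLevel.SUPERVISED_AGENT
    return AutonomyLevel.CONSTRAINED_AUTONOMY

copilot = ProductProfile(uses_ai=True,
                         plans_and_acts_across_tools=False,
                         runs_unattended_within_policy=False)
level = classify(copilot)  # AutonomyLevel.ASSISTIVE_AI
```

The checks are ordered so that a product cannot skip a rung: claiming unattended operation without planning across tools still lands at Level 1.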

The legal-grade minimum bar for an agent

To claim ‘agent’, you should meet all of the following:

Goal & Plan Loop: An explicit planner that updates its plan based on outcomes, not prompts alone.

Tool Autonomy: Dynamic selection/composition of tools (including fallback paths).

Obstacle Recovery: Detects blockers (OOO, permission denied, missing data), tries alternatives, and escalates with context.

Policy Guardrails: Hard constraints (approval thresholds, clause libraries, data handling rules) enforced at run-time.

Auditability: Complete action log (who/what/when/why), reproducible inputs/outputs, and deterministic policy checks.

Channel-Native Operation: Works over email/Slack/Teams; no bespoke UI dependency.

Stop Conditions: Risk triggers, timeouts, and retry ceilings to avoid runaway behaviour.

If your system ticks these boxes only with a human clicking ‘next’, it’s not agentic.
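Two of the items above, Policy Guardrails and Stop Conditions, lend themselves to a short sketch. The code below assumes a hypothetical `Policy` with a single approval threshold and an allow-list of actions, and shows a retry ceiling only; timeouts and risk triggers would sit alongside it. None of these names come from a real product.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    contract_value: int  # hypothetical field used for the threshold check

@dataclass(frozen=True)
class Policy:
    approval_threshold: int
    allowed_actions: frozenset

    def check(self, action):
        """Deterministic check: the same action always gets the same answer."""
        if action.name not in self.allowed_actions:
            return False, f"action '{action.name}' is not permitted"
        if action.contract_value > self.approval_threshold:
            return False, "value exceeds approval threshold"
        return True, "within policy"

class StopCondition(Exception):
    """Raised when the agent must stop and hand over to a human."""

def execute_with_guardrails(action, policy, do, audit, max_retries=3):
    """Check policy at run-time, then try the action up to max_retries times."""
    allowed, reason = policy.check(action)
    audit.append((action.name, "policy_check", reason))
    if not allowed:
        raise StopCondition(reason)
    for attempt in range(1, max_retries + 1):
        if do(action):
            audit.append((action.name, "done", f"attempt {attempt}"))
            return attempt
        audit.append((action.name, "retry", f"attempt {attempt} failed"))
    raise StopCondition(f"retry ceiling of {max_retries} reached")

policy = Policy(approval_threshold=50_000,
                allowed_actions=frozenset({"send_nda"}))
audit = []
flaky = iter([False, True])  # first call fails, second succeeds
used = execute_with_guardrails(Action("send_nda", 10_000), policy,
                               lambda a: next(flaky), audit)
```

The policy check runs before the action, not after, and every decision lands in the audit list whether or not the action goes ahead.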

A concrete example: NDA to signature without babysitting

Goal: Execute a low-risk NDA within policy.

A real agent will:

1. Parse intake from email/Slack, classify counterparty risk, pick the correct template.

2. Draft the NDA, apply house positions, log rationale.

3. Send via e-signature; if signer is OOO, re-route to delegate, propose a call, or reschedule.

4. Detect non-standard edits; auto-negotiate within authority, escalate only above thresholds.

5. Update the CLM, CRM, matter tracker; notify stakeholders in their channels.

6. Close the loop with an audit trail and evidence pack.

A ‘workflow wrapper’ will: generate a draft, open a UI, and wait for you to do the rest.

Underutilising agents just to wear the badge

Another flavour of malpractice: products that throttle autonomy so marketing can say ‘agentic’ without doing the hard work.

Forcing approvals on every microscopic step ‘for control’. At that point you’ve turned an agent into a checklist.

Banning tool selection (hard-coding the e-signature vendor and calendar logic) so the ‘agent’ can never re-plan.

Hiding behind ‘compliance’ to avoid building guardrails, then blaming regulators for lack of autonomy.

If you’re doing this, you’re not safeguarding; you’re ducking engineering.

How buyers should evaluate ‘agentic’ claims

Ask for these four metrics on a representative cohort of matters:

1. Unattended Completion Rate (UCR): % of tasks fully completed with no human actions.

2. Obstacle Recovery Rate (ORR): % of blockers resolved without human help.

3. Mean Time to Human (MTTH): Average runtime before first required human intervention.

4. Policy Breach Rate (PBR): Incidents per 1,000 runs where the agent attempted an out-of-policy action (should be near zero).

Then run a black-box test: give the system a mailbox, a CLM, an e-signature tool, your policies, and a real inbox full of edge cases, and let it run for a week. No vendor-operated demo rail. Watch what survives.
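The four metrics above can be computed from a simple per-run record. The sketch below is one possible reading of the definitions, not a standard: the `RunRecord` fields are hypothetical, and MTTH is averaged only over runs that actually needed a human, which is an assumption on my part.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class RunRecord:
    # Hypothetical fields a vendor's run log would need to expose.
    completed: bool
    human_actions: int
    blockers_hit: int
    blockers_self_resolved: int
    minutes_to_first_human: Optional[float]  # None if no human was needed
    policy_breach_attempts: int

def evaluate(runs):
    """Compute UCR, ORR, MTTH and PBR for a cohort of runs."""
    n = len(runs)
    if n == 0:
        raise ValueError("need at least one run")
    unattended = sum(1 for r in runs if r.completed and r.human_actions == 0)
    blockers = sum(r.blockers_hit for r in runs)
    resolved = sum(r.blockers_self_resolved for r in runs)
    waits = [r.minutes_to_first_human for r in runs
             if r.minutes_to_first_human is not None]
    breaches = sum(r.policy_breach_attempts for r in runs)
    return {
        "UCR": unattended / n,
        "ORR": resolved / blockers if blockers else None,
        "MTTH_minutes": sum(waits) / len(waits) if waits else None,
        "PBR_per_1000": 1000 * breaches / n,
    }

# Made-up cohort of four runs, purely to exercise the arithmetic.
cohort = [
    RunRecord(True, 0, 1, 1, None, 0),
    RunRecord(True, 0, 0, 0, None, 0),
    RunRecord(True, 2, 2, 1, 30.0, 0),
    RunRecord(False, 1, 1, 0, 90.0, 1),
]
metrics = evaluate(cohort)
```

For this invented cohort, two of four runs finished unattended (UCR 0.5), two of four blockers were self-resolved (ORR 0.5), the two runs that needed a human waited 30 and 90 minutes (MTTH 60), and one breach attempt in four runs scales to 250 per 1,000.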

Architecture matters (and it’s different)

Agentic systems aren’t ‘CRUD-plus-LLM.’ They have different bones:

• Planner/Controller: Maintains goals, decomposes tasks, re-plans on feedback.

• Memory & State: Case state + episodic memory for long-running matters.

• Policy Engine: Compile-time and run-time constraints; authority thresholds; safe-action filters.

• Toolbox & Router: Tool schemas, affordances, adapter discovery, and fallbacks.

• Monitors: Execution watchdogs, anomaly detectors, stop conditions.

• Event Bus: Asynchronous, event-driven loops, not request/response forms.

• Audit Layer: Immutable logs, artefact storage, replay.

If your ‘agent’ is a prompt template calling a few APIs, it will crumble the moment reality deviates.
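As one example of how these components differ from a prompt template, here is a minimal sketch of an Audit Layer. It uses a hash chain, which is my assumption about one common way to make a log tamper-evident; the article itself only asks for immutable logs and replay. `AuditLog` and its fields are hypothetical, and a real system would also need durable, access-controlled storage.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry

def _digest(body):
    # Stable serialisation so the same entry always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()

class AuditLog:
    """Append-only log where each entry commits to the one before it."""

    def __init__(self):
        self.entries = []

    def append(self, who, what, when, why):
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"who": who, "what": what, "when": when, "why": why, "prev": prev}
        self.entries.append({**body, "hash": _digest(body)})

    def verify(self):
        """Return True only if no entry was altered, removed or reordered."""
        prev = GENESIS
        for entry in self.entries:
            body = {k: entry[k] for k in ("who", "what", "when", "why", "prev")}
            if entry["prev"] != prev or entry["hash"] != _digest(body):
                return False
            prev = entry["hash"]
        return True

log = AuditLog()
log.append("agent", "sent NDA for signature", "2025-01-01T09:00:00Z",
           "template matched low-risk counterparty")
log.append("agent", "re-routed to delegate", "2025-01-01T09:05:00Z",
           "signer out of office")
intact = log.verify()  # True while nothing has been edited
```

Editing any earlier entry changes its hash and breaks the chain for everything after it, so `verify()` returns False. That makes tampering detectable, though not impossible.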

The OOO email, revisited

Agents don’t fall over when they hit an obstacle (like an OOO email when seeking approval on a contract).

A real agent will infer the delay impact, check the authority map, contact a delegate, propose alternate timelines, or escalate with a risk-aware summary… without you holding its hand!
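A stripped-down sketch of that decision might look like the following. The keyword matching is deliberately naive, the authority map is a plain dictionary with made-up addresses, and every name is hypothetical; the point is only the shape of the recovery logic.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical authority map: who may approve when the approver is away.
AUTHORITY_MAP = {"gc@example.com": "deputy-gc@example.com"}

# Naive markers; a real system would need something more robust.
OOO_MARKERS = ("out of office", "automatic reply", "on leave")

def is_out_of_office(subject, body):
    text = f"{subject} {body}".lower()
    return any(marker in text for marker in OOO_MARKERS)

@dataclass(frozen=True)
class NextStep:
    kind: str               # "proceed", "contact_delegate" or "escalate"
    target: Optional[str]   # who to contact next, if anyone
    note: str               # context to carry into the audit trail

def handle_reply(approver, subject, body, authority_map):
    """Decide what to do after an approval request gets a reply."""
    if not is_out_of_office(subject, body):
        return NextStep("proceed", approver, "approver replied; continue the plan")
    delegate = authority_map.get(approver)
    if delegate:
        return NextStep("contact_delegate", delegate,
                        f"{approver} is out of office; delegate has authority")
    return NextStep("escalate", None,
                    f"{approver} is out of office and no delegate is mapped")

step = handle_reply("gc@example.com", "Automatic reply: NDA approval",
                    "I am out of office until Monday.", AUTHORITY_MAP)
```

Escalation is the last resort here, and it carries a note explaining why, so the human who picks it up does not start from zero.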

The interface myth

Agents don’t need a dedicated interface. If your system only ‘works’ inside your proprietary UI, it’s not an agent; it’s a product demanding user behaviour change. Agents should hum along over email/Slack/Teams and touch your CLM/CRM quietly in the background.

The naming problem (and why it matters)

‘Agent’ isn’t just another name for a generative AI application. Language shapes budgets. When vendors blur ‘assistive’, ‘automated’, and ‘agentic’, legal teams buy the wrong thing, measure the wrong outcomes, and conclude ‘AI can’t do that.’ It can, but only if we build the right class of system and deploy it in the right risk bands.

A workable way forward

• Be honest about the level. If you’re Level 1 or Level 2, say so. There’s huge value in copilots and smart workflows.

• Pick bounded domains. Start with high-volume, low-risk matters (NDAs, routine vendor onboarding, standard DPAs).

• Engineer guardrails properly. Policy engines, safe tool schemas, monitors. Not just ‘human-in-the-loop everywhere.’

• Publish the metrics. UCR, ORR, MTTH, PBR. If you can’t, you’re not ready to say ‘agent’.

• Meet users where they are. Channels first; dashboards later.

The paradigm shift, plainly

The emerging industry is not ‘digital software with AI inside.’ It’s intelligent, autonomous systems that act across your stack to achieve outcomes. Different components, different constraints, different responsibilities. We don’t ‘use’ them so much as task them, constrain them, and audit them.

Stop rebranding workflows. Build agents, or sell what you have, proudly, as what it is.

—

About the author: Jake Jones is the co-founder of Flank, a legal tech company that develops agents for legal teams that can autonomously handle routine tasks.

This is an educational think piece kindly written for Artificial Lawyer after this site became increasingly aware that some of the ‘agents’ currently being sold in the legal tech market are not real agents at all. Hence the need to understand more about this subject. AL therefore asked Jake, who has been working in this niche area for some years, to help clear up the matter and set out some clear definitions.

As noted in a previous AL article, if you’re planning on marketing a new product or feature, please consider first whether it actually displays agentic characteristics before describing it as such.
