When AI Acts, Failure Becomes Systemic

Learning Series: Agentic Development: A New Way to Build Software

The hidden risks of agentic systems

We’ve spent years building systems where AI generates responses.
Now, we’re building systems where AI makes decisions and takes actions.

That shift sounds subtle — but it fundamentally changes how systems fail.

A wrong answer is recoverable.
A wrong action has already been executed.

From Generation to Execution

Traditional LLM applications are mostly stateless:

Input → Output → Done

Agentic systems introduce a loop, often based on the ReAct pattern:

Reason → Act → Observe → Repeat

Here, the model:

reasons about a goal
selects a tool (API, database, function)
executes it
observes the result
continues iterating

This creates a stateful, evolving system.

And that’s where new failure modes emerge.
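To make the loop concrete, here is a minimal Python sketch. The model is replaced by a hypothetical `fake_model` function with hard-coded decisions, and `lookup_price` is an invented tool. A real agent would prompt an LLM with the goal and history at that point. The detail worth noticing is that every observation is appended to `history`, which is exactly what makes the system stateful.

```python
from typing import Callable

# Hypothetical tool registry: plain functions the agent may call by name.
TOOLS: dict[str, Callable[[str], str]] = {
    "lookup_price": lambda item: {"apple": "3", "pear": "5"}.get(item, "unknown"),
}

def fake_model(goal: str, history: list[tuple[str, str, str]]) -> tuple[str, str]:
    """Stand-in for an LLM that returns (action, argument).
    A real agent would send the goal and history to a model here."""
    if not history:
        return ("lookup_price", "apple")          # reason: need data first
    _, _, observation = history[-1]
    return ("finish", f"The price is {observation}")  # reason: enough information

def run_agent(goal: str, max_steps: int = 5) -> str:
    history: list[tuple[str, str, str]] = []      # (action, argument, observation)
    for _ in range(max_steps):
        action, arg = fake_model(goal, history)   # Reason
        if action == "finish":
            return arg
        observation = TOOLS[action](arg)          # Act
        history.append((action, arg, observation))  # Observe: result becomes context
    return "stopped: step limit reached"

print(run_agent("What does an apple cost?"))  # The price is 3
```

Because the next decision is computed from `history`, anything wrong in an earlier observation flows directly into later reasoning.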

Failure Mode 1: Compounding Hallucinations

LLMs don’t verify truth; they generate the most likely continuation of their context.

In isolation, hallucination is manageable.
In a loop, it becomes dangerous.

A typical chain looks like:

Incorrect reasoning
→ Incorrect tool selection
→ Incorrect output
→ Stored as context
→ Used in the next reasoning step

The result is error propagation across iterations: each mistake becomes an input to the next step.

The system doesn’t just fail once — it drifts over time.
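A rough way to see why drift matters: under a deliberately simplified model where each step is independently correct with probability p, and an error, once stored as context, is never corrected, an n-step chain is fully correct with probability p^n. The numbers are illustrative, not measurements of any real agent.

```python
def chain_success_probability(p_step: float, n_steps: int) -> float:
    """Probability that every step of an n-step chain is correct, assuming each
    step is independently correct with probability p_step and no step ever
    repairs an earlier error (a simplified model, not an empirical one)."""
    return p_step ** n_steps

for n in (1, 5, 10, 20):
    print(n, round(chain_success_probability(0.95, n), 3))
```

With p = 0.95, a step that looks reliable in isolation yields a fully correct chain only about 77% of the time after 5 steps, 60% after 10, and 36% after 20.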

Failure Mode 2: Tool-Calling Errors

Modern agents rely on function/tool calling frameworks:

APIs
database queries
external services

Tool selection is not deterministic — it’s inferred by the model.

This introduces:

Wrong tool selection
Incorrect parameter generation
Misinterpretation of tool responses

For example:

A retrieval tool is selected when computation is needed
A query is formed with incorrect filters
A response is treated as “complete” when it is partial

The agent is not executing logic — it is predicting which logic to execute.
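One common mitigation is to treat a model-proposed tool call as untrusted input and check it against a declared schema before running it. The sketch below assumes a hypothetical registry (`search_orders`, `compute_total`) and checks only tool names, parameter names, and parameter types.

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class ToolSpec:
    name: str
    params: dict[str, type]  # expected parameter names and their types

# Hypothetical registry of tools the agent is allowed to call.
REGISTRY = {
    "search_orders": ToolSpec("search_orders", {"customer_id": str, "limit": int}),
    "compute_total": ToolSpec("compute_total", {"order_ids": list}),
}

def validate_tool_call(name: str, args: dict[str, Any]) -> list[str]:
    """Return the problems with a model-proposed call; an empty list means valid."""
    spec = REGISTRY.get(name)
    if spec is None:
        return [f"unknown tool: {name}"]
    problems = []
    for param, expected in spec.params.items():
        if param not in args:
            problems.append(f"missing parameter: {param}")
        elif not isinstance(args[param], expected):
            problems.append(f"{param} should be {expected.__name__}")
    # Parameters the model invented that the tool does not declare.
    for extra in sorted(args.keys() - spec.params.keys()):
        problems.append(f"unexpected parameter: {extra}")
    return problems

print(validate_tool_call("search_orders", {"customer_id": 42}))
# ['customer_id should be str', 'missing parameter: limit']
```

A check like this catches an unknown tool or malformed parameters. It does not catch a well-formed call with the wrong filters, or a partial response read as complete; those need domain-level checks.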

Failure Mode 3: Lack of Termination (Unbounded Loops)

ReAct-style systems require:

clear goals
explicit stopping criteria

Without them, agents may:

retry the same action repeatedly
re-evaluate identical context
continue reasoning without convergence

This is often caused by:

vague objectives
weak feedback signals
absence of loop constraints (step limits, timeouts)

The system appears active — but produces no meaningful progress.
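The loop constraints listed above fit in a few lines. This sketch assumes the agent’s policy is a function from the action history to the next action name (`next_action`, a hypothetical interface). The loop stops on an explicit `"done"`, a step limit, a wall-clock timeout, or an action proposed more than `max_repeats` times.

```python
import time
from typing import Callable

def run_bounded(
    next_action: Callable[[list[str]], str],
    max_steps: int = 10,
    timeout_s: float = 30.0,
    max_repeats: int = 2,
) -> tuple[str, list[str]]:
    """Run an agent loop that stops on 'done', a step limit, a timeout,
    or when the same action is proposed more than max_repeats times."""
    history: list[str] = []
    deadline = time.monotonic() + timeout_s  # monotonic clock is safe for timeouts
    for _ in range(max_steps):
        if time.monotonic() > deadline:
            return "timeout", history
        action = next_action(history)
        if action == "done":
            return "done", history
        if history.count(action) >= max_repeats:
            return "repeated action", history
        history.append(action)
    return "step limit", history

def stuck_policy(history: list[str]) -> str:
    return "fetch_report"  # keeps retrying the same call and never converges

print(run_bounded(stuck_policy))
# ('repeated action', ['fetch_report', 'fetch_report'])
```

Returning the stop reason alongside the history makes a non-converging run visible instead of silently burning steps.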

Failure Mode 4: Goal Misalignment

Agents optimize for interpreted goals, not actual intent.

A prompt like:

“Detect suspicious activity”

is underspecified.

The agent must internally define:

what counts as “suspicious”
thresholds for triggering
acceptable uncertainty

This leads to:

over-sensitive behavior (false positives)
under-sensitive behavior (missed events)

In practice, this is a form of:

specification ambiguity → behavioral divergence
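One way to reduce this ambiguity is to move those decisions out of the prompt and into an explicit, reviewable specification that the agent applies. The fields and thresholds below are invented for illustration. The point is that “suspicious,” the trigger threshold, and the acceptable uncertainty are written down rather than inferred.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SuspicionSpec:
    """Hypothetical explicit definition of 'suspicious' for a login-monitoring agent."""
    max_failed_logins: int = 5            # more than this within the window is suspicious
    window_minutes: int = 10              # documents the window; counting happens upstream
    min_confidence_to_alert: float = 0.8  # below this, escalate to a human instead

def classify(failed_logins: int, confidence: float, spec: SuspicionSpec) -> str:
    """failed_logins is assumed to be already counted within spec.window_minutes."""
    if failed_logins <= spec.max_failed_logins:
        return "normal"
    if confidence >= spec.min_confidence_to_alert:
        return "alert"
    return "needs_review"  # uncertain cases go to a person, not to an automatic action

spec = SuspicionSpec()
print(classify(7, 0.9, spec), classify(7, 0.5, spec), classify(2, 0.99, spec))
# alert needs_review normal
```

Tuning false positives against missed events then becomes a visible change to the specification rather than a side effect of prompt wording.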

Failure Mode 5: Over-Autonomy Without Guardrails

The real risk of agentic systems is not intelligence — it’s autonomy with side effects.

Agents are increasingly allowed to:

trigger workflows
modify system state
interact with external systems

Without constraints on those actions, small errors can escalate into:

incorrect state updates
corrupted pipelines
unintended actions across services

This is especially critical in:

automation systems
monitoring pipelines
decision-triggered workflows

An incorrect action is no longer local; it can propagate across the whole system.
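A common guardrail is a gate between the agent and anything with side effects. Read-only actions pass, state-changing actions need approval, and unknown actions are denied by default. The action names and the `approve` callback are hypothetical; in practice, approval might come from a human reviewer or a policy engine.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical classification of actions by whether they change system state.
READ_ONLY = {"get_status", "list_jobs"}
SIDE_EFFECTING = {"restart_service", "update_record"}

@dataclass
class Gate:
    approve: Callable[[str, dict], bool]  # e.g. a human-in-the-loop check
    audit: list[str] = field(default_factory=list)

    def allow(self, action: str, args: dict) -> bool:
        if action in READ_ONLY:
            decision = True
        elif action in SIDE_EFFECTING:
            decision = self.approve(action, args)
        else:
            decision = False  # deny anything not explicitly listed
        self.audit.append(f"{action} {args} -> {'allowed' if decision else 'blocked'}")
        return decision

gate = Gate(approve=lambda action, args: False)  # approval withheld in this example
print(gate.allow("get_status", {}), gate.allow("restart_service", {"name": "api"}))
# True False
```

Denying by default means a newly hallucinated tool name is blocked and logged rather than executed.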

The Core Engineering Challenge

Agentic systems combine three difficult properties:

Non-determinism (LLMs generate probabilistic outputs)
Statefulness (each step depends on previous steps)
Autonomy (the system decides what to do next)

Individually, each is manageable.
Together, they create systems that are harder to predict, test, and debug.

You’re no longer debugging functions.
You’re debugging decision-making processes over time.

Designing for Failure (Not Just Success)

Agentic systems require a different engineering mindset.

Instead of asking:

“What should this system do?”

You need to ask:

“What happens when it does the wrong thing?”

Practical safeguards include:

Tool constraints (limit what actions can be taken)
Structured outputs (reduce ambiguity in responses)
Step limits & timeouts (prevent infinite loops)
Verification layers (validate outputs before execution)
Reasoning logs (trace decisions, not just results)
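Two of these safeguards, structured outputs and reasoning logs, fit naturally together. The sketch below assumes the model has been asked to reply in a JSON shape with `reasoning`, `tool`, and `args` fields (a hypothetical format, not any particular provider’s API). Malformed replies are rejected rather than guessed at, and each accepted decision is appended to a trace together with its stated reasoning.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Decision:
    step: int
    reasoning: str
    tool: str
    args: dict

def parse_decision(raw: str, step: int) -> Decision:
    """Parse a model reply requested as JSON. Raises ValueError on malformed input
    instead of guessing (json.JSONDecodeError is a subclass of ValueError)."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in ("reasoning", "tool", "args") if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if not isinstance(data["args"], dict):
        raise ValueError("args must be a JSON object")
    return Decision(step, str(data["reasoning"]), str(data["tool"]), data["args"])

# Reasoning log: records why each action was chosen, not only what happened.
trace: list[dict] = []

raw_reply = '{"reasoning": "need current status first", "tool": "get_status", "args": {}}'
decision = parse_decision(raw_reply, step=1)
trace.append(asdict(decision))
print(json.dumps(trace))
```

The parsed `Decision` is also the natural place to hook in a verification layer, such as a schema or permission check, before anything executes.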

Final Thought

The biggest shift isn’t that AI can reason.

It’s that we’re allowing it to:

select actions and execute them inside real systems.

And that changes the cost of failure.

In agentic systems, failure is no longer a bad response — it is a decision carried out in your architecture.

If you’re building with agents, the real question isn’t:

“How capable is this system?”

It’s:

“How does it behave when it’s wrong?”