Learning Series: Agentic Development: A New Way to Build Software

The hidden risks of agentic systems


We’ve spent years building systems where AI generates responses.
Now, we’re building systems where AI makes decisions and takes actions.

That shift sounds subtle — but it fundamentally changes how systems fail.

A wrong answer can be reviewed and discarded.
A wrong action has already been executed.

From Generation to Execution

Traditional LLM applications are mostly stateless:

  • Input → Output → Done

Agentic systems introduce a loop, often based on the ReAct pattern:

Reason → Act → Observe → Repeat

Here, the model:

  • reasons about a goal
  • selects a tool (API, database, function)
  • executes it
  • observes the result
  • continues iterating
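The loop above can be sketched in a few lines. This is only an illustration, not any framework's API: `TOOLS`, `fake_model`, and `run_agent` are hypothetical names, and the "model" is a hard-coded stand-in for a real LLM call.

```python
from typing import Callable

# Hypothetical tool registry: tool name -> plain Python function.
TOOLS: dict[str, Callable[[str], str]] = {
    "lookup_price": lambda item: {"apple": "0.50", "pear": "0.75"}.get(item, "unknown"),
}

def fake_model(history: list[str]) -> dict:
    """Stand-in for an LLM call: picks the next step from the history so far."""
    if not any(line.startswith("observation:") for line in history):
        return {"action": "lookup_price", "input": "apple"}
    return {"action": "finish", "input": history[-1].removeprefix("observation: ")}

def run_agent(goal: str, max_steps: int = 5) -> str:
    history = [f"goal: {goal}"]                        # state carried across iterations
    for _ in range(max_steps):
        step = fake_model(history)                     # Reason
        if step["action"] == "finish":
            return step["input"]
        result = TOOLS[step["action"]](step["input"])  # Act
        history.append(f"observation: {result}")       # Observe, then repeat
    return "stopped: step limit reached"

print(run_agent("price of an apple"))
```

Note that `history` is the only thing connecting one iteration to the next. Whatever lands in it, right or wrong, shapes every later step.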

This creates a stateful, evolving system.

And that’s where new failure modes emerge.

Failure Mode 1: Compounding Hallucinations

LLMs don’t verify truth; they predict the most likely next tokens.

In isolation, hallucination is manageable.
In a loop, it becomes dangerous.

A typical chain looks like:

Incorrect reasoning → Incorrect tool selection → Incorrect output → Stored as context → Used in the next reasoning step

This is error propagation across iterations: each mistake becomes input to the next step.

The system doesn’t just fail once — it drifts over time.
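A rough way to see the drift is simple arithmetic. The sketch below assumes every step is correct with the same probability and that steps fail independently. Both are simplifying assumptions (real agent steps are correlated), and the 95% figure is purely illustrative, not a measurement.

```python
def chain_success_probability(per_step_accuracy: float, steps: int) -> float:
    """Probability that every step in a chain is correct, assuming independent steps."""
    return per_step_accuracy ** steps

# 0.95 is an illustrative per-step accuracy, not a measured value.
for steps in (1, 5, 10, 20):
    p = chain_success_probability(0.95, steps)
    print(f"{steps:>2} steps at 95% per-step accuracy -> {p:.1%} chance of a clean run")
```

Under these assumptions, a step that looks reliable on its own gives roughly a 60% chance of a clean 10-step run and about 36% at 20 steps.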

Failure Mode 2: Tool-Calling Errors

Modern agents rely on function (tool) calling to reach:

  • APIs
  • database queries
  • external services

Tool selection is not deterministic — it’s inferred by the model.

This introduces:

  • Wrong tool selection
  • Incorrect parameter generation
  • Misinterpretation of tool responses

For example:

  • A retrieval tool is selected when computation is needed
  • A query is formed with incorrect filters
  • A response is treated as “complete” when it is partial

The agent is not executing logic — it is predicting which logic to execute.
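Because the call is a prediction, it helps to check it like untrusted input before running anything. A minimal sketch, assuming the model proposes calls as a dict with `tool` and `args` keys; `TOOL_SCHEMAS`, `validate_tool_call`, and the two tool names are hypothetical.

```python
from typing import Any

# Hypothetical tool schemas: tool name -> required argument names and types.
TOOL_SCHEMAS: dict[str, dict[str, type]] = {
    "search_orders": {"customer_id": int, "status": str},
    "sum_amounts": {"amounts": list},
}

def validate_tool_call(call: dict[str, Any]) -> list[str]:
    """Return the problems found in a model-proposed tool call (empty list if none)."""
    name = call.get("tool")
    if name not in TOOL_SCHEMAS:
        return [f"unknown tool: {name!r}"]
    schema = TOOL_SCHEMAS[name]
    args = call.get("args", {})
    problems = []
    for arg, expected in schema.items():
        if arg not in args:
            problems.append(f"missing argument: {arg}")
        elif not isinstance(args[arg], expected):
            problems.append(
                f"{arg} should be {expected.__name__}, got {type(args[arg]).__name__}"
            )
    for arg in args:
        if arg not in schema:
            problems.append(f"unexpected argument: {arg}")
    return problems

# The model passed the ID as a string and forgot the status filter.
print(validate_tool_call({"tool": "search_orders", "args": {"customer_id": "42"}}))
```

This catches malformed calls only. A well-formed call to the wrong tool still passes, so schema checks complement, rather than replace, checks on the result.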

Failure Mode 3: Lack of Termination (Unbounded Loops)

ReAct-style systems require:

  • clear goals
  • explicit stopping criteria

Without them, agents may:

  • retry the same action repeatedly
  • re-evaluate identical context
  • continue reasoning without convergence

This is often caused by:

  • vague objectives
  • weak feedback signals
  • absence of loop constraints (step limits, timeouts)

The system appears active — but produces no meaningful progress.
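Loop constraints are cheap to add. The sketch below shows one possible guard that stops on a step limit, a timeout, or the same action being proposed too many times. `LoopGuard` and its default limits are hypothetical and chosen only for illustration.

```python
import time
from typing import Optional

class LoopGuard:
    """Stops an agent loop on a step limit, a timeout, or a repeated identical action."""

    def __init__(self, max_steps: int = 10, timeout_s: float = 30.0, max_repeats: int = 2):
        self.max_steps = max_steps
        self.timeout_s = timeout_s
        self.max_repeats = max_repeats
        self.started = time.monotonic()
        self.steps = 0
        self.seen: dict[tuple[str, str], int] = {}

    def check(self, tool: str, args: str) -> Optional[str]:
        """Record one proposed action; return a stop reason, or None to continue."""
        self.steps += 1
        if self.steps > self.max_steps:
            return "step limit reached"
        if time.monotonic() - self.started > self.timeout_s:
            return "timeout"
        key = (tool, args)
        self.seen[key] = self.seen.get(key, 0) + 1
        if self.seen[key] > self.max_repeats:
            return f"repeated action: {tool}({args})"
        return None

# An agent that keeps proposing the same call is stopped on the third attempt.
guard = LoopGuard(max_steps=10, max_repeats=2)
for _ in range(5):
    reason = guard.check("fetch_status", "job=17")
    if reason:
        print(reason)
        break
```

The repeat check is what catches the "appears active" case: the agent is still producing steps, but they are the same step.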

Failure Mode 4: Goal Misalignment

Agents optimize for interpreted goals, not actual intent.

A prompt like:

“Detect suspicious activity”

is underspecified.

The agent must internally define:

  • what counts as “suspicious”
  • thresholds for triggering
  • acceptable uncertainty

This leads to:

  • over-sensitive behavior (false positives)
  • under-sensitive behavior (missed events)

In practice, this is a form of:

specification ambiguity → behavioral divergence
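One way to narrow the gap is to move the definition out of the prompt and into explicit, reviewable configuration. The sketch below uses login monitoring as a made-up example domain. `SuspiciousLoginPolicy`, `is_suspicious`, and every threshold are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SuspiciousLoginPolicy:
    """One explicit reading of "suspicious"; every value here is illustrative."""
    max_failed_logins_per_hour: int = 5
    flag_new_country: bool = True

def is_suspicious(failed_logins_last_hour: int, new_country: bool,
                  policy: SuspiciousLoginPolicy) -> bool:
    """Deterministic check the agent can call instead of improvising a definition."""
    if failed_logins_last_hour > policy.max_failed_logins_per_hour:
        return True
    return policy.flag_new_country and new_country

policy = SuspiciousLoginPolicy()
print(is_suspicious(failed_logins_last_hour=6, new_country=False, policy=policy))
print(is_suspicious(failed_logins_last_hour=2, new_country=False, policy=policy))
```

The thresholds may still be wrong, but they are now written down where someone can review and tune them, rather than re-inferred on every run.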

Failure Mode 5: Over-Autonomy Without Guardrails

The real risk of agentic systems is not intelligence — it’s autonomy with side effects.

Agents are often allowed to:

  • trigger workflows
  • modify system state
  • interact with external systems

Without constraints, small errors can escalate into:

  • incorrect state updates
  • corrupted pipelines
  • unintended actions across services

This is especially critical in:

  • automation systems
  • monitoring pipelines
  • decision-triggered workflows

An incorrect action is no longer local — it becomes system-wide.
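A common way to contain this is to separate read-only tools from side-effecting ones and require approval for the latter. A minimal sketch: the tool names, `execute`, and the `approve` callback (a person or a policy check) are all hypothetical.

```python
from typing import Callable

READ_ONLY = {"get_status", "list_jobs"}              # safe to run automatically
SIDE_EFFECTING = {"restart_service", "delete_job"}   # change state, so need approval

def execute(tool: str, arg: str,
            run: Callable[[str, str], str],
            approve: Callable[[str, str], bool]) -> str:
    """Run a tool call only if allowlisted and, when it has side effects, approved."""
    if tool in READ_ONLY:
        return run(tool, arg)
    if tool in SIDE_EFFECTING:
        if approve(tool, arg):
            return run(tool, arg)
        return f"blocked: {tool} was not approved"
    return f"blocked: {tool} is not on the allowlist"

def fake_run(tool: str, arg: str) -> str:
    """Stand-in for real tool execution."""
    return f"ran {tool}({arg})"

def deny_all(tool: str, arg: str) -> bool:
    """Stand-in approver that rejects every side-effecting call."""
    return False

print(execute("get_status", "api", fake_run, deny_all))
print(execute("delete_job", "17", fake_run, deny_all))
print(execute("drop_table", "users", fake_run, deny_all))
```

The default is deny: a tool the agent invents, or one nobody classified, never runs.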

The Core Engineering Challenge

Agentic systems combine three difficult properties:

  • Non-determinism (LLMs generate probabilistic outputs)
  • Statefulness (each step depends on previous steps)
  • Autonomy (the system decides what to do next)

Individually, each is manageable.
Together, they create systems that are harder to predict, test, and debug.

You’re no longer debugging functions.
You’re debugging decision-making processes over time.

Designing for Failure (Not Just Success)

Agentic systems require a different engineering mindset.

Instead of asking:

“What should this system do?”

you need to ask:

“What happens when it does the wrong thing?”

Practical safeguards include:

  • Tool constraints (limit what actions can be taken)
  • Structured outputs (reduce ambiguity in responses)
  • Step limits & timeouts (prevent infinite loops)
  • Verification layers (validate outputs before execution)
  • Reasoning logs (trace decisions, not just results)
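Two of these safeguards, structured outputs and reasoning logs, fit together naturally. The sketch below assumes the model is asked to reply with a JSON object containing `thought`, `tool`, and `args`. That field layout, `Decision`, and `parse_decision` are hypothetical choices for illustration.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Decision:
    """Structured form of one agent step: what it decided and why."""
    step: int
    thought: str
    tool: str
    args: dict

def parse_decision(step: int, raw: str) -> Decision:
    """Parse model output as JSON; raise ValueError rather than guess on bad output."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"step {step}: output is not valid JSON") from exc
    if not isinstance(data, dict):
        raise ValueError(f"step {step}: expected a JSON object")
    missing = {"thought", "tool", "args"} - data.keys()
    if missing:
        raise ValueError(f"step {step}: missing fields {sorted(missing)}")
    return Decision(step, data["thought"], data["tool"], data["args"])

trace: list[str] = []   # one JSON line per decision, appended before execution
raw = '{"thought": "need current status", "tool": "get_status", "args": {"service": "api"}}'
decision = parse_decision(1, raw)
trace.append(json.dumps(asdict(decision)))
print(trace[0])
```

Logging the decision before it runs means the trace still explains what the agent intended even when the action itself fails.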

Final Thought

The biggest shift isn’t that AI can reason.

It’s that we’re allowing it to:

select actions and execute them inside real systems.

And that changes the cost of failure.

In agentic systems, failure is no longer a bad response — it is a decision carried out in your architecture.

If you’re building with agents, the real question isn’t:

“How capable is this system?”

It’s:

“How does it behave when it’s wrong?”

Hridya Syju