A customer support agent that can look up orders, check shipping status, and process refunds --- all from a single user message. The user writes "My order #1234 arrived damaged. Can I get a refund?" and the agent handles the entire workflow: verify the order, check eligibility, initiate the refund, and respond with a confirmation number.
Here is the complete agent:
// app/api/agent/route.ts
import { streamText, tool, stepCountIs, convertToModelMessages } from 'ai'
import { z } from 'zod'

export async function POST(request: Request) {
  const { messages } = await request.json()

  const result = streamText({
    model: 'openai/gpt-4o-mini',
    system: `You are a customer support agent. You can look up orders, check their status, and process refunds. Always verify the order exists before taking action. Be helpful and concise.`,
    messages: convertToModelMessages(messages),
    tools: {
      getOrderStatus: tool({
        description: 'Get the status of a customer order by order ID',
        inputSchema: z.object({
          orderId: z.string().describe('The order ID, e.g. "1234"')
        }),
        execute: async ({ orderId }) => {
          // In production, query your database
          return {
            orderId,
            status: 'delivered',
            deliveredAt: '2026-02-22',
            items: ['Wireless Headphones'],
            total: 79.99,
            refundEligible: true
          }
        }
      }),
      initiateRefund: tool({
        description: 'Initiate a refund for an order. Only use after confirming the order exists and is eligible.',
        inputSchema: z.object({
          orderId: z.string(),
          reason: z.string().describe('Reason for the refund')
        }),
        execute: async ({ orderId, reason }) => {
          // In production, call your payments API
          return {
            confirmationNumber: 'RF-5678',
            amount: 79.99,
            estimatedDays: 5
          }
        }
      })
    },
    // Cap the agent at five think/act steps (stopWhen replaces the
    // older maxSteps option in AI SDK 5)
    stopWhen: stepCountIs(5)
  })

  return result.toUIMessageStreamResponse()
}
Paste this into your project. Use the same useChat client from lesson 2. Send "My order #1234 arrived damaged. Can I get a refund?" and watch the agent work through multiple tool calls before responding with the confirmation. Now let us understand why this works.
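For reference, the client can look roughly like this. This is a sketch of an AI SDK 5 `useChat` page, not necessarily identical to your lesson 2 file --- the route path and component details are assumptions, so adapt them to what you actually built:

```typescript
// app/page.tsx --- a minimal useChat client pointed at the agent route
'use client'
import { useChat } from '@ai-sdk/react'
import { DefaultChatTransport } from 'ai'
import { useState } from 'react'

export default function Page() {
  const [input, setInput] = useState('')
  const { messages, sendMessage } = useChat({
    transport: new DefaultChatTransport({ api: '/api/agent' })
  })

  return (
    <div>
      {messages.map(m => (
        <div key={m.id}>
          {m.role}:{' '}
          {m.parts.map((part, i) =>
            // Render only text parts; tool-call parts also stream in
            part.type === 'text' ? <span key={i}>{part.text}</span> : null
          )}
        </div>
      ))}
      <form
        onSubmit={e => {
          e.preventDefault()
          sendMessage({ text: input })
          setInput('')
        }}
      >
        <input value={input} onChange={e => setInput(e.target.value)} />
      </form>
    </div>
  )
}
```

The important part for this lesson: the same client works unchanged, because the agent's multiple tool calls all arrive over one message stream.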
An agent is an LLM in a loop. Instead of generating one response and stopping, it follows a cycle:
THINK - Analyze the current situation and decide what to do next
DECIDE - Choose an action (call a tool, generate text, ask for clarification)
ACT - Execute the action
OBSERVE - Read the result
REPEAT - Go back to THINK until the task is complete
The key difference from a chatbot: the model decides what to do next, not the user. The user provides a goal. The agent figures out the steps.
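The loop above can be sketched without any SDK at all --- this is roughly what the AI SDK runs internally when you enable multi-step tool calling. Everything here (the `ModelDecision` type, `runAgent`) is illustrative, with the LLM faked as a plain function:

```typescript
// A dependency-free sketch of the agent loop. The "model" is any
// function that, given the history so far, either picks a tool to call
// or produces the final answer; a real agent calls an LLM here.
type ModelDecision =
  | { type: 'tool-call'; toolName: string; input: unknown }
  | { type: 'text'; text: string }

type ToolFn = (input: unknown) => unknown

function runAgent(
  model: (history: string[]) => ModelDecision, // THINK + DECIDE
  tools: Record<string, ToolFn>,
  goal: string,
  maxSteps = 5
): string {
  const history: string[] = [`GOAL: ${goal}`]
  for (let step = 0; step < maxSteps; step++) {
    const decision = model(history)
    if (decision.type === 'text') return decision.text // task complete
    // ACT: run the chosen tool
    const result = tools[decision.toolName](decision.input)
    // OBSERVE: append the result so the next THINK can see it
    history.push(`${decision.toolName} -> ${JSON.stringify(result)}`)
  }
  return 'Ran out of steps before finishing.'
}
```

Swap the fake model for a real LLM call and you have the core of `streamText`'s multi-step behavior: the step limit is the loop bound, and tool results feed back into the next iteration's context.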
You already built tool use in lesson 3. What you may not have realized is that streamText with a step limit is already an agent loop. When you allow up to five steps, the model can call a tool, read the result, and then decide whether to call another tool or write its final answer, repeating until the task is done or the limit is reached.
Here is the agent's internal loop for the refund request:
1. THINK: Customer wants a refund for a damaged order. I need to check the order first.
2. ACT: Call getOrderStatus({ orderId: "1234" })
3. OBSERVE: Order exists, delivered 3 days ago, refund eligible.
4. THINK: Order is eligible. I should initiate the refund.
5. ACT: Call initiateRefund({ orderId: "1234", reason: "damaged on arrival" })
6. OBSERVE: Refund initiated, confirmation number RF-5678.
7. RESPOND: "I've initiated a refund for order #1234. Your confirmation number is RF-5678. You should see the credit within 5 business days."
Two tool calls, one coherent response. The user sent one message and got a completed task.
A copilot suggests. You decide. GitHub Copilot proposes code; you accept or reject it. ChatGPT gives advice; you act on it or not.
An agent decides and acts. You set the goal; it handles the steps. A refund agent processes the refund. A research agent gathers and synthesizes information. A scheduling agent books the meeting.
The distinction matters because agents carry more risk. A copilot that suggests wrong code is harmless --- you catch it. An agent that processes the wrong refund costs money. Always match the autonomy level to the stakes: let an agent act freely on read-only operations like order lookups, but require human confirmation for irreversible or costly actions like refunds.
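The AI SDK gives you a lever for exactly this: a tool defined without an `execute` function is not run on the server --- the tool call is streamed to the client, which can render a confirmation UI and send the result back before the model continues. A sketch of a confirmation-gated refund tool (same schema as before):

```typescript
import { tool } from 'ai'
import { z } from 'zod'

// No execute function: the AI SDK forwards this tool call to the
// client instead of running it server-side. Your UI renders a
// "Confirm refund?" prompt and reports the user's decision back
// (e.g. via addToolResult in useChat), and only then does the
// agent loop resume.
const initiateRefund = tool({
  description: 'Initiate a refund. Requires explicit user confirmation.',
  inputSchema: z.object({
    orderId: z.string(),
    reason: z.string()
  })
})
```

This keeps the agent's planning intact while moving the irreversible step behind a human checkpoint.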
For complex tasks, a single agent with many tools gets unwieldy. The supervisor pattern splits the work: one "manager" LLM plans and delegates, while specialist LLM calls handle specific subtasks.
User: "Analyze our competitor Acme Corp"
Supervisor breaks this into:
1. Research step: gather recent news and product launches
2. Extraction step: pull out key metrics and details
3. Strategy step: compare positioning and identify gaps
Supervisor synthesizes results into a final report.
In code, this is a chain of generateText calls where the output of one becomes the input of the next:
// lib/agents/competitor-analysis.ts
import { generateText, stepCountIs } from 'ai'

export async function analyzeCompetitor(company: string) {
  // Step 1: Research
  const { text: research } = await generateText({
    model: 'openai/gpt-4o-mini',
    system: 'You are a research analyst. Summarize key findings.',
    prompt: `Research recent developments for ${company}.`,
    tools: { webSearch: searchTool }, // a web-search tool you've defined elsewhere
    stopWhen: stepCountIs(3)
  })

  // Step 2: Extract metrics
  const { text: metrics } = await generateText({
    model: 'openai/gpt-4o-mini',
    system: 'Extract key business metrics and product details.',
    prompt: `From this research, extract structured metrics:\n\n${research}`
  })

  // Step 3: Strategic analysis (use a stronger model for reasoning)
  const { text: strategy } = await generateText({
    model: 'openai/gpt-4o',
    system: 'You are a strategy consultant. Be specific and actionable.',
    prompt: `Based on this competitor analysis, identify strategic opportunities:\n\nResearch:\n${research}\n\nMetrics:\n${metrics}`
  })

  return { research, metrics, strategy }
}
Each step uses a focused system prompt and receives only the context it needs. The supervisor (your code, in this case) orchestrates the sequence. Notice step 3 uses a stronger model --- you can mix models within a workflow, using cheaper models for routine steps and premium models for the reasoning that matters most.
For a streaming version of multi-step workflows, the AI SDK provides createUIMessageStream:
import { createUIMessageStream, streamText, convertToModelMessages } from 'ai'

const stream = createUIMessageStream({
  execute: async ({ writer }) => {
    const result1 = streamText({
      model: 'openai/gpt-4o-mini',
      messages: convertToModelMessages(messages),
      tools: { /* ... */ }
    })
    writer.merge(result1.toUIMessageStream({ sendFinish: false }))

    const result2 = streamText({
      model: 'openai/gpt-4o',
      messages: [
        ...convertToModelMessages(messages),
        ...(await result1.response).messages
      ]
    })
    writer.merge(result2.toUIMessageStream({ sendStart: false }))
  }
})
This streams both steps to the client in sequence, so the user sees results as each step completes rather than waiting for the entire chain.
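Inside a route handler, the merged stream still needs to be returned to the client. A minimal sketch, assuming the `stream` variable from above has been built inside your POST handler:

```typescript
import { createUIMessageStreamResponse } from 'ai'

// After building the merged stream, hand it back as the HTTP response
// so the same useChat client can consume both steps in sequence.
return createUIMessageStreamResponse({ stream })
```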
Agents help when:
- the task takes multiple steps and the right next step depends on intermediate results
- the user states a goal but not the procedure for reaching it
- the actions involved are safe to automate, or gated behind confirmation

Agents hurt when:
- a single model call (or ordinary code) would do the job
- the workflow is fixed and deterministic --- just write it as a function
- mistakes are expensive and there is no human checkpoint
Start with the simplest approach that works. A single streamText call with tools covers most use cases. Graduate to multi-step workflows only when you have a clear, validated need.
Add a checkRefundPolicy tool to the support agent that takes a reason and returns whether the refund is approved or denied based on simple rules (e.g., "damaged" is always approved, "changed mind" is only approved within 30 days). Then modify the agent's system prompt to say: "Always check the refund policy before initiating a refund."
This forces a three-step chain: look up order, check policy, then initiate refund (or explain the denial). Watch how the model plans the sequence without you hardcoding it.
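One way to sketch the policy check --- the specific rules and the 30-day window are just the exercise's illustrative assumptions. Keeping the logic as a plain function makes it easy to test; wire it into the `tools` map with `tool()` exactly like the other two:

```typescript
// Pure policy logic, separate from the tool wrapper.
// Rules (illustrative): "damaged" is always approved;
// "changed mind" only within 30 days of delivery.
function checkRefundPolicy(
  reason: string,
  deliveredAt: string,
  now: Date = new Date()
): { approved: boolean; explanation: string } {
  const normalized = reason.toLowerCase()
  if (normalized.includes('damaged')) {
    return { approved: true, explanation: 'Damaged items are always refundable.' }
  }
  const daysSinceDelivery =
    (now.getTime() - new Date(deliveredAt).getTime()) / (1000 * 60 * 60 * 24)
  if (normalized.includes('changed mind') && daysSinceDelivery <= 30) {
    return { approved: true, explanation: 'Within the 30-day return window.' }
  }
  return { approved: false, explanation: 'Outside refund policy.' }
}
```

In the tool's `execute`, call this with the reason the model extracted and the `deliveredAt` date from `getOrderStatus` --- which is precisely what forces the model to look the order up first.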
You have built an AI feature that works locally --- streaming chat, structured outputs, RAG, and multi-step agents. Time to ship it. In the final lesson, we cover deploying AI on Vercel: environment variables, rate limiting, cost controls, and monitoring.