Restate automatically retries failures of your agents until they succeed. But LLM calls are costly, so you might want to configure retry behavior to fit your use case and to avoid retrying errors that cannot heal. Restate distinguishes between two types of errors:
  • Transient errors: Temporary issues like network failures or rate limits. Restate automatically retries these until they succeed or the retry policy is exhausted.
  • Terminal errors: Permanent failures like invalid input or business rule violations. Restate does not retry these. The invocation fails permanently. You can catch these errors and handle them gracefully.
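The distinction can be modelled with a small retry loop. The sketch below is a simplified, synchronous illustration and not Restate's implementation: TerminalError is a local stand-in for the SDK class of the same name, and runWithRetries and maxAttempts are hypothetical names introduced for this example.

```typescript
// Local stand-in for the SDK's TerminalError (assumption: defined here only
// so the sketch is self-contained).
class TerminalError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "TerminalError";
    Object.setPrototypeOf(this, TerminalError.prototype);
  }
}

type Outcome<T> =
  | { status: "succeeded"; value: T; attempts: number }
  | { status: "failed"; error: string; attempts: number };

// Hypothetical helper: retries transient errors up to maxAttempts and gives
// up immediately when the action throws a TerminalError.
function runWithRetries<T>(action: () => T, maxAttempts: number): Outcome<T> {
  let lastError = "no attempts made";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return { status: "succeeded", value: action(), attempts: attempt };
    } catch (e) {
      lastError = e instanceof Error ? e.message : String(e);
      if (e instanceof TerminalError) {
        // Terminal error: retrying cannot help, so stop here.
        return { status: "failed", error: lastError, attempts: attempt };
      }
      // Transient error: fall through and try again.
    }
  }
  return { status: "failed", error: lastError, attempts: maxAttempts };
}

// A transient failure that heals on the third attempt.
let calls = 0;
const flaky = runWithRetries(() => {
  calls++;
  if (calls < 3) throw new Error("rate limited");
  return "ok";
}, 5);

// A terminal failure that is never retried.
const invalid = runWithRetries<string>(() => {
  throw new TerminalError("invalid input");
}, 5);
```

In this model the flaky action succeeds after three attempts, while the invalid one fails after a single attempt regardless of how many attempts remain.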

Retrying LLM calls

LLM API calls can suffer from transient failures (rate limits, network issues, provider outages). Restate retries failed LLM calls so your agents recover automatically.

Default behavior

The Vercel AI SDK and the Restate middleware each have their own retry layer, and they compose.
The Vercel AI SDK does the first layer of retries based on what is set for maxRetries on generateText (default: 2). Once those are exhausted, the AI SDK throws an error.
Restate then takes over and retries the invocation. Each Restate retry replays the call, which goes through maxRetries Vercel AI SDK attempts again.
By default, Restate’s retries follow the policy configured at the service or handler level, or otherwise the Restate server’s default policy. Restate will go through a limited set of retries with exponential backoff (see default policy), after which the invocation will be paused. This gives you time to fix the issue, and then resume the invocation.
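To make "a limited set of retries with exponential backoff" concrete, the sketch below computes the delay before each retry. The field names and the values in the usage example are illustrative assumptions, not Restate's option names or defaults; consult the default policy for the actual settings.

```typescript
// Hypothetical policy shape for illustration only.
interface BackoffPolicy {
  initialDelayMs: number; // delay before the first retry
  factor: number;         // multiplier applied after each retry
  maxDelayMs: number;     // cap on any single delay
  retries: number;        // number of retries after the initial attempt
}

// Returns the delay (in ms) to wait before each retry, in order.
function backoffSchedule(policy: BackoffPolicy): number[] {
  const delays: number[] = [];
  let delay = policy.initialDelayMs;
  for (let retry = 0; retry < policy.retries; retry++) {
    delays.push(Math.min(delay, policy.maxDelayMs));
    delay *= policy.factor;
  }
  return delays;
}

const schedule = backoffSchedule({
  initialDelayMs: 100,
  factor: 2,
  maxDelayMs: 1000,
  retries: 5,
});
// schedule is [100, 200, 400, 800, 1000]
```

The delays double until they reach the cap, so a short outage is retried quickly while a longer one does not flood the provider with requests.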

Setting a retry policy

To set a separate retry policy for LLM calls, pass RunOptions to the durableCalls middleware:
errorhandling/fail-on-terminal-tool-agent.ts
const model = wrapLanguageModel({
  model: openai("gpt-5.4"),
  middleware: durableCalls(ctx, { maxRetryAttempts: 3 }),
});
If you set a maximum number of retry attempts, Restate will still go through the AI SDK’s maxRetries for each attempt, so the two limits multiply (e.g. maxRetryAttempts: 3 × maxRetries: 2 = up to 6 attempts).
Once Restate’s retries are exhausted, the invocation fails with a TerminalError and won’t be retried further. You can catch the TerminalError in your handler and act accordingly.
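The multiplication can be checked with a small simulation of two nested retry layers. This is a sketch under simplifying assumptions: withAttempts is a hypothetical helper, each layer is modelled as a fixed number of attempts, and whether a given limit counts the initial attempt should be verified against each library's documentation.

```typescript
// Hypothetical helper: calls fn up to `attempts` times and rethrows the last
// error if every attempt fails.
function withAttempts<T>(attempts: number, fn: () => T): T {
  let lastError: unknown = new Error("no attempts made");
  for (let i = 0; i < attempts; i++) {
    try {
      return fn();
    } catch (e) {
      lastError = e;
    }
  }
  throw lastError;
}

// Counts how often the (simulated) LLM provider is actually called.
let providerCalls = 0;
const alwaysFailingLlmCall = (): string => {
  providerCalls++;
  throw new Error("provider outage");
};

let finalError = "";
try {
  // Outer layer: 3 attempts (standing in for Restate).
  // Inner layer: 2 attempts per outer attempt (standing in for the AI SDK).
  withAttempts(3, () => withAttempts(2, alwaysFailingLlmCall));
} catch (e) {
  finalError = e instanceof Error ? e.message : String(e);
}
// providerCalls is now 6
```

Because every outer attempt restarts the inner layer from scratch, lowering either limit reduces the total number of paid LLM calls during an outage.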

Tool execution errors

Restate makes tool execution resilient by retrying transient errors and propagating terminal ones.

Transient errors

By default, the Vercel AI SDK converts any errors in tool executions into a message to the LLM, and the agent decides how to proceed. This is often desirable, as the LLM can decide to use a different tool or provide a fallback answer.
When you wrap external calls in Restate Context actions like ctx.run, Restate retries transient errors within the Context action before the result reaches the agent. This makes your tools resilient to network failures, database hiccups, and other temporary issues. For all operations that might suffer from transient errors, use Context actions:
import * as restate from "@restatedev/restate-sdk";

// Without ctx.run - error goes straight to agent
async function myTool() {
  const response = await fetch("/api/data"); // Might fail due to network
  // If this fails, agent gets the error immediately
  return await response.json();
}

// With ctx.run - Restate handles retries
async function myToolWithRestate(ctx: restate.Context) {
  // Network failures get retried automatically
  // Only terminal errors reach the AI
  return await ctx.run("fetch-data", async () => {
    const response = await fetch("/api/data");
    // Return the parsed body so the result can be serialized and journaled
    return await response.json();
  });
}
Restate then retries the whole invocation according to the policy configured at the service or handler level, or otherwise the Restate server’s default policy.

Setting a retry policy on run actions

If you use run actions in your tools, you can override the default retry policy by passing RunOptions:
const result = await ctx.run(
    "fetch-data",
    () => fetch("/api/data"),
    { maxRetryAttempts: 3 }
);
See custom retry policies for more options. When retries are exhausted, the tool will fail with a TerminalError.

Terminal errors

For errors that should not be retried (invalid input, business rule violations, resource not found), use a TerminalError in your tool. Restate does not retry these:
throw new TerminalError("This tool is not allowed to run for this input.");
By default, the Vercel AI SDK converts the terminal error into a message to the LLM, and the agent decides how to proceed.
If you want to treat terminal tool errors as permanent failures and stop the agent instead, the Restate middleware provides two utilities:
To fail the agent on terminal tool errors, rethrow the error in onStepFinish:
errorhandling/fail-on-terminal-tool-agent.ts
const { text } = await generateText({
  model,
  tools: {
    getWeather: tool({
      description: "Get the current weather for a given city.",
      inputSchema: z.object({ city: z.string() }),
      execute: async ({ city }) => {
        return await ctx.run("get weather", () => fetchWeather(city));
      },
    }),
  },
  stopWhen: [stepCountIs(5)],
  onStepFinish: rethrowTerminalToolError,
  system: "You are a helpful agent that provides weather updates.",
  messages: [{ role: "user", content: prompt }],
});
To stop the agent on terminal tool errors and handle them after the agent finishes, you can use hasTerminalToolError in stopWhen and then inspect the steps for errors:
errorhandling/stop-on-terminal-tool-agent.ts
const { steps, text } = await generateText({
  model,
  tools: {
    getWeather: tool({
      description: "Get the current weather for a given city.",
      inputSchema: z.object({ city: z.string() }),
      execute: async ({ city }) => {
        return await ctx.run("get weather", () => fetchWeather(city));
      },
    }),
  },
  stopWhen: [stepCountIs(5), hasTerminalToolError],
  system: "You are a helpful agent that provides weather updates.",
  messages: [{ role: "user", content: prompt }],
});

const terminalSteps = getTerminalToolSteps(steps);
if (terminalSteps.length > 0) {
  // Do something with the terminal tool error steps
}
To learn more about error handling with Restate, consult the error handling guide.

Combining with rollback

For multi-step agent workflows where steps have side effects (bookings, payments, emails), combine terminal errors with compensation/rollback patterns to undo completed work before finishing.
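A minimal sketch of the pattern follows. It is a synchronous, in-memory model: TerminalError is a local stand-in for the SDK class, and bookTrip and the logged step names are hypothetical. In a real handler each step and each compensation would be awaited and wrapped in a Context action such as ctx.run.

```typescript
// Local stand-in for the SDK's TerminalError (assumption: defined here only
// so the sketch is self-contained).
class TerminalError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "TerminalError";
    Object.setPrototypeOf(this, TerminalError.prototype);
  }
}

// Records the side effects so the order of operations is visible.
const log: string[] = [];

function bookTrip(paymentShouldFail: boolean): string {
  // Undo actions for the steps started so far, in registration order.
  const compensations: Array<() => void> = [];
  try {
    // Register each compensation before running its step, so a step that
    // fails halfway can still be undone.
    compensations.push(() => log.push("cancel flight"));
    log.push("book flight");

    compensations.push(() => log.push("cancel hotel"));
    log.push("book hotel");

    if (paymentShouldFail) throw new TerminalError("card declined");
    log.push("charge card");
    return "booked";
  } catch (e) {
    if (e instanceof TerminalError) {
      // Undo completed work in reverse order, then finish.
      for (const undo of compensations.reverse()) undo();
      return "rolled back: " + e.message;
    }
    throw e; // Transient errors are left to the retry policy.
  }
}

const result = bookTrip(true);
// result is "rolled back: card declined"
// log is ["book flight", "book hotel", "cancel hotel", "cancel flight"]
```

Only terminal errors trigger the rollback here; transient errors are rethrown so that retries can still complete the workflow without undoing finished steps.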