When agents need to scale independently, run on different infrastructure, or be developed by different teams, you can deploy them as separate Restate services and route requests between them. Restate makes cross-service calls look like local function calls while providing end-to-end durability, failure recovery, and automatic retries.

Local vs remote agents

There are two ways to coordinate specialist agents:
| | Local agents | Remote agents |
| --- | --- | --- |
| Where they run | Same process as the router | Separate services, potentially on different infrastructure |
| Best for | Simple specialization with shared context | Independent scaling, isolation, different languages |
| How routing works | Handoffs, sub-agents, or tool calls within one handler | Durable RPC calls between Restate services |
| Example | LLM picks a specialist prompt, calls LLM again in the same handler | LLM picks a specialist service, router calls it over HTTP |
Agent SDKs such as the OpenAI Agents SDK and Google ADK have built-in mechanisms for local routing (handoffs, sub-agents). For remote routing, or when using the Restate SDK directly, you deploy each specialist as its own Restate service and call it via Restate’s service clients. You can either use typed service clients or call a remote service by its string name (generic calls). Learn more about calling services in the TypeScript and Python SDK documentation.
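To make the call-by-string-name option more concrete, here is a minimal sketch of the lookup step that would sit in front of a generic call: it turns a specialist label (for example, one returned by the LLM) into a service and handler name. The `SPECIALISTS` registry, the `CallTarget` type, `resolveTarget`, and the `FraudCheckAgent` service name are assumptions made for illustration. The resulting target would then be passed to the SDK's generic call API; check the SDK documentation for its exact signature.

```typescript
// Hypothetical registry: specialist label -> Restate service name.
// "FraudCheckAgent" is an assumed name; adjust to your registered services.
const SPECIALISTS = new Map<string, string>([
  ["eligibility", "EligibilityAgent"],
  ["fraud", "FraudCheckAgent"],
]);

interface CallTarget {
  service: string; // name of the Restate service to call
  method: string; // handler to invoke on that service
}

// Resolve a label (e.g. chosen by the LLM) to a call target.
// Unknown labels throw, so a hallucinated specialist fails loudly
// instead of silently calling the wrong service. Inside a Restate
// handler you would likely throw a terminal error here, so the
// invocation is not retried for an input that can never succeed.
function resolveTarget(label: string): CallTarget {
  const service = SPECIALISTS.get(label.trim().toLowerCase());
  if (service === undefined) {
    throw new Error(`Unknown specialist: ${label}`);
  }
  return { service, method: "run" };
}
```

Keeping the registry explicit means the router only ever calls services you have listed, whatever string the model produces.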

Example: routing to specialist agents

With the Vercel AI SDK, specialist agents are exposed as tools. The LLM decides which tool to call, and Restate durably persists the routing decision and the agent call made via ctx.serviceClient().
remote-agents.ts
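import * as restate from "@restatedev/restate-sdk";
import { openai } from "@ai-sdk/openai";
import { generateText, stepCountIs, tool, wrapLanguageModel } from "ai";
import { durableCalls } from "@restatedev/vercel-ai-middleware";
// eligibilityAgent, fraudCheckAgent, InsuranceClaimSchema, InsuranceClaim, and
// ClaimInput are defined elsewhere in the example project.

const run = async (ctx: restate.Context, claim: ClaimInput) => {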
  const model = wrapLanguageModel({
    model: openai("gpt-5.4"),
    middleware: durableCalls(ctx, { maxRetryAttempts: 3 }),
  });

  const { text } = await generateText({
    model,
    prompt: `Claim: ${JSON.stringify(claim)}`,
    system:
      "You are a claim approval engine. Analyze the claim and use your tools to decide whether to approve.",
    tools: {
      analyzeEligibility: tool({
        description: "Analyze claim eligibility.",
        inputSchema: InsuranceClaimSchema,
        execute: async (claim: InsuranceClaim) =>
          ctx.serviceClient(eligibilityAgent).run(claim),
      }),
      analyzeFraud: tool({
        description: "Analyze probability of fraud.",
        inputSchema: InsuranceClaimSchema,
        execute: async (claim: InsuranceClaim) =>
          ctx.serviceClient(fraudCheckAgent).run(claim),
      }),
    },
    stopWhen: [stepCountIs(10)],
    providerOptions: { openai: { parallelToolCalls: false } },
  });

  return text;
};
Each specialist agent runs as its own Restate service:
eligibility-agent.ts
export const eligibilityAgent = restate.service({
  name: "EligibilityAgent",
  handlers: {
    run: async (ctx: restate.Context, claim: InsuranceClaim) => {
      const model = wrapLanguageModel({
        model: openai("gpt-5.4"),
        middleware: durableCalls(ctx, { maxRetryAttempts: 3 }),
      });
      const { text } = await generateText({
        model,
        system:
          "Decide whether the following claim is eligible for reimbursement. " +
          "Respond with eligible if it's a medical claim, and not eligible otherwise.",
        prompt: JSON.stringify(claim),
      });
      return text;
    },
  },
});
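Because the specialist returns free-form text, a caller that wants to branch on the result may need to normalize it first. The sketch below is one possible way to do that, assuming the model follows the system prompt above and answers with some variant of "eligible" or "not eligible". The `Eligibility` type and `parseEligibility` helper are hypothetical and not part of the example project.

```typescript
// Hypothetical normalized result of the eligibility specialist.
type Eligibility = "eligible" | "not_eligible" | "unclear";

// Map the specialist's free-form answer to a fixed set of values.
function parseEligibility(text: string): Eligibility {
  const normalized = text.trim().toLowerCase();
  // Check the negative form first: "not eligible" also contains "eligible".
  if (/\b(not eligible|ineligible)\b/.test(normalized)) {
    return "not_eligible";
  }
  if (/\beligible\b/.test(normalized)) {
    return "eligible";
  }
  // Anything else is reported as unclear rather than guessed.
  return "unclear";
}
```

Returning "unclear" for unexpected answers lets the router decide whether to ask again or escalate, rather than treating an off-script reply as an approval.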
Install Restate and launch it:
npm install --global @restatedev/restate-server@latest @restatedev/restate@latest
restate-server
Get the example:
restate example typescript-vercel-ai-tour-of-agents && cd typescript-vercel-ai-tour-of-agents
npm install
Export your OpenAI API key and run the agent:
export OPENAI_API_KEY=sk-...
npx tsx ./src/remote-agents.ts
Register the agents with Restate:
restate deployments register http://localhost:9080 --force --yes # dev only: overrides previous registrations
Start a request for a claim that needs to be analyzed by multiple agents:
curl localhost:8080/restate/call/MultiAgentClaimApproval/run --json '{
    "date":"2024-10-01",
    "category":"orthopedic",
    "reason":"hospital bill for a broken leg",
    "amount":3000,
    "placeOfService":"General Hospital"
}'
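If you would rather send the same request from code than from curl, the following sketch builds an equivalent request for `fetch`. It mirrors the URL and JSON body of the curl command above; the `ClaimRequest` interface and `buildClaimRequest` helper are hypothetical names, and the field types are inferred from the example payload rather than taken from the project's schema.

```typescript
// Hypothetical shape of the claim payload, inferred from the curl example.
interface ClaimRequest {
  date: string;
  category: string;
  reason: string;
  amount: number;
  placeOfService: string;
}

// Build the URL and fetch options for the claim request.
// The default base URL matches the curl command above.
function buildClaimRequest(
  claim: ClaimRequest,
  ingressUrl: string = "http://localhost:8080",
) {
  return {
    url: `${ingressUrl}/restate/call/MultiAgentClaimApproval/run`,
    init: {
      method: "POST",
      // curl's --json flag sets both of these headers.
      headers: {
        "content-type": "application/json",
        accept: "application/json",
      },
      body: JSON.stringify(claim),
    },
  };
}

const request = buildClaimRequest({
  date: "2024-10-01",
  category: "orthopedic",
  reason: "hospital bill for a broken leg",
  amount: 3000,
  placeOfService: "General Hospital",
});
// To send it (requires the running services from the steps above):
//   const response = await fetch(request.url, request.init);
```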
In the UI, you can see that the agent called the sub-agents and is waiting for their responses. The timeline shows the trace of each sub-agent. Once all sub-agents return, the main agent continues and makes a decision.
Multi-agent execution trace
For more details on resilient service-to-service calls, see the SDK documentation: TypeScript / Python.