LLM orchestration patterns: agents, tools, and loops

    Overview

    Agent frameworks have proliferated faster than the empirical evidence supporting them. LangGraph, AutoGen, CrewAI, and DSPy each claim to solve the orchestration problem, but they solve different problems. This log maps the landscape by pattern, not by framework name.

    ReAct: Reason + Act

    ReAct is the simplest agent pattern and the most widely deployed. The model alternates between thinking (reasoning about what to do) and acting (calling a tool). Each cycle produces a thought, an action, and an observation; the cycle repeats until the model decides it has enough information.
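    The thought-action-observation cycle can be sketched roughly as follows. This is a minimal illustration, not any framework's API: `Decision`, `react_loop`, and `scripted_decide` are hypothetical names, the model is replaced by a scripted stand-in, and the `max_cycles` cap is an added safety assumption rather than part of the pattern itself.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Decision:
    """One model turn (hypothetical shape, for illustration only)."""
    thought: str
    action: str    # a tool name, or "finish"
    argument: str  # the tool input, or the final answer when action == "finish"


def react_loop(decide: Callable[[List[str]], Decision],
               tools: Dict[str, Callable[[str], str]],
               question: str,
               max_cycles: int = 6) -> str:
    """Alternate model decisions and tool calls until the model finishes."""
    history: List[str] = [f"Question: {question}"]
    for _ in range(max_cycles):  # safety cap (assumption); the model usually stops first
        decision = decide(history)
        history.append(f"Thought: {decision.thought}")
        if decision.action == "finish":
            return decision.argument
        tool = tools.get(decision.action)
        observation = tool(decision.argument) if tool else f"unknown tool: {decision.action}"
        # The full history is passed back on the next cycle, so it keeps growing.
        history.append(f"Action: {decision.action}({decision.argument})")
        history.append(f"Observation: {observation}")
    return "stopped: cycle limit reached"


def scripted_decide(history: List[str]) -> Decision:
    """Scripted stand-in for an LLM call: look something up once, then finish."""
    last = history[-1]
    if last.startswith("Observation: "):
        return Decision("I have the answer.", "finish", last[len("Observation: "):])
    return Decision("I need to look this up.", "lookup", "capital of France")
```

    In a real system `decide` would wrap a model call and parse its output; everything else in the loop stays the same.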

    Pros:

    • Simple to implement. One loop, one prompt template.
    • The model, not a hard-coded iteration limit, decides when it’s done.
    • Works surprisingly well for single-step tool use.

    Cons:

    • No compact memory between cycles. State is carried only by the full conversation history, which every thought is based on and which grows without bound.
    • The model can get stuck in thought-action-observation loops if the tool output doesn’t provide the right signal.
    • No way to plan ahead. The model reacts to each observation rather than following a strategy.
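    One cheap guard against the stuck-loop failure above is to count repeated identical tool calls. The sketch below is an assumption about how such a check might look; `is_stuck` and the `max_repeats` threshold are hypothetical and not taken from any framework.

```python
from collections import Counter
from typing import Iterable, Tuple


def is_stuck(actions: Iterable[Tuple[str, str]], max_repeats: int = 2) -> bool:
    """Return True if any identical (tool, argument) call occurs more than max_repeats times."""
    counts = Counter(actions)
    return any(n > max_repeats for n in counts.values())
```

    A loop could call this on its action log after each cycle and stop, or change strategy, once it returns True. It only catches exact repeats; near-duplicate calls would need a fuzzier comparison.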

    Plan-and-Execute

    Plan-and-Execute separates the planning phase from the execution phase. The model first generates a plan (a sequence of steps), then executes each step sequentially.
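    A minimal sketch of the two phases, assuming a plan is simply an ordered list of (tool name, tool input) pairs. `plan_and_execute`, `PlanStep`, and the `planner` callable are hypothetical names standing in for a model call.

```python
from typing import Callable, Dict, List, Tuple

PlanStep = Tuple[str, str]  # (tool_name, tool_input); an assumed plan format


def plan_and_execute(planner: Callable[[str], List[PlanStep]],
                     tools: Dict[str, Callable[[str], str]],
                     task: str) -> List[str]:
    """Generate the whole plan first, then run each step in order."""
    plan = planner(task)  # phase 1: a single planning call
    results: List[str] = []
    for name, arg in plan:  # phase 2: sequential execution, no replanning
        results.append(tools[name](arg))
    return results
```

    Note that nothing here reacts to a failed step: an unknown tool name raises `KeyError` and the remaining steps never run.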

    Pros:

    • The plan can be validated before execution begins.
    • Steps can be parallelized if they’re independent.
    • Easier to debug — you can see the plan and the execution separately.
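    Validating a plan before execution can be as simple as a structural check. The sketch below assumes the same (tool name, tool input) plan format; `validate_plan` and its two rules are illustrative, and a real validator would likely check more.

```python
from typing import List, Set, Tuple


def validate_plan(plan: List[Tuple[str, str]], known_tools: Set[str]) -> List[str]:
    """Return a list of problems found in the plan; an empty list means it passed."""
    problems: List[str] = []
    if not plan:
        problems.append("plan is empty")
    for i, (name, arg) in enumerate(plan, start=1):
        if name not in known_tools:
            problems.append(f"step {i}: unknown tool '{name}'")
        if not arg.strip():
            problems.append(f"step {i}: empty input")
    return problems
```

    Running this before the first tool call lets a bad plan be rejected or regenerated without any side effects having happened.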

    Cons:

    • Plans are brittle. A single failed step can invalidate the entire plan.
    • In practice, models tend to be weaker at producing a complete plan up front than at reasoning one step at a time.
    • Requires separate prompts for planning and execution, which raises token cost (up to roughly double when both prompts carry the same context).

    Tool-Augmented Generation

    Often called “RAG with tools”, this pattern is distinct from both ReAct and Plan-and-Execute. The model has access to a set of tools but doesn’t explicitly reason about when to use them. Instead, the system routes tool calls based on input classification.
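    Classification-based routing might look like the sketch below. The keyword rules in `classify`, the three tool labels, and the `route` helper are all assumptions made for illustration; a production router would more likely use a trained classifier, and the model would generate the tool parameters rather than passing the raw text through.

```python
import re
from typing import Callable, Dict, Tuple


def classify(text: str) -> str:
    """Map an input to a tool label using simple rules (hypothetical categories)."""
    if re.search(r"\d+\s*[-+*/]\s*\d+", text):
        return "calculator"
    if "weather" in text.lower():
        return "weather"
    return "search"  # fallback category


def route(text: str, tools: Dict[str, Callable[[str], str]]) -> Tuple[str, str]:
    """Pick a tool from the classification alone; the model never chooses."""
    label = classify(text)
    return label, tools[label](text)
```

    Because routing is a pure function of the input, each category can be unit-tested directly, which is where the predictability of this pattern comes from.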

    Pros:

    • Predictable routing. You know which tool will be called for which input type.
    • No reasoning overhead. The model just generates the tool parameters.
    • Easier to test and monitor.

    Cons:

    • Classification is fragile. Misclassifying an input sends it to the wrong tool.
    • No ability to combine tools. If a task requires three tools, the system needs to orchestrate them externally.
    • The model can’t discover new tool use patterns.

    When Each Pattern Breaks

    • ReAct breaks when the task requires more than 5-6 action cycles. The conversation history becomes too long, and the model loses track of which tools have been called.
    • Plan-and-Execute breaks when the environment is non-deterministic. A plan that works in testing may fail in production because the state changed between planning and execution.
    • Tool-Augmented Generation breaks when the input space is open-ended. If you can’t reliably classify inputs into tool categories, the routing fails.

    Current Focus

    Building a hybrid pattern that combines ReAct’s flexibility with Plan-and-Execute’s structure. The model generates a plan, but can revise it mid-execution based on observations. Early results are promising but the revision logic is the hard part.
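    One possible shape for such a hybrid is sketched below. It is not the implementation described above, only an assumption about how mid-execution revision could be wired in: `execute_with_revision`, `should_revise`, `revise`, and the `max_revisions` cap are hypothetical names and choices.

```python
from typing import Callable, Dict, List, Tuple

PlanStep = Tuple[str, str]  # (tool_name, tool_input); an assumed plan format


def execute_with_revision(plan: List[PlanStep],
                          tools: Dict[str, Callable[[str], str]],
                          should_revise: Callable[[str], bool],
                          revise: Callable[[List[PlanStep], str], List[PlanStep]],
                          max_revisions: int = 2) -> List[str]:
    """Run a plan step by step, letting a reviser rewrite the remaining steps."""
    remaining = list(plan)
    observations: List[str] = []
    revisions = 0
    while remaining:
        name, arg = remaining.pop(0)
        observation = tools[name](arg)
        observations.append(observation)
        # Revise only the steps not yet run, and only a bounded number of times.
        if should_revise(observation) and revisions < max_revisions:
            remaining = revise(remaining, observation)
            revisions += 1
    return observations
```

    The hard part stays inside `revise` and `should_revise`, which would be model calls in practice; the cap on revisions is there so that a reviser which keeps adding steps cannot run forever.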