LLM orchestration patterns: agents, tools, and loops
Overview
Agent frameworks have proliferated faster than the empirical evidence supporting them. LangGraph, AutoGen, CrewAI, and DSPy each claim to solve the orchestration problem, but they solve different problems. This log maps the landscape by pattern, not by framework name.
ReAct: Reason + Act
ReAct is the simplest agent pattern and the most widely deployed. The model alternates between thinking (reasoning about what to do) and acting (calling a tool). Each cycle produces a thought, an action, an observation, and repeats until the model decides it has enough information.
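The thought/action/observation cycle described above can be sketched as a single loop. This is a minimal illustration, not any framework's API: `Step`, `react_loop`, and `scripted_model` are hypothetical names, the "model" is a scripted stand-in for an LLM call, and `max_cycles` is an assumed safety cap that the pattern itself does not require.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    """One model turn: a thought plus either a tool call or a final answer."""
    thought: str
    tool: Optional[str] = None       # tool to call, or None when finishing
    tool_input: str = ""
    answer: Optional[str] = None     # set only on the final turn


def react_loop(
    model: Callable[[list[str]], Step],
    tools: dict[str, Callable[[str], str]],
    question: str,
    max_cycles: int = 8,             # assumed safety cap, not part of the pattern
) -> str:
    transcript = [f"Question: {question}"]
    for _ in range(max_cycles):
        step = model(transcript)                 # the model sees the full history
        transcript.append(f"Thought: {step.thought}")
        if step.answer is not None:              # the model decided it is done
            return step.answer
        if step.tool not in tools:
            observation = f"unknown tool: {step.tool}"
        else:
            observation = tools[step.tool](step.tool_input)
        transcript.append(f"Action: {step.tool}({step.tool_input})")
        transcript.append(f"Observation: {observation}")
    return "stopped: cycle limit reached"


def scripted_model(transcript: list[str]) -> Step:
    """Stand-in for an LLM call: acts once, then answers with the observation."""
    last = transcript[-1]
    if last.startswith("Observation: "):
        return Step(thought="I have the sum.", answer=last.removeprefix("Observation: "))
    return Step(thought="I should add the numbers.", tool="add", tool_input="2,3")


def add(arg: str) -> str:
    a, b = arg.split(",")
    return str(int(a) + int(b))


result = react_loop(scripted_model, {"add": add}, "What is 2 + 3?")
print(result)
```

Note that the transcript is the only state carried between cycles, which is why its growth matters for longer tasks.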
Pros:
- Simple to implement. One loop, one prompt template.
- The model, not a hard-coded iteration limit, decides when it’s done.
- Works surprisingly well for single-step tool use.
Cons:
- No structured memory between cycles. The only state is the full conversation history, which grows with every cycle.
- The model can get stuck in thought-action-observation loops if the tool output doesn’t provide the right signal.
- No way to plan ahead. The model reacts to each observation rather than following a strategy.
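One way to catch the stuck-loop failure mentioned above is to watch for the same tool call repeating within a short window. The sketch below is an assumption about how such a guard might look; `is_stuck`, `window`, and `max_repeats` are hypothetical names and the thresholds are arbitrary.

```python
from collections import Counter


def is_stuck(
    actions: list[tuple[str, str]],   # (tool name, tool input) per cycle
    window: int = 4,
    max_repeats: int = 2,
) -> bool:
    """Return True if any identical call appears more than max_repeats
    times among the last `window` actions."""
    recent = actions[-window:]
    counts = Counter(recent)
    return any(n > max_repeats for n in counts.values())


history = [("search", "capital of France")] * 3
print(is_stuck(history))
```

A caller could check this after each cycle and either stop the loop or inject a hint into the next prompt.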
Plan-and-Execute
Plan-and-Execute separates the planning phase from the execution phase. The model first generates a plan (a sequence of steps), then executes each step sequentially.
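A minimal sketch of the two phases, assuming the planner and executor are plain callables (in practice each would wrap a model call with its own prompt). The names `plan_and_execute`, `toy_planner`, and `toy_executor` are hypothetical.

```python
from typing import Callable

Planner = Callable[[str], list[str]]            # task -> ordered steps
Executor = Callable[[str, list[str]], str]      # (step, prior results) -> result


def plan_and_execute(
    planner: Planner, executor: Executor, task: str
) -> tuple[list[str], list[str]]:
    plan = planner(task)                         # phase 1: plan once, up front
    if not plan:
        raise ValueError("planner returned an empty plan")
    results: list[str] = []
    for step in plan:                            # phase 2: execute in order
        results.append(executor(step, list(results)))
    return plan, results


def toy_planner(task: str) -> list[str]:
    """Stand-in for a planning prompt."""
    return [f"look up {task}", "summarize findings"]


def toy_executor(step: str, prior: list[str]) -> str:
    """Stand-in for an execution prompt; reports how much context it saw."""
    return f"done: {step} (saw {len(prior)} prior results)"


plan, results = plan_and_execute(toy_planner, toy_executor, "ReAct")
print(plan)
print(results)
```

Because the plan is a plain list that exists before any step runs, it can be inspected or rejected before execution begins.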
Pros:
- The plan can be validated before execution begins.
- Steps can be parallelized if they’re independent.
- Easier to debug: you can inspect the plan and the execution separately.
Cons:
- Plans are brittle. A single failed step can invalidate the entire plan.
- Models are often weaker at producing a complete plan up front than at reasoning one step at a time.
- Requires separate prompts for planning and execution, which adds token cost.
Tool-Augmented Generation
This pattern is often called “RAG with tools”, but it is distinct from both ReAct and Plan-and-Execute. The model has access to a set of tools but doesn’t explicitly reason about when to use them. Instead, the system routes tool calls based on input classification.
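The routing step can be sketched as classify first, then fill parameters. Everything here is illustrative: `classify` is a toy keyword matcher standing in for whatever classifier a real system would use, and `handle`, `fill_params`, and the two tools are hypothetical.

```python
from typing import Callable, Optional

Tool = Callable[[str], str]


def classify(text: str) -> Optional[str]:
    """Toy keyword classifier; returns a tool label or None."""
    lowered = text.lower()
    if "weather" in lowered:
        return "weather"
    if "convert" in lowered:
        return "units"
    return None


def handle(
    text: str,
    tools: dict[str, Tool],
    fill_params: Callable[[str, str], str],     # (label, text) -> tool parameters
) -> str:
    label = classify(text)                       # the system picks the tool
    if label is None or label not in tools:
        return "fallback: no tool matched"
    params = fill_params(label, text)            # the model only fills parameters
    return tools[label](params)


tools: dict[str, Tool] = {
    "weather": lambda city: f"forecast for {city}",
    "units": lambda query: f"converted {query}",
}


def last_word(label: str, text: str) -> str:
    """Stand-in for the model's parameter generation."""
    return text.split()[-1]


print(handle("What is the weather in Oslo", tools, last_word))
print(handle("Tell me a joke", tools, last_word))
```

The routing decision lives entirely in `classify`, which makes it easy to test in isolation and also makes it the single point of failure.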
Pros:
- Predictable routing. You know which tool will be called for which input type.
- No reasoning overhead. The model just generates the tool parameters.
- Easier to test and monitor.
Cons:
- Classification is fragile. Misclassifying an input sends it to the wrong tool.
- No ability to combine tools. If a task requires three tools, the system needs to orchestrate them externally.
- The model can’t discover new tool use patterns.
When Each Pattern Breaks
- ReAct breaks when the task requires more than roughly 5-6 action cycles. The conversation history becomes too long, and the model loses track of which tools have been called.
- Plan-and-Execute breaks when the environment is non-deterministic. A plan that works in testing may fail in production because the state changed between planning and execution.
- Tool-Augmented Generation breaks when the input space is open-ended. If you can’t reliably classify inputs into tool categories, the routing fails.
Current Focus
Building a hybrid pattern that combines ReAct’s flexibility with Plan-and-Execute’s structure. The model generates a plan, but can revise it mid-execution based on observations. Early results are promising but the revision logic is the hard part.
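One possible shape for such a hybrid loop, offered only as a sketch under assumptions and not as the implementation described here: the planner is re-invoked with the execution history whenever a hypothetical `needs_revision` check fires, and a `max_revisions` budget (also an assumption) prevents endless replanning. All names are illustrative.

```python
from typing import Callable

History = list[tuple[str, str]]                  # (step, observation) pairs
Planner = Callable[[str, History], list[str]]    # returns the remaining steps


def run_with_revision(
    task: str,
    planner: Planner,
    executor: Callable[[str], str],
    needs_revision: Callable[[str, str], bool],
    max_revisions: int = 3,                      # assumed budget
) -> History:
    history: History = []
    plan = list(planner(task, history))          # initial plan
    revisions = 0
    while plan:
        step = plan.pop(0)
        observation = executor(step)
        history.append((step, observation))
        if needs_revision(step, observation):
            if revisions >= max_revisions:
                raise RuntimeError("revision budget exhausted")
            revisions += 1
            plan = list(planner(task, history))  # replan from what was observed
    return history


def toy_planner(task: str, history: History) -> list[str]:
    """Stand-in planner: switches to a backup source after an error."""
    if history and history[-1][1] == "error":
        return ["use backup", "summarize"]
    return ["fetch primary", "summarize"]


def toy_executor(step: str) -> str:
    return "error" if step == "fetch primary" else "ok"


trace = run_with_revision(
    "report", toy_planner, toy_executor, lambda step, obs: obs == "error"
)
print(trace)
```

In this sketch the difficult part is hidden inside `needs_revision` and the replanning call: deciding when an observation invalidates the rest of the plan is exactly the revision logic that remains open.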