# Using Claude Code as a Proposer
GEPA's reflection LM — the model that reads evaluation feedback and proposes improved candidates — accepts any callable matching (str) -> str. This means you can use Claude Code's claude -p command as a drop-in proposer, powered by your existing Claude subscription with no API key required.
The same approach works for any CLI-based model (e.g., ollama run, a local model server). More generally, the reflection_lm accepts any Python function with signature (str) -> str — it doesn't need to be a CLI wrapper. You can call a local model, hit a custom HTTP endpoint, or implement any logic you want.
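As an illustration, a small factory can turn any such CLI command into a reflection LM. This is a minimal sketch, not part of GEPA's API; `make_cli_lm` is a hypothetical helper name:

```python
import subprocess


def make_cli_lm(argv, timeout=120):
    """Return a (str) -> str callable that passes the prompt to a CLI command.

    The prompt is appended as the final argument, which works for commands
    such as ["claude", "-p"] or ["ollama", "run", "llama3"].
    """
    def lm(prompt: str) -> str:
        result = subprocess.run(
            [*argv, prompt],
            capture_output=True, text=True, timeout=timeout,
        )
        if result.returncode != 0:
            raise RuntimeError(f"{argv[0]} failed: {result.stderr}")
        return result.stdout
    return lm
```

For example, `make_cli_lm(["ollama", "run", "llama3"])` would wrap a local Ollama model the same way the `claude -p` wrapper below does.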
## Setup

Make sure `claude` is installed and authenticated:

```shell
claude                  # opens interactive mode; complete login if needed
claude -p "say hello"   # verify non-interactive mode works
```
## Wrapping `claude -p` as a Reflection LM

The `LanguageModel` protocol is `(str) -> str`. Wrap the CLI call in a function:

```python
import subprocess


def claude_cli_lm(prompt: str) -> str:
    result = subprocess.run(
        ["claude", "-p", prompt],
        capture_output=True,
        text=True,
        timeout=120,
    )
    if result.returncode != 0:
        raise RuntimeError(f"claude -p failed: {result.stderr}")
    return result.stdout
```
## Using with `gepa.optimize`

Pass the wrapper as `reflection_lm`:

```python
import gepa

result = gepa.optimize(
    seed_candidate={"instructions": "Answer the question correctly."},
    trainset=dataset,
    task_lm=your_task_lm,
    reflection_lm=claude_cli_lm,
    max_metric_calls=50,
)
```
## Using with `optimize_anything`

Set it on `ReflectionConfig`:

```python
from gepa.optimize_anything import optimize_anything, GEPAConfig, ReflectionConfig

config = GEPAConfig(
    reflection=ReflectionConfig(
        reflection_lm=claude_cli_lm,
    ),
)

result = optimize_anything(config=config, ...)
```
## Using `claude -p` as Both Task LM and Reflection LM

You can also use `claude -p` to power the system being optimized. The `task_lm` in `gepa.optimize` expects a chat-completion callable `(list[dict]) -> str`, so add a thin wrapper that flattens messages:

```python
def claude_cli_chat(messages: list) -> str:
    parts = []
    for m in messages:
        if m["role"] == "system":
            parts.append(f"[System]\n{m['content']}")
        elif m["role"] == "user":
            parts.append(f"[User]\n{m['content']}")
    prompt = "\n\n".join(parts)
    return claude_cli_lm(prompt)


result = gepa.optimize(
    seed_candidate={"instructions": "Answer the question correctly."},
    trainset=dataset,
    task_lm=claude_cli_chat,
    reflection_lm=claude_cli_lm,
    max_metric_calls=50,
)
```
## Full Example

A minimal end-to-end script that optimizes a Q&A prompt using only `claude -p`:

```python
import subprocess

import gepa


def claude_cli_lm(prompt: str) -> str:
    result = subprocess.run(
        ["claude", "-p", prompt],
        capture_output=True, text=True, timeout=120,
    )
    if result.returncode != 0:
        raise RuntimeError(f"claude -p failed: {result.stderr}")
    return result.stdout


def claude_cli_chat(messages: list) -> str:
    parts = []
    for m in messages:
        if m["role"] == "system":
            parts.append(f"[System]\n{m['content']}")
        elif m["role"] == "user":
            parts.append(f"[User]\n{m['content']}")
    return claude_cli_lm("\n\n".join(parts))


dataset = [
    {"input": "What is 2+2?", "answer": "4", "additional_context": {}},
    {"input": "What is the capital of France?", "answer": "Paris", "additional_context": {}},
    {"input": "What color is the sky on a clear day?", "answer": "blue", "additional_context": {}},
]

result = gepa.optimize(
    seed_candidate={"instructions": "Answer the question correctly."},
    trainset=dataset,
    task_lm=claude_cli_chat,
    reflection_lm=claude_cli_lm,
    max_metric_calls=20,
    reflection_minibatch_size=1,
    display_progress_bar=True,
)

print(f"Best instructions: {result.best_candidate}")
print(f"Best score: {result.val_aggregate_scores[result.best_idx]:.3f}")
```
**Parallelism**

Each `claude -p` call is a separate subprocess, so GEPA's parallel evaluation works out of the box if you use `claude_cli_chat` as your task LM. The reflection LM calls are sequential by design (one proposal per iteration).
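The reason this is safe is that each subprocess is fully independent: no shared state, no session to corrupt. A standalone sketch of fanning out CLI calls with `concurrent.futures` (using `echo` as a stand-in for the real command):

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def cli_lm(prompt: str) -> str:
    # `echo` stands in for `claude -p` here; swap in your real command.
    out = subprocess.run(["echo", prompt], capture_output=True, text=True)
    return out.stdout.strip()


prompts = [f"question {i}" for i in range(4)]
with ThreadPoolExecutor(max_workers=4) as pool:
    # Each call spawns its own subprocess, so they run concurrently
    # without interfering with one another.
    answers = list(pool.map(cli_lm, prompts))
```

Threads suffice here because the work happens in child processes; the GIL is released while each thread waits on its subprocess.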
**Why Claude Code as a proposer?**

Unlike a plain API call, Claude Code has access to your full task context, documentation, custom skills, and any connected MCP servers. When used as the reflection LM, it can draw on all of that context to produce richer, more informed proposals rather than generic text improvements.