Creating Adapters

GEPA can optimize any system whose behavior is driven by text components; you connect such a system to GEPA by implementing the GEPAAdapter protocol. This guide explains how to create custom adapters.

The GEPAAdapter Protocol

Every adapter must implement two methods:

from gepa.core.adapter import GEPAAdapter, EvaluationBatch

# DataInst, Trajectory, and RolloutOutput are type parameters you define
# for your inputs, execution traces, and system outputs (see Step 1 below)
class MyAdapter(GEPAAdapter[DataInst, Trajectory, RolloutOutput]):
    def evaluate(
        self,
        batch: list[DataInst],
        candidate: dict[str, str],
        capture_traces: bool = False,
    ) -> EvaluationBatch[Trajectory, RolloutOutput]:
        """Execute the system and return scores."""
        ...

    def make_reflective_dataset(
        self,
        candidate: dict[str, str],
        eval_batch: EvaluationBatch[Trajectory, RolloutOutput],
        components_to_update: list[str],
    ) -> dict[str, list[dict]]:
        """Build dataset for reflection."""
        ...
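
The EvaluationBatch that evaluate returns is essentially a container of three parallel collections. The real definition lives in gepa.core.adapter; a simplified sketch for orientation (not the actual class):

# Simplified sketch of EvaluationBatch; see gepa.core.adapter for the real definition
from dataclasses import dataclass
from typing import Generic, TypeVar

Trajectory = TypeVar("Trajectory")
RolloutOutput = TypeVar("RolloutOutput")

@dataclass
class EvaluationBatchSketch(Generic[Trajectory, RolloutOutput]):
    outputs: list[RolloutOutput]            # one system output per batch input
    scores: list[float]                     # one score per input; higher is better
    trajectories: list[Trajectory] | None   # populated only when capture_traces=True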

Step-by-Step Guide

Step 1: Define Your Types

First, define the types your adapter will use:

from dataclasses import dataclass

# Your input data type
@dataclass
class TaskInput:
    question: str
    context: str
    expected_answer: str

# Trajectory captures execution details
@dataclass  
class ExecutionTrace:
    prompt_used: str
    model_response: str
    intermediate_steps: list[str]

# Output from your system
@dataclass
class TaskOutput:
    answer: str
    confidence: float
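
These fill the protocol's type parameters: DataInst = TaskInput, Trajectory = ExecutionTrace, RolloutOutput = TaskOutput. A hypothetical training instance looks like:

# One hypothetical training instance; evaluate() receives a list of these
batch = [
    TaskInput(
        question="What is the capital of France?",
        context="France is a country in Western Europe.",
        expected_answer="Paris",
    )
]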

Step 2: Implement evaluate

The evaluate method runs your system on a batch of inputs:

from gepa.core.adapter import EvaluationBatch

class MyAdapter:
    def __init__(self, model_name: str):
        self.model_name = model_name

    def evaluate(
        self,
        batch: list[TaskInput],
        candidate: dict[str, str],
        capture_traces: bool = False,
    ) -> EvaluationBatch[ExecutionTrace, TaskOutput]:
        outputs = []
        scores = []
        trajectories = [] if capture_traces else None

        for task in batch:
            # Build prompt using candidate components
            prompt = candidate["system_prompt"] + "\n" + task.question

            # Run your system
            response = self._call_model(prompt)

            # Parse output
            output = TaskOutput(answer=response, confidence=0.9)  # placeholder confidence
            outputs.append(output)

            # Compute score (higher is better)
            score = 1.0 if output.answer == task.expected_answer else 0.0
            scores.append(score)

            # Capture trace if requested
            if capture_traces:
                trace = ExecutionTrace(
                    prompt_used=prompt,
                    model_response=response,
                    intermediate_steps=[],
                )
                trajectories.append(trace)

        return EvaluationBatch(
            outputs=outputs,
            scores=scores,
            trajectories=trajectories,
        )
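
With evaluate in place, you can smoke-test the adapter by hand before running GEPA. This assumes you have filled in _call_model; the model name is a placeholder:

adapter = MyAdapter(model_name="openai/gpt-4o-mini")  # placeholder model name
eval_batch = adapter.evaluate(
    batch=[TaskInput(question="What is 2 + 2?", context="", expected_answer="4")],
    candidate={"system_prompt": "Answer with just the number."},
    capture_traces=True,
)
print(eval_batch.scores)        # e.g. [1.0]
print(eval_batch.trajectories)  # list of ExecutionTrace, since capture_traces=True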

Step 3: Implement make_reflective_dataset

This method creates data for the reflection LLM to propose improvements:

def make_reflective_dataset(
    self,
    candidate: dict[str, str],
    eval_batch: EvaluationBatch[ExecutionTrace, TaskOutput],
    components_to_update: list[str],
) -> dict[str, list[dict]]:
    """Build a reflective dataset for each component."""

    dataset = {}

    for component_name in components_to_update:
        component_data = []

        for i, trace in enumerate(eval_batch.trajectories or []):
            record = {
                "Inputs": {
                    "prompt": trace.prompt_used,
                },
                "Generated Outputs": {
                    "response": trace.model_response,
                },
                "Feedback": self._generate_feedback(
                    trace, 
                    eval_batch.outputs[i],
                    eval_batch.scores[i],
                ),
            }
            component_data.append(record)

        dataset[component_name] = component_data

    return dataset

def _generate_feedback(self, trace, output, score):
    """Generate helpful feedback for the reflection LLM."""
    if score == 1.0:
        return "Correct! The answer matched the expected output."
    else:
        return f"Incorrect. The model answered '{output.answer}' but this was wrong."

Best Practices

1. Rich Feedback

The more informative your feedback, the better GEPA can optimize:

def _generate_feedback(self, trace, output, expected, score):
    feedback_parts = []

    # Include the score
    feedback_parts.append(f"Score: {score}")

    # Explain what went wrong
    if score < 1.0:
        feedback_parts.append(f"Expected: {expected}")
        feedback_parts.append(f"Got: {output.answer}")

        # Add specific error analysis
        if len(output.answer) > 100:
            feedback_parts.append("Issue: Response too verbose")
        if expected.lower() not in output.answer.lower():
            feedback_parts.append("Issue: Key information missing")

    return "\n".join(feedback_parts)

2. Error Handling

Handle failures gracefully:

def evaluate(self, batch, candidate, capture_traces=False):
    outputs, scores = [], []
    trajectories = [] if capture_traces else None

    for task in batch:
        try:
            output = self._run_task(task, candidate)
            score = self._compute_score(output, task)
        except Exception as e:
            # Return a failed result rather than raising, so one bad
            # rollout doesn't abort the whole batch
            output = TaskOutput(answer="ERROR", confidence=0.0)
            score = 0.0
            if capture_traces:
                # Assumes your trajectory type can record failures;
                # ExecutionTrace from Step 1 would need an `error` field
                trajectories.append(ExecutionTrace(
                    error=str(e),
                    # ... capture what you can
                ))

        outputs.append(output)
        scores.append(score)

    return EvaluationBatch(outputs=outputs, scores=scores, trajectories=trajectories)

3. Multi-Objective Optimization

Support multiple objectives:

def evaluate(self, batch, candidate, capture_traces=False):
    # ... evaluation logic ...

    objective_scores = []
    for output in outputs:
        objective_scores.append({
            "accuracy": 1.0 if output.correct else 0.0,
            "latency": 1.0 / (1.0 + output.latency),  # Lower is better, inverted
            "cost": 1.0 / (1.0 + output.token_count),
        })

    return EvaluationBatch(
        outputs=outputs,
        scores=scores,
        trajectories=trajectories,
        objective_scores=objective_scores,  # Multi-objective support
    )
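
Note that the batch above still supplies a scalar scores list, which is what drives GEPA's search, so you need a rule for collapsing the per-objective values into one number per example. A minimal sketch with a fixed weighting (the aggregate_objectives helper and its weights are illustrative, not part of GEPA):

def aggregate_objectives(objectives: dict[str, float]) -> float:
    """Collapse per-objective scores into one scalar (higher is better)."""
    weights = {"accuracy": 0.8, "latency": 0.1, "cost": 0.1}  # illustrative weights
    return sum(weights[name] * value for name, value in objectives.items())

scores = [aggregate_objectives(obj) for obj in objective_scores]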

Example: Complete Adapter

Here's a complete example adapter:

from dataclasses import dataclass
import litellm
import gepa
from gepa.core.adapter import GEPAAdapter, EvaluationBatch

@dataclass
class QAInput:
    question: str
    answer: str

@dataclass
class QATrace:
    prompt: str
    response: str

@dataclass
class QAOutput:
    answer: str

class SimpleQAAdapter(GEPAAdapter[QAInput, QATrace, QAOutput]):
    def __init__(self, model: str = "openai/gpt-4o-mini"):
        self.model = model

    def evaluate(
        self,
        batch: list[QAInput],
        candidate: dict[str, str],
        capture_traces: bool = False,
    ) -> EvaluationBatch[QATrace, QAOutput]:
        outputs, scores = [], []
        trajectories = [] if capture_traces else None

        for item in batch:
            # Build prompt
            prompt = f"{candidate['system_prompt']}\n\nQuestion: {item.question}"

            # Call model
            response = litellm.completion(
                model=self.model,
                messages=[{"role": "user", "content": prompt}],
            )
            answer = response.choices[0].message.content

            # Score
            output = QAOutput(answer=answer)
            score = 1.0 if item.answer.lower() in answer.lower() else 0.0

            outputs.append(output)
            scores.append(score)

            if capture_traces:
                trajectories.append(QATrace(prompt=prompt, response=answer))

        return EvaluationBatch(
            outputs=outputs,
            scores=scores,
            trajectories=trajectories,
        )

    def make_reflective_dataset(
        self,
        candidate: dict[str, str],
        eval_batch: EvaluationBatch[QATrace, QAOutput],
        components_to_update: list[str],
    ) -> dict[str, list[dict]]:
        dataset = {"system_prompt": []}

        for i, trace in enumerate(eval_batch.trajectories or []):
            dataset["system_prompt"].append({
                "Inputs": {"question": trace.prompt.split("Question: ")[-1]},
                "Generated Outputs": {"answer": trace.response},
                "Feedback": f"Score: {eval_batch.scores[i]}"
            })

        return dataset

# Usage
adapter = SimpleQAAdapter(model="openai/gpt-4o-mini")
result = gepa.optimize(
    seed_candidate={"system_prompt": "Answer questions accurately."},
    trainset=trainset,
    adapter=adapter,
    reflection_lm="openai/gpt-4o",
    max_metric_calls=50,
)
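
The result object carries the evolved candidate; assuming it exposes a best_candidate mapping (as in GEPA's examples), you can inspect the optimized prompt directly:

# Inspect the evolved prompt (assumes the result exposes best_candidate)
print(result.best_candidate["system_prompt"])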

Built-in Adapters

GEPA provides several ready-to-use adapters for common use cases:

Adapter                   | Description                                          | Use Case
DefaultAdapter            | Simple adapter for prompt optimization with any LLM  | General prompt tuning, Q&A systems
DSPy Adapter              | Optimizes DSPy program instructions and prompts      | DSPy module optimization
DSPy Full Program Adapter | Evolves entire DSPy programs including structure     | Full program evolution, architecture search
RAG Adapter               | Optimizes RAG pipeline components                    | Retrieval-augmented generation systems
MCP Adapter               | Optimizes MCP tool descriptions and system prompts   | Tool-using agents, MCP servers
TerminalBench Adapter     | Optimizes agents for terminal-based tasks            | CLI agents, shell automation

When to Use Each Adapter

  • DefaultAdapter: Start here for simple prompt optimization tasks. Works with any LLM via litellm; a minimal usage sketch follows this list.

  • DSPy Adapter: Use when you have a DSPy program and want to optimize the instructions for individual predictors while keeping the program structure fixed.

  • DSPy Full Program Adapter: Use when you want GEPA to evolve the entire DSPy program, including its structure and module composition.

  • RAG Adapter: Use for optimizing retrieval-augmented generation systems. Supports multiple vector stores (ChromaDB, Weaviate, Qdrant, Milvus, etc.) and optimizes query reformulation, context synthesis, and answer generation prompts.

  • MCP Adapter: Use for optimizing Model Context Protocol tool usage. Supports both local (stdio) and remote (SSE/StreamableHTTP) MCP servers.

  • TerminalBench Adapter: Use for optimizing agents that interact with terminal/shell environments.
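
As a reference point for DefaultAdapter, here is a minimal sketch of plain prompt optimization. The task_lm parameter and exact call shape follow GEPA's examples but may differ between versions, so treat this as illustrative:

import gepa

# Minimal sketch: with no custom adapter, gepa.optimize can use the built-in
# DefaultAdapter by naming the task model directly (called via litellm)
result = gepa.optimize(
    seed_candidate={"system_prompt": "Answer the question accurately."},
    trainset=trainset,              # your list of training examples
    task_lm="openai/gpt-4o-mini",   # model the DefaultAdapter calls
    reflection_lm="openai/gpt-4o",
    max_metric_calls=100,
)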

Next Steps

  • See the API Reference for complete GEPAAdapter protocol documentation
  • Explore the built-in adapters above for your specific use case
  • Read the DefaultAdapter source for a reference implementation