Optimize Anything with LLMs
Automatically optimize prompts for any AI system
Frontier Performance, up to 90x cheaper | 35x faster than RL
Used by teams at Shopify, Databricks, and Dropbox
Get Started¶
```python
import gepa

# Load your dataset
trainset, valset, _ = gepa.examples.aime.init_dataset()

# Define your initial prompt
seed_prompt = {"system_prompt": "You are a helpful assistant..."}

# Run optimization
result = gepa.optimize(
    seed_candidate=seed_prompt,
    trainset=trainset,
    valset=valset,
    task_lm="openai/gpt-4.1-mini",
    max_metric_calls=150,
    reflection_lm="openai/gpt-5",
)

print(result.best_candidate['system_prompt'])
```
Result: +10 percentage points (46.6% → 56.6%) on AIME 2025 with GPT-4.1 Mini
```python
import dspy

class RAG(dspy.Module):
    def __init__(self):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=3)
        self.generate = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        context = self.retrieve(question).passages
        return self.generate(context=context, question=question)

# Optimize with GEPA
gepa = dspy.GEPA(
    metric=your_metric,
    max_metric_calls=150,
    reflection_lm="openai/gpt-5",
)
optimized_rag = gepa.compile(student=RAG(), trainset=trainset, valset=valset)
```
GEPA is built into DSPy! See DSPy tutorials for more.
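The `your_metric` above is user-supplied. A minimal sketch of the scoring logic it might wrap (a hypothetical helper, assuming gold and predicted answers are strings): GEPA works best when the metric returns textual feedback alongside the score, so the reflection LM sees *why* an example failed, not just a number.

```python
# Hypothetical scoring helper for the RAG program above: exact match
# with a textual explanation attached to the score.
def answer_feedback(gold_answer: str, predicted_answer: str):
    correct = gold_answer.strip().lower() == predicted_answer.strip().lower()
    if correct:
        return 1.0, "Correct answer."
    return 0.0, f"Expected '{gold_answer}' but got '{predicted_answer}'."
```

To use this with `dspy.GEPA`, wrap it in a metric that returns both the score and the feedback string (e.g. via `dspy.Prediction(score=..., feedback=...)`).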
```python
from gepa import optimize
from gepa.core.adapter import EvaluationBatch

class MySystemAdapter:
    def evaluate(self, batch, candidate, capture_traces=False):
        outputs, scores, trajectories = [], [], []
        for example in batch:
            prompt = candidate['my_prompt']
            result = my_system.run(prompt, example)
            score = compute_score(result, example)
            outputs.append(result)
            scores.append(score)
            if capture_traces:
                trajectories.append({
                    'input': example,
                    'output': result.output,
                    'steps': result.intermediate_steps,
                    'errors': result.errors,
                })
        return EvaluationBatch(
            outputs=outputs,
            scores=scores,
            trajectories=trajectories if capture_traces else None,
        )

    def make_reflective_dataset(self, candidate, eval_batch, components_to_update):
        reflective_data = {}
        for component in components_to_update:
            reflective_data[component] = []
            for traj, score in zip(eval_batch.trajectories, eval_batch.scores):
                reflective_data[component].append({
                    'Inputs': traj['input'],
                    'Generated Outputs': traj['output'],
                    'Feedback': f"Score: {score}. Errors: {traj['errors']}",
                })
        return reflective_data

result = optimize(
    seed_candidate={'my_prompt': 'Initial prompt...'},
    trainset=my_trainset,
    valset=my_valset,
    adapter=MySystemAdapter(),
    task_lm="openai/gpt-4.1-mini",
)
```
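In the adapter above, `my_system` and `compute_score` stand in for your own stack. One way `compute_score` might look (a hypothetical token-overlap scorer, assuming each example is a dict with an `expected` field and the result renders to text via `str()`):

```python
# Hypothetical scorer for the adapter above: exact match, with partial
# credit for the fraction of expected tokens present in the output.
# Any function mapping (result, example) to a float works here.
def compute_score(result, example):
    predicted = str(result).strip().lower()
    expected = str(example["expected"]).strip().lower()
    if predicted == expected:
        return 1.0
    expected_tokens = expected.split()
    if not expected_tokens:
        return 0.0
    hits = sum(tok in predicted for tok in expected_tokens)
    return hits / len(expected_tokens)
```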
How It Works¶
Traditional optimizers (RL, evolutionary strategies) collapse rich execution traces into a single scalar reward: they know that a candidate failed, but not why. GEPA takes a different approach. Evaluators return Actionable Side Information (ASI), such as error messages, profiling data, and reasoning logs, and an LLM reads this feedback to diagnose failures and propose targeted fixes. Each mutation inherits the accumulated lessons of all its ancestors in the search tree. GEPA also supports system-aware merge, which combines the strengths of two Pareto-optimal candidates that excel on different tasks.
(Diagram: a candidate from the Pareto front is evaluated on a minibatch, an LLM reflects on the traces, and the prompt is improved.)
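The loop described above can be written out as simplified pseudocode (a rough sketch of the idea, not the library's actual internals; `evaluate` and `reflect_and_rewrite` are hypothetical callables):

```python
import random

def gepa_step(pareto_front, minibatch, evaluate, reflect_and_rewrite):
    """One simplified GEPA iteration: run a Pareto-optimal candidate on a
    minibatch with trace capture, let an LLM diagnose failures from the
    actionable side information, and keep the rewrite if it improves."""
    parent = random.choice(pareto_front)
    parent_eval = evaluate(parent, minibatch, capture_traces=True)
    # The reflection LM reads the traces (errors, logs, scores), not just
    # a scalar reward, and proposes a targeted edit to the prompt text.
    child = reflect_and_rewrite(parent, parent_eval.trajectories, parent_eval.scores)
    child_eval = evaluate(child, minibatch, capture_traces=False)
    if sum(child_eval.scores) > sum(parent_eval.scores):
        pareto_front.append(child)
    return pareto_front
```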
GEPA Case Studies¶
- 90x cost reduction at Databricks
- Self-evolving agents in OpenAI Cookbook
- Core algorithm in Comet ML Opik
- Production incident diagnosis
- Data analysis agents (FireBird)
- Code safety monitoring
- Healthcare multi-agent RAG systems
- 38% OCR error reduction
- Market research AI personas
- AI safety & misalignment detection
- Clinical NLP & medical error detection
- Agent architecture discovery
- Multi-objective optimization
- Adversarial prompt search
GEPA Shines When¶
- **Rollouts are expensive**
    - Scientific simulations, slow compilation
    - Complex agents with long tool calls
    - 100–500 evals vs. 10K+ for RL
- **Data is scarce**
    - New hardware with zero training data
    - Works with as few as 3 examples
    - Rapid prototyping on novel domains
- **Models are API-only**
    - No weights access needed
    - Optimize GPT-5, Claude, Gemini directly
    - No custom infra or fine-tuning pipeline
- **Interpretability matters**
    - Human-readable optimization traces
    - See why each prompt changed
    - Debug complex agent behaviors
GEPA complements RL and fine-tuning
These approaches are not mutually exclusive. Use GEPA for rapid initial optimization (minutes to hours, API-only access), then apply RL or fine-tuning for additional gains, as demonstrated in the BetterTogether / mmGRPO recipes. For scenarios with abundant data and 100,000+ cheap rollouts, gradient-based methods remain effective; GEPA works best when rollouts are expensive, data is scarce, or you need interpretable optimization traces.
Community & Resources¶
Built something with GEPA? We'd love to feature your work.
Citation¶
If you use GEPA in your research, please cite our paper:
```bibtex
@misc{agrawal2025gepareflectivepromptevolution,
  title={GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning},
  author={Lakshya A Agrawal and Shangyin Tan and Dilara Soylu and Noah Ziems and
          Rishi Khare and Krista Opsahl-Ong and Arnav Singhvi and Herumb Shandilya and
          Michael J Ryan and Meng Jiang and Christopher Potts and Koushik Sen and
          Alexandros G. Dimakis and Ion Stoica and Dan Klein and Matei Zaharia and Omar Khattab},
  year={2025},
  eprint={2507.19457},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2507.19457}
}
```