Automatically optimize prompts for any AI system
Frontier Performance, up to 90x cheaper | 35x faster than RL
Used by Shopify, Databricks, and Dropbox
Trusted by teams at
Shopify, OpenAI, Databricks, Pydantic, Dropbox, Comet ML, Weaviate, MLflow, Uber, Meta, AWS, Cerebras, Standard Metrics, The Browser Company, Dria, Prime Intellect, Nubank, Infosys, Invitae, Bespoke AI Labs
What People Are Saying¶
**CEO, Shopify:** "Both DSPy and (especially) GEPA are currently severely under hyped in the AI context engineering world." (View on X)

**Research Engineer, Databricks Mosaic:** "GEPA can push open models beyond frontier performance; gpt-oss-120b + GEPA beats Claude Opus 4.1 while being 90x cheaper." (View on X)

**CEO, Dropbox:** "Have heard great things about DSPy plus GEPA, which is an even stronger prompt optimizer than miprov2 — repo and (fascinating) examples of generated prompts." (View on X)

**CTO, AppSumo:** "DSPy's GEPA is prompt engineering! The only kind we should all collectively be doing. What a work of art." (View on X)

**Official Cookbook:** Self-evolving agents that autonomously retrain themselves using GEPA to improve performance over time. (Read Cookbook)

**VP of Engineering, Dropbox Dash:** "With DSPy [GEPA], you just plug the model in, define your goals, and out spits the prompt that works. So you can do this model switching far more rapidly." (Read More)
Get Started¶
```python
import gepa

# Load your dataset
trainset, valset, _ = gepa.examples.aime.init_dataset()

# Define your initial prompt
seed_prompt = {"system_prompt": "You are a helpful assistant..."}

# Run optimization
result = gepa.optimize(
    seed_candidate=seed_prompt,
    trainset=trainset,
    valset=valset,
    task_lm="openai/gpt-4.1-mini",  # the model whose prompt is optimized
    max_metric_calls=150,           # total evaluation budget
    reflection_lm="openai/gpt-5",   # the model that reasons about failures
)

print(result.best_candidate["system_prompt"])
```
Result: +10% improvement (46.6% → 56.6%) on AIME 2025 with GPT-4.1 Mini
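The optimized prompt is plain text, so shipping it is just a string swap. A minimal sketch of deploying it, assuming the OpenAI Python SDK and the `result` object from the snippet above (the user message is a placeholder):

```python
from openai import OpenAI

client = OpenAI()

# Use the GEPA-optimized system prompt with the same task model.
response = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[
        {"role": "system", "content": result.best_candidate["system_prompt"]},
        {"role": "user", "content": "Your math problem here..."},
    ],
)
print(response.choices[0].message.content)
```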
```python
import dspy

class RAG(dspy.Module):
    def __init__(self):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=3)
        self.generate = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        context = self.retrieve(question).passages
        return self.generate(context=context, question=question)

# Optimize with GEPA
gepa = dspy.GEPA(
    metric=your_metric,
    max_metric_calls=150,
    reflection_lm=dspy.LM("openai/gpt-5"),
)
optimized_rag = gepa.compile(student=RAG(), trainset=trainset, valset=valset)
```
GEPA is built into DSPy! See DSPy tutorials for more.
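The `your_metric` argument above is user-supplied. A minimal sketch of one, assuming an exact-match task; the five-argument signature and the option of returning `dspy.Prediction(score=..., feedback=...)` follow the dspy.GEPA documentation, but treat the details here as illustrative:

```python
def your_metric(gold, pred, trace=None, pred_name=None, pred_trace=None):
    # Illustrative exact-match metric. GEPA can use the textual feedback,
    # not just the numeric score, when reflecting on failures.
    correct = gold.answer.strip().lower() == pred.answer.strip().lower()
    feedback = ("Correct." if correct
                else f"Expected '{gold.answer}' but got '{pred.answer}'.")
    return dspy.Prediction(score=float(correct), feedback=feedback)
```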
```python
from gepa import optimize
from gepa.core.adapter import EvaluationBatch

class MySystemAdapter:
    def evaluate(self, batch, candidate, capture_traces=False):
        outputs, scores, trajectories = [], [], []
        for example in batch:
            prompt = candidate['my_prompt']
            result = my_system.run(prompt, example)  # your system under test
            score = compute_score(result, example)   # your task metric
            outputs.append(result)
            scores.append(score)
            if capture_traces:
                trajectories.append({
                    'input': example,
                    'output': result.output,
                    'steps': result.intermediate_steps,
                    'errors': result.errors,
                })
        return EvaluationBatch(
            outputs=outputs,
            scores=scores,
            trajectories=trajectories if capture_traces else None,
        )

    def make_reflective_dataset(self, candidate, eval_batch, components_to_update):
        reflective_data = {}
        for component in components_to_update:
            reflective_data[component] = []
            for traj, score in zip(eval_batch.trajectories, eval_batch.scores):
                reflective_data[component].append({
                    'Inputs': traj['input'],
                    'Generated Outputs': traj['output'],
                    'Feedback': f"Score: {score}. Errors: {traj['errors']}",
                })
        return reflective_data

result = optimize(
    seed_candidate={'my_prompt': 'Initial prompt...'},
    trainset=my_trainset,
    valset=my_valset,
    adapter=MySystemAdapter(),
    task_lm="openai/gpt-4.1-mini",
)
```
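The `capture_traces` flag is what ties the two methods together: during reflection steps, `optimize` calls `evaluate` with traces enabled, then hands the resulting trajectories to `make_reflective_dataset`, which renders them as the natural-language feedback the reflection model reads. The optimized prompt comes back the same way as in the quick-start example, via `result.best_candidate['my_prompt']`.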
Results¶
- **90x cost reduction**: open-source models beat Claude Opus 4.1 at Databricks
- **35x more efficient** than reinforcement learning methods
- **+10% on AIME 2025**: 46.6% → 56.6% with GPT-4.1 Mini
- **50+ production use cases** across diverse industries
How It Works¶
1. **Select from Pareto front**: pick a candidate that excels on at least some examples.
2. **Run on minibatch**: execute the system and capture full traces.
3. **Reflect with an LLM**: diagnose failures in natural language.
4. **Mutate the prompt**: accumulate lessons from ancestors and new rollouts.
5. **Accept if improved**: add the candidate to the pool and update the Pareto front.

Repeat until convergence.
GEPA uses natural-language reasoning instead of gradients to diagnose failures and improve prompts, and each mutation inherits lessons from all of its ancestors.
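In sketch form, the loop looks roughly like this (illustrative only, with hypothetical `evaluate`, `run_minibatch`, `reflect`, and `mutate` callables standing in for the library's internals):

```python
import random

def pareto_front(pool):
    # Keep every candidate that achieves the best score on at least one example.
    n = len(pool[0]["per_example"])
    best = [max(c["per_example"][i] for c in pool) for i in range(n)]
    return [c for c in pool
            if any(c["per_example"][i] == best[i] for i in range(n))]

def gepa_loop(seed, evaluate, run_minibatch, reflect, mutate, iterations):
    # evaluate(prompt) -> {"prompt": ..., "per_example": [...], "score": float}
    pool = [evaluate(seed)]
    for _ in range(iterations):
        parent = random.choice(pareto_front(pool))           # 1. select from Pareto front
        traces = run_minibatch(parent["prompt"])             # 2. execute & capture traces
        lessons = reflect(traces)                            # 3. LLM diagnoses failures
        child = evaluate(mutate(parent["prompt"], lessons))  # 4. mutate the prompt
        if child["score"] > parent["score"]:                 # 5. accept if improved,
            pool.append(child)                               #    updating the front
    return max(pool, key=lambda c: c["score"])
```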
Based on research from UC Berkeley, Stanford, MIT & Databricks.
Read the paper →
When to Use GEPA¶
Enterprise & Production
- 90x cost reduction at Databricks
- Self-evolving agents in OpenAI Cookbook
- Core algorithm in Comet ML Opik
AI Coding Agents
- Production incident diagnosis
- Data analysis agents (FireBird)
- Code safety monitoring
Domain-Specific
- Healthcare multi-agent RAG systems
- 38% OCR error reduction
- Market research AI personas
Research & Advanced
- Multi-objective optimization
- Agent architecture discovery
- Adversarial prompt search
Why Choose GEPA?¶
| Feature | GEPA | Reinforcement Learning | Manual Prompting |
|---|---|---|---|
| Cost | Low | Very High | Low |
| Sample Efficiency | High (150 calls) | Low (10K+ calls) | N/A |
| Performance | SOTA | SOTA | Suboptimal |
| Interpretability | Natural Language | Black Box | Clear |
| Setup Time | Minutes | Days/Weeks | Minutes |
| Framework Support | Any System | Framework Specific | Any System |
| Multi-Objective | Native | Complex | Manual |
Community & Resources¶
- **Quickstart**: Get up and running in minutes
- **Tutorials**: Step-by-step guides
- **API Reference**: Full documentation
- **Discord**: Join 1,000+ developers
- **GitHub**: Star, contribute, report issues
- **Twitter/X**: Updates and highlights
Built something with GEPA? We'd love to feature your work.
Citation¶
If you use GEPA in your research, please cite our paper:
```bibtex
@misc{agrawal2025gepareflectivepromptevolution,
  title={GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning},
  author={Lakshya A Agrawal and Shangyin Tan and Dilara Soylu and Noah Ziems and
          Rishi Khare and Krista Opsahl-Ong and Arnav Singhvi and Herumb Shandilya and
          Michael J Ryan and Meng Jiang and Christopher Potts and Koushik Sen and
          Alexandros G. Dimakis and Ion Stoica and Dan Klein and Matei Zaharia and Omar Khattab},
  year={2025},
  eprint={2507.19457},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2507.19457}
}
```