optimize¶
gepa.api.optimize(seed_candidate: dict[str, str], trainset: list[DataInst] | DataLoader[DataId, DataInst], valset: list[DataInst] | DataLoader[DataId, DataInst] | None = None, adapter: GEPAAdapter[DataInst, Trajectory, RolloutOutput] | None = None, task_lm: str | ChatCompletionCallable | None = None, evaluator: Evaluator | None = None, reflection_lm: LanguageModel | str | None = None, candidate_selection_strategy: CandidateSelector | Literal['pareto', 'current_best', 'epsilon_greedy'] = 'pareto', frontier_type: FrontierType = 'instance', skip_perfect_score: bool = True, batch_sampler: BatchSampler | Literal['epoch_shuffled'] = 'epoch_shuffled', reflection_minibatch_size: int | None = None, perfect_score: float = 1.0, reflection_prompt_template: str | None = None, module_selector: ReflectionComponentSelector | str = 'round_robin', use_merge: bool = False, max_merge_invocations: int = 5, merge_val_overlap_floor: int = 5, max_metric_calls: int | None = None, stop_callbacks: StopperProtocol | Sequence[StopperProtocol] | None = None, logger: LoggerProtocol | None = None, run_dir: str | None = None, callbacks: list[GEPACallback] | None = None, use_wandb: bool = False, wandb_api_key: str | None = None, wandb_init_kwargs: dict[str, Any] | None = None, use_mlflow: bool = False, mlflow_tracking_uri: str | None = None, mlflow_experiment_name: str | None = None, track_best_outputs: bool = False, display_progress_bar: bool = False, use_cloudpickle: bool = False, cache_evaluation: bool = False, seed: int = 0, raise_on_exception: bool = True, val_evaluation_policy: EvaluationPolicy[DataId, DataInst] | Literal['full_eval'] | None = None) -> GEPAResult[RolloutOutput, DataId]
¶
GEPA is an evolutionary optimizer that evolves (multiple) text components of a complex system to optimize them towards a given metric. GEPA can also leverage rich textual feedback obtained from the system's execution environment, evaluation, and the system's own execution traces to iteratively improve the system's performance.
Concepts:
- System: A harness that uses text components to perform a task. Each text component to be optimized is identified by a name within the system.
- Candidate: A mapping from component names to component text. A concrete instantiation of the system is realized by setting each component's text to the value provided by the candidate mapping.
- DataInst: An (uninterpreted) data type over which the system operates.
- RolloutOutput: The output of the system on a DataInst.
Each execution of the system produces a RolloutOutput, which can be evaluated to produce a score. The execution of the system also produces a trajectory,
which consists of the operations performed by different components of the system, including the text of the components that were executed.
GEPA can be applied to optimize any system that uses text components (e.g., prompts in an AI system, code snippets/files/functions/classes in a codebase, etc.).
In order for GEPA to plug into your system's environment, GEPA requires an adapter (GEPAAdapter) to be implemented. The adapter is responsible for:
1. Evaluating a proposed candidate on a batch of inputs.
- The adapter receives a candidate proposed by GEPA, along with a batch of inputs selected from the training/validation set.
- The adapter instantiates the system with the texts proposed in the candidate.
- The adapter then evaluates the candidate on the batch of inputs, and returns the scores.
- The adapter should also capture relevant information from the execution of the candidate, like system and evaluation traces.
2. Identifying textual information relevant to a component of the candidate.
- Given the trajectories captured during the execution of the candidate, GEPA selects a component of the candidate to update.
- The adapter receives the candidate, the batch of inputs, and the trajectories captured during the execution of the candidate.
- The adapter is responsible for identifying the textual information relevant to the component to update.
- This information is used by GEPA to reflect on the performance of the component and propose new component texts.
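The two adapter responsibilities above can be sketched with a toy adapter. The class below is a loose illustration only: the method names, argument lists, and return shapes are assumptions for exposition, not the actual GEPAAdapter interface.

```python
# Illustrative sketch of the two adapter responsibilities described above.
# Method names and return shapes are assumptions, not the real GEPAAdapter API.

class ToyAdapter:
    """Evaluates a single-component 'system_prompt' candidate on QA pairs."""

    def evaluate(self, candidate: dict[str, str], batch: list[dict]) -> dict:
        scores, trajectories = [], []
        for example in batch:
            # 1. Instantiate the system with the candidate's component text.
            prompt = candidate["system_prompt"]
            output = f"{prompt} -> answering {example['question']}"
            # Toy metric: 1.0 if the expected answer appears in the output.
            score = 1.0 if example["answer"] in output else 0.0
            scores.append(score)
            # Capture a trace so reflection can later see what happened.
            trajectories.append({"input": example, "output": output, "score": score})
        return {"scores": scores, "trajectories": trajectories}

    def make_reflective_dataset(self, candidate, batch, trajectories, component):
        # 2. Surface the textual information relevant to the component
        #    being updated, so the reflection LM can propose new text.
        return [{"component_text": candidate[component], **t} for t in trajectories]
```

A real adapter would run the actual system (LLM calls, code execution, etc.) instead of formatting a string, but the shape of the contract is the same: score a batch, keep traces, and expose component-relevant text for reflection.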
At each iteration, GEPA proposes a new candidate using one of the following strategies:
1. Reflective mutation: GEPA proposes a new candidate by mutating the current candidate, leveraging rich textual feedback.
2. Merge: GEPA proposes a new candidate by merging two candidates that are on the Pareto frontier.
GEPA also tracks the Pareto frontier of performance achieved by different candidates on the validation set. This way, it can leverage candidates that work well on a subset of inputs to improve the system's performance on the entire validation set, by evolving from the Pareto frontier.
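The per-instance Pareto frontier can be illustrated with a small score matrix: a candidate stays on the frontier if it achieves the best score on at least one validation example. This is a conceptual sketch, not GEPA's internal implementation.

```python
def instance_pareto_frontier(val_scores: dict[str, list[float]]) -> set[str]:
    """Return candidates that achieve the best score on at least one
    validation instance. val_scores maps candidate name -> per-example
    scores, with the same example ordering for every candidate."""
    n_examples = len(next(iter(val_scores.values())))
    frontier = set()
    for i in range(n_examples):
        best = max(scores[i] for scores in val_scores.values())
        for name, scores in val_scores.items():
            if scores[i] == best:
                frontier.add(name)
    return frontier

scores = {
    "seed":    [0.2, 0.9, 0.1],
    "mutant1": [0.8, 0.4, 0.1],
    "mutant2": [0.5, 0.5, 0.7],
}
# Each candidate wins a different example, so all three sit on the
# per-instance frontier and remain eligible parents for evolution.
```

This is why a candidate that only helps on a subset of inputs is not discarded: it stays on the frontier and can later be mutated or merged into a candidate that does well everywhere.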
Parameters:
- seed_candidate: The initial candidate to start with.
- trainset: Training data supplied as an in-memory sequence or a DataLoader yielding batches for reflective updates.
- valset: Validation data source (sequence or DataLoader) used for tracking Pareto scores. If not provided, GEPA reuses the trainset.
- adapter: A GEPAAdapter instance that implements the adapter interface. This allows GEPA to plug into your system's environment. If not provided, GEPA will use a default adapter (gepa.adapters.default_adapter.default_adapter.DefaultAdapter), with the model defined by task_lm.
- task_lm: Optional. The model to use for the task. This is only used if adapter is not provided, and is used to initialize the default adapter.
- evaluator: Optional. A custom evaluator to use for evaluating the candidate program. If not provided, GEPA will use the default evaluator: gepa.adapters.default_adapter.default_adapter.ContainsAnswerEvaluator. Only used if adapter is not provided.
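As a rough picture of what a contains-answer metric does (the actual ContainsAnswerEvaluator may normalize or match differently), a minimal scoring function could look like:

```python
def contains_answer_score(output: str, reference: str) -> float:
    """Toy contains-answer metric: 1.0 if the reference string appears
    (case-insensitively) in the model output, else 0.0. Illustrative only;
    the real default evaluator's behavior may differ."""
    return 1.0 if reference.lower() in output.lower() else 0.0
```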
Reflection-based configuration¶
- reflection_lm: A LanguageModel instance used to reflect on the performance of the candidate program.
- candidate_selection_strategy: The strategy to use for selecting the candidate to update. Supported strategies: 'pareto', 'current_best', 'epsilon_greedy'. Defaults to 'pareto'.
- frontier_type: Strategy for tracking Pareto frontiers. 'instance' tracks per validation example, 'objective' tracks per objective metric, 'hybrid' combines both, 'cartesian' tracks per (example, objective) pair. Defaults to 'instance'.
- skip_perfect_score: Whether to skip updating the candidate if it achieves a perfect score on the minibatch.
- batch_sampler: Strategy for selecting training examples. Can be a BatchSampler instance or a string for a predefined strategy from ['epoch_shuffled']. Defaults to 'epoch_shuffled', which creates an EpochShuffledBatchSampler.
- reflection_minibatch_size: The number of examples to use for reflection in each proposal step. Defaults to 3. Only valid when batch_sampler='epoch_shuffled' (default), and is ignored otherwise.
- perfect_score: The score treated as perfect (defaults to 1.0). Used together with skip_perfect_score to decide when a minibatch needs no further improvement.
- reflection_prompt_template: The prompt template to use for reflection. If not provided, GEPA will use the default prompt template (see InstructionProposalSignature). The prompt template must contain the following placeholders, which will be replaced with actual values: <curr_instructions> (replaced by the instructions to evolve) and <inputs_outputs_feedback> (replaced with the inputs, outputs, and feedback generated with the current instruction). This will be ignored if the adapter provides its own propose_new_texts method.
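The placeholder contract can be exercised with plain string substitution. Only the two placeholder names come from this documentation; the surrounding template text below is illustrative.

```python
# Illustrative custom template; only the two placeholder names
# (<curr_instructions>, <inputs_outputs_feedback>) are required by GEPA.
template = """You are refining one component of a larger system.

Current instructions:
<curr_instructions>

Observed inputs, outputs, and feedback:
<inputs_outputs_feedback>

Propose improved instructions for this component."""

def render(template: str, curr: str, feedback: str) -> str:
    # GEPA fills the two required placeholders with actual values.
    return (template
            .replace("<curr_instructions>", curr)
            .replace("<inputs_outputs_feedback>", feedback))
```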
Component selection configuration¶
- module_selector: Component selection strategy. Can be a ReflectionComponentSelector instance or a string ('round_robin', 'all'). Defaults to 'round_robin'. The 'round_robin' strategy cycles through components in order. The 'all' strategy selects all components for modification in every GEPA iteration.
Merge-based configuration¶
- use_merge: Whether to use the merge strategy.
- max_merge_invocations: The maximum number of merge invocations to perform.
- merge_val_overlap_floor: Minimum number of shared validation ids required between parents before attempting a merge subsample. Only relevant when using a val_evaluation_policy other than 'full_eval'.
Budget and Stop Condition¶
- max_metric_calls: Optional maximum number of metric calls to perform. If not provided, stop_callbacks must be provided.
- stop_callbacks: Optional stopper(s) that return True when optimization should stop. Can be a single StopperProtocol or a list or tuple of StopperProtocol instances. Examples: FileStopper, TimeoutStopCondition, SignalStopper, NoImprovementStopper, or custom stopping logic. If not provided, max_metric_calls must be provided.
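A custom stopper can be sketched as follows, assuming a stopper is a callable that receives the current optimizer state and returns True to halt (the actual StopperProtocol signature may differ; see the gepa source for the exact protocol).

```python
import time

class WallClockStopper:
    """Stop the optimization after a wall-clock budget.

    Assumes a stopper is a callable taking the optimizer state and
    returning True to stop; the real StopperProtocol may differ.
    """

    def __init__(self, budget_seconds: float):
        self.deadline = time.monotonic() + budget_seconds

    def __call__(self, state=None) -> bool:
        # True once the budget is exhausted, regardless of state.
        return time.monotonic() >= self.deadline
```

Such a stopper could be passed alone or in a list alongside others via stop_callbacks, in which case max_metric_calls becomes optional.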
Logging and Callbacks¶
- logger: A LoggerProtocol instance used to log the progress of the optimization.
- callbacks: Optional list of callback objects for observing optimization progress. Callbacks receive events like on_optimization_start, on_iteration_start, on_candidate_accepted, etc. See gepa.core.callbacks.GEPACallback for the full protocol.
- run_dir: The directory to save results to. Optimization state and results will be saved to this directory. If the directory already exists, GEPA will read the state from it and resume the optimization from the last saved state. If provided, a FileStopper is automatically created that checks for a "gepa.stop" file in this directory, allowing graceful stopping of the optimization process.
- use_wandb: Whether to use Weights and Biases to log the progress of the optimization.
- wandb_api_key: The API key to use for Weights and Biases.
- wandb_init_kwargs: Additional keyword arguments to pass to the Weights and Biases initialization.
- use_mlflow: Whether to use MLflow to log the progress of the optimization. Both wandb and mlflow can be used simultaneously if desired.
- mlflow_tracking_uri: The tracking URI to use for MLflow.
- mlflow_experiment_name: The experiment name to use for MLflow.
- track_best_outputs: Whether to track the best outputs on the validation set. If True, GEPAResult will contain the best outputs obtained for each task in the validation set.
- display_progress_bar: Show a tqdm progress bar over metric calls when enabled.
- use_cloudpickle: Use cloudpickle instead of pickle. This can be helpful when the serialized state contains dynamically generated DSPy signatures.
Evaluation caching¶
- cache_evaluation: Whether to cache the (score, output, objective_scores) of (candidate, example) pairs. If True and a cache entry exists, GEPA will skip the fitness evaluation and use the cached results. This helps avoid redundant evaluations and saves metric calls. Defaults to False.
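Conceptually, the cache keys results by the (candidate, example) pair, since the same component-text mapping always instantiates the same system. A minimal sketch (not GEPA's actual cache implementation):

```python
class EvaluationCache:
    """Memoize (score, output, objective_scores) per (candidate, example).

    Illustrative sketch: the candidate is keyed by its sorted component
    texts, so identical mappings hit the same cache entry.
    """

    def __init__(self):
        self._store = {}

    def _key(self, candidate: dict[str, str], example_id) -> tuple:
        return (tuple(sorted(candidate.items())), example_id)

    def get(self, candidate, example_id):
        # Returns None on a cache miss.
        return self._store.get(self._key(candidate, example_id))

    def put(self, candidate, example_id, result):
        self._store[self._key(candidate, example_id)] = result
```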
Reproducibility¶
- seed: The seed to use for the random number generator.
- val_evaluation_policy: Strategy controlling which validation ids to score each iteration and which candidate is currently best. The only supported string is "full_eval" (evaluate every id each time); passing None defaults to "full_eval".
- raise_on_exception: Whether to propagate proposer/evaluator exceptions instead of stopping gracefully.
Source code in gepa/api.py