EvaluationBatch
gepa.core.adapter.EvaluationBatch(outputs: list[RolloutOutput], scores: list[float], trajectories: list[Trajectory] | None = None, objective_scores: list[dict[str, float]] | None = None)
dataclass
Bases: Generic[Trajectory, RolloutOutput]
Container for the result of evaluating a proposed candidate on a batch of data.
- outputs: raw per-example outputs from executing the candidate. GEPA does not interpret these; they are forwarded as-is to other parts of the user's code or to logging.
- scores: per-example numeric scores (floats). GEPA sums these for minibatch acceptance and averages them over the full validation set for tracking/Pareto fronts.
- trajectories: optional per-example traces used by make_reflective_dataset to build a reflective dataset (see GEPAAdapter.make_reflective_dataset). If capture_traces=True is passed to evaluate, trajectories should be provided and align one-to-one with outputs and scores.
- objective_scores: optional per-example maps of objective name -> score. Leave None when the evaluator does not expose multi-objective metrics.
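The fields above could be populated roughly as in the following sketch. To keep it runnable without GEPA installed, it defines a local stand-in dataclass that mirrors the documented signature; in real code the class would be imported from gepa.core.adapter. The exact-match task, the evaluate_exact_match helper, and the dict-shaped trajectories are hypothetical and only illustrate the one-to-one alignment of outputs, scores, and trajectories.

```python
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

Trajectory = TypeVar("Trajectory")
RolloutOutput = TypeVar("RolloutOutput")


@dataclass
class EvaluationBatch(Generic[Trajectory, RolloutOutput]):
    # Local stand-in that mirrors the documented fields (assumption: the real
    # class lives in gepa.core.adapter and should be imported from there).
    outputs: list[RolloutOutput]
    scores: list[float]
    trajectories: Optional[list[Trajectory]] = None
    objective_scores: Optional[list[dict[str, float]]] = None


def evaluate_exact_match(answers, expected, capture_traces=False):
    """Hypothetical evaluator: score 1.0 for an exact match, else 0.0."""
    outputs = list(answers)
    scores = [1.0 if a == e else 0.0 for a, e in zip(answers, expected)]

    # Only build traces when requested; they align index-by-index with
    # outputs and scores.
    trajectories = None
    if capture_traces:
        trajectories = [
            {"answer": a, "expected": e} for a, e in zip(answers, expected)
        ]

    # objective_scores stays None: this evaluator has a single metric.
    return EvaluationBatch(
        outputs=outputs, scores=scores, trajectories=trajectories
    )


batch = evaluate_exact_match(
    ["4", "Paris", "blue"], ["4", "Paris", "red"], capture_traces=True
)

# The two aggregations described above: a sum for minibatch acceptance and a
# mean for validation tracking.
minibatch_total = sum(batch.scores)
validation_mean = sum(batch.scores) / len(batch.scores)
```

Calling the helper with capture_traces=False leaves trajectories as None, which matches the optional default in the signature.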