EvaluationBatch

gepa.core.adapter.EvaluationBatch(outputs: list[RolloutOutput], scores: list[float], trajectories: list[Trajectory] | None = None, objective_scores: list[dict[str, float]] | None = None) dataclass

Bases: Generic[Trajectory, RolloutOutput]

Container for the result of evaluating a proposed candidate on a batch of data.

  • outputs: raw per-example outputs produced by executing the candidate. GEPA does not interpret these; they are forwarded as-is to other parts of the user's code or to logging.
  • scores: per-example numeric scores (floats). GEPA sums these for minibatch acceptance and averages them over the full validation set for tracking/pareto fronts.
  • trajectories: optional per-example traces used by make_reflective_dataset to build a reflective dataset (see GEPAAdapter.make_reflective_dataset). If capture_traces=True is passed to evaluate, trajectories should be provided and must align one-to-one with outputs and scores.
  • objective_scores: optional per-example maps of objective name -> score. Leave None when the evaluator does not expose multi-objective metrics.
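To make the field invariants concrete, here is a minimal sketch of constructing an EvaluationBatch. The dataclass definition below mirrors the signature shown above for illustration (so the snippet is self-contained); in practice you would import EvaluationBatch from gepa.core.adapter, and the example outputs and trajectories are hypothetical values an adapter's evaluate might return:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Generic, TypeVar

Trajectory = TypeVar("Trajectory")
RolloutOutput = TypeVar("RolloutOutput")


@dataclass
class EvaluationBatch(Generic[Trajectory, RolloutOutput]):
    # Local stand-in mirroring gepa.core.adapter.EvaluationBatch.
    outputs: list[RolloutOutput]
    scores: list[float]
    trajectories: list[Trajectory] | None = None
    objective_scores: list[dict[str, float]] | None = None


# Hypothetical results for a 2-example minibatch evaluated with
# capture_traces=True, so trajectories align one-to-one with outputs/scores.
batch = EvaluationBatch(
    outputs=["answer A", "answer B"],  # opaque to GEPA, forwarded as-is
    scores=[1.0, 0.0],                 # summed for minibatch acceptance
    trajectories=[
        {"prompt": "q1", "response": "answer A"},
        {"prompt": "q2", "response": "answer B"},
    ],
)

# The one-to-one alignment GEPA relies on:
assert len(batch.outputs) == len(batch.scores) == len(batch.trajectories)
```

When the evaluator exposes no multi-objective metrics, objective_scores is simply left at its default of None, as in this example.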

Attributes

outputs: list[RolloutOutput] instance-attribute

scores: list[float] instance-attribute

trajectories: list[Trajectory] | None = None class-attribute instance-attribute

objective_scores: list[dict[str, float]] | None = None class-attribute instance-attribute

Functions