EvaluationBatch

gepa.core.adapter.EvaluationBatch(outputs: list[RolloutOutput], scores: list[float], trajectories: list[Trajectory] | None = None, objective_scores: list[dict[str, float]] | None = None) dataclass

Bases: Generic[Trajectory, RolloutOutput]

Container for the result of evaluating a proposed candidate on a batch of data.

  • outputs: raw per-example outputs produced by executing the candidate. GEPA does not interpret these; they are forwarded as-is to other parts of the user's code or to logging.
  • scores: per-example numeric scores (floats). GEPA sums these for minibatch acceptance and averages them over the full validation set for tracking/pareto fronts.
  • trajectories: optional per-example traces used by make_reflective_dataset to build a reflective dataset (see GEPAAdapter.make_reflective_dataset). If capture_traces=True is passed to evaluate, trajectories should be provided and must align one-to-one with outputs and scores.
  • objective_scores: optional per-example maps of objective name -> score. Leave None when the evaluator does not expose multi-objective metrics.
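To make the field invariants concrete, here is a minimal sketch of constructing an EvaluationBatch. The dataclass definition below mirrors the signature shown above for illustration (so the snippet is self-contained); in practice you would import EvaluationBatch from gepa.core.adapter, and the example outputs and trajectories are hypothetical values an adapter's evaluate might return:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Generic, TypeVar

Trajectory = TypeVar("Trajectory")
RolloutOutput = TypeVar("RolloutOutput")


@dataclass
class EvaluationBatch(Generic[Trajectory, RolloutOutput]):
    # Local stand-in mirroring gepa.core.adapter.EvaluationBatch.
    outputs: list[RolloutOutput]
    scores: list[float]
    trajectories: list[Trajectory] | None = None
    objective_scores: list[dict[str, float]] | None = None


# Hypothetical results for a 2-example minibatch evaluated with
# capture_traces=True, so trajectories align one-to-one with outputs/scores.
batch = EvaluationBatch(
    outputs=["answer A", "answer B"],  # opaque to GEPA, forwarded as-is
    scores=[1.0, 0.0],                 # summed for minibatch acceptance
    trajectories=[
        {"prompt": "q1", "response": "answer A"},
        {"prompt": "q2", "response": "answer B"},
    ],
)

# The one-to-one alignment GEPA relies on:
assert len(batch.outputs) == len(batch.scores) == len(batch.trajectories)
```

When the evaluator exposes no multi-objective metrics, objective_scores is simply left at its default of None, as in this example.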

Attributes

outputs: list[RolloutOutput] instance-attribute

scores: list[float] instance-attribute

trajectories: list[Trajectory] | None = None class-attribute instance-attribute

objective_scores: list[dict[str, float]] | None = None class-attribute instance-attribute

Functions