Evaluator
gepa.optimize_anything.Evaluator
Bases: Protocol
Functions
__call__(candidate: str | Candidate, example: object | None = None, **kwargs: Any) -> tuple[float, SideInfo] | float
Score a candidate, returning a score and diagnostic side information (ASI).
This is the function you write. GEPA calls it repeatedly with mutated candidates and collects the returned scores and diagnostics to drive the optimization loop.
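Because Evaluator is a typing.Protocol, you never subclass it: any callable with a compatible signature conforms. Below is a minimal sketch, assuming `Candidate` and `SideInfo` can be imported from `gepa.optimize_anything` (they appear in the signature above, but the exact import path is an assumption):

```python
from __future__ import annotations

from typing import Any

# Assumption: these names are exported by gepa.optimize_anything.
from gepa.optimize_anything import Candidate, SideInfo


def evaluate(
    candidate: str | Candidate,
    example: object | None = None,
    **kwargs: Any,
) -> tuple[float, SideInfo] | float:
    # Placeholder scoring logic; GEPA only consumes the returned score and
    # optional side information.
    score = 1.0
    return score, {"Note": "replace with real diagnostics"}
```

A plain function like this already satisfies the protocol and can be passed wherever an Evaluator is expected.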
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| candidate | str \| Candidate | The text parameter(s) to evaluate. Type matches the seed candidate you passed in. | required |
| example | object \| None | One item from `dataset`; `None` in single-task mode. | None |
| opt_state | | (optional) Declare `opt_state` in your evaluator signature to receive the optimizer state. | required |
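As a sketch of the optional `opt_state` parameter, the evaluator below declares it and treats it as opaque, since its contents are not documented in this section; the length-based metric is a hypothetical placeholder:

```python
import gepa.optimize_anything as oa


def evaluate(candidate, example=None, opt_state=None):
    # Hypothetical placeholder metric: reward longer candidate text, capped at 1.0.
    text = candidate if isinstance(candidate, str) else candidate.get("prompt", "")
    score = min(len(text) / 100.0, 1.0)
    if opt_state is not None:
        # Only present when GEPA passes it; not inspected further here.
        oa.log("optimizer state was provided")
    return score, {"Candidate length": len(text)}
```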
Returns:

| Type | Description |
|---|---|
| tuple[float, SideInfo] \| float | Either a (score, side_info) tuple or a bare float score. If you return score-only, use `oa.log()` to record side information (ASI). |
Evaluator signature per mode:
Single-Task Search (dataset=None) — called without example:

```python
import gepa.optimize_anything as oa

def evaluate(candidate):
    result = run_code(candidate["code"])
    oa.log(f"Output: {result}")  # ASI via oa.log()
    return compute_score(result)
```
Multi-Task Search / Generalization (dataset provided) — called per example:

```python
def evaluate(candidate, example):
    pred = run(candidate["prompt"], example["input"])
    score = 1.0 if pred == example["expected"] else 0.0
    return score, {"Input": example["input"], "Output": pred}
```
Reserved side_info keys
"log", "stdout", and "stderr" are used by GEPA's automatic capture. If you use them yourself, GEPA stores its captured output under "_gepa_log" etc. to avoid collisions.
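A small sketch of how this collision rule plays out; only the documented "_gepa_log" remapping is referenced, and the scoring is a placeholder:

```python
import gepa.optimize_anything as oa


def evaluate(candidate, example=None):
    print("running candidate...")      # stdout is part of GEPA's automatic capture
    oa.log("scored with a placeholder metric")

    # This side_info uses the reserved "log" key, so GEPA keeps this value as-is
    # and stores its own captured output under "_gepa_log" to avoid a collision.
    return 1.0, {"log": "my custom log entry"}
```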