Skip to content

Cost & Token Tracking

GEPA tracks cumulative cost and token usage on the reflection LM automatically. No configuration needed — every call through the LM accumulates metrics that you can read at any time.


Reading cost and tokens

After optimization, read directly from the LM instance:

import gepa
from gepa.lm import LM

reflection_lm = LM("openai/gpt-4.1")

result = gepa.optimize(
    seed_candidate={"instructions": "..."},
    trainset=trainset,
    reflection_lm=reflection_lm,
    max_metric_calls=500,
)

print(f"Total cost: ${reflection_lm.total_cost:.4f}")
print(f"Tokens in:  {reflection_lm.total_tokens_in:,}")
print(f"Tokens out: {reflection_lm.total_tokens_out:,}")

The same works with optimize_anything:

import gepa.optimize_anything as oa
from gepa.lm import LM

lm = LM("openai/gpt-5.1")

result = oa.optimize_anything(
    seed_candidate="...",
    evaluator=my_evaluator,
    config=oa.GEPAConfig(
        reflection=oa.ReflectionConfig(reflection_lm=lm),
    ),
)

print(f"Total cost: ${lm.total_cost:.4f}")

What gets tracked

LM type total_cost total_tokens_in total_tokens_out
String model name (e.g. "openai/gpt-4.1") Exact via litellm.completion_cost Exact from API usage Exact from API usage
LM instance passed directly Exact Exact Exact
Plain callable (e.g. lambda p: ...) 0.0 Estimated (~4 chars/token) Estimated (~4 chars/token)

When you pass a string model name, GEPA creates an LM instance internally. When you pass a plain callable, GEPA wraps it in a TrackingLM that estimates tokens from string length. In both cases, the tracking attributes are available on the object GEPA actually uses.

Keep a reference to the LM

If you pass a string like reflection_lm="openai/gpt-4.1", GEPA creates the LM internally and you won't have a reference to read cost from. To access cost after optimization, create the LM yourself and pass the instance.


Stopping on cost budget

Use max_reflection_cost to stop optimization when the reflection LM's cumulative cost reaches a USD budget:

result = gepa.optimize(
    ...,
    reflection_lm="openai/gpt-4.1",
    max_reflection_cost=5.00,  # stop after $5 of reflection calls
)

Or in optimize_anything:

config = oa.GEPAConfig(
    engine=oa.EngineConfig(
        max_reflection_cost=10.00,
    ),
)

This reads lm.total_cost at each iteration. For plain callables (which report cost as 0.0), the stopper never triggers — it only stops on costs the LM actually reports.

max_reflection_cost can be combined with max_metric_calls — optimization stops when either budget is reached first.