## Installation

```bash
pip install trainly
```

## Quick Start

```python
from trainly import TrainlyClient

client = TrainlyClient(
    api_key="tk_...",
    project_id="proj_..."
)
```

You can also set credentials via environment variables:

```bash
export TRAINLY_API_KEY="tk_..."
export TRAINLY_PROJECT_ID="proj_..."
```

```python
client = TrainlyClient()  # reads from env
```

## `@observe` Decorator

The `@observe` decorator is the primary way to instrument your AI calls. It automatically captures inputs, outputs, latency, and exceptions.

```python
@client.observe(model="gpt-4o", tags=["production"])
def summarize(text: str) -> str:
    return openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": text}]
    ).choices[0].message.content
```

### Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `model` | `str` | Model name (e.g. `"gpt-4o"`, `"claude-sonnet-4-20250514"`) |
| `tags` | `list[str]` | Filterable labels |
| `expected_output` | `str` | Ground truth for evaluation |
| `trace_id` | `str` | Custom trace identifier |
| `metadata` | `dict` | Arbitrary key-value pairs |
| `version` | `str` | Code or prompt version tag |
| `custom_attributes` | `dict` | Additional structured data |
| `span_name` | `str` | Override the default span name |
| `capture_exceptions` | `bool` | Log exceptions as trace errors (default `True`) |
| `session_id` | `str` | Group traces into a session |

### With All Options

```python
@client.observe(
    model="gpt-4o",
    tags=["staging", "summarization"],
    version="v2.1",
    metadata={"team": "ml-ops"},
    capture_exceptions=True,
    session_id="session_abc123"
)
def generate_summary(article: str) -> str:
    return call_llm(article)
```

## Manual Logging

Use `client.observe.log()` when you need full control over what gets recorded.

```python
import time

start = time.time()
result = call_my_model(prompt)
latency = (time.time() - start) * 1000

client.observe.log(
    input=prompt,
    output=result,
    model="gpt-4o",
    latency_ms=latency,
    tags=["batch-job"],
    token_usage={"prompt_tokens": 120, "completion_tokens": 45},
    status="success",
    cost=0.0023,
    metadata={"run_id": "abc"},
)
```

### Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `input` | `str` | The input sent to the model |
| `output` | `str` | The model's response |
| `model` | `str` | Model identifier |
| `latency_ms` | `float` | Execution time in milliseconds |
| `tags` | `list[str]` | Filterable labels |
| `expected_output` | `str` | Ground truth for evaluation |
| `trace_id` | `str` | Custom trace identifier |
| `token_usage` | `dict` | Token counts (`prompt_tokens`, `completion_tokens`) |
| `metadata` | `dict` | Arbitrary key-value pairs |
| `status` | `str` | `"success"` or `"error"` |
| `error` | `str` | Error message if applicable |
| `version` | `str` | Version tag |
| `custom_attributes` | `dict` | Additional structured data |
| `tool_calls` | `list[dict]` | Tool/function calls made during execution |
| `input_structured` | `dict` | Structured input (e.g. message arrays) |
| `output_structured` | `dict` | Structured output (e.g. parsed JSON) |
| `spans` | `list[dict]` | Sub-step span data |
| `cost` | `float` | Estimated cost in USD |
| `session_id` | `str` | Session grouping identifier |

## Sessions

Group related traces into a session using the `agent_session()` context manager. This is useful for multi-step agent workflows.

```python
with client.observe.agent_session() as session:
    plan = planner(task)       # trace 1
    result = executor(plan)    # trace 2
    review = reviewer(result)  # trace 3
    # All three traces share the same session_id
```

## Spans

Break a single trace into sub-steps with the `span()` context manager.

```python
@client.observe(model="gpt-4o")
def agent_pipeline(query: str) -> str:
    with client.observe.span("retrieval", kind="retriever") as span:
        span.set_input(query)
        docs = search_index(query)
        span.set_output(docs)
        span.set_attribute("num_results", len(docs))

    with client.observe.span("generation", kind="llm") as span:
        span.set_input(docs)
        span.set_model("gpt-4o")
        answer = generate(docs, query)
        span.set_output(answer)
        span.set_token_usage({"prompt_tokens": 200, "completion_tokens": 80})
        span.set_cost(0.004)

    return answer
```

### Span Methods

| Method | Description |
| --- | --- |
| `set_input(value)` | Record the span's input |
| `set_output(value)` | Record the span's output |
| `set_attribute(key, value)` | Attach a custom attribute |
| `set_model(name)` | Set the model used in this span |
| `set_token_usage(usage)` | Record token counts |
| `set_cost(amount)` | Record cost in USD |

## Scoring

Score traces for evaluation, either manually or with an AI judge.

```python
client.score(
    trace_id="trace_abc123",
    name="accuracy",
    value=0.95,
    comment="Output matched expected format"
)
```

## Prompt Management

Manage versioned prompts and build them with template variables.

```python
prompt = client.get_prompt(slug="summarize-v2", version="latest")

rendered = prompt.build(
    topic="quarterly earnings",
    tone="professional",
    max_length="200 words"
)
```
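The fetch-then-build steps compose naturally into a small helper. A sketch, assuming `prompt.build(**variables)` returns the rendered string (the helper itself is not part of the SDK):

```python
def render_prompt(client, slug: str, **variables) -> str:
    """Fetch the latest version of a prompt and fill in its template variables."""
    prompt = client.get_prompt(slug=slug, version="latest")
    return prompt.build(**variables)
```

The rendered string can then be passed straight into an `@observe`-decorated model call.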

## Analytics

Access trace analytics and cost data programmatically.

```python
analytics = client.analytics

# Get traces with filters
traces = analytics.get_query_traces(
    tags=["production"],
    model="gpt-4o",
    limit=50
)

# Get details for a specific trace
detail = analytics.get_trace_details(trace_id="trace_abc123")

# Aggregate metrics
summary = analytics.get_metrics_summary(
    start_date="2026-04-01",
    end_date="2026-04-07"
)

# Cost and performance breakdowns
costs = analytics.get_cost_breakdown(group_by="model")
perf = analytics.get_performance_stats(tags=["production"])

# Export logs as CSV/JSON
analytics.export_query_logs(format="csv", output_path="traces.csv")
```
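Results from `get_query_traces` can also feed ad-hoc aggregation when the built-in breakdowns do not fit. A sketch that assumes each returned trace exposes `model` and `cost` attributes; verify the actual result shape against the version of the SDK you are running:

```python
from collections import defaultdict


def cost_by_model(analytics, **filters):
    """Sum per-model cost across traces matching the given filters."""
    totals = defaultdict(float)
    for trace in analytics.get_query_traces(**filters):
        totals[trace.model] += trace.cost or 0.0  # treat missing cost as zero
    return dict(totals)
```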

## Testing

Create test suites, add cases, and run evaluations against your AI functions.

```python
testing = client.testing

suite = testing.create_suite(name="Summarization Tests")

testing.add_test_case(
    suite_id=suite.id,
    input="Explain gravity in one sentence.",
    expected_output="Gravity is the force that attracts objects toward each other."
)

run = testing.run_suite(suite_id=suite.id)

results = testing.get_run_results(run_id=run.id)
for case in results.cases:
    print(f"{case.status}: {case.score}")
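Run results can be condensed into a single pass rate for dashboards or CI gates. A sketch that assumes a case's `status` is the string `"passed"` on success; that value is an assumption, so check it against the statuses your runs actually report:

```python
def pass_rate(results) -> float:
    """Fraction of test cases whose status is "passed" (assumed status value)."""
    cases = list(results.cases)
    if not cases:
        return 0.0
    passed = sum(1 for case in cases if case.status == "passed")
    return passed / len(cases)
```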

## Versions

Publish, track, and roll back versioned deployments of your AI pipelines.

```python
versions = client.versions

versions.publish(
    version="v2.1.0",
    metadata={"changelog": "Improved summarization prompt"}
)

all_versions = versions.list_versions()
active = versions.get_active_version()

diff = versions.compare_versions("v2.0.0", "v2.1.0")

versions.rollback(version="v2.0.0")
```
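Combined with the testing API, version management supports a guarded-deploy pattern: publish, evaluate, and roll back if quality regresses. A sketch under stated assumptions: the 0.9 threshold, the helper itself, and treating `status == "passed"` as a successful case are all illustrative choices, not SDK behavior:

```python
def publish_with_guard(client, version, suite_id, previous_version, threshold=0.9):
    """Publish a version, run the test suite, and roll back on regression."""
    client.versions.publish(version=version)
    run = client.testing.run_suite(suite_id=suite_id)
    results = client.testing.get_run_results(run_id=run.id)
    cases = list(results.cases)
    rate = sum(1 for c in cases if c.status == "passed") / max(len(cases), 1)
    if rate < threshold:
        client.versions.rollback(version=previous_version)
        return False
    return True
```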

## Error Handling

All SDK errors raise `TrainlyError` with structured context.

```python
from trainly import TrainlyClient, TrainlyError

client = TrainlyClient()

try:
    client.observe.log(input="test", output="result", model="gpt-4o")
except TrainlyError as e:
    print(f"Status: {e.status_code}")
    print(f"Details: {e.details}")
```
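Transient failures such as rate limits are often worth retrying with exponential backoff. A sketch; in real use you would pass `retry_on=TrainlyError` (the broad `Exception` default here is only so the helper stands alone), and which status codes are actually retryable is an assumption to check against the API:

```python
import time


def log_with_retry(client, max_attempts=3, base_delay=0.5, retry_on=Exception, **payload):
    """Retry client.observe.log with exponential backoff on transient errors.

    In real use, pass retry_on=TrainlyError rather than the broad default.
    """
    for attempt in range(max_attempts):
        try:
            return client.observe.log(**payload)
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            time.sleep(base_delay * 2 ** attempt)
```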