
Overview

Essential practices for building production applications with Trainly, covering security, performance, testing, and operational excellence.
Following these practices will help you build robust, scalable, and cost-effective applications.

Security

API Key Management

Never expose API keys in client-side code or commit them to version control!
// ✅ GOOD - Server-side only
// app/api/query/route.ts
const trainly = new TrainlyClient({
  apiKey: process.env.TRAINLY_API_KEY!, // Never in client code
  chatId: process.env.TRAINLY_CHAT_ID!,
});

export async function POST(request: Request) {
  const { question } = await request.json();
  const response = await trainly.query({ question });
  return Response.json(response);
}
// ❌ BAD - Exposed in browser
const trainly = new TrainlyClient({
  apiKey: "tk_your_key", // Visible to users!
  chatId: "chat_123",
});
Key Points:
  • Store keys in environment variables or secret managers (AWS Secrets, Vault)
  • Never log API keys
  • Rotate keys regularly
  • Use OAuth (V1 auth) for multi-tenant applications
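
The environment-variable rule can be enforced at startup with a small helper so a missing key fails at boot rather than on the first user query. A minimal sketch — `requireEnv` is not part of the Trainly SDK:

```typescript
// Hypothetical helper: fail fast at boot if a required secret is missing
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// const trainly = new TrainlyClient({
//   apiKey: requireEnv("TRAINLY_API_KEY"),
//   chatId: requireEnv("TRAINLY_CHAT_ID"),
// });
```

Calling this once at module load turns a silent misconfiguration into an immediate, descriptive crash.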

Input Validation

Always validate and sanitize user input before querying.
import { z } from "zod";

const querySchema = z.object({
  question: z.string().min(1).max(1000),
  scopeFilters: z.record(z.string()).optional(),
});

export async function POST(request: Request) {
  const body = await request.json();

  // Validate input; reject bad requests with a 400 instead of throwing
  const parsed = querySchema.safeParse(body);
  if (!parsed.success) {
    return Response.json({ error: parsed.error.flatten() }, { status: 400 });
  }

  // Query with validated input
  const response = await trainly.query(parsed.data);
  return Response.json(response);
}

Rate Limiting

Implement rate limiting to prevent abuse.
import rateLimit from "express-rate-limit";

const limiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 100, // 100 requests per window
  message: "Too many requests, please try again later",
});

app.use("/api/query", limiter);

Performance

Use Streaming

Stream responses for better user experience.
// ✅ GOOD - Streaming
// (runs inside a ReadableStream start(controller) callback,
// with encoder = new TextEncoder())
const stream = await trainly.queryStream({ question });

for await (const chunk of stream) {
  if (chunk.type === "content") {
    // Send to client immediately
    controller.enqueue(encoder.encode(chunk.data));
  }
}
controller.close();
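
The loop above assumes it runs inside a `ReadableStream` callback. As a self-contained sketch, any async iterable of text chunks (such as a query stream) can be bridged to a web `ReadableStream` with this generic pattern:

```typescript
// Bridge an async iterable of text chunks into a web ReadableStream,
// the same pattern a streaming route handler would use
function toReadableStream(
  chunks: AsyncIterable<string>
): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream({
    async start(controller) {
      for await (const chunk of chunks) {
        // Encode and forward each chunk as soon as it arrives
        controller.enqueue(encoder.encode(chunk));
      }
      controller.close();
    },
  });
}

// In a route handler: return new Response(toReadableStream(textChunks));
```

This keeps the SDK iteration and the HTTP plumbing separate, so the same converter works for any streaming source.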

Enable Reranking

Reranking re-scores retrieved chunks and typically improves answer relevance, at the cost of a small amount of added latency.
// Configure once
await trainly.config.updateReranking({
  enabled: true,
  model: "cohere", // or 'cross-encoder'
  top_n: 5,
});

// All queries now use reranking
const response = await trainly.query({ question });

Cache Common Queries

Cache frequently asked questions.
import { LRUCache } from "lru-cache";

const cache = new LRUCache<string, QueryResponse>({
  max: 100,
  ttl: 1000 * 60 * 5, // 5 minutes
});

async function cachedQuery(question: string) {
  const cached = cache.get(question);
  if (cached) return cached;

  const response = await trainly.query({ question });
  cache.set(question, response);
  return response;
}

Optimize Chunking

Tune chunking for your content type.
// For technical docs
await trainly.config.updateChunking({
  strategy: "auto",
  chunk_size: 1000,
  chunk_overlap: 200,
});

// For code
await trainly.config.updateChunking({
  strategy: "fixed",
  chunk_size: 500,
  chunk_overlap: 50,
});
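
To reason about how these settings affect index size and cost, a rough chunk-count estimate helps. This is a sketch assuming a simple sliding window; actual counts depend on the chosen strategy:

```typescript
// Rough estimate of chunks produced by a sliding window of `chunkSize`
// tokens advancing by (chunkSize - overlap) tokens each step
function estimateChunkCount(
  totalTokens: number,
  chunkSize: number,
  overlap: number
): number {
  if (totalTokens <= chunkSize) return 1;
  const stride = chunkSize - overlap;
  return 1 + Math.ceil((totalTokens - chunkSize) / stride);
}
```

For example, a 10,000-token document at `chunk_size: 1000` produces noticeably more chunks at `chunk_overlap: 200` than at 50, which means more storage and more retrieval candidates per query.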

Testing

Create Comprehensive Test Suites

Cover all test categories for robust validation.
const suite = await trainly.testing.createSuite(
  "Production Tests",
  "Critical functionality validation",
  ["production", "golden"]
);

// Generate diverse test cases
await trainly.testing.generateTestCases(5); // 5 per category

// Categories: golden, contradiction, multi-hop, numerical,
// negation, distractor, adversarial, paraphrase

Run Tests in CI/CD

Automate testing before deployments.
// scripts/test.ts
async function validateDeployment() {
  const { suites } = await trainly.testing.listSuites();
  const run = await trainly.testing.runSuite(suites[0].suite_id);
  const results = await trainly.testing.waitForRun(run.run_id);

  if (results.pass_rate < 90) {
    throw new Error(`Test pass rate ${results.pass_rate}% below threshold`);
  }

  console.log("✅ All tests passed!");
}

Set Pass Rate Thresholds

Enforce quality standards with deployment gating.
// Require 90% pass rate for production deployment
await trainly.config.setMinimumReliabilityScore(0.9);
await trainly.config.enableDeploymentGating(true);

// Publish will fail if tests don't meet threshold
try {
  await trainly.versions.publish("v1.0.0", "Production release");
} catch (error) {
  console.error("Deployment blocked:", error.message);
}

Fine-Tuning

Collect Quality Training Data

Use multiple sources for diverse preference pairs.
// 1. Generate from test failures
const failedRun = await trainly.testing.runSuite(suiteId);
const pairs = await trainly.fineTuning.generatePairsFromTestRun(
  failedRun.run_id,
  true, // only_failed
  true // synthesize_good
);

// 2. Manual high-quality pairs
await trainly.fineTuning.createPreferencePair({
  input_messages: [{ role: "user", content: "Explain X" }],
  preferred_output: "Detailed, accurate explanation...",
  non_preferred_output: "Brief, incomplete explanation",
  source: "expert_review",
});

// 3. User feedback
// Collect thumbs up/down from users
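
One way to turn collected feedback into training data is a small transform from feedback records to `createPreferencePair` payloads. A sketch — the `FeedbackRecord` shape is an assumption; only the payload fields come from the example above:

```typescript
// Assumed shape for feedback your app collects; not part of the Trainly SDK
interface FeedbackRecord {
  question: string;
  answer: string;
  thumbsUp: boolean;
  correctedAnswer?: string; // better answer supplied during review
}

// Only thumbs-down items with a reviewed correction become preference pairs
function toPreferencePairs(feedback: FeedbackRecord[]) {
  return feedback
    .filter((f) => !f.thumbsUp && f.correctedAnswer)
    .map((f) => ({
      input_messages: [{ role: "user" as const, content: f.question }],
      preferred_output: f.correctedAnswer!,
      non_preferred_output: f.answer,
      source: "user_feedback",
    }));
}
```

Requiring a human-reviewed correction keeps raw thumbs-down noise out of the training set.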

Validate Before Training

Ensure minimum dataset quality.
const pairs = await trainly.fineTuning.listPairs({ status: "approved" });

if (pairs.total < 100) {
  console.warn("Need at least 100 approved pairs for good results");
  return;
}

// Review pair quality manually or with LLM judge
const lowQualityPairs = pairs.pairs.filter(
  (p) => p.preferred_output.length < 50
);

if (lowQualityPairs.length > 10) {
  console.warn("Many low-quality pairs detected");
}

// Start training
const job = await trainly.fineTuning.startTrainingJob({
  base_model: "gpt-4o-mini",
  min_approved_pairs: 100,
});

A/B Test Fine-Tuned Models

Compare performance before full rollout.
// 1. Establish a baseline pass rate with the current model
const baselineRun = await trainly.testing.runSuite(suiteId);
const baseline = await trainly.testing.waitForRun(baselineRun.run_id);

// 2. Activate fine-tuned model
await trainly.fineTuning.activateModel("ft-model-123");

// 3. Run the same suite against the fine-tuned model
const testRun = await trainly.testing.runSuite(suiteId);
const testResults = await trainly.testing.waitForRun(testRun.run_id);

if (testResults.pass_rate < baseline.pass_rate) {
  // Rollback if worse
  await trainly.fineTuning.deactivateFineTunedModel();
  console.log("Fine-tuned model underperformed, rolled back");
} else {
  // Publish if better
  await trainly.versions.publish(
    "ft-v1",
    `Fine-tuned model improved pass rate to ${testResults.pass_rate}%`
  );
}

Analytics & Monitoring

Track Key Metrics

Monitor performance and cost continuously.
// Real-time dashboard
async function getMetrics() {
  const [metrics, perf, cost] = await Promise.all([
    trainly.analytics.getMetricsSummary(),
    trainly.analytics.getPerformanceStats(),
    trainly.analytics.getCostBreakdown(),
  ]);

  return {
    queries: metrics.total_queries,
    avgLatency: perf.p50_latency_ms,
    p95Latency: perf.p95_latency_ms,
    successRate: metrics.success_rate,
    totalCost: cost.total,
    costPerQuery: cost.total / metrics.total_queries,
  };
}

Set Up Alerts

Alert on performance degradation or cost spikes.
async function checkHealthMetrics() {
  const metrics = await getMetrics();

  if (metrics.p95Latency > 3000) {
    sendAlert("High latency detected", metrics.p95Latency);
  }

  if (metrics.successRate < 95) {
    sendAlert("Low success rate", metrics.successRate);
  }

  if (metrics.costPerQuery > 0.05) {
    sendAlert("High cost per query", metrics.costPerQuery);
  }
}

// Run every 5 minutes
setInterval(checkHealthMetrics, 5 * 60 * 1000);
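
A sustained incident will trip these checks on every interval, so a simple cooldown keeps `sendAlert` from spamming the channel. A sketch — `sendAlert` itself is whatever notifier your app uses:

```typescript
// Suppress repeat alerts for the same key within a cooldown window
const lastAlertAt = new Map<string, number>();

function shouldAlert(
  key: string,
  cooldownMs: number = 30 * 60 * 1000,
  now: number = Date.now()
): boolean {
  const last = lastAlertAt.get(key);
  // Still inside the cooldown for this key: stay quiet
  if (last !== undefined && now - last < cooldownMs) return false;
  lastAlertAt.set(key, now);
  return true;
}

// if (metrics.p95Latency > 3000 && shouldAlert("high-latency")) {
//   sendAlert("High latency detected", metrics.p95Latency);
// }
```

Keying by alert type means a latency incident does not suppress an unrelated cost alert.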

Analyze Query Traces

Debug slow queries with detailed traces.
const traces = await trainly.analytics.getQueryTraces({
  limit: 100,
  status: "completed",
});

// Find slow queries
const slowQueries = traces.filter((t) => t.duration_ms > 5000);

// Analyze bottlenecks
for (const trace of slowQueries) {
  const details = await trainly.analytics.getTraceDetails(trace.trace_id);

  console.log(`Query: ${trace.question}`);
  console.log(`Retrieval: ${details.timing.retrieval_ms}ms`);
  console.log(`Reranking: ${details.timing.reranking_ms}ms`);
  console.log(`LLM: ${details.timing.llm_ms}ms`);

  // Optimize based on bottleneck
  if (details.timing.retrieval_ms > 2000) {
    // Consider indexing improvements
  }
  if (details.timing.llm_ms > 3000) {
    // Consider faster model or reduce max_tokens
  }
}
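
If you want percentiles over raw trace durations rather than the pre-aggregated stats, a standard nearest-rank percentile is enough. A generic sketch, not a Trainly API:

```typescript
// Nearest-rank percentile: p in [0, 100] over a non-empty sample
function percentile(values: number[], p: number): number {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  // Clamp to valid indices so p = 0 and p = 100 behave sensibly
  return sorted[Math.max(0, Math.min(sorted.length - 1, rank - 1))];
}

// const p95 = percentile(traces.map((t) => t.duration_ms), 95);
```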

Version Management

Version All Changes

Track configuration history.
// Before making changes
const hasChanges = await trainly.versions.getUnpublishedChanges();

if (hasChanges) {
  console.warn("Unpublished changes detected!");
}

// Make configuration changes
await trainly.config.updateSettings({ model: "gpt-4o" });
await trainly.config.updateReranking({ enabled: true });

// Publish as a version
await trainly.versions.publish(
  "v2.1.0",
  "Upgraded to GPT-4o with reranking enabled"
);

Test Before Publishing

Validate changes in staging first.
// 1. Make changes in staging chat
const stagingClient = new TrainlyClient({
  apiKey: process.env.TRAINLY_API_KEY!,
  chatId: process.env.TRAINLY_STAGING_CHAT_ID!,
});

await stagingClient.config.updateSettings({ model: "gpt-4o" });

// 2. Run tests
const run = await stagingClient.testing.runSuite(suiteId);
const results = await stagingClient.testing.waitForRun(run.run_id);

// 3. If tests pass, apply to production
if (results.pass_rate >= 95) {
  const prodClient = new TrainlyClient({
    apiKey: process.env.TRAINLY_API_KEY!,
    chatId: process.env.TRAINLY_PROD_CHAT_ID!,
  });

  await prodClient.config.updateSettings({ model: "gpt-4o" });
  await prodClient.versions.publish("v2.1.0", "Upgraded to GPT-4o");
}

Keep Rollback Ready

Maintain ability to quickly revert changes.
// Get current version before changes
const currentVersion = await trainly.versions.getActiveVersion();
console.log("Current version:", currentVersion?.version);

// Make changes and publish
await trainly.config.updateSettings({ temperature: 0.9 });
await trainly.versions.publish("v2.2.0", "Increased temperature");

// Monitor metrics
await new Promise((resolve) => setTimeout(resolve, 60000)); // Wait 1 min

const metrics = await trainly.analytics.getMetricsSummary();

if (metrics.success_rate < 95) {
  // Rollback if problems detected
  if (currentVersion) {
    await trainly.versions.rollback(currentVersion.version_id);
    console.log("Rolled back due to low success rate");
  }
}

Error Handling

Implement Retry Logic

Handle transient failures gracefully.
import {
  TrainlyError,
  RateLimitError,
  AuthenticationError,
} from "@trainly/react";

async function queryWithRetry(
  question: string,
  maxAttempts: number = 3
): Promise<QueryResponse> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await trainly.query({ question });
    } catch (error) {
      if (error instanceof RateLimitError) {
        await new Promise((resolve) =>
          setTimeout(resolve, error.retryAfter * 1000)
        );
        continue;
      }

      if (error instanceof AuthenticationError) {
        // Refresh credentials and retry (refreshToken() is your app's own logic)
        await refreshToken();
        continue;
      }

      if (attempt === maxAttempts) throw error;

      // Exponential backoff for other errors
      await new Promise((resolve) =>
        setTimeout(resolve, Math.pow(2, attempt) * 1000)
      );
    }
  }

  throw new Error("Max retries exceeded");
}
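
The exponential backoff above makes all clients retry in lockstep. Adding jitter spreads retries out and avoids a thundering herd — jitter is an addition here, not part of the original snippet:

```typescript
// Full-jitter backoff: random delay in [0, min(maxMs, baseMs * 2^(attempt - 1))]
function backoffDelayMs(
  attempt: number,
  baseMs: number = 1000,
  maxMs: number = 30_000
): number {
  const cap = Math.min(maxMs, baseMs * 2 ** (attempt - 1));
  return Math.random() * cap;
}

// In the retry loop:
// await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
```

The cap keeps late attempts from waiting arbitrarily long while still backing off.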

Graceful Degradation

Provide fallback responses when service is unavailable.
async function safeQuery(question: string): Promise<string> {
  try {
    const response = await trainly.query({ question });
    return response.answer;
  } catch (error) {
    console.error("Trainly query failed:", error);

    // Return helpful fallback
    return "I'm currently experiencing technical difficulties. Please try again in a moment or contact support.";
  }
}

Cost Optimization

Choose the Right Model

Select model based on task complexity and budget.
// For simple queries - cheap and fast
await trainly.config.updateSettings({
  model: "gpt-4o-mini",
  temperature: 0.7,
  max_tokens: 500,
});

// For complex analysis - more capable
await trainly.config.updateSettings({
  model: "gpt-4o",
  temperature: 0.3,
  max_tokens: 1500,
});

// Monitor cost impact
const [cost, metrics] = await Promise.all([
  trainly.analytics.getCostBreakdown(),
  trainly.analytics.getMetricsSummary(),
]);
console.log("Cost per query:", cost.total / metrics.total_queries);

Optimize Token Usage

Reduce costs by minimizing tokens.
// Set appropriate max_tokens
const response = await trainly.query({
  question,
  maxTokens: 300, // Shorter responses = lower cost
});

// Use custom prompts to guide conciseness
await trainly.config.updateSettings({
  custom_prompt: "Provide concise, direct answers in 2-3 sentences.",
});

// Reduce chunk overlap
await trainly.config.updateChunking({
  chunk_size: 800,
  chunk_overlap: 100, // Lower overlap = less context = lower cost
});

Implement Query Deduplication

Avoid duplicate queries.
const recentQueries = new Set<string>();

async function deduplicatedQuery(question: string) {
  const normalized = question.toLowerCase().trim();

  if (recentQueries.has(normalized)) {
    console.log("Duplicate query detected, using cache");
    return cache.get(normalized);
  }

  recentQueries.add(normalized);

  const response = await trainly.query({ question });
  cache.set(normalized, response);

  // Clean up old queries after 5 minutes
  setTimeout(() => recentQueries.delete(normalized), 5 * 60 * 1000);

  return response;
}

Deployment

Health Checks

Implement health endpoints for load balancers.
app.get("/health", async (req, res) => {
  try {
    // Test Trainly connectivity
    await trainly.config.getSettings();

    res.json({
      status: "healthy",
      trainly: "connected",
      timestamp: new Date().toISOString(),
    });
  } catch (error) {
    res.status(503).json({
      status: "unhealthy",
      trainly: "disconnected",
      error: error.message,
    });
  }
});

Environment-Specific Configuration

Use different configurations for dev/staging/prod.
const config = {
  development: {
    apiKey: process.env.TRAINLY_DEV_API_KEY!,
    chatId: process.env.TRAINLY_DEV_CHAT_ID!,
  },
  staging: {
    apiKey: process.env.TRAINLY_STAGING_API_KEY!,
    chatId: process.env.TRAINLY_STAGING_CHAT_ID!,
  },
  production: {
    apiKey: process.env.TRAINLY_API_KEY!,
    chatId: process.env.TRAINLY_CHAT_ID!,
  },
};

type Env = "development" | "staging" | "production";
const env = (process.env.NODE_ENV as Env) ?? "development";

const trainly = new TrainlyClient(config[env]);

Quick Checklist

Before Production:
  • API keys in environment variables (never in code)
  • Server-side SDK usage only (not in browser)
  • Input validation on all user inputs
  • Rate limiting configured
  • Error handling with retries
  • Test suite with 90%+ pass rate
  • Reranking enabled for better relevance
  • Caching for common queries
  • Analytics monitoring set up
  • Health check endpoint implemented
  • Version management configured
Operational:
  • Monitor P95 latency (<3 seconds target)
  • Track success rate (>95% target)
  • Monitor cost per query
  • Run test suites before deployments
  • Review analytics weekly
  • Version all configuration changes
  • Test staging before production
  • Keep rollback ready
Optimization:
  • Fine-tune model with 100+ pairs
  • A/B test before full rollout
  • Optimize chunking for content type
  • Cache frequently asked questions
  • Choose appropriate model for task
  • Set reasonable max_tokens limits
  • Review slow queries monthly

Next Steps