Overview
Essential practices for building production applications with Trainly, covering security, performance, testing, and operational excellence.
Following these practices will help you build robust, scalable, and cost-effective applications.
Security
API Key Management
Never expose API keys in client-side code or commit them to version control!
// ✅ GOOD - Server-side only
// app/api/query/route.ts
const trainly = new TrainlyClient({
apiKey: process.env.TRAINLY_API_KEY!, // Never in client code
chatId: process.env.TRAINLY_CHAT_ID!,
});
export async function POST(request: Request) {
const { question } = await request.json();
return trainly.query({ question });
}
// ❌ BAD - Exposed in browser
const trainly = new TrainlyClient({
apiKey: "tk_your_key", // Visible to users!
chatId: "chat_123",
});
Key Points:
- Store keys in environment variables or secret managers (AWS Secrets, Vault)
- Never log API keys
- Rotate keys regularly
- Use OAuth (V1 auth) for multi-tenant applications
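The first two points above can be enforced with a pair of small helpers: fail fast at startup if a secret is missing, and redact any key that must appear in a log line. This is a sketch, not part of the Trainly SDK — `requireEnv` and `redactKey` are hypothetical names.

```typescript
// Hypothetical helpers for the key-management points above.

// Fail fast at startup rather than at first request.
export function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Show only the first and last few characters if a key must appear in logs.
export function redactKey(key: string, visible: number = 4): string {
  if (key.length <= visible * 2) return "****";
  return `${key.slice(0, visible)}...${key.slice(-visible)}`;
}
```

Usage: call `requireEnv("TRAINLY_API_KEY")` once at module load so a misconfigured deployment fails immediately, and pass every key through `redactKey` before it reaches a logger.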
Input Validation
Always validate and sanitize user input before querying.
import { z } from "zod";
const querySchema = z.object({
question: z.string().min(1).max(1000),
scopeFilters: z.record(z.string()).optional(),
});
export async function POST(request: Request) {
const body = await request.json();
// Validate input
const validated = querySchema.parse(body);
// Query with validated input
const response = await trainly.query(validated);
return Response.json(response);
}
Rate Limiting
Implement rate limiting to prevent abuse.
import rateLimit from "express-rate-limit";
const limiter = rateLimit({
windowMs: 15 * 60 * 1000, // 15 minutes
max: 100, // 100 requests per window
message: "Too many requests, please try again later",
});
app.use("/api/query", limiter);
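If you would rather not add middleware, the same policy can be sketched as a small in-memory fixed-window counter. This is a simplified, single-process illustration of what express-rate-limit does, not a replacement for it in a multi-instance deployment:

```typescript
// Minimal in-memory fixed-window rate limiter (illustrative, single process only).
type Window = { count: number; resetAt: number };

export class FixedWindowLimiter {
  private windows = new Map<string, Window>();

  constructor(
    private windowMs: number,
    private max: number,
  ) {}

  // Returns true if the request identified by `key` (e.g. an IP) is allowed.
  allow(key: string, now: number = Date.now()): boolean {
    const w = this.windows.get(key);
    if (!w || now >= w.resetAt) {
      // First request in a fresh window.
      this.windows.set(key, { count: 1, resetAt: now + this.windowMs });
      return true;
    }
    if (w.count >= this.max) return false;
    w.count++;
    return true;
  }
}
```

In production, prefer a shared store (e.g. Redis) so limits hold across instances; the in-memory map above resets on every deploy.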
Performance
Use Streaming
Stream responses for better user experience.
// ✅ GOOD - Streaming (inside a ReadableStream's start(controller) callback)
const encoder = new TextEncoder();
const stream = await trainly.queryStream({ question });
for await (const chunk of stream) {
if (chunk.type === "content") {
// Send to client immediately
controller.enqueue(encoder.encode(chunk.data));
}
}
Enable Reranking
Reranking significantly improves relevance.
// Configure once
await trainly.config.updateReranking({
enabled: true,
model: "cohere", // or 'cross-encoder'
top_n: 5,
});
// All queries now use reranking
const response = await trainly.query({ question });
Cache Common Queries
Cache frequently asked questions.
import { LRUCache } from "lru-cache";
const cache = new LRUCache<string, QueryResponse>({
max: 100,
ttl: 1000 * 60 * 5, // 5 minutes
});
async function cachedQuery(question: string) {
const cached = cache.get(question);
if (cached) return cached;
const response = await trainly.query({ question });
cache.set(question, response);
return response;
}
Optimize Chunking
Tune chunking for your content type.
// For technical docs
await trainly.config.updateChunking({
strategy: "auto",
chunk_size: 1000,
chunk_overlap: 200,
});
// For code
await trainly.config.updateChunking({
strategy: "fixed",
chunk_size: 500,
chunk_overlap: 50,
});
Testing
Create Comprehensive Test Suites
Cover all test categories for robust validation.
const suite = await trainly.testing.createSuite(
"Production Tests",
"Critical functionality validation",
["production", "golden"]
);
// Generate diverse test cases
await trainly.testing.generateTestCases(5); // 5 per category
// Categories: golden, contradiction, multi-hop, numerical,
// negation, distractor, adversarial, paraphrase
Run Tests in CI/CD
Automate testing before deployments.
// scripts/test.ts
async function validateDeployment() {
const suites = await trainly.testing.listSuites();
const run = await trainly.testing.runSuite(suites.suites[0].suite_id);
const results = await trainly.testing.waitForRun(run.run_id);
if (results.pass_rate < 90) {
throw new Error(`Test pass rate ${results.pass_rate}% below threshold`);
}
console.log("✅ All tests passed!");
}
Set Pass Rate Thresholds
Enforce quality standards with deployment gating.
// Require 90% pass rate for production deployment
await trainly.config.setMinimumReliabilityScore(0.9);
await trainly.config.enableDeploymentGating(true);
// Publish will fail if tests don't meet threshold
try {
await trainly.versions.publish("v1.0.0", "Production release");
} catch (error) {
console.error("Deployment blocked:", error.message);
}
Fine-Tuning
Collect Quality Training Data
Use multiple sources for diverse preference pairs.
// 1. Generate from test failures
const failedRun = await trainly.testing.runSuite(suiteId);
const pairs = await trainly.fineTuning.generatePairsFromTestRun(
failedRun.run_id,
true, // only_failed
true // synthesize_good
);
// 2. Manual high-quality pairs
await trainly.fineTuning.createPreferencePair({
input_messages: [{ role: "user", content: "Explain X" }],
preferred_output: "Detailed, accurate explanation...",
non_preferred_output: "Brief, incomplete explanation",
source: "expert_review",
});
// 3. User feedback
// Collect thumbs up/down from users
Validate Before Training
Ensure minimum dataset quality.
const pairs = await trainly.fineTuning.listPairs({ status: "approved" });
if (pairs.total < 100) {
console.warn("Need at least 100 approved pairs for good results");
return;
}
// Review pair quality manually or with LLM judge
const lowQualityPairs = pairs.pairs.filter(
(p) => p.preferred_output.length < 50
);
if (lowQualityPairs.length > 10) {
console.warn("Many low-quality pairs detected");
}
// Start training
const job = await trainly.fineTuning.startTrainingJob({
base_model: "gpt-4o-mini",
min_approved_pairs: 100,
});
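The manual length check above can be factored into a reusable screen that runs before every training job. The thresholds are illustrative, and `screenPairs` is a hypothetical helper (the pair fields mirror the ones used in this section):

```typescript
// Illustrative quality screen for preference pairs before training.
interface PreferencePair {
  preferred_output: string;
  non_preferred_output: string;
}

export function screenPairs(
  pairs: PreferencePair[],
  minPreferredLength: number = 50,
): { usable: PreferencePair[]; flagged: PreferencePair[] } {
  const usable: PreferencePair[] = [];
  const flagged: PreferencePair[] = [];
  for (const p of pairs) {
    // Flag pairs whose preferred answer is too short or identical to the rejected one.
    const tooShort = p.preferred_output.trim().length < minPreferredLength;
    const identical = p.preferred_output === p.non_preferred_output;
    (tooShort || identical ? flagged : usable).push(p);
  }
  return { usable, flagged };
}
```

Run `screenPairs(pairs.pairs)` before `startTrainingJob` and only proceed when `flagged.length` stays small.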
A/B Test Fine-Tuned Models
Compare performance before full rollout.
// Capture a baseline pass rate with the current base model
const baselineRun = await trainly.testing.runSuite(suiteId);
const baseline = await trainly.testing.waitForRun(baselineRun.run_id);
const baselinePassRate = baseline.pass_rate;
// Activate fine-tuned model
await trainly.fineTuning.activateModel("ft-model-123");
// Run the same suite against the fine-tuned model
const ftRun = await trainly.testing.runSuite(suiteId);
const testResults = await trainly.testing.waitForRun(ftRun.run_id);
if (testResults.pass_rate < baselinePassRate) {
// Rollback if worse
await trainly.fineTuning.deactivateFineTunedModel();
console.log("Fine-tuned model underperformed, rolled back");
} else {
// Publish if better
await trainly.versions.publish(
"ft-v1",
`Fine-tuned model improved pass rate to ${testResults.pass_rate}%`
);
}
Analytics & Monitoring
Track Key Metrics
Monitor performance and cost continuously.
// Real-time dashboard
async function getMetrics() {
const [metrics, perf, cost] = await Promise.all([
trainly.analytics.getMetricsSummary(),
trainly.analytics.getPerformanceStats(),
trainly.analytics.getCostBreakdown(),
]);
return {
queries: metrics.total_queries,
avgLatency: perf.p50_latency_ms,
p95Latency: perf.p95_latency_ms,
successRate: metrics.success_rate,
totalCost: cost.total,
costPerQuery: cost.total / Math.max(metrics.total_queries, 1),
};
}
Set Up Alerts
Alert on performance degradation or cost spikes.
async function checkHealthMetrics() {
const metrics = await getMetrics();
if (metrics.p95Latency > 3000) {
sendAlert("High latency detected", metrics.p95Latency);
}
if (metrics.successRate < 95) {
sendAlert("Low success rate", metrics.successRate);
}
if (metrics.costPerQuery > 0.05) {
sendAlert("High cost per query", metrics.costPerQuery);
}
}
// Run every 5 minutes
setInterval(checkHealthMetrics, 5 * 60 * 1000);
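The threshold checks above can be pulled into a pure function so the alerting logic is unit-testable without calling the API. The thresholds and metric names mirror the snippet; `evaluateAlerts` is a hypothetical helper:

```typescript
// Pure threshold evaluation: returns the names of breached alerts.
interface HealthMetrics {
  p95Latency: number; // ms
  successRate: number; // percent
  costPerQuery: number; // dollars
}

export function evaluateAlerts(m: HealthMetrics): string[] {
  const alerts: string[] = [];
  if (m.p95Latency > 3000) alerts.push("High latency detected");
  if (m.successRate < 95) alerts.push("Low success rate");
  if (m.costPerQuery > 0.05) alerts.push("High cost per query");
  return alerts;
}
```

`checkHealthMetrics` then reduces to fetching metrics and calling `sendAlert` for each entry `evaluateAlerts` returns.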
Analyze Query Traces
Debug slow queries with detailed traces.
const traces = await trainly.analytics.getQueryTraces({
limit: 100,
status: "completed",
});
// Find slow queries
const slowQueries = traces.filter((t) => t.duration_ms > 5000);
// Analyze bottlenecks
for (const trace of slowQueries) {
const details = await trainly.analytics.getTraceDetails(trace.trace_id);
console.log(`Query: ${trace.question}`);
console.log(`Retrieval: ${details.timing.retrieval_ms}ms`);
console.log(`Reranking: ${details.timing.reranking_ms}ms`);
console.log(`LLM: ${details.timing.llm_ms}ms`);
// Optimize based on bottleneck
if (details.timing.retrieval_ms > 2000) {
// Consider indexing improvements
}
if (details.timing.llm_ms > 3000) {
// Consider faster model or reduce max_tokens
}
}
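The per-bottleneck branching above can be condensed into a small classifier that names the slowest stage of a trace. The timing fields match `getTraceDetails` above; `dominantStage` is a hypothetical helper:

```typescript
// Returns the slowest stage of a query trace.
interface StageTiming {
  retrieval_ms: number;
  reranking_ms: number;
  llm_ms: number;
}

export function dominantStage(t: StageTiming): keyof StageTiming {
  const entries: [keyof StageTiming, number][] = [
    ["retrieval_ms", t.retrieval_ms],
    ["reranking_ms", t.reranking_ms],
    ["llm_ms", t.llm_ms],
  ];
  // Pick the stage with the largest elapsed time (ties favor the earlier stage).
  return entries.reduce((a, b) => (b[1] > a[1] ? b : a))[0];
}
```

Grouping slow traces by `dominantStage` tells you whether to invest in indexing, reranker choice, or a faster model first.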
Version Management
Version All Changes
Track configuration history.
// Before making changes
const hasChanges = await trainly.versions.getUnpublishedChanges();
if (hasChanges) {
console.warn("Unpublished changes detected!");
}
// Make configuration changes
await trainly.config.updateSettings({ model: "gpt-4o" });
await trainly.config.updateReranking({ enabled: true });
// Publish as a version
await trainly.versions.publish(
"v2.1.0",
"Upgraded to GPT-4o with reranking enabled"
);
Test Before Publishing
Validate changes in staging first.
// 1. Make changes in staging chat
const stagingClient = new TrainlyClient({
apiKey: process.env.TRAINLY_API_KEY!,
chatId: process.env.TRAINLY_STAGING_CHAT_ID!,
});
await stagingClient.config.updateSettings({ model: "gpt-4o" });
// 2. Run tests
const run = await stagingClient.testing.runSuite(suiteId);
const results = await stagingClient.testing.waitForRun(run.run_id);
// 3. If tests pass, apply to production
if (results.pass_rate >= 95) {
const prodClient = new TrainlyClient({
apiKey: process.env.TRAINLY_API_KEY!,
chatId: process.env.TRAINLY_PROD_CHAT_ID!,
});
await prodClient.config.updateSettings({ model: "gpt-4o" });
await prodClient.versions.publish("v2.1.0", "Upgraded to GPT-4o");
}
Keep Rollback Ready
Maintain ability to quickly revert changes.
// Get current version before changes
const currentVersion = await trainly.versions.getActiveVersion();
console.log("Current version:", currentVersion?.version);
// Make changes and publish
await trainly.config.updateSettings({ temperature: 0.9 });
await trainly.versions.publish("v2.2.0", "Increased temperature");
// Monitor metrics
await new Promise((resolve) => setTimeout(resolve, 60000)); // Wait 1 min
const metrics = await trainly.analytics.getMetricsSummary();
if (metrics.success_rate < 95) {
// Rollback if problems detected
if (currentVersion) {
await trainly.versions.rollback(currentVersion.version_id);
console.log("Rolled back due to low success rate");
}
}
Error Handling
Implement Retry Logic
Handle transient failures gracefully.
import {
TrainlyError,
RateLimitError,
AuthenticationError,
} from "@trainly/react";
async function queryWithRetry(
question: string,
maxAttempts: number = 3
): Promise<QueryResponse> {
for (let attempt = 1; attempt <= maxAttempts; attempt++) {
try {
return await trainly.query({ question });
} catch (error) {
if (error instanceof RateLimitError) {
await new Promise((resolve) =>
setTimeout(resolve, error.retryAfter * 1000)
);
continue;
}
if (error instanceof AuthenticationError) {
// Refresh token and retry
await refreshToken();
continue;
}
if (attempt === maxAttempts) throw error;
// Exponential backoff for other errors
await new Promise((resolve) =>
setTimeout(resolve, Math.pow(2, attempt) * 1000)
);
}
}
throw new Error("Max retries exceeded");
}
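One refinement to the fixed `2^attempt` backoff above: when many clients fail at once, they all retry on the same schedule and collide again. Adding "full jitter" (a random delay up to the exponential ceiling) spreads retries out. A sketch, with illustrative base and cap values:

```typescript
// Full-jitter exponential backoff: random delay in [0, min(cap, base * 2^attempt)).
export function backoffDelayMs(
  attempt: number,
  baseMs: number = 1000,
  capMs: number = 30_000,
  random: () => number = Math.random, // injectable for testing
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}
```

In `queryWithRetry`, replace the `Math.pow(2, attempt) * 1000` delay with `backoffDelayMs(attempt)`.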
Graceful Degradation
Provide fallback responses when service is unavailable.
async function safeQuery(question: string): Promise<string> {
try {
const response = await trainly.query({ question });
return response.answer;
} catch (error) {
console.error("Trainly query failed:", error);
// Return helpful fallback
return "I'm currently experiencing technical difficulties. Please try again in a moment or contact support.";
}
}
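A timeout guard pairs well with the fallback above: a query that hangs never reaches the `catch` block, so race it against a timer. A sketch (`withTimeout` is a hypothetical helper; the timeout value is up to you):

```typescript
// Resolve with the result, or fall back if it takes longer than timeoutMs.
export async function withTimeout<T>(
  work: Promise<T>,
  timeoutMs: number,
  fallback: T,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout>;
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(fallback), timeoutMs);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer!); // avoid leaking the timer when work wins the race
  }
}
```

In `safeQuery`, wrap the call as `withTimeout(trainly.query({ question }), 10_000, fallbackAnswer)` so slow queries degrade the same way failed ones do.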
Cost Optimization
Choose the Right Model
Select model based on task complexity and budget.
// For simple queries - cheap and fast
await trainly.config.updateSettings({
model: "gpt-4o-mini",
temperature: 0.7,
max_tokens: 500,
});
// For complex analysis - more capable
await trainly.config.updateSettings({
model: "gpt-4o",
temperature: 0.3,
max_tokens: 1500,
});
// Monitor cost impact
const summary = await trainly.analytics.getMetricsSummary();
const cost = await trainly.analytics.getCostBreakdown();
console.log("Cost per query:", cost.total / Math.max(summary.total_queries, 1));
Optimize Token Usage
Reduce costs by minimizing tokens.
// Set appropriate max_tokens
const response = await trainly.query({
question,
maxTokens: 300, // Shorter responses = lower cost
});
// Use custom prompts to guide conciseness
await trainly.config.updateSettings({
custom_prompt: "Provide concise, direct answers in 2-3 sentences.",
});
// Reduce chunk overlap
await trainly.config.updateChunking({
chunk_size: 800,
chunk_overlap: 100, // Lower overlap = less context = lower cost
});
Implement Query Deduplication
Avoid duplicate queries.
const recentQueries = new Set<string>();
async function deduplicatedQuery(question: string) {
const normalized = question.toLowerCase().trim();
if (recentQueries.has(normalized)) {
console.log("Duplicate query detected, using cache");
return cache.get(normalized);
}
recentQueries.add(normalized);
const response = await trainly.query({ question });
cache.set(normalized, response);
// Clean up old queries after 5 minutes
setTimeout(() => recentQueries.delete(normalized), 5 * 60 * 1000);
return response;
}
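One gap in the Set-based approach above: two identical questions arriving at the same moment both miss the set and both hit the API. An in-flight promise map coalesces them so only one request is sent. A sketch, with the fetcher injected so the idea is testable without the SDK:

```typescript
// Coalesce concurrent identical requests into a single in-flight promise.
export function makeCoalescer<T>(fetcher: (key: string) => Promise<T>) {
  const inFlight = new Map<string, Promise<T>>();
  return async function get(key: string): Promise<T> {
    const existing = inFlight.get(key);
    if (existing) return existing; // join the request already in flight
    const p = fetcher(key).finally(() => inFlight.delete(key));
    inFlight.set(key, p);
    return p;
  };
}
```

Usage: `const query = makeCoalescer((q) => trainly.query({ question: q }));` then call `query(normalized)` everywhere; layer the LRU cache from earlier on top for repeat queries that arrive after the first one resolves.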
Deployment
Health Checks
Implement health endpoints for load balancers.
app.get("/health", async (req, res) => {
try {
// Test Trainly connectivity
await trainly.config.getSettings();
res.json({
status: "healthy",
trainly: "connected",
timestamp: new Date().toISOString(),
});
} catch (error) {
res.status(503).json({
status: "unhealthy",
trainly: "disconnected",
error: error.message,
});
}
});
Environment-Specific Configuration
Use different configurations for dev/staging/prod.
const config = {
development: {
apiKey: process.env.TRAINLY_DEV_API_KEY!,
chatId: process.env.TRAINLY_DEV_CHAT_ID!,
},
staging: {
apiKey: process.env.TRAINLY_STAGING_API_KEY!,
chatId: process.env.TRAINLY_STAGING_CHAT_ID!,
},
production: {
apiKey: process.env.TRAINLY_API_KEY!,
chatId: process.env.TRAINLY_CHAT_ID!,
},
};
const env = (process.env.NODE_ENV ?? "development") as keyof typeof config;
const trainly = new TrainlyClient(config[env]);
Quick Checklist
Before Production:
- API keys stored server-side in environment variables or a secret manager
- Input validation and rate limiting on every query endpoint
- Test suite passing above your threshold, with deployment gating enabled
- Configuration validated in staging before promoting to production
Operational:
- Health check endpoint wired to your load balancer
- Alerts configured for latency, success rate, and cost per query
- Rollback path ready (active version recorded before each publish)
Optimization:
- Reranking enabled and chunking tuned for your content type
- Caching and deduplication in place for common queries
- Model choice and max_tokens matched to task complexity
Next Steps