Overview
The Chat API allows you to query your Trainly knowledge base using natural language. It supports multiple AI models, streaming responses, and custom prompts with citation-backed answers. All responses are powered by semantic search over your documents with GPT-4, Claude, or other AI models.
Authentication
Chat API endpoints require a valid API key from your chat settings.
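The exact header format isn't shown on this page; a common pattern, assuming a Bearer token in the Authorization header:

```python
# Assumed scheme: Bearer token in the Authorization header.
API_KEY = "your_api_key"  # generated in your chat settings
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
```

Answer Question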
Standard Response
Query your knowledge base with a question and receive an AI-generated answer with citations:
POST /v1/{chat_id}/answer_question
Path Parameters
- chat_id: The ID of your chat/knowledge base

Body Parameters
- question: The question to ask about your documents
- model: AI model to use. Options: gpt-4o-mini (default), gpt-4o, gpt-4, claude-3-sonnet, claude-3-opus
- temperature: Sampling temperature (0.0 to 2.0). Higher values make output more random.
- max_tokens: Maximum tokens in the response
- custom_prompt: Optional custom system prompt to override default behavior
- scopes: Filter results by custom scopes (e.g., {"playlist_id": "xyz"})

Response Fields
- answer: The AI-generated answer with inline citations (e.g., [^0], [^1])
- context: Array of source chunks used to generate the answer, ordered by relevance
- chat_id: The chat ID that was queried
- model: The AI model used for generation
- usage: Token usage statistics for the request
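A minimal sketch in Python; the base URL and the Bearer header scheme are assumptions not confirmed by this page:

```python
import requests

API_BASE = "https://api.trainly.ai"  # hypothetical base URL
API_KEY = "your_api_key"
CHAT_ID = "your_chat_id"

payload = {
    "question": "What does the onboarding guide say about SSO?",
    "model": "gpt-4o-mini",   # default model
    "temperature": 0.3,        # 0.0 to 2.0; lower is more deterministic
    "max_tokens": 500,         # cap response length (and cost)
}

resp = requests.post(
    f"{API_BASE}/v1/{CHAT_ID}/answer_question",
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},  # assumed scheme
    timeout=30,
)
resp.raise_for_status()
data = resp.json()
print(data["answer"])        # answer text with [^n] citation markers
print(len(data["context"]))  # source chunks, ordered by relevance
```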
Streaming Response
Get real-time streaming responses for a better user experience:
POST /v1/{chat_id}/answer_question_stream
The stream emits several event types:
- Content chunks: incremental pieces of the AI response text
- Context: the source chunks used, sent at the beginning of the stream
- Done: signals the end of the stream
- Error: an error occurred during streaming
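The exact wire format isn't shown on this page; the sketch below assumes one JSON object per line with a type field, so the event names are illustrative:

```python
import json
import requests

API_BASE = "https://api.trainly.ai"  # hypothetical base URL
API_KEY = "your_api_key"
CHAT_ID = "your_chat_id"

resp = requests.post(
    f"{API_BASE}/v1/{CHAT_ID}/answer_question_stream",
    json={"question": "Summarize the Q3 report."},
    headers={"Authorization": f"Bearer {API_KEY}"},  # assumed scheme
    stream=True,
    timeout=60,
)
resp.raise_for_status()

for line in resp.iter_lines():
    if not line:
        continue
    event = json.loads(line)            # assumed: one JSON event per line
    if event.get("type") == "content":
        print(event["text"], end="", flush=True)
    elif event.get("type") == "context":
        sources = event["context"]      # source chunks arrive first
    elif event.get("type") == "error":
        raise RuntimeError(event.get("message"))
    elif event.get("type") == "done":
        break
```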
Chat Information
Get Chat Info
Retrieve metadata about a chat:
GET /v1/{chat_id}/info
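For example, with the base URL and header scheme assumed as in the earlier examples:

```python
import requests

resp = requests.get(
    "https://api.trainly.ai/v1/your_chat_id/info",     # hypothetical base URL
    headers={"Authorization": "Bearer your_api_key"},  # assumed scheme
)
print(resp.json())  # chat metadata
```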
Health Check
API Health Status
Check if the API is operational:
GET /v1/health
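A quick probe, using the same hypothetical base URL:

```python
import requests

resp = requests.get("https://api.trainly.ai/v1/health")  # hypothetical base URL
print(resp.status_code, resp.json())
```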
Internal Endpoints
These endpoints are used internally by the Trainly dashboard and require additional authentication:
Create Nodes and Embeddings
POST /create_nodes_and_embeddings
Standard Query (Internal)
POST /answer_question
Streaming Query (Internal)
POST /answer_question_stream
Model Selection
Trainly supports multiple AI models with different capabilities and pricing:
OpenAI Models
gpt-4o-mini (Recommended)
- Fastest and most cost-effective
- Great for most use cases
- 1x credit multiplier (baseline)
gpt-4o
- Advanced reasoning capabilities
- Better for complex analysis
- 15x credit multiplier
gpt-4-turbo
- High performance
- Extended context window
- 12x credit multiplier
gpt-4
- Most powerful OpenAI model
- Best quality responses
- 18x credit multiplier
Anthropic Claude Models
claude-3-haiku
- Fast and efficient
- Similar to gpt-4o-mini
- 1x credit multiplier
claude-3-sonnet
- Balanced performance
- Good for analysis tasks
- 8x credit multiplier
claude-3-5-sonnet
- Latest Sonnet version
- Improved capabilities
- 10x credit multiplier
claude-3-opus
- Most powerful Claude model
- Best for complex reasoning
- 20x credit multiplier
Google Gemini Models
gemini-pro
- Good general purpose model
- 3x credit multiplier
gemini-1.5-pro
- Extended context window
- 4x credit multiplier
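If you select models programmatically, the multipliers above can drive a simple cost-aware lookup. A sketch using only models named elsewhere on this page; how Trainly meters credits is not specified here, so treat it as illustrative:

```python
# Credit multipliers from the tables above (baseline = 1x).
CREDIT_MULTIPLIER = {
    "gpt-4o-mini": 1,
    "claude-3-haiku": 1,
    "gemini-pro": 3,
    "claude-3-sonnet": 8,
    "claude-3-opus": 20,
}

def cheapest_model(candidates):
    """Pick the lowest-multiplier model from an acceptable set."""
    return min(candidates, key=CREDIT_MULTIPLIER.__getitem__)

print(cheapest_model(["claude-3-sonnet", "gemini-pro"]))  # gemini-pro
```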
Advanced Features
Custom Prompts
Override the default system prompt to customize AI behavior.
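A sketch, assuming the request field is named custom_prompt (the exact field name isn't shown on this page):

```python
import requests

payload = {
    "question": "List the refund steps.",
    # Assumed field name; overrides the default system prompt.
    "custom_prompt": "You are a terse support agent. Answer in numbered steps.",
}
resp = requests.post(
    "https://api.trainly.ai/v1/your_chat_id/answer_question",  # hypothetical URL
    json=payload,
    headers={"Authorization": "Bearer your_api_key"},
)
print(resp.json()["answer"])
```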
Scope Filtering
Filter results to specific subsets of your data.
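A sketch, assuming the request field is named scopes and reusing the example filter above:

```python
import requests

payload = {
    "question": "Which videos cover onboarding?",
    # Restrict retrieval to chunks tagged with this scope.
    "scopes": {"playlist_id": "xyz"},
}
resp = requests.post(
    "https://api.trainly.ai/v1/your_chat_id/answer_question",  # hypothetical URL
    json=payload,
    headers={"Authorization": "Bearer your_api_key"},
)
print(resp.json()["answer"])
```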
Conversation History
The API automatically uses published chat settings, including conversation history limits:
- History Limit: Configurable per chat (default: 20 messages)
- Context Preservation: Recent messages are included for continuity
- Token Management: Automatically managed to stay within model limits
Published Settings
Important: External API calls only work with published chat settings. You must publish your chat configuration before API access works. Published settings include:
- AI Model: The model selection for all API queries
- Temperature: Response randomness setting
- Max Tokens: Maximum response length
- Custom Prompt: System instructions for the AI
- Conversation History Limit: How many previous messages to include
- Context Files: Which documents are accessible
Publishing Settings
1. Go to your chat settings
2. Configure your preferred model, prompt, and other options
3. Click “Publish Settings”
4. Enable API access
5. Generate an API key
Citation Format
Trainly uses inline citations in markdown format. Citation markers such as [^0] and [^1] correspond to context array indices in the response.
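Because each [^n] marker indexes into the context array, you can resolve citations client-side. A minimal sketch:

```python
import re

def resolve_citations(answer: str, context: list) -> list:
    """Map [^n] markers in the answer back to their source chunks."""
    cited = {int(n) for n in re.findall(r"\[\^(\d+)\]", answer)}
    return [context[i] for i in sorted(cited) if i < len(context)]

# Usage: resolve_citations(data["answer"], data["context"])
```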
Best Practices
Use Streaming
Streaming provides better UX for long responses
Set Appropriate Limits
Use max_tokens to control response length and costs
Handle Rate Limits
Implement exponential backoff for rate limit errors
Cache Responses
Cache common queries to reduce API calls
Error Handling Example
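A sketch in Python that retries rate-limited requests with exponential backoff and surfaces the error codes listed below (URL and header scheme assumed as in the earlier examples):

```python
import time
import requests

def ask(url: str, payload: dict, headers: dict, max_retries: int = 5):
    for attempt in range(max_retries):
        resp = requests.post(url, json=payload, headers=headers, timeout=30)
        if resp.status_code == 429:                  # rate limited: back off
            time.sleep(2 ** attempt)
            continue
        if resp.status_code == 400:
            raise ValueError(f"Bad request or unpublished settings: {resp.text}")
        if resp.status_code in (401, 403):
            raise PermissionError("Check your API key and API access settings")
        resp.raise_for_status()                      # 404, 413, 5xx, etc.
        return resp.json()
    raise RuntimeError("Rate limited after all retries")
```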
Error Codes
- 400 Bad Request: Invalid request parameters or unpublished settings
- 401 Unauthorized: Invalid or missing API key
- 403 Forbidden: API access not enabled for this chat
- 404 Not Found: Chat not found
- 413 Payload Too Large: Request body exceeds size limit
- 429 Too Many Requests: Rate limit exceeded
- 500 Internal Server Error: Server error; contact support if persistent