ai_services app
Multi-provider AI: chatbot, embeddings for RAG, AI-generated reports, conversational onboarding.
Models (8)
- `AIConfig` — per-org provider config (chat + embedding); encrypted API keys; `monthly_query_cap`; `is_logging_enabled` (sketched after this list)
- `PlatformAIConfig` — platform-default fallback when an org doesn't configure its own
- `ContentEmbedding` — vector index of indexable content; `pgvector` column; FK to org
- `EmbeddingJob` — long-running batch indexing job state
- `ReportRequest` — AI-generated report (e.g., "summary of last quarter"); status, output URL
- `OnboardingTemplate` — conversational onboarding flow definition
- `OnboardingStep` — individual step in a template
- `OnboardingProgress` — per-user state in an onboarding flow
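A minimal sketch of `AIConfig`, assuming a OneToOne to the org model; only the fields named above are from this doc, everything else (relation name, model-name fields, defaults) is a guess, and the at-rest encryption of the key fields is not shown:

```python
# Hypothetical sketch -- only chat/embedding provider fields, the API keys,
# monthly_query_cap, and is_logging_enabled are named in this doc.
from django.db import models


class AIConfig(models.Model):
    organization = models.OneToOneField(
        "organizations.Organization",   # assumed app label / model name
        on_delete=models.CASCADE,
        related_name="ai_config",
    )
    chat_provider = models.CharField(max_length=32)   # "anthropic" | "openai" | "google"
    chat_model = models.CharField(max_length=64, blank=True)
    chat_api_key = models.TextField(blank=True)       # encrypted at rest (mechanism not shown)
    embedding_provider = models.CharField(max_length=32)
    embedding_model = models.CharField(max_length=64, blank=True)
    embedding_api_key = models.TextField(blank=True)  # encrypted at rest
    monthly_query_cap = models.PositiveIntegerField(default=1000)  # default is a guess
    is_logging_enabled = models.BooleanField(default=False)
```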
Key endpoints
| URL | Purpose |
|---|---|
| GET /api/organizations/<id>/ai/config/ | Org AI config |
| PATCH /api/organizations/<id>/ai/config/ | Update provider / model / keys |
| GET /api/organizations/<id>/ai/config/providers/ | Available providers |
| GET /api/organizations/<id>/ai/config/providers/<provider>/models/ | Models for a provider |
| POST /api/ai/embeddings/jobs/ | Trigger reindex |
| GET /api/ai/embeddings/jobs/<id>/ | Job status |
| POST /api/ai/reports/ | Request a report |
| GET /api/ai/reports/<id>/ | Report status + download URL |
| WS /ws/ai/chat/ | Chatbot WebSocket connection |
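For illustration, triggering a reindex and polling the job with `requests`; the host, auth scheme, payload shape, and response fields (`id`, `status`) are assumptions, not a documented contract:

```python
import requests

BASE = "https://app.example.com"               # hypothetical host
HEADERS = {"Authorization": "Bearer <token>"}  # auth scheme assumed

# Kick off a batch reindex (payload shape is a guess)
job = requests.post(
    f"{BASE}/api/ai/embeddings/jobs/", json={"org_id": 42}, headers=HEADERS
).json()

# Poll the job status endpoint
status = requests.get(
    f"{BASE}/api/ai/embeddings/jobs/{job['id']}/", headers=HEADERS
).json()
print(status["status"])                        # e.g. "pending" / "running" / "done"
```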
Permissions
- `IsNationalAdmin` — config + report management
- `IsAuthenticated` — chat + onboarding (per-user)
Background tasks
- `generate_embeddings(content_type, content_id, org_id)` — embeds a single content row (sketched below)
- `reindex_org_content(org_id)` — full reindex; partitioned by content type for parallelism
- `process_report_request(report_id)` — runs the report job, uploads the PDF / XLSX to S3
- `auto_assign_courses_to_users` (cross-app helper) — uses AI to suggest course assignments based on profile
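A sketch of `generate_embeddings` as a Celery task, assuming the in-repo module layout listed under Code paths; `get_embedding_provider` and `extract_text` are hypothetical helper names:

```python
from celery import shared_task


def extract_text(content_type: str, content_id: int) -> str:
    """Hypothetical helper: load the row and join its indexable text fields."""
    raise NotImplementedError


@shared_task
def generate_embeddings(content_type: str, content_id: int, org_id: int) -> None:
    # Assumed in-repo imports; actual module layout may differ.
    from apps.ai_services.models import ContentEmbedding
    from apps.ai_services.services.router import get_embedding_provider  # hypothetical

    provider = get_embedding_provider(org_id)
    vector = provider.embed(extract_text(content_type, content_id))

    # Upsert keyed on (org, content row) so re-running the task is idempotent
    ContentEmbedding.objects.update_or_create(
        organization_id=org_id,
        content_type=content_type,
        content_id=content_id,
        defaults={"embedding": vector},
    )
```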
External integrations
- Anthropic (Claude — default chat model)
- OpenAI (GPT, text-embedding-3-*)
- Google (Gemini, text-embedding-004)
- Voyage AI (Voyage-2 embeddings, used with Anthropic chat)
Notable patterns
Provider abstraction
In `apps/ai_services/providers/`:
- `base.py` — `BaseProvider` interface (`chat(messages, stream=True)`, `embed(text)`, `count_tokens(text)`)
- `anthropic_provider.py`
- `openai_provider.py`
- `google_provider.py`
`apps/ai_services/services/router.py` picks the provider based on `org.ai_config.chat_provider` (sketched below with the interface).
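The interface restated as code; the streaming return type, the concrete class names, and the router signature are assumptions beyond the method names listed above:

```python
from abc import ABC, abstractmethod
from typing import AsyncIterator


class BaseProvider(ABC):
    @abstractmethod
    def chat(self, messages: list[dict], stream: bool = True) -> AsyncIterator[str]:
        """Yield the model's response chunk-by-chunk when stream=True."""

    @abstractmethod
    def embed(self, text: str) -> list[float]:
        """Return an embedding vector for text."""

    @abstractmethod
    def count_tokens(self, text: str) -> int:
        """Token count, used for usage tracking and caps."""


def get_chat_provider(org) -> BaseProvider:
    # Sketch of services/router.py: pick by org.ai_config.chat_provider.
    # Concrete class names are assumed from the provider file names above.
    from apps.ai_services.providers.anthropic_provider import AnthropicProvider
    from apps.ai_services.providers.google_provider import GoogleProvider
    from apps.ai_services.providers.openai_provider import OpenAIProvider

    registry = {
        "anthropic": AnthropicProvider,
        "openai": OpenAIProvider,
        "google": GoogleProvider,
    }
    return registry[org.ai_config.chat_provider]()
```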
Variable-dim embeddings
Different providers produce different embedding dimensions (Voyage 1024, OpenAI 1536, Google 768). The `ContentEmbedding.embedding` column is declared as `vector(1536)`, the most common dimension; rows from other providers are either padded / truncated to fit or stored in a separate column. The HNSW index is built per dimension — see the migration files for the current state.
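One possible pad/truncate normalization, for illustration only (the doc notes a separate-column strategy also exists; check the migrations for what is actually in place):

```python
COLUMN_DIM = 1536  # matches the vector(1536) column


def fit_to_column(vector: list[float], dim: int = COLUMN_DIM) -> list[float]:
    """Zero-pad shorter vectors (Google 768, Voyage 1024) or truncate longer ones."""
    if len(vector) >= dim:
        return vector[:dim]
    return vector + [0.0] * (dim - len(vector))
```

Zero-padding leaves dot products and norms unchanged, so cosine ordering among one provider's vectors is preserved; truncation does lose information, which is one argument for the separate-column approach.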
RAG pipeline
Chat consumer
`apps/ai_services/consumers.py` — `ChatConsumer(AsyncWebsocketConsumer)`:
- Authenticate user (resolve from cookie / token)
- Receive question
- Embed question
- Vector search filtered by `org_id` + user access scope
- Construct prompt with retrieved snippets
- Stream LLM response chunk-by-chunk back to client
- Persist `ChatTurn` (if logging enabled)
If the user disconnects, cancel the LLM stream to save tokens (a condensed sketch follows).
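A condensed sketch of the flow above, assuming Django Channels; `get_chat_provider`, `search_snippets`, and `build_prompt` are hypothetical names for the router and retrieval helpers, and the message shape is a guess:

```python
import asyncio
import json

from channels.generic.websocket import AsyncWebsocketConsumer


class ChatConsumer(AsyncWebsocketConsumer):
    async def connect(self):
        self.user = self.scope["user"]  # resolved from cookie / token by auth middleware
        if not self.user.is_authenticated:
            await self.close()
            return
        await self.accept()

    async def receive(self, text_data=None, bytes_data=None):
        question = json.loads(text_data)["question"]            # message shape assumed
        self._task = asyncio.create_task(self._answer(question))

    async def _answer(self, question: str):
        provider = get_chat_provider(self.user.organization)    # hypothetical router call
        query_vec = provider.embed(question)                    # embed the question
        snippets = await search_snippets(query_vec, self.user)  # org_id + access scope filter
        messages = build_prompt(question, snippets)             # prompt with retrieved snippets
        async for chunk in provider.chat(messages, stream=True):
            await self.send(text_data=json.dumps({"delta": chunk}))
        # Persist a ChatTurn here if org.ai_config.is_logging_enabled

    async def disconnect(self, code):
        # Cancel the in-flight LLM stream to save tokens
        task = getattr(self, "_task", None)
        if task and not task.done():
            task.cancel()
```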
Cost control
- `monthly_query_cap` per org; checked before each chat / report (sketched below)
- Token-usage tracking on every call (stored regardless of `is_logging_enabled`, for billing)
- Output token cap per response (default 1024)
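A sketch of the cap check, assuming a per-call usage row; `TokenUsage` and its fields are hypothetical names for the tracking model:

```python
from django.utils import timezone


class QueryCapExceeded(Exception):
    pass


def check_query_cap(org) -> None:
    """Raise before a chat/report call if the org is over its monthly cap."""
    month_start = timezone.now().replace(
        day=1, hour=0, minute=0, second=0, microsecond=0
    )
    used = TokenUsage.objects.filter(  # hypothetical usage model
        organization=org, created_at__gte=month_start
    ).count()
    if used >= org.ai_config.monthly_query_cap:
        raise QueryCapExceeded(
            f"{used} queries this month; cap is {org.ai_config.monthly_query_cap}"
        )
```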
BYOM (Bring Your Own Model)
Org admins can override `chat_api_key` / `embedding_api_key` to use their own provider account; otherwise the platform's keys are used and billed via the subscription tier. A key-resolution sketch follows.
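Key resolution might look like this; how the `PlatformAIConfig` default row is fetched is an assumption:

```python
# Assumed import: from apps.ai_services.models import PlatformAIConfig


def resolve_chat_api_key(org) -> str:
    """Prefer the org's own key (BYOM); fall back to the platform default."""
    cfg = getattr(org, "ai_config", None)
    if cfg and cfg.chat_api_key:
        return cfg.chat_api_key                  # org-supplied key (BYOM)
    platform = PlatformAIConfig.objects.first()  # however the default row is fetched
    return platform.chat_api_key                 # billed via subscription tier
```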
Code paths
- Models: `backend/apps/ai_services/models.py`
- Providers: `backend/apps/ai_services/providers/`
- Consumers: `backend/apps/ai_services/consumers.py`
- Retrieval: `backend/apps/ai_services/services/retrieval.py`
- Tasks: `backend/apps/ai_services/tasks.py`