AI / RAG pipeline

GreekManage's AI assistant uses retrieval-augmented generation (RAG) — your question is matched against your org's data, relevant snippets are fed to a language model, and the model's answer is streamed back.

End-to-end flow

Question → embed the question → vector search over the org's embeddings (pgvector) → build a prompt from the top snippets → LLM → stream the answer back over WebSocket.

Components

1. Embeddings store (pgvector)

Model: ContentEmbedding(content_type, content_id, org_id, embedding_vector, ...) in apps/ai_services/models.py

pgvector is a Postgres extension that adds a vector column type and indexed approximate-nearest-neighbor search via ivfflat or hnsw indexes.

-- Sample table structure
CREATE TABLE content_embedding (
    id UUID PRIMARY KEY,
    org_id UUID NOT NULL REFERENCES organization(id),
    content_type VARCHAR(64) NOT NULL,   -- 'forum_post', 'document', 'member', etc.
    content_id UUID NOT NULL,
    embedding VECTOR(1536) NOT NULL,     -- 1536 for OpenAI ada-002, 768 for Google text-embedding-004
    text_excerpt TEXT NOT NULL,
    created_at TIMESTAMP NOT NULL,
    ...
);

CREATE INDEX content_embedding_hnsw_idx
    ON content_embedding USING hnsw (embedding vector_cosine_ops);

Why variable-dim matters: per-org BYOM (bring your own model) means different orgs may use different providers, and provider models emit embeddings of different dimensions. A pgvector column declared without a fixed size can store vectors of varying length per row, but an ANN index still requires a single fixed dimension, so the index targets the most common one (1536 in the sample above).
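
For orientation, the same table can be expressed with pgvector's Django integration. This is a hedged sketch, not the actual apps/ai_services/models.py source: it assumes the pgvector Python package's pgvector.django module and pins the dimension at 1536 to match the sample table (a truly variable-dimension column would drop the dimensions argument and lose the single HNSW index).

import uuid

from django.db import models
from pgvector.django import HnswIndex, VectorField


class ContentEmbedding(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    org_id = models.UUIDField(db_index=True)
    content_type = models.CharField(max_length=64)   # 'forum_post', 'document', 'member', etc.
    content_id = models.UUIDField()
    embedding = VectorField(dimensions=1536)         # matches the SQL sample above
    text_excerpt = models.TextField()
    created_at = models.DateTimeField(auto_now_add=True)

    class Meta:
        indexes = [
            HnswIndex(
                name="content_embedding_hnsw_idx",
                fields=["embedding"],
                opclasses=["vector_cosine_ops"],
            ),
        ]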

2. Embedder

File: apps/ai_services/providers/

Multi-provider — picks the embedding model based on the org's AIConfig.embedding_provider:

  • Anthropic — Voyage embeddings (voyage-2 for general text, voyage-code-2 for code)
  • OpenAI — text-embedding-3-small or text-embedding-3-large
  • Google — text-embedding-004 (768 dims) or gemini-embedding-001 (3072 dims by default, configurable)
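
A minimal sketch of that dispatch, shown for the OpenAI path only; the embed_batch / embed_with_* names and module layout are illustrative, not the real apps/ai_services/providers/ API (the single-string embed(question, org_id) used by retrieval below would be a thin wrapper over this).

from openai import OpenAI

from apps.ai_services.models import AIConfig


def embed_with_openai(texts: list[str], model: str, api_key: str) -> list[list[float]]:
    # OpenAI embeddings API (openai>=1.0); other providers follow the same shape
    client = OpenAI(api_key=api_key)
    response = client.embeddings.create(model=model, input=texts)
    return [item.embedding for item in response.data]


def embed_batch(texts: list[str], org_id) -> list[list[float]]:
    config = AIConfig.objects.get(organization_id=org_id)
    providers = {
        "openai": embed_with_openai,
        # "anthropic": embed_with_voyage,   # Voyage SDK wrapper, same signature (hypothetical)
        # "google": embed_with_google,      # Gemini embeddings wrapper, same signature (hypothetical)
    }
    embed_fn = providers[config.embedding_provider]
    return embed_fn(texts, model=config.embedding_model, api_key=config.embedding_api_key)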

3. EmbeddingJob (Celery)

Model: EmbeddingJob in apps/ai_services/models.py

Long-lived background job that:

  1. Walks indexable content types (documents, forum posts, member profiles, compliance entries…)
  2. Batches them
  3. Calls the embedding provider
  4. Upserts ContentEmbedding rows
  5. Reports progress

Triggered by:

  • Org-level "rebuild index" button (admin)
  • Nightly Celery beat schedule (incremental — only new / changed content)
  • Module-enable event (e.g., enabling documents triggers a one-time index of existing docs)
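
A hedged sketch of what one run might look like, assuming Celery's shared_task and the ContentEmbedding model above; iter_indexable_content() and embed_batch() are hypothetical helpers standing in for the real content walker and provider call, and progress reporting is omitted.

from celery import shared_task

from apps.ai_services.models import ContentEmbedding

BATCH_SIZE = 64  # illustrative batch size


@shared_task
def run_embedding_job(org_id, incremental=True):
    # 1-2. Walk indexable content types and batch them
    batch = []
    for item in iter_indexable_content(org_id, only_changed=incremental):  # hypothetical walker
        batch.append(item)
        if len(batch) == BATCH_SIZE:
            _flush(org_id, batch)
            batch = []
    if batch:
        _flush(org_id, batch)
    # 5. Progress reporting omitted for brevity


def _flush(org_id, batch):
    # 3. Call the embedding provider for the whole batch
    vectors = embed_batch([item.text for item in batch], org_id)  # see Embedder above
    # 4. Upsert ContentEmbedding rows
    for item, vector in zip(batch, vectors):
        ContentEmbedding.objects.update_or_create(
            org_id=org_id,
            content_type=item.content_type,
            content_id=item.content_id,
            defaults={"embedding": vector, "text_excerpt": item.text[:1000]},
        )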

4. Retrieval

File: apps/ai_services/retrieval.py (typical pattern)

from uuid import UUID

from django.contrib.auth.models import User
from pgvector.django import CosineDistance

from apps.ai_services.models import ContentEmbedding


def retrieve(question: str, org_id: UUID, user: User, k: int = 8) -> list[Snippet]:
    # 1. Embed the question
    q_vector = embed(question, org_id)

    # 2. Determine user's access scope
    access_scope = derive_access_scope(user, org_id)

    # 3. Vector search, filtered by org and access
    #    (CosineDistance matches the hnsw vector_cosine_ops index above)
    snippets = (
        ContentEmbedding.objects
        .filter(org_id=org_id)
        .filter(access_scope_filter(access_scope))
        .order_by(CosineDistance("embedding", q_vector))[:k]
    )

    # 4. Hydrate with full text + source URL
    return [hydrate(s) for s in snippets]

The crucial line is #3 — the access filter. A chapter member's chatbot retrieval should never include another chapter's content.
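
access_scope_filter can be as simple as a Q-object builder. A hedged sketch, assuming embeddings carry an optional chapter scope (one of the columns elided in the table above) and a scope object exposing is_org_admin and chapter_ids; this is an illustration, not the actual implementation.

from django.db.models import Q


def access_scope_filter(scope) -> Q:
    # Org admins may retrieve anything inside their own org (org_id is already filtered)
    if scope.is_org_admin:
        return Q()
    # Regular members: org-wide content, plus content scoped to their own chapters
    return Q(chapter_id__isnull=True) | Q(chapter_id__in=scope.chapter_ids)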

5. Prompt construction

prompt = f"""
You are GreekManage's assistant for {org.name}. Answer based on the
context below. If the answer isn't in the context, say so. Always cite
sources by their [bracketed numbers].

CONTEXT:
{render_snippets_with_numbers(snippets)}

QUESTION:
{question}
"""

6. Streaming

Channels: Django Channels 4.3 with Redis as the channel layer.

The ChatConsumer opens a WebSocket, runs the retrieval, then streams the LLM response chunk-by-chunk. The frontend renders tokens as they arrive, giving the typing-out effect.

If the user disconnects mid-stream, the consumer cancels the LLM request to save tokens.
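
A hedged sketch of that consumer shape, assuming Channels' AsyncWebsocketConsumer, an org_id captured by the URL route, and hypothetical build_prompt() and stream_llm() helpers (an async generator over provider chunks); none of this is the actual ChatConsumer source.

import asyncio
import json

from channels.db import database_sync_to_async
from channels.generic.websocket import AsyncWebsocketConsumer


class ChatConsumer(AsyncWebsocketConsumer):
    async def connect(self):
        self._task = None
        await self.accept()

    async def receive(self, text_data=None, bytes_data=None):
        question = json.loads(text_data)["question"]
        # Run the answer as a task so disconnect() can cancel it mid-stream
        self._task = asyncio.create_task(self._answer(question))

    async def _answer(self, question):
        org_id = self.scope["url_route"]["kwargs"]["org_id"]  # assumes routing captures org_id
        user = self.scope["user"]
        snippets = await database_sync_to_async(retrieve)(question, org_id, user)
        prompt = build_prompt(question, snippets)              # hypothetical prompt helper
        async for chunk in stream_llm(prompt):                 # hypothetical provider streaming wrapper
            await self.send(text_data=json.dumps({"token": chunk}))
        await self.send(text_data=json.dumps({"done": True}))

    async def disconnect(self, close_code):
        # Cancelling the task is what stops the upstream LLM request and saves tokens
        if self._task and not self._task.done():
            self._task.cancel()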

BYOM (Bring Your Own Model)

Each org configures AIConfig:

class AIConfig(models.Model):
    organization = models.OneToOneField(Organization, ...)
    chat_provider = models.CharField(choices=PROVIDERS)       # anthropic | openai | google
    chat_model = models.CharField()                           # e.g. "claude-sonnet-4-5"
    chat_api_key = EncryptedTextField()                       # encrypted
    embedding_provider = models.CharField(choices=PROVIDERS)
    embedding_model = models.CharField()
    embedding_api_key = EncryptedTextField()
    monthly_query_cap = models.IntegerField(default=10000)
    is_logging_enabled = models.BooleanField(default=True)

The platform falls back to PlatformAIConfig if an org doesn't configure its own. Platform-managed keys are billed via the customer's subscription tier.
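
That fallback can be as small as the sketch below; PlatformAIConfig is assumed to mirror AIConfig's fields and to exist as a single platform-wide row.

from apps.ai_services.models import AIConfig, PlatformAIConfig


def resolve_ai_config(organization):
    try:
        return organization.aiconfig              # org-configured BYOM settings
    except AIConfig.DoesNotExist:
        return PlatformAIConfig.objects.get()     # platform-managed keys, billed by subscription tier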

Logging + observability

When is_logging_enabled=True:

  • Every chat message + response logged to ChatTurn in apps/ai_services/models.py (text + tokens used)
  • Org admins can view + export chat logs for quality review
  • 👍 / 👎 feedback stored on each turn

When disabled, only token counts are stored (for billing) — no message content.
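
A small sketch of that toggle; the ChatTurn field names here are assumptions beyond what the bullets above state (text, token counts, feedback).

from apps.ai_services.models import ChatTurn


def log_turn(config, organization, question, answer, tokens_in, tokens_out):
    ChatTurn.objects.create(
        organization=organization,
        # Message content is stored only when the org has logging enabled
        prompt_text=question if config.is_logging_enabled else "",
        response_text=answer if config.is_logging_enabled else "",
        # Token counts are always stored, since billing needs them
        tokens_in=tokens_in,
        tokens_out=tokens_out,
    )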

Cost control

  • Per-org monthly_query_cap — user gets a "limit reached" message when exceeded
  • LLM token caps per response (default 1024 output)
  • Aggressive context truncation — if retrieved snippets exceed N tokens, lowest-similarity snippets are dropped
  • Failed retrievals (zero relevant snippets) skip the LLM entirely and return a "no context found" message
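
The truncation rule above can be a few lines, sketched here under the assumption that retrieval returns snippets ordered best-match first and that count_tokens() wraps whatever tokenizer the configured provider exposes.

MAX_CONTEXT_TOKENS = 3000  # illustrative budget, not the real default


def truncate_snippets(snippets, budget=MAX_CONTEXT_TOKENS):
    kept, used = [], 0
    for snippet in snippets:                 # already sorted by similarity, best first
        cost = count_tokens(snippet.text)    # hypothetical tokenizer wrapper
        if used + cost > budget:
            break                            # everything after this is lower-similarity, so drop it
        kept.append(snippet)
        used += cost
    return kept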

Failure modes + mitigations

Failure | Mitigation
--- | ---
Provider API outage | Failover to the next configured provider (Anthropic → OpenAI → Google), or a graceful "AI is unavailable" message
Embedding dimension mismatch (BYOM swap) | EmbeddingJob reindexes all content at the new dimensions before chat works again
Hallucination | System prompt forces citations; UI shows sources; thumbs-down feedback is flagged for review
Privacy leak | Per-user access filter in retrieval; E2E tests verify that User A in Chapter X can't retrieve Chapter Y data
Token cost runaway | Monthly query cap + per-response output cap
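
The failover row translates roughly to the sketch below; FAILOVER_ORDER, call_llm(), and ProviderError are illustrative names, not the real module API.

FAILOVER_ORDER = ["anthropic", "openai", "google"]


def chat_with_failover(prompt, configs):
    # configs: provider name -> resolved AIConfig/PlatformAIConfig (only configured providers present)
    for provider in FAILOVER_ORDER:
        config = configs.get(provider)
        if config is None:
            continue
        try:
            return call_llm(provider, config, prompt)   # hypothetical provider wrapper
        except ProviderError:                           # hypothetical "provider is down" exception
            continue
    return "The AI assistant is unavailable right now. Please try again later."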

Privacy guarantees

  1. No cross-org data: retrieval filters by org_id
  2. No cross-chapter for non-admins: retrieval filters by user's chapter scope
  3. PII-stripped where possible: sensitive fields (encrypted columns) never enter embeddings
  4. No training on customer data: API calls are inference-only; opt-out at the provider level when supported
  5. Per-org logging toggle: orgs can disable conversation logging entirely