Architecture
The Co-mind.ai Private AI Platform routes requests to multiple AI backends through a unified gateway. This page explains how to choose the right endpoint for your use case.
API Decision Tree
Use this decision tree to determine which endpoint to use:
+-------------------------------------+
| Do you need knowledge base context? |
+-----------------+-------------------+
                  |
        +---------+---------+
        |                   |
       NO                  YES
        |                   |
        v                   v
+---------------+   +---------------------------------+
| /v1/chat/     |   | Do you need server-managed      |
| completions   |   | conversation history?           |
|               |   +----------------+----------------+
| Pure OpenAI   |                    |
+---------------+          +---------+---------+
                           |                   |
                          NO                  YES
                           |                   |
                           v                   v
               +---------------------+   +---------------+
               | /v1/knowledgebase/  |   | /v1/chat/     |
               | chat/completions    |   | sessions      |
               |                     |   |               |
               | KB + Stateless      |   | KB + Stateful |
               +---------------------+   +---------------+
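The decision tree above can be expressed as a small routing helper. This is an illustrative sketch only; the function name and boolean flags are our own, not part of the platform API:

```python
def choose_endpoint(needs_kb_context: bool, needs_server_history: bool) -> str:
    """Map the two decision-tree questions to a gateway endpoint path."""
    if not needs_kb_context:
        # No document retrieval needed: plain OpenAI-compatible chat.
        return "/v1/chat/completions"
    if needs_server_history:
        # RAG plus server-managed conversation state.
        return "/v1/chat/sessions"
    # RAG, but the client replays conversation history itself.
    return "/v1/knowledgebase/chat/completions"
```

Note that the server-history question only arises once knowledge base context is needed; without it, client-managed history over /v1/chat/completions is the only option.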
When to Use Each Endpoint
/v1/chat/completions
Best for: Standard AI chat without document context.
- OpenAI-compatible drop-in replacement
- Supports streaming, vision, and tool calling
- You manage conversation history client-side
- Lowest latency option
curl -X POST https://your-instance/v1/chat/completions \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"model": "tiiuae/Falcon3-7B-Instruct",
"messages": [{"role": "user", "content": "Hello"}]
}'
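The same request can be issued from Python with only the standard library. A minimal sketch, assuming the placeholder host and token from the curl example above; the helper name is our own:

```python
import json
import urllib.request

def build_chat_request(base_url: str, token: str, model: str,
                       messages: list) -> urllib.request.Request:
    """Construct the POST request for /v1/chat/completions."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(
    "https://your-instance", "YOUR_TOKEN",
    "tiiuae/Falcon3-7B-Instruct",
    [{"role": "user", "content": "Hello"}],
)
# Sending it is deployment-specific:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Because the endpoint is OpenAI-compatible, existing OpenAI client libraries pointed at your instance's base URL should also work unchanged.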
/v1/knowledgebase/chat/completions
Best for: Question answering over your documents (RAG).
- Retrieves relevant document chunks before generating a response
- Returns source citations with relevance scores
- You manage conversation history client-side
- Supports vision + RAG for image analysis with document context
curl -X POST https://your-instance/v1/knowledgebase/chat/completions \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"model": "Qwen/Qwen2.5-32B-Instruct",
"messages": [{"role": "user", "content": "Summarize the contract"}],
"knowledgebase_ids": ["kb_abc123"]
}'
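A knowledge base request is the same chat payload plus a knowledgebase_ids array. A sketch of the body construction, with field names taken from the curl example above (the helper itself is illustrative):

```python
import json

def build_kb_chat_payload(model: str, messages: list,
                          knowledgebase_ids: list) -> str:
    """Serialize the body for POST /v1/knowledgebase/chat/completions."""
    return json.dumps({
        "model": model,
        "messages": messages,
        # One or more kb_* identifiers to retrieve chunks from.
        "knowledgebase_ids": knowledgebase_ids,
    })

body = build_kb_chat_payload(
    "Qwen/Qwen2.5-32B-Instruct",
    [{"role": "user", "content": "Summarize the contract"}],
    ["kb_abc123"],
)
```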
/v1/chat/sessions
Best for: Multi-turn conversations with document context and server-managed history.
- Server stores and manages conversation history
- Linked to knowledge bases at session creation
- Just send new messages — no need to replay history
- Ideal for chat UIs and interactive assistants
# Create session
curl -X POST https://your-instance/v1/chat/sessions \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "Contract Review",
"knowledgebase_ids": ["kb_abc123"],
"model": "falcon3:3b"
}'
# Send message (history managed by server)
curl -X POST https://your-instance/v1/chat/sessions/SESSION_ID/messages \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"content": "What are the key terms?"}'
Backend Architecture
The platform supports multiple AI inference backends. Each backend provides different performance characteristics and model support.
Backend Providers
| Backend | Description | Strengths |
|---|---|---|
| vLLM | High-performance GPU inference | Best throughput for large models, supports vision and tool calling |
| Ollama | Local model server | Easy setup, good for smaller models, supports embeddings |
| llama.cpp | CPU/GPU GGUF inference | Runs on CPU, minimal resource requirements |
Capability Matrix
| Capability | vLLM | Ollama | llama.cpp |
|---|---|---|---|
| Chat completions | Yes | Yes | Yes |
| Text completions | Yes | Yes | Yes |
| Embeddings | Yes | Yes | No |
| Streaming | Yes | Yes | Yes |
| Vision | Yes | No | No |
| Tool calling | Yes | No | No |
Use GET /v1/capabilities to check the current capability matrix for your deployment, and GET /v1/backends/health to monitor backend status.
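Before sending a vision or tool-calling request, a client can consult GET /v1/capabilities and route accordingly. The response shape below is a hypothetical illustration mirroring the matrix above (the real payload may differ); the lookup logic is the point:

```python
# Hypothetical /v1/capabilities response shape: backend -> list of features.
capabilities = {
    "vllm": ["chat", "completions", "embeddings", "streaming", "vision", "tools"],
    "ollama": ["chat", "completions", "embeddings", "streaming"],
    "llama.cpp": ["chat", "completions", "streaming"],
}

def backends_supporting(feature: str, caps: dict) -> list:
    """Return the backends that advertise a given feature, sorted by name."""
    return sorted(b for b, feats in caps.items() if feature in feats)
```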
Discovery Endpoints
| Endpoint | Purpose |
|---|---|
| GET /v1/models | List all available models across all backends |
| GET /v1/backends | List backend providers with their supported features |
| GET /v1/capabilities | Full capability matrix (backend → features → models) |
| GET /v1/backends/health | Real-time health and latency for each backend |
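As one illustration, GET /v1/backends/health can drive a simple fallback policy. The response fields used here (name, status, latency_ms) are assumptions for this sketch, not documented output:

```python
def pick_backend(health: list) -> str:
    """Pick the lowest-latency healthy backend, or raise if none is up."""
    healthy = [b for b in health if b.get("status") == "healthy"]
    if not healthy:
        raise RuntimeError("no healthy backend available")
    return min(healthy, key=lambda b: b["latency_ms"])["name"]
```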
Next Steps