Skip to Content
DocumentationFeaturesAI & Inference

AI & Inference

Companion Hub includes built-in support for running and routing AI models. The Onboarding Wizard configures your inference backend, recommended models, and optional cloud API keys on first login.

Inference backends

BackendStatusNotes
OllamaAvailableDefault. Runs on your host or in a container. OpenAI-compatible /v1 API.
Cloud providersAvailableOpenAI, Anthropic, Google, GitHub Copilot β€” configured in Settings
vLLMPlannedHigh-throughput GPU serving
LemonadePlannedAMD-optimised inference

Hub detects your hardware (CPU, RAM, GPU vendor, VRAM) and recommends models that fit. You can override preferences in Settings β†’ AI.

Ollama on your host

By default, Hub connects to Ollama on your host machine at port 11434. Containers reach it via host.docker.internal:11434.

Install Ollama

Download from ollama.comΒ  or:

curl -fsSL https://ollama.com/install.sh | sh

Pull a model:

ollama pull llama3.2 # or via Hub CLI: cihub models install llama3.2 cihub models list

Bind Ollama to all interfaces

Ollama listens on 127.0.0.1 by default. Docker containers need it reachable on the host network interface. Set:

# Linux (systemd override) sudo systemctl edit ollama

Add:

[Service] Environment="OLLAMA_HOST=0.0.0.0:11434"

Then restart:

sudo systemctl restart ollama

On macOS/Windows with Docker Desktop, host.docker.internal usually works without changing the bind address.

[GIF OF OLLAMA GPU SETUP] β€” Screen recording showing Ollama detecting a GPU and Hub’s System Inspector confirming VRAM.

GPU acceleration

NVIDIA (CUDA): Install the NVIDIA Container ToolkitΒ  so Docker can pass GPU devices. Ollama auto-detects CUDA when drivers are installed.

AMD (ROCm/HIP): Install ROCm drivers for your GPU. Ollama supports AMD GPUs on Linux with ROCm. Verify with ollama ps while a model is loaded.

Hub’s hardware inspector reads GPU info and tags recommended models accordingly in the wizard.

Standardized inference variables for apps

Apps that use AI can opt in to standardized Hub inference variables via hub_integration.inference in their config.json. At install time, Hub resolves the correct values based on your settings and hardware, then writes them into the app’s app.env.

Hub variable keys

KeyResolved env (internal)Description
llm_base_urlCI_LLM_BASE_URLOpenAI-compatible base URL (Ollama /v1 or cloud provider)
llm_api_keyCI_LLM_API_KEYAPI key (ollama for local Ollama)
chat_modelCI_CHAT_MODELDefault chat/general model ID
embedding_modelCI_EMBEDDING_MODELDefault embedding model ID
vision_modelCI_VISION_MODELDefault vision-capable model ID
ollama_hostOLLAMA_HOSTNative Ollama URL (not OpenAI-compatible)

Example in config.json

Map Hub-resolved values to the env variable names your app expects:

{ "hub_integration": { "inference": { "llm_base_url": "LLM_API_BASE", "llm_api_key": "LLM_API_KEY", "chat_model": "LLM_DEFAULT_CHAT_MODEL", "embedding_model": "LLM_DEFAULT_EMBEDDING_MODEL", "ollama_host": "OLLAMA_HOST" } } }

Apps without hub_integration.inference receive no AI variables β€” zero overhead for non-AI apps.

Resolution order

For each variable, Hub resolves in this order:

  1. Your Hub-wide preference (Settings β†’ AI)
  2. Hardware-aware recommendation from the model registry
  3. Omitted β€” the app must handle the variable being absent

If Ollama is unavailable and no cloud provider is configured, Hub omits all inference variables and logs a warning.

MCP integration for agent apps

Agent apps (like OpenClaw) can declare hub_integration.mcp_client: true to receive Hub MCP endpoints:

VariablePurpose
HUB_URLInternal Hub API URL
HUB_MCP_URLMCP SSE endpoint
HUB_MCP_MESSAGES_URLMCP messages endpoint
HUB_MCP_API_KEYAuth key (when MCP is enabled)
HUB_WAKE_SECRETWake hook secret

Enable MCP on your Hub:

cihub mcp setup cihub mcp config

See Companion Agent for the agent ecosystem.

CLI model management

cihub models list cihub models install mistral cihub models rm mistral cihub status # shows installed models in the Models section
Last updated on