AI & Inference
Companion Hub includes built-in support for running and routing AI models. The Onboarding Wizard configures your inference backend, recommended models, and optional cloud API keys on first login.
Inference backends
| Backend | Status | Notes |
|---|---|---|
| Ollama | Available | Default. Runs on your host or in a container. OpenAI-compatible /v1 API. |
| Cloud providers | Available | OpenAI, Anthropic, Google, GitHub Copilot β configured in Settings |
| vLLM | Planned | High-throughput GPU serving |
| Lemonade | Planned | AMD-optimised inference |
Hub detects your hardware (CPU, RAM, GPU vendor, VRAM) and recommends models that fit. You can override preferences in Settings β AI.
Ollama on your host
By default, Hub connects to Ollama on your host machine at port 11434. Containers reach it via host.docker.internal:11434.
Install Ollama
Download from ollama.comΒ or:
curl -fsSL https://ollama.com/install.sh | shPull a model:
ollama pull llama3.2
# or via Hub CLI:
cihub models install llama3.2
cihub models listBind Ollama to all interfaces
Ollama listens on 127.0.0.1 by default. Docker containers need it reachable on the host network interface. Set:
# Linux (systemd override)
sudo systemctl edit ollamaAdd:
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"Then restart:
sudo systemctl restart ollamaOn macOS/Windows with Docker Desktop, host.docker.internal usually works without changing the bind address.
[GIF OF OLLAMA GPU SETUP] β Screen recording showing Ollama detecting a GPU and Hubβs System Inspector confirming VRAM.
GPU acceleration
NVIDIA (CUDA): Install the NVIDIA Container ToolkitΒ so Docker can pass GPU devices. Ollama auto-detects CUDA when drivers are installed.
AMD (ROCm/HIP): Install ROCm drivers for your GPU. Ollama supports AMD GPUs on Linux with ROCm. Verify with ollama ps while a model is loaded.
Hubβs hardware inspector reads GPU info and tags recommended models accordingly in the wizard.
Standardized inference variables for apps
Apps that use AI can opt in to standardized Hub inference variables via hub_integration.inference in their config.json. At install time, Hub resolves the correct values based on your settings and hardware, then writes them into the appβs app.env.
Hub variable keys
| Key | Resolved env (internal) | Description |
|---|---|---|
llm_base_url | CI_LLM_BASE_URL | OpenAI-compatible base URL (Ollama /v1 or cloud provider) |
llm_api_key | CI_LLM_API_KEY | API key (ollama for local Ollama) |
chat_model | CI_CHAT_MODEL | Default chat/general model ID |
embedding_model | CI_EMBEDDING_MODEL | Default embedding model ID |
vision_model | CI_VISION_MODEL | Default vision-capable model ID |
ollama_host | OLLAMA_HOST | Native Ollama URL (not OpenAI-compatible) |
Example in config.json
Map Hub-resolved values to the env variable names your app expects:
{
"hub_integration": {
"inference": {
"llm_base_url": "LLM_API_BASE",
"llm_api_key": "LLM_API_KEY",
"chat_model": "LLM_DEFAULT_CHAT_MODEL",
"embedding_model": "LLM_DEFAULT_EMBEDDING_MODEL",
"ollama_host": "OLLAMA_HOST"
}
}
}Apps without hub_integration.inference receive no AI variables β zero overhead for non-AI apps.
Resolution order
For each variable, Hub resolves in this order:
- Your Hub-wide preference (Settings β AI)
- Hardware-aware recommendation from the model registry
- Omitted β the app must handle the variable being absent
If Ollama is unavailable and no cloud provider is configured, Hub omits all inference variables and logs a warning.
MCP integration for agent apps
Agent apps (like OpenClaw) can declare hub_integration.mcp_client: true to receive Hub MCP endpoints:
| Variable | Purpose |
|---|---|
HUB_URL | Internal Hub API URL |
HUB_MCP_URL | MCP SSE endpoint |
HUB_MCP_MESSAGES_URL | MCP messages endpoint |
HUB_MCP_API_KEY | Auth key (when MCP is enabled) |
HUB_WAKE_SECRET | Wake hook secret |
Enable MCP on your Hub:
cihub mcp setup
cihub mcp configSee Companion Agent for the agent ecosystem.
CLI model management
cihub models list
cihub models install mistral
cihub models rm mistral
cihub status # shows installed models in the Models section