
πŸ’« Local Bench

Validate your inference. Benchmark LLM performance directly on your local hardware to ensure absolute confidence in your agent’s speed and capabilities.

Overview

Local Bench is a benchmarking and profiling tool for local language models running on your Companion Hub. It provides standardized test suites, hardware utilization metrics, and comparative reports that tell you exactly how fast and capable your setup is β€” so you can make informed decisions about which models to run for which tasks.

Before deploying a new model to your Companion Agents or Spellbook workflows, run it through Local Bench to understand its latency profile, throughput, memory footprint, and quality on standard tasks.

Key Features

  • Token throughput (tok/s) β€” measures prompt and generation throughput across different context lengths
  • Time-to-first-token (TTFT) β€” latency for agent responsiveness evaluation
  • Memory footprint β€” RAM and VRAM usage per model and quantization level
  • CPU/GPU utilization β€” real-time hardware metrics during inference
  • Quality benchmarks β€” runs MMLU, HellaSwag, and custom task suites to evaluate accuracy
  • Side-by-side comparison β€” compare multiple models across all metrics in a single report
  • Export reports β€” exportable JSON/CSV for custom analysis
  • Hardware profile β€” automatically detects and documents your Hub’s hardware capabilities

Use Cases

  • β€œWhich quantization of Llama 3.1 8B runs fastest on my hardware?”
  • β€œDoes my new GPU significantly improve TTFT for coding tasks?”
  • β€œCan I run Qwen 2.5 72B acceptably on my Core server?”
  • Before deploying a new model to production Spellbook workflows
  • Document your Hub’s capabilities for sharing or support purposes

Supported Backends

| Backend | Status |
| --- | --- |
| Ollama | βœ… |
| llama.cpp (direct) | βœ… |
| vLLM | βœ… |
| OpenAI-compatible API | βœ… |
| LM Studio | βœ… |

Benchmark Suites

| Suite | Description | Duration |
| --- | --- | --- |
| Quick | Token throughput at 3 context lengths | ~2 min |
| Standard | Throughput + TTFT + memory | ~8 min |
| Full | All metrics + MMLU sample | ~20 min |
| Custom | User-defined prompts and evaluation | Variable |

Setup

Install from Hub

Search for Local Bench in the Hub app store and install.

Open Local Bench

Navigate to http://local-bench.ci.localhost.

Connect to your inference backend

In Settings β†’ Backends, add your inference endpoint. For Ollama: http://ollama.ci.localhost:11434.
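To confirm the endpoint is reachable before adding it, you can query Ollama's model list directly (`/api/tags` is Ollama's standard model-listing endpoint; a quick sanity check, assuming a stock Ollama install):

```python
import json
from urllib.request import urlopen

def installed_models(payload: bytes) -> list[str]:
    """Extract model names from an Ollama /api/tags response."""
    return [m["name"] for m in json.loads(payload).get("models", [])]

# Uncomment to query your Hub's Ollama endpoint:
# resp = urlopen("http://ollama.ci.localhost:11434/api/tags", timeout=5)
# print(installed_models(resp.read()))
```

If the call fails with a connection error, see the Troubleshooting section below.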

Run a quick benchmark

Select a model from your Ollama library, choose Quick suite, and click Run Benchmark.

Usage

Running a Benchmark

  1. Navigate to New Benchmark
  2. Select the model(s) to test
  3. Choose a suite (Quick, Standard, or Full)
  4. Click Start and wait for completion

Results are displayed in the Results tab and saved for future comparison.

Comparing Models

On the Compare page, select two or more previous results to see a side-by-side matrix of all metrics.

Scheduling Automated Benchmarks

Configure recurring benchmarks in Settings β†’ Schedule to track performance regressions after system updates.

CLI

```shell
# Run a quick benchmark on llama3.2:3b
ci-bench run --model llama3.2:3b --suite quick

# Compare two results
ci-bench compare result-001 result-002

# Export results as JSON
ci-bench export result-001 --format json > benchmark.json
```
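An exported JSON report can be fed into your own analysis scripts. A sketch, assuming a hypothetical report shape with a `results` list of per-model entries (the real export schema may differ; check an actual export first):

```python
import json

def fastest_model(report: dict) -> str:
    """Return the model with the highest generation throughput.

    Assumes each entry in report["results"] carries "model" and
    "gen_tok_s" keys -- illustrative names, not a documented schema.
    """
    best = max(report["results"], key=lambda r: r["gen_tok_s"])
    return best["model"]

# Example usage with an exported file:
# with open("benchmark.json") as f:
#     print(fastest_model(json.load(f)))
```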

Understanding Results

| Metric | Good target (typical Hub hardware) |
| --- | --- |
| Generation tok/s (7B Q4) | > 30 tok/s |
| Time to first token (7B) | < 500 ms |
| RAM usage (7B Q4) | < 6 GB |
| MMLU (7B) | > 58% |

Results will vary significantly based on your specific hardware. The Companion Core server is benchmarked and documented in the Reference section.
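If you post-process exported results, the targets in the table above are easy to encode as a pass/fail check. A minimal sketch (metric keys are illustrative, not the export schema):

```python
# Targets from the table above, as (direction, bound) pairs.
TARGETS = {
    "gen_tok_s": ("min", 30.0),   # generation throughput, 7B Q4
    "ttft_ms":   ("max", 500.0),  # time to first token, 7B
    "ram_gb":    ("max", 6.0),    # RAM usage, 7B Q4
    "mmlu_pct":  ("min", 58.0),   # MMLU accuracy, 7B
}

def check_targets(result: dict) -> dict:
    """Flag which measured metrics meet the 7B Q4 targets."""
    out = {}
    for key, (kind, bound) in TARGETS.items():
        if key not in result:
            continue
        val = result[key]
        out[key] = val >= bound if kind == "min" else val <= bound
    return out
```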

Troubleshooting

**Backend connection refused**
Ensure the inference backend is running. For Ollama, check Hub β†’ Apps β†’ Ollama β†’ Status. Verify the endpoint URL in Local Bench settings.

**Benchmark stalls at 0%**
The model may be loading for the first time. Check Ollama logs in the Hub for download/load progress.

**VRAM out of memory errors**
The selected model is too large for your GPU. Try a smaller quantization (Q4_K_M instead of F16) or a smaller model variant.
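A rough rule of thumb for whether a model's weights fit: parameters × bits per weight / 8, ignoring KV cache and runtime overhead (which add roughly 10–30% more in practice). A back-of-the-envelope sketch:

```python
def approx_weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Rough weight-memory estimate in GB: params x bits/8.

    Ignores KV cache and runtime overhead, so treat the result
    as a lower bound on required VRAM.
    """
    return params_billions * bits_per_weight / 8

# A 7B model at F16 needs ~14 GB of weights alone, while Q4
# (~4.5 effective bits/weight including scales) needs ~4 GB --
# hence the suggestion to drop from F16 to Q4_K_M.
```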

**Inconsistent results between runs**
Background processes on your Hub can affect measurements. For reproducible results, pause other inference-heavy apps before benchmarking.
