💫 Local Bench

Validate your inference. Benchmark local LLM performance on your own hardware so you know exactly how fast your Ollama models run.

Install & availability

Available now. Install from the Companion Hub App Store. Verified on the test fleet: the published image pulls, starts, serves its web UI, and survives a container restart.

Hub App Store: Search Local Bench → Install
Container image: ghcr.io/companionintelligence/ci-local-bench:latest (public)
Web UI port: 3000
Architecture: linux/amd64 and linux/arm64 (multi-arch)

Overview

Local Bench is a benchmarking tool for local language models served by Ollama. It runs prompts through your models, records throughput and timing alongside your hardware specs, and stores the results so you can compare runs over time. Use it before deploying a model to understand how it actually performs on your Hub.

Current capabilities

What Local Bench does today, as shipped:

  • Benchmarks models served by Ollama
  • Records results together with system specs for context
  • Persists runs to disk and exposes them via a REST API
  • Ships a curated model catalog as a fallback when Ollama isn't reachable
  • Shows a friendly empty state on first run before any benchmarks exist

Connecting Ollama

Local Bench talks to Ollama via the OLLAMA_API_URL environment variable (default http://localhost:11434). Inside a container, localhost is the container itself, so point it at the host or a LAN address:

    # Ollama running on the Docker host
    docker run -p 3000:3000 \
      -e OLLAMA_API_URL=http://host.docker.internal:11434 \
      -v localbench-data:/app \
      ghcr.io/companionintelligence/ci-local-bench:latest

    # Or a LAN address
    -e OLLAMA_API_URL=http://192.168.1.50:11434

When Ollama is unreachable, GET /api/models returns HTTP 503 along with a curated catalog of models, so the UI degrades gracefully instead of breaking. Fix the connection by correcting OLLAMA_API_URL.
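To tell a live model list apart from the fallback catalog, checking the status code of GET /api/models is usually enough. The following is a minimal sketch, assuming Local Bench is published on http://localhost:3000 as in the docker run example above and that a successful response uses HTTP 200. The function names and the LOCAL_BENCH_URL variable are hypothetical helpers, not part of Local Bench:

```shell
# Hypothetical helpers; only GET /api/models and the 503 fallback
# behaviour come from the documentation.

# Assumed base URL: the web UI port published with -p 3000:3000.
LOCAL_BENCH_URL="${LOCAL_BENCH_URL:-http://localhost:3000}"

# Print only the HTTP status code of GET /api/models.
# curl reports 000 when Local Bench itself cannot be reached.
models_status() {
  curl -s -o /dev/null -w '%{http_code}' "${LOCAL_BENCH_URL}/api/models"
}

# Map a status code to a short diagnosis (200 for success is an assumption).
describe_models_status() {
  case "$1" in
    200) echo "Ollama reachable: live model list" ;;
    503) echo "Ollama unreachable: curated catalog returned, check OLLAMA_API_URL" ;;
    000) echo "Local Bench not reachable at ${LOCAL_BENCH_URL}" ;;
    *)   echo "Unexpected status: $1" ;;
  esac
}
```

With Local Bench running, `describe_models_status "$(models_status)"` prints the diagnosis for the current connection.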

Persistence

Local Bench writes two files into /app:

  • benchmark_data.db: the SQLite results database
  • benchmark_results.csv: a CSV mirror of results

Mount a volume at /app (as in the example above) to keep your benchmark history across restarts.
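To confirm that both files actually land in the volume, one option is to mount it read-only into a throwaway container. This is a sketch, assuming the volume is named localbench-data as in the docker run example and that Docker can pull the public alpine image. The function names and the LOCAL_BENCH_VOLUME variable are hypothetical:

```shell
# Name of the volume used in the docker run example (override if yours differs).
LOCAL_BENCH_VOLUME="${LOCAL_BENCH_VOLUME:-localbench-data}"

# List the two persisted files by mounting the volume read-only
# into a short-lived Alpine container.
list_persisted_files() {
  docker run --rm -v "${LOCAL_BENCH_VOLUME}:/data:ro" alpine \
    ls -l /data/benchmark_data.db /data/benchmark_results.csv
}

# Copy the CSV mirror out of the volume into the current directory.
export_results_csv() {
  docker run --rm \
    -v "${LOCAL_BENCH_VOLUME}:/data:ro" \
    -v "$PWD:/out" \
    alpine cp /data/benchmark_results.csv /out/benchmark_results.csv
}
```

If `list_persisted_files` reports a missing file after a benchmark has completed, the container is probably not writing to the mounted volume.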

API

Endpoint                  Method  Purpose
/api/results              GET     Stored benchmark results
/api/results-with-specs   GET     Results joined with system specs (?limit= supported)
/api/system-specs         GET     Detected hardware specs
/api/prompts              GET     Benchmark prompt set
/api/models               GET     Available models (503 + curated catalog if Ollama is down)
/api/run-benchmark        POST    Run a benchmark (blocks while the run is in progress)

POST /api/run-benchmark runs synchronously and blocks until the benchmark completes. Expect the request to stay open for the duration of the run.
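The endpoints can be exercised with curl. The sketch below assumes Local Bench is reachable at http://localhost:3000. The function names are hypothetical, and because the request body for /api/run-benchmark is not documented here, it is passed through unchanged from a file you supply. The JSON content type is also an assumption:

```shell
# Assumed base URL: the web UI port published with -p 3000:3000.
LOCAL_BENCH_URL="${LOCAL_BENCH_URL:-http://localhost:3000}"

# Build the results-with-specs URL, adding ?limit= only when a value is given.
results_url() {
  local limit="${1:-}"
  if [ -n "$limit" ]; then
    echo "${LOCAL_BENCH_URL}/api/results-with-specs?limit=${limit}"
  else
    echo "${LOCAL_BENCH_URL}/api/results-with-specs"
  fi
}

# Fetch results joined with system specs; --fail makes curl exit
# non-zero on HTTP errors.
fetch_results() {
  curl -sS --fail "$(results_url "${1:-}")"
}

# Start a benchmark and wait for it to finish.
# $1: file holding the request body (schema not documented here)
# $2: upper bound in seconds; curl has no overall timeout by default,
#     so this only stops a stuck run from waiting forever.
run_benchmark() {
  local body_file="$1" max_seconds="${2:-3600}"
  curl -sS --fail --max-time "$max_seconds" \
    -X POST -H 'Content-Type: application/json' \
    --data @"$body_file" \
    "${LOCAL_BENCH_URL}/api/run-benchmark"
}
```

For example, `fetch_results 5` requests /api/results-with-specs?limit=5. Any client calling `run_benchmark` should use a timeout comfortably longer than the slowest model you expect to test.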

Use Cases

  • "Which model runs fastest on my hardware right now?"
  • Capture before/after numbers when you change GPUs or quantizations
  • Keep a CSV/SQLite history of how your Hub performs over time
  • Document your Hub's specs alongside real benchmark figures

Setup

Install from Hub

Search for Local Bench in the Hub App Store and install it.

Open Local Bench

Navigate to http://local-bench.ci.localhost. On first run you'll see a friendly empty state until you record a benchmark.

Point it at Ollama

Set OLLAMA_API_URL to a reachable Ollama endpoint (e.g. http://host.docker.internal:11434 from a container, or a LAN address). With the default http://localhost:11434, the container only sees itself.
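On a Linux Docker Engine, host.docker.internal is not defined inside containers by default (Docker Desktop provides it automatically). The following sketch adds the mapping explicitly. It assumes Docker 20.10 or newer and that Ollama on the host listens on an address the container can reach, for example by starting the Ollama server with OLLAMA_HOST=0.0.0.0. The function name is hypothetical:

```shell
# Start Local Bench on a Linux host, mapping host.docker.internal to the
# host gateway so the container can reach Ollama on the Docker host.
run_local_bench_linux() {
  docker run -p 3000:3000 \
    --add-host=host.docker.internal:host-gateway \
    -e OLLAMA_API_URL=http://host.docker.internal:11434 \
    -v localbench-data:/app \
    ghcr.io/companionintelligence/ci-local-bench:latest
}
```

If Ollama only listens on 127.0.0.1, the container still cannot connect even with this mapping, and /api/models keeps returning 503.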

Run a benchmark

Pick a model from your Ollama library and start a run. Results are saved and appear via /api/results.

Troubleshooting

No models listed

Ollama is likely unreachable, so /api/models returned 503 with the curated catalog. Check that OLLAMA_API_URL points to a host or LAN address, not the container's own localhost.

Results disappear after restart

Mount a volume at /app so benchmark_data.db and benchmark_results.csv persist.

A benchmark request seems to hang

POST /api/run-benchmark blocks for the whole run by design. Wait for it to finish.
