Vane (Perplexica 2.0) Quickstart With Ollama and llama.cpp
Tags: llama, privacy
Source: Mastodon | Original article
A new step‑by‑step guide published on glukhov.org shows how to self‑host Vane, the open‑source successor to Perplexica 2.0, using Docker and how to wire it to the SearxNG meta‑search engine. The tutorial walks users through pulling the Vane container, configuring the built‑in API, and linking the search front‑end to any locally running large language model (LLM) via Ollama or llama.cpp. Out of the box, the setup supports popular models such as Gemma 4 and Qwen 3.5 14B, along with any other model that can be served through Ollama’s lightweight runtime or the high‑performance llama.cpp server. The guide also explains how to persist conversation history, enable tool‑calling, and expose a REST endpoint for custom integrations.
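To make the plumbing concrete, here is a minimal sketch of the two local calls such a stack rests on: fetching results from SearxNG's JSON API and prompting a model through Ollama's HTTP API. The hosts, ports (SearxNG on :8080), model tag, and helper names are illustrative assumptions, not taken from the guide; Ollama's default port 11434 and its `/api/generate` endpoint are its documented defaults. Vane wires these pieces together for you; this sketch only shows the underlying requests.

```python
"""Sketch: query a local SearxNG instance, then ask a local Ollama model.

Assumed endpoints (adjust to your deployment):
  - SearxNG at http://localhost:8080 with format=json enabled in settings.yml
  - Ollama at its default http://localhost:11434
"""
import json
import urllib.parse
import urllib.request


def searxng_url(base: str, query: str) -> str:
    # SearxNG exposes a JSON search API when the json format is enabled.
    params = urllib.parse.urlencode({"q": query, "format": "json"})
    return f"{base}/search?{params}"


def ollama_payload(model: str, question: str, snippets: list[str]) -> str:
    # Ground the model in the fetched search snippets via a simple prompt.
    context = "\n".join(f"- {s}" for s in snippets)
    prompt = (
        f"Answer using these search results:\n{context}\n\n"
        f"Question: {question}"
    )
    # stream=False asks Ollama for a single JSON response instead of chunks.
    return json.dumps({"model": model, "prompt": prompt, "stream": False})


if __name__ == "__main__":
    # 1) Fetch search results from the local SearxNG instance.
    url = searxng_url("http://localhost:8080", "self-hosted AI search")
    with urllib.request.urlopen(url) as resp:
        results = json.load(resp)["results"][:3]

    # 2) Send a grounded prompt to the local Ollama server.
    body = ollama_payload(
        "qwen3.5:14b",  # hypothetical model tag; use whatever you pulled
        "What is Vane?",
        [r.get("content", "") for r in results],
    )
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=body.encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["response"])
```

The same pattern extends to llama.cpp's built-in server by swapping the Ollama call for its OpenAI-compatible endpoint, which is one reason the guide treats the two runtimes as interchangeable back‑ends.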
Why it matters: the significance is twofold. First, the guide lowers the barrier for developers and privacy‑conscious users in the Nordics and beyond to run a full‑featured AI‑augmented search stack without relying on cloud APIs, cutting both latency and recurring costs. Second, the combination of Vane’s UI, SearxNG’s federated search, and locally hosted LLMs creates a modular “private copilot” that can be deployed on a home lab, a corporate intranet, or edge devices. As we reported on 13 April, the rapid maturation of Ollama and llama.cpp has already enabled private assistants and OSINT agents; Vane now adds a ready‑made search‑centric front‑end to that toolbox.
What to watch next: community‑driven performance tweaks and model‑specific prompts could make Vane competitive with commercial AI search services. The Ollama team’s upcoming support for GPU‑accelerated quantisation may further shrink inference times, while the SearxNG project is planning tighter result‑ranking hooks that Vane could leverage. Keep an eye on GitHub activity around the Vane Docker repo and on any benchmark releases comparing local versus hosted search‑assistant pipelines.