Running a seat-of-my-pants code-test evaluation on #ollama 7b
Tags: deepseek, gpu, llama, qwen
Source: Mastodon
A developer on the Nordic AI forum posted a quick‑and‑dirty benchmark of four 7‑billion‑parameter models running under Ollama on a single 16 GB GPU. The test asked each model to add a FastAPI endpoint to a small Python application whose source code was supplied. The models evaluated were Qwen‑7B, DeepSeek‑7B, Llama‑2‑7B and the newer Mistral‑7B.
Qwen produced a syntactically correct FastAPI snippet but omitted error handling, while DeepSeek generated a more complete example, including request validation and a brief docstring. Llama‑2’s output was functional yet verbose, requiring manual cleanup, and Mistral returned a partially formed function that failed to import the FastAPI library. The author noted the variance with a puzzled emoji, highlighting how even modest hardware can expose stark quality gaps among open‑source code assistants.
The result matters for two reasons. First, the experiment shows that developers can now run multiple code‑generation models locally without cloud fees, preserving data privacy—a key concern for Nordic enterprises handling sensitive codebases. Second, the results underscore that model choice still matters: newer entrants like DeepSeek can outperform older, more widely deployed models such as Llama‑2 on concrete programming tasks. This has implications for teams deciding whether to invest in commercial APIs or to spin up their own inference stacks.
As we reported on 4 April, running Gemma 4 locally with Ollama was already feasible on modest hardware; this latest test pushes the envelope by comparing several 7B models side by side. The next steps to watch include Ollama’s upcoming support for 12B and 30B models, the release of optimized kernels for NVIDIA’s RTX 40‑series GPUs, and community‑driven benchmarking suites that will standardise code‑generation evaluation. Those developments will determine whether local LLMs can reliably replace cloud‑based code assistants in production pipelines.