Show HN: Robust LLM Extractor for Websites in TypeScript https://github.com/lightfeed/extractor
Source: Mastodon
Lightfeed has pushed a fresh release of its open‑source “Extractor” library, a TypeScript toolkit that marries Playwright’s browser automation with large language models (LLMs) to pull structured data from web pages. The update, announced on Hacker News an hour ago, adds value‑history tracking, distinct list‑vs‑detail extraction modes and optional email notifications, extending the feature set first unveiled in May 2025.
The core of Extractor is a prompt‑driven pipeline: raw HTML is handed to an LLM, which interprets natural‑language instructions and returns JSON‑compatible output. Playwright renders the page in a real browser, so JavaScript‑generated content is present before extraction, while the LLM handles the messy, site‑specific logic that traditional scrapers struggle with. Lightfeed’s developers stress “great token efficiency,” a claim that matters because LLM usage is billed per token, so costs can balloon when a pipeline processes large volumes of pages.
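To make the pipeline concrete, here is a minimal TypeScript sketch of the general pattern described above. It is not Lightfeed's code or API: every name in it (`LlmCall`, `stripNoise`, `buildPrompt`, `parseJsonReply`, `extract`) is hypothetical, the model is passed in as a plain function so any provider could be plugged in, and the noise‑stripping step is only one assumed way to reduce token count.

```typescript
// Illustrative sketch only. None of these names come from the Extractor library.

/** Any function that sends a prompt to a model and resolves to its text reply. */
type LlmCall = (prompt: string) => Promise<string>;

/** Drop markup that carries no extractable content, so fewer tokens are sent. */
function stripNoise(html: string): string {
  return html
    .replace(/<(script|style|noscript)\b[^>]*>[\s\S]*?<\/\1>/gi, "") // code and CSS
    .replace(/<!--[\s\S]*?-->/g, "") // HTML comments
    .replace(/\s+/g, " ") // collapse runs of whitespace
    .trim();
}

/** Combine the natural-language instructions with the page content. */
function buildPrompt(instructions: string, html: string): string {
  return [
    "Extract data from the HTML below.",
    `Instructions: ${instructions}`,
    "Reply with a single JSON object and nothing else.",
    "HTML:",
    html,
  ].join("\n");
}

/** Parse the reply, tolerating extra text before or after the JSON object. */
function parseJsonReply(reply: string): unknown {
  const start = reply.indexOf("{");
  const end = reply.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("No JSON object found in model reply");
  }
  return JSON.parse(reply.slice(start, end + 1));
}

/** Full pipeline: clean the HTML, prompt the model, parse its answer. */
async function extract(
  html: string,
  instructions: string,
  llm: LlmCall,
): Promise<unknown> {
  const prompt = buildPrompt(instructions, stripNoise(html));
  return parseJsonReply(await llm(prompt));
}

// Usage with a stand-in model. In a real setup the HTML might come from
// Playwright's page.content() and llm would call an actual provider.
const fakeLlm: LlmCall = async () =>
  'Here you go: {"title": "Blue Widget", "price": 19.99}';

extract(
  "<html><script>track()</script><h1>Blue Widget</h1><p>$19.99</p></html>",
  "Return the product title and price.",
  fakeLlm,
).then((result) => console.log(result));
```

Passing the model in as a function keeps the sketch independent of any vendor SDK and makes it easy to test with a stub. A production pipeline would also validate the parsed object against a schema before trusting it.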
The release matters for two reasons. First, the library lowers the barrier for enterprises to build production‑grade data ingestion flows without hand‑crafting brittle CSS selectors or maintaining separate parsing code for each site. Second, it showcases a growing trend in which LLMs act as the “brain” of web‑automation stacks, a shift that could reshape data‑engineering roles and accelerate AI‑augmented market intelligence, price monitoring and compliance checks across the Nordics and beyond.
As we reported on 26 March, the original Show HN post introduced the concept (see our earlier coverage). The next steps to watch include community benchmarks that compare token usage and extraction accuracy against classic scrapers, integration with orchestration platforms such as LangChain or Airflow, and any security audits that address concerns about LLM‑driven code execution on untrusted sites. If the library gains traction, it may become a de‑facto standard for AI‑enhanced web data pipelines, prompting larger cloud providers to offer competing, managed equivalents.