FactorSmith: Agentic Simulation Generation via Markov Decision Process Decomposition with Planner-Designer-Critic Refinement

agents reasoning reinforcement-learning

2026-03-24 | Source: ArXiv | Original article

FactorSmith, a new arXiv preprint (2603.20270v1), proposes a three-stage “Planner-Designer-Critic” pipeline that turns natural-language specifications into fully executable simulations. The authors frame generation as a Markov Decision Process (MDP), decompose it, and iteratively refine code fragments: a planner sketches high-level steps, a designer expands each step into concrete code, and a critic evaluates functional correctness against the original prompt. By breaking the problem into smaller sub-tasks that each need only a slice of the overall context, FactorSmith sidesteps the limited reasoning bandwidth of today’s large language models (LLMs) when they must juggle sprawling, interdependent codebases.

The work builds on the FACTORSIM framework introduced in 2024-2025, which first applied a factored partially observable MDP to reduce context dependence during simulation generation. FactorSmith adds an agentic loop that actively checks and corrects generated snippets, yielding higher-fidelity simulations that can be dropped straight into reinforcement-learning pipelines. Early experiments reported in the paper show a 30% drop in compilation errors and a 22% improvement in task-completion metrics compared with baseline LLM generation.

The significance is twofold. First, the ability to auto-generate reliable simulation environments from plain language could dramatically shorten the development cycle for robotics, autonomous-vehicle testing, and digital-twin creation, all areas where Nordic firms are already investing heavily. Second, the planner-designer-critic architecture offers a template for making LLMs more “agentic,” echoing recent advances such as Sashiko’s code-review agent and the retrieval-augmented chatbots we covered last week.

What to watch next: the authors promise an open-source release of the FactorSmith toolkit by summer, along with a benchmark suite that pits the system against existing simulation generators.
Industry observers will also watch for integrations with vector-database back-ends such as Zvec for rapid retrieval of code modules, and for signs that the approach scales to multimodal specifications combining text, diagrams, and sensor data. If the early results hold, FactorSmith could become a cornerstone of the next wave of AI-driven simulation engineering.
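The FactorSmith toolkit has not been released yet, so what follows is not the authors’ code. It is a minimal Python sketch, under our own assumptions, of how a planner-designer-critic loop over a factored state might be wired. The planner splits a spec into steps. Each step’s designer sees only the state variables that step reads, a simplified stand-in for the factored-MDP context reduction the article attributes to FACTORSIM. A critic compiles and tests each snippet and feeds errors back until it passes. The functions `plan`, `design`, `critique`, and `generate`, the `Step` fields, and the bouncing-ball example are all hypothetical, and canned drafts replace the LLM calls so the sketch runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    """One planner sub-task (hypothetical). `reads` names the only state variables it may see."""
    name: str
    reads: list[str]
    test_input: dict               # state the critic uses to exercise the snippet
    check: Callable[[dict], bool]  # critic's functional test on the updated state


def plan(spec: str) -> list[Step]:
    # Hypothetical planner: a real system would prompt an LLM with `spec`.
    # Here a two-step plan for a 1-D bouncing ball is hard-coded.
    return [
        Step("apply_velocity", ["x", "vx"], {"x": 0.0, "vx": 1.0},
             lambda s: s["x"] == 1.0),
        Step("bounce", ["x", "vx"], {"x": 10.0, "vx": 1.0},
             lambda s: s["vx"] == -1.0),
    ]


def design(step: Step, context: dict, feedback: Optional[str]) -> str:
    # Hypothetical designer: a real system would prompt an LLM with the step,
    # the restricted `context`, and any critic feedback. Canned drafts stand in.
    drafts = {
        "apply_velocity": ["state['x'] = state['x'] + state['vx']"],
        "bounce": [
            "if state['x'] >= 10: state['vx'] = state['vx']",   # buggy: no reversal
            "if state['x'] >= 10: state['vx'] = -state['vx']",  # revised after feedback
        ],
    }
    options = drafts[step.name]
    return options[-1] if feedback else options[0]


def critique(step: Step, code: str) -> Optional[str]:
    """Compile and run the snippet on the step's test input; None means it passed."""
    try:
        compiled = compile(code, step.name, "exec")
    except SyntaxError as err:
        return f"syntax error: {err}"
    state = dict(step.test_input)
    try:
        exec(compiled, {}, {"state": state})
    except Exception as err:
        return f"runtime error: {err!r}"
    return None if step.check(state) else f"check failed, state={state}"


def generate(spec: str, full_state: dict, max_rounds: int = 3):
    program, log = [], []
    for step in plan(spec):
        # Factored context: the designer sees only the variables this step reads.
        context = {k: full_state[k] for k in step.reads}
        feedback = None
        for round_no in range(1, max_rounds + 1):
            code = design(step, context, feedback)
            feedback = critique(step, code)
            log.append((step.name, round_no, sorted(context), feedback is None))
            if feedback is None:
                program.append(code)
                break
        else:
            raise RuntimeError(f"{step.name!r} still failing after {max_rounds} rounds")
    return program, log


full_state = {"x": 0.0, "vx": 1.0, "y": 0.0, "vy": 0.0, "score": 0, "lives": 3}
program, log = generate("a ball that bounces off a wall at x = 10", full_state)
for entry in log:
    print(entry)
```

Each log entry records the step, the design round, the variables the designer saw, and whether the critic accepted the draft. The buggy first draft of `bounce` fails the critic’s check and is replaced in round two, while `y`, `vy`, `score`, and `lives` never enter either step’s context. A production critic would more plausibly run richer simulation tests or an LLM judge than a single assertion.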
