Most LLMs are derivative works, trained on GPL'd code

2026-07-05 | Source: Mastodon

Most LLMs were trained on GPL'd code, which, the argument goes, makes them derivative works bound by the GPL.

A post on Mastodon raises a licensing question with implications for the tech industry. Its argument: almost all Large Language Models (LLMs) were trained on code released under the GNU General Public License (GPL), which would make them derivative works bound by that license. This includes models that have ingested code under the GNU Affero GPL (AGPL), whose obligations also extend to software that users interact with over a network, tying them even more tightly to copyleft requirements.

This matters because the GPL is a copyleft license: anyone who distributes a derivative work must license it under the same terms and provide its source code. If trained models count as derivative works, that could limit big tech's ability to keep its LLMs proprietary. The open-source community has long advocated for the principles of free software, and this argument may strengthen its position.

As the debate unfolds, it will be worth watching how big tech companies respond to these potential licensing obligations. Will they adapt their business models to comply with the GPL, or will they try to work around its requirements? The outcome could have far-reaching consequences for how LLMs are developed and deployed, and for the future of open-source software in the AI landscape.

Sources

Mastodon
