RE: Are WTF and AI Companies Buying Up Antivirus Software?
google training
Source: Mastodon
AI companies buy antique books in bulk to train models. The books are discarded after use.
AI companies are scooping up antique and second-hand books in bulk to use as training data for their large language models (LLMs). This unusual practice is aimed at filling gaps in the models' knowledge, particularly on niche topics. The books are shipped to the US, where they are used to further train LLMs, and then discarded.
This development matters because it highlights the ongoing quest for diverse and high-quality training data in the AI industry. As LLMs become increasingly prevalent, their ability to provide accurate and informative responses relies heavily on the breadth and depth of their training data. By leveraging antique and second-hand books, AI companies are attempting to improve the performance and reliability of their models.
As the AI landscape continues to evolve, it will be interesting to watch how companies balance the need for extensive training data with concerns over data quality, sourcing, and environmental impact. With the rise of source-grounded AI tools like Google's NotebookLM, which emphasizes the importance of accurate and reliable information, the industry may see a shift towards more sustainable and responsible data collection practices.