New York Times Bans Use of Its Content to Train AI

The New York Times has instituted a ban on using its content to train artificial intelligence systems and updated its terms of service accordingly.

The terms and conditions on the website now state that users may not “use the content for the development of any software program, including, but not limited to, training a machine learning or artificial intelligence (AI) system.”

This is perhaps in reaction to Google’s recent announcement that all digital content should be available to mine unless publishers state otherwise.

AI tools like ChatGPT or Google Bard use large language models (LLMs) that “mine” content on the internet to learn both human language cadence and facts, which then enables the systems to generate content.

According to The Wrap, The Times also prohibits the use of “robots, spiders, scripts, service, software or any manual or automatic device, tool, or process designed to data mine or scrape the content, data or information from the services, or otherwise use, access, or collect the content, data or information from the Services using automated means.”

Data scraping and crawling have become a major issue as the AI race intensifies. Companies continuously use each other’s data: OpenAI used videos from YouTube to train its speech-to-text AI model Whisper, and Google used ChatGPT to help train Bard.

While The Times is trying to prevent the use of its content, some other news outlets are striking deals to benefit from the rapid expansion of AI.

Just last month, The Associated Press struck a deal with OpenAI to share access to technology developments and content. The deal is intended to give the AP the opportunity to explore generative AI in news products, while OpenAI gains access to part of the AP’s text archive to further train its artificial intelligence products.