Meta faces lawsuit over alleged word-for-word Llama copying

1 2 minutes read

Book publishers sue Meta over AI’s ‘word-for-word’ copying

Llama copyright – Misryoum reports that major publishers and author Scott Turow are suing Meta, alleging that Llama training copied copyrighted books and articles.

A coalition of major book publishers and an author has taken aim at Meta, alleging that the company’s AI models learned from copyrighted material in ways that go beyond what copyright law allows.

In a class action filed with the courts, Misryoum reports, five publishers and author Scott Turow claim Meta trained its Llama AI models using books and journal articles “repeatedly” copied without permission. The lawsuit argues this process amounts to one of the most sweeping infringements of copyrighted works, drawing attention to how training data is sourced and used at scale.

The complaint says Meta pulled material from so-called pirate sites, including LibGen, Anna’s Archive, and Sci-Hub, among others, and then fed that content into training pipelines. It also points to Common Crawl as a source it describes as containing unauthorized copies of copyrighted works, setting up a core dispute about whether such data can be relied on for model development.

For the public, the sharpest claim is about output behavior: the lawsuit contends that Llama can produce verbatim or near-verbatim continuations of copyrighted passages. In one example described in the filing, a prompt tied to a mathematics textbook is said to lead the model to reproduce the continuation of a section in a way the authors and publishers argue is effectively copying.

Meanwhile, the broader legal landscape has been shifting under similar challenges across the industry. Misryoum notes that in a separate case involving an AI company, a federal judge determined that training on legally purchased books can fall under fair use, while still allowing a class action to proceed where plaintiffs allege unauthorized copying from large volumes of works.

For publishers and creators, these cases matter because they test the boundary between “training” and “use.” If courts conclude that particular data sources or output patterns cross that line, it could reshape how AI systems are built, what gets licensed, and how companies prove their training practices are compliant.

In this case, Misryoum reports that the publishers and author seek damages and ask the court to order Meta to stop the alleged conduct. They also request that Meta provide a list of the specific copyrighted books, journal articles, and other works used to train Llama, a demand that could force greater transparency around training datasets.

Meta, through a spokesperson, said it will contest the lawsuit aggressively, framing AI training on copyrighted material as something courts may consider permissible under fair use principles. The outcome now hinges on how the court weighs the allegations against established legal standards for copyright and machine learning.

Ultimately, Misryoum expects the case to influence not just Meta, but the entire AI training ecosystem. Even the act of litigating these claims can push companies toward tighter data governance and more explicit licensing strategies as the industry races to scale models while navigating copyright risk.

Ana Souza 1 hour ago
