
Meta is facing a copyright lawsuit alleging that it used copyrighted works to train its artificial intelligence (AI) models. The lawsuit was filed by multiple plaintiffs, including several best-selling authors. The main allegation against the tech giant is that it used pirated e-books and articles to train older versions of its Llama AI model in violation of copyright law. Additionally, the documents accuse company CEO Mark Zuckerberg of allowing the Llama AI team to use torrent link aggregators to access copyrighted materials.
The information comes from two separate documents filed Wednesday in the U.S. District Court for the Northern District of California. The documents were filed by plaintiffs including authors Sarah Silverman and Ta-Nehisi Coates. They highlight testimony given by Meta in late 2024, which revealed that Zuckerberg allowed the Llama AI models to be trained on a dataset called Libgen.
Libgen (short for Library Genesis) is a file-sharing platform that provides free access to academic and general-interest content. Many consider it a pirate library because it offers copyrighted works that would otherwise sit behind a paywall or not be digitized at all. The platform has faced several lawsuits and has been ordered to shut down in the past.
The document claims that Meta used the Libgen dataset while fully aware that it contained misappropriated content and that using it violated copyright law. The document also cites a memorandum to Meta’s AI decision-makers, which notes that after “escalation to MZ”, Meta’s AI team was “approved to use Libgen”. Here, MZ is shorthand for the name of the chief executive officer.
Additionally, the memorandum mentioned that executives had been warned that public knowledge of the company’s use of “datasets we know to be pirated (such as Libgen)” could undermine its negotiating position with regulators. The social media giant has also been accused of stripping copyright information from the dataset’s text and metadata to cover up its infringement.
According to the document, Nikolay Bashlykov, a research engineer in Meta’s AI division, allegedly deleted copyright information from the Libgen dataset. To further conceal the dataset, “Meta programmers also included ‘supervised samples’ of data when fine-tuning Llama to ensure that Llama’s output would include less incriminating answers when responding to prompts about the sources of Meta’s AI training data,” the document states.
In addition, the plaintiffs claim that Meta committed another form of copyright infringement merely by accessing Libgen. The document claims the tech giant torrented the Libgen dataset. Torrenting involves both downloading and uploading (also known as seeding) content. The documents state that the uploading step can be regarded as distribution of copyrighted material and therefore constitutes an illegal act.
“Had Meta bought the plaintiffs’ works in a bookstore, or borrowed them from a library, and trained its Llama models on them without a license, it would have committed copyright infringement. Meta’s decision to bypass lawful acquisition methods and become a knowing participant in an illegal torrenting network establishes a CDAFA (California Comprehensive Computer Data Access and Fraud Act) violation and serves as proof of copyright infringement,” the filing states.
The copyright lawsuit is ongoing, and the court has not yet issued a ruling. Meta has not yet presented its arguments, which may rest on fair use. The court will have to decide whether the generative capabilities of AI models can be considered sufficiently transformative to support that argument.