By Marty Swant • January 20, 2025
Last week saw not one but two high-profile AI legal battles under the spotlight, with updates in separate copyright cases against Meta and OpenAI.
Court documents unsealed in an AI copyright case against Meta raise new questions about the use of e-books from the book piracy site Library Genesis (LibGen). They also raise questions about how much CEO Mark Zuckerberg and other Meta execs knew about Meta teams’ use of pirated content to help train its Llama models.
Court documents allege Meta employees sought to remove copyright information, including headers and other identifiers, from various materials. One filing shows an internal Meta document with a suggestion to remove lines containing words like “ISBN,” “copyright,” and “all rights reserved.” Another filing includes messages between employees discussing the desire to compete with AI rivals, including beating OpenAI’s GPT-4, while also describing French rival Mistral as “peanuts.”
Other documents include parts of Zuckerberg’s testimony from his December deposition. Zuckerberg said broad characterizations make the use of pirated content seem “like a bad thing” but added that Meta’s teams “think through this carefully because there are often more nuances than is kind of apparent at first.” (Meta did not reply to Digiday’s request for comment about the court documents.)
Books in the LibGen dataset include titles by top authors such as Ta-Nehisi Coates and Sarah Silverman, who are among the plaintiffs in the lawsuit. Zuckerberg claimed not to be familiar with LibGen. However, the plaintiffs’ attorney then asked if Meta would do business with a company that brags about using pirated materials.
“In general, if someone is broadcasting loudly that they’re doing something that is illegal, that would be a pretty big red flag that I’d want us to look at carefully before engaging with them in any way,” Zuckerberg said.
When asked by an attorney if Meta should not be downloading materials from websites known to have pirated materials, Zuckerberg said YouTube hosts “some percent” of pirated content even if most of the content is “kind of good and they have the license to do.”
“Early on, I think that people did make some assertions about YouTube’s intent on this, and they were less mature about developing their IP rights management,” Zuckerberg said. “But even then, I don’t think that I would’ve said I wouldn’t want people at Meta not to use YouTube, at that point. So — so I don’t know.”
Other documents suggest Meta execs were aware Llama’s training data had LibGen content and other copyrighted materials from sources like CommonCrawl. Documents also suggest Meta teams knew there could be blowback and potential fines under the EU AI Act if the use of LibGen were uncovered. One document mentioned Meta teams suggesting datasets should be red-teamed to filter out potential information about bio-weapons and harmful stereotypes.
NYT v. OpenAI and Microsoft
Revelations in the Meta case come as tech companies face more scrutiny over the types of content used to train large language models. In a separate lawsuit between The New York Times and OpenAI, attorneys gave oral arguments in court that outlined key points both sides are crafting as part of the case. In both cases, plaintiffs allege tech companies stripped copyright information from content used to train AI models.
“You’re leaving people open for massive copyright infringement without the ability to trace it,” said Steven Lieberman, an attorney representing the New York Daily News, which filed a separate case against OpenAI and Microsoft. “It’s like it causes the alarm system in your house to go down.”
Beyond court — publishers ink new AI deals
Last week, Axios and OpenAI announced a new partnership that includes funding new local Axios newsrooms in four cities, including Pittsburgh, Pa. and Kansas City, Mo. The deal also gives Axios access to OpenAI’s tech to build new AI products, processes and systems. In a blog post about the deal, Axios CEO Jim VandeHei wrote that the three-year deal will also give all Axios staff access to OpenAI’s enterprise version.
That wasn’t the only news last week about AI-powered news. The Associated Press and Google also announced a new partnership that includes the AP providing a feed of real-time information to Google’s Gemini app. The companies’ blog posts didn’t disclose the terms of the deal or what it’ll entail, but noted the plan will help “enhance the usefulness of results” within the Gemini app. Kristin Heitmann, the AP’s chief revenue officer, stated the updates are part of the companies’ ongoing relationship and “based on working together to provide timely, accurate news and information to global audiences.”
Beyond Axios and the AP’s plans to expand AI news, another company starting with “A” took a step back. Last week, Apple suspended its use of AI news alerts, following criticism for generating inaccuracies in AI-summarized notifications. Meanwhile, a new report by DoubleVerify detailed a network of more than 200 websites generating “AI slop” that mimics real publishers while misleading adtech vendors and buyers.
Prompts and Products — Other AI news and announcements
Anthrologic, a new startup co-founded by former MediaMonks execs, launched with the goal of helping brands create AI agents.
Adobe debuted a new generative AI tool for its Firefly platform that aims to give retailers more ways to scale personalized content.
The U.S. Supreme Court upheld a ban on TikTok unless the company is sold to a U.S. entity.
The U.K.’s Competition and Markets Authority announced a new investigation into Google’s search and search ads business, which will explore whether the giant has “strategic market status” under newly enacted U.K. competition law. One of the CMA’s reasons for the investigation is to make sure AI startups are able to fairly compete with Google’s own AI products and services.
The FTC, which has been investigating Snapchat’s My AI chatbot, announced it has referred the investigation to the U.S. Justice Department. The investigation includes the “allegedly resulting risks and harms to young users,” according to the FTC. “Although the Commission does not typically make public the fact that it has referred a complaint, we have determined that doing so here is in the public interest.”
Other AI-related stories from across Digiday
As agencies evolve AI tools for influencer vetting, they’re also discovering the tech’s limitations
Marketing Briefing: What happens to marketers when the cultural ‘cheat code’ of TikTok is gone?
OpenAI, The New York Times debate copyright infringement of AI tech companies in trial arguments
Brands are seeing an influx of traffic from ChatGPT and Google Gemini
What the agentic AI era means for ad agencies, with Omnicom’s Jonathan Nelson
Media Briefing: Dotdash Meredith’s Jon Roberts on the AI agenda in 2025
CES Briefing: Agentic AI era heralds SEO overhaul, Q&A with Mastercard’s Raja Rajamannar & Dotdash Meredith’s OpenAI ad assist
Media Buying Briefing: Looks like brand safety’s back on the menu
AI Briefing: Copyright battles bring Meta and OpenAI datasets under the microscope
Court documents raise new questions about Meta’s use of copyrighted content, and how much execs knew about pirated datasets