According to Ars Technica, US District Judge A. Ona T. Wang last week ordered OpenAI to produce all internal communications about deleting the “Books1” and “Books2” datasets by December 8, 2025, and to make its in-house lawyers available for deposition by December 19. The datasets, created in 2021 by scraping the shadow library LibGen, were deleted before ChatGPT’s 2022 launch. The judge ruled that OpenAI waived attorney-client privilege by flip-flopping on whether “non-use” was a privileged reason for the deletion, calling the company’s broader privilege claim a “moving target” that “strains credulity.” The discovery is central to a class-action lawsuit from authors alleging willful copyright infringement, a finding that could raise statutory damages to as much as $150,000 per work. Judge Wang also noted that OpenAI’s actions risk putting its “good faith and state of mind at issue.”
OpenAI’s flip-flop is a major problem
Here’s the thing about legal strategy: consistency matters. And OpenAI, according to the judge, hasn’t been consistent at all. First, it said the datasets were deleted partly because they fell into “non-use.” Then, when authors wanted to see discussions about that “non-use,” OpenAI backtracked and said all reasons for deletion were privileged legal advice. The judge basically said, “You can’t have it both ways.” By stating a reason publicly and then trying to hide behind privilege, OpenAI may have waived that privilege entirely. That’s a huge, self-inflicted wound. Now, the internal Slack chats and emails about deleting these libraries—including a channel initially, and tellingly, called “excise-libgen”—are likely coming out. The judge already peeked and found most of those messages weren’t privileged legal discussions anyway; they were just operational talk with a lawyer copied on the channel. Oops.
Why this could be a smoking gun
So why do the authors care so much about why OpenAI deleted some old data? It all comes down to willfulness. In copyright law, if you can prove the infringement was knowing or reckless, damages can skyrocket. The authors’ theory is simple: if internal messages show OpenAI deleted “Books1” and “Books2” because of legal fears—because they knew the data was pirated and risky—that’s pretty direct evidence of being “actually aware of the infringing activity.” It shows they knew they were on shaky ground. One of the authors’ lawyers even suggested OpenAI might be using the data under different names now. That’s a more aggressive claim, but you can see their angle. If the reason for deletion was, “Hey, this is legally toxic, get rid of it,” then using that same data to train the model that made you billions looks really, really bad. You can read the judge’s full opinion and order here.
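To make “skyrocket” concrete, here’s a minimal back-of-envelope sketch in Python. The per-work ceilings come from the US statutory-damages provision (17 U.S.C. § 504(c)); the number of works is a made-up placeholder, not a figure from this case:

```python
# Back-of-envelope statutory-damages exposure under 17 U.S.C. § 504(c).
# Per-work ceilings are from the statute; the class size is hypothetical,
# since the actual number of works at issue isn't public.

ORDINARY_MAX = 30_000   # per-work ceiling for ordinary infringement
WILLFUL_MAX = 150_000   # per-work ceiling if willfulness is proven

hypothetical_works = 100_000  # placeholder, NOT a number from the lawsuit

print(f"Ordinary ceiling: ${hypothetical_works * ORDINARY_MAX:,}")
print(f"Willful ceiling:  ${hypothetical_works * WILLFUL_MAX:,}")
print(f"Proving willfulness multiplies the ceiling by {WILLFUL_MAX / ORDINARY_MAX:.0f}x")
```

Same dataset, same plaintiffs, but a willfulness finding multiplies the worst-case number by five. That’s why the deletion rationale, and not just the deletion itself, is what the authors are digging for.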
The Anthropic connection and a bizarre citation
This case is deeply intertwined with the one against Anthropic, which just settled for a historic sum. And get this: the datasets in question were reportedly created by former OpenAI employees, including Anthropic CEO Dario Amodei. The authors have already won the right to depose him. But the real kicker from this ruling is how Judge Wang called out OpenAI for “bizarrely” misrepresenting the Anthropic ruling. OpenAI tried to cite it as saying downloading pirated books for AI training is lawful. Judge Wang pointed out the actual ruling said the exact opposite—that such piracy is “inherently, irredeemably infringing.” She even suggested OpenAI’s own conduct of pirating and then deleting the data “seemed to fall squarely” into the category of behavior that ruling condemned. When a judge says you’re grossly misrepresenting a key precedent in your favor, you’re not having a good day in court.
What happens next
OpenAI says it will appeal, but the immediate pressure is on: the comms are due in early December. This isn’t just about one document dump; it’s about the entire trajectory of the case. Judge Wang pointed out the “fundamental conflict” in asserting a good-faith defense while blocking all inquiry into your state of mind. That conflict may have already fatally weakened OpenAI’s position. Look, The Hollywood Reporter suggests this could tip the scales toward a settlement, and it’s easy to see why. The authors in the Anthropic case pointed to evidence that the company got cold feet about pirated books “for legal reasons.” If similar phrases show up in OpenAI’s Slack history (and why wouldn’t they?), it becomes the textbook definition of willful infringement. Suddenly, the calculus changes from fighting on principle to limiting catastrophic financial exposure. The docket is going to get a lot more interesting in the next few weeks.
