OpenAI Ordered to Produce 20 Million ChatGPT Logs in NYT Copyright Case
Magistrate Judge Ona T. Wang has ordered OpenAI to produce approximately 20 million de-identified ChatGPT logs to The New York Times and other plaintiffs as part of a copyright lawsuit. The court denied OpenAI's attempt to limit discovery. Production of the logs is considered proportional and necessary to determine whether ChatGPT outputs reproduce New York Times content, and the logs will be produced under a protective framework.
The lawsuit, filed in December 2023, alleges that OpenAI trained its models on copyrighted news content without permission. OpenAI has countersued, claiming The New York Times is not disclosing the full story. The court acknowledged privacy concerns but did not treat them as controlling in the proportionality analysis, which focuses primarily on the relevance of the logs and the minimal burden of producing them.
OpenAI warned that broad production of these logs would impose significant burdens on both user privacy and its operations. A June ruling had ordered OpenAI to retain a wide range of user data, including deleted chats. An October ruling resolved disagreements about the 20 million log sample, its relation to deleted logs, and prior agreements on what data should be turned over.
Late last month, OpenAI filed a formal objection asking the district judge to overturn the discovery order, arguing that the order is erroneous and that disclosing millions of private conversations is disproportionate. The case is part of a broader wave of copyright challenges over the use of AI training data, with similar claims being pursued in the United States and Europe.