New York Times Lawsuit Against OpenAI and Microsoft Could Redefine AI's Use of Copyrighted Content
The New York Times (NYT) lawsuit against OpenAI and Microsoft has sparked widespread discussion about copyright infringement by AI companies. In the case, the Times claims that OpenAI and Microsoft copied millions of its articles without permission to train their language models.
According to the Times, OpenAI's models "can generate output that recites Times' content verbatim, closely summarizes it, and mimics its expressive style." In other words, the Times alleges that OpenAI and Microsoft copied millions of its articles, much of which sits behind a paywall, and that their models can reproduce that content nearly word for word or in lightly altered form. This raises copyright questions for AI companies in general, which may now need to find new ways to make their models knowledgeable. Because these companies used the articles without the Times' consent, the consequences could be massive: OpenAI and Microsoft may be looking at billions of dollars in damages for copying the newspaper's data and repurposing it for their own models. The datasets currently used to produce responses are also at stake, with GPT-3's Common Crawl dataset said to include more than 66 million records of content from the Times. Still, while many may side with the NYT, others may strongly support Microsoft and OpenAI, viewing the case from the companies' perspective and arguing that they were within their rights to use the Times' articles for training.
To anticipate what this case's outcome may be, one can look at Andersen et al. v. Stability AI (filed January 2023). That case involved a group of visual artists who alleged direct and induced copyright infringement, DMCA (Digital Millennium Copyright Act) violations, false endorsement, and trade dress claims based on Stability AI's Stable Diffusion and DreamStudio. In the end, the court decided that because none of the output images Stability AI generated were identical to the artists' work, Stability could not be held legally responsible for the removal of copyright management information that occurs in the AI's training process. With this ruling, the work of visual artists may lose value, since AI can imitate it or even build on it, which could threaten artists' livelihoods in the future. The Stability AI lawsuit and the one involving the NYT are quite similar: in both, one party accuses the other of copyright infringement, either because the AI can reproduce protected work or because the company used content that would normally have to be paid for to train its models.
The New York Times v. OpenAI and Microsoft case shows that AI companies have understandable motives for wanting knowledgeable models, but the methods they used to achieve this led to the lawsuit. One alternative would be for those building the models to research source material and reword it in their own language rather than making one-to-one copies, so that training does not depend on taking content outright, a practice most people object to. Much of the public's unease with AI comes from how unsettling it is that these systems can effectively do people's jobs for them, especially artists, who spend hours on their work while AI can now produce a polished drawing in seconds. If samples of work were created specifically for training AI, rather than taken from copyrighted material, the problem of AI stealing others' work could be reduced. Ultimately, as cases like this one mount, AI will have to be shaped differently in the future if companies like Microsoft and OpenAI want to avoid further legal consequences for using copyrighted documents to train and improve their language models.
Overall, the outcome of this lawsuit will shape the future of many other AI systems, as they may need to be trained differently. AI companies will have to find new ways to build knowledge into their systems instead of relying on content taken from organizations that invest time and resources into creating it, especially since the Times wants its data permanently removed. This one lawsuit could complicate the training of other AI models, but it may also encourage society not to rely on AI as heavily as it does now.