OpenAI admits that they’re thieves

Summary: The article discusses the conflict between The New York Times and OpenAI, focusing on copyright issues. Publishers, including The New York Times, made their content freely available online, inadvertently providing their copyrighted material to people who would misuse it. OpenAI is accused of exploiting this by using the Times’ content to develop competing services. The legality of this, under the guise of “fair use,” is debatable. The article criticizes the narrow legal focus on verbatim quoting by AI, suggesting that the core issue is the unauthorized use of copyrighted material. It also mentions a statement OpenAI submitted to a UK parliamentary committee and concludes with a nuanced view on the benefits and ethical dilemmas posed by AI technologies like ChatGPT.

My friend Charles Benaiah takes apart the ongoing conflict between The New York Times and OpenAI. Bo Sacks picked it up the other day, and I’ll provide a link below.

It pains me to say it – honestly it does – but The New York Times is doing God’s work on this topic. This is, of course, the only topic where it’s possible to say that.

Here’s the basic problem. Publishers put their content online for “free” – supported by ads – because that was their path to discovery in the search engines. They didn’t have the foresight to put restrictive terms on access to that content. They should have made it clear that the content was only available under certain terms.

I mean – why the heck do we hire lawyers?

OpenAI took advantage of the ambiguity. They slurped up NYT content and used it to create a service to compete with The New York Times – and every other publisher in the world.

That’s disgusting and bad form and all that, but is it technically illegal? That’s what we need to find out.

OpenAI claims this is “fair use” – which is an exception to copyright protection. I’m no lawyer, but I’ve been in and around copyright questions my whole career, and I think this is transparently stupid.

That doesn’t mean it won’t win in court.

The argument seems to be centering on whether or not AI is quoting copyrighted material verbatim. That is an incredibly short-sighted approach, because if the court rules that AI can’t quote verbatim, the AI chatbots will just put in a subroutine to make sure they don’t do that, and the fundamental problem will persist.

The fundamental problem is that OpenAI is using copyrighted content in a way that the copyright owner never approved. Unfortunately, the publishers didn’t have the foresight to specifically disclaim this use of their content.

In a written submission to a House of Lords committee in the UK, OpenAI has apparently admitted that it’s impossible to train their pet dragon without feeding it lots of young people.

They said “[l]imiting training data to public domain books and drawings created more than a century ago might yield an interesting experiment, but would not provide AI systems that meet the needs of today’s citizens.”

This is a classic “the ends justify any means” argument.

Don’t get me wrong. I love ChatGPT. Just last night I had a great conversation with it about recipes to make fortified wines. It’s an amazing service.

And the neighborhoods controlled by the mafia were pretty safe.

Links

New York Times: All is fair in love and AI

OpenAI admits it’s impossible to train generative AI without copyrighted materials