Can publishers effectively fight Skynet?

Summary: This article discusses the ongoing battle between publishers and AI, caricatured as “Skynet.” The author suggests four strategies for publishers to combat AI: altering the robots.txt file, enforcing server-side measures, updating terms and conditions for potential lawsuits, and developing their own large language models. However, the article acknowledges the challenges, such as AI’s ability to find content elsewhere and exploit technical loopholes. The author emphasizes the importance of perception and the potential value of reputable sources. While it may seem like a losing battle, the article encourages publishers to fight for copyright protection.


You won’t out-tech them, but that doesn’t mean you have to lie down and take it.

I have encouraged publishers to fight back against AI in four ways:

To change their robots.txt file to kindly ask the monsters to leave them alone.
To enforce that request on the server side.
To change their terms and conditions to make the lawyers happy and provide the basis for future lawsuits.
To start working on their own large language models.
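The first two items are the technical ones, and the gap between them matters: robots.txt is only a polite request, while a server-side rule actually refuses the request. Below is a minimal Python sketch of both, under stated assumptions. GPTBot (OpenAI) and CCBot (Common Crawl) are real crawler names, but the two-item blocklist, the hypothetical helper names `robots_txt` and `status_for`, and the choice to answer with HTTP 403 are illustrative only, not a recommended or complete configuration.

```python
from typing import Sequence

# Illustrative (hypothetical) blocklist of AI crawler names. GPTBot and CCBot
# are real crawler tokens, but this list is an assumption, not exhaustive.
BLOCKED_CRAWLERS = ("GPTBot", "CCBot")


def robots_txt(blocked: Sequence[str] = BLOCKED_CRAWLERS) -> str:
    """Build a robots.txt that asks each blocked crawler to stay out entirely,
    while allowing every other user agent. Compliance is voluntary."""
    groups = [f"User-agent: {bot}\nDisallow: /\n" for bot in blocked]
    groups.append("User-agent: *\nAllow: /\n")
    # Join with a newline so each group is separated by a blank line.
    return "\n".join(groups)


def status_for(user_agent: str, blocked: Sequence[str] = BLOCKED_CRAWLERS) -> int:
    """Server-side enforcement: return 403 (Forbidden) if the User-Agent header
    names a blocked crawler (case-insensitive substring match), else 200.

    This only catches crawlers that identify themselves honestly; a scraper
    that spoofs its User-Agent passes straight through.
    """
    ua = user_agent.lower()
    if any(bot.lower() in ua for bot in blocked):
        return 403
    return 200


if __name__ == "__main__":
    print(robots_txt())
    gptbot_ua = (
        "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
        "GPTBot/1.0; +https://openai.com/gptbot)"
    )
    print(status_for(gptbot_ua))  # 403
```

In a real deployment the same check would usually live in the web server or CDN configuration rather than in application code; the sketch just makes the logic explicit.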

My friend Charles Benaiah thinks that’s tilting at windmills. Skynet will win.

Charles makes four points.

1. Skynet doesn’t need your content. He says “media is highly derivative” and points out that the AI companies can get something equivalent to your content from 100 other sites that won’t block their bots.

2. There are loopholes. Despite your best efforts, Skynet will find a way to get your content if it wants to.

3. Even if they couldn’t scrape your website, they could sign up for your emails and get a lot of your content that way. I’d add that they can also follow your social media accounts.

4. If they really want your content, they can send people to the library to scan it.

These are all good points, but I’d like to give a broader context.

Charles’ first point is the strongest. No matter how good you think your journalism is, there’s so much free material out there that AI will be able to produce something as good as your content even if it can’t crawl your site.

That’s true. But an article from a so-called reputable source carries perceived value. If chat bots are required to show their sources, or if people come to believe that chat bot answers are compiled from Fred and Ethel’s Blog, Twitter posts, and YouTube videos by people who sniff glue all day, that will give reputable sources an edge, even if, objectively speaking, the LLM is able to turn that Internet straw into gold.

In other words, perception is important here. Chat bots are mostly free right now, but they won’t always be, and once they charge, they’ll be competing with you for paying readers. Will the market pay $10 for news spun from the straw of “free crap on the internet,” or would readers rather pay $10 for something from a professional? I don’t know, but let’s at least lay the groundwork to make that second future possible.

The second, third, and fourth points — that there are technical loopholes, that they can get your content from your emails, and that they can scan hard copies at the library — are also true, which is why publishers also need to update their terms and conditions so they have the grounds to sue Skynet.

The wheels of justice move slowly, but publishers did recently win a copyright case against the Internet Archive. So there is some hope.

It’s possible, probably even likely, that this is a losing battle. The tech companies give a lot more money to Congress than media companies do, so they’ll probably get some insane carve-out, like Section 230.

But my Ancestry DNA results say I have a lot of northern blood in my veins. The idea of a desperate, losing battle against the forces of wickedness and chaos kinda appeals to me.

So I say fight the power. Order and sanity may prevail. We might come to realize that copyright is an important thing. Or … maybe this train will run you over. But you don’t have to lie down on the tracks.
