OpenAI and image generator Midjourney will soon pay to train their AI models using public Tumblr content, according to internal documents reviewed by the site 404 Media.
404 Media has reported that a deal is “imminent” between Tumblr parent company Automattic and the two AI giants but could not specify what types of data would be sold to each company. The deal also reportedly includes the sale of data from WordPress.com, another Automattic property.
Posts detailing how user content is used for AI training were published on Feb. 27 on the staff blogs of both Tumblr and WordPress.com. However, the posts did not tell users that Automattic was in talks to sell that data.
Here’s what you need to know about how the sale may affect your Tumblr content.
Which content will Automattic reportedly sell?
404 Media has reported that the documents it reviewed did not specify the types of data that would be sold to each company. It is also unclear if this deal will affect future posts to Tumblr only, or if it encompasses past content as well. AI companies have been criticized for their rampant use of “publicly available” content to train their models, since much of what is publicly available online is still protected by copyright.
According to a support article on OpenAI’s website, “ChatGPT and our other services are developed using information that is publicly available on the internet” among other sources. Presumably, OpenAI has already scraped and used any content that was once publicly available on Tumblr. Given that, the current deal could serve as a sort of mea culpa on the part of OpenAI and Midjourney as they offer to pay for the use of future Tumblr content as well.
Automattic did not respond to requests for comment from 404 Media regarding the deal but posted a statement called “Protecting User Choice” in which the company wrote, “We currently block, by default, major AI platform crawlers—including ones from the biggest tech companies—and update our lists as new ones launch.” It is unclear when the site began blocking the crawlers, which matters because OpenAI has been training its models on public content for years.
How do I opt out?
To opt out of sharing your public Tumblr content with third parties, you’ll need to toggle on a new “Prevent third-party sharing” option in the settings of each individual blog you run. This needs to be done on a web browser, not through the Tumblr app. These updates have been added to Tumblr’s support article about user privacy.
If you have previously elected to discourage searching of your blog, the new “Prevent third-party sharing” option will already be toggled on.
But what if you decide to forgo toggling on the setting now, opting instead to do it in three months? 404 Media reported that, in a document it accessed from Feb. 23, a Tumblr staff member asked a question addressing this issue. “Do we have assurances,” they wrote, “that if a user opts out of their data being shared with third parties that our existing data partners will be notified of such a change and remove their data?”
Automattic’s head of AI, Andrew Spittle, replied, “We will notify existing partners on a regular basis about anyone who’s opted out… I want this to be an ongoing process where we regularly advocate for past content to be excluded based on current preferences. We will ask that content be deleted and removed from any future training runs. I believe partners will honor this based on our conversations with them to this point.”
Is this normal?
It certainly seems to be, at the very least, the new normal. OpenAI is licensing news stories from the Associated Press and is reportedly in talks to do the same with CNN, Time, and Fox. Reddit is working with Google to monetize its database of content.
It was just a matter of time before Automattic started selling its own data, especially considering how much money it’s losing on Tumblr. In its entire 17-year history, the site has never been profitable, and Automattic has failed to turn it around. In November, TechCrunch reported that resources had been diverted from the struggling site to support projects elsewhere within Automattic.