The legal squabbles between artists and the companies that train AI on their artworks don’t seem to be abating.
In the span of a few months, several lawsuits over generative AI technology have been filed against companies including OpenAI and Stability AI, with plaintiffs alleging that copyrighted data, mostly art, was used without their consent to train the generative models. Generative AI models "learn" to create art, code, and more by "training" on sample images and text, which are usually scraped indiscriminately from the internet.
In an effort to give artists more control over how and where their art is used, Jordan Meyer and Mathew Dryhurst co-founded the startup Spawning. Spawning created HaveIBeenTrained, a website that allows creators to opt out of the training dataset for one art-generating AI model, Stable Diffusion v3, due to be released in the coming months.
As of March, artists had used HaveIBeenTrained to remove 80 million pieces of art from the Stable Diffusion training set. By the end of April, that figure had eclipsed 1 billion.
As demand for Spawning’s service grew, the company, until then fully bootstrapped, sought outside investment. And it got it: Spawning announced today that it has raised $3 million in a seed round led by True Ventures with participation from Seed Club Ventures, Abhay Parasnis, Charles Songhurst, Balaji Srinivasan, Jacob.eth and Noise DAO.
Speaking to businessroundups.org via email, Meyer said the funding will enable Spawning to continue developing “IP standards for the AI era” and establish more robust opt-out and opt-in standards.
“We are excited about the potential of AI tools. We have developed domain expertise in the field through our passion for the new opportunities AI offers creators, but I believe that consent is a fundamental layer to making these developments something everyone is comfortable with,” said Meyer.
Spawning’s stats speak for themselves. Clearly there is demand from artists for more say in how their art is used (or scraped, as the case may be). But aside from partnerships with art platforms like Shutterstock and ArtStation, Spawning hasn’t been able to rally the industry around a common opt-out or provenance standard.
Adobe, which recently announced generative AI tools, maintains its own opt-out mechanisms and tools. So does DeviantArt, which in November launched a protection that relies on HTML tags to prevent the software robots that crawl pages for images from downloading those images for training sets. OpenAI, the generative AI giant in the room, still doesn’t offer an opt-out tool, nor has it announced any plans to do so any time soon.
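DeviantArt’s tag-based protection reportedly uses robots-style meta directives; the directive names "noai" and "noimageai" below are based on public reporting about that mechanism, not on any specification from the article. A crawler that wanted to honor such tags could check a page with a sketch like this, using only Python’s standard library:

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects comma-separated directives from <meta name="robots"> tags."""
    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        if attrs.get("name", "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))

def page_opts_out(html: str) -> bool:
    """True if the page carries a 'noai' or 'noimageai' directive
    (assumed directive names) that a well-behaved scraper should honor."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return bool({"noai", "noimageai"} & parser.directives)
```

A scraper would call `page_opts_out` on each fetched page and skip downloading its images when it returns `True`.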
Spawning has also been criticized for the opacity and vagueness of its opt-out process. As Ars Technica noted in a recent piece, the process appears to fall short of the definition of consent in the EU’s General Data Protection Regulation, which requires that consent be actively given, not assumed by default. It’s also unclear how Spawning intends to legally verify the identities of artists who submit opt-out requests, or whether it plans to attempt to do so at all.
Spawning’s plan is twofold. First, it aims to make it easier for AI model trainers to honor opt-out requests and to streamline the process for creators. Later, Spawning will offer more services to organizations that want to protect their artists’ work, says Meyer.
“We want to build the permission layer for AI, which we think will be a fundamentally useful piece of infrastructure moving forward,” he added. “We plan to grow Spawning to address the many different domains covered by the AI economy, as each domain has its own specific needs.”
As a first step toward this ambitious vision, Spawning enabled domain opt-outs in March, allowing creators and content partners to quickly opt out content from entire websites. Spawning says 30,000 domains have been registered in the system so far.
April marks the release of an API and an open source Python package that vastly expand the breadth of content Spawning touches. Previously, Spawning opt-out requests applied only to LAION-5B, the dataset used to train Stable Diffusion. Starting in April, any website, app or service can use Spawning’s API to automatically comply with opt-outs not only for image data but also for text, audio, video and more.
Meyer says Spawning will fold any new opt-out method (e.g., Adobe’s and DeviantArt’s) into its Python package for model trainers, with the goal of reducing the number of accounts model makers must manage to fulfill opt-out requests.
To increase visibility, Spawning has teamed up with Hugging Face, one of the larger platforms for hosting and running AI models, to add an info box to Hugging Face that will alert users to the proportion of opted-out data in text-to-image datasets. The box will also link to a Spawning API page so that model trainers can exclude opted-out images during training.
“We believe that once companies and developers know that the option to honor creators’ wishes is available, there is little reason not to honor them,” said Meyer. “We are excited about the future of generative AI, but creators and organizations need standards to ensure their data works to their advantage.”
Moving forward, Spawning plans to release an exact-duplicate detection feature to match opted-out images with copies the platform finds on the web, followed by a near-duplicate detection feature to notify artists when Spawning finds likely copies of their work that have been cropped, compressed, or otherwise slightly modified.
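The article doesn’t say how Spawning will implement near-duplicate detection, but a standard approach is perceptual hashing: images whose hashes differ in only a few bits are likely the same work after cropping or compression. This is an illustrative average-hash sketch, not Spawning’s method; to stay dependency-free it takes a pre-computed 8×8 grayscale grid as input (a real pipeline would first resize and gray-convert the image, e.g. with Pillow):

```python
def average_hash(pixels):
    """64-bit average hash of an 8x8 grayscale grid (8 rows of 8 ints):
    each bit is 1 if that pixel is at or above the grid's mean brightness."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p >= mean else 0)
    return bits

def hamming(a, b):
    """Number of bit positions where the two hashes differ."""
    return bin(a ^ b).count("1")

def is_near_duplicate(pixels_a, pixels_b, threshold=10):
    """Hashes within `threshold` bits of each other suggest near-duplicates;
    the threshold value here is an illustrative choice."""
    return hamming(average_hash(pixels_a), average_hash(pixels_b)) <= threshold
```

An exact-duplicate check is the degenerate case (distance 0), while small distances catch the cropped or recompressed copies the article describes.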
There are also plans for a Chrome extension that would allow creators to preemptively opt out their work wherever it is posted on the web, and a caption search feature on the HaveIBeenTrained website to search image descriptions directly. The site’s current search relies on approximate matches between text and images, plus URL searches to find content hosted on specific websites.
Spawning – now beholden to investors – plans to make money by building services on top of its content infrastructure, though Meyer wouldn’t reveal much. How that will sit with content creators remains to be seen.
“We’ve spoken to quite a few organizations, with many conversations still too early to announce, and we think our funding announcement and increased visibility will provide some assurance that what we’re building is a robust and reliable standard to work with,” Meyer said. “After we complete these features, we will begin building infrastructure to support more data sets, including music, video, and text.”