What is Have I Been Trained?
Have I Been Trained? (HIBT) is a web-based tool designed to give creators transparency and agency in the age of generative artificial intelligence. Launched by Spawning.ai, an organization co-founded by artists Mat Dryhurst and Holly Herndon, the platform serves as a search engine for the massive public datasets used to train popular AI models such as Stable Diffusion. Its primary mission is to let artists, photographers, and illustrators discover whether their copyrighted work has been scraped and used to "teach" AI without their explicit consent.
The tool gained significant traction following the explosive growth of text-to-image generators in late 2022. At the time, many creators were shocked to find that their life’s work had been ingested into the LAION-5B dataset, a collection of over five billion image-text pairs, without any notification or compensation. Have I Been Trained? emerged as a direct response to this "wild west" era of data scraping, offering a bridge between the tech companies developing these models and the human creators who provide the raw material for their intelligence.
Beyond simple discovery, the platform has evolved into a central hub for data sovereignty. It facilitates a "Do Not Train" registry, which has been formally recognized by major industry players like Stability AI and Hugging Face. In an era where the legal boundaries of "fair use" are still being litigated in courts worldwide, Have I Been Trained? represents a practical, artist-led effort to establish a standard for consensual AI training, shifting the paradigm from passive exploitation to active participation.
Key Features
- LAION Dataset Search: The core of the platform is a searchable index of the LAION-5B dataset. Users can search by text (keywords, artist names) to see what images associated with those terms appear in the training data.
- Reverse Image Search: While occasionally limited due to server load or safety updates, the tool allows users to upload an image to find exact matches or visually similar works within the dataset. This is crucial for artists whose work might be hosted on third-party sites without proper metadata.
- "Do Not Train" Registry: This is the platform's most impactful feature. By creating an account and claiming their work, artists can flag their images to be opted out of future training sets. This registry is used by AI developers who have committed to ethical data sourcing.
- Domain-Level Opt-Out: For professional photographers, galleries, or studios with large portfolios, HIBT offers the ability to opt out entire domains. A single request flags everything on an artist's personal website or portfolio domain as off-limits for participating AI developers, rather than requiring image-by-image claims.
- Spawning Browser Extension: A companion tool that allows users to "inspect" any webpage they are currently browsing. It identifies which images on the page are already in training datasets and provides a quick link to opt them out.
- Kudurru Integration: Part of the broader Spawning ecosystem, Kudurru is an active defense network. It identifies scrapers in real-time and allows site owners to block them or even serve "poisoned" data (incorrect labels) to discourage non-consensual scraping.
- CSAM Safeguards: Following a brief hiatus in early 2024, the tool implemented additional safety protocols so that harmful or illegal content (such as CSAM) is identified and removed from the search results, keeping the environment safe for researchers and creators.
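To make the registry and domain-level opt-out features more concrete, here is a minimal sketch of how a model trainer might filter image-text pairs against an opt-out list before training. It is an illustration only and does not show Spawning's actual API or data format: the function names, the in-memory sets, and the example URLs are all hypothetical.

```python
from urllib.parse import urlparse


def is_opted_out(image_url, opted_out_urls, opted_out_domains):
    """Return True if this exact URL, or its host or a parent domain, is opted out.

    Both collections are hypothetical stand-ins for a "Do Not Train" registry.
    Domains are assumed to be stored in lowercase.
    """
    # Per-image opt-out: the exact URL was claimed by its creator.
    if image_url in opted_out_urls:
        return True

    host = (urlparse(image_url).hostname or "").lower()
    parts = host.split(".")
    # Domain-level opt-out: check the host and each parent domain,
    # e.g. cdn.example.com, then example.com (the bare TLD is skipped).
    for i in range(len(parts) - 1):
        if ".".join(parts[i:]) in opted_out_domains:
            return True
    return False


def filter_training_pairs(pairs, opted_out_urls, opted_out_domains):
    """Keep only the (url, caption) pairs that nobody has opted out."""
    return [
        (url, caption)
        for url, caption in pairs
        if not is_opted_out(url, opted_out_urls, opted_out_domains)
    ]


if __name__ == "__main__":
    # Made-up sample data for illustration.
    pairs = [
        ("https://cdn.example-gallery.com/a.jpg", "oil painting of a harbor"),
        ("https://other.org/b.jpg", "stock photo of a desk"),
        ("https://other.org/c.jpg", "portrait illustration"),
    ]
    kept = filter_training_pairs(
        pairs,
        opted_out_urls={"https://other.org/c.jpg"},
        opted_out_domains={"example-gallery.com"},
    )
    for url, caption in kept:
        print(url, caption)
```

The sketch matches whole domain labels rather than raw substrings, so an opt-out for one domain does not accidentally exclude an unrelated site whose name merely ends with the same characters.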
Pricing
Have I Been Trained? is primarily a free public service for individual creators. The founders have maintained that the ability to search for one's own data and opt out of training should be a fundamental right, not a paid privilege.
- Individual Search & Opt-Out: Free. Anyone can search the dataset and register their "Do Not Train" preference at no cost.
- Spawning Browser Extension: Free. Available for major browsers to help creators manage their data while browsing.
- Enterprise/Model Trainer Services: Spawning.ai offers "Data Diligence" packages for AI companies. These are paid B2B services that help developers clean their datasets and respect the "Do Not Train" registry at scale.
- Kudurru Defense: While the basic network is often free to join for small site owners, advanced protection and API access for high-traffic platforms may involve custom pricing.
Pros and Cons
Pros
- Empowers Creators: It is one of the first tools to give artists a tangible way to push back against non-consensual data scraping.
- Industry Recognition: Unlike many "protest" tools, HIBT's registry is honored by major AI developers such as Stability AI, making the opt-out meaningful.
- User-Friendly Interface: The site is clean, intuitive, and requires no technical knowledge to use.
- Promotes Ethical AI: It forces a conversation about consent and helps establish a "gold standard" for how AI models should be built in the future.
Cons
- Not Retroactive: Opting out prevents your work from being used in future models (e.g., Stable Diffusion 3), but it cannot "un-train" models that have already been released.
- Limited Scope: The tool primarily focuses on the LAION dataset. It cannot search private datasets owned by companies like OpenAI (DALL-E) or Midjourney unless they choose to share that data.
- Onus on the Artist: The burden of discovery and opting out still falls on the creator. Artists must spend their own time searching for and flagging their work.
- Search Speed: Due to the massive size of the dataset and the implementation of safety filters, search results can sometimes be slow or return "unavailable" messages during peak times.
Who Should Use Have I Been Trained?
The tool is an essential resource for several specific profiles within the creative economy:
- Digital Artists and Illustrators: Anyone who posts work on platforms like ArtStation, DeviantArt, or Instagram should check HIBT to see whether their images appear in the training data behind AI-generated outputs.
- Professional Photographers: Since high-quality photography is a prime target for AI training, photographers should use the domain-level opt-out to protect their commercial portfolios.
- Estate Managers and Rights Holders: Galleries and organizations managing the legacy of deceased artists can use the tool to ensure that historical works are not being exploited by commercial AI firms.
- AI Researchers and Ethicists: The tool provides a transparent window into the composition of the datasets that are currently shaping the future of technology, making it a valuable resource for academic study.
Verdict
Have I Been Trained? is more than just a utility; it is a critical piece of infrastructure for the modern internet. While it isn't a "magic bullet" that can erase an artist's digital footprint from the AI models already in existence, it is the most effective tool available for drawing a line in the sand for the future. By centralizing the opt-out process and gaining the cooperation of major AI labs, Spawning.ai has turned a chaotic situation into a manageable one.
For any creator concerned about the intersection of their intellectual property and artificial intelligence, HIBT should be the first stop. It is a rare example of a tool that actually shifts power back to the individual in a landscape dominated by tech giants. Even if you aren't ready to opt out, the insight gained by seeing how the "machine" sees your work is invaluable. Highly recommended for all digital creators.