Make-A-Scene

Make-A-Scene by Meta is a multimodal generative AI method that puts creative control in the hands of its users by allowing them to describe and illustrate their vision through both text descriptions and freeform sketches.

What is Make-A-Scene?

Make-A-Scene is a multimodal generative AI research project developed by Meta AI (formerly Facebook). Introduced as an "exploratory AI research concept," it represents a significant shift in how artificial intelligence handles image generation. While traditional models like the original DALL-E or early versions of Midjourney relied almost exclusively on text prompts, Make-A-Scene was designed to give users a higher degree of spatial and compositional control by allowing them to combine text descriptions with freeform sketches.

The core philosophy behind Make-A-Scene is that words alone are often insufficient to describe a complex visual idea. For example, a prompt like "a cat on a fence" might result in various compositions, many of which might not match what the creator intended. By providing a simple sketch alongside the text, users can dictate exactly where the cat should sit, the angle of the fence, and the overall layout of the background. This "multimodal" approach (using more than one type of input) bridges the gap between human intent and machine execution.
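To make the idea concrete, here is a toy Python sketch of what such a multimodal request might look like conceptually. The label ids, region shapes, and `request` dictionary are invented for illustration only; Make-A-Scene's actual interface is not public.

```python
import numpy as np

# Illustrative only: a segmentation map is just a grid of class labels
# that tells the model *where* each object belongs. The label ids here
# (0=background, 1=sky, 2=fence, 3=cat) are made up for this example.
H, W = 64, 64
layout = np.zeros((H, W), dtype=np.uint8)
layout[:24, :] = 1          # sky across the top of the canvas
layout[40:44, :] = 2        # a horizontal band for the fence
layout[28:40, 20:36] = 3    # the cat, placed directly above the fence

# The multimodal "request" pairs the spatial layout with a text prompt
# that supplies the style and detail a rough sketch cannot express.
request = {
    "prompt": "a ginger cat sitting on a wooden fence at sunset",
    "segmentation_map": layout,
}

# Sanity check: the cat region sits immediately above the fence band.
assert request["segmentation_map"][39, 28] == 3  # bottom of the cat
assert request["segmentation_map"][41, 28] == 2  # top of the fence
```

The point of the sketch is that composition is specified spatially (the array) while appearance is specified linguistically (the prompt); the generator is expected to honor both.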

Currently, Make-A-Scene remains a research-oriented tool rather than a mass-market consumer product. Meta has shared the demo with a select group of AI artists—such as Sofia Crespo and Refik Anadol—to test its creative limits. While it has not been released as a standalone public app, the technology behind it serves as a foundation for Meta’s broader AI ecosystem, influencing the generative features now appearing across Instagram, Facebook, and WhatsApp.

Key Features

  • Multimodal Input (Text + Sketch): The standout feature of Make-A-Scene is its ability to process both a written prompt and a hand-drawn sketch simultaneously. The AI uses the sketch as a layout guide (segmentation map) and the text to define the style, textures, and details.
  • Unprecedented Spatial Control: Unlike most text-to-image generators that "guess" the placement of objects, Make-A-Scene follows the user's lead. If you draw a circle on the left and a square on the right, the AI will place the corresponding objects exactly in those positions.
  • High-Resolution Output: At its unveiling, Meta announced that Make-A-Scene could generate images at a resolution of 2,048 x 2,048 pixels. This was a massive leap forward at the time, as many competing models were limited to 256x256 or 512x512 outputs.
  • Scene Layout Generation: For users who don't want to draw, the model can still generate its own layout based solely on text. However, the human-led sketch remains the preferred method for precision.
  • Artist-Centric Design: The tool was built with feedback from professional digital artists, ensuring it addresses real-world creative needs like "compositional consistency"—the ability to keep elements in place while iterating on style.

Pricing

As of early 2026, Make-A-Scene does not have a public pricing tier because it is not available as a commercial product. It remains an internal research project at Meta AI. There is no subscription model, pay-per-image fee, or public API currently available for this specific model.

However, users looking for Meta’s accessible AI tools can use Meta AI (meta.ai) or the "Imagine" feature within Meta's social apps. These public-facing tools are currently free to use for anyone with a Facebook or Instagram account in supported regions. While they don't offer the full sketch-to-image control of the Make-A-Scene research model, they represent the closest consumer-ready alternative.

Pros and Cons

Pros:

  • Superior Controllability: It solves the "frustration of randomness" common in AI art by letting users draw the layout.
  • Professional Quality: The 2K resolution and focus on "key aspects" (like objects and animals) result in cleaner, more intentional images.
  • Innovative Workflow: It encourages a collaborative process between the human and the AI, rather than the human just being a "prompt engineer."
  • Foundational Tech: The research has pushed the entire industry toward "ControlNet" style features, which are now becoming standard in advanced AI art.

Cons:

  • Not Publicly Accessible: The biggest drawback is that most people cannot actually use the Make-A-Scene demo; it is restricted to select researchers and artists.
  • No Open Source: Unlike Meta's Llama models, the weights and code for Make-A-Scene have not been fully open-sourced, making it difficult for developers to build upon.
  • Research Limitations: As an exploratory concept, it may still reflect biases found in its training data (public datasets), which Meta has openly acknowledged.

Who Should Use Make-A-Scene?

Since Make-A-Scene is a research concept, its users are currently limited to a small, hand-picked group, but its intended future audience includes:

  • AI Researchers: Those studying multimodal learning and how to improve the relationship between different types of data (vision and language).
  • Digital Artists and Illustrators: Creative professionals who need exact control over composition for storyboarding, concept art, or complex illustrations.
  • Metaverse Developers: Builders looking to quickly generate specific, high-quality assets for 3D environments and virtual worlds.
  • Creative Enthusiasts: People who enjoy the "sketching" aspect of art but lack the technical skill to render high-fidelity final pieces.

Verdict

Make-A-Scene is a visionary milestone in the evolution of generative AI. By proving that a simple sketch can provide the "creative anchor" that text prompts often lack, Meta has set a new standard for how we interact with AI models. It moves us away from the "slot machine" style of AI generation—where you pull a lever and hope for a good result—and toward a true digital paintbrush.

While the lack of public access is disappointing for the average user, the legacy of Make-A-Scene is already visible in the rapid development of "image-to-image" and "sketch-to-image" features in other tools like Stable Diffusion and Photoshop. For now, Make-A-Scene is less a tool you can use and more a blueprint for the future of human-AI collaboration. If you need a tool today, look to Meta's public "Imagine" feature, but keep an eye on Make-A-Scene's research papers to see where the technology is headed next.
