Teaching AI models the broad strokes to sketch more like humans do

In an age where artificial intelligence excels at generating hyper-realistic images and intricate artworks, a fundamental aspect of human creativity—the iterative and spontaneous act of sketching—has largely remained elusive. Unlike detailed paintings, sketches serve as fluid tools for brainstorming, communicating ideas, and refining concepts stroke by stroke. Recognizing this gap, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and Stanford University have unveiled “SketchAgent,” a groundbreaking drawing system designed to teach AI models to sketch with a human-like, iterative approach.

SketchAgent leverages a multimodal language model, such as Anthropic’s Claude 3.5 Sonnet, to translate natural language prompts into expressive sketches within seconds. This innovative system can independently doodle concepts like a house, or engage in collaborative drawing sessions with human users, incorporating text-based input to develop specific parts of a sketch. The researchers have showcased SketchAgent’s ability to create abstract representations of diverse subjects, from a robot or butterfly to a DNA helix, a flowchart, and even the iconic Sydney Opera House.

Yael Vinker, a CSAIL postdoc and lead author of the paper introducing SketchAgent, emphasizes the system’s potential to foster a more natural mode of human-AI interaction. “Not everyone is aware of how much they draw in their daily life. We may draw our thoughts or workshop ideas with sketches,” Vinker states. “Our tool aims to emulate that process, making multimodal language models more useful in helping us visually express ideas.”

A key innovation behind SketchAgent is its method of teaching AI to draw stroke-by-stroke without relying on pre-existing human-drawn datasets, which are often limited in scale and diversity. Instead, the researchers developed a unique “sketching language.” This language translates a sketch into a numbered sequence of strokes on a grid, with each stroke labeled for what it represents—for instance, the seventh stroke of a house might be a rectangle labeled “front door.” This approach allows the model to generalize and apply its sketching principles to entirely new concepts it hasn’t explicitly encountered during training.
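To make the stroke-by-stroke idea concrete, here is a minimal illustrative example in Python of how a labeled stroke sequence on a grid might be stored and rendered. The Stroke class, the grid size, and the strokes_to_svg helper are assumptions made for this example only; they are not SketchAgent's actual sketching language or interface.

```python
# Illustrative only: the stroke format, grid size, and helper names below are
# assumptions for demonstration, not SketchAgent's actual sketching language.
from dataclasses import dataclass

GRID = 50  # resolution of the hypothetical drawing grid (cells per side)
CELL = 10  # pixels per grid cell when rendering to SVG


@dataclass
class Stroke:
    label: str                     # what the stroke depicts, e.g. "front door"
    points: list[tuple[int, int]]  # ordered (x, y) grid coordinates


def strokes_to_svg(strokes: list[Stroke]) -> str:
    """Render a numbered sequence of labeled strokes as a minimal SVG drawing."""
    size = GRID * CELL
    lines = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">']
    for i, stroke in enumerate(strokes, start=1):
        pts = " ".join(f"{x * CELL},{y * CELL}" for x, y in stroke.points)
        lines.append(
            f'  <polyline points="{pts}" fill="none" stroke="black">'
            f"<title>stroke {i}: {stroke.label}</title></polyline>"
        )
    lines.append("</svg>")
    return "\n".join(lines)


# A toy "house": each stroke is numbered by its position in the sequence and
# labeled so a language model can reason about what it is drawing, step by step.
house = [
    Stroke("roof",       [(10, 20), (25, 8), (40, 20)]),
    Stroke("walls",      [(10, 20), (10, 42), (40, 42), (40, 20)]),
    Stroke("front door", [(22, 42), (22, 32), (28, 32), (28, 42)]),
]

if __name__ == "__main__":
    print(strokes_to_svg(house))
```

Because each stroke is an explicit, labeled step rather than a finished bitmap, a model trained on this kind of representation can in principle compose familiar strokes into concepts it has never drawn before.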

The research, co-authored by CSAIL affiliates Tamar Rott Shaham, Alex Zhao, and MIT Professor Antonio Torralba, alongside Stanford University research fellow Kristine Zheng and assistant professor Judith Ellen Fan, is set to be presented at the 2025 Conference on Computer Vision and Pattern Recognition (CVPR) this month. The presentation will shed more light on the technical details and broader implications of SketchAgent.

While existing text-to-image models like DALL-E 3 can produce compelling visuals, they miss the spontaneous, iterative nature of human sketching. By modeling drawings as a sequence of strokes, SketchAgent achieves a more natural, fluid appearance. The team also tested SketchAgent's collaborative mode, where humans and the AI work in tandem, and found that the AI's contributions were essential: removing AI-generated strokes, such as a sailboat's mast, could leave the drawing unrecognizable.

In performance evaluations, SketchAgent, utilizing Claude 3.5 Sonnet as its default backbone model, consistently generated the most human-like vector graphics, outperforming other prominent models like GPT-4o and Claude 3 Opus. Tamar Rott Shaham, a co-author, notes, “The fact that Claude 3.5 Sonnet outperformed other models like GPT-4o and Claude 3 Opus suggests that this model processes and generates visual-related information differently.” She envisions SketchAgent as a transformative interface, expanding human-AI collaboration beyond conventional text-based communication, making AI more intuitive and versatile.

Despite its impressive capabilities, SketchAgent is still in its early stages. It currently produces simple representations, akin to stick figures and doodles, and struggles with complex tasks such as drawing logos, full sentences, intricate creatures like unicorns or cows, and specific human figures. The model can also occasionally misinterpret user intentions in collaborative sessions, as when it drew a bunny with two heads, a result possibly stemming from its chain-of-thought reasoning breaking tasks into smaller, sometimes misaligned steps. Future work aims to refine these drawing skills by training on synthetic data generated by diffusion models and by improving the user interface for more seamless interaction.

This pioneering work, supported by grants from the U.S. National Science Foundation, a Hoffman-Yee Grant, Hyundai Motor Co., the U.S. Army Research Laboratory, the Zuckerman STEM Leadership Program, and a Viterbi Fellowship, marks a significant step towards enabling AI to engage in the creative, step-by-step collaboration that defines human sketching. The promise of SketchAgent lies in its ability to facilitate more aligned and intuitive human-AI design processes, paving the way for new applications in education, design, and interactive art.
