AI Sketches Like Humans: MIT and Stanford Develop SketchAgent for Intuitive AI Drawing

In a leap towards more intuitive AI interaction, researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and Stanford University have unveiled “SketchAgent,” a novel drawing system that enables AI models to sketch concepts more like humans. This innovative system bridges the gap between natural language and visual representation, allowing AI to participate in brainstorming and visual communication in a more fluid and iterative manner.

Unlike existing AI models that excel at generating realistic paintings or cartoonish drawings, SketchAgent captures the essence of sketching: a stroke-by-stroke process that leaves room for brainstorming and editing. Using a multimodal language model such as Anthropic’s Claude 3.5 Sonnet, SketchAgent turns natural language prompts into sketches within seconds. The system can create doodles on its own or collaborate with a human, taking text-based input and sketching each part separately.

The research team demonstrated SketchAgent’s capabilities by creating abstract drawings of diverse concepts, including robots, butterflies, DNA helixes, flowcharts, and even the Sydney Opera House. These demonstrations point to possible future applications such as interactive art games, educational tools for diagramming complex concepts, and quick drawing lessons for users.

Yael Vinker, a CSAIL postdoc and lead author of the paper introducing SketchAgent, highlights the system’s role in fostering more natural communication between humans and AI. “Not everyone is aware of how much they draw in their daily life. We may draw our thoughts or workshop ideas with sketches,” she says. “Our tool aims to emulate that process, making multimodal language models more useful in helping us visually express ideas.”

SketchAgent learns to draw stroke-by-stroke without relying on pre-existing datasets. Instead, the researchers developed a “sketching language” that translates sketches into numbered sequences of strokes on a grid. By providing examples of how objects like houses are drawn, with each stroke labeled, the model can generalize to new concepts. The system’s ability to actively collaborate with humans during the sketching process was also tested. In one experiment, removing SketchAgent’s strokes from a drawing of a sailboat rendered the sketch unrecognizable, highlighting the tool’s essential contribution.
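To make the idea of a “sketching language” more concrete, here is a minimal Python sketch of how a drawing could be stored as numbered, labeled strokes on a grid and rendered as vector graphics. It is an illustration only, not the researchers’ implementation: the grid size, the `Stroke` fields, the house coordinates, and the helper names (`cell_to_pixel`, `strokes_to_svg`, `without_author`) are all assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass

GRID_SIZE = 50  # assumed grid resolution (illustrative, not from the article)
CELL_PX = 10    # assumed pixel size of one grid cell


@dataclass(frozen=True)
class Stroke:
    """One numbered, labeled stroke (hypothetical format for this example)."""
    number: int                          # position in the drawing order
    label: str                           # what the stroke depicts, e.g. "roof"
    points: tuple[tuple[int, int], ...]  # grid cells as (column, row), 1-based
    author: str = "agent"                # "agent" or "user" in a shared sketch


def cell_to_pixel(col: int, row: int) -> tuple[float, float]:
    """Map a 1-based grid cell to the pixel at its centre (row 1 is the bottom)."""
    if not (1 <= col <= GRID_SIZE and 1 <= row <= GRID_SIZE):
        raise ValueError(f"cell ({col}, {row}) is outside the grid")
    x = (col - 0.5) * CELL_PX
    y = (GRID_SIZE - row + 0.5) * CELL_PX  # SVG's y axis points downwards
    return x, y


def strokes_to_svg(strokes: list[Stroke]) -> str:
    """Render strokes in drawing order as SVG polylines."""
    size = GRID_SIZE * CELL_PX
    lines = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">']
    for stroke in sorted(strokes, key=lambda s: s.number):
        pixels = (cell_to_pixel(c, r) for c, r in stroke.points)
        pts = " ".join(f"{x:g},{y:g}" for x, y in pixels)
        lines.append(
            f'  <polyline id="s{stroke.number}" points="{pts}" '
            'fill="none" stroke="black"/>'
        )
    lines.append("</svg>")
    return "\n".join(lines)


def without_author(strokes: list[Stroke], author: str) -> list[Stroke]:
    """Drop one contributor's strokes, as in the sailboat experiment above."""
    return [s for s in strokes if s.author != author]


# A labeled example drawing: a house made of three strokes.
HOUSE = [
    Stroke(1, "walls", ((15, 10), (35, 10), (35, 30), (15, 30), (15, 10))),
    Stroke(2, "roof", ((15, 30), (25, 42), (35, 30))),
    Stroke(3, "door", ((23, 10), (23, 20), (28, 20), (28, 10)), author="user"),
]

if __name__ == "__main__":
    print(strokes_to_svg(HOUSE))
    print([s.label for s in without_author(HOUSE, "agent")])
```

Because every stroke carries a number, a label, and an author, the same structure supports both stroke-by-stroke editing and the kind of test described above, where one collaborator’s strokes are removed to see whether the sketch stays recognizable.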

Different multimodal language models were tested within SketchAgent to determine which could produce the most recognizable sketches. Claude 3.5 Sonnet outperformed models like GPT-4o and Claude 3 Opus, generating more human-like vector graphics. Co-author Tamar Rott Shaham notes, “The fact that Claude 3.5 Sonnet outperformed other models like GPT-4o and Claude 3 Opus suggests that this model processes and generates visual-related information differently.” She envisions SketchAgent as a valuable interface for collaborating with AI models beyond text-based communication.

While SketchAgent demonstrates promising drawing capabilities, it is not yet capable of producing professional-grade sketches. It excels at simple representations using stick figures and doodles but struggles with complex figures. The model can also occasionally misunderstand users’ intentions in collaborative drawings, such as drawing a bunny with two heads. The researchers suggest refining these skills through training on synthetic data from diffusion models and improving the system’s interface.

The research, supported by organizations including the U.S. National Science Foundation and Hyundai Motor Co., points to a future where AI can draw diverse concepts in human-like collaboration, leading to more aligned and intuitive designs.

SketchAgent: a collaborative system that teaches AI models to sketch more like humans do.
Video: MIT CSAIL

© 2025 Proaitools. All rights reserved.