Vana Empowers Users to Own AI Models Trained on Their Data

Vana, a decentralized platform that grew out of an MIT class project, is shifting the power dynamic of AI development by letting users own a piece of the AI models trained on their data. The initiative responds to growing concern that tech giants are monopolizing user data for AI training, a concern underscored by Reddit’s $60 million deal with Google in February 2024, in which user data was used without user involvement.

Vana’s user-owned network enables individuals to upload their data and govern its use. AI developers can propose new model ideas to users, and if the users consent to contribute their data, they gain proportional ownership in the resulting AI models. This innovative approach aims to democratize AI development, giving everyone a stake in the AI systems that are increasingly shaping society while unlocking new data sources to advance the technology.

“This data is needed to create better AI systems,” says Vana co-founder Anna Kazlauskas ’19. “We’ve created a decentralized system to get better data — which sits inside big tech companies today — while still letting users retain ultimate ownership.”

Kazlauskas’s journey from economics to blockchain began at MIT, where she mined Ethereum and explored distributed systems. That work led her to collaborate with Art Abal, and together they looked for ways to spread data contribution for AI systems across many individuals. Their approach contrasts with the current norm, in which models are often trained on data scraped from the public internet or on datasets purchased from other companies.

Vana relies on a provision that allows users of major tech platforms to export their data. Users can upload this data into encrypted digital wallets within Vana and allocate it to train models. These data pools are organized as data DAOs (decentralized autonomous organizations). The platform protects user privacy by preventing identifiable information from being exposed while the data is in use. Contributors retain ownership of the resulting models and receive proportional rewards each time their data contributes to a model’s usage.
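To make "proportional rewards" concrete, the arithmetic can be sketched in a few lines of Python. This is only an illustration and not Vana's actual mechanism: the function names, the use of a simple contribution count as the weight, and the rounding rule are all assumptions made for the example.

```python
from fractions import Fraction


def ownership_shares(contributions):
    """Return each contributor's fraction of a data pool.

    `contributions` maps a user name to a non-negative integer weight
    (a hypothetical measure, e.g. number of records contributed).
    """
    total = sum(contributions.values())
    if total <= 0:
        raise ValueError("the pool has no contributions")
    return {user: Fraction(weight, total) for user, weight in contributions.items()}


def split_reward(contributions, reward_cents):
    """Split an integer reward in proportion to each user's share.

    Uses the largest-remainder method so the payouts always add up
    to exactly `reward_cents` (an assumed rounding rule).
    """
    shares = ownership_shares(contributions)
    exact = {user: share * reward_cents for user, share in shares.items()}
    payout = {user: int(amount) for user, amount in exact.items()}  # round down
    leftover = reward_cents - sum(payout.values())
    # Give the remaining cents to the users with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda u: exact[u] - payout[u], reverse=True)
    for user in by_remainder[:leftover]:
        payout[user] += 1
    return payout


if __name__ == "__main__":
    pool = {"alice": 3, "bob": 1, "carol": 1}  # hypothetical contributors
    print(split_reward(pool, 100))
```

In this sketch a user who supplied three of the five records in a pool would receive three fifths of any reward. A real system would need to decide how contributions are weighted, which the article does not specify.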

Last year, Vana users contributed Reddit data to train an AI model capable of generating Reddit posts. Over 140,000 users participated, defining the model’s usage terms and retaining ownership. Similar projects have been launched using data from X, Oura rings, and other sources. These data pools can also be combined to develop broader AI applications.

According to Kazlauskas, the ability to combine data from various platforms like Spotify, Reddit, and fashion sources empowers users to create powerful models that would otherwise be impossible due to corporate silos and regulations. Vana currently hosts over 1 million users and more than 20 live data DAOs, with over 300 additional data pools proposed by users slated for production this year.

Vana’s data pools let users achieve what even the most powerful tech companies struggle to do today. Kazlauskas emphasizes that by building these pools, users share in the rise of AI through model ownership, which keeps any single entity from controlling the most powerful AI models. The result, she says, is better technology whose benefits reach everyone involved.