
AI Predicts Protein Locations in Human Cells, Which Could Speed Disease Diagnosis
Researchers at MIT, Harvard University, and the Broad Institute have developed an AI-based method that can predict the location of virtually any protein within a human cell, even when neither the protein nor the cell line has been tested before. The approach could help accelerate disease diagnosis, identify drug targets, and deepen our understanding of complex biological processes.
The challenge: A single human cell contains roughly 70,000 different proteins and protein variants, so identifying their locations experimentally is costly and time-consuming. Mislocalized proteins can contribute to diseases such as Alzheimer’s, cystic fibrosis, and cancer, which makes accurate localization crucial.
Existing computational methods typically use machine-learning models trained on large datasets such as the Human Protein Atlas, which catalogs the subcellular localization of more than 13,000 proteins across more than 40 cell lines. Yet even this resource covers only a small fraction of all possible protein and cell-line pairings.
The new approach: The team’s method, called PUPS, addresses these limitations in two ways. It can make predictions for proteins and cell lines it has never seen, and it works at the single-cell level rather than providing averaged estimates across all cells of a given type, as earlier methods do. This single-cell resolution could help, for example, pinpoint a protein’s location within a specific cancer cell after treatment.
How it works: PUPS combines a protein language model with a computer vision model to capture detailed information about both the protein and the cell. The protein language model works from the protein’s amino acid sequence to capture its localization-relevant properties, including its 3D structure. The computer vision model takes three stained images of a cell and extracts information about the cell’s type, individual features, and overall state. PUPS then combines what the two models have learned to generate an image of the cell in which a highlighted region marks the predicted protein location.
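To make the two-branch design concrete, here is a deliberately tiny toy sketch in Python. It is not PUPS: the amino-acid-composition "protein encoder", the mean-intensity "cell encoder", and the channel-mixing "decoder" are hypothetical stand-ins for real neural networks, and all function names and weights are invented for illustration. It only mirrors the data flow described above: encode the protein, encode the three stained images, fuse the two representations, and produce an image whose bright regions mark the predicted location. The example images assume, as in Human Protein Atlas reference stains, channels for the nucleus, microtubules, and endoplasmic reticulum.

```python
import math

# Toy illustration only: every function and weight here is a hypothetical
# stand-in, not part of PUPS.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
Image = list[list[float]]  # H x W grayscale intensities


def embed_protein(sequence: str) -> list[float]:
    """Hypothetical protein encoder: 20-dim amino-acid composition.
    (A real protein language model learns far richer features.)"""
    seq = sequence.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts) or 1
    return [c / total for c in counts]


def embed_cell(stains: list[Image]) -> list[float]:
    """Hypothetical cell encoder: mean intensity of each stained channel."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in stains]


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def predict_location(sequence: str, stains: list[Image],
                     weights: list[list[float]]) -> Image:
    """Fuse both embeddings, then 'decode' an output image by mixing the
    stain channels with weights conditioned on the fused vector.
    `weights` has one row per stain channel and one column per fused feature."""
    fused = embed_protein(sequence) + embed_cell(stains)  # concatenation
    logits = [sum(w * f for w, f in zip(row, fused)) for row in weights]
    mix = softmax(logits)  # how strongly each channel is highlighted
    height, width = len(stains[0]), len(stains[0][0])
    return [[sum(mix[k] * stains[k][i][j] for k in range(len(stains)))
             for j in range(width)]
            for i in range(height)]


# Usage: three 3x3 "stained images" (nucleus, microtubules, ER; assumed).
nucleus = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
microtubules = [[1.0, 0.0, 1.0], [0.0, 0.0, 0.0], [1.0, 0.0, 1.0]]
er = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]

n_features = len(AMINO_ACIDS) + 3
# Hand-picked weights that push the mix toward the nucleus channel.
weights = [[10.0] * n_features, [0.0] * n_features, [0.0] * n_features]
heatmap = predict_location("MKTAYIAKQR", [nucleus, microtubules, er], weights)
```

The channel-mixing "decoder" is the crudest possible choice: it can only highlight structures already visible in the input stains, whereas a learned image decoder can place the highlight anywhere in the cell.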
“You could do these protein-localization experiments on a computer without having to touch any lab bench, hopefully saving yourself months of effort,” explains Yitong Tseo, a graduate student at MIT and co-lead author of the research paper published in Nature Methods. “While you would still need to verify the prediction, this technique could act like an initial screening of what to test for experimentally.”
During training, the researchers also gave PUPS a secondary task: explicitly naming the compartment where the protein localizes, such as the cell nucleus. This auxiliary task improved the model’s general understanding of the possible cell compartments, much as asking students to label the parts of a flower, in addition to drawing it, deepens their understanding of the flower.
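A secondary task like this is a form of multi-task (auxiliary-loss) training. The minimal Python sketch below shows, under stated assumptions, how such an objective is commonly assembled: a main loss on the predicted localization image plus a weighted classification loss for naming the compartment. The mean-squared-error image loss, the single-label cross-entropy, the compartment list, and the `aux_weight` value are illustrative assumptions, not details reported for PUPS; proteins that occupy several compartments at once would call for a multi-label loss instead.

```python
import math

# Hypothetical compartment labels for illustration; not PUPS's label set.
COMPARTMENTS = ["nucleus", "cytosol", "mitochondria", "plasma membrane"]


def image_loss(pred: list[list[float]], target: list[list[float]]) -> float:
    """Mean squared error between predicted and true localization images."""
    diffs = [(p - t) ** 2
             for pred_row, target_row in zip(pred, target)
             for p, t in zip(pred_row, target_row)]
    return sum(diffs) / len(diffs)


def compartment_loss(logits: list[float], label: int) -> float:
    """Cross-entropy for naming the compartment (log-sum-exp for stability)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]


def multitask_loss(pred_image, true_image, compartment_logits, label,
                   aux_weight: float = 0.5) -> float:
    """Main image loss plus a weighted auxiliary compartment-naming loss.
    `aux_weight` is a hypothetical hyperparameter."""
    return (image_loss(pred_image, true_image)
            + aux_weight * compartment_loss(compartment_logits, label))


# Usage: a 2x2 prediction with one wrong pixel, and logits favoring "nucleus".
pred = [[0.0, 1.0], [1.0, 0.0]]
true = [[0.0, 1.0], [0.0, 0.0]]
logits = [2.0, 0.5, -1.0, -1.0]
loss = multitask_loss(pred, true, logits, COMPARTMENTS.index("nucleus"))
```

The weight `aux_weight` trades the two objectives off: if it is too small, the auxiliary signal has little effect; if it is too large, it can crowd out the main image-prediction objective.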
The team validated PUPS with lab experiments, which confirmed that it could accurately predict the subcellular location of new proteins in unseen cell lines. Compared with baseline AI methods, PUPS showed lower prediction error on average across the proteins tested.
Future directions: The researchers plan to extend PUPS to model protein-protein interactions and to predict the locations of multiple proteins within a cell at once. In the longer term, they aim to enable PUPS to make predictions in living human tissue rather than only in cultured cells.



