
AI Predicts Protein Location in Human Cells with Unprecedented Accuracy
In a groundbreaking advancement, researchers at MIT, Harvard University, and the Broad Institute have developed an AI-driven method capable of predicting the location of virtually any protein within a human cell. This innovative approach, published in Nature Methods, promises to advance disease diagnosis, drug discovery, and our understanding of complex biological processes.
The precise location of a protein within a cell is crucial to its function; mislocalization can contribute to diseases like Alzheimer’s, cystic fibrosis, and cancer. Identifying these locations manually is an arduous task, given the estimated 70,000 different proteins and variants within a single human cell. Traditional methods are costly and time-consuming, limiting the scale of exploration.
Existing computational techniques, including those leveraging the Human Protein Atlas, still only scratch the surface of possible protein-cell line pairings. The new method overcomes these limitations by accurately predicting the location of proteins even in previously untested cell lines and localizing proteins at the single-cell level, providing unparalleled precision.
“You could do these protein-localization experiments on a computer without having to touch any lab bench, hopefully saving yourself months of effort. While you would still need to verify the prediction, this technique could act like an initial screening of what to test for experimentally,” says Yitong Tseo, a graduate student at MIT and co-lead author of the research.
The AI model, named PUPS, combines a protein language model with a computer vision model to analyze protein sequences and cell images. By inputting the amino acid sequence of a protein and three stained cell images, PUPS predicts and highlights the protein’s location within a single cell. This detailed localization can be particularly useful in pinpointing a protein’s location in specific cancer cells post-treatment.
PUPS utilizes a protein sequence model to capture the properties of a protein and its 3D structure. Additionally, it incorporates an image inpainting model, designed to fill in missing parts of an image. This model analyzes stained cell images to gather information about the state of that cell, such as its type, individual features, and whether it is under stress.
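The two-branch design described above can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in, not the published PUPS architecture: `embed_sequence` imitates a protein language model with fixed random per-residue embeddings, `encode_cell_images` imitates the image branch by projecting the three stained channels into per-pixel features, and the fusion step combines them into a per-pixel localization score map.

```python
import numpy as np

# Hypothetical sketch of PUPS-style sequence/image fusion.
# All names, shapes, and weights are illustrative assumptions.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def embed_sequence(seq: str, dim: int = 8) -> np.ndarray:
    """Toy stand-in for a protein language model: mean of fixed
    random per-residue embeddings (deterministic via seed)."""
    rng = np.random.default_rng(0)
    table = rng.standard_normal((len(AMINO_ACIDS), dim))
    idx = [AMINO_ACIDS.index(a) for a in seq if a in AMINO_ACIDS]
    return table[idx].mean(axis=0)

def encode_cell_images(stains: np.ndarray, dim: int = 8) -> np.ndarray:
    """Toy stand-in for the image branch: per-pixel features
    computed from three stained-channel images.
    Input shape (3, H, W) -> output shape (H, W, dim)."""
    rng = np.random.default_rng(1)
    proj = rng.standard_normal((3, dim))
    return np.tensordot(stains, proj, axes=([0], [0]))

def predict_localization(seq: str, stains: np.ndarray) -> np.ndarray:
    """Fuse sequence and image features into a per-pixel score map,
    mimicking the inpainting-style output the article describes."""
    p = embed_sequence(seq)          # (dim,)
    f = encode_cell_images(stains)   # (H, W, dim)
    logits = f @ p                   # (H, W)
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid -> scores in [0, 1]

# Usage: three 4x4 stained-channel images plus a short toy sequence.
stains = np.random.default_rng(2).random((3, 4, 4))
mask = predict_localization("MKTAYIAKQR", stains)
print(mask.shape)  # -> (4, 4)
```

The real model replaces both toy encoders with learned networks, but the data flow is the point: one vector summarizing the protein, one feature map summarizing the cell's state, fused into a per-pixel prediction for that specific cell.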
During training, PUPS is assigned a secondary task: explicitly naming the compartment of localization, such as the cell nucleus. This additional step enhances the model’s overall understanding of possible cell compartments and improves its ability to generalize across proteins and cell lines.
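This kind of auxiliary task is commonly implemented as a weighted multi-task loss. The sketch below is an illustrative assumption, not the paper's actual training objective: a per-pixel localization loss plus a small weighted cross-entropy term for explicitly naming the compartment.

```python
import numpy as np

# Illustrative multi-task loss for a PUPS-style secondary task.
# The weighting (aux_weight) and compartment list are hypothetical.

def bce(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Per-pixel binary cross-entropy for the localization map."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(pred)
                          + (1.0 - target) * np.log(1.0 - pred)))

def cross_entropy(logits: np.ndarray, label: int) -> float:
    """Cross-entropy for the auxiliary compartment-naming task."""
    z = logits - logits.max()                  # stabilized log-softmax
    log_probs = z - np.log(np.exp(z).sum())
    return float(-log_probs[label])

def multitask_loss(pred_map, true_map, compartment_logits,
                   compartment_label, aux_weight=0.1) -> float:
    """Total loss = localization loss + weighted auxiliary loss."""
    return (bce(pred_map, true_map)
            + aux_weight * cross_entropy(compartment_logits,
                                         compartment_label))

# Usage: a confident-but-imperfect map, with "nucleus" as class 0.
pred = np.full((4, 4), 0.8)
true = np.ones((4, 4))
logits = np.array([2.0, 0.1, -1.0])  # [nucleus, cytoplasm, membrane]
loss = multitask_loss(pred, true, logits, compartment_label=0)
print(round(loss, 3))
```

Training on the secondary term pushes the shared representation to encode which compartments exist at all, which is one common explanation for why such auxiliary tasks improve generalization across proteins and cell lines.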
The researchers validated PUPS’s accuracy through laboratory experiments, comparing predicted subcellular locations with experimentally observed ones. The model showed lower prediction error than baseline AI methods, underscoring its potential for large-scale protein-localization studies.
Future enhancements aim to enable PUPS to understand protein-protein interactions and make localization predictions for multiple proteins within a cell, ultimately extending its capabilities to living human tissue.



