
AI Predicts Protein Location Within Human Cells with Unprecedented Accuracy
In a significant leap for cell biology, researchers at MIT, Harvard University, and the Broad Institute have developed a novel AI-driven method capable of predicting the precise location of virtually any protein within a human cell. This breakthrough, published in Nature Methods, promises to accelerate disease diagnosis, drug discovery, and our fundamental understanding of cellular processes.
The challenge lies in the sheer complexity of the human proteome: a single cell contains approximately 70,000 different proteins and protein variants. Manually identifying where each one resides, a crucial determinant of its function and of its potential role in diseases like Alzheimer's, cystic fibrosis, and cancer, is an arduous and resource-intensive task.
Existing computational techniques, often relying on machine-learning models trained on datasets like the Human Protein Atlas, have made inroads. However, even the largest datasets have only scratched the surface of possible protein-cell line combinations.
The new approach overcomes these limitations with a two-part method called PUPS (prediction of unseen proteins’ subcellular location). PUPS combines a protein language model, which captures the localization-determining properties of a protein based on its amino acid sequence and 3D structure, with an image inpainting model. The image inpainting model analyzes stained cell images to gather information about the cell’s state, including its type and individual features.
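The paper itself does not publish a reference implementation, but the two-part design described above, a sequence-derived protein embedding fused with features from a stained cell image to produce a per-pixel localization map, can be illustrated with a deliberately simplified sketch. Everything here (the toy embeddings, the random projections, the 16x16 image size) is a hypothetical stand-in, not the actual PUPS architecture:

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def embed_sequence(seq: str, dim: int = 16) -> np.ndarray:
    """Toy stand-in for the protein language model: amino-acid counts
    pushed through a fixed random projection. The real model learns a
    far richer representation from sequence (and structure) data."""
    counts = np.zeros(len(AMINO_ACIDS))
    for aa in seq:
        idx = AMINO_ACIDS.find(aa)
        if idx >= 0:
            counts[idx] += 1
    projection = np.random.default_rng(0).normal(size=(len(AMINO_ACIDS), dim))
    return counts @ projection

def embed_cell_image(image: np.ndarray, dim: int = 16) -> np.ndarray:
    """Toy stand-in for the image model: 4x4 average pooling of the
    stained-cell image, capturing a coarse summary of the cell's state."""
    pooled = image.reshape(4, image.shape[0] // 4,
                           4, image.shape[1] // 4).mean(axis=(1, 3))
    projection = np.random.default_rng(1).normal(size=(pooled.size, dim))
    return pooled.ravel() @ projection

def predict_localization(seq: str, image: np.ndarray) -> np.ndarray:
    """Fuse both embeddings and decode a per-pixel score map, mimicking
    the 'highlighted image' output the article describes."""
    fused = np.concatenate([embed_sequence(seq), embed_cell_image(image)])
    decoder = np.random.default_rng(2).normal(size=(fused.size, image.size))
    return (fused @ decoder).reshape(image.shape)

# Hypothetical usage: any sequence plus any 16x16 stained-cell image
# yields a heatmap over that same image.
heatmap = predict_localization("MKTAYIAKQR",
                               np.random.default_rng(3).random((16, 16)))
```

The point of the sketch is the data flow: because the protein enters only as a sequence embedding, the model can, in principle, be queried for proteins it has never seen stained, which is what distinguishes PUPS from stain-dependent approaches.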
Unlike many AI methods that provide an averaged estimate across all cells of a specific type, PUPS offers single-cell localization. This level of precision allows researchers to pinpoint a protein’s location within a specific cell, such as a cancer cell after treatment.
The output is a highlighted image of a cell, clearly indicating the model’s predicted location of the protein. According to Yitong Tseo, a graduate student at MIT and co-lead author of the paper, this technique can potentially save researchers months of effort by acting as an initial screening tool, guiding experimental testing.
The researchers trained PUPS with a secondary task in which the model explicitly names the compartment where a protein localizes (e.g., the cell nucleus), sharpening its understanding of cellular compartments. By training on proteins and cell lines simultaneously, PUPS develops a deeper understanding of protein localization within cell images. The model can even discern how different parts of a protein's sequence contribute to its overall localization.
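A common way to realize a secondary naming task like this is a multi-task objective: the main reconstruction loss on the predicted localization image plus a cross-entropy term for classifying the compartment. The sketch below shows that pattern in minimal form; the compartment list, the mean-squared reconstruction term, and the weighting hyperparameter `alpha` are all illustrative assumptions, not details from the paper:

```python
import numpy as np

# Hypothetical compartment vocabulary for the secondary naming task.
COMPARTMENTS = ["nucleus", "cytoplasm", "mitochondria", "membrane"]

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())
    return e / e.sum()

def joint_loss(pred_image: np.ndarray, true_image: np.ndarray,
               compartment_logits: np.ndarray, true_compartment: str,
               alpha: float = 0.5) -> float:
    """Combined objective sketch: reconstruct the localization image
    (main task) and explicitly name the compartment (secondary task).
    alpha is an assumed weighting between the two terms."""
    recon = np.mean((pred_image - true_image) ** 2)
    probs = softmax(compartment_logits)
    cross_entropy = -np.log(probs[COMPARTMENTS.index(true_compartment)])
    return float(recon + alpha * cross_entropy)
```

Training against both terms at once forces the shared representation to encode not just pixel patterns but compartment identity, which is the intuition behind the reported gain in the model's understanding of cellular compartments.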
Xinyi Zhang, a graduate student at MIT and the Broad Institute, emphasizes the uniqueness of their approach: “Most other methods usually require you to have a stain of the protein first, so you’ve already seen it in your training data. Our approach is unique in that it can generalize across proteins and cell lines at the same time.”
The researchers validated PUPS's predictions in lab experiments, comparing its output against a baseline AI method and demonstrating lower prediction error. Future efforts will focus on extending PUPS to understand protein-protein interactions and to make predictions in living human tissue.
