
AI-Powered Prediction: Researchers Pinpoint Protein Location Within Human Cells with Unprecedented Accuracy
In a significant leap for cell biology, researchers from MIT, Harvard University, and the Broad Institute have developed an AI-driven computational method capable of predicting the location of virtually any protein within a human cell. This innovation promises to accelerate disease diagnosis, drug target identification, and our fundamental understanding of cellular processes.
The breakthrough addresses a critical challenge in proteomics: the sheer number and variability of proteins within a cell. With approximately 70,000 different proteins and protein variants present, manually identifying their locations is an expensive and time-consuming endeavor. Mislocalized proteins can contribute to diseases like Alzheimer’s, cystic fibrosis, and cancer, making accurate localization crucial.
Existing computational techniques leverage machine learning models trained on large datasets, such as the Human Protein Atlas, which catalogs the subcellular behavior of over 13,000 proteins across more than 40 cell lines. However, even this extensive atlas only covers a fraction of the possible protein-cell line combinations.
The new approach overcomes these limitations by accurately predicting protein location in any human cell line, even for proteins and cells not previously tested. Furthermore, it achieves single-cell resolution, allowing for precise localization within individual cells rather than providing an averaged estimate across a cell type. This level of detail could be invaluable for understanding protein behavior in specific cancer cells following treatment.
The method, termed PUPS (Prediction of Unseen Proteins’ Subcellular location), combines a protein language model with a computer vision model. The protein language model captures localization-determining properties based on the protein’s amino acid sequence and 3D structure. The computer vision model, an image inpainting model, analyzes stained cell images to gather information about the cell’s type, features, and condition.
By integrating the representations from both models, PUPS can predict protein location and generate an image highlighting the predicted area within a single cell. Users input the protein’s amino acid sequence and three cell stain images (nucleus, microtubules, and endoplasmic reticulum), and PUPS automates the localization prediction.
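The fusion step described above can be sketched in code. This is a toy illustration, not the authors' actual implementation: the function names, embedding dimensions, and the random "decoder" are all assumptions standing in for the real protein language model and image-inpainting network. It only shows the data flow PUPS uses — a sequence-level embedding and a cell-image embedding combined into a per-pixel localization map.

```python
# Hypothetical sketch of a PUPS-style fusion step; all names and
# dimensions here are illustrative assumptions, not the authors' API.
import numpy as np

rng = np.random.default_rng(0)

def embed_sequence(seq: str, dim: int = 64) -> np.ndarray:
    """Stand-in for the protein language model: one random but
    deterministic vector per amino acid, mean-pooled over the sequence."""
    table = {aa: rng.standard_normal(dim) for aa in sorted(set(seq))}
    return np.mean([table[aa] for aa in seq], axis=0)

def embed_cell_images(stains: np.ndarray, dim: int = 64) -> np.ndarray:
    """Stand-in for the image encoder: `stains` has shape (3, H, W) for
    the nucleus, microtubule, and endoplasmic reticulum channels."""
    pooled = stains.reshape(3, -1).mean(axis=1)      # (3,) channel means
    projection = rng.standard_normal((3, dim))       # toy projection
    return pooled @ projection

def predict_localization(seq: str, stains: np.ndarray) -> np.ndarray:
    """Fuse the two representations and emit a per-pixel probability
    map over the input image grid (values in (0, 1))."""
    fused = embed_sequence(seq) + embed_cell_images(stains)
    h, w = stains.shape[1:]
    logits = rng.standard_normal((h, w)) + fused.mean()  # toy decoder
    return 1.0 / (1.0 + np.exp(-logits))                 # sigmoid

stains = rng.random((3, 16, 16))          # three stain channels, 16x16 px
mask = predict_localization("MKTAYIAKQR", stains)
print(mask.shape)                          # one probability per pixel
```

The key design point the sketch mirrors is that the protein and the cell are embedded separately, so either input can be swapped for a protein or cell line never seen during training.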
“You could do these protein-localization experiments on a computer without having to touch any lab bench, hopefully saving yourself months of effort. While you would still need to verify the prediction, this technique could act like an initial screening of what to test for experimentally,” says Yitong Tseo, a graduate student in MIT’s Computational and Systems Biology program and co-lead author of the study published in Nature Methods.
The researchers trained PUPS with a secondary task – explicitly naming the compartment of localization – to improve its understanding of cell compartments. Training on both proteins and cell lines simultaneously further enhanced its ability to understand protein localization patterns.
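The two-signal training described above can be illustrated as a combined loss: the main per-pixel localization loss plus an auxiliary "name the compartment" classification loss. The loss functions, the weighting term, and the variable names below are assumptions for illustration, not the paper's exact objective.

```python
# Toy illustration of a two-task training signal: a per-pixel
# localization loss plus an auxiliary compartment-classification loss.
# The 0.1 weighting is an assumed value, not from the paper.
import numpy as np

def bce(pred: np.ndarray, target: np.ndarray) -> float:
    """Binary cross-entropy over a predicted probability map."""
    eps = 1e-7
    p = np.clip(pred, eps, 1 - eps)
    return float(-np.mean(target * np.log(p) + (1 - target) * np.log(1 - p)))

def cross_entropy(logits: np.ndarray, label: int) -> float:
    """Cross-entropy for the auxiliary compartment-naming head."""
    z = logits - logits.max()                 # stabilize the softmax
    log_probs = z - np.log(np.exp(z).sum())
    return float(-log_probs[label])

def combined_loss(pred_map, true_map, comp_logits, comp_label,
                  aux_weight=0.1):
    # Total = localization loss + weighted auxiliary compartment loss.
    return bce(pred_map, true_map) + aux_weight * cross_entropy(
        comp_logits, comp_label)

rng = np.random.default_rng(1)
pred = rng.random((8, 8))                           # predicted map
true = (rng.random((8, 8)) > 0.5).astype(float)     # ground-truth mask
logits = rng.standard_normal(10)                    # e.g. 10 compartments
loss = combined_loss(pred, true, logits, comp_label=3)
print(loss > 0)  # True: both loss terms are non-negative
```

The auxiliary head forces the shared representation to encode which compartment a protein belongs to, rather than only reproducing pixels, which is the mechanism the researchers credit for the improved understanding of cell compartments.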
The team validated PUPS in lab experiments, comparing its predictions against actual protein locations in cell lines the model had never seen during training. PUPS showed lower prediction error than baseline AI methods.
Future research will focus on enhancing PUPS to understand protein-protein interactions and predict the location of multiple proteins within a cell. The ultimate goal is to enable PUPS to make predictions in living human tissue rather than cultured cells.