Yingying Xu (TALK)
Knowledge of a protein's subcellular location is important for understanding its functions and activities in cells. Approximately 30% of human proteins are multi-label proteins that localize to more than one subcellular organelle, and automatically predicting the distribution patterns of these proteins is a long-standing challenge. Compared with the protein sequence data used in conventional subcellular location studies, bioimages can capture the complex distributions of proteins as well as their variations across cell types and states, so in recent years some studies have begun to analyze subcellular locations from biological images.
In our study, immunohistochemistry images from the Human Protein Atlas database were used as the data source, and an image-based multi-label protein subcellular location predictor, iLocator, was established. To build this predictor, we studied the ability of global and local image features to describe subcellular patterns, and proposed a chained multi-label classification algorithm and a dynamic threshold criterion to distinguish multiple subcellular classes. In addition, we used unsupervised topic models to learn the subcellular locations of the proteins, and quantified the fractions of each protein distributed across different subcellular structures. As an application, the predictor was used to screen for potential biomarker proteins that show significant translocation between normal and cancer tissues. Many of the screened proteins are supported by experimental evidence in the literature, demonstrating that iLocator is effective in recognizing both protein subcellular locations and location variations.
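
To make the idea of a dynamic threshold concrete, the following is a minimal sketch of one possible criterion, not the rule used in iLocator, whose exact form is not given here. It assumes that a classifier has already produced one confidence score per subcellular class, and it keeps every class whose score lies within a fixed margin of the best score, so the cutoff moves with the top score instead of being a fixed value. The function name `select_labels`, the `margin` parameter, and the example scores are all hypothetical.

```python
def select_labels(scores, margin=0.3):
    """Pick the labels whose score is within `margin` of the top score.

    `scores` maps a class name to a classifier confidence. The cutoff is
    relative to the best score (an assumed form of dynamic threshold),
    so at least one label is returned for any non-empty input.
    """
    if not scores:
        return []
    top = max(scores.values())
    kept = [label for label, s in scores.items() if top - s <= margin]
    # Report the most confident label first.
    return sorted(kept, key=lambda label: -scores[label])


# Hypothetical scores for one protein image.
example = {"nucleus": 0.9, "cytoplasm": 0.7, "mitochondria": 0.2}
print(select_labels(example))              # two labels pass the cutoff
print(select_labels(example, margin=0.1))  # only the top label passes
```

With a relative cutoff of this kind, a protein with one dominant score receives a single label, while a protein with several comparably high scores receives multiple labels.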