TY - JOUR
T1 - Co-clustering analysis of protein secondary structures
AU - Ma, Lichun
AU - Wang, Debby D.
AU - Liu, Xinyu
AU - Zou, Bin
AU - Yan, Hong
N1 - Publisher Copyright: © 2017 Bentham Science Publishers.
PY - 2017/6/1
Y1 - 2017/6/1
N2 - Background: Protein secondary structure provides a crucial link between a protein sequence and its final 3D structure, so accurate prediction of secondary structure is very important. Objective: In this study, we aim to obtain a subset of highly regular features of protein secondary structures that can then be used to predict the secondary structures of other chains. Method: The experimental data were obtained from the Dictionary of Protein Secondary Structure (DSSP), in which eight types of secondary structure are defined. We carried out a statistical analysis of the amino acids for each type of secondary structure and then concentrated on the α-helix and β-strand, the two most common regular secondary structures. Features of amino acids, neighbors, and hydrogen bonds (for the α-helix) were extracted. A co-clustering based method was then applied to the α-helix and β-strand chain-feature matrices, respectively. Results and Conclusion: Using the features obtained from the co-clustering process, we are able to predict the structures of other chains. The prediction performs well for β-strands and long α-helices but poorly for short α-helices. We therefore further represented the features of each short α-helix as a vector and made the prediction by comparing the testing vector with the training vectors in the co-clusters. Results show that the testing accuracy for short α-helices can reach 96% when amino acid features are used as the vector. Therefore, the secondary structure of a protein sequence can be predicted with high accuracy using the co-clustering based method.
AB - Background: Protein secondary structure provides a crucial link between a protein sequence and its final 3D structure, so accurate prediction of secondary structure is very important. Objective: In this study, we aim to obtain a subset of highly regular features of protein secondary structures that can then be used to predict the secondary structures of other chains. Method: The experimental data were obtained from the Dictionary of Protein Secondary Structure (DSSP), in which eight types of secondary structure are defined. We carried out a statistical analysis of the amino acids for each type of secondary structure and then concentrated on the α-helix and β-strand, the two most common regular secondary structures. Features of amino acids, neighbors, and hydrogen bonds (for the α-helix) were extracted. A co-clustering based method was then applied to the α-helix and β-strand chain-feature matrices, respectively. Results and Conclusion: Using the features obtained from the co-clustering process, we are able to predict the structures of other chains. The prediction performs well for β-strands and long α-helices but poorly for short α-helices. We therefore further represented the features of each short α-helix as a vector and made the prediction by comparing the testing vector with the training vectors in the co-clusters. Results show that the testing accuracy for short α-helices can reach 96% when amino acid features are used as the vector. Therefore, the secondary structure of a protein sequence can be predicted with high accuracy using the co-clustering based method.
KW - Clustering
KW - Co-clustering
KW - Protein secondary structure
KW - α-helix
KW - β-strand
UR - http://www.scopus.com/inward/record.url?scp=85020897331&partnerID=8YFLogxK
U2 - 10.2174/1574893612666170111145319
DO - 10.2174/1574893612666170111145319
M3 - Article
AN - SCOPUS:85020897331
SN - 1574-8936
VL - 12
SP - 213
EP - 224
JO - Current Bioinformatics
JF - Current Bioinformatics
IS - 3
ER -