Abstract
This thesis aims to improve the accuracy of protein structure prediction through data purification. A Protein Attributes Microtuning System (PAMS) is developed to prepare a variety of new datasets as and when required. Furthermore, a Protein Structural Accuracy Reckoner (PSAR) framework is used to recommend procedures that might lead to high prediction accuracy. Using the PSAR, it is shown that a refined dataset generated by the PAMS, combined with an appropriate window mechanism, improves the accuracy of protein structure prediction by 12%, giving a best accuracy of 90.97%. On average, almost all classifiers applied in the experiments show accuracy increases of 10%-15%. A list of classifiers is categorized according to their prediction performance and classification efficiency. A few refined datasets are proposed as benchmark datasets. In addition, examination of a total of 3,135,393 prediction tasks, carried out by the PSAR framework, yielded 139 'best' and 73 'worst' combinations of amino acid feature descriptors. In this analysis, the 'best' prediction gave 82.34%, and the 'worst' prediction gave 73.65%. To achieve greater computational capacity, the PSAR infrastructure is hosted on the Condor platform in the Department of Computing, University of Surrey. (Abstract shortened by ProQuest.)