About the XtalPred-RF server

About the XtalPred server

Job submission

Input sequence

Paste your sequence or sequences in FASTA format. At most 10 sequences can be submitted as a single job. The XtalPred server accepts sequences between 50 and 1000 residues long (there is not enough data to train the method on sequences outside this range).
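The limits above can be checked locally before submitting a job. The following is a minimal sketch, not the server's own validation code: the function names are hypothetical, and it assumes that the 50 to 1000 residue range is inclusive at both ends.

```python
MAX_SEQUENCES = 10   # at most 10 sequences per job
MIN_LENGTH = 50      # assumed inclusive lower bound
MAX_LENGTH = 1000    # assumed inclusive upper bound


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)
        else:
            raise ValueError("sequence data found before the first '>' header")
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_job(text):
    """Return a list of problems; an empty list means the job fits the limits."""
    records = parse_fasta(text)
    problems = []
    if not records:
        problems.append("no sequences found")
    if len(records) > MAX_SEQUENCES:
        problems.append(
            "%d sequences submitted, limit is %d" % (len(records), MAX_SEQUENCES)
        )
    for header, seq in records:
        if not MIN_LENGTH <= len(seq) <= MAX_LENGTH:
            problems.append(
                "%s: length %d is outside %d-%d"
                % (header, len(seq), MIN_LENGTH, MAX_LENGTH)
            )
    return problems
```

Calling check_job on the pasted text returns one message per violated limit, so a job can be split or trimmed before it is submitted.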

Optional features

Select the checkbox "Also submit this job to the SERp server for analysis" to run a Surface Entropy Reduction prediction on the SERp server.

Select the checkbox "Find close bacterial homologs more likely to crystallize" to get a list of the target's homologs in complete microbial genomes, together with their crystallizability classes. The list also contains links to detailed information about each homolog. XtalPred currently uses pre-calculated results for 487 complete microbial genomes (1,549,504 protein sequences). Selecting this feature increases the calculation time.

You can provide your email address at the time of submission or later. Once the job is finished, the link to the results will be sent to this address.

Server policy

The XtalPred web server is available only to non-profit and academic users. Other users are asked to contact lukaszj@medsch.ucr.edu
(XtalPred uses several external programs that are freely available only to non-profit and academic users; see the External software section).

Retrieve the old results

Paste the job ID provided after job submission, for example: webdb/1182365731.4638
The results are kept on the server for 7 days.

Crystallization prediction

Statistics of protein features

XtalPred compares predicted biochemical and biophysical features of the submitted protein with corresponding distributions of crystallization probability calculated from TargetDB. The plots show distributions of crystallization failures and successes in the learning sets extracted from TargetDB, the corresponding distributions of crystallization probability, and indicate values of these features calculated or predicted for the submitted protein (see Figure 1).

Figure 1. Comparison of features of the submitted protein with distributions of crystallization probability calculated from TargetDB.

Crystallization class by Expert Pool method

The prediction is made by combining the individual crystallization probabilities calculated for eight protein features into a single crystallization score. Based on this score, the protein is assigned to one of five crystallization classes. Each class represents a different crystallization success rate observed in TargetDB (see Figure 2). The server shows the relative crystallization probability for each of these classes and the position of the submitted protein in this distribution (see Figure 3). The features used are: length, isoelectric point, GRAVY index, predicted structural disorder, instability index, predicted coil secondary structure, predicted coiled-coil structure, and insertion score. For details, see references 1-3 below.

Figure 2. Distribution of targets into crystallization classes and observed successes and failures in protein crystallization.

Figure 3. Example of crystallization classification.
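Two of the eight features used by the Expert Pool method, length and GRAVY index, can be computed directly from the sequence. The sketch below is an illustration only. It assumes that the GRAVY index is the grand average of hydropathy on the standard Kyte-Doolittle scale; the function names are hypothetical, and this page does not describe how XtalPred itself computes these values or how it combines them into the crystallization score.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError("non-standard residues: %s" % "".join(sorted(unknown)))
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def simple_features(sequence):
    """Return the two features that need no external predictor."""
    return {"length": len(sequence), "gravy": gravy(sequence)}
```

The remaining features (for example predicted disorder or coiled-coil content) depend on the external programs listed in the References section and are not reproduced here.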

Crystallization classification by Random Forest Classifier

In XtalPred-RF, the list of protein features was extended with predicted surface ruggedness, hydrophobicity, side-chain entropy of surface residues, and amino-acid composition of the predicted protein surface. A Random Forest classifier was then used to predict the protein crystallizability class. This resulted in an almost two-fold improvement in the prediction of crystallization success compared to the original XtalPred version from 2007. For details, see reference 4 below.

Training and testing sets used in the development of XtalPred-RF

Lists of TargetTrack IDs of PSI targets used as training and testing sets for XtalPred-RF and their sequences are available at: http://ffas.godziklab.org/XtalPred/data.tar

File names of the lists of positive and negative cases in training and testing sets:

learn.pos.txt - training set, positive cases (2265 solved structures)
learn.neg.txt - training set, negative cases (2355 targets which failed to crystallize)

test.pos.txt - testing set, positive cases (2445 solved structures)
test.neg.txt - testing set, negative cases (2440 targets which failed to crystallize)

fasta.txt - sequences of all targets from the training and testing sets.
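Once the archive has been downloaded and unpacked, the four ID lists can be counted to confirm that they match the set sizes given above. This is a minimal sketch under two assumptions that this page does not confirm: that each list contains one TargetTrack ID per line, and that the files sit directly in the directory passed to the function. The function names are hypothetical.

```python
from pathlib import Path

# File names as listed above, keyed by (subset, label).
ID_FILES = {
    ("training", "positive"): "learn.pos.txt",
    ("training", "negative"): "learn.neg.txt",
    ("testing", "positive"): "test.pos.txt",
    ("testing", "negative"): "test.neg.txt",
}


def read_ids(path):
    """Read one ID per non-empty line (assumed file layout)."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def count_ids(data_dir):
    """Return {(subset, label): number of IDs} for the four lists."""
    data_dir = Path(data_dir)
    return {
        key: len(read_ids(data_dir / name))
        for key, name in ID_FILES.items()
    }
```

If the assumptions hold, the counts should agree with the numbers of solved structures and failed targets quoted for each file.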

Notice:

XtalPred crystallizability classification is based on statistics for non-secreted wild-type microbial proteins and is optimized for identifying the most promising crystallization targets in large protein families. XtalPred is also helpful in construct design, although the crystallizability class by itself is usually not a sufficient criterion for finding precise construct boundaries.

References

Publications

The XtalPred server
1. Slabinski L, Jaroszewski L, Rychlewski L, Wilson IA, Lesley SA, Godzik A. XtalPred: a web server for prediction of protein crystallizability. Bioinformatics, 2007 23(24):3403-5.

The method
2. Slabinski L, Jaroszewski L, Rodrigues APC, Rychlewski L, Wilson IA, Lesley SA, Godzik A. The challenge of protein structure determination - lessons from structural genomics. Protein Science, 2007 16(11):2472-82.

3. Jaroszewski L, Slabinski L, Wooley J, Deacon AM, Lesley SA, Wilson IA, Godzik A. Genome pool strategy for structural coverage of protein families. Structure, 2008 16(11):1659-67.

4. Jahandideh S, Jaroszewski L, Godzik A. Improving the chances of successful protein structure determination with a random forest classifier. Acta Crystallogr D Biol Crystallogr, 2014 70(Pt 3):627-35.

External software

The XtalPred web server uses several programs (many of them are freely available only for non-profit and academic users) for calculation and prediction of protein features:

PSI-BLAST - homology searches. Ref.: Altschul, SF, W Gish, W Miller, EW Myers, and DJ Lipman. Basic local alignment search tool. J Mol Biol, 1990 215(3):403-10.

PSIPRED - secondary structure prediction. Ref.: Jones, D.T. Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol, 1999 292: 195-202.

DISOPRED2 - prediction of structurally disordered regions. Ref.: Ward, J.J., Sodhi, J.S., McGuffin, L.J., Buxton, B.F., and Jones, D.T. Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. J Mol Biol, 2004 337: 635-645.

COILS - prediction of coiled-coil regions. Ref.: Lupas, A., Van Dyke, M., and Stock, J. 1991. Predicting coiled coils from protein sequences. Science 252: 1162-1164.

TMHMM - prediction of transmembrane helices. Ref.: Krogh, A., Larsson, B., von Heijne, G., and Sonnhammer, E.L. 2001. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol 305: 567-580.

SEG - calculation of low-complexity regions. Ref.: Wootton, J.C. 1994. Non-globular domains in protein sequences: automated segmentation using complexity measures. Comput Chem 18: 269-285.

RPSP - prediction of signal peptides. Ref.: Plewczynski, D., Slabinski, L., Tkacz, A., Kajan, L., Holm, L., Ginalski, K., Rychlewski, L. The RPSP: web server for prediction of signal peptides. 2007. Polymer 48: 5493-5496.

NetSurfP - prediction of protein surface accessibility. Ref.: Petersen, B., Petersen, T.N., Andersen, P., Nielsen, M., and Lundegaard, C. 2009. A generic method for assignment of reliability scores applied to solvent accessibility predictions. BMC Structural Biology 9: 51.


This server is supported by the NIH Protein Structure Initiative grants: P20 GM076221 and U54 GM074898 (JCSG).

Last update May 15, 2015