Methods
This page briefly describes the methodology and steps followed in developing PlantPepDB and the further analyses conducted on the data.
Data Collection
Plant peptide data were collected from 12 different databases and various published articles. Where a database offered a download option, we used it; where it did not, we retrieved the data with our in-house Perl scripts. The collected data were then extensively curated by hand and converted to a tabular format so that they could be integrated with the database user interface.
Homology modelling
Out of 6172 peptide entries, 3199 peptides were modelled using MODELLER v9.21. Templates for modelling were downloaded from the PDB database. Basic modelling was performed using a single template for every peptide: the top hit in the BLAST search was downloaded from PDB and aligned with the query sequence. Five models were built for every peptide, and the model with the lowest DOPE score was taken as the final model.
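The selection step can be sketched in a few lines of Python. This is only an illustration, not the script used for PlantPepDB: it assumes each model is described by a dictionary with 'name', 'DOPE score' and 'failure' keys, which is how MODELLER's automodel reports its results, and the file names and scores below are made up.

```python
def pick_best_model(outputs):
    """Return the successfully built model with the lowest DOPE score."""
    # Keep only models that were built without an error.
    built = [m for m in outputs if m.get("failure") is None]
    if not built:
        raise ValueError("no model was built successfully")
    # A lower (more negative) DOPE score indicates a better model.
    return min(built, key=lambda m: m["DOPE score"])


# Hypothetical results for one peptide (five models, one of which failed).
example_outputs = [
    {"name": "pep.B99990001.pdb", "DOPE score": -1520.4, "failure": None},
    {"name": "pep.B99990002.pdb", "DOPE score": -1611.9, "failure": None},
    {"name": "pep.B99990003.pdb", "DOPE score": -1498.0, "failure": None},
    {"name": "pep.B99990004.pdb", "DOPE score": -1700.0, "failure": "error"},
    {"name": "pep.B99990005.pdb", "DOPE score": -1580.2, "failure": None},
]

best = pick_best_model(example_outputs)
print(best["name"], best["DOPE score"])
```

Failed models are filtered out before comparing scores, so a model that did not build can never be picked even if a score was recorded for it.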
De-novo modelling
A total of 1111 peptide sequences were modelled using a de novo approach, as they did not have any template with significant homology. For this, we used the I-TASSER suite v5.1 (standalone version). The process was run in parallel so that multiple cores could be used at a time.
$pkgdir/I-TASSERmod/runI-TASSER.pl -libdir /home/yourname/ITLIB -seqname example -datadir /home/yourname/I-TASSER5.1/seq.fasta -light true -traj false -runstyle gnuparallel
Physico-Chemical Property extraction
The physicochemical properties of all the peptides were calculated using the ProtParam tool. We used the offline method because of the large number of entries: ProtParam was imported from the SeqUtils module of Biopython.
>>> from Bio.SeqUtils.ProtParam import ProteinAnalysis
Further, an in-house Python script was used to run every analysis (amino acid count, amino acid percent, isoelectric point, molecular weight, GRAVY index, aromaticity, instability index, atomic composition, molar extinction coefficient and secondary structure fraction) on all the peptide entries.
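A minimal sketch of such a script is given here for illustration. It is not the in-house script itself: the function name summarize and the example sequence are hypothetical, and it covers only properties that Biopython's ProteinAnalysis class provides directly. Biopython must be installed.

```python
from Bio.SeqUtils.ProtParam import ProteinAnalysis


def summarize(sequence):
    """Collect ProtParam-style properties for one peptide sequence."""
    analysis = ProteinAnalysis(sequence)
    return {
        "counts": analysis.count_amino_acids(),          # residue -> count
        "molecular_weight": analysis.molecular_weight(),
        "isoelectric_point": analysis.isoelectric_point(),
        "gravy": analysis.gravy(),
        "aromaticity": analysis.aromaticity(),            # fraction of F, W, Y
        "instability_index": analysis.instability_index(),
        # (helix, turn, sheet) fractions
        "secondary_structure": analysis.secondary_structure_fraction(),
        # (with reduced cysteines, with disulfide bridges)
        "extinction": analysis.molar_extinction_coefficient(),
    }


# Hypothetical peptide containing each standard amino acid once.
props = summarize("ACDEFGHIKLMNPQRSTVWY")
for name, value in props.items():
    print(name, value)
```

For a whole dataset, the same function could be called once per sequence and the resulting dictionaries written out as rows of a table.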
However, some properties, such as the total number of positively and negatively charged residues and the aliphatic index, cannot be calculated using ProtParam in Biopython. The formulas for these are given on the Expasy Proteomics Server, and we used them to calculate the properties via in-house bash scripts. The atomic composition of the peptides was calculated by PROTEOMICS TOOLKIT.
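For reference, these formulas are simple enough to sketch. The version below is written in Python for readability and is not the bash implementation used for the database; the function names and the example sequence are hypothetical. It follows the ProtParam definitions: positively charged residues are Arg + Lys, negatively charged residues are Asp + Glu, and the aliphatic index is X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)), where X is the mole percent of each residue.

```python
def charged_residue_counts(sequence):
    """Return (positive, negative) residue counts: Arg + Lys and Asp + Glu."""
    seq = sequence.upper()
    positive = seq.count("R") + seq.count("K")
    negative = seq.count("D") + seq.count("E")
    return positive, negative


def aliphatic_index(sequence):
    """Aliphatic index computed from mole percents of Ala, Val, Ile and Leu."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(residue):
        return 100.0 * seq.count(residue) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Hypothetical 8-residue peptide: A, V, I and L each make up 12.5 mole percent.
peptide = "AVILKRDE"
print(charged_residue_counts(peptide))  # (2, 2)
print(aliphatic_index(peptide))         # 146.25
```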
DSSP State
The DSSP (Define Secondary Structure of Proteins) state information was calculated for all the peptide sequences that have a 3D modelled structure, using the API (Application Programming Interface) Perl program from the WHAT-IF Server.
perl pdb_to_dssp.pl input_pdb_structure.pdb output_dssp_file.dssp