In silico Protein Structure Prediction and Molecular Docking Analysis through Bioinformatics

Protein Structure Prediction and Molecular Docking Analysis through Bioinformatics

This article will be helpful for students, new learners in the field of bioinformatics, and anyone interested in exploring in silico approaches for protein structure prediction and molecular docking analysis.

2.1 Tools and Software Used for Protein Structure Prediction and Docking Analysis

2.1.1 UniProt Knowledgebase

The Universal Protein Resource (UniProt) acts as a central hub for protein information. The aim of UniProt is to establish and maintain stable, high-quality protein databases with efficient access strategies that facilitate protein information retrieval across multiple databases holding complementary information (Cathy et al., 2006).

2.1.2 BLAST (Basic Local Alignment Search Tool)

BLAST uses a heuristic algorithm: it relies on shortcuts to execute a quick search rather than comparing every possible alignment. BLAST performs local alignment. Many different types of BLAST analysis are available for different sequence comparisons, e.g., a DNA query against a DNA database, a protein query against a protein database, and a DNA query translated in all six reading frames against a protein sequence database (Thomas Madden, 2013).
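
The kind of shortcut a heuristic search takes can be illustrated with a toy word-seeding step. The sketch below is not BLAST's actual implementation; it only assumes that a search can start from short identical words shared by the query and the subject instead of aligning every position against every other. The function name `find_seeds` and the two example sequences are hypothetical.

```python
from collections import defaultdict


def find_seeds(query, subject, word_size=3):
    """Return (query_pos, subject_pos) pairs where a word of length
    word_size occurs identically in both sequences."""
    # Index every word of the subject by its start position.
    index = defaultdict(list)
    for j in range(len(subject) - word_size + 1):
        index[subject[j:j + word_size]].append(j)

    # Look up each query word in the index instead of comparing all pairs.
    seeds = []
    for i in range(len(query) - word_size + 1):
        for j in index.get(query[i:i + word_size], []):
            seeds.append((i, j))
    return seeds


# Toy sequences, chosen only to show the idea.
print(find_seeds("ACGTAC", "TTACGTT"))
```

In a real search, each seed would then be extended in both directions and scored; that extension step is left out here to keep the sketch short.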

2.1.3 MODELLER 9.14

MODELLER is a computer program based on the homology modeling approach that generates models of protein tertiary as well as quaternary structures. The input includes an alignment file of the sequence to be modeled with the template sequence, the atomic coordinates of the templates, and a simple script file. MODELLER then automatically calculates a model containing all non-hydrogen atoms. MODELLER uses a scripting language and does not have a graphical interface. It is written in standard FORTRAN 90 and runs on UNIX, Windows, or Mac operating systems (Narayanan et al., 2008).

2.1.4 I-TASSER

The I-TASSER web server produces automated full-length protein structure models. The I-TASSER Suite is a package of standalone computer programs developed for high-resolution protein structure prediction, refinement, and structure-based function annotation. The output of the server consists of five models, the confidence score, the estimated TM-score and RMSD, and the standard deviation of the estimations (Yang Zhang, 2008).
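
To make the two quality measures concrete, the sketch below computes RMSD and TM-score from a list of per-residue distances (in Å) between a model and a reference structure. It assumes the two structures are already superimposed and aligned residue by residue; the optimal superposition search that real TM-score programs perform is not included. The function names and the 0.5 Å floor on the distance scale d0 are assumptions made here to keep the formula defined for short chains.

```python
import math


def rmsd(distances):
    """Root-mean-square deviation over paired residue distances."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def tm_score(distances, target_length):
    """TM-score for a fixed superposition.

    d0 is the length-dependent distance scale; each aligned residue
    contributes 1 / (1 + (d / d0)^2), normalized by the target length.
    """
    if target_length > 15:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5
    d0 = max(d0, 0.5)  # assumed floor, see the text above
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


# Hypothetical distances for a 100-residue target.
example = [1.0] * 80 + [6.0] * 20
print(round(rmsd(example), 2), round(tm_score(example, 100), 2))
```

Because every residue contributes at most 1 to the sum, the TM-score stays between 0 and 1, whereas RMSD grows without bound when a few residues are badly placed.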

2.1.5 M4T

M4T (Multiple Mapping Method with Multiple Templates) is a fully automated comparative protein structure modeling server. It performs comparative modeling using a combination of multiple templates and iterative optimization of alternative alignments (Narcis et al., 2007).

2.1.6 RaptorX

RaptorX is a protein structure prediction server that consists of three components: single-template threading, alignment quality prediction, and multiple-template threading. RaptorX processes an input sequence by predicting its secondary and tertiary structures as well as solvent accessibility and disordered regions. Alongside its multiple-template threading component, RaptorX has a new module for alignment quality prediction (Jian Peng and Jinbo Xu, 2011).

2.1.7 IntFOLD2

The IntFOLD server integrates various cutting-edge methods for predicting structure and function from a given sequence. The output from the server is presented as a simple table that summarizes the overall results graphically via plots and annotated 3D models (Daniel et al., 2011).

2.1.8 Phyre2

Phyre2 (Protein Homology/analogY Recognition Engine) is a suite of tools to predict and analyze protein structure and function. The aim of Phyre2 is to provide biologists with a simple and intuitive interface (Kelley et al., 2015).

2.1.9 Errat

Errat is a knowledge-based program for the verification of protein structures determined by crystallography. It uses statistical information from real proteins: Errat evaluates the statistics of non-bonded atom-atom interactions (nitrogen, carbon and oxygen) in the reported structure (Björn and Arne, 2006).

2.1.10 Ramachandran Plot

A Ramachandran plot shows the phi (Φ) and psi (Ψ) backbone dihedral angles, also known as Ramachandran angles, of the residues in a protein structure. Ramachandran plots are used to analyze the stereochemical quality of protein three-dimensional structures. The regions of the plot correspond to combinations of phi and psi values (Oliviero and Kristina, 2013).
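
Each point on the plot comes from two torsion angles computed from backbone atom coordinates. The sketch below shows one common way to compute a signed torsion angle from four points in plain Python; it is an illustration, not the code used by any particular validation server. For residue i, phi uses the atoms C(i-1), N(i), CA(i), C(i) and psi uses N(i), CA(i), C(i), N(i+1). The coordinates in the example call are made up.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond p1 -> p2.
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Parts of the outer bonds that are perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates: the fourth point is rotated a quarter turn
# around the central bond relative to the first.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Computing phi and psi for every residue and plotting one against the other gives the scatter that validation tools compare with the favored and allowed regions.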

2.1.11 ProtParam (ExPASy)

The ExPASy ProtParam tool computes various physical and chemical parameters for a given protein. The computed parameters include the molecular weight, theoretical isoelectric point (pI), amino acid composition, atomic composition, extinction coefficient, estimated half-life, instability index, aliphatic index, and grand average of hydropathicity (John et al., 2005).

2.1.12 AutoDock Tools

AutoDock is a suite of automated docking tools. Both AutoDock4 and AutoDock Vina are suitable for docking small molecules with few rotatable bonds. AutoDock Vina executes faster and is able to rank larger molecules more accurately, so researchers should consider it first for virtual screening (Max et al., 2010).
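
A docking run with AutoDock Vina needs a search box around the expected binding site. The sketch below derives a box from a set of binding-site atom coordinates and writes it out using Vina's configuration keys (`receptor`, `ligand`, `center_x`, `size_x`, `out`, and so on). The helper names, the 5 Å padding, the file names, and the coordinates are all hypothetical and only illustrate the layout of such a file.

```python
def docking_box(coords, padding=5.0):
    """Center and edge lengths of a box enclosing coords plus padding."""
    xs, ys, zs = zip(*coords)
    center = tuple((max(v) + min(v)) / 2.0 for v in (xs, ys, zs))
    size = tuple((max(v) - min(v)) + 2.0 * padding for v in (xs, ys, zs))
    return center, size


def vina_config(receptor, ligand, center, size, out="docked.pdbqt"):
    """Format a Vina configuration file as text."""
    lines = [f"receptor = {receptor}", f"ligand = {ligand}"]
    for axis, value in zip("xyz", center):
        lines.append(f"center_{axis} = {value:.3f}")
    for axis, value in zip("xyz", size):
        lines.append(f"size_{axis} = {value:.3f}")
    lines.append(f"out = {out}")
    return "\n".join(lines) + "\n"


# Made-up coordinates standing in for atoms of a binding pocket.
pocket = [(0.0, 0.0, 0.0), (10.0, 4.0, 2.0), (6.0, 1.0, 1.0)]
center, size = docking_box(pocket)
print(vina_config("receptor.pdbqt", "ligand.pdbqt", center, size))
```

The resulting text can be saved to a file and passed to Vina with its `--config` option; a larger padding widens the search at the cost of a longer run.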

2.1.13 UCSF Chimera

Chimera is a freely available molecular graphics package with a wide range of functions, used for the interactive visualization and analysis of molecular structures (Thomas et al., 2005). Chimera also supports the visualization of known structures by taking advantage of internet-based services (Eric et al., 2004).

2.1.14 LIGPLOT

LIGPLOT is a desktop program that automatically generates schematic 2-D representations of protein-ligand complexes. The output is a PostScript file that contains an informative representation of the intermolecular interactions. Moreover, interactions between proteins and nucleic acids can also be visualized with LIGPLOT (Andrew et al., 1995).

2.1.15 ChemDraw Ultra

ChemDraw Ultra is used for drawing ligands and for energy minimization of ligand molecules. This software specializes in processing, sorting, and editing chemical structure information (Zhenjiang et al., 2004).

2.1.16 GRAMM-X

GRAMM-X is a protein-protein docking web server that provides a simple interface to its users. GRAMM-X is implemented in C++ and Python and is freely available (Andrey and Ilya, 2006).

2.1.17 ClusPro

ClusPro is a fully automated, web-based program for computational protein-protein docking. As input, the user can upload the PDB files of the two protein structures or give the PDB code of each structure. The output is a list of complexes ranked according to their clustering properties (Stephen et al., 2004).

2.1.18 PyMOL

PyMOL is a molecular graphics system. It supports most of the common representations for macromolecular structures, including ball-and-stick, dot surfaces, solid surfaces, ribbons, and cartoon ribbons (Warren DeLano, 2002).

2.2 Proposed Methodology for the Selected Gene NPR1 in Plants

The methods were designed to obtain information about the structure of NPR1. They include homology modeling of NPR1, protein-ligand docking, and protein-protein docking studies.

The amino acid sequence of NPR1 (593 residues; accession number P93002) was retrieved from UniProtKB in FASTA format. The structure of NPR1 was not reported in the Protein Data Bank (PDB). The critical first step in homology modeling is the identification of the best template structure, if indeed any is available. The sequence of NPR1 was subjected to protein BLAST against the PDB. The most suitable template for NPR1 was 4RLV, with 25% query coverage and 36% identity.

In structural bioinformatics, errors in protein structure prediction are considered a critical issue: numerous methods are available for in silico structure prediction, but none of them is considered error free. The homology modeling approach was used for 3D structure prediction. Homology models were built manually with MODELLER 9.14, and further models were built by different web servers (I-TASSER, M4T, IntFOLD2, RaptorX, and Phyre2). All predicted models were submitted to validation tools (Errat and Rampage), which report the quality and accuracy of predicted structures on the basis of different parameters. These parameters (quality factor, favored regions, allowed regions, and outliers) were collected in an Excel sheet, and a graph generated from this sheet was used to select the best model among all predicted structures as the final model. The final model was visualized in UCSF Chimera and then subjected to energy minimization with Amber force field parameters.

After energy minimization, the final structure was subjected to protein-ligand docking analysis. Salicylic acid (SA), 2,6-dichloroisonicotinic acid (INA), and benzo-(1,2,3)-thiadiazole-7-carbothioic acid S-methyl ester (ASM) were used as ligand molecules. The three ligand molecules were subjected to energy minimization in Chem3D Ultra. Docking analyses were carried out with AutoDock Vina and AutoDock Tools 4.2. The UCSF Chimera 1.8 visualization tool was used for interaction and binding analyses among the atoms of the receptor and ligand molecules.

After protein-ligand docking, protein-protein docking studies were performed with the GRAMM-X and ClusPro web servers, and their output was analyzed with PyMOL and LIGPLOT. The output from GRAMM-X was interpreted in PyMOL, which was used to extract 100 files from it. An Excel sheet was made for this output on the basis of hydrogen bonding, ligand residues, and receptor residues, and a graph generated from the sheet was used to select the best results. The output from the ClusPro web server was visualized in LIGPLOT. Another Excel sheet was made for the analysis of the ClusPro results, and the best results were again finalized from a graph. LIGPLOT was used to analyze the interactions between the ligand and receptor proteins.
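
The first step of this workflow, reading a sequence in FASTA format and checking its length, can be sketched in a few lines of Python. This is a minimal reader written for illustration; the function name `read_fasta` and the toy record are hypothetical, and the toy sequence is not the NPR1 sequence.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping header -> sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; keep the header without the '>'.
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence lines may be wrapped, so collect and join them.
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


# Toy record with a wrapped sequence (not a real protein).
toy = """>demo_protein toy record
MDTTIDGF
ADSYEISS
"""
for name, seq in read_fasta(toy).items():
    print(name, len(seq))
```

For the real entry, the length reported by such a check should match the 593 residues listed for NPR1 before the sequence is passed to BLAST or a modeling server.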

Figure 1: A flow chart showing the whole methodology used for protein structure prediction and docking analysis.

2.3 Summary of Tools/Software Used

Table 1: The analytical tools and software used in the present study

Sr.No	Tool/Database	Output/Functions
1	UniProt	Protein sequence retrieval
2	BLAST	Compare query sequence
3	MODELLER	Structure prediction
4	I-TASSER	Structure prediction
5	M4T	Structure prediction
6	RaptorX	Structure prediction
7	IntFold2	Structure prediction
8	Phyre2	Structure prediction
9	ERRAT	Evaluation
10	Rampage	Evaluation
11	Auto Dock Tools	Docking Analyses
12	Chimera	Visualization, Interaction, Energy minimization of protein
13	LIGPLOT	Visualization of intermolecular interactions
14	ChemDraw	Ligand Drawing, Energy minimization of Ligand
15	GRAMM-X	Protein-Protein Docking
16	ClusPro	Protein-Protein Docking
17	PyMOL	Visualization, Docking Analyses
18	ProtParam (ExPASy)	Primary structure evaluation tool

Expected Results of the Above-Mentioned Techniques for the Selected Gene NPR1

Because resources are limited and the number of known proteins keeps increasing, it is not always possible to analyze proteins at the structural level. Moreover, the atomic-level details of the energy changes between the folding and unfolding of a protein cannot be accurately determined through in vivo and in vitro techniques (Fersht and Daggett, 2002). Therefore, the majority of proteins are structurally analyzed by computational methods such as homology modeling (Marti-Renom et al., 2000). In this study, the 3D structure of NPR1 was predicted by the homology modeling approach. Different web servers were used for comparative modeling and evaluation.

3.1 Protparam analysis of NPR1

ProtParam analysis was performed to determine the general properties of NPR1. The molecular weight of NPR1 is 66031.8 Da, and the total number of amino acid residues is 593. The computed isoelectric point (pI) of NPR1 was 5.72; the pI is useful for developing a buffer system for purification by isoelectric focusing. NPR1 has a grand average of hydropathicity (GRAVY) value of -0.213. The aliphatic index of NPR1 is 95.55. The aliphatic index is defined as the relative volume of the protein occupied by aliphatic side chains and is regarded as a positive factor for increased thermal stability of globular proteins.

Table 2: Physical properties of NPR1

Property	Value
Molecular weight (Da)	66031.8
No. of amino acid residues	593
Isoelectric Point (pI)	5.72
Instability Index (II)	41.34
Aliphatic Index (AI)	95.55
GRAVY	-0.213

ProtParam analysis also reports the number of each amino acid and its percentage in the protein.

Table 3: Distribution of amino acid residues in NPR1

Amino acid	Number of residues	Percentage (%)
Ala (A)	52	8.8
Arg (R)	32	5.4
Asn (N)	19	3.2
Asp (D)	42	7.1
Cys (C)	17	2.9
Gln (Q)	16	2.7
Glu (E)	49	8.3
Gly (G)	19	3.2
His (H)	12	2.0
Ile (I)	26	4.4
Leu (L)	71	12.0
Lys (K)	48	8.1
Met (M)	12	2.0
Phe (F)	20	3.4
Pro (P)	20	3.4
Ser (S)	48	8.1
Thr (T)	32	5.4
Trp (W)	0	0.0
Tyr (Y)	11	1.9
Val (V)	47	7.9
Pyl (O)	0	0.0
Sec (U)	0	0.0
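
As a cross-check, the aliphatic index and GRAVY reported above can be recomputed from the residue counts in Table 3. The sketch below uses the Kyte-Doolittle hydropathy scale for GRAVY and the usual aliphatic index formula, X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), where X is the mole percent of each residue. The serine count is taken as 48, the value consistent with its listed 8.1% share and the 593-residue total. This is an independent recalculation for illustration, not the ProtParam source code.

```python
# Residue counts from Table 3 (serine taken as 48, see the text above).
counts = {
    "A": 52, "R": 32, "N": 19, "D": 42, "C": 17, "Q": 16, "E": 49,
    "G": 19, "H": 12, "I": 26, "L": 71, "K": 48, "M": 12, "F": 20,
    "P": 20, "S": 48, "T": 32, "W": 0, "Y": 11, "V": 47,
}

# Kyte-Doolittle hydropathy values.
kyte_doolittle = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}

total = sum(counts.values())
mole_percent = {aa: 100.0 * n / total for aa, n in counts.items()}

# Aliphatic index: weighted mole percent of Ala, Val, Ile and Leu.
aliphatic_index = (mole_percent["A"]
                   + 2.9 * mole_percent["V"]
                   + 3.9 * (mole_percent["I"] + mole_percent["L"]))

# GRAVY: sum of hydropathy values divided by the sequence length.
gravy = sum(kyte_doolittle[aa] * n for aa, n in counts.items()) / total

print(total, round(aliphatic_index, 2), round(gravy, 3))
```

With these counts the total is 593 residues, and the recomputed values round to the aliphatic index and GRAVY listed in Table 2.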

3.2 Primary sequence analysis of NPR1

Protein BLAST (BLASTP) was used to align NPR1 against the PDB database to search for the best template. The most suitable template for NPR1 was 4RLV_A, with 25% query coverage. This template was used for structure prediction by MODELLER.

Figure 2: BLAST results of the NPR1 sequence against the PDB database.

Table 4: BLAST results showing different templates for NPR1

Accession no. of Template	Max score	Total score	E value	Query coverage	Identity
4RLV_A	53.1	206	5e-07	25%	36%
1N11_A	49.7	87.8	6e-06	23%	37%
1N0R_A	42.0	78.6	2e-04	20%	31%
2P2C_P	41.6	41.6	6e-04	16%	33%
4HLL_A	40.8	73.9	0.001	17%	32%
2XEE_A	40.4	113	0.001	18%	30%
2XEH_A	40.4	113	0.001	18%	30%
2QYJ_A	39.3	74.7	0.003	18%	31%
1N0Q_A	37.4	72.4	0.006	17%	34%
2L6B_A	37.0	37.0	0.009	8%	38%
2XEN_A	34.3	34.3	0.053	9%	36%
4K5B_A	34.3	34.3	0.16	9%	36%
3ZU7_B	33.5	33.5	0.28	11%	35%
4RLY_A	33.1	33.1	0.74	16%	31%
3NOC_D	32.0	32.0	0.84	9%	34%
2RFA_A	32.0	32.0	1.4	9%	33%
3UTM_A	31.6	31.6	2.1	13%	36%
4R6U_C	30.4	30.4	5.3	13%	32%
3WO3_B	30.0	30.0	5.7	13%	32%
3HRA_A	29.3	29.3	9.1	8%	36%
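
The template choice can be reproduced by ranking the hits in Table 4. The sketch below takes the first five rows of the table and sorts them by E-value, breaking ties by query coverage and then identity. The `Hit` type, the function name, and the E-value cutoff of 1e-3 are illustrative assumptions, not criteria stated in this study.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    accession: str
    max_score: float
    e_value: float
    coverage: int   # query coverage in percent
    identity: int   # sequence identity in percent


# First five rows of Table 4.
hits = [
    Hit("4RLV_A", 53.1, 5e-07, 25, 36),
    Hit("1N11_A", 49.7, 6e-06, 23, 37),
    Hit("1N0R_A", 42.0, 2e-04, 20, 31),
    Hit("2P2C_P", 41.6, 6e-04, 16, 33),
    Hit("4HLL_A", 40.8, 0.001, 17, 32),
]


def rank_templates(hits, max_e_value=1e-3):
    """Keep hits under the E-value cutoff and sort best first."""
    kept = [h for h in hits if h.e_value <= max_e_value]
    return sorted(kept, key=lambda h: (h.e_value, -h.coverage, -h.identity))


for hit in rank_templates(hits):
    print(hit.accession, hit.e_value, f"{hit.coverage}%", f"{hit.identity}%")
```

Under this ordering 4RLV_A comes first, in line with the template selected for MODELLER.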

3.3 Comparative Model Building Assessment

In this study, a comparative homology modeling approach was applied to predict the best model. MODELLER and different web servers (I-TASSER, Phyre2, RaptorX, IntFOLD2, and M4T) were used for 3D structure prediction of NPR1. Ten models were built manually with MODELLER 9.14 based on the crystal structure of the template 4RLV, and 13 models were obtained from I-TASSER, Phyre2, RaptorX, IntFOLD2, and M4T. All predicted 3D models were validated with Errat (Björn Wallner and Arne Elofsson, 2006) and Rampage (Oliviero Carugo and Kristina Djinović-Carugo, 2013), which allowed a comparison between the MODELLER models and the web server models. A complete analysis of the results is plotted in Figure 3. Among all the predicted structures, the model named itasser4 was selected through the evaluation tools and subjected to energy minimization. Model building in general does not refine the models, so the suitable model with satisfactory values and a high amino acid ratio was chosen for the energy minimization step.

Figure 3: A comparative model assessment plot. It shows the Ramachandran most favored regions, allowed regions, outliers, and Errat values of the models.

3.4 Energy Minimization

Energy minimization was applied to the selected model of NPR1 to improve the stability of the protein structure. UCSF Chimera 1.8 molecular visualization software was used for the energy minimization process. The protein structure was minimized with the steepest descent and conjugate gradient methods for 750 steps, after which Errat showed an overall quality factor of 85.103 for the structure (Figure 4).

Figure 4: Errat results showing a quality factor of 85.103 for the final selected model.

Figure 5: The minimized 3D structure of the NPR1 protein

The selected 3D structure of NPR1 was visualized in UCSF Chimera 1.8 (Figure 5).

3.5 Ligand Molecules

After energy minimization, the structure was ready for docking analysis. Protein-ligand docking was performed with three ligand molecules: salicylic acid (SA), 2,6-dichloroisonicotinic acid (INA), and benzo-(1,2,3)-thiadiazole-7-carbothioic acid S-methyl ester (ASM). These ligand molecules were taken from the literature. The structures were drawn manually in ChemDraw Ultra. Energy minimization of the ligands was done in Chem3D Ultra, and the resulting files were saved in .pdb format. The structures of the ligand molecules are shown in Table 5.

Table 5: 2D structures of the three ligand molecules used for protein-ligand docking analysis

Salicylic acid (SA)	2,6-dichloroisonicotinic acid (INA)	Benzo-(1,2,3)-thiadiazole-7-carbothioic acid S-methyl ester (ASM)

3.6 Docking Analysis

The aim of docking studies is to reveal the best interaction between the protein and the ligand molecules. Three ligand molecules (SA, INA and ASM) were docked with NPR1, and the results show the binding residues that interact with the ligand molecules. ASM and INA are functional analogues of salicylic acid (SA), and both bind at the same position where salicylic acid binds to the receptor protein (Figures 6, 7 and 8 for SA, INA and ASM, respectively). The docking was done with AutoDock Vina in PyRx. The AutoDock Vina results are shown in Table 6.

Table 6: AutoDock Vina results for the three ligand molecules, with their binding energies and interacting amino acid residues

Ligand Molecules	Binding Energy (kcal/mol)	Interacting Residues
Salicylic acid (SA)	-8.7	Gly-504, Lys-505, Arg-506, Phe-507, Phe-508, Pro-509, Arg-510, Cys-511, Ser-512, Ala-513, Asp-516, Ile-518
2,6-dichloroisonicotinic acid (INA)	-9.2	Arg-506, Phe-507, Phe-508, Pro-509, Arg-510, Cys-511, Ser-512, Val-514, Ile-518
Benzo-(1,2,3)-thiadiazole-7-carbothioic acid S-methyl ester (ASM)	-9.9	Phe-507, Phe-508, Pro-509, Arg-510, Cys-511, Ser-512

Figure 6: NPR1 and SA docking results, showing the binding residues of NPR1

Figure 7: Docking results of NPR1 and INA, showing the binding site of NPR1

Figure 8: Binding view of the ASM ligand molecule with NPR1

The docking results reveal that ASM has the lowest binding energy (-9.9). The three ligands share many binding residues: Phe-507, Phe-508, Pro-509, Arg-510, Cys-511, and Ser-512 are common to all three and are the critical residues for the receptor-ligand interaction. The AutoDock Vina structures showing the interacting residues of each ligand are given in Figures 6, 7 and 8, respectively.
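
The shared residues and the lowest-energy ligand can be read directly from the docking table with a set intersection. The sketch below copies the residue lists and binding energies from the AutoDock Vina results table; the variable names are illustrative.

```python
# Interacting residues per ligand, as listed in the docking results table.
contacts = {
    "SA": {"Gly-504", "Lys-505", "Arg-506", "Phe-507", "Phe-508", "Pro-509",
           "Arg-510", "Cys-511", "Ser-512", "Ala-513", "Asp-516", "Ile-518"},
    "INA": {"Arg-506", "Phe-507", "Phe-508", "Pro-509", "Arg-510",
            "Cys-511", "Ser-512", "Val-514", "Ile-518"},
    "ASM": {"Phe-507", "Phe-508", "Pro-509", "Arg-510", "Cys-511", "Ser-512"},
}
binding_energy = {"SA": -8.7, "INA": -9.2, "ASM": -9.9}

# Residues contacted by every ligand, ordered by residue number.
common = sorted(set.intersection(*contacts.values()),
                key=lambda res: int(res.split("-")[1]))

# The most negative binding energy marks the most favorable pose.
best_ligand = min(binding_energy, key=binding_energy.get)

print(common)
print(best_ligand, binding_energy[best_ligand])
```

The same approach extends to any number of ligands: adding an entry to `contacts` narrows `common` to the residues that still appear in every list.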

3.7 Protein-Protein Docking

3.7.1 GRAMM-X Docking Analysis

After protein-ligand docking, protein-protein docking was done with GRAMM-X and ClusPro. NPR1 was used as the receptor protein and NIMIN1b as the ligand protein. GRAMM-X returns a PDB file as output. This file was loaded into PyMOL, and 100 files were extracted from it. An Excel sheet for the 100 models was then made on the basis of hydrogen bonding, ligand residues, and receptor residues. With the help of a graph, the 10 models with the highest hydrogen bonding were selected, and from these 10 models, 3 were selected on the basis of important residues. These three models show three different domains (Figures 10, 11 and 12, respectively).
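
Extracting the individual models from a multi-model PDB file can also be done with plain text parsing, as an alternative to doing it interactively in PyMOL. The sketch below assumes the docking output separates its models with the standard MODEL and ENDMDL records; the function name and the two-model toy input, including its coordinates, are made up for illustration.

```python
def split_models(pdb_text):
    """Return a dict mapping model number -> list of record lines."""
    models = {}
    current = None
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            # Start collecting lines for a new model.
            current = int(line.split()[1])
            models[current] = []
        elif record == "ENDMDL":
            current = None
        elif current is not None:
            models[current].append(line)
    return models


# Toy input with two tiny models (coordinates are invented).
demo = "\n".join([
    "MODEL        1",
    "ATOM      1  CA  ALA A   1      11.104  13.207   2.100  1.00  0.00           C",
    "ENDMDL",
    "MODEL        2",
    "ATOM      1  CA  ALA A   1      12.560  11.020   3.410  1.00  0.00           C",
    "ATOM      2  CA  GLY A   2      14.001  12.333   4.250  1.00  0.00           C",
    "ENDMDL",
])

for number, lines in split_models(demo).items():
    print(number, len(lines))
```

Each entry of the returned dictionary can be written to its own file, giving one PDB file per docked model for the per-model analysis of hydrogen bonds and interface residues.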

Figure 9: A graph for the 100 models from GRAMM-X, showing the hydrogen bonding, ligand residues, and receptor residues.

Figure 10: Docking results revealing the interacting residues of the receptor and ligand proteins. The green color shows the ligand protein (NIMIN1b) and the coral color shows the receptor protein (NPR1).

The computational analysis reveals different binding domains, which show binding regions of the receptor and ligand proteins other than the C-terminus.

Figure 11: Second binding domain of the ligand and receptor proteins, other than the C-terminus. The pink color represents the receptor (NPR1) and the aquamarine color shows the ligand (NIMIN1b).

Figure 12: The interacting residues of the receptor and ligand proteins, revealing the third binding domain

3.7.2 ClusPro Docking Analysis

The ClusPro web server was also used for docking studies. The ClusPro server reveals two binding domains; GRAMM-X also shows these domains, so both were revealed by computational analysis. The ClusPro web server generated fifteen models. A graph was made for these fifteen models on the basis of ligand residues, receptor residues, and hydrogen bonding. With the help of this graph, the final two models, which show two different domains, were selected.

Figure 13: A graph for the 15 models from ClusPro, showing the hydrogen bonding, ligand residues, and receptor residues.

Figure 14: Binding residues of the ligand (NIMIN1b) and receptor (NPR1) proteins. The golden color shows NPR1 and the brown color shows NIMIN1b.

The interaction shown in Figure 14 reveals that ligand and receptor binding occurs at the C-terminus of the receptor protein. This domain is also shown by the GRAMM-X results.
