General description

PepComposer is a tool for designing protein-binding peptides given a protein structure and an approximate definition of the desired binding site. The method assumes that protein-peptide interactions adopt structural arrangements similar to those found within single monomeric protein structures. It first derives a set of peptide backbone scaffolds from monomeric proteins that harbor the same backbone arrangement as the binding site of the protein of interest, retrieving putative interacting peptide backbones from them. Next, it uses a Monte Carlo procedure to design optimal sequences for the identified peptide scaffolds.

Glossary of terms used in this document

  • Query region: the region of the input target protein indicated by the user as containing the desired binding site
  • Hit protein: the database protein that harbors a region similar to the query region
  • Target region: the region of the hit protein that is similar to the query region
  • Scaffold: the backbone of the hit protein region used as template for designing the peptide sequence after superimposing the target region onto the query region. Each scaffold can generate several peptides that differ in the sequence of their designed side chains

Input

The server only requires a PDB file of the target protein and an approximate definition of the desired binding site (query region) as input. The file can be uploaded or retrieved directly from the PDB database by entering its PDB ID. An email address can optionally be provided to receive a notification when the job completes.

If the PDB file contains more than one chain, all chains are used by default. The user can restrict the job to specific chains by providing their chain names. For example:

A,C

will tell the server to only use chains A and C from the provided PDB structure.
The structure of the target protein is displayed in the web page as soon as the file is uploaded.


Notes:
The chain identifier "P" should not be used as it denotes the designed peptide in the output file.
Only atoms specified in the ATOM PDB records are considered.
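
The chain selection and the two notes above can be illustrated with a minimal Python sketch. It is not the server's code: the function names are hypothetical, and rejecting chain "P" with an exception is only one assumed way of enforcing the note. The column positions follow the fixed-width PDB format (record name in columns 1-6, chain identifier in column 22).

```python
def parse_chain_selection(text):
    """Parse a comma-separated chain list such as 'A,C' into ['A', 'C'].

    Hypothetical helper: the checks mirror the notes in this document,
    not the server's actual validation code.
    """
    chains = [c.strip() for c in text.split(",") if c.strip()]
    for chain in chains:
        if len(chain) != 1:
            raise ValueError(f"Chain ID should be a single character: {chain!r}")
        if chain == "P":
            # Assumption: treat the reserved peptide chain name as an error.
            raise ValueError("Chain 'P' is reserved for the designed peptide")
    return chains


def select_atom_lines(pdb_text, chains=None):
    """Keep ATOM records only, optionally restricted to the given chains.

    Chain identifiers are compared as-is, so the match is case sensitive.
    """
    kept = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM  "):
            continue  # HETATM, TER, REMARK, ... are ignored
        chain_id = line[21] if len(line) > 21 else ""
        if chains is None or chain_id in chains:
            kept.append(line)
    return kept
```

Calling select_atom_lines(pdb_text, parse_chain_selection("A,C")) would keep only the ATOM records of chains A and C; calling it without a chain list keeps the ATOM records of every chain.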

The binding site (query region) can be defined in two ways:

  • A set of residues separated by commas. Example: 13:A,25:A,15:C indicates residue 13 (as numbered in the residue number column of the PDB file) from chain A, residue 25 from the same chain, and residue 15 from chain C.
  • A single residue. In this case, the query region consists of all residues having at least one atom within a user-specified distance (default = 7.5 Å) of the listed residue.
    Note: The larger the radius around the selected residue, the longer the job will run.
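
The two ways of defining the query region can be sketched in Python as follows. This is an illustration under stated assumptions, not the server's implementation: the function names are hypothetical, the distance test is taken to be inclusive (<=), distances are measured to every atom of the selected residue, and insertion codes are ignored.

```python
import math


def parse_residue_list(text):
    """Parse '13:A, 25:A,15:C' into [(13, 'A'), (25, 'A'), (15, 'C')]."""
    residues = []
    for token in text.split(","):
        token = token.strip()
        if not token:
            continue
        number, chain = token.split(":")
        residues.append((int(number), chain.strip()))
    return residues


def read_atoms(pdb_text):
    """Yield ((residue number, chain), (x, y, z)) for each ATOM record."""
    for line in pdb_text.splitlines():
        if line.startswith("ATOM  "):
            key = (int(line[22:26]), line[21])
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            yield key, xyz


def query_region_around(pdb_text, center, radius=7.5):
    """Residues with at least one atom within `radius` Å of residue `center`.

    Assumption: the cutoff is inclusive and applies to all atoms of `center`.
    """
    atoms = list(read_atoms(pdb_text))
    center_atoms = [xyz for key, xyz in atoms if key == center]
    if not center_atoms:
        raise ValueError(f"Residue {center[0]}:{center[1]} not found")
    region = set()
    for key, xyz in atoms:
        if any(math.dist(xyz, c) <= radius for c in center_atoms):
            region.add(key)
    return sorted(region)
```

Because every atom of the structure is compared against the atoms of the selected residue, and a larger radius also yields a larger query region for the later search steps, the cost grows with the radius, which is consistent with the note above.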

Upon clicking the button “Check binding site in Jmol and confirm your selection”, the selected residue(s) and, if the second option is selected, the region corresponding to the binding site (query region) are highlighted (in darkcyan and darkviolet, respectively) in the input structure in the graphics window.

Once the process is started (by clicking the “Submit” button), a page containing a Job Id that can be bookmarked will be displayed.

If the user provided an email address, they will receive an email from pepcomposer.biocomputing@gmail.com containing the Job Id, and another email upon job completion.

The Job Id can be entered in the box in the leftmost part of the input page to check the status of the job and to visualize the results upon completion.

Results are kept on the server for two weeks.

Output

The output web page includes a main table with a list of representative scaffolds. For each scaffold, the following information is provided:

  • The designed peptide sequence with the lowest FoldX energy
  • The average FoldX binding energy [kcal/mol] of the generated complexes for that backbone
  • A sequence logo obtained using all generated sequences for that backbone
  • A radio button to display the list of peptides obtained from the same backbone (right table *) and the complex structure of the representative one (the one with the lowest energy from the most populated group) in the JSmol window (http://www.jmol.org/)
  • A button to download the coordinates of the latter complex in PDB format

* The list of peptides includes the number of times each sequence has been obtained and the corresponding representative peptide. The protein-peptide complexes can be downloaded in PDB format (chain P denotes the peptide).

Error messages

  • PDB code is not valid! – The PDB file cannot be retrieved from the PDB database
  • File type not allowed! – The uploaded file is not in the PDB format
  • Chain ID should be single character Please check chain: ‘xxx’ – Different chains should be separated by commas
  • The selected chain ID is not present in the PDB file XXX.pdb: a Be aware that in the PDB format the chain is case sensitive – The chain does not exist in the file
  • The chain of the chosen residue(s) has not been selected or does not exist – The chain of the chosen residue(s) is not present in the PDB file
  • The specified residue is not present in the PDB file XXX.pdb: XXX:x – The residue does not exist in the PDB file
  • Please confirm your input data before submitting – The user should confirm the data and visualize it in the graphics window by clicking the appropriate button before submitting
  • Please provide a valid e-mail address – The string is not a valid email address

Short method overview

We first search for regions sharing local backbone similarity with the query region in a non-redundant set of protein structure chains (chains from the PDB filtered for redundancy at the level of 70% sequence identity) using TriangleMatch [1] with default parameters. TriangleMatch identifies similar backbone arrangements independently of amino acid sequence and residue order.
The resulting superpositions are sorted by their size, i.e., the number of residues superposed between the query region and the hit protein.
For the 1,000 largest hit regions, contiguous backbone segments that are in contact with the hit regions in the corresponding hit proteins are retrieved. The contacts are defined based on Almost Delaunay tessellation for Cα atoms using the ADCGAL program [2] with default parameters.
Only backbone scaffolds of four or more residues in an extended conformation are selected.
The selected backbone scaffolds are merged with the query protein into protein-peptide complexes based on the superposition of the query and the hit regions. These segments are then used as backbone scaffolds in the sequence design step, which uses a flexible backbone design protocol based on the PyRosetta [3] package. We perform both structure diversification (small rigid-body movements of the peptide and small local changes, known as backrub moves, in both the peptide and the query region) and sequence design.
The amino acid sequence of the peptide is optimized using a standard simulated annealing Monte Carlo method implemented in PyRosetta, where rotamers are changed in both the protein and the peptide and residues are substituted only in the peptide. Finally, a further structure refinement is performed on the protein-peptide complex following the Rosetta Classic Relax protocol [4]. In this final step the backbone of the protein is kept fixed.
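
As a rough illustration of the simulated annealing Monte Carlo idea mentioned above, the following Python sketch optimizes a peptide sequence with the Metropolis criterion. It is a toy under loud assumptions and does not use PyRosetta: there are no rotamers or backbone moves, the energy is a hypothetical mismatch count against a made-up sequence, and the geometric cooling schedule and all parameter values are placeholders.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mismatch_energy(sequence, preferred="WKLVFF"):
    """Toy energy: number of positions differing from a made-up sequence.

    Stands in for a real scoring function; lower is better.
    """
    return sum(a != b for a, b in zip(sequence, preferred))


def anneal_sequence(length, energy, steps=2000, t_start=2.0, t_end=0.1, seed=0):
    """Simulated annealing over peptide sequences (Metropolis acceptance)."""
    rng = random.Random(seed)
    current = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    current_e = energy(current)
    best, best_e = list(current), current_e
    for step in range(steps):
        # Geometric cooling from t_start to t_end (an assumed schedule).
        frac = step / max(steps - 1, 1)
        temperature = t_start * (t_end / t_start) ** frac
        # Propose a single-residue substitution in the peptide only.
        trial = list(current)
        trial[rng.randrange(length)] = rng.choice(AMINO_ACIDS)
        trial_e = energy(trial)
        delta = trial_e - current_e
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_e = trial, trial_e
            if current_e < best_e:
                best, best_e = list(current), current_e
    return "".join(best), best_e
```

For example, anneal_sequence(6, mismatch_energy) returns the best sequence visited and its energy. Accepting some uphill moves at high temperature lets the search escape local minima, while the decreasing temperature makes it increasingly greedy toward the end of the run.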
Models differing by more than 1 Å from the initial protein-peptide structure and the 25% of structures with the worst Rosetta score are filtered out, in order to avoid both significantly distorted structures and peptide conformations that deviate too much from the starting backbone scaffold.
The 10% of the remaining peptides with the lowest binding energy calculated with FoldX [5] are selected and clustered by sequence identity. The average FoldX binding energy for each cluster is computed by averaging the energy of its members. The sequence of the peptide with the lowest FoldX energy is selected as a representative of each backbone scaffold.
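
The filtering and clustering steps described above can be sketched in Python as follows. Several details are assumptions made only for this illustration: the 25% cut is applied after the 1 Å filter, fractions are rounded as shown in the comments, clustering is a simple greedy scheme, and the identity threshold of 0.8 is a placeholder because the text does not state one. The Model class and all function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Model:
    sequence: str
    deviation: float      # deviation from the starting complex, in Å
    rosetta_score: float  # lower is better
    foldx_energy: float   # FoldX binding energy in kcal/mol, lower is better


def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def select_models(models, max_deviation=1.0, worst_fraction=0.25, top_fraction=0.10):
    """Apply the deviation, Rosetta-score and FoldX filters in sequence."""
    kept = [m for m in models if m.deviation <= max_deviation]
    kept.sort(key=lambda m: m.rosetta_score)
    n_drop = int(len(kept) * worst_fraction)  # assumption: round down
    kept = kept[:len(kept) - n_drop]
    kept.sort(key=lambda m: m.foldx_energy)
    n_top = max(1, math.ceil(len(kept) * top_fraction)) if kept else 0  # assumption: round up
    return kept[:n_top]


def cluster_by_identity(models, min_identity=0.8):
    """Greedy clustering; the first member of each cluster has the lowest energy."""
    clusters = []
    for model in sorted(models, key=lambda m: m.foldx_energy):
        for cluster in clusters:
            if identity(model.sequence, cluster[0].sequence) >= min_identity:
                cluster.append(model)
                break
        else:
            clusters.append([model])
    return clusters


def summarize(clusters):
    """Return (representative sequence, cluster size, mean FoldX energy) per cluster."""
    return [
        (c[0].sequence, len(c), sum(m.foldx_energy for m in c) / len(c))
        for c in clusters
    ]
```

Processing the models in order of increasing FoldX energy means that the first member of each cluster is also its lowest-energy member, so it can serve directly as the cluster representative.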

  1. Wolfson,H.J. and Rigoutsos,I. (1997) Geometric hashing: an overview. IEEE Computational Science & Engineering, 4, 10-21
  2. Hagberg,A.A. et al. (2008) Exploring network structure, dynamics, and function using NetworkX. In Varoquaux,G., Vaught,T. and Millman,J. (eds), Proceedings of the 7th Python in Science Conference (SciPy 2008), pp. 11-15
  3. Chaudhury,S. et al. (2010) PyRosetta: a script-based interface for implementing molecular modeling algorithms using Rosetta. Bioinformatics, 26, 689-691
  4. Bradley,P. et al. (2005) Toward high-resolution de novo structure prediction for small proteins. Science, 309, 1868-1871
  5. Schymkowitz,J. et al. (2005) The FoldX web server: an online force field. Nucleic Acids Res., 33, W382-388