The Protein Preparation Process

Declaration: The article was reprinted from The Protein Preparation Process.

The preparation of a protein involves a number of steps, which are outlined below. The procedure assumes that the initial protein structure is in a PDB-format file, includes a cocrystallized ligand, and does not include explicit hydrogens. The result is a set of refined, hydrogenated structures of the ligand and the ligand-receptor complex, suitable for use with other Schrödinger products. In many cases, not all of the steps outlined below need to be performed.

  1. Import a ligand/protein cocrystallized structure, typically from the Protein Data Bank, into Maestro.
  2. Locate any waters you want to keep, then delete all others.

Water molecules that mediate receptor-ligand interactions (so-called "structural waters" that bridge the receptor and ligand by way of H-bonds) can be retained during target preparation. In the Glide docking experiment, these waters are retained and treated as part of the receptor environment. For example, a ligand H-bond to a water molecule receives an energetic reward, the exact value of which depends on the interaction geometry and the surrounding environment (not unlike a ligand H-bond to a protein residue).

During target preparation, you will need to make an informed decision about which water molecules to retain in the active site and which water molecules should be deleted before the docking experiment is carried out. Among other things, deleting unnecessary water molecules allows the active site to accommodate novel ligands that wouldn't otherwise fit.

One way of making these informed decisions is to consult publications that describe the active site. Computational tools can also help in deciding which water molecules to retain. One such method is to align different PDB structures of the same target, color the structures by entry number in the Workspace, and look for highly conserved water molecules. The idea is that water molecules conserved across many structures are likely to be important for binding.
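The conserved-water comparison described above can also be approximated outside the Workspace. The following is a minimal sketch, not a Schrödinger tool: it assumes the structures have already been superimposed, reads water oxygens from standard fixed-column PDB records, and counts, for each water in a reference structure, how many other structures have a water oxygen within a cutoff. The function names and the 1.0 Å cutoff are illustrative assumptions.

```python
import math

def parse_water_oxygens(pdb_text):
    """Return {(chain, resseq): (x, y, z)} for water oxygens in PDB-format text."""
    waters = {}
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        if line[17:20] not in ("HOH", "WAT"):
            continue
        if not line[12:16].strip().startswith("O"):
            continue  # skip water hydrogens, if any are present
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        waters[(line[21], int(line[22:26]))] = xyz
    return waters

def conservation_counts(reference_pdb, other_pdbs, cutoff=1.0):
    """Count, per reference water, the other structures with a water nearby.

    Assumes all structures are already aligned to the reference.
    `cutoff` (in Å) is an assumed tolerance, not a published value.
    """
    others = [parse_water_oxygens(text) for text in other_pdbs]
    counts = {}
    for key, xyz in parse_water_oxygens(reference_pdb).items():
        counts[key] = sum(
            any(math.dist(xyz, other_xyz) <= cutoff for other_xyz in waters.values())
            for waters in others
        )
    return counts
```

Waters with a count close to the number of compared structures are candidates for retention; the final decision should still rest on visual inspection and the literature.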

It is known that in some targets, a structural water can be replaced by a ligand with a functional group that forms the same H-bonds to the receptor that the water molecule did. If you suspect this may be the case for the prepared target, you may choose to retain or displace the water molecule depending on the chemotype of the ligands being docked. Such instances can be treated by preparing two versions of the target: one that retains the water and one that removes it. A single ligand library can then be docked against both target models in a single experiment using the Virtual Screening Workflow interface, which automatically sorts and filters the results.

Note that the Glide SP and XP scoring functions both include terms that are designed to account for solvation of the active site. Thus, water molecules do not need to be added to the active site in order to obtain an estimate of desolvation effects. For example, the energetics of desolvation account for the extra reward term that is awarded to hydrophobic ligand groups that are fully enclosed by hydrophobic receptor residues. Glide XP further accounts for the energetics of desolvation by placing so-called "virtual waters" in the active site to estimate water displacement and ligand-solvent interactions.

In a PDB structure, waters are identified by the oxygen atom and usually do not have hydrogens attached. Generally, all waters (except those coordinated to metals) are deleted, but waters that bridge between the ligand and the protein are sometimes retained. If waters are kept, hydrogens are added to them in the preparation process.
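As a rough illustration of picking out bridging waters before deleting the rest, the sketch below works directly on PDB-format text. It is not part of Maestro or Glide. It assumes a hypothetical ligand residue name supplied by the user, treats any N or O atom as H-bond capable, and uses an assumed 3.5 Å heavy-atom cutoff because crystallographic waters carry no hydrogens. It does not handle metal-coordinated waters, which would need a separate check.

```python
import math

HBOND_CUTOFF = 3.5  # Å; assumed heavy-atom distance cutoff, not a Glide parameter
WATER_NAMES = ("HOH", "WAT")

def _atoms(pdb_text):
    """Yield a small record for every ATOM/HETATM line (fixed PDB columns)."""
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            name = line[12:16].strip()
            yield {
                "hetero": line.startswith("HETATM"),
                "resname": line[17:20].strip(),
                "resid": (line[21], int(line[22:26])),
                "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
                # Use the element column if present, else guess from the atom name
                "element": (line[76:78].strip() or name[:1]).upper(),
            }

def bridging_waters(pdb_text, ligand_resname):
    """Return the (chain, resseq) IDs of waters near both ligand and protein N/O atoms."""
    atoms = list(_atoms(pdb_text))
    polar = [a for a in atoms if a["element"] in ("N", "O")]
    ligand = [a for a in polar if a["resname"] == ligand_resname]
    protein = [a for a in polar if not a["hetero"]]
    keep = set()
    for w in polar:
        if w["resname"] not in WATER_NAMES:
            continue
        near_ligand = any(math.dist(w["xyz"], a["xyz"]) <= HBOND_CUTOFF for a in ligand)
        near_protein = any(math.dist(w["xyz"], a["xyz"]) <= HBOND_CUTOFF for a in protein)
        if near_ligand and near_protein:
            keep.add(w["resid"])
    return keep

def strip_waters(pdb_text, keep):
    """Return PDB text with every water removed except those whose ID is in `keep`."""
    kept_lines = []
    for line in pdb_text.splitlines():
        is_water = line.startswith(("ATOM", "HETATM")) and line[17:20] in WATER_NAMES
        if is_water and (line[21], int(line[22:26])) not in keep:
            continue
        kept_lines.append(line)
    return "\n".join(kept_lines)
```

A distance test of this kind only flags candidates. Whether a flagged water is truly structural still depends on the geometry and on what is known about the target.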

Refer to https://www.schrodinger.com/kb/31.

  3. Simplify multimeric complexes.
    • Determine whether the protein-ligand complex is a dimer or other multimer containing duplicate binding sites and duplicate chains that are redundant.
    • If the structure is a multimer with duplicate binding sites, remove redundant binding sites and the associated duplicate chains.
  4. Adjust the protein, metal ions, and cofactors.
    • Fix any serious errors in the protein. Incomplete residues are the most common errors, but are relatively harmless if they are distant from the active site. Structures that are missing residues near the active site should be repaired.
    • Check the protein structure for metal ions and cofactors.
    • If there are bonds to metal ions, delete the bonds, then adjust the formal charges of the atoms that were attached to the metal as well as the metal itself.
    • Set charges and correct atom types for any metal atoms, as needed.
    • Set bond orders and formal charges for any cofactors, as needed.
    • Fix the orientation of any misoriented groups (such as amide groups of Asn and Gln).
  5. Adjust the ligand bond orders and formal charges.

If you are working with a dimeric or large protein and two ligands exist in two active sites, the bond orders have to be corrected in both ligand structures.
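For the multimer simplification step, a quick first pass at spotting redundant chains can be scripted on the PDB text itself. The sketch below is an illustrative assumption rather than a Maestro feature: it groups chains with identical residue sequences (read from CA atoms) and then writes out only the chains chosen by the user. Identical sequences do not guarantee equivalent binding sites, so the groups are a prompt for inspection, not a decision.

```python
def chain_sequences(pdb_text):
    """Map chain ID -> tuple of residue names, read from CA atoms of ATOM records."""
    seqs = {}
    for line in pdb_text.splitlines():
        is_ca = line.startswith("ATOM") and line[12:16].strip() == "CA"
        # Count each residue once: ignore alternate locations other than blank or A
        if is_ca and line[16] in " A":
            seqs.setdefault(line[21], []).append(line[17:20])
    return {chain: tuple(seq) for chain, seq in seqs.items()}

def duplicate_chain_groups(pdb_text):
    """Return lists of chain IDs that share an identical residue sequence."""
    groups = {}
    for chain, seq in chain_sequences(pdb_text).items():
        groups.setdefault(seq, []).append(chain)
    return [sorted(chains) for chains in groups.values() if len(chains) > 1]

def keep_chains(pdb_text, chains):
    """Drop coordinate records whose chain ID is not in `chains`.

    Non-coordinate records are passed through unchanged, so CONECT records
    that refer to removed atoms are left for the user to clean up.
    """
    coordinate_records = ("ATOM", "HETATM", "ANISOU", "TER")
    return "\n".join(
        line for line in pdb_text.splitlines()
        if not line.startswith(coordinate_records)
        or (len(line) > 21 and line[21] in chains)
    )
```

If both copies of a site contain a ligand and both chains are kept, the ligand bond orders still need to be corrected in each copy, as noted above.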

  6. Adjust the ionization and tautomerization state of protein and ligand, if necessary.
  7. Refine the structure.

This step relieves any strain from the adjustments, and can also reorient groups.

  8. Review the prepared structures.
    • Examine the refined ligand/protein/water structure for correct formal charges and protonation states and make final adjustments as needed.
    • Check the orientation of water molecules and other groups, such as hydroxyls, amides, and so on.