Schrödinger Notes—Similarity Search

Declaration

This note is based on the article “Rapid Screening of Chemical Libraries with GPU Shape” [1] and was created with Schrödinger Software Release 2023-4.

This note contains only minimal annotations to the original text, along with corrections to formatting errors. It is intended for educational and communicative purposes only, and all rights remain with the original author.

Introduction

In this tutorial, you will learn how to perform rapid shape-based screening of a chemical library with Shape GPU. We will use information from nearly 70 CDK2 small-molecule inhibitors to evaluate a library of compounds provided by DUD-E (http://dude.docking.org/) for their propensity to bind CDK2. We will then run the screen on a GPU using Shape GPU and perform enrichment calculations using the true actives in the dataset, as provided by DUD-E.

1 Creating Projects and Importing Structures

At the start of the session, change the file path to your chosen Working Directory in Maestro to make file navigation easier. Each session in Maestro begins with a default Scratch Project, which is not saved. A Maestro project stores all your data and has a .prj extension. A project may contain numerous entries corresponding to imported structures, as well as the output of modeling-related tasks. Once a project is created, the project is automatically saved each time a change is made.

  1. Double-click the Maestro icon

Figure 1-1. Change Working Directory option.

  1. Go to File > Change Working Directory

  2. Find your directory and click Choose

  3. Pre-generated input and results files are included for running jobs or examining output. Download the zip file here: https://www.schrodinger.com/sites/default/files/s3/release/current/Tutorials/zip/gpu_shape.zip

  4. After downloading the zip file, unzip the contents into your Working Directory (the location where files are saved for ease of access throughout the tutorial)

Figure 1-2. Save Project panel.

  1. Go to File > Open Project

  2. Select CDK2_screen.prjzip and click Open

  3. In Save Scratch Project, click OK

  4. Go to File > Save Project As

  5. Change the File name to GPU-Shape and click Save

2 Selecting Template (Probe) Molecules for the Screen

The first step in executing a Shape screen of a chemical library is to construct a shape-based model of known small-molecule binders. Each molecule in the screening deck is analyzed in the same way and compared with the profile of each template (probe) molecule. The ability of Shape to recover true actives depends on both the number and the diversity of the probes: performance generally improves as more diverse probes are added, with the gains beginning to saturate at around 10 molecules. As we have 127 actives to choose from, we will start by selecting a diverse subset of them using the clustering tools available in Maestro.

Figure 2-1. Fingerprint Similarity in Ligand-Based Virtual Screening.

  1. Select the CDK2-Actives group in the Entry List
  2. Go to Tasks > Browse > Ligand-Based Virtual Screening > Fingerprint Similarity
  • The Canvas Similarity and Clustering panel opens

Figure 2-2. Fingerprints tab in Canvas Similarity and Clustering panel.

  1. For Precision, select 32-bit

  2. For Fingerprint type, select Dendritic

  3. For Atom Typing Scheme, choose 4 (Atoms distinguished by functional type: {H}, {C}, {F,Cl}, {Br,I}, {N,O}, {S}, {other}; bonds by hybridization)

Figure 2-3. Similarity tab in the Canvas Similarity and Clustering panel.

  1. Go to the Similarity tab

  2. Set the Similarity metric to Tanimoto
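With the Tanimoto metric, the similarity of two fingerprints is the number of bits set in both divided by the number of bits set in either one. A minimal Python sketch, assuming each fingerprint is held as a set of on-bit indices (the bit numbers in the example are invented for illustration, not real dendritic fingerprint output):

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = fp_a | fp_b
    if not union:
        return 0.0  # convention assumed here: two empty fingerprints score 0
    return len(fp_a & fp_b) / len(union)


# Hypothetical fingerprints (bit indices are illustrative only)
fp1 = {3, 17, 42, 1024, 99871}
fp2 = {3, 17, 42, 5000}
print(tanimoto(fp1, fp2))  # 3 shared bits / 6 distinct bits = 0.5
```

Hierarchical clustering typically works on the corresponding distance, 1 - similarity, so identical fingerprints sit at distance 0.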

Figure 2-4. Apply Clustering.

  1. Go to the Cluster tab

  2. For Linkage method, choose Centroid

  3. Click Calculate Clustering

  4. Next to number of clusters, type 10

  5. For Apply Clustering, choose A group containing the structures nearest the centroid in each cluster

  6. Click Apply Clustering

  • A new group has been added to the Entry List

  • The representative molecules from this group will be the probe molecules for the Shape screen
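The “structures nearest the centroid” option picks one representative per cluster. The sketch below illustrates the idea under stated assumptions: fingerprints as dense 0/1 vectors, the centroid as the mean bit vector of the cluster, and squared Euclidean distance to that centroid. Canvas may define the centroid and distance differently, and the function name is hypothetical.

```python
from collections import defaultdict


def nearest_to_centroid(fps: dict[str, list[int]], labels: dict[str, int]) -> dict[int, str]:
    """Return, for each cluster label, the title of the member closest to the cluster centroid.

    fps: title -> dense 0/1 fingerprint vector (all vectors the same length)
    labels: title -> cluster number
    """
    members = defaultdict(list)
    for title, label in labels.items():
        members[label].append(title)

    reps = {}
    for label, titles in members.items():
        n_bits = len(fps[titles[0]])
        # Centroid = mean bit value at each position across the cluster members
        centroid = [sum(fps[t][i] for t in titles) / len(titles) for i in range(n_bits)]
        # Representative = member with the smallest squared Euclidean distance to the centroid
        reps[label] = min(
            titles,
            key=lambda t: sum((b - c) ** 2 for b, c in zip(fps[t], centroid)),
        )
    return reps


fps = {"A": [1, 1, 0, 0], "B": [1, 1, 1, 0], "C": [1, 1, 1, 1], "D": [0, 0, 1, 1]}
labels = {"A": 1, "B": 1, "C": 1, "D": 2}
print(nearest_to_centroid(fps, labels))  # {1: 'B', 2: 'D'}
```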

3 Preparing a Screening Deck for Shape

Figure 3-1. Create Shape Data File in Ligand-Based Virtual Screening.

  1. Go to Tasks > Browse > Ligand-Based Virtual Screening > Create Shape Data File (for GPU)

Figure 3-2. Create Shape Data File panel.

  1. For Use ligands from, choose File

  2. For Input structure file, click Browse and locate cdk2-screen-deck-shuffled.maegz

Note: The input ligands have already been prepared with LigPrep

  1. For Conformers, select Generate ligand conformers and leave it set to Rapid

Ligand preparation and conformer generation are both unnecessary when a Phase Database is used as the input. Typed pharmacophore is the recommended Shape type: with it, a sphere on a probe molecule is matched to a sphere on a screening-deck ligand only when the two spheres have the same pharmacophore type.
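To illustrate how typed matching restricts overlap, here is a hedged sketch of a Gaussian shape-overlap score in which two spheres contribute only when their pharmacophore types match. The functional form (first-order Gaussian overlap with amplitude 2.7 and width 2.41798/r², normalized by the larger self-overlap), the single shared sphere radius, and the type labels are all assumptions for illustration; Shape's actual scoring function is not reproduced here.

```python
import math
from typing import NamedTuple


class Sphere(NamedTuple):
    ptype: str  # pharmacophore type label (illustrative, e.g. "A" acceptor, "R" ring)
    x: float
    y: float
    z: float


RADIUS = 1.5                   # assumed sphere radius in angstroms, shared by all spheres
P = 2.7                        # assumed Gaussian amplitude
ALPHA = 2.41798 / RADIUS ** 2  # assumed Gaussian width derived from the radius


def pair_overlap(a: Sphere, b: Sphere) -> float:
    """First-order Gaussian overlap of two spheres; zero unless their types match."""
    if a.ptype != b.ptype:
        return 0.0
    d2 = (a.x - b.x) ** 2 + (a.y - b.y) ** 2 + (a.z - b.z) ** 2
    # Equal widths: alpha_i + alpha_j = 2*ALPHA and alpha_i*alpha_j/(alpha_i + alpha_j) = ALPHA/2
    return P * P * (math.pi / (2 * ALPHA)) ** 1.5 * math.exp(-ALPHA * d2 / 2)


def overlap(m1: list[Sphere], m2: list[Sphere]) -> float:
    return sum(pair_overlap(a, b) for a in m1 for b in m2)


def typed_similarity(probe: list[Sphere], ligand: list[Sphere]) -> float:
    """Overlap normalized by the larger self-overlap, giving a score in [0, 1]."""
    return overlap(probe, ligand) / max(overlap(probe, probe), overlap(ligand, ligand))


probe = [Sphere("A", 0.0, 0.0, 0.0), Sphere("R", 3.0, 0.0, 0.0)]
ligand = [Sphere("D", 0.0, 0.0, 0.0), Sphere("R", 3.0, 0.0, 0.0)]
print(typed_similarity(probe, ligand))  # 0.5: only the ring spheres match
```

Even though the two sets of spheres occupy identical positions, the acceptor/donor pair contributes nothing, which is the behavior the typed option is meant to enforce.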

Figure 3-3. Run Create Shape Data File job.

  1. Change Job name to shape_data_CDK2

  2. Click Run

  • This job takes ~15 minutes

  • To save time, we will look at pregenerated results

4 Running GPU Shape Screen

Figure 4-1. Shape Screening in Ligand-Based Virtual Screening.

  1. Go to Tasks > Browse > Ligand-Based Virtual Screening > Shape Screening

  2. Select the header of the Representative Entries group

Figure 4-2. Load query file in Shape Screening panel.

  1. For Use shape query from, choose Project Table

  2. For Run screen on, choose GPU

Note: The GPU option for Run screen on is available only if a host with a GPU is specified in your hosts file

Figure 4-3. Set up Shape Screening job.

  1. For Screen structures in, choose Shape data file (local)

  2. Click Browse, locate shape_data_cdk2.bin and click Open

  3. Click Screening Settings

  4. For Maximum number of structures to save, type 1000

  5. Click OK

  6. Click the Job Settings (cog) button

  • The Job Settings panel opens

If you check Include PDF report, a PDF report of the top X matched ligands and their alignment to a single probe is generated. If you have selected multiple probes, only the first probe is used, and only its results are shown in the PDF.
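Setting Maximum number of structures to save to 1000 means that only the best-scoring hits are written out. One common way to keep the top k while streaming through a large deck is a size-k min-heap; the sketch below assumes hits arrive as (title, score) pairs, uses a hypothetical function name, and is not a description of how Shape GPU implements this internally.

```python
import heapq
from collections.abc import Iterable


def top_hits(scored: Iterable[tuple[str, float]], k: int = 1000) -> list[tuple[str, float]]:
    """Keep the k highest-scoring (title, score) pairs, returned best first."""
    heap: list[tuple[float, str]] = []  # min-heap on score; heap[0] is the weakest hit kept
    for title, score in scored:
        if len(heap) < k:
            heapq.heappush(heap, (score, title))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, title))  # drop the weakest, add the new hit
    return [(title, score) for score, title in sorted(heap, reverse=True)]


hits = [("lig1", 0.41), ("lig2", 0.77), ("lig3", 0.65), ("lig4", 0.12)]
print(top_hits(hits, k=2))  # [('lig2', 0.77), ('lig3', 0.65)]
```

Memory stays proportional to k rather than to the size of the screening deck.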

Figure 4-4. Run Shape Screening job.

  1. For Incorporate, select Append new entries as a new group

  2. Change Job name to cdk2-shape-screen

  3. Select your desired GPU host

  4. Click Run

  • The job will take a few minutes to complete

  • Results from the screen can also be found in cdk2-shape-screen_test-out.maegz

By default, one GPU is used. Jobs can be split across several GPUs from the command line. Job runtime varies with the graphics card used

5 Analyzing Shape Screening Results

To determine the impact of the Shape screen, we will look at several enrichment metrics. These metrics quantify how effectively a screening method distinguishes true actives from decoys.
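A widely used enrichment metric is the enrichment factor at a fraction x of the ranked database: EF(x) = (actives in the top x / compounds in the top x) / (total actives / total compounds). A minimal sketch with a hypothetical function name; it assumes that ligands the screen did not save are ranked below every saved hit, which is why the total number of decoys has to be supplied separately. The Enrichment Calculator's own definitions may differ in detail.

```python
def enrichment_factor(ranked: list[str], actives: set[str], n_decoys: int, fraction: float) -> float:
    """EF at a fraction of the full database (e.g. 0.01 for EF1%).

    ranked: ligand titles, best first (only the hits that were saved)
    actives: titles of all true actives in the database
    n_decoys: total number of decoys in the database, saved or not
    """
    ranked = list(dict.fromkeys(ranked))  # count each title once, at its best position
    n_total = len(actives) + n_decoys
    n_top = max(1, round(fraction * n_total))  # size of the top slice of the database
    # Ligands beyond the saved list are treated as unranked, so they never count as hits
    hits = sum(1 for title in ranked[:n_top] if title in actives)
    return (hits / n_top) / (len(actives) / n_total)


actives = {"a1", "a2"}
ranked = ["a1", "d1", "a2", "d2"]
print(enrichment_factor(ranked, actives, n_decoys=98, fraction=0.01))  # 1 active in top 1 -> 50.0
```

An EF of 1 corresponds to random selection; the maximum possible value is limited by how many actives fit into the top slice.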

5.1 Analyze enrichment for a single probe

Figure 5-1. Import Shape Screening results.

  1. Go to File > Import Structures

  2. Select cdk2-shape-screen-out.maegz (or cdk2-shape-screen_test-out.maegz) and click Open

  3. In the Entry List, select the top group labeled query 1

Figure 5-2. Enrichment Calculator in Receptor-Based Virtual Screening.

  1. Go to Tasks > Browse > Receptor-Based Virtual Screening > Enrichment Calculator

Figure 5-3. Set up Enrichment Calculator job.

  1. For Use structures from, select Project Table (1000 selected entries)

  2. For Actives file, click Browse, locate cdk2-true-active-titles.txt and click Open

  3. For Number of decoys, type 19757

Figure 5-4. Run Enrichment Calculator job.

  1. Change Job name to cdk2-enrichment-1aq1

  2. Click Run

  • The Enrichment Report populates with data within a few seconds

Figure 5-5. ROC Plot.

  1. Click ROC Plot
  • The ROC plot shows how well the screen recovers actives compared with random selection
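The area under the ROC curve equals the probability that a randomly chosen active is ranked above a randomly chosen decoy, so 0.5 corresponds to random selection and 1.0 to perfect separation. The sketch below computes it from a ranked list of titles, assuming that titles missing from the list (not saved by the screen) tie at the bottom and that ties count as half; the function name is hypothetical.

```python
def roc_auc(ranked: list[str], actives: set[str], n_decoys: int) -> float:
    """Chance that a random active outranks a random decoy (area under the ROC curve).

    Every non-active title in `ranked` is assumed to be one of the n_decoys decoys.
    """
    ranked = list(dict.fromkeys(ranked))  # keep each title's first (best) position
    decoys_seen = 0
    actives_seen = 0
    wins = 0.0
    for title in ranked:
        if title in actives:
            actives_seen += 1
            wins += n_decoys - decoys_seen  # every decoy not yet seen is ranked lower
        else:
            decoys_seen += 1
    # Unsaved actives tie with unsaved decoys at the bottom: each tie counts as half a win
    unsaved_actives = len(actives) - actives_seen
    unsaved_decoys = n_decoys - decoys_seen
    wins += 0.5 * unsaved_actives * unsaved_decoys
    return wins / (len(actives) * n_decoys)


print(roc_auc(["a1", "d1", "a2", "d2"], {"a1", "a2"}, n_decoys=2))  # 3 of 4 pairs correct -> 0.75
```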

5.2 Analyze enrichment across all probes

To illustrate the impact of additional probe molecules on performance, we will merge the results across all of the probe molecules and repeat the enrichment calculation.

Figure 5-6. Query groups selected in the Entry List.

  1. In the Entry List, select each of the query groups

Figure 5-7. Duplicate selected structures.

  1. Right-click on the selected groups and select Duplicate > Into New Group
  • The Duplicate into New Group panel opens

Figure 5-8. Duplicate into New Group panel.

  1. For New group title, type cdk2-screen-merge

  2. For Location of new group, choose At top level

  3. Click Duplicate

  • A new group has been added to the bottom of the Entry List

Figure 5-9. Ungroup the groups.

  1. Select the cdk2-screen-merge group

  2. Right-click the selected group and select Ungroup groups

Note: This is a shortcut to ungroup the groups for each of the individual probes so we can put them all together in a single group (without any subgroups) in the next step

Figure 5-10. Re-group the outputs.

  1. Right-click one of the selected compounds and select Group
  • The Group Selected Entries & Groups panel opens
  2. For New group title, type cdk2-screen-merge

  3. For Location of new group, choose At top level and First selected row

  4. Click Create Group

Figure 5-11. Add Shape Sim property.

  1. At the top right corner of the Entry List, click the Settings button (cog)

  2. Choose Show Property

  3. Click Choose

  4. Type Shape Sim

  5. Choose Shape Sim

  6. Click OK

Figure 5-12. Sort structures by Shape Similarity Score.

  1. Select the cdk2-screen-merge group in the Entry List

  2. Right-click the Shape Sim column header

  3. Select Sort Selected (Descending)

The grouping and ungrouping steps were needed to place the output from all of the probes in a single group so that it could be sorted by Shape Sim
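Sorting the merged group by Shape Sim and reading each ligand at its first (highest) appearance amounts to max-score fusion: each ligand is ranked by its best similarity to any probe. A minimal sketch, assuming per-probe results as dictionaries that map ligand title to Shape Sim; the function name is hypothetical, and whether the Enrichment Calculator counts only the first appearance of a repeated title is an assumption here.

```python
def merge_probe_results(per_probe: list[dict[str, float]]) -> list[tuple[str, float]]:
    """Combine hit lists from several probes, scoring each ligand by its best Shape Sim."""
    best: dict[str, float] = {}
    for hits in per_probe:
        for title, score in hits.items():
            if score > best.get(title, float("-inf")):
                best[title] = score
    # Highest score first, mirroring Sort Selected (Descending) on the Shape Sim column
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


probe1 = {"lig1": 0.62, "lig2": 0.48}
probe2 = {"lig2": 0.71, "lig3": 0.55}
print(merge_probe_results([probe1, probe2]))
# [('lig2', 0.71), ('lig1', 0.62), ('lig3', 0.55)]
```

A ligand that scores poorly against one probe can still rank highly if it closely matches another, which is one reason adding diverse probes tends to improve enrichment.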

Figure 5-13. Run Enrichment Calculator job.

  1. Open the Enrichment Calculator panel

  2. Change Job name to cdk2-enrichment-merge

  3. Click Run

  • The Enrichment Report populates with data within a few seconds

Note: There is a clear improvement across all enrichment metrics compared to using a single probe alone

Figure 5-14. ROC Plot.

  1. Click ROC Plot

Note: The ROC Plot improves considerably with the addition of the remaining probes

6 Conclusion

In this tutorial, we screened over 20,000 compounds using GPU Shape. Diverse probe molecules were selected by clustering the known actives and retrieving a representative structure from each cluster. We then used those probes to screen the compound library and evaluated enrichment first with a single probe and then with all 10 probes.

References

[1] Schrödinger, “Rapid Screening of Chemical Libraries with GPU Shape,” tutorial, Release 2023-4. https://www.schrodinger.com/sites/default/files/s3/release/2023-4/Documentation/html/tutorials/gpu_shape/gpu_shape.htm