FTMap: fast and free* druggable hotspot prediction

*free to academics

FTMap is a fast and useful online tool that mimics experimental fragment-screening methods (SAR by NMR and X-ray crystallography) in silico.  The algorithm is based on the premise that ligand binding sites in proteins often contain “hot spots” that contribute most of the free energy of binding.

Fragment screening often identifies these hot spots when clusters of chemically different fragments all bind to the same subsite of a larger binding site.  In fact, X-ray crystal structures of proteins solved in a variety of organic solvents show that small organic molecules often form clusters in active sites.

In the FTMap approach, small organic probes are first docked as rigid bodies against the entire protein surface.  The “FT” in FTMap refers to the fast Fourier transform (FFT) methods used to sample billions of probe positions quickly while calculating accurate energies with a robust energy expression.

After each probe is docked, thousands of poses are energy-minimized and clustered by proximity, and the clusters are ranked by energy.  Consensus sites (“hot spots”) are identified where clusters of different probe types overlap within a few angstroms of one another.  Several consensus sites appearing close together on the protein surface is a strong indication of a potentially druggable binding site.
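Purely to illustrate the consensus-site idea (this is not FTMap’s implementation, and the probe names, coordinates, and 4 Å cutoff below are invented), the grouping step might look something like this in R:

## Illustrative sketch only: given the lowest-energy cluster center for each probe
## type, group centers that lie within a few angstroms of one another and see how
## many distinct probe types fall in each group. All values are made up.
centers <- data.frame(
  probe = c("ethanol", "acetone", "benzene", "urea", "phenol", "acetamide"),
  x = c(12.1, 12.8, 13.0, 25.4, 12.5, 26.0),
  y = c( 4.2,  4.9,  3.8, 10.1,  4.5,  9.7),
  z = c(-1.0, -0.6, -1.4,  7.2, -0.9,  6.8)
)

# single-linkage clustering of the centers, cut at a 4 angstrom distance threshold
hc     <- hclust(dist(centers[, c("x", "y", "z")]), method = "single")
groups <- cutree(hc, h = 4)

# a "consensus site" is a group populated by several different probe types
split(centers$probe, groups)

Groups containing many different probe types would be the candidate hot spots.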

Virtual screening capability for under $5K?

Many early-stage companies may be missing out on the value that docking can provide at the validated-hit and hit-to-lead stages of development, where structure-activity relationships (SAR) can help guide the chemical development of lead compounds.

While docking large HTS libraries with millions of compounds may require specialized CPU clusters, docking of small libraries (i.e., thousands of compounds) and SAR compounds from experimental assays is readily achievable in short time frames with a relatively inexpensive Intel Xeon workstation.
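To give a flavor of what such a run looks like in practice, here is a rough sketch of a small-library Vina screen scripted from R (the same language I use for the ROC analysis further down).  The receptor and ligands are assumed to have already been converted to PDBQT format with AutoDock Tools; the file names, search-box values, and thread count are placeholders rather than settings from a real project.

## Sketch only: loop a directory of prepared ligand PDBQT files through Vina.
## Paths, box center/size, and CPU count are placeholders.
dir.create("docked", showWarnings = FALSE)
ligands <- list.files("ligands", pattern = "\\.pdbqt$", full.names = TRUE)

for (lig in ligands) {
  out <- file.path("docked", basename(lig))
  system2("vina", args = c(
    "--receptor", "receptor.pdbqt",
    "--ligand",   lig,
    "--center_x", "10.0", "--center_y", "12.5", "--center_z", "-3.0",
    "--size_x",   "20",   "--size_y",   "20",   "--size_z",   "20",
    "--cpu",      "12",   # the Z620's 6-core Xeon E5-2620 offers 12 threads
    "--out",      out
  ))
}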

Following the initial investment in the workstation and software, follow-on costs are minimal (e.g., electricity, IT support, and data backup). Turnaround times may be faster than with CRO services, and sensitive IP is protected because the data stays onsite and is never transmitted over the internet.

Equipment / cost breakdown:

Software:

AutoDock Vina (non-restrictive commercial license)        cost: free

    - Accurate (benchmarked against 6 other commercial docking programs)
    - Compatible with AutoDock Tools
    - Optimized for speed (orders of magnitude faster than the previous generation)
    - Parallelized code for multi-core systems

AutoDock Tools (non-restrictive commercial license)        cost: free

PyMOL Incentive (commercial license)        cost: ~$90/mo

    - Visualize docking results; a free plugin allows Vina to be run from within the PyMOL GUI

Fedora Linux        cost: free

Hardware:

HP Z620 Workstation (stock configuration)        cost: $2,999

    - Intel Xeon E5-2620 (6 cores, 2.0 GHz)

USB keyboard and mouse        cost: $50

Dell UltraSharp 27” LED monitor        cost: $649

1 TB USB HD for data backup        cost: $150

IT support for initial setup (~4 hours)        cost: $400

Total initial capital expenditure:        ~$4,350


Using R to automate ROC analysis

ROC analysis is used in many types of research.  I use it to examine the ability of molecular docking to enrich a list of poses for experimental hits.  This is a pretty standard way to compare the effectiveness of docking methodologies and make adjustments in computational parameters.

[Figure: an example ROC plot on a randomly generated dataset]

Normally this kind of plot would take at least an hour to make by hand in Excel, so I wrote a function in R that generates a publication-quality ROC plot on the fly.  This is handy if you want to play around with the hit threshold of the data (i.e., the binding affinity) or experiment with different scoring functions.

According to Wikipedia:

a receiver operating characteristic (ROC), or simply ROC curve, is a graphical plot which illustrates the performance of a binary classifier system as its discrimination threshold is varied. It is created by plotting the fraction of true positives out of the total actual positives (TPR = true positive rate) vs. the fraction of false positives out of the total actual negatives (FPR = false positive rate), at various threshold settings.
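Spelled out, if TP, FP, TN, and FN are the numbers of true positives, false positives, true negatives, and false negatives at a given score cutoff:

TPR = TP / (TP + FN)
FPR = FP / (FP + TN)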

There are already several ROC plot calculators on the web.  But I wanted to write my own using the R statistical language owing to its ability to produce very high-quality, clean graphics.  You can find the code here:

https://github.com/mchimenti/data-science-coursera/blob/master/roc_plot_gen.R

The function takes a simple two-column input in CSV format.  One column is “score,” the other is “hit” (1 or 0).  In the context of docking analysis, “score” is the docking score and “hit” records whether or not the molecule was an experimental binder.  The area under the curve (AUC) is calculated using the “trapz” function from the “pracma” (practical mathematics) package.
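For reference, a minimal sketch along the same lines looks like the following (this is not the code in the repo above; the column names follow the description here, and I am assuming that lower, i.e., more negative, docking scores are better, so the list is ranked ascending).

## Minimal ROC sketch: expects a CSV with columns "score" and "hit" (1/0).
library(pracma)   # provides trapz() for trapezoidal integration

roc_plot <- function(csv_file) {
  dat <- read.csv(csv_file)
  dat <- dat[order(dat$score), ]                  # best (most negative) scores first

  tpr <- cumsum(dat$hit) / sum(dat$hit)           # true positive rate at each cutoff
  fpr <- cumsum(1 - dat$hit) / sum(1 - dat$hit)   # false positive rate at each cutoff
  auc <- trapz(fpr, tpr)                          # area under the curve

  plot(fpr, tpr, type = "l", lwd = 2,
       xlab = "False positive rate", ylab = "True positive rate",
       main = sprintf("ROC curve (AUC = %.2f)", auc))
  abline(0, 1, lty = 2)                           # diagonal = random classifier
  invisible(auc)
}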

 

Improve your docked poses with receptor flexibility

I have noticed that rigid docking methods, even when run with high-precision force fields, don’t always capture the correct poses for your true positives.  Sometimes a hit will be docked somewhere other than the site you specified because the algorithm could not fit the molecule into the rigid receptor.  This causes true positives to be buried at the bottom of your ranked list.

You may want to try introducing receptor flexibility to improve the poses of your true positives.  There are two main ways to do this: scale down the van der Waals (vdW) interactions to mimic flexibility (i.e., make the receptor atoms “squishy”), or use induced-fit docking (IFD) methods.  I have found that while a lower vdW scaling factor can rescue false negatives (poorly docked true binders), at least in one case it did not improve the overall ranking of the true positives.  So it is not a panacea.
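Neither of these options is part of the free Vina setup described earlier, but Vina does offer a limited, related form of receptor flexibility: selected binding-site side chains can be treated as flexible during docking.  A sketch of what that call looks like (file names and box values are placeholders, and the rigid/flexible receptor files are assumed to have been prepared beforehand with AutoDock Tools):

## Sketch only: dock one compound against a receptor split into rigid and
## flexible parts. "receptor_rigid.pdbqt" and "receptor_flex.pdbqt" are assumed
## to have been generated separately (e.g., with AutoDock Tools) by pulling a
## few binding-site side chains out of the rigid receptor.
system2("vina", args = c(
  "--receptor", "receptor_rigid.pdbqt",
  "--flex",     "receptor_flex.pdbqt",    # side chains treated as flexible
  "--ligand",   "hit_compound.pdbqt",
  "--center_x", "10.0", "--center_y", "12.5", "--center_z", "-3.0",
  "--size_x",   "20",   "--size_y",   "20",   "--size_z",   "20",
  "--out",      "hit_compound_flex.pdbqt"
))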

Induced-fit methods work by mutating away several side chains in the binding pocket, docking a compound, mutating the side chains back, and energy-minimizing the structure.  The compound is then re-docked to the minimized structure using a high-precision algorithm.  There are two main applications for IFD: (1) improving the pose of a true positive that cannot be docked correctly by rigid docking, and (2) rescuing false negatives.

My experience has been that IFD improves the docking scores of true positives and false positives by about the same amount, so the value of running the method on an entire library remains unclear.  However, there is much value in running IFD on a true hit where you are not sure the rigid pose is optimal.  Often, the improvement in the shape complementarity and number of interactions will be dramatic.

You can also use the alternative receptor conformations generated by running IFD on a true positive to rescreen your library with faster rigid docking methods.  If you are screening prospectively, this approach could help you identify other chemotypes that may bind well but are missed in a first-pass rigid docking screen.