One of our more interesting GitHub projects is the Docking Validation project. We use this to establish and document best practices in virtual screening tools (such as docking) and approaches to semi-automating and scaling these procedures.
We just completed a new ‘experiment’ related to our work with the Diamond Light Source’s XChem project, which has done some amazing work on fragment-based screening using X-ray crystallography.
With fragment screening you frequently get a number of crystal structures for your target, each with a different ligand. XChem has made this process relatively routine and the challenge is now to follow up the fragment structures and turn them into drug leads. A key part of this is to screen potential analogues of those fragments using docking algorithms.
The new experiment looks into one aspect of this - how to select the protein target for docking. You have a handful of crystal structures for your protein target, each with a different fragment ligand. Once you remove the ligand, each structure is slightly different. You don’t want to use all of these for your docking as that is computationally expensive. So which one should you use? Or do you need more than one?
So we took an approach suggested by Thomas Exner, author of the PLANTS docking program. This is to dock each ligand into each protein structure and compare the docked pose with the actual pose in the crystal structure. You may find that some protein structures are better at docking the range of ligands than others. Ideally a single structure can dock the whole lot successfully, but you might be better off with two or three structures to get a good spread across the range of ligands.
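To make the selection step concrete, here is a minimal Python sketch of how the results of such a cross-docking run could be analysed. It is not the code used in the experiment. It assumes you already have, for every protein structure and ligand, the RMSD between the best docked pose and the crystal pose. The function names, the 2.0 Å success cutoff (a common convention, not something this experiment prescribes) and the numbers in the example table are all made up for illustration. The RMSD helper also assumes identical atom ordering in both poses and ignores molecular symmetry.

```python
import math


def rmsd(pose_a, pose_b):
    """RMSD between two poses given as lists of (x, y, z) tuples.

    Assumes both poses list the same atoms in the same order and
    does not correct for symmetry-equivalent atoms.
    """
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses must be non-empty and the same length")
    sq_sum = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(pose_a, pose_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(sq_sum / len(pose_a))


def select_receptors(rmsd_table, cutoff=2.0):
    """Greedily pick protein structures until every ligand is 'redocked'.

    rmsd_table[receptor][ligand] is the RMSD of the docked pose versus
    the crystal pose. A ligand counts as successfully docked when that
    RMSD is at or below `cutoff` (hypothetical default of 2.0).
    Returns (chosen receptors, ligands no receptor could reproduce).
    """
    uncovered = {lig for row in rmsd_table.values() for lig in row}
    chosen = []

    def gain(receptor):
        row = rmsd_table[receptor]
        return {lig for lig in uncovered if row.get(lig, math.inf) <= cutoff}

    while uncovered:
        # sorted() makes ties resolve alphabetically, so runs are repeatable
        best = max(sorted(rmsd_table), key=lambda r: len(gain(r)))
        gained = gain(best)
        if not gained:
            break  # the remaining ligands fail in every structure
        chosen.append(best)
        uncovered -= gained
    return chosen, uncovered


# Invented numbers, purely to show the shape of the input.
example = {
    "xtal_A": {"lig1": 0.5, "lig2": 1.0, "lig3": 3.5},
    "xtal_B": {"lig1": 2.5, "lig2": 0.8, "lig3": 1.2},
    "xtal_C": {"lig1": 4.0, "lig2": 3.0, "lig3": 0.9},
}
print(select_receptors(example))
```

With these invented numbers no single structure reproduces all three ligands, and the greedy pass settles on two structures. A greedy pass is not guaranteed to find the smallest possible set, but with a handful of structures it is easy to check the result by eye.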
So that’s what we set out to achieve and to partly automate. The data are for the NUDT7 bromo domain target studied at the SGC in Oxford, and were prepared by Anthony Bradley while he was working at Diamond. There were 5 crystal structures of this target, each with a different fragment ligand.
One key part of the process is to define the binding cavity of the protein. We are using rDock for the docking, partly because it has good tooling to support the docking process. Part of this is the ‘rbcavity’ program that can be used to define the binding site. One way to use it is with an existing ligand: the coordinates of that ligand’s atoms are used to define the binding cavity. But we have multiple crystal structures for our NUDT7 protein, each with a different ligand, and those fragment ligands can occupy different parts of the binding cavity. So no one ligand really does the job for us. We want the space occupied by all those ligands, but the rbcavity program can only take a single molecule as input.
The solution seemed to be to create a single hybrid molecule that contained all the ligands and use that to map out the binding cavity. This seemed a bit crazy, but a quick email exchange with Peter Schmidtke and Xavier Barril from the rDock team indicated that this was the way ahead, and Xavier confirmed that he had used this approach before and provided a Perl script that performs this. Peter suggested the name ‘Frankenstein molecule’ which seems to have stuck. The script combines the atoms from the multiple ligands into one molecule, skipping atoms that don’t contribute to the ‘outer’ surface of the ligands. No bonds are included as rbcavity does not need them. We have a cut and paste molecule with all the useful atoms, but no bonds. A true monster, but loveable as it does exactly what we need. That monster molecule is used for the cavity definition and seems to work well.
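To illustrate the idea, here is a minimal Python sketch of building such a bond-free hybrid molecule. It is not Xavier’s Perl script, and it uses a simpler rule than the one described above: instead of testing whether an atom contributes to the outer surface, it just skips any atom that sits within a chosen distance of an atom already kept. The function names, the 1.0 Å spacing and the toy coordinates are assumptions made for illustration. The output is a plain V2000 SD record with zero bonds, which is limited to 999 atoms.

```python
import math


def merge_ligands(ligands, min_sep=1.0):
    """Combine atoms from several ligands into one bond-free atom list.

    ligands: list of ligands, each a list of (element, (x, y, z)).
    An atom is skipped if it lies closer than `min_sep` to an atom that
    has already been kept. This is a redundancy filter, a stand-in for
    the 'outer surface' test described in the text.
    """
    kept = []
    for ligand in ligands:
        for element, xyz in ligand:
            if all(math.dist(xyz, other) >= min_sep for _, other in kept):
                kept.append((element, xyz))
    return kept


def to_sdf(atoms, name="frankenstein"):
    """Format the atoms as a V2000 SD record with no bond block."""
    if len(atoms) > 999:
        raise ValueError("V2000 counts line holds at most 999 atoms")
    lines = [
        name,
        "  merged ligand atoms",  # program/timestamp line, free text here
        "",                       # comment line
        f"{len(atoms):3d}{0:3d}  0  0  0  0  0  0  0  0999 V2000",
    ]
    for element, (x, y, z) in atoms:
        lines.append(
            f"{x:10.4f}{y:10.4f}{z:10.4f} {element:<3} 0" + "  0" * 11
        )
    lines += ["M  END", "$$$$"]
    return "\n".join(lines) + "\n"


# Toy coordinates, invented to show overlapping fragments being merged.
ligand_1 = [("C", (0.0, 0.0, 0.0)), ("N", (1.5, 0.0, 0.0))]
ligand_2 = [("C", (0.2, 0.0, 0.0)), ("O", (3.0, 0.0, 0.0))]
monster = merge_ligands([ligand_1, ligand_2])
print(to_sdf(monster))
```

In the toy input the second carbon almost coincides with the first, so it is dropped and the monster ends up with three atoms. The ligands need to be in a common coordinate frame (aligned protein structures) before merging, otherwise the combined atoms will not describe a single cavity.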
For more information on the experiment look at the details on GitHub.
We welcome comments and contributions on the Docking Validation project. As you might expect from us it’s freely accessible to all.