Fragment network intro

2020-07-15

We’ve previously mentioned the fragment network and our Fragnet Search application, which provides a user-friendly way to search and explore the data in the fragment network. But we’ve not really explained the basis of the fragment network or how it can be utilised in a drug discovery program. This is the first in a series of posts covering this topic.

The fragment network was conceived by Richard Hall at Astex Pharmaceuticals, and the methodology is nicely described in this paper. However, source code was not made available, and seeing the potential for the XChem fragment screening program at the Diamond Light Source, Anthony Bradley decided to re-implement the core algorithms using the RDKit cheminformatics toolkit. The code is available here and forms a key part of the Fragalysis application that is used to explore the Diamond fragment screening data.

That is where we at Informatics Matters step in: we manage the Fragalysis environment for Diamond and have taken over generating the fragment network data, so that we can both support Diamond’s needs and provide access to this technology to other organisations.

So what is the fragment network? One way of thinking about it is as an alternative approach to chemical similarity. When you have a series of fragment hits from a fragment based drug design (FBDD) program, the first thing you are likely to want to do is find a set of close analogues of those hits and test them. But how would you go about doing this? Traditional fingerprint based similarity search might be an obvious approach, but it does not work well for small molecules like fragments, primarily because too few bits in the fingerprint are set, so the signal to noise ratio is poor. As you drop the similarity threshold you rapidly get into the noise and return compounds that a chemist really wouldn’t consider to be similar.
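
To make the "too few bits" problem concrete, here is a minimal sketch in plain Python. It does not use real fingerprints: the bit sets are made up for illustration, and the bit counts (8 for a fragment, 40 for a larger molecule) are assumptions chosen only to show the trend. The same small structural change, modelled as two bits switched off and two new bits switched on, costs the fragment far more similarity than it costs the larger molecule.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = bits_a | bits_b
    if not union:
        return 0.0
    return len(bits_a & bits_b) / len(union)


def swap_bits(bits, n_changed, offset=10_000):
    """Copy of `bits` with n_changed bits replaced by new, previously unset bits."""
    kept = sorted(bits)[n_changed:]
    new = range(offset, offset + n_changed)
    return set(kept) | set(new)


# Illustrative bit sets only, not derived from real molecules.
fragment = set(range(8))     # small molecule: few bits set
lead_like = set(range(40))   # larger molecule: many bits set

# Apply the same two-bit change to each and compare with the original.
fragment_sim = tanimoto(fragment, swap_bits(fragment, 2))
lead_like_sim = tanimoto(lead_like, swap_bits(lead_like, 2))

print(f"fragment vs analogue:  {fragment_sim:.2f}")   # 0.60
print(f"lead-like vs analogue: {lead_like_sim:.2f}")  # 0.90
```

With so few bits set, a close fragment analogue can easily fall below a typical similarity cut-off, and lowering the cut-off to recover it lets unrelated compounds through as well.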

Instead you might use substructure search, which is a valid way of finding analogues and does work reasonably well, but it requires a lot of skill and is very time consuming if you have more than a handful of hits to investigate. It also does not lend itself to automation.

This is where the fragment network comes to the rescue. It allows you to identify a large number of analogues of a collection of hits very easily and very quickly, and a chemist generally considers those analogues to be “reasonable”. This is what we use at Diamond for the fragment screening follow up. For instance, in the current screen of the SARS-CoV-2 main protease (see here for details) there are currently 23 non-covalent fragment hits in the active site region of the protease. Each of those 23 is a potential starting point for a drug discovery program, and we want to run a virtual screening campaign to investigate a wide range of analogues of those 23 fragment hits. Using the fragment network this becomes remarkably simple. Just a single REST API call expands those 23 hits into a large number of candidates for virtual screening. With the current expansion parameters in use we generate approximately 95,000 candidates in a matter of seconds, with no specialised knowledge or decision making needed. For all of those molecules there is a clear rationale for how the candidate is derived from the fragment hit. Those candidates come from compound sets from MolPort, ChemSpace and Enamine and so should all be purchasable “off the shelf” or synthesisable with well established synthetic routes.
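
As a rough idea of what "a single REST API call" could look like from a script, here is a sketch that builds (but does not send) one request carrying all the hits. Everything specific in it is an assumption made for illustration: the URL is a placeholder, and the `smiles` and `hops` field names are hypothetical rather than the documented Fragnet Search API.

```python
import json
from urllib.request import Request

# Placeholder URL: NOT the real Fragnet Search endpoint.
EXPAND_URL = "https://example.org/fragnet/expand"


def build_expansion_request(hit_smiles, hops=2):
    """Build one POST request asking a hypothetical service to expand all hits.

    The JSON field names ("smiles", "hops") are invented for this sketch.
    """
    payload = {"smiles": list(hit_smiles), "hops": hops}
    return Request(
        EXPAND_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Two made-up hits standing in for the 23 from the screen.
hits = ["c1ccccc1O", "CC(=O)Nc1ccccc1"]
request = build_expansion_request(hits, hops=1)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Sending the request (for example with `urllib.request.urlopen`) and parsing the response are left out, since the response format is not described here.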

In future posts in this series I will explain more about how the fragment network works and how it is generated, as well as describing the virtual screening workflows in more detail. An early version of the virtual screening workflow is described here.

If you are interested in getting access to the fragment network for use at your own organisation then please get in touch.