The Informatics Matters Blog

2024-06-17

POST

Kubernetes provides containers with lifecycle hooks that allow them to execute code when they start or before they stop. In this post we’ll detail the use of the PreStop hook to ensure that containers shut down gracefully.

More

2024-02-01

Squonk

POST

In my last post I talked about how you can use Squonk jobs to calculate properties of molecules. Another useful thing Squonk can do is molecule depiction, including the ability to display atomic properties. The basic flow is to run a job that does the calculations and then run the Depict molecules job to generate graphics of those molecules.

More

2024-01-29

Squonk

POST

We’ve not blogged about our Squonk2 products much up to now. This is the first in a series of posts that addresses this. We start by describing Jobs, a key part of Squonk Data Manager. A job is something you can execute: something that does some work. A job can be simple, such as a Python or Java script that processes some molecules and calculates or predicts some molecular properties, or it could be a complete virtual screening workflow that runs in parallel on the cluster and could take days to complete.

More

2023-08-04

Kubernetes

POST

Have you ever wanted to move a Kubernetes volume (Persistent Volume Claim) to another storage class or back it up? We do it now and again so we created a generally useful container image that can be attached to a pair of volumes to then automatically copy (using rsync) from one to the other.

Have a look at our docker-volume-replicator repository and accompanying public Docker image.

More

2023-07-31

Squonk

POST

This is a brief post to describe the launch of our new Squonk2 products and the phasing out of the Squonk Computational Notebook. The Squonk2 products were launched at a webinar on 19th July and are available for evaluation.

More

2023-05-15

Kubernetes

POST

We recently encountered a Kubernetes cluster that had experienced catastrophic etcd failure. We had 3 nodes running etcd that had suddenly been reduced to one without quorum. Repairing the situation required action on a number of fronts.

We had no viable backups, and had to rely on the db file that was left on the remaining etcd node.

More

2023-04-03

Automation IaC

POST

If you’ve read our earlier post you’ll understand how to create an AWS ParallelCluster and adapt the head and compute nodes with custom scripts that you place in an AWS S3 bucket.

Here we discuss using the AWS ParallelCluster ImageBuilder utility to create a custom AMI that you can use to create your cluster, useful if you find you’re installing a lot of custom packages that slow down the formation of new compute nodes.

Rather than installing packages and tools every time a cluster node is created (costing valuable minutes), ImageBuilder lets you construct a customised AMI.

More

2023-03-13

Automation IaC

POST

Here we’re using AWS ParallelCluster, an AWS-supported open source cluster management tool that helps you to deploy and manage high performance computing clusters in the AWS Cloud. For background, refer to the AWS introduction.

In our earlier post we demonstrated the use of AWS ParallelCluster (v2) for Nextflow. Here we briefly discuss switching from ParallelCluster v2 to v3.

More

2023-02-27

Automation Kubernetes

POST

Try using popeye, a Kubernetes Cluster Sanitizer, to help lint objects deployed to Kubernetes to detect misconfigurations, and give you some feedback on compliance with community best practices.

More

2021-06-01

Python

POST

In this article we’ll cover lessons learned implementing a Keycloak authentication solution into Django Rest Framework (DRF) using the mozilla-django-oidc library. Note that this article assumes some familiarity with Django.

Requirements

These can be summarized as follows:

Replace the current authentication solution with a Keycloak-based solution so that users can be authenticated and enable single sign-on between applications using different authentication providers.
The solution should cover both Django (session authentication) and DRF (token authentication).
It should be able to handle a dedicated Keycloak client and be expandable to allow role-based authorisation.
OpenID Connect preferred
Any libraries used should be currently supported and widely used

More

2021-05-28

Docking

POST

Docking is often used in virtual screening to attempt to identify potential drug leads. To be effective you typically need to screen a large number of candidate molecules, and that means you need to parallelise the process across multiple servers. This post describes how this can easily be done using AWS ParallelCluster and Nextflow.

More

2021-03-11

Containers

POST

After a long break I complete a series of posts on building smaller container images, this time showing how to build lightweight images for RDKit using buildah. In the previous post I introduced buildah. In short, it showed how to package what you want into a container image without needing to include the build infrastructure (make etc.) or package managers (yum etc.).

More

2021-02-08

Automation Containers

POST

Actions are a welcome addition to the GitHub service. They add a free, built-in CI/CD capability similar to GitLab-CI, Travis and others.

What we’ll see in this blog post is a simple pattern to build a container image and push it to Docker Hub. What happens is based on whether we’re on a branch, responding to a pull-request, or responding to changes on main.

More

2020-08-31

Fragment network

POST

In my last post I described the basic details of how the fragment network is composed and how it allows queries for similar molecules that are “chemically intuitive”. Here I show how queries can be executed through the REST API and how this has turned out to be extremely useful in the virtual screening work we are doing on the SARS-CoV-2 main protease in collaboration with the Diamond Light Source.

More

2020-08-06

Ansible Automation Kubernetes

POST

Here we’re going to explore repetitive project content and one method you can employ to automate its generation.

After creating a few Ansible-based Kubernetes projects the boilerplate begins to emerge on two fronts - a number of mandatory Ansible files and the Kubernetes object definitions. What’s most frustrating is that, for the most part, Kubernetes objects are often detailed (verbose) yet irritatingly repetitive and predictable.

More

2020-07-26

Containers Kubernetes

POST

In this article we’ll see how to deploy container images from a GitLab private registry into Kubernetes.

Public container images, in registries like Docker Hub, can be deployed easily without needing to provide any credentials. Kubernetes Deployments (and other objects like StatefulSets) simply need the image, e.g. informaticsmatters/neo4j:3.5.20. However, images resident on a private registry will require you to deploy an ImagePullSecret that Kubernetes uses to pull the image.

The Kubernetes documentation describes such secrets, with a section explaining how they can be created from the command-line.

Here we provide a brief cheat-sheet that explains how to create a pull-secret using GitLab and then use that in a Deployment.

More

2020-07-21

Fragment network

POST

In my last post I described how the fragment network can be used as a key part of a virtual screening project by allowing your initial hits to be expanded out to a large number of candidates to screen. In this post we describe how the fragment network works and why it is more ‘chemically intuitive’ than traditional fingerprint-based similarity searching.

More

2020-07-15

Fragment network

POST

We’ve previously mentioned the fragment network and our Fragnet Search application that provides a user-friendly way to search and explore the data in the fragment network. But we’ve not really explained the basis of the fragment network and how it can be utilised in a drug discovery program. This is the first in a series of posts that covers this topic.

More

2020-06-03

Kubernetes Web

POST

In this brief article we’ll see how to set up a Kubernetes nginx ingress to redirect HTTP traffic from example.com to www.example.com.

Prerequisites here are a cluster with an nginx ingress controller and a route to the cluster. This relies on your domain routing example.com and www.example.com to your cluster, usually through some form of load-balancer. We’re not going to cover these aspects of the solution, just the ingress definition you need.

More

2020-05-11

Automation Kubernetes

POST

In this article we’ll see how simple it is to install Kubernetes onto some Ubuntu hosts using Pharos.

Pharos is a Certified Kubernetes distribution with all batteries included. It is powered by the latest upstream version of the Kubernetes kernel and includes tools for cluster lifecycle management.

More

2020-05-06

Docking Fragment network

POST

We’ve been very busy recently helping out on the fragment screening program at the Diamond Light Source. They generated fragment hits of the SARS-CoV-2 main protease back in early March. See here for details.

With that data we’ve been running virtual screening using compounds expanded out from the fragment screening hits using the API of Fragnet Search. Details of the initial virtual screening workflow that was executed on usegalaxy.eu are described here. Along with Simon Bray from the Freiburg Galaxy team, I will be presenting this in session 3 of the Galaxy-ELIXIR webinar series on COVID-19 on 14 May 2020, 17:00 CEST. Come along and hear the details. You will need to register first.

We’ll be describing various aspects of the work in more detail in this blog at a later stage.

More

2019-12-16

Containers

POST

With CentOS 8 available, we now have a better approach for building CentOS-based RDKit Docker images.

We previously described our approach to building container images for RDKit (see here for example). Key to our approach is keeping the image size relatively small.

Until now, building CentOS-based images has been tricky. Since the March 2019 release of RDKit, a very recent version of the Boost libraries has been required (due to the switch to modern C++), but the ones that come with CentOS 7 are too old. RDKit also now requires Python 3, which traditionally has not played nicely with CentOS 7. We did find a workaround for all this, but it was not nice.

Now that CentOS 8 is available this problem is solved and we’ve been able to simplify the process. The approach closely follows the one we use for Debian-based images. We’ve pushed Python 3 and Java images for the October 2019 RDKit releases and a build from the master branch, and from now on will do our best to generate these as well as the Debian-based images for future releases. We might also provide CentOS-based cartridge and Tomcat images, but generating those is more complex, so stick with Debian-based images for those for now.

See the GitHub repo for all the details (and the various branches you’ll find there).

More

2019-12-15

Fragment network

POST

We’ve recently demonstrated our Fragnet Search application in a SaferWorldbyDesign webinar.

In it we demonstrate using the web application and programmatically accessing the REST API behind it.

Take a look at the recording on YouTube.

More

2019-09-10

Fragment network

POST

We’ve been working with the fragment network as part of our work with the fragment screening program at the Diamond Light Source for a couple of years now. The fragment network was conceived by Astex and provides a chemically intuitive way to identify similar compounds without the limitations of traditional fingerprint-based similarity searching.

More

2018-11-23

Docking

POST

One of our more interesting GitHub projects is the Docking Validation project. We use this to establish and document best practices in virtual screening tools (such as docking) and approaches to semi-automating and scaling these procedures.

We just completed a new ‘experiment’ that is related to our work at the Diamond Light Source’s XChem project, which has done some amazing work on fragment-based screening using X-ray crystallography.

More

2018-07-10

Automation IaC

POST

You’ve probably created a machine image at some point: a base image for AWS that builds upon someone’s work by adding a particular version of Java or Python or a new utility. Did you create the image on AWS using an EC2 instance, log in, run some yum or apt-get and then save it? Great, but what if someone wants the source code for that image, or you want to build a similar image in a different region or on a different provider? Well, Packer is an IaC tool for automating the construction of machine images.

More

2018-06-18

Automation

POST

The Python Jenkins module is a convenient wrapper for the Jenkins REST API that gives you control of a Jenkins server in a pythonic way. Here we’ll see how to grab all the jobs from a Jenkins server and also how these jobs can be re-created from the captured material.
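As a rough illustration of the capture-and-recreate idea, here is a minimal sketch. It assumes the python-jenkins package, whose `Jenkins` class offers `get_jobs()`, `get_job_config()` and `create_job()`. The helper names `capture_jobs` and `recreate_jobs`, and the URLs and credentials in the usage comment, are hypothetical and are not taken from the post itself.

```python
from typing import Dict, List, Protocol


class JenkinsLike(Protocol):
    """The subset of the python-jenkins ``Jenkins`` API that this sketch relies on."""

    def get_jobs(self) -> List[dict]: ...

    def get_job_config(self, name: str) -> str: ...

    def create_job(self, name: str, config_xml: str) -> None: ...


def capture_jobs(server: JenkinsLike) -> Dict[str, str]:
    """Return a mapping of job name to that job's config.xml text."""
    configs: Dict[str, str] = {}
    for job in server.get_jobs():
        # Each job entry is a dict that includes the job's 'name'.
        name = job["name"]
        configs[name] = server.get_job_config(name)
    return configs


def recreate_jobs(server: JenkinsLike, configs: Dict[str, str]) -> None:
    """Create one job on the server for every captured configuration."""
    for name, config_xml in configs.items():
        server.create_job(name, config_xml)


# Hypothetical usage against real servers (placeholder URLs and credentials):
#
#   import jenkins  # pip install python-jenkins
#   source = jenkins.Jenkins("http://old.example.com:8080", username="admin", password="...")
#   target = jenkins.Jenkins("http://new.example.com:8080", username="admin", password="...")
#   recreate_jobs(target, capture_jobs(source))
```

The helpers take the server as an argument so they can be exercised with a stand-in object before being pointed at a real Jenkins instance.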

More

2018-05-31

Containers

POST

In this post we look at using buildah to generate container images that only contain what we want, no extra fluff. We show how this can let us generate truly small images that will load faster and be more secure, and do this without the need for the Docker daemon to be running.

More

2018-04-30

Automation IaC

POST

Here we’re going to be looking at the idea of applying automation tools to the wider product development process. Tools that help you do this are part of a collection known as “Infrastructure as Code”, which refers to the provisioning of compute instances (physical machines and their operating systems) and software applications using revision-controlled machine-readable text files.

More

2018-04-15

Containers

POST

This series of posts describes how we can generate smaller Docker images. In the first post we outlined a common problem with container images - that they frequently contain artefacts that were needed to build the software or to install it into the container. We’ll show one approach that can be used to avoid this extra bloat, and so generate smaller and more secure containers.

More

2018-04-12

Software design

POST

I’ve been in meetings, often driven by the root-cause analysis of a software fault found in the field, where the topic of code coverage has cropped up. I’m sure many of us have been in similar meetings. On occasion I’ve also been asked to justify some of my apparently poor line coverage figures, where the percentage has fallen short of what was perceived by the inquisitor as acceptable.

More

2018-04-04

Containers

POST

This is the first in a series of blog posts about building better Docker images.

Docker Inc is widely acknowledged for transitioning containers from geekdom to the real world inhabited by us developers, and did this by providing easy-to-use tools for building, sharing and running containers. Key to this are the docker build command and the Dockerfile.

More

2018-04-03

Blog

POST

Welcome to the Informatics Matters blog.

This is the first post of what will become a regular stream of information about our activities at Informatics Matters in providing solutions for scientific computing, including bioinformatics, genomics, cheminformatics and computational chemistry.

More

Kubernetes PreStop Lifecycle Hooks

Molecule depiction in Squonk

Squonk job execution

A kubernetes volume replicator

Squonk2 launch

Fixing a broken etcd cluster

AWS ParallelCluster v3 Custom Images

Migrating to AWS ParallelCluster v3

Kubernetes object linting with popeye

Installing Keycloak on Django Rest Framework

Virtual screening with Parallel Cluster and Nextflow

Smaller Containers - Part 4

GitHub Actions for container images

Fragment Network REST API

Cookie-cutting Ansible Kubernetes Projects

Deploying container images from a private GitLab registry

Fragment network basics

Fragment network intro

Redirecting to www with an nginx ingress

Installing Kubernetes with Pharos

Virtual screening for SARS-Cov-2 main protease inhibitors

RDKit Docker Images for Centos8

Fragnet search webinar

Fragment network webinar

Cavities and Frankenstein molecules

Building machine images with Packer

Python and the Jenkins API

Smaller Containers - Part 3

Applying the build process to the deployment

Smaller Containers - Part 2

What is a Good Test Coverage Target?

Smaller Containers - Part 1

Blog Introduction