If you’ve read our earlier post you’ll understand how to create an AWS ParallelCluster and adapt the head and compute nodes with custom scripts that you place in an AWS S3 bucket.
Here we discuss using the AWS ParallelCluster ImageBuilder utility to create a custom AMI for your cluster, which is useful if you find you’re installing a lot of custom packages that slow down the creation of new compute nodes.
Rather than installing packages and tools every time a cluster node is created (costing valuable minutes), the ImageBuilder lets you construct a customised AMI that already contains them.
The essence of the custom image is to put your package installation steps into a shell script and store it in an Amazon S3 bucket, which you then refer to in a YAML-based ImageBuilder configuration file.
Your IAM user will need the image build pcluster user policy described in the iam-roles part of the AWS ParallelCluster documentation.
Find a suitable Amazon Linux image to base your custom image on. You can use pcluster list-official-images to find some.
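As a rough sketch, the listing can be narrowed to one operating system and architecture. The function name is our own, and the alinux2 and x86_64 values and the example region are assumptions; substitute whatever matches your cluster.

```shell
# List the official ParallelCluster AMIs for Amazon Linux 2 in one region.
# $1 = AWS region. The os and architecture values are illustrative assumptions.
list_amazon_linux_images() {
  pcluster list-official-images --region "$1" --os alinux2 --architecture x86_64
}

# Example (eu-central-1 is a placeholder region):
#   list_amazon_linux_images eu-central-1
```

The AMI ID of the image you pick becomes the parent image for the build.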
If you don’t have a default VPC you will need to provide a Subnet (and a SecurityGroup).
With this information you can create a simple YAML file. Here we’ve chosen to use a c6a.4xlarge instance for the image build, and our custom script (which you can also use) has been placed on S3 at s3://im-aws-parallel-cluster/imagebuilder-amazon.sh. Use whatever instance and script is suitable for your cluster.
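A minimal sketch of such a file, written here with a shell heredoc, might look like the following. The instance type and script location are the ones mentioned above; the AMI, subnet and security group IDs are placeholders that you must replace with your own, and the file name is the one we use for the build.

```shell
# Write a minimal image build configuration to imagebuilder-nextflow.yaml.
# ParentImage, SubnetId and SecurityGroupIds below are placeholder values.
cat > imagebuilder-nextflow.yaml <<'EOF'
Build:
  InstanceType: c6a.4xlarge
  ParentImage: ami-0123456789abcdef0
  # SubnetId and SecurityGroupIds are only needed if you have no default VPC
  SubnetId: subnet-0123456789abcdef0
  SecurityGroupIds:
    - sg-0123456789abcdef0
  Components:
    - Type: script
      Value: s3://im-aws-parallel-cluster/imagebuilder-amazon.sh
EOF
```

If you do have a default VPC, the SubnetId and SecurityGroupIds entries can be dropped.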
Now we just have to build the custom image using the pcluster build-image command. We need to provide an identity for the image and a Region. Our YAML file is called imagebuilder-nextflow.yaml.
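The build can be started along these lines. The function wrapper is just a convenience of ours, and the image identity and region in the example are hypothetical values, not ones you need to use.

```shell
# Start an image build.
# $1 = identity for the new image, $2 = AWS region, $3 = YAML configuration file
build_custom_image() {
  pcluster build-image --image-id "$1" --region "$2" --image-configuration "$3"
}

# Example (nextflow-image and eu-central-1 are placeholder values):
#   build_custom_image nextflow-image eu-central-1 imagebuilder-nextflow.yaml
```

The command returns once the build has been submitted, well before the image itself is ready.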
Building an image can take a substantial length of time (an hour or so), but you can track the build status with the pcluster describe-image command.
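A sketch of the status check, using the same hypothetical image identity and region as placeholders:

```shell
# Print the JSON description of an image build, including imageBuildStatus.
# $1 = identity given to the image at build time, $2 = AWS region
image_build_status() {
  pcluster describe-image --image-id "$1" --region "$2"
}

# Example (nextflow-image and eu-central-1 are placeholder values):
#   image_build_status nextflow-image eu-central-1
```

Re-run it every few minutes until the status changes.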
When the imageBuildStatus reported by that command is BUILD_COMPLETE, you will find the AMI ID under ec2AmiInfo -> amiId. Place it in the Image block of your cluster configuration and remove the corresponding CustomActions, which are no longer required.
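As an illustration, the small helper below prints an Image block for a given AMI ID, ready to paste into your cluster configuration. The helper name is our own, the Os value is an assumption (it must match the operating system of the parent image), and the AMI ID in the example is a placeholder.

```shell
# Print an Image block for a cluster configuration.
# $1 = AMI ID taken from ec2AmiInfo -> amiId. Os: alinux2 is an assumption.
print_image_block() {
  cat <<EOF
Image:
  Os: alinux2
  CustomAmi: $1
EOF
}

# Example (placeholder AMI ID):
#   print_image_block ami-0123456789abcdef0
```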
You can see a fuller discussion of ParallelCluster and the ImageBuilder in our nextflow-pcluster repository.