If you’ve read our earlier post you’ll understand how to create an AWS ParallelCluster and adapt the head and compute nodes with custom scripts that you place in an AWS S3 bucket.

Here we discuss using the AWS ParallelCluster ImageBuilder utility to create a custom AMI for your cluster. This is useful if you find you’re installing a lot of custom packages that slow down the creation of new compute nodes.

Rather than installing packages and tools every time a cluster node is created (costing valuable minutes) the ImageBuilder lets you construct a customised AMI.

The essence of the custom image is to put your package configuration into a shell-script and store this in an Amazon S3 bucket, which you refer to in an ImageBuilder YAML-based configuration file.

Your IAM user will need the image build pcluster user policy described in the iam-roles part of the AWS ParallelCluster documentation.

Find a suitable Amazon Linux image to base your custom image on. You can use pcluster list-official-images to find some.
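As a minimal sketch of that lookup, the helper below wraps pcluster list-official-images with a region, OS and architecture filter. The function name list_official_amis is our own invention for illustration, and the example values should be swapped for whatever matches your cluster.

```shell
# Hypothetical helper: list the official ParallelCluster AMIs
# for one region, OS and architecture.
list_official_amis() {
    local region="$1" os="$2" arch="$3"
    pcluster list-official-images \
        --region "$region" \
        --os "$os" \
        --architecture "$arch"
}

# Example call (values are illustrative):
#   list_official_amis eu-central-1 alinux2 x86_64
```

The command prints JSON, and the amiId of the entry you pick is the value to use as the ParentImage.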

If you don’t have a default VPC you will need to provide a Subnet (and a SecurityGroup).

With this information you can create a simple YAML file. Here we’ve chosen to use a c6a.4xlarge instance for the image build, and our custom script (which you can also use) has been placed on S3 at s3://im-aws-parallel-cluster/. Use whatever instance type and script are suitable for your cluster: -

Build:
  InstanceType: c6a.4xlarge
  ParentImage: ami-00000000000000000
  SubnetId: subnet-00000000000000000
  SecurityGroupIds:
  - sg-00000000000000000
  # Components to add to the image.
  # Here we're running our custom script (on S3)
  # that installs nextflow and apptainer (singularity)
  Components:
  - Type: script
    Value: s3://im-aws-parallel-cluster/
  # Allow the builder to access S3
  Iam:
    AdditionalIamPolicies:
    - Policy: arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
  # Other stuff...
  UpdateOsPackages:
    Enabled: true

Now we just have to build the custom image using the pcluster build-image command. We need to provide an ID for the image and a Region.

In the following example our YAML file is called imagebuilder-nextflow.yaml: -

$ pcluster build-image \
    --image-configuration imagebuilder-nextflow.yaml \
    --image-id nextflow \
    --region eu-central-1
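Because a build takes so long, it can be worth checking the configuration before starting one. The sketch below assumes your version of pcluster supports the --dryrun option on build-image, which asks for validation only. The function name validate_image_config is hypothetical.

```shell
# Hypothetical helper: validate an image configuration without
# starting a build (assumes build-image accepts --dryrun).
validate_image_config() {
    local config="$1" image_id="$2" region="$3"
    pcluster build-image \
        --image-configuration "$config" \
        --image-id "$image_id" \
        --region "$region" \
        --dryrun true
}

# Example call (values are illustrative):
#   validate_image_config imagebuilder-nextflow.yaml nextflow eu-central-1
```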

Building an image can take a substantial length of time (an hour or so), but you can track the build status using the following command: -

$ pcluster describe-image --image-id nextflow --region eu-central-1
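Rather than re-running that command by hand, you could poll it. The following is a rough sketch that assumes describe-image prints JSON containing an imageBuildStatus field and that the status stays at BUILD_IN_PROGRESS while the build runs. The function names and the 60-second default interval are our own choices.

```shell
# Extract the imageBuildStatus value from describe-image JSON on stdin.
image_build_status() {
    sed -n 's/.*"imageBuildStatus"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Poll until the status is no longer BUILD_IN_PROGRESS,
# then print whatever status was last seen.
wait_for_image() {
    local image_id="$1" region="$2" interval="${3:-60}" status
    while true; do
        status=$(pcluster describe-image \
            --image-id "$image_id" \
            --region "$region" | image_build_status)
        if [ "$status" != "BUILD_IN_PROGRESS" ]; then
            echo "$status"
            return
        fi
        sleep "$interval"
    done
}

# Example call (values are illustrative):
#   wait_for_image nextflow eu-central-1 300
```

The loop stops on any status other than BUILD_IN_PROGRESS, so a failed build or an unreadable response ends the wait instead of spinning forever.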

When the imageBuildStatus from the above command is BUILD_COMPLETE you should also find the image AMI under ec2AmiInfo -> amiId. Place this AMI in the Image block of your cluster configuration and remove the corresponding CustomActions, which are no longer required: -

Image:
  Os: alinux2
  CustomAmi: ami-00000000000000000
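To avoid copying the AMI ID by hand, the value can be pulled out of the describe-image output. This is a small sketch that assumes the JSON contains an amiId field under ec2AmiInfo, as described above. The function name image_ami_id is hypothetical.

```shell
# Print the first amiId found in describe-image JSON read on stdin.
image_ami_id() {
    sed -n 's/.*"amiId"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Example call (values are illustrative):
#   pcluster describe-image --image-id nextflow --region eu-central-1 | image_ami_id
```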

You can see a fuller discussion of ParallelCluster and the ImageBuilder in our nextflow-pcluster repository.
