AMI Management using Lambda Functions


If you are building services to automatically scale on AWS EC2, or trying to build AMIs to sell on the marketplace, odds are that you’re going to end up generating a lot of AMIs (Amazon Machine Images). You could go in and delete them manually, but that is a hassle to maintain, and if you leave, it stops happening.

This was the case I was called in on. We had around 5,000 AMIs registered on our account, though most could be deleted after a week. Apparently someone had been deleting them manually on a weekly basis but had since left the company. I wasn’t about to keep this manual. I looked around at scripts other people had written to solve this, and they seemed to overcomplicate things, so I will outline my requirements below. If they fit your needs, this script might be useful for you.

  1. Has to be automated.
  2. Preferably runs in the cloud rather than on a local machine calling AWS.
  3. Should be time based, so that anything over x number of days is deleted.
  4. There should be a way to prevent images from being deleted.

The solution I came up with was a Lambda function, which runs in AWS automatically behind the scenes. If you are familiar with Lambda functions, skip to the next paragraph. Otherwise: Lambda is Amazon’s entry into the serverless field, just like Azure Functions or Google Cloud Functions. What makes Lambda “serverless” is that you don’t worry about the underlying infrastructure. You don’t run EC2 instances; your code just runs somewhere when needed. Pretty cool, huh?

So let’s break down this solution. You’re going to need to create a new role which is able to describe images, deregister images, and delete snapshots, in addition to the default Lambda logging actions. The policy attached to your role should look something like this:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*"
    },
    {
      "Action": [
        "ec2:DeleteSnapshot",
        "ec2:DeregisterImage",
        "ec2:DescribeImages"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}
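
Besides the permissions policy above, a role needs a trust relationship that lets the Lambda service assume it. The console normally adds this for you when you pick Lambda as the trusted service, so this only matters if you create the role some other way. A minimal sketch of that trust document, built in Python purely for illustration (the names LAMBDA_TRUST_POLICY and trust_policy_json are mine, not from the original script):

```python
import json

# Trust policy that allows the Lambda service to assume the role.
# This is separate from the permissions policy shown above.
LAMBDA_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}


def trust_policy_json():
    """Return the trust policy as a JSON string ready to paste or upload."""
    return json.dumps(LAMBDA_TRUST_POLICY, indent=2)


if __name__ == "__main__":
    print(trust_policy_json())
```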

Once you have the role in place, let’s set up the Lambda function. If you’re creating this in the UI, you’ll want to start off with the Canary Lambda function. Name it whatever you want, and select the role created earlier from the Existing Role dropdown. If you’ve already created a CloudWatch rule for this, choose it from the cloudwatch-events Rule dropdown; otherwise select Create a new rule and enter a Rule name and a Schedule expression. As the hint suggests, this supports both rate and cron formats, like rate(1 day) or cron(30 2 * * ? *). Note that AWS cron expressions take six fields (minutes, hours, day of month, month, day of week, year), and one of the two day fields must be a ?.
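
If you would rather script the trigger than click through the UI, the same wiring can be sketched with boto3. This is an illustration under assumptions, not part of the original script: the function name schedule_function, the rule name, and the target Id are made up, and the two clients are expected to be boto3.client("events") and boto3.client("lambda").

```python
def schedule_function(events, lambda_client, function_arn,
                      rule_name="ami-prune-daily",  # hypothetical rule name
                      schedule="rate(1 day)"):
    """Create a scheduled rule and point it at the Lambda function.

    `events` and `lambda_client` are expected to be boto3 clients for
    CloudWatch Events and Lambda respectively.
    """
    # Create (or update) the scheduled rule and remember its ARN.
    rule_arn = events.put_rule(
        Name=rule_name,
        ScheduleExpression=schedule,
        State="ENABLED",
    )["RuleArn"]

    # Allow CloudWatch Events to invoke the function from this rule.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=rule_name + "-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )

    # Register the function as the rule's target.
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "1", "Arn": function_arn}],
    )
    return rule_arn
```

Passing the clients in as arguments keeps the function easy to exercise with stand-in objects before pointing it at a real account.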

Now let’s add the code using the gist below:

The Lambda function reads four environment variables. Here’s a breakdown of them:

| Variable | Description | Default value |
| --- | --- | --- |
| threshold_days | The max number of days before an AMI is pruned | 30 |
| enable_delete | Actually prune the AMIs; by default it’s in no-op mode | false |
| region | The region to perform the pruning in | us-east-1 |
| limit | The maximum number of AMIs which would be deleted (doesn’t affect no-op mode) | 400 |
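
As a rough sketch of how a function might read these settings, here is one way to load them with the defaults from the table. The helper name load_config is hypothetical, and treating only the string "true" (case-insensitive) as enabling deletion is my assumption; the real script may parse the flag differently.

```python
import os


def load_config(env=None):
    """Read the four settings, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return {
        "threshold_days": int(env.get("threshold_days", "30")),
        # Assumption: anything other than "true" leaves no-op mode on.
        "enable_delete": env.get("enable_delete", "false").strip().lower() == "true",
        "region": env.get("region", "us-east-1"),
        "limit": int(env.get("limit", "400")),
    }
```

Defaulting enable_delete to false means a freshly deployed function cannot delete anything until someone opts in.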

If you have any AMIs you want to keep, tag them with a keep tag, as per the code. The tag’s value doesn’t matter to the Lambda function; I suggest a value you will remember, such as a release name. Go through all the images you want to keep and tag them before turning on deletion with the enable_delete environment variable.
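
To make the selection rule concrete, here is a minimal sketch of the check for a single image record, in the shape returned by EC2's describe_images (ImageId, CreationDate, optional Tags). The function names are illustrative, and the exact tag key ("keep", matched case-sensitively) is an assumption based on the description above.

```python
from datetime import datetime, timedelta, timezone


def parse_creation_date(value):
    """Parse EC2's CreationDate, e.g. "2017-01-15T10:30:00.000Z", as UTC."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def is_prunable(image, threshold_days, now=None, keep_tag="keep"):
    """Return True if the image is older than the threshold and not tagged to keep."""
    now = now or datetime.now(timezone.utc)
    # Only the presence of the tag key matters; its value is ignored.
    tag_keys = {tag["Key"] for tag in image.get("Tags", [])}
    if keep_tag in tag_keys:
        return False
    age = now - parse_creation_date(image["CreationDate"])
    return age > timedelta(days=threshold_days)
```

When listing candidates, you would typically restrict describe_images to your own images (Owners=["self"]) so that public and shared AMIs are never considered.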

Once you have this all in place, you can do a test run of the function to verify it works how you expect (or would have, if it is in no-op mode). You can verify the output via the CloudWatch log stream, which lists every AMI that was deleted as well as a total count. Also, if you haven’t already, make sure to enable the trigger; otherwise the function won’t run automatically.
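
For reference, the pruning step and its log output might look roughly like the sketch below. It is not the original script: the function name prune_images and the log wording are made up, and `ec2` is expected to be a boto3 EC2 client. It does follow the behavior described above, where no-op mode only reports and the limit applies only to real deletions.

```python
def prune_images(ec2, images, enable_delete=False, limit=400):
    """Deregister `images` and delete their EBS snapshots.

    In no-op mode nothing is changed; each candidate is only logged.
    Returns the list of image IDs that were (or would have been) pruned.
    """
    processed = []
    for image in images:
        # The limit caps real deletions only, not the no-op report.
        if enable_delete and len(processed) >= limit:
            break
        image_id = image["ImageId"]
        if enable_delete:
            # Deregister first: a snapshot still backing a registered
            # AMI cannot be deleted.
            ec2.deregister_image(ImageId=image_id)
            for mapping in image.get("BlockDeviceMappings", []):
                snapshot_id = mapping.get("Ebs", {}).get("SnapshotId")
                if snapshot_id:
                    ec2.delete_snapshot(SnapshotId=snapshot_id)
            print("Deleted %s" % image_id)
        else:
            print("Would delete %s (no-op)" % image_id)
        processed.append(image_id)
    print("Total: %d" % len(processed))
    return processed
```

Anything printed from a Lambda function ends up in its CloudWatch log stream, which is where the per-AMI lines and the total would show up.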