Using AWS EC2 Instances to train a Convolutional Neural Network to identify Cows and Horses

First published at https://nivleshc.wordpress.com

Background

Machine Learning (ML) and Artificial Intelligence (AI) have been a hobby of mine for years now. After playing with them approximately 8 years back, I let the hobby lapse till early this year, and boy oh boy, how things have matured! There are products in the market these days that use some form of ML, for example Apple’s Siri, Google Assistant and Amazon Alexa.

Computational power has increased to the point where calculations that took months can now be done within days. However, the biggest change has come about due to the vast amounts of data that the models can be trained on. More data generally means better accuracy in models.

If you have taken any programming course, you will remember the hello world program. This is a foundation program, which introduces you to the language and gives you the confidence to continue on. The hello world for ML is identifying cats and dogs. In almost every online course I have taken, this is the first project that you build.

For anyone wanting a background on Machine Learning, I would highly recommend Andrew Ng’s https://www.coursera.org/learn/machine-learning on Coursera. However, be warned, it has a lot of maths 🙂 If you are able to get through it, you will get a very good foundational knowledge of ML.

If theory is not your cup of tea, another way to approach ML is to just implement it and learn as you go. You don’t need to get a PhD in ML to start implementing it. This is the philosophy behind Jeremy Howard’s and Rachel Thomas’s http://www.fast.ai. They take you through the implementation steps and introduce you to the theory on a need-to-know basis. In essence, you are taking a top-down approach.

I am still a few lessons away from finishing the fast.ai course; however, I have learnt so much and I cannot recommend it enough.

In this blog, I will take you through the steps to implement a Convolutional Neural Network (CNN) that will be able to pick out horses from cows. CNNs are quite complicated in nature so we won’t go into the nitty-gritty details of creating them from scratch. Instead, we will use the foundational libraries from fast.ai’s lesson 1 and modify the lesson a bit, so that instead of identifying cats and dogs, we will use it to identify cows and horses.

In the process, I will introduce you to a tool that will help you scrape Google for your own image dataset.

Most important of all, I will show you how the amount of data used to train your CNN model affects its accuracy.

So, put your seatbelts on and let’s get started!

 

1. Setting up the AWS EC2 Instance

ML requires a lot of processing power. To get really good throughput, it is recommended to use GPUs instead of CPUs. If you were to build a kit to try this at home, it could easily cost you a few thousand dollars, not to mention the bill for the cooling and electricity usage.

However, with Cloud Computing, we don’t need to go out and buy the whole kit. Instead, we can just rent it for as long as we want. This provides a much more affordable way to learn ML.

In this blog, we will be using AWS EC2 instances. For the cheapest GPU cores, we will use a p2.xlarge instance. Be warned, these cost $0.90/hr, so I would suggest turning them off after using them, otherwise you will surely rack up a huge bill.

Reshma has done a fantastic job of putting together the instructions on setting up an AWS Instance for running fast.ai course lessons. I will be using her instructions, with a few modifications. Reshma’s instructions can be found here.

Ok, let’s begin.

  • Login to your AWS Console
  • Go to the EC2 section
  • On the top left menu, you will see EC2 Dashboard. Click on Limits under it
  • Now, on the right you will see all the types of EC2 instances you are allowed to run. Search for p2.xlarge instances. These have a current limit of zero, meaning you cannot launch them. Click on Request limit increase and then fill out the form to justify why you want a p2.xlarge instance. Once done, click on Submit. In my case, within a few minutes, I received an email saying that my limit increase had been approved.
  • Click on EC2 Dashboard from the left menu
  • Click on Launch Instance
  • In the next screen, in the left hand side menu, click on Community AMIs
  • On the right side of the screen, search for fast.ai
  • From the results, select fastai-part1v2-p2
  • In the next screen (Instance Type) filter by GPU compute and choose p2.xlarge
  • In the next screen configure the instance details. Ensure you get a public IP address (Auto-assign Public IP) because you will be connecting to this instance over the internet. Once done, click Next: Add Storage
  • In the next screen, you don’t need to do anything. Just be aware that the community AMI comes with an 80GB hard disk (at $0.10/GB/Month, this will amount to $8/Month). Click Next
  • In the next screen, add any tags for the EC2 Instance. To give the instance a name, you can set the Key to Name and the Value to fastai. Click Next
  • For security groups, all you need to do is allow SSH to the instance. You can leave the source as 0.0.0.0/0 (this allows connections to the EC2 instance from any public IP address). If you want to be more secure, you can set the source to your current IP address. However, this means that should your public IP address change (hardly any ISPs give you a static IP address unless you pay extra), you will have to go back into the AWS Console and update the source in the security group. Click Next
  • In the next section, check that all details are correct and then click on Launch. You will be asked for your key pair. You can either choose an existing key pair or create a new one. Ensure you keep the key pair in a safe place because whoever possesses it can connect to your EC2 instance.
  • Now, sit back and relax. Within a few minutes, your EC2 instance will be ready. You can monitor the progress in the EC2 Dashboard

DON’T FORGET TO SHUT DOWN THE INSTANCE WHEN NOT USING IT. AT $0.90/hr, IT MIGHT NOT SEEM MUCH, HOWEVER THE COST CAN EASILY ACCUMULATE TO SOMETHING QUITE EXPENSIVE
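To put some numbers behind that warning, here is a rough sketch in Python that estimates the monthly bill. It uses only the two rates quoted in this blog ($0.90/hr for the p2.xlarge and $0.10/GB/Month for the 80GB disk). The usage patterns are made up for illustration, and current AWS pricing may well differ.

```python
# Rough cost estimate for the setup described above.
# The rates are the ones quoted in this post; check current AWS pricing before relying on them.
HOURLY_RATE = 0.90    # p2.xlarge, USD per hour
STORAGE_RATE = 0.10   # USD per GB per month
DISK_GB = 80          # size of the community AMI's disk


def monthly_cost(hours_running):
    """Return the estimated monthly bill in USD for the given number of running hours."""
    compute = HOURLY_RATE * hours_running
    storage = STORAGE_RATE * DISK_GB  # the disk is billed even while the instance is stopped
    return round(compute + storage, 2)


# Hypothetical usage patterns, for illustration only.
print(monthly_cost(2 * 30))   # two hours a day for 30 days
print(monthly_cost(24 * 30))  # left running for the whole month
```

Two hours a day works out to roughly $62 for the month, while an instance that is never shut down comes to roughly $656.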

2. Creating the dataset

To train our Convolutional Neural Network (CNN), we need to get lots of images of cows and horses. This got me thinking. Why not get it off Google? But, then this provided another challenge. How do I download all the images? Surely I don’t want to be sitting there right clicking each search result and saving it!

After some googling, I landed on https://github.com/hardikvasa/google-images-download. It does exactly what I wanted. It will do a Google image search using a keyword and download the results.

Install it using the instructions provided in the link above. By default, it only downloads 100 images. As CNNs need lots more, I would suggest installing chromedriver. The instructions to do this are in the Troubleshooting section under ## Installing the chromedriver (with Selenium)

To download 1000 images of cows and horses, use the following command line (for some reason the tool only downloads around 800 images)

  • the downloaded images will be stored in the subfolder cows/downloaded and horses/downloaded in the /Users/x/Documents/images folder.
  • --keywords denotes what we are searching for in Google. For cows, we will use cow because we want a single cow’s photo. The same applies for horses.
  • --chromedriver provides the path to where the chromedriver has been stored
  • the images will be in jpg format
googleimagesdownload --keywords "cow" --format jpg --output_directory "/Users/x/Documents/images/" --image_directory "cows/downloaded" --limit 1000 --chromedriver /Users/x/Documents/tools/chromedriver
googleimagesdownload --keywords "horse" --format jpg --output_directory "/Users/x/Documents/images/" --image_directory "horses/downloaded" --limit 1000 --chromedriver /Users/x/Documents/tools/chromedriver

3. Finding and Removing Corrupt Images

One disadvantage of using the googleimagesdownload script is that, at times, a downloaded image cannot be opened. This will cause issues when our CNN tries to use it for training/validating. To ensure our CNN does not encounter any issues, we will do some housekeeping beforehand and remove all corrupt images (images that cannot be opened).

I wrote the following Python script to find and move the corrupt images to a separate folder. The script uses the matplotlib library (the same library used by the fast.ai CNN framework). If you don’t have it, you will need to download it from https://matplotlib.org/users/installing.html.

The script assumes that within the root folder, there is a subfolder called downloaded which contains all the images. It also assumes there is a subfolder called corrupt within the root folder. This is where the corrupt images will be moved to. Set the root_folder_path to the parent folder of the folder where the images are stored.

#this script will go through the downloaded images and find those that cannot be opened. These will be moved to the corrupt folder.

#load libraries
import matplotlib.pyplot as plt
import os

#image folder
root_folder_path = '/Users/x/Documents/images/cows/'
image_folder_path = root_folder_path + 'downloaded/'
corrupt_folder_path = root_folder_path + 'corrupt' #folder where the corrupt images will be moved to

#get a list of all files in the img folder
image_files = os.listdir(f'{image_folder_path}')

print (f'Total Image Files Found: {len(image_files)}')
num_image_moved = 0

#lets go through each image file and see if we can read it
for imageFile in image_files:
    filePath = image_folder_path + imageFile
    #print(f'Reading {filePath}')
    try:
        valid_img = plt.imread(filePath)
    except Exception as err:
        #the image could not be read, so move it to the corrupt folder
        print (f'Error reading {filePath} ({err}). File will be moved to corrupt folder')
        os.rename(filePath, os.path.join(corrupt_folder_path, imageFile))
        num_image_moved += 1

print (f'Moved {num_image_moved} images to corrupt folder')

For some unknown reason, the script, at times, moves good images into the corrupt folder as well. I would suggest that you go through the corrupt images and see if you can open them (there won’t be many in the corrupt folder). If you can, just manually move them back into the downloaded folder.

To make the images easier to handle, let’s rename them using the following format.

  • For the images in the cows/downloaded folder, rename them to the format CowXXX.jpg where XXX is a number starting from 1
  • For the images in the horses/downloaded folder, rename them to the format HorseXXX.jpg where XXX is a number starting from 1
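Renaming several hundred files by hand is no fun, so here is a minimal sketch of how the renaming could be scripted in Python. The function name rename_images is my own invention, and the sketch assumes that every image has a .jpg extension (any other file is left untouched).

```python
import os


def rename_images(folder, prefix):
    """Rename every .jpg file in folder to <prefix><number>.jpg, numbering from 1."""
    files = sorted(f for f in os.listdir(folder) if f.lower().endswith('.jpg'))

    # First pass: move everything to temporary names, so that a new name
    # can never collide with a file that has not been renamed yet.
    temp_names = []
    for index, name in enumerate(files, start=1):
        temp = f'__tmp_{index}.jpg'
        os.rename(os.path.join(folder, name), os.path.join(folder, temp))
        temp_names.append(temp)

    # Second pass: apply the final names.
    for index, temp in enumerate(temp_names, start=1):
        os.rename(os.path.join(folder, temp), os.path.join(folder, f'{prefix}{index}.jpg'))

    return len(temp_names)


# Example calls (commented out, adjust the paths to suit your computer):
# rename_images('/Users/x/Documents/images/cows/downloaded', 'Cow')
# rename_images('/Users/x/Documents/images/horses/downloaded', 'Horse')
```

The two passes are there so that re-running the script on a folder that already contains Cow1.jpg does not overwrite anything.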

 

4. Transferring the images to the AWS EC2 Instance

In the following sections, I am using ssh and scp, which come built into macOS. For Windows, you can use PuTTY for ssh and WinSCP for scp.

A CNN (or any other Neural Network model) is trained using a set of images. Once training has finished, to find how accurate the model is, we give it a set of validation images (these are different to those it was trained on, however we know what these images are of) and ask it to identify the images. We then compare the results with what the actual image was, to find the accuracy.

 

In this blog, we will first train our CNN on a small set of images.

Do the following

  • create a subfolder inside the cows folder and name it train
  • create a subfolder inside the cows folder and name it valid
  • move 100 images from the cows/downloaded folder into the cows/train folder
  • move 20 images from the cows/downloaded folder into the cows/valid folder

Make sure the images in the cows/train folder are not the same as those in cows/valid folder

Do the same for the horses images, so basically

  • create a subfolder inside the horses folder and name it train
  • create a subfolder inside the horses folder and name it valid
  • move 100 images from the horses/downloaded folder into the horses/train folder
  • move 20 images from the horses/downloaded folder into the horses/valid folder
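If you would rather not drag files around by hand, the steps above can be sketched in Python as follows. The function name split_images and the fixed random seed are my own choices. The sketch assumes the folder layout used in this blog, that is, a downloaded subfolder inside the cows (or horses) folder.

```python
import os
import random
import shutil


def split_images(animal_folder, num_train, num_valid, seed=42):
    """Move images from <animal_folder>/downloaded into the train and valid subfolders."""
    source = os.path.join(animal_folder, 'downloaded')
    # Ignore hidden files such as .DS_Store
    files = sorted(f for f in os.listdir(source) if not f.startswith('.'))
    if len(files) < num_train + num_valid:
        raise ValueError(f'Need {num_train + num_valid} images, found {len(files)}')

    random.Random(seed).shuffle(files)  # fixed seed, so the split is repeatable

    # The two slices never overlap, so no image ends up in both sets.
    chosen = {
        'train': files[:num_train],
        'valid': files[num_train:num_train + num_valid],
    }
    for subfolder, names in chosen.items():
        target = os.path.join(animal_folder, subfolder)
        os.makedirs(target, exist_ok=True)
        for name in names:
            shutil.move(os.path.join(source, name), os.path.join(target, name))
    return chosen


# Example calls (commented out, adjust the paths to suit your computer):
# split_images('/Users/x/Documents/images/cows', 100, 20)
# split_images('/Users/x/Documents/images/horses', 100, 20)
```

Shuffling before splitting avoids always picking the first search results for training, which may or may not matter for your dataset.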

Now connect to the AWS EC2 instance using the following command

ssh -i key.pem ubuntu@public-ip

where

  • key.pem is the key pair that was used to create the AWS EC2 instance (if the key pair is not in the current folder then provide the full path to it)
  • public-ip is the public ip address for your AWS EC2 instance (this can be obtained from the EC2 Dashboard)

Once connected, use the following commands to create the required folders

cd data
mkdir cowshorses
mkdir cowshorses/train
mkdir cowshorses/valid
mkdir cowshorses/train/cows
mkdir cowshorses/train/horses
mkdir cowshorses/valid/cows
mkdir cowshorses/valid/horses

Close your ssh session by typing exit

Run the following commands to transfer the images from your local computer to the AWS EC2 instance

To transfer the cows training set
scp -i key.pem /Users/x/Documents/images/cows/train/* ubuntu@public-ip:~/data/cowshorses/train/cows

To transfer the horses training set
scp -i key.pem /Users/x/Documents/images/horses/train/* ubuntu@public-ip:~/data/cowshorses/train/horses

To transfer the cows validation set
scp -i key.pem /Users/x/Documents/images/cows/valid/* ubuntu@public-ip:~/data/cowshorses/valid/cows

To transfer the horses validation set
scp -i key.pem /Users/x/Documents/images/horses/valid/* ubuntu@public-ip:~/data/cowshorses/valid/horses

5. Starting the Jupyter Notebook

Jupyter Notebooks are one of the most popular tools used by ML practitioners and data scientists. For those that aren’t familiar with Jupyter Notebooks, in a nutshell, a notebook is a web page that contains descriptions and interactive code. The user can run the code live from within the document. This is possible because Jupyter Notebook executes the code on the server it is running on and then displays the result in the web page. For more information, you can check out http://jupyter.org

In our case, we will be running the Jupyter Notebook on the AWS EC2 instance. However, we will be accessing it through our local computer. For security reasons, we will not publish our Jupyter Notebook to the whole wide world (lol that does spell www).

Instead, we will use the following ssh command to bind our local computer’s tcp port 8888 to the AWS EC2 instance’s tcp port 8888 (this is the port on which the Jupyter Notebook will be running) when we connect to it. This will allow us to access the Jupyter Notebook as if it is running locally on our computer, however the connection will be tunnelled to the AWS EC2 instance.

ssh  -i key.pem ubuntu@public-ip -L8888:localhost:8888

Next, run the following commands to start an instance of Jupyter Notebook

cd fastai
jupyter notebook

After the Jupyter Notebook starts, it will provide a URL to access it, along with the token to authenticate with. Copy it and then paste it into a browser on your local computer.

You will now be able to access the fastai Jupyter Notebook.

Follow the steps below to open Lesson 1.

  • click on the courses folder
  • once inside the courses folder,  click on the  dl1 folder

In the next screen, find the file lesson1.ipynb and double-click it. This will launch the lesson1 Jupyter Notebook in another tab.

Give yourself a big round of applause for getting this far!

Now, start from the top of lesson1 and go through the first three code sections and execute them. To execute the code, put the mouse pointer in the code section and then press Shift+Enter.

In the next section, change the path to where we moved the cows and horses pictures to. It should look like below

PATH = "data/cowshorses/"

Then, execute this code section.

Skip the following sections

  • Extra steps if NOT using Crestle or Paperspace or our scripts
  • Extra steps if using Crestle

Just a word of caution. The original Jupyter Notebook is meant to distinguish between cats and dogs. However, since we are using it to distinguish between cows and horses, whenever you see a mention of cats, change it to cows and whenever you see a mention of dogs, change it to horses.

The following lines don’t need any changing, so just execute them as they are

os.listdir(PATH)
os.listdir(f'{PATH}valid')

In the next line, replace cats with cows so that you end up with the following

files = !ls {PATH}valid/cows | head
files

Execute the above code. A list of the first 10 cow image files will be displayed.

Next, let’s see what the first cow image looks like.

In the next line, change cats to cows to get the following.

img = plt.imread(f'{PATH}valid/cows/{files[0]}')
plt.imshow(img);

Execute the code and you will see the cow image displayed.

Execute the next two code sections. Leave the section after that commented out.

Now, instead of creating a CNN model from scratch, we will use one that was pre-trained on ImageNet which had 1.2 million images and 1000 classes. So it already knows quite a lot about how to distinguish objects. To make it suitable to what we want to do, we will now train it further on our images of cows and horses.

The following defines which model to use and provides the data to train on (the CNN model that we will be using is called resnet34). Execute the below code section.

data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(resnet34, sz))
learn = ConvLearner.pretrained(resnet34, data, precompute=True)

And now for the best part! Let’s train the model and give it a learning rate of 0.01.

learn.fit(0.01, 1)

After you execute the above code, the model will be trained on the cows and horses images that were provided in the train folders. The model will then be tested for accuracy by getting it to identify the images contained in the valid folders. Since we already know what the images are of, we can use this to calculate the model’s accuracy.

When I ran the above code, I got an accuracy of 0.75. This is quite good since it means the model can tell cows from horses 75% of the time. Not to forget, we used only 100 cow and 100 horse images to train it, and it didn’t even take that long to train!
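To make the accuracy figure a bit more concrete, here is a small Python sketch of how such a number is calculated. The predictions below are made up for illustration (they are not the actual output of my model). With 20 cows and 20 horses in the valid folders, an accuracy of 0.75 corresponds to 30 of the 40 images being identified correctly.

```python
def accuracy(predicted, actual):
    """Fraction of validation images whose predicted label matches the true label."""
    if len(predicted) != len(actual):
        raise ValueError('predicted and actual must have the same length')
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# Hypothetical results for a validation set of 20 cows and 20 horses.
actual = ['cow'] * 20 + ['horse'] * 20
# The made-up model gets 16 of the cows and 14 of the horses right.
predicted = ['cow'] * 16 + ['horse'] * 4 + ['horse'] * 14 + ['cow'] * 6

print(accuracy(predicted, actual))
```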

Now, let’s see what happens when we give it loads more images to train on.

BTW, to get more insights into the results from the trained model, you can go through all the sections between the line learn.fit(0.01, 1) and Choosing a learning rate.

Another take at training the model

From all the literature I have been reading, one point keeps on repeating. More data means better models. Let’s put this to the test.

This time around we will give the model ALL the images we downloaded.

Do the following.

  • on your local computer, move the photos back to the downloaded folder
    • move photos from cows/train to cows/downloaded
    • move photos from cows/valid to cows/downloaded
    • move photos from horses/train to horses/downloaded
    • move photos from horses/valid to horses/downloaded
  • on your local computer, move 100 photos of cows to cows/valid folder and the rest to the cows/train folder
    • move 100 photos from cows/downloaded to cows/valid folder
    • move the rest of the photos from cows/downloaded to cows/train folder
  • on your local computer, move 100 photos for horses to horses/valid and the rest to horses/train folder
    • move 100 photos from horses/downloaded to horses/valid folder
    • move the rest of the photos from horses/downloaded to horses/train folder
  • on the AWS EC2 instance, delete all the photos under the following folders
    • ~/data/cowshorses/train/cows
    • ~/data/cowshorses/train/horses
    • ~/data/cowshorses/valid/cows
    • ~/data/cowshorses/valid/horses

Use the following commands to copy the images from the local computer to the AWS EC2 Instance

To transfer the cows training set
scp -i key.pem /Users/x/Documents/images/cows/train/* ubuntu@public-ip:~/data/cowshorses/train/cows

To transfer the horses training set
scp -i key.pem /Users/x/Documents/images/horses/train/* ubuntu@public-ip:~/data/cowshorses/train/horses

To transfer the cows validation set
scp -i key.pem /Users/x/Documents/images/cows/valid/* ubuntu@public-ip:~/data/cowshorses/valid/cows

To transfer the horses validation set
scp -i key.pem /Users/x/Documents/images/horses/valid/* ubuntu@public-ip:~/data/cowshorses/valid/horses

Now that everything has been prepared, re-run the Jupyter Notebook, as described under Starting the Jupyter Notebook above (ensure you start from the top of the Notebook).

When I trained the model on ALL the images (less those in the valid folder), I got an accuracy of 0.95! Wow, that is so amazing! I didn’t do anything other than increase the number of images in the training set.

Final thoughts

In a future blog post, I will show you how you can use the trained model to identify cows and horses from an unlabelled set of photos.

For now, I would highly recommend that you use the above mentioned image downloader to scrape Google for some other datasets. Then use the above instructions to train the model on those images and see what kind of accuracy you can achieve (maybe try identifying chickens and ducks?)

As mentioned before, once finished, don’t forget to shut down your AWS EC2 instance. If you don’t need it anymore, you can terminate it, to save on storage costs as well.

If you are keen about ML, you can check out the courses at http://www.fast.ai (they are free)

If you want to dabble in the maths behind ML, as previously mentioned, Andrew Ng’s https://www.coursera.org/learn/machine-learning is one of the finest.

Lastly, if you are keen to take on some ML challenges, check out https://www.kaggle.com. They have lots and lots of competitions running all the time, some of which pay out actual money. There are lots of resources as well and you can learn from others on the site.

Till the next time, Enjoy 😉

Deploying an Active Directory Forest using AWS CloudFormation

First published at https://nivleshc.wordpress.com

Introduction

Wow, it is amazing how time flies. Almost two years ago, I wrote a set of blogs that showed how one can use Azure Resource Manager (ARM) templates and Desired State Configuration (DSC) scripts to deploy an Active Directory Forest automatically.

For those that would like to take a trip down memory lane, here is the link to the blog.

Recently, I have been playing with AWS CloudFormation and I am simply in awe of its power. For those that are not familiar with AWS CloudFormation, it is a tool, similar to Azure Resource Manager, that allows you to “code” your computing infrastructure in Amazon Web Services. Long gone are the days when you would have to sit down, pressing each button and choosing each option to deploy your environment. Cloud computing provides you with a way to interface with the fabric, so that you can script the build of your environment. The benefits of this are enormous. Firstly, it allows you to standardise all your builds. Secondly, it allows you to have a live as-built document (the code is the as-built document). Thirdly, the code is re-usable. Most important of all, since the deployment is now scripted, you can automate it.

In this blog I will show you how to create an AWS CloudFormation template to deploy an AWS Elastic Compute Cloud (EC2) Windows Server instance. The template will also include steps to promote the EC2 instance to a Domain Controller in a new Active Directory Forest.

Guess what the best part is? Once the template has been created, all you will have to do is to load it into AWS CloudFormation, provide a few values and sit back and relax. AWS CloudFormation will do everything for you from there on!

Sounds interesting? Let’s begin.

Creating the CloudFormation Template

A CloudFormation template starts with a definition of the parameters that will be used. The person running the template (let’s refer to them as an operator) will be asked to provide the values for these parameters.

When defining a parameter, you will provide the following

  • a name for the parameter
  • its type
  • a brief description for the parameter so that the operator knows what it will be used for
  • any constraints you want to put on the parameter, for instance
    • a maximum length (for strings)
    • a list of allowed values (in this case a drop down list is presented to the operator, to choose from)
  • a default value for the parameter
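To show how the items listed above fit together, here is a small Python sketch that builds a Parameters section and prints it as JSON. Please note that the two parameters and their values are made up for illustration, they are not copied from my template. Type, Description, AllowedValues, MaxLength and Default are standard CloudFormation parameter properties.

```python
import json

# Hypothetical parameters, for illustration only (not the ones from the actual template).
parameters = {
    "Environment": {
        "Type": "String",
        "Description": "Environment the instance will be deployed into",
        "AllowedValues": ["Production", "Test"],  # presented to the operator as a drop-down list
        "Default": "Test",
    },
    "Hostname": {
        "Type": "String",
        "Description": "Computer name for the new server",
        "MaxLength": 15,  # NetBIOS computer names are limited to 15 characters
    },
}

# Print the section in the form it would take inside a JSON template.
print(json.dumps({"Parameters": parameters}, indent=2))
```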

For our template, we will use the following parameters.

Next, we will define some mappings. Mappings allow us to define the values for variables, based on what value was provided for a parameter.

When creating EC2 instances, we need to provide a value for the Amazon Machine Image (AMI) to be used. In our case, we will use the OS version to decide which AMI to use.

To find the subnet into which the EC2 instance will be deployed, we will use the Environment and AvailabilityZone parameters.
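If mappings are new to you, it may help to think of them as nested dictionaries. The Python sketch below mimics the two-level lookup that CloudFormation’s Fn::FindInMap function performs (map name, top level key, second level key). The map names, operating system names, availability zones, AMI IDs and subnet IDs are all placeholders that I made up for this example.

```python
# Hypothetical mappings. The AMI and subnet IDs are placeholders, not real resources.
mappings = {
    "OSVersionToAMI": {
        "WindowsServer2016": {"AMI": "ami-xxxxxxxx"},
        "WindowsServer2012R2": {"AMI": "ami-yyyyyyyy"},
    },
    "EnvironmentToSubnet": {
        "Production": {"zone-a": "subnet-aaaaaaaa", "zone-b": "subnet-bbbbbbbb"},
        "Test": {"zone-a": "subnet-cccccccc", "zone-b": "subnet-dddddddd"},
    },
}


def find_in_map(map_name, top_level_key, second_level_key):
    """Mimic the two-level lookup performed by Fn::FindInMap."""
    return mappings[map_name][top_level_key][second_level_key]


# The operator's parameter values decide which entries are picked.
print(find_in_map("OSVersionToAMI", "WindowsServer2016", "AMI"))
print(find_in_map("EnvironmentToSubnet", "Test", "zone-b"))
```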

The code below defines the mappings we will use

The next section in the CloudFormation template is Resources. This defines all resources that will be created.

If you have any experience deploying Active Directory Forests, you will know that it is extremely simple to do it using PowerShell scripts. Guess what, we will be using PowerShell scripts as well 😉 Now, after the EC2 instance has been created, we need to provide the PowerShell scripts to it, so that it can run them. We will use AWS Simple Storage Service (S3) buckets to store our PowerShell scripts.

To ensure our PowerShell scripts are stored securely, we will allow access to them only via a certain role and policy.

The code below will create an AWS Identity and Access Management (IAM) role and policy to access the S3 Bucket where the PowerShell scripts are stored.

We will use cfn-init to do all the heavy lifting for us, once the EC2 instance has been created. cfn-init is a utility that is present by default in EC2 instances and we can ask it to perform tasks for us.

To trigger cfn-init, we will use the UserData feature of EC2 instance provisioning. cfn-init, when started, will check the metadata defined for the EC2 instance resource in the template for the credentials it will use, and for all the tasks it needs to perform.

Below is the metadata that will be used. For simplicity, I have hardcoded the URL to the files in the S3 bucket.

As you can see, I have first defined the role that cfn-init will use to access the S3 bucket. Next, the following tasks will be carried out, in the order defined in the configuration set

  • get-files
    • it will download the files from S3 and place them in the local directory c:\s3-downloads\scripts.
  • configure-instance (the commands in this section are run in alphabetical order, that is why I have prefixed them with a number, to ensure it follows the order I want)
    • It will change the execution policy for PowerShell to unrestricted (please note that this is just for demonstration purposes and the execution policy should not be made this relaxed).
    • next, the name of the server will be changed to what was provided in the Parameters section
    • the following Windows Components will be installed (as defined in the Add-WindowsComponents.ps1 script file)
      • RSAT-AD-PowerShell
      • AD-Domain-Services
      • DNS
      • GPMC
    • the Active Directory Forest will be created, using the Configure-ADForest.ps1 script and the values provided in the Parameters section
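The alphabetical ordering mentioned above has one small trap, which the Python sketch below illustrates. The command names are hypothetical (they are loosely based on the steps listed above), and Python’s sorted() is only standing in for the ordering that cfn-init applies.

```python
# Hypothetical command names, loosely based on the steps listed above.
commands = {
    "2-rename-computer": "...",
    "1-set-execution-policy": "...",
    "4-configure-ad-forest": "...",
    "3-add-windows-components": "...",
}

# The numeric prefix decides the order, regardless of how the commands were written down.
for name in sorted(commands):
    print(name)

# Pitfall: names are compared as text, so "10-..." sorts before "2-...".
print(sorted(["2-second", "10-tenth"]))

# Zero-padding the prefix keeps the intended order once there are ten or more commands.
print(sorted(["02-second", "10-tenth"]))
```

With only four commands a single digit prefix is fine, however it is worth keeping the zero-padding in mind if the list grows.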

In the last part of the CloudFormation template, we will provide the UserData information that will trigger cfn-init to run and do all the configuration. We will also tag the EC2 instance, based on values from the Parameters section.

For simplicity, I have hardcoded the security group that will be attached to the EC2 instance (this is defined as GroupSet under NetworkInterfaces). You can easily create an additional parameter for this, if you want.

Finally, our template will output the instance’s hostname, the environment it has been created in and its private IP address. This provides an easy way to identify the EC2 instance once it has been created.

Below is the last part of the template

Now all you have to do is log in to AWS CloudFormation, load the template we have created, provide the parameter values and sit back and relax.

AWS CloudFormation will take it from here and do everything for you 😉

How easy was that? Magic 🙂

The complete CloudFormation template is available at https://gist.github.com/nivleshc/867b1a2ca119c7d22cf215b5a9a5de02

The two PowerShell Scripts that are used in the CloudFormation template can be downloaded using the links below

Add-WindowsComponents.ps1

Configure-ADForest.ps1

For anyone deploying an Active Directory Forest in AWS, I hope the above comes in handy.

Enjoy 😉

Amazon QuickSight – An elegant and easy to use business analytics tool

First published at https://nivleshc.wordpress.com

Introduction

Recently, I had a requirement for a tool to visualise some data I had collected. My requirements were very simple. I didn’t want something that would cost me a lot, and at the same time I wanted the reports to be elegant and informative. Most of all, I didn’t want to have to go through pages and pages of documentation to learn how to use it.

As my data was within Amazon Web Services (AWS), I thought to check if AWS had any such offerings. Guess what, there was indeed a tool just for what I wanted, and after using it, I was amazed at how simple and elegant it is.

In this blog, I will show how you can easily get started with Amazon QuickSight. I will take you through the steps to import your data into Amazon QuickSight and then create some informative visualisations.

Some background on Amazon QuickSight

Pricing

Amazon QuickSight is very inexpensive. In fact, if you don’t have much data, you won’t have to pay anything!

For standard edition use, Amazon QuickSight provides 1GB of SPICE for the first user free per month. SPICE is an acronym for Super-fast, Parallel, In-memory, Calculation Engine and it uses a combination of columnar storage, in-memory technologies enabled through the latest hardware innovations, machine code generation, and data compression to allow users to run interactive queries on large datasets and get rapid responses.  SPICE is the calculation engine that Amazon QuickSight uses.

Any additional SPICE is priced at USD $0.25 per GB/month. For the latest pricing, please refer to https://aws.amazon.com/quicksight/#Pricing

Data Sources

Currently Amazon QuickSight supports the following data sources

  • Relational Data Sources
    • Amazon Athena
    • Amazon Aurora
    • Amazon Redshift
    • Amazon Redshift Spectrum
    • Amazon S3
    • Amazon S3 Analytics
    • Apache Spark 2.0 or later
    • Microsoft SQL Server 2012 or later
    • MySQL 5.1 or later
    • PostgreSQL 9.3.1 or later
    • Presto 0.167 or later
    • Snowflake
    • Teradata 14.0 or later
  • File Data Sources
    • CSV/TSV – (comma separated, tab separated value text files)
    • ELF/CLF – Extended and common log format files
    • JSON – Flat or semi-structured data files
    • XLSX – Microsoft Excel files

Unfortunately, Amazon DynamoDB is currently not supported as a native data source. Since my data is in Amazon DynamoDB, I had to write some custom Lambda functions to export it to a CSV file, so that it could be imported into Amazon QuickSight.
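I won’t go through my Lambda functions here, however the CSV conversion itself is simple enough to sketch in Python. The snippet below assumes the items have already been read from the table and flattened into plain dictionaries. The function name items_to_csv and the sample order records are made up for illustration.

```python
import csv
import io


def items_to_csv(items):
    """Convert a list of flat dictionaries into CSV text with a header row."""
    # Collect every attribute name, since items in a table need not share the same attributes.
    fieldnames = sorted({key for item in items for key in item})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval='', lineterminator='\n')
    writer.writeheader()
    writer.writerows(items)  # missing attributes are written as empty fields
    return buffer.getvalue()


# Hypothetical order records, standing in for items read from a DynamoDB table.
items = [
    {"order_id": "1001", "product": "widget", "quantity": 2},
    {"order_id": "1002", "product": "gadget"},
]
print(items_to_csv(items))
```

The header row is what allows the data to be imported as a CSV file that contains a header.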

Ok, time for that walk-through I promised earlier.  For this blog, I will be using an S3 bucket as my data source. It will contain the CSV files that I will use for analysis in Amazon QuickSight.

Step 1 – Create S3 buckets

If you haven’t already done so, create an S3 bucket that will contain the csv files. The S3 bucket does not have to be publicly accessible. Once created, upload the csv files into the S3 bucket.

In my case, the csv file is called orders.csv and its location is https://s3.amazonaws.com/sample/orders.csv (to get the URL to your S3 file, login to the S3 console and navigate to the S3 bucket that contains the file. Click the S3 bucket to open it, then click the file name to open its properties. Under Overview you will see Link. This is the URL to the file)

Step 2 – Create an Amazon QuickSight Account

Before you start using Amazon QuickSight, you must create an account. Unfortunately, I couldn’t find a way to create an Amazon QuickSight account without creating an AWS account. If you don’t have an existing AWS account, you can create an AWS Free Tier account. Once you have an AWS account, go ahead and create an Amazon QuickSight account at https://aws.amazon.com/quicksight/.

While creating your Amazon QuickSight account, you will be asked if you would like Amazon QuickSight to auto-discover your Amazon S3 buckets. Enable this and then click Choose S3 buckets. Choose the S3 bucket that you created in Step 1 above. This will give Amazon QuickSight read-only access to the S3 bucket, so that it can read the data for analysis.

Step 3 – Create a manifest file

A manifest file is a JSON file that provides the location and format of the data files to Amazon QuickSight. This is required when creating a data set for S3 data sources. Please refer to https://docs.aws.amazon.com/quicksight/latest/user/supported-manifest-file-format.html if you would like more information about manifest files.

Below is my manifest file, which I have affectionately named ordersmanifest.json.

{
   "fileLocations": [
      {
         "URIs": [
            "https://s3.amazonaws.com/sample/orders.csv"
         ]
      }
   ],
   "globalUploadSettings": {
      "format": "CSV",
      "delimiter": ",",
      "textqualifier": "'",
      "containsHeader": "true"
   }
}

Once created, upload the manifest file into the same S3 bucket where the csv file is stored.
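Hand-written JSON is easy to get subtly wrong (a stray trailing comma is enough to make it invalid), so it can be safer to generate the manifest. The snippet below is a small sketch that does this with Python’s json module; the helper name build_manifest is my own and the settings simply mirror the manifest shown above.

```python
import json


def build_manifest(uris, delimiter=",", text_qualifier="'", contains_header=True):
    """Return a QuickSight S3 manifest as a Python dictionary."""
    return {
        "fileLocations": [{"URIs": list(uris)}],
        "globalUploadSettings": {
            "format": "CSV",
            "delimiter": delimiter,
            "textqualifier": text_qualifier,
            # The manifest above stores this flag as a string, so do the same.
            "containsHeader": "true" if contains_header else "false",
        },
    }


manifest = build_manifest(["https://s3.amazonaws.com/sample/orders.csv"])
manifest_json = json.dumps(manifest, indent=3)
print(manifest_json)
```

Saving the printed text as ordersmanifest.json gives a file that is guaranteed to be well-formed JSON.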

Step 4 – Create a data set

  • Login to your Amazon QuickSight account. From the top right, click on Manage data
  • In the next screen, click on New data set
  • In the next screen, for Create a Data Set FROM NEW DATA SOURCES, click on S3
  • In the next screen
    • provide a name for the data source
    • for Upload a manifest file ensure URL is clicked and enter the URL to the manifest file (you can get the URL by logging into the S3 console, and then clicking on the manifest file to reveal its properties. Under the Overview tab, you will see Link. This is the URL to the manifest file)
    • Click Connect
    • Amazon QuickSight will now read the manifest file and then import the csv file to SPICE. You will then see the screen for finishing the data set creation
    • Click on Edit/Preview data.
    • In the next screen, you will see the contents of the data file that was imported, along with the Fields name on the left. If you want to exclude any columns from the analysis, simply untick them (I unticked orderTime (S) since I didn’t need it)
    • By default, the data is called Group 1. To customise the name, replace Group 1 with a text of your choice (I have renamed my data to Orders Data)
    • Click Save & visualize from the top menu

Step 5 – Create Visualisations

Now that you have imported the data into SPICE, you can start analysing it and creating visualisations.

After step 4, you should be in the Analysis section.

  • Depending on which visualisation you want, you can select the respective type under Visual types from the bottom left hand side of the screen. For my visualisations, I chose Pie Chart (side note – you will notice that orderTime (S) isn’t listed under Fields list. This is because we had unticked it in the previous screen)
  • I want to create two Pie Charts, one to show the most popular foodName and another to show the most popular drinkName. For the first Pie Chart, drag foodName (S) from the Fields list to the Value – Add a measure here box at the top of the screen. Then drag foodName (S) from the Fields list to the Group/Color – Add a dimension here box at the top of the screen. You will now see a Pie Chart showing the count of each foodName
  • You can customise the visualisation title Count of Foodname (S) by Foodname (S) by clicking it and then changing the text (I have changed the title to Popularity of Food Types)
  • If you look closely, the legend on the right hand side doesn’t serve much purpose since the pie slices are already labelled quite well. You can get rid of the legend and get more space for your visual. To do this, click on the down arrow above FoodName (S) on the right and then select Hide legend
  • Next, let’s create a Pie Chart visualisation for drinkName. From the top menu, click on Add and then Add visual
  • You will now have another canvas at the bottom of the first Pie Chart. Click this new canvas area to select it (a blue border will appear to show that it is selected). From Visual types at the bottom left hand side, click on the Pie Chart visual. Then from the top, click on Field wells to expose the Value and Group/Color boxes for the second canvas
  • From the Fields list on the left, drag drinkName (S) to the Value – Add a measure here box at the top of the screen. Then drag drinkName (S) from the Fields list to the Group/Color – Add a dimension here box at the top of the screen. You will now see a second Pie Chart, showing the count of each drinkName
  • We are almost done. I actually want the two Pie Charts to sit side by side, instead of one on top of the other. To do this, I will show you a neat trick. In each of the visuals, at the bottom right border, you will see two diagonal lines. If you move your mouse pointer over them, they change to a resizing cursor. Use this to resize the visual’s canvas area. Also, in the middle of the top border of the visual, you will see two rows of gray dots. Click your mouse pointer on this and drag to the location you want to move the visual to.
  • I have hidden the legend for the second visual, customised the title, resized both the visuals and moved them side by side. Voilà! The two Pie Charts now sit side by side. Not bad aye!

Step 6 – Create a dashboard

Now that the visuals have been created, they can be shared with others. This can be done by creating a dashboard. A dashboard is a read-only snapshot of the analysis. When you share the dashboard with others, they can view and filter the dashboard data. However, any filters applied to a dashboard visual exist only while the user is viewing the dashboard, and aren’t saved once it is closed.

One thing to note about sharing dashboards – you can only share dashboards with users who have an Amazon QuickSight account.

Creating a dashboard is very easy.

  • In the Analysis screen, on the top right corner, click on Share and then select Create dashboard
  • You can either replace an existing dashboard or create a new one. In our case, since we are creating a new dashboard, select Create a new dashboard as and enter a name for the dashboard. Once finished, click Create dashboard
  • You will then be asked to enter the username or email address of those you want to share the dashboard with. Enter this and click on Share
  • That’s it, your dashboard is now created. To access it, go to the Amazon QuickSight home screen (click on the Amazon QuickSight icon on the top left hand side of the screen) and then click on All dashboards. Those that you have shared the dashboard with will also be able to see it once they login to their Amazon QuickSight account.

Step 7 – Refreshing the Data Set

If your data set changes continually, your visualisations/dashboards will not show the updated information until the data set is refreshed. Refreshing imports the new data into SPICE, which then automatically updates the analysis/visualisations and dashboards.

Note: you will have to manually reload the webpage to see the updated visualisations and dashboard

There are two ways of refreshing data sets. One is to do it manually while the other is to use a schedule. The scheduled data refresh allows for the data to be automatically refreshed at a certain time daily, weekly or monthly. A maximum of five scheduled refreshes can be configured.

The steps below show how you can manually refresh the data or create schedules to refresh the data

  • From the Amazon QuickSight main screen, click on Manage data from the top left of the screen
  • In the next screen, you will see all your currently configured data sets. Click the Orders Data dataset (this is the one we had created previously).
  • In the next screen, you will see Refresh Now and Schedule refresh
  • Clicking on Refresh Now will manually refresh the data. Clicking on Schedule refresh will bring up the screen where you can configure a schedule for refreshing the data automatically.

 

That’s it folks! Wasn’t that simple? If you already have an Amazon AWS account, I would strongly recommend giving Amazon QuickSight a try for all your analytics needs. Even if you don’t have an Amazon AWS account, I would still suggest getting an AWS free tier account to try it out.

Enjoy 😉

 

Implementing a Break Glass Process with AWS Systems Manager

Modern day organisations rely on systems to perform critical, sometimes lifesaving tasks. As a result, a common requirement for many organisations is a break-glass process, providing the ability to bypass normal access control procedures when existing authentication mechanisms fail. The implementation of a break glass system often involves considerable effort to ensure the process is not open to malicious use and is auditable, yet simple and efficient. The good news is AWS Systems Manager (SSM) with AWS Key Management Service (KMS) can be leveraged to allow administrative users the ability to recover access to systems on-demand, without having to bake in privileged users with predefined passwords on systems.

How the AWS Systems Manager Break Glass solution works

Before we get into the configuration details, let’s walk through how this all works.

  1. The break-glass process is initiated when an administrative user invokes SSM Run Command against a target system using a custom SSM document for Windows or Linux.
  2. The commands in the SSM document are invoked and the root/admin password is set to a random string of characters. The string is then encrypted using KMS and stored in the SSM Parameter store.
  3. CloudWatch Events detects that SSM Run Command has completed successfully and initiates a Lambda function to clean up the reset password.
  4. The Lambda function waits for 60 seconds, then removes the password from the parameter store.

As you can see, there’s minimal password management required for this solution without having to compromise security. Now that we have an understanding of how the solution hangs together, let’s take a look at how to set it up.

Creating the Customer Master Key

To begin, we need to create a key that will be used to encrypt passwords written to SSM parameter store. You can use the IAM section of the AWS Management Console to create a Customer Master Key by performing the following:

  1. Open the Encryption Keys section of the Identity and Access Management (IAM) console.
  2. For Region, choose the appropriate AWS region.
  3. Choose Create key.
  4. Type an alias for the CMK. Choose Next Step.
  5. Select which IAM users and roles can administer the CMK. Choose Next Step.
  6. Select which IAM users can use the CMK to encrypt and decrypt data. These users will be able to perform the break glass process. Choose Next Step.
  7. Choose Finish to create the CMK.

Creating the EC2 Policy

Great, so we’ve got a key set up. We now need to provide our instances access to encrypt the password and store it in the SSM parameter store. To do this, we need to create a custom IAM policy by performing the following:

  1. Open the IAM console.
  2. In the navigation column on the left, choose Policies.
  3. At the top of the page, choose Create Policy.
  4. On the Create Policy page choose Select on Create Your Own Policy.
  5. For Policy Name, type a unique name.
  6. The policy document you’ll want to use is defined below. Note that the key ARN defined here is the CMK created in the previous step.
  7. When you are done, choose Create Policy to save your completed policy.

Creating the EC2 Role

We now need to assign the policy to our EC2 instances. Additionally, we need to allow our instances access to communicate with the SSM endpoint. To do this, we’ll need to create an appropriate EC2 role:

  1. Open the IAM console.
  2. In the navigation pane, choose Roles, Create new role.
  3. On the Select role type page, choose Select next to Amazon EC2.
  4. On the Attach Policy page, select AmazonEC2RoleforSSM and the policy you created in the previous step.
  5. On the Set role name and review page, type a name for the role and choose Create role.

Attaching the Role to the EC2 Instance

After creating the EC2 role, we then need to attach it to the target instance(s).

  1. Navigate to the EC2 console.
  2. Choose Instances in the navigation pane.
  3. Select the target instance you intend to test the break-glass process on.
  4. Choose Actions, choose Instance Settings and then Attach/Replace IAM role from the drop-down list.
  5. On the Attach/Replace IAM role page, choose the role created in the previous step from the drop-down list.
  6. After choosing the IAM role, proceed to the next step by choosing Apply.

Creating the Password Reset SSM Document

An AWS Systems Manager Document defines the actions that are performed on the target instance(s). We need to create a multi-step cross-platform document that can reset Linux or Windows passwords based on the target platform. To do this, perform the following:

  1. Open the Amazon EC2 console.
  2. In the navigation pane, choose Documents.
  3. Choose Create Document.
  4. Type a descriptive name for the document.
  5. In the Document Type list, choose Command.
  6. Delete the brackets in the Content field, and then paste the document below containing scripts for Windows and Linux. Remember to replace the CMKs and region in both scripts.
  7. Choose Create Document to save the document.
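To make the password reset step more concrete, here is a minimal Python sketch of the logic it boils down to: generate a random string and store it as a KMS-encrypted SecureString in the parameter store. This is not the SSM document itself (which uses platform scripts for Windows and Linux), and it does not change any operating system password. The helper names, the key alias and the instance ID are hypothetical; put_parameter is the real boto3 SSM call, but a stand-in client is used here so the sketch runs without AWS access.

```python
import secrets
import string


def generate_password(length=24):
    """Return a random alphanumeric string of the requested length."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def store_password(ssm_client, instance_id, password, kms_key_id):
    """Store the password as a SecureString named pwd-<instance-ID>."""
    name = "pwd-" + instance_id
    ssm_client.put_parameter(
        Name=name,
        Value=password,
        Type="SecureString",   # encrypted with the given KMS key
        KeyId=kms_key_id,
        Overwrite=True,        # replace any password left from an earlier run
    )
    return name


class RecordingClient:
    """Stand-in for boto3.client("ssm") that only records the calls made."""

    def __init__(self):
        self.calls = []

    def put_parameter(self, **kwargs):
        self.calls.append(kwargs)


client = RecordingClient()
password = generate_password()
# The instance ID and key alias below are placeholders for illustration.
parameter_name = store_password(
    client, "i-0123456789abcdef0", password, "alias/break-glass"
)
print(parameter_name)
```

With real credentials, passing boto3.client("ssm") instead of RecordingClient would write the parameter for real.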

Congratulations! So far, you’ve set up the password reset functionality. Technically, you could stop here and you’d have a working break-glass capability, however we’re going to go one step further and add a clean-up process to remove the password from the parameter store for added security, as described below.

Creating the Lambda Function Policy

Our password clean-up process will use a Lambda function to delete the password from the parameter store. We’ll need to create an IAM policy to allow the Lambda function to do this.

  1. Open the IAM console.
  2. In the navigation column on the left, choose Policies.
  3. At the top of the page, choose Create Policy.
  4. On the Create Policy page choose Select on Create Your Own Policy.
  5. For Policy Name, type a unique name.
  6. The policy document to use is defined below.
  7. When you are done, choose Create Policy to save your completed policy.

Creating the Lambda Function Role

We now need to attach the policy to a role that will be used by our Lambda function.

  1. Open the IAM console.
  2. In the navigation pane, choose Roles, Create new role.
  3. On the Select role type page, choose Select next to AWS Lambda.
  4. On the Attach Policy page, select CloudWatchLogsFullAccess (for logging purposes) and the policy you created in the previous step.
  5. On the Set role name and review page, type a name for the role and choose Create role.

Creating the Lambda Function

We now need to create the Lambda Function that will delete the password, and attach the role created in the previous step.

  1. Open the AWS Lambda console.
  2. Choose Create Function.
  3. Choose Author from scratch.
  4. On the triggers page, click Next.
  5. Under Basic Information enter a name for your function and select the Python 2.7 runtime.
  6. Under Lambda function code, enter the code below.
  7. Under Lambda Function handler and role, choose the role you created in the previous step.
  8. Expand advanced settings and extend the timeout to 90 seconds.
  9. Choose Next and review the summary page then click Create Function.
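A minimal sketch of what such a clean-up function could look like is shown here. It assumes the triggering CloudWatch event carries the instance ID under detail, instance-id (as Run Command invocation events do) and that the parameter is named pwd-<instance-ID>. The extra ssm_client and sleep arguments are my own additions so the sketch can be exercised locally with a fake client; Lambda itself only passes event and context.

```python
import time


def lambda_handler(event, context, ssm_client=None, sleep=time.sleep):
    """Wait 60 seconds, then delete the temporary password parameter."""
    instance_id = event["detail"]["instance-id"]
    parameter_name = "pwd-" + instance_id

    # Give the administrator a short window to retrieve the password.
    sleep(60)

    if ssm_client is None:
        import boto3  # available in the Lambda runtime
        ssm_client = boto3.client("ssm")

    ssm_client.delete_parameter(Name=parameter_name)
    return parameter_name


class FakeSsm:
    """Stand-in for the SSM client that records deleted parameter names."""

    def __init__(self):
        self.deleted = []

    def delete_parameter(self, Name):
        self.deleted.append(Name)


# Local dry run with a hypothetical event and no real waiting.
fake = FakeSsm()
sample_event = {"detail": {"instance-id": "i-0123456789abcdef0", "status": "Success"}}
result = lambda_handler(sample_event, None, ssm_client=fake, sleep=lambda seconds: None)
print(result)
```

The 60 second wait is why the function timeout needs to be extended beyond the default.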

Creating the CloudWatch Event

Almost there! The last step is to capture a successful execution of SSM Run Command, then trigger the previously created Lambda function. We can capture this using CloudWatch events:

  1. Open the CloudWatch console.
  2. In the navigation pane, choose Events.
  3. Choose Create rule.
  4. For Event Source, choose Event Pattern, then choose Build custom event pattern from the dropdown box.
  5. Enter the following into the text box, replacing the document name with the SSM document that was created earlier.
  6. For Targets, choose Add target and then choose Lambda function.
  7. For Function, select the Lambda function that was created in the previous step.
  8. Choose Configure details.
  9. For Rule definition, type an appropriate name.
  10. Choose Create rule.
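As a rough guide, an event pattern for this rule could be built as in the sketch below. It assumes the rule should match successful Run Command invocations of one specific document; the document name BreakGlassPasswordReset is a placeholder, and the exact pattern used for this solution may differ.

```python
import json


def build_event_pattern(document_name):
    """Match successful Run Command invocations of the given SSM document."""
    return {
        "source": ["aws.ssm"],
        "detail-type": ["EC2 Command Invocation Status-change Notification"],
        "detail": {
            "document-name": [document_name],
            "status": ["Success"],
        },
    }


# "BreakGlassPasswordReset" is a hypothetical document name.
pattern = build_event_pattern("BreakGlassPasswordReset")
pattern_json = json.dumps(pattern, indent=2)
print(pattern_json)
```

The printed JSON is what would be pasted into the event pattern text box.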

That’s it! All that’s left is taking the process for a test drive. Let’s give it a shot.

Testing the Process

Assuming you’ve logged into the console with a user that has decrypt access for the CMK used, the following process can be used to access the password:

  1. Open the Amazon EC2 console.
  2. In the navigation pane under Systems Manager Services, choose Run Command.
  3. Choose Run a command.
  4. For Command document, choose the SSM Document created earlier.
  5. For Target instances, choose an instance that has the previously created EC2 role attached. If you do not see the instance in this list, it might not have the correct role attached, or may not be able to access the SSM endpoint.
  6. Choose Run, and then choose View results.
  7. In the commands list, choose the command you just executed. If the command is still in progress, click the refresh icon in the top right corner of the console.
  8. When the Status column shows Success, click the Output tab.
  9. The output tab will display successful execution of both plugins described in our SSM document. If we click View Output on both we’ll notice that one didn’t execute due to not meeting the platform precondition we set. The other plugin should show that the password reset executed successfully.

  10. In the navigation pane under Systems Manager Shared Resources, choose Parameter Store.
  11. Assuming 60 seconds hasn’t elapsed (because our clean-up function will kick in and delete the password) there should be a parameter called pwd-<instance-ID>. After selecting the parameter, the Description tab below will show a SecureString.
  12. Click on Show to view the password.

You can now use this password to access the administrator/root account of your instance. Assuming the password clean-up script is configured correctly, the password should disappear from the parameter store within 60 seconds of the Run Command completing.

Conclusion

The above process provides a simple and secure method for emergency access to both Windows and Linux systems, without the complex process and inherent risk of a traditional break-glass system. Additionally, this method has no running systems, providing a break-glass capability at nearly no cost.

 

AWS DeepLens – Part 1 – Getting the DeepLens Online

Look what I got my hands on!

Today I will be taking you through the initial setup of the yet to be released AWS DeepLens. DeepLens is rumoured to be released globally in April 2018.

What is the AWS DeepLens?

Announced at AWS re:Invent 2017, DeepLens is a marriage of:

  • HD Camera
  • Intel based computer with an on-board GPU
  • Ubuntu OS
  • AWS Greengrass
  • AWS IoT
  • AWS Lambda
  • AWS SageMaker

This marriage of technologies is designed to help developers achieve deep learning inference at the edge device. The edge is typically at the end of the pipeline. What does this all mean?

AWS have made a big play at standardising a data engineer’s pipeline, from writing code in Jupyter notebooks, running training over a cluster, producing a standardised model and finally deploying the model to perform inference at the edge. AWS DeepLens fits in the last step of this pipeline.

Further information can be found here:

https://aws.amazon.com/deeplens/#Tech_Specs

https://aws.amazon.com/deeplens/

With that out of the way, let’s get started.

What’s needed

To get started, the following is required:

  • An AWS account
  • A WiFi network with internet access
  • A computer with a WiFi adaptor and a web browser
  • A power adaptor from the US plug type to your own country’s power plug type (as of the writing of this post)

For troubleshooting you will need the following:

  • Micro-HDMI to HDMI cable
  • Monitor with a HDMI port
  • USB keyboard and mouse

Gotchas

Before we go any further through the setup process, there are a few gotchas I encountered while getting the device online that are worth highlighting sooner rather than later:

  • Ensure the wireless network you’re connecting to is not on a 192.168.0.0/24 network
  • Turn off any JavaScript blocking plugins in your web browser
  • The password for the DeepLens WiFi may contain easily confused characters, such as a capital I that looks like a lowercase l
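If you are not sure whether your network falls into that range, a quick check with Python’s ipaddress module can help. This is only a small sketch; the function name and the sample addresses are made up for illustration.

```python
import ipaddress

# The address range that should be avoided on your own wireless network.
DEEPLENS_SETUP_NETWORK = ipaddress.ip_network("192.168.0.0/24")


def clashes_with_deeplens(address):
    """Return True if the given IPv4 address sits inside 192.168.0.0/24."""
    return ipaddress.ip_address(address) in DEEPLENS_SETUP_NETWORK


print(clashes_with_deeplens("192.168.0.23"))  # True
print(clashes_with_deeplens("192.168.1.23"))  # False
print(clashes_with_deeplens("10.0.0.5"))      # False
```

Pass in the IP address your computer receives from your own WiFi network.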

A recent update on the AWS DeepLens Developer Forum confirmed that there were Wi-Fi issues.

Setting up the DeepLens in the AWS console

  1. Log in to your AWS Management console
  2. Switch to the US-East region (the only available region for the DeepLens at the time of writing)
  3. Click on the DeepLens Service under Machine Learning       
  4. Select Devices from the top navigation bar to navigate to the projects page
  5. Click the Register Device button on the right side of the screen
  6. Give your DeepLens a descriptive name, then select Next
  7. On the Set permissions page, select the Create a role in IAM for all fields, then select Next
  8. You’re now provided with an AWS generated certificate that authenticates AWS DeepLens with the IoT Greengrass service. Click Download certificate and store the zip file in a safe place
  9. Click Register. You’re now ready to plug in your DeepLens

Unpacking and plugging in

The DeepLens comes with:

  • A power pack with a US style power plug
  • A micro SD card
  • The DeepLens itself

To connect the DeepLens, perform the following:

  • Insert the micro SD card into the back of the DeepLens
  • Attach the power adaptor from the DeepLens to the wall socket
  • Press the power button

Connecting to the DeepLens WiFi access point from a PC

This is well documented by the AWS Management console.

The only thing to add here is to watch out for confusing characters in the password on the device.

Once you have navigated to http://192.168.0.1/ in your web browser, you will get to a web-based-wizard that steps through the setup process beginning with connecting to your WiFi network.

Connecting the DeepLens to your network

Select your SSID from the list and provide your network password. Once connected you will most likely see a screen that mentions the device is updating. Ensure you wait the dictated length of time before clicking on the Reboot button. If nothing happens wait longer and try again.

Uploading the certificate

Once the device reboots, the screen will allow you to upload the certificate you downloaded earlier. Upload it, then click Next.

Setting up ssh and other settings

On the final page, specify a device password and enable ssh access to the device. There’s also an additional option to enable automatic device updates. Automatic updates are on by default and I recommend leaving it that way. Once you’re ready, click Review.

Validating your DeepLens connectivity

Ensure you finalise the configuration by clicking Finish.

You should now see a screen which indicates you have completed the wizard.

At this point you should see the DeepLens Registration status in the AWS Management console move from In Progress to Completed. This may take up to 5 minutes.

You are now ready to deploy machine learning projects to the device.

Next time

In the next blog post, we’ll deploy some of the sample projects and learn how they work. We’ll also explore how this integrates with AWS SageMaker.

On-demand, Scalable VPN Access to AWS

Recent growth in our Managed Services business (driven in part by our acquisition by Telstra) has meant that a number of tools and processes that we have previously taken for granted have had to be re-assessed and re-architected to allow us to scale and maintain the same level of service at low costs.

One particular area that we’ve recently reworked is how we remotely access and administer workloads within customers’ AWS environments. Previous methods of access leveraged either static bastion hosts or VPN endpoints, and they worked well up to a point, but after analysing the overall footprint of resources used and costs incurred by doing so, it became clear to us that we needed to find a better way.

Traditional methods of using a single, common ‘shared’ management zone were discarded after analysing the various security and regulatory requirements of our customers, so we had to come up with something else. We needed a solution that was:

  • Secure, preventing access by unwanted parties and encrypting our communications to and from the customer networks;
  • Auditable, capturing when a DevOps engineer connected to and disconnected from a customer account;
  • Resilient, able to operate in event of a VM, host or AZ failure;
  • Cost effective, aligning to AWS principles of making the most of the resources used; and of course
  • Scalable, allowing us to have several users connected at once as well as being able to have the solution deployed across tens to hundreds of customer environments.

Traditional approaches using redundant, highly available VPN or virtual desktop capabilities seemed expensive and inefficient: they were always running (in case someone needed to connect), and that meant ongoing costs even when not in use. There had to be a better way. Looking at Auto Scaling Groups and other approaches where systems are treated as ephemeral rather than permanent, we started to toy with the idea of having a remote access service created on demand, using AWS’s APIs to generate a temporary, nano-sized VPN server only when needed and tearing it down when finished with. Basically, we would use AWS APIs (1) to create the VPN server, somehow create a temporary access key (kind of like a one-time access token) and then use this key to establish a tunnel into the VPC.

After a bit of tinkering around, we managed to pull together a proof of concept solution which validated our objectives. I wanted to share the proof of concept environment we developed so that others could use it in their environments to reduce their costs of remote access, and although we have evolved it beyond what is described below, the core concepts of how it operates remain the same. The proof of concept design consists of a few components:

  • A workload AWS account and VPC – containing the systems that we manage and need network-level access to.
  • A management AWS account and VPC – this is our entry point into the workload AWS account and VPC. It is peered into the workload VPC. For the PoC, the public-facing subnet, routing, peering and role to be used by our OpenVPN instances with CloudWatch Logs and CloudWatch permissions are expected to be pre-created.
  • (Optional) a SAML IDP – We use Azure AD as our IDP. This allows us to have a central location to store our engineers’ identities so we don’t have to manage multiple sets of credentials as we horizontally scale our management accounts. For the PoC, this is not required but nice to have.
  • Management Role to assume – this AWS role requires enough permissions to allow for the creation and configuration of EC2 instances and to be able to be assumed by our engineers.
  • The last piece required to pull the whole PoC together is a PowerShell script which coordinates everything and is described in the execution flow further on.

The script performs a number of actions once it is executed by the user (1). It uses the locally-installed OpenSSL binaries (https://wiki.openssl.org/index.php/Binaries) to generate a self-signed certificate pair (2) to be used for the requested connection. The script generates a new set of certificates every time it is run, kind of like a single-use set of credentials. From there it leverages an AzureAD login script (https://www.npmjs.com/package/aws-azure-login) to allow the user to authenticate against AzureAD via the command line (3). Username, password and MFA token are checked by AzureAD and a SAML token is provided back to the script (4). This token is then presented to the AWS management account using the AssumeRoleWithSAML API, which authorises the user and returns a Security Token Service token for the role assumed (5, 6 and 7). The role has permissions to create an EC2 instance, plus some basic IAM permissions (to assign a role to the EC2 instance).

Once the role has been assumed, the script goes on to call the EC2 APIs (8) to create the temporary OpenVPN server, with the setup and configuration passed in as User Data. This includes the installation and configuration of OpenVPN as well as the certificates generated at step 2. The script waits for the EC2 instance to be created successfully (9) and obtains the ephemeral public IP address of the system plus the network routes within the VPC for local destinations as well as peered VPCs (10).

The script then creates the configuration file using the information gathered in steps 2 and 10, executes the OpenVPN client on the local system to create a tunnel into the newly-created OpenVPN server and updates the local route table to allow connectivity into the AWS networks. Once connected, the user is free to connect via SSH/RDP etc. to various endpoints within the management or peered workload account (11).
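To give a feel for the configuration file step, here is a small sketch of how an OpenVPN client configuration could be assembled from the server’s public IP address and the VPC routes. The actual script is written in PowerShell and is not reproduced here; this Python version, its function name, the file names, the port and the sample addresses are all assumptions made for illustration.

```python
import ipaddress


def build_client_config(server_ip, vpc_cidrs, port=1194, proto="udp",
                        ca_file="ca.crt", cert_file="client.crt",
                        key_file="client.key"):
    """Return OpenVPN client configuration text for a temporary server."""
    lines = [
        "client",
        "dev tun",
        "proto {}".format(proto),
        "remote {} {}".format(server_ip, port),
        "nobind",
        "ca {}".format(ca_file),
        "cert {}".format(cert_file),
        "key {}".format(key_file),
    ]
    for cidr in vpc_cidrs:
        # OpenVPN's route directive takes a network address and a netmask.
        network = ipaddress.ip_network(cidr)
        lines.append("route {} {}".format(network.network_address, network.netmask))
    return "\n".join(lines) + "\n"


# Hypothetical values: a documentation-range public IP and two VPC CIDRs
# (for example the management VPC and a peered workload VPC).
config = build_client_config("203.0.113.10", ["10.0.0.0/16", "10.1.0.0/16"])
print(config)
```

Each run would produce a fresh file, matching the single-use nature of the certificates and the server.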

We’ve found that this whole process takes somewhere around 1 to 2 minutes to complete, all the way from certificate creation to tunnel establishment.

All in all, the whole solution is quite simple and makes use of a number of well-established features from AWS to provide a very cost effective and scalable way to access a remote environment for our DevOps engineers. To top it all off, the (not so) recent announcement by Amazon to move to per-second billing for Linux workloads makes this approach even more attractive, ensuring that we only pay for the resources that we use.

Supercharge your CloudFormation templates with Jinja2 Templating Engine

If you are working in an AWS public cloud environment, chances are that you have authored a number of CloudFormation templates over the years to define your infrastructure as code. As powerful as this tool is, it has a glaring shortcoming: the templates are fairly static, having no inline template expansion feature (think GCP Cloud Deployment Manager). Due to this limitation, many teams end up copy-pasting similar templates to cater for minor differences like environment (dev, test, prod etc.) and resource names (S3 bucket names etc.).

Enter Jinja2. A modern and powerful templating language for Python. In this blog post I will demonstrate a way to use Jinja2 to enable dynamic expressions and perform variable substitution in your CloudFormation templates.

First, let’s get the prerequisites out of the way. To use Jinja2, we need to install Python, pip and of course Jinja2.

Install Python

sudo yum install python

Install pip

curl "https://bootstrap.pypa.io/get-pip.py" -o "get-pip.py"
sudo python get-pip.py

Install Jinja2

pip install Jinja2

To invoke Jinja2, we will use a simple Python wrapper script.

vi j2.py

Copy the following contents to the file j2.py

import os
import sys
import jinja2

sys.stdout.write(jinja2.Template(sys.stdin.read()).render(env=os.environ))

Save and exit the editor

Now let’s create a simple CloudFormation template and transform it through Jinja2:


vi template1.yaml

Copy the following contents to the file template1.yaml


---
AWSTemplateFormatVersion: '2010-09-09'
Description: Simple S3 bucket for {{ env['ENVIRONMENT_NAME'] }}
Resources:
  S3Bucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: InstallFiles-{{ env['AWS_ACCOUNT_NUMBER'] }}

As you can see, it’s the most basic CloudFormation template with one exception: we are using Jinja2 expressions to substitute the values of environment variables. Now let’s run this template through Jinja2.

Let’s first export the environment variables:


export ENVIRONMENT_NAME=Development

export AWS_ACCOUNT_NUMBER=1234567890

Run the following command:


cat template1.yaml | python j2.py

The result of this command will be as follows:


---
AWSTemplateFormatVersion: '2010-09-09'
Description: Simple S3 bucket for Development
Resources:
  S3Bucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: InstallFiles-1234567890

As you can see, Jinja2 has replaced the variable expressions in the template with their values. This provides us with a powerful mechanism to insert environment variables into our CloudFormation templates.
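One thing to be aware of: with Jinja2's default settings, a variable that is not set is rendered as an empty string, so a forgotten export can quietly produce a broken template. The sketch below is a possible variation on the wrapper, not part of the original script. It uses Jinja2's StrictUndefined so that rendering fails loudly instead. The function name render_strict is my own.

```python
import os

import jinja2


def render_strict(template_text, env):
    """Render a template, raising jinja2.UndefinedError for any missing variable."""
    template = jinja2.Template(template_text, undefined=jinja2.StrictUndefined)
    return template.render(env=env)


if __name__ == "__main__":
    sample = "BucketName: InstallFiles-{{ env['AWS_ACCOUNT_NUMBER'] }}"
    try:
        print(render_strict(sample, os.environ))
    except jinja2.UndefinedError as err:
        # Raised when AWS_ACCOUNT_NUMBER has not been exported.
        print(f"Missing variable: {err}")
```

To use it the same way as j2.py, read the template text from sys.stdin and write the result to sys.stdout.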

Let’s take another example. What if we wanted to create multiple S3 buckets in an automated manner? Generally, in such a case, we would have to copy and paste the S3 resource block. With Jinja2, this becomes a matter of adding a simple “for” loop:


vi template2.yaml

Copy the following contents to the file template2.yaml


---
AWSTemplateFormatVersion: '2010-09-09'
Description: Simple S3 bucket for {{ env['ENVIRONMENT_NAME'] }}
Resources:
{% for i in range(1,3) %}
  S3Bucket{{ i }}:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: InstallFiles-{{ env['AWS_ACCOUNT_NUMBER'] }}-{{ i }}
{% endfor %}

Run the following command:


cat template2.yaml | python j2.py

The result of this command will be as follows:


---
AWSTemplateFormatVersion: '2010-09-09'
Description: Simple S3 bucket for Development
Resources:

  S3Bucket1:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: InstallFiles-1234567890-1

  S3Bucket2:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: InstallFiles-1234567890-2

As you can see, the resulting template has two S3 resource blocks. The output of the command can be redirected to another template file to be used later in stack creation.

I am sure you will appreciate the possibilities Jinja2 brings to enhance your CloudFormation templates. Do note that I have barely scratched the surface of this topic, and I highly recommend that you have a look at the Template Designer Documentation found at http://jinja.pocoo.org/docs/2.10/templates/ to explore more possibilities. If you are using Ansible, do note that Ansible uses Jinja2 templating to enable dynamic expressions and access to variables. In this case you can get rid of the Python wrapper script mentioned in this article and use Ansible directly for template expansion.

Replacing the service desk with bots using Amazon Lex and Amazon Connect (Part 4)

Welcome back to the final blog post in this series! In parts 1, 2 and 3, we set up an Amazon Lex bot to converse with users, receive and validate verification input, and perform a password reset. While we’ve successfully tested this functionality in the AWS console, we want to provide our users with the ability to call and talk with the bot over the phone. In this blog post, we’ll wire up Amazon Connect with our bot to provide this capability.

What is Amazon Connect

Amazon Connect is a cloud-based contact center service that can be set up in minutes to take phone calls and route them to the correct service center agents. Additionally, Connect is able to integrate with Amazon Lex to create a self-service experience, providing a cost-effective method for resolving customer queries without customers having to wait in a queue for a human agent. In our case, Lex will be integrated with Amazon Connect to field password reset requests.

Provisioning Amazon Connect

The following steps provision the base Amazon Connect tenant:

  1. Begin by heading to the AWS Console, then navigate to Amazon Connect and select Add an instance.
  2. Specify a sub-domain for the access URL which will be used to log into Amazon Connect. Select Next step.
  3. For now, skip creating an administrator by selecting Skip this, then select Next step.
  4. For Telephony Options ensure Incoming Calls is selected and Outbound Calls is unselected, then click Next step.
  5. Accept the default data storage configuration and select Next step.
  6. Finally, review the information provided and select Create instance.

That’s all that’s required to provision the Amazon Connect service. Pretty simple stuff. It takes a few minutes to provision, then you’ll be ready to begin configuring your Amazon Connect tenant.

Configuring Amazon Connect

Next, we need to claim a phone number to be used with our service:

  1. Once Amazon Connect has been provisioned, click Get started to log into your Amazon Connect instance.
  2. On the Welcome to Amazon Connect page, select Let’s Go.
  3. To claim a phone number, select your preferred country code and type, then select Next. You may find that there are no available numbers for your country of choice (like the screenshot below). If that’s the case and it’s necessary that you have a local number, you can raise a support case with Amazon. For testing purposes, I’m happy to provision a US number and use Google Hangouts to dial the number for free.
  4. When prompted to make a call, choose Skip for now.

You should now be at the Amazon Connect dashboard where there are several options, but before we can continue, we first need to add the Lex bot to Amazon Connect to allow it to be used within a contact flow.

Adding Lex to Amazon Connect

  1. Switch back to the AWS Console and navigate to Amazon Connect.
  2. Select the Amazon Connect instance alias created in the previous step.
  3. On the left-hand side, select Contact Flows.
  4. Under the Amazon Lex section, click Add Lex Bot, and select the user administration bot we created.
  5. Select Save Lex Bots.

Now that our bot has been added to Amazon Connect, we should be able to create an appropriate Contact Flow that leverages our bot.

Creating the Contact Flow

  1. Switch back to the Amazon Connect dashboard, then navigate to Contact Flows under Routing on the left sidebar.
  2. Select Create contact flow and enter a name (e.g. User administration) for the contact flow.
  3. Expand the Interact menu item then click and drag Get customer input to the grid.
  4. Click the Get customer input item, and set the following properties:
    • Enable Text to speech, then add a greeting text (e.g. Welcome to the cloud call center. What would you like assistance with?).
    • Ensure that Interpret as is set to Text.
    • Choose the Amazon Lex option, then add the Lex Bot name (e.g. UserAdministration) and set the alias to $LATEST to ensure it uses the latest build of the bot.
  5. Under Intents, select Add a parameter, then enter the password reset intent for the Lex Bot (e.g. ResetPW).
  6. Select Save to save the configuration. It’s worth noting that if you wanted to send the user’s mobile number through to your Lex bot for verification purposes, this can be done by sending a session attribute as shown below. The phone number will be passed to the Lambda function in the sessionAttributes object.
  7. On the left sidebar, expand Terminate/Transfer then drag and drop Disconnect/Hang up onto the grid.
  8. Connect the Start entry point to the Get Customer Input box and connect all the branches of the Get Customer Input Box to the Disconnect/Hang up box as shown below.
    We could have added more complex flows to deal with unrecognised intents or handle additional requests that our Lex bot isn’t configured for (both of which would be forwarded to a human agent), however this is outside the scope of this blog post.
  9. In the top right-hand corner above the grid, select the down arrow, then Save & Publish.
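If you do pass the caller's number through as a session attribute (step 6), the Lambda function behind the bot could read it along the following lines. This is only a sketch: it assumes the Lex (V1) Lambda event and response format, and the attribute key PhoneNumber is a hypothetical name that would have to match whatever key you set in the contact flow. The reply text is also just a placeholder.

```python
def caller_number(event):
    """Return the caller's number from the session attributes, or None if absent.

    "PhoneNumber" is a hypothetical key; use the key configured in the contact flow.
    """
    session_attributes = event.get("sessionAttributes") or {}
    return session_attributes.get("PhoneNumber")


def lambda_handler(event, context):
    """Close the conversation with a message that depends on the caller's number."""
    number = caller_number(event)
    if number:
        content = "Thanks, I have your number on record for verification."
    else:
        content = "I could not determine the number you are calling from."
    return {
        "sessionAttributes": event.get("sessionAttributes") or {},
        "dialogAction": {
            "type": "Close",
            "fulfillmentState": "Fulfilled",
            "message": {"contentType": "PlainText", "content": content},
        },
    }


if __name__ == "__main__":
    sample_event = {"sessionAttributes": {"PhoneNumber": "+15555550100"}}
    print(lambda_handler(sample_event, None)["dialogAction"]["message"]["content"])
```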

Setting the Contact Flow

Now that we have a contact flow created, we need to attach it to the phone number we provisioned earlier.

  1. On the left sidebar in the Amazon Connect console, select Phone Numbers under the Routing menu then select the phone number listed.
  2. Under the Contact Flow/IVR dropdown menu, select the Contact flow you created, then select Save.

Testing the Contact Flow

Now that we’ve associated the contact flow with the phone number, you’re ready for testing! Remember, if you’ve provisioned a number in the US (and you’re overseas), you can dial for free using Google Hangouts.

That’s it! You now have a fully functioning chatbot that can be called and spoken to. From here, you can add more intents to build a bot that can handle several simple user administration tasks.

A few things worth noting:

  • You may notice that Lex repeats the user ID as a number, rather than individual digits. Unfortunately, Amazon Connect doesn’t support SSML content from Lex at this time; however, it’s on the product roadmap.
  • You can view missed utterances on the Monitoring tab on your Lex bot and potentially add them to existing intents. This is a great way to monitor and expand on the capability of your bot.

Patching LINUX EC2 through SSM

This blog deals with configuring patches for Linux EC2 instances through AWS Systems Manager (SSM).

Mentioned below is the link for patching Windows-based EC2 instances using SSM:

https://blog.kloud.com.au/2017/05/08/patching-ec2-through-ssm/

The configuration has three major sections:

  • EC2 instance configuration for patching
  • Default Patching Baseline Configuration
  • Maintenance Window configuration.

1 Instance Configuration

We will start with the first section, which is configuring the instances to be patched. This requires the following tasks:

  1. Create an Amazon EC2 role for patching with two policies attached
    • AmazonEC2RoleforSSM
    • AmazonSSMFullAccess
  2. Assign Roles to the EC2 Instances
  3. Configure Tags to ensure patching in groups.

Mentioned below are the detailed steps for the creation of an IAM role for Instances to be Patched using Patch Manager.

Step 1: Select IAM -> Roles and click on Create New Role

Step 2: Select Role Type -> Amazon EC2

Step 3: Under Attach Policy, select the following and click Next

  • AmazonEC2RoleforSSM
  • AmazonSSMFullAccess

Step 4: Enter the Role Name and select Create Role (at the bottom of the page)

Now you have gone through the first step in your patch management journey.
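If you would rather script the role creation than click through the console, the steps above can be sketched with boto3 roughly as follows. Treat it as an illustration under assumptions: the function name and role name are made up, and credentials with IAM permissions are assumed to be configured already. Note that the console creates an instance profile for an EC2 role automatically, whereas through the API it has to be created explicitly.

```python
import json

# Trust policy that lets EC2 instances assume the role.
EC2_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# The two AWS managed policies named in the steps above.
MANAGED_POLICY_ARNS = [
    "arn:aws:iam::aws:policy/service-role/AmazonEC2RoleforSSM",
    "arn:aws:iam::aws:policy/AmazonSSMFullAccess",
]


def create_patching_role(role_name):
    """Create the role, attach both policies and wrap it in an instance profile."""
    import boto3  # imported lazily so the module loads without boto3 installed

    iam = boto3.client("iam")
    iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(EC2_TRUST_POLICY),
    )
    for arn in MANAGED_POLICY_ARNS:
        iam.attach_role_policy(RoleName=role_name, PolicyArn=arn)
    # The instance profile reuses the role name here purely for convenience.
    iam.create_instance_profile(InstanceProfileName=role_name)
    iam.add_role_to_instance_profile(InstanceProfileName=role_name, RoleName=role_name)


if __name__ == "__main__":
    # Only prints the trust policy; call create_patching_role("MyPatchingRole")
    # (a hypothetical name) to actually create resources.
    print(json.dumps(EC2_TRUST_POLICY, indent=2))
```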

Instances should be configured to use the role created above (or any role that has the AmazonEC2RoleforSSM and AmazonSSMFullAccess policies attached) to ensure proper patch management.

We need to organise our AWS-hosted servers into groups, because no one in their right mind wants to patch all the servers in one go.

To accomplish that, we need to use Patch Groups.

Patch groups can be created by simply tagging EC2 instances with a tag key of “Patch Group”. The key is case sensitive.

We can use Group01 and Group02 as the value for the “Patch Group” tag.

To utilize patch groups, all EC2 instances should be tagged to support cumulative patch management based on Patch Groups.
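Tagging a handful of instances by hand is fine, but for a larger fleet the same tag can be applied with boto3. The sketch below assumes credentials are already configured. The function names and the instance ID are placeholders.

```python
def patch_group_tag_request(instance_ids, group):
    """Build the keyword arguments for the EC2 create_tags call."""
    return {
        "Resources": list(instance_ids),
        # The key must be exactly "Patch Group" (case sensitive).
        "Tags": [{"Key": "Patch Group", "Value": group}],
    }


def tag_instances(instance_ids, group):
    """Apply the Patch Group tag to the given instances."""
    import boto3  # imported lazily so the module loads without boto3 installed

    boto3.client("ec2").create_tags(**patch_group_tag_request(instance_ids, group))


if __name__ == "__main__":
    # The instance ID below is a placeholder.
    print(patch_group_tag_request(["i-0123456789abcdef0"], "Group01"))
```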

Congratulations, you have completed the first section of the configuration. Keep following; there are just two to go.

2 Default Patch Baseline Configuration

Patch baseline configuration controls which patches are installed on the instances, based on the following classification of patches:

  • Product Type: OS version
  • Classification: CriticalUpdates, SecurityUpdates, ServicePacks, UpdateRollUps
  • Severity: Critical, Important, etc.

Mentioned below are the steps for creating a patch baseline for Amazon Linux EC2 instances.

Note: The process is quite similar for Ubuntu, Red Hat Enterprise Linux and Windows.

Step 01: Select EC2 -> Select Patch Baselines (under the Systems Manager Services section) and click on Create Patch Baseline

Step 03: Fill in the details

  • Name: MyAmazonLinuxPatchBaseline
  • Description: MyAmazonLinuxPatchBaseline
  • Operating System: Amazon Linux (choose the operating system based on the workload)

Step 04: Configure Approval Rules based on the patching policy.

Step 05: Complete Patch Exceptions and click on Create Patch Baseline

Step 06: Select the created Patch Baseline

Step 07: Go to Actions and click Set Default Patch Baseline

Step 08: Click on Set Default Patch Baseline
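For reference, the baseline steps above map onto two SSM API calls, create_patch_baseline and register_default_patch_baseline. The following boto3 sketch shows one way to wire them together. The approval rule (security patches of Critical or Important severity, approved after 7 days) is an example policy I have assumed, not one prescribed by this guide, so adjust the filter values to suit your own patching policy.

```python
def patch_baseline_request(name, approve_after_days=7):
    """Build the keyword arguments for the SSM create_patch_baseline call."""
    return {
        "Name": name,
        "Description": name,
        "OperatingSystem": "AMAZON_LINUX",
        "ApprovalRules": {
            "PatchRules": [
                {
                    "PatchFilterGroup": {
                        "PatchFilters": [
                            # Example filter values (assumptions); tune to your policy.
                            {"Key": "CLASSIFICATION", "Values": ["Security"]},
                            {"Key": "SEVERITY", "Values": ["Critical", "Important"]},
                        ]
                    },
                    "ApproveAfterDays": approve_after_days,
                }
            ]
        },
    }


def create_default_baseline(name):
    """Create the baseline and make it the default for its operating system."""
    import boto3  # imported lazily so the module loads without boto3 installed

    ssm = boto3.client("ssm")
    baseline_id = ssm.create_patch_baseline(**patch_baseline_request(name))["BaselineId"]
    ssm.register_default_patch_baseline(BaselineId=baseline_id)
    return baseline_id


if __name__ == "__main__":
    print(patch_baseline_request("MyAmazonLinuxPatchBaseline"))
```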


At this point, the instances to be patched are configured and we have also configured the patch policies. In the next section, we provide AWS with the when (date and time) and the what (task) of the patching cycle.

3 Maintenance Windows Configuration

As the name specifies, Maintenance Windows give us the option to Run Tasks on EC2 Instances on a specified schedule.

What we wish to accomplish with Maintenance Windows is to run a command (AWS-RunPatchBaseline), but on a given schedule and on a subset of our servers. This is where all the above configurations gel together to make patching work.

Configuring Maintenance Windows consists of the following tasks:

  • IAM role for Maintenance Windows
  • Creating the Maintenance Window itself
  • Registering Targets (Selecting servers for the activity)
  • Registering Tasks (Selecting tasks to be executed)

Mentioned below are the detailed steps for configuring all the above.

Step 01: Create a Role with the following policy attached

  • AmazonSSMMaintenanceWindowRole

Step 02: Enter the Role Name and Role Description

Step 03: Click on Role and copy the Role ARN

Step 04: Click on Edit Trust Relationships

Step 05: Add the following value under the Principal section of the JSON file

"Service": "ssm.amazonaws.com"

Step 06: Click on Update Trust Relationships (on the bottom of the page)
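The trust relationship edit in steps 04 to 06 can also be scripted. The sketch below is my own illustration rather than an official recipe. It adds ssm.amazonaws.com to an existing trust policy without dropping whatever service principal is already there, then pushes the result with IAM's update_assume_role_policy call. It assumes each statement's Principal is a JSON object, which is the case for the role the console creates.

```python
import copy
import json


def add_service_principal(trust_policy, service):
    """Return a copy of the trust policy with `service` added to every statement.

    Assumes each statement's Principal is a mapping (not the string "*").
    """
    policy = copy.deepcopy(trust_policy)
    for statement in policy["Statement"]:
        principal = statement.setdefault("Principal", {})
        current = principal.get("Service", [])
        services = [current] if isinstance(current, str) else list(current)
        if service not in services:
            services.append(service)
        principal["Service"] = services[0] if len(services) == 1 else services
    return policy


def update_role_trust(role_name, trust_policy):
    """Write the updated trust policy back to the role."""
    import boto3  # imported lazily so the module loads without boto3 installed

    boto3.client("iam").update_assume_role_policy(
        RoleName=role_name,
        PolicyDocument=json.dumps(trust_policy),
    )


if __name__ == "__main__":
    existing = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ec2.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }
    print(json.dumps(add_service_principal(existing, "ssm.amazonaws.com"), indent=2))
```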


At this point the IAM role for the maintenance window has been configured. The next section details the configuration of the maintenance window.

Step 01: Click on EC2 and select Maintenance Windows (under the Systems Manager Shared Resources section)

Step 02: Enter the details of the Maintenance Window and click on Create Maintenance Window

At this point the Maintenance Window has been created. The next task is to Register Targets and Register Tasks for this maintenance window.

Step 01: Select the Maintenance Window created and click on Actions

Step 02: Select Register Targets

Step 03: Enter Target Name, Description, Owner Information and select the Tag Name and Tag Value

Step 04: Select Register Targets

At this point the targets for the maintenance window have been configured. This leaves us with the last activity in the configuration which is to register the tasks to be executed in the maintenance window.

Step 01: Select the Maintenance Window and click on Actions

Step 02: Select Register run command task

Step 03: Select AWS-RunPatchBaseline from the Document section

Step 04: Click on Registered targets and select the instances based on the Patch Group tag

Step 05: Select the operation, Scan or Install, based on the desired function (keep in mind that an Install will result in a server restart).

Step 06: Select the MaintenanceWindowsRole

Step 07: Click on Register Tasks

After completing the configuration, the Registered Task will run on the Registered Targets based on the schedule specified in the Maintenance Window.
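The three console activities above (create the window, register targets, register the task) correspond to three SSM API calls. Here is a hedged boto3 sketch of the whole flow. The schedule (02:00 every Sunday), the duration and cutoff, and the concurrency and error limits are arbitrary example values I have assumed, and the function names are my own. The role ARN is the one copied in Step 03 of the IAM role configuration.

```python
def window_request(name, schedule="cron(0 2 ? * SUN *)"):
    """Arguments for create_maintenance_window (schedule and sizes are examples)."""
    return {
        "Name": name,
        "Schedule": schedule,
        "Duration": 3,  # hours the window stays open
        "Cutoff": 1,    # stop starting new tasks this many hours before the end
        "AllowUnassociatedTargets": False,
    }


def target_request(window_id, patch_group):
    """Arguments for register_target_with_maintenance_window, selecting by tag."""
    return {
        "WindowId": window_id,
        "ResourceType": "INSTANCE",
        "Targets": [{"Key": "tag:Patch Group", "Values": [patch_group]}],
    }


def task_request(window_id, window_target_id, role_arn, operation="Scan"):
    """Arguments for register_task_with_maintenance_window.

    operation is "Scan" or "Install"; Install may restart the server.
    """
    return {
        "WindowId": window_id,
        "Targets": [{"Key": "WindowTargetIds", "Values": [window_target_id]}],
        "TaskArn": "AWS-RunPatchBaseline",
        "TaskType": "RUN_COMMAND",
        "ServiceRoleArn": role_arn,
        "MaxConcurrency": "1",
        "MaxErrors": "1",
        "TaskInvocationParameters": {
            "RunCommand": {"Parameters": {"Operation": [operation]}}
        },
    }


def configure_patching(name, patch_group, role_arn, operation="Scan"):
    """Create the window, then register its targets and the patching task."""
    import boto3  # imported lazily so the module loads without boto3 installed

    ssm = boto3.client("ssm")
    window_id = ssm.create_maintenance_window(**window_request(name))["WindowId"]
    target_id = ssm.register_target_with_maintenance_window(
        **target_request(window_id, patch_group)
    )["WindowTargetId"]
    ssm.register_task_with_maintenance_window(
        **task_request(window_id, target_id, role_arn, operation)
    )
    return window_id


if __name__ == "__main__":
    # "mw-example" is a placeholder window ID used only for printing the request.
    print(target_request("mw-example", "Group01"))
```

Starting with a Scan task is a low-risk way to check that targets and permissions line up before switching the operation to Install.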

The status of the Maintenance Window can be seen in the History section.

I hope this guide gets you through the initial patching configuration for your EC2 instances in Amazon.

Thanks for Reading.

Update FSTAB on multiple EC2 instances using Run Commands

Scenario:

  • The customer is running multiple Linux EC2 instances in AWS.
  • The customer reports that instances are losing mount points after a reboot.

Solution:

The resolution requires updating the fstab file on all the instances.

fstab is a system configuration file on Linux and other Unix-like operating systems that contains information about major filesystems on the system. It takes its name from file systems table, and it is located in the /etc directory (ref: http://www.linfo.org/etc_fstab.html).

In order to update files on multiple servers, we will utilize the following:

  • The echo command with the append operator (>>) to update the text file through the shell
  • SSM Run Command to execute the command on multiple machines

Note: All the concerned EC2 instances should have Systems Manager (SSM) configured.

Step 1: Log in to the AWS Console and click EC2

Step 2: Click on Run Command in the Systems Manager Services section

Step 3: Click on Run Command in the main panel

Step 4: Select AWS-RunShellScript

Step 5: Select Targets

Note: Targets can be selected manually, or we can use tags to perform the same activity on multiple instances with the matching tag.

Step 6: Enter the following information:

  • Execute on: Specifies the number of targets on which the command can be executed concurrently. Running commands concurrently saves execution time.
  • Stop After: 1 errors
  • Timeout (seconds): leave the default of 600 seconds

Step 7: Select the commands section and paste the command

echo '10.x.x.x:/ /share2 nfs4 nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,_netdev 0 0' >> /etc/fstab
echo '10.x.x.x:/ /share1 nfs4 nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,_netdev 0 0' >> /etc/fstab
echo '192.x.x.x:/ /backup1 nfs4 nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,_netdev 0 0' >> /etc/fstab
echo '172.x.x.x:/ /backup2 nfs4 nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,_netdev 0 0' >> /etc/fstab            
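One caveat with a plain echo and >> is that running the command twice appends the same line twice. As a possible refinement (my own suggestion, not part of the original procedure), the sketch below builds commands that only append an entry if it is not already present, and shows how the same Run Command could be sent with boto3's send_command. The tag key and value used for targeting, and the MaxConcurrency value, are made-up examples. MaxErrors simply mirrors the console's "Stop After" value and is assumed to correspond to it.

```python
import shlex


def append_once_command(entry, path="/etc/fstab"):
    """Build a shell command that appends `entry` to `path` only if it is missing."""
    quoted = shlex.quote(entry)
    # grep -q: quiet, -x: match the whole line, -F: treat the entry as a fixed string
    return f"grep -qxF -- {quoted} {path} || echo {quoted} >> {path}"


def run_command_request(entries, tag_key, tag_value):
    """Build the keyword arguments for the SSM send_command call."""
    return {
        "DocumentName": "AWS-RunShellScript",
        "Targets": [{"Key": f"tag:{tag_key}", "Values": [tag_value]}],
        "Parameters": {"commands": [append_once_command(e) for e in entries]},
        "TimeoutSeconds": 600,
        "MaxConcurrency": "5",  # example value
        "MaxErrors": "1",       # assumed equivalent of "Stop After: 1 errors"
    }


def update_fstab(entries, tag_key, tag_value):
    """Send the command and return its ID for checking the status later."""
    import boto3  # imported lazily so the module loads without boto3 installed

    ssm = boto3.client("ssm")
    response = ssm.send_command(**run_command_request(entries, tag_key, tag_value))
    return response["Command"]["CommandId"]


if __name__ == "__main__":
    entry = (
        "10.x.x.x:/ /share2 nfs4 nfsvers=4.1,rsize=1048576,wsize=1048576,"
        "hard,timeo=600,retrans=2,_netdev 0 0"
    )
    print(append_once_command(entry))
```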

 

Step 8: Click on Run

Step 9: Click on the command ID to get an update on the success or failure of the execution