Dynamically rename an AWS Windows host deployed via a sysprepped AMI

One of my customers presented me with a unique problem last week: they needed to rename a Windows Server 2016 host deployed from a custom AMI without rebooting during the bootstrap process. The lack of a reboot rules out the simple option of using the PowerShell Rename-Computer cmdlet. While there are a number of methods to achieve this, the option we came up with was to dynamically update the sysprep unattended answer file with a PowerShell script before the unattended install runs during the first boot of a sysprepped instance.

To begin, we looked at the Windows Server 2016 sysprep process on an EC2 instance to understand what’s required. It’s important to note that this process is slightly different to Server 2012 on AWS, as EC2Launch has replaced EC2Config. The high-level process we need to perform is as follows:

  1. Deploy a Windows Server 2016 AMI
  2. Connect to the Windows instance and customise it as required
  3. Create a startup PowerShell script to set the hostname that will be invoked on boot before the unattended installation begins
  4. Run InitializeInstance.ps1 -Schedule to register a scheduled task that initializes the instance on next boot
  5. Modify SysprepInstance.ps1 to replace the cmdline registry key after sysprep is complete and prior to shutdown
  6. Run SysprepInstance.ps1 to sysprep the machine.

Steps 1 and 2 are fairly self-explanatory, so let’s start by taking a look at step 3.

After a host is sysprepped, the CmdLine registry value located under HKLM:\System\Setup is populated with the windeploy.exe path, indicating that it should be invoked after reboot to initiate the unattended installation. Our aim is to ensure that the answer file (unattend.xml) is modified to include the computer name we want before windeploy.exe executes. To do this, I’ve created a PowerShell script named startup.ps1, placed in C:\Scripts, that bases the hostname on the internal IP address of the instance.
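A minimal sketch of such a script is below. The instance metadata endpoint and the EC2Launch answer file location are standard for Server 2016; the IP-based naming scheme and the XML handling are illustrative assumptions, so adapt them to your environment.

```powershell
# startup.ps1 - sketch: derive the hostname from the instance's private IP,
# write it into the sysprep answer file, then hand off to windeploy.exe.
$ErrorActionPreference = 'Stop'

# Private IPv4 address from the EC2 instance metadata service.
$ip = Invoke-RestMethod -Uri 'http://169.254.169.254/latest/meta-data/local-ipv4'

# Illustrative naming scheme: 10.1.2.3 becomes IP-10-1-2-3.
# Keep the result within the 15-character NetBIOS limit for your address range.
$hostname = 'IP-' + ($ip -replace '\.', '-')

# Default EC2Launch answer file location on Server 2016.
$unattendPath = 'C:\ProgramData\Amazon\EC2-Windows\Launch\Sysprep\Unattend.xml'
[xml]$unattend = Get-Content -Path $unattendPath

# Set (or create) the ComputerName element in the specialize pass.
$ns = 'urn:schemas-microsoft-com:unattend'
$nsmgr = New-Object System.Xml.XmlNamespaceManager($unattend.NameTable)
$nsmgr.AddNamespace('u', $ns)
$shellSetup = $unattend.SelectSingleNode(
    "//u:settings[@pass='specialize']/u:component[@name='Microsoft-Windows-Shell-Setup']", $nsmgr)
$computerName = $shellSetup.SelectSingleNode('u:ComputerName', $nsmgr)
if (-not $computerName) {
    $computerName = $unattend.CreateElement('ComputerName', $ns)
    [void]$shellSetup.AppendChild($computerName)
}
$computerName.InnerText = $hostname
$unattend.Save($unattendPath)

# This script replaces windeploy.exe as the Setup CmdLine (see step 5),
# so it must invoke windeploy.exe itself to resume the unattended install.
& "$env:windir\System32\oobe\windeploy.exe"
```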

Once this script is in place, we can move to step 4, where we schedule the InitializeInstance.ps1 script to run on first boot. This can be done by running InitializeInstance.ps1 -Schedule, located in C:\ProgramData\Amazon\EC2-Windows\Launch\Scripts.
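From an elevated PowerShell session, that’s a single command:

```powershell
# Register the scheduled task that runs InitializeInstance.ps1 on the next boot.
& 'C:\ProgramData\Amazon\EC2-Windows\Launch\Scripts\InitializeInstance.ps1' -Schedule
```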

Step 5 requires a modification of the SysprepInstance.ps1 file to ensure the shutdown after sysprep is delayed, allowing modification of the CmdLine registry entry mentioned in step 3. The aim here is to replace the windeploy.exe path with our startup script, then shut down the host. The modifications to the PowerShell script to accommodate this are made after the “#Finally, perform Sysprep” comment.
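A sketch of that modification is below. The CmdLine value under HKLM:\System\Setup and the sysprep switches are standard; the exact argument list and paths are illustrative rather than a copy of the shipped EC2Launch script.

```powershell
# Finally, perform Sysprep (modified)
# Run sysprep with /quit instead of /shutdown so the host stays up long enough
# for us to edit the registry once generalisation completes.
$unattendPath = 'C:\ProgramData\Amazon\EC2-Windows\Launch\Sysprep\Unattend.xml'
Start-Process -FilePath "$env:windir\System32\Sysprep\sysprep.exe" `
    -ArgumentList "/oobe /generalize /quiet /quit /unattend:$unattendPath" -Wait

# Replace the windeploy.exe path with our startup script so it runs first on next boot.
Set-ItemProperty -Path 'HKLM:\System\Setup' -Name 'CmdLine' `
    -Value 'powershell.exe -ExecutionPolicy Bypass -File C:\Scripts\startup.ps1'

# Perform the shutdown that sysprep would otherwise have done itself.
Stop-Computer -Force
```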

Finally, run the SysprepInstance.ps1 script.

Once the host enters the stopped state, you can create an AMI from it and use this as your image.

Viewing AWS CloudFormation and bootstrap logs in CloudWatch

Mature cloud platforms such as AWS and Azure have simplified infrastructure provisioning with toolsets such as CloudFormation and Azure Resource Manager (ARM) to provide an easy way to create and manage a collection of related infrastructure resources. Both toolsets allow developers and system administrators to use JavaScript Object Notation (JSON) to specify resources to provision, as well as provide the means to bootstrap systems, effectively allowing for single-click, fully configured environment deployments.

While these toolsets are an excellent means of avoiding the RSI that comes from repetitive, monotonous tasks, the initial writing and testing of templates and scripts can be incredibly time-consuming. Troubleshooting and debugging bootstrap scripts usually involves logging into hosts and checking log files. These hosts are often behind firewalls, requiring jump hosts that may be MFA-integrated, all of which reduces the appetite for infrastructure as code.

One of my favourite things about Azure is the ability to watch the ARM provisioning and host bootstrapping process through the console. Unless there’s a need to rerun a script on a host and watch it in real time, a deployment failure can be troubleshot by viewing the ARM deployment history or the relevant Virtual Machine Extension. Examples can be seen below:

This screenshot shows the ARM resources have been deployed successfully.


This screenshot shows the DSC extension status, with more deployment details on the right pane.


While this seems simple enough in Azure, I found it a little less straightforward in AWS. Like Azure, bootstrap logs for the instance reside on the host itself; however, the logs aren’t shown in the console by default. Although there’s an AWS blog post on viewing CloudFormation logs in CloudWatch, it was tailored to Linux instances. Keen for a similar experience to Azure, I decided to put together the following walkthrough to have bootstrap logs appear in CloudWatch.

To enable CloudWatch for instances dynamically, the first step is to create an IAM role that can be attached to EC2 instances when they’re launched, providing them with access to CloudWatch. The following JSON code shows a sample policy I’ve used to define my IAM role.
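A minimal sketch of that policy is below; it grants the CloudWatch Logs permissions the plugin needs to create log groups and streams and push events. Scope the Resource down further if your environment requires it.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:DescribeLogGroups",
        "logs:DescribeLogStreams",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    }
  ]
}
```

The role also needs a trust relationship allowing ec2.amazonaws.com to assume it, so that it can be attached to instances via an instance profile.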

The next task is to create a script that can be used at the start of the bootstrap process to dynamically enable the CloudWatch plugin on the EC2 instance. The plugin is disabled by default and, when enabled, requires a restart of the EC2Config service to take effect. I used the following script:
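A sketch of that script is below. The Config.xml path and the AWS.EC2.Windows.CloudWatch.PlugIn name match a default EC2Config installation; treat the rest as illustrative.

```powershell
# set-cloudwatch.ps1 - sketch: enable the CloudWatch plugin in EC2Config's
# settings file, then bounce the service so the change takes effect.
$configPath = 'C:\Program Files\Amazon\Ec2ConfigService\Settings\Config.xml'
[xml]$config = Get-Content -Path $configPath

# Flip the CloudWatch plugin's state from Disabled to Enabled.
$plugin = $config.Ec2ConfigurationSettings.Plugins.Plugin |
    Where-Object { $_.Name -eq 'AWS.EC2.Windows.CloudWatch.PlugIn' }
$plugin.State = 'Enabled'
$config.Save($configPath)

# Kill the EC2Config process; the service's recovery settings bring it back up.
Stop-Process -Name 'Ec2Config' -Force
```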

It’s worth noting that the EC2Config service is set to recover by default and therefore starts by itself after the process is killed.

Now that we’ve got a script to enable the CloudWatch plugin, we need to change the default CloudWatch config file on the host prior to enabling the CloudWatch plugin. The default CloudWatch config file is AWS.EC2.Windows.CloudWatch.json and contains details of all the logs that should be monitored as well as defining CloudWatch log groups and log streams. Because there’s a considerable number of changes made to the default file to achieve the desired result, I prefer to create and store a customised version of the file in S3. As part of the bootstrap process, I download it to the host and place it in the default location. My customised CloudWatch config file looks like the following:
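Rather than paste the whole file in one hit, here’s its overall shape (a sketch): the Components array holds both the log inputs and the CloudWatch Log Group outputs, and the Flows section wires inputs to outputs. The snippets that follow are the entries that populate those two sections.

```json
{
  "EngineConfiguration": {
    "PollInterval": "00:00:15",
    "Components": [],
    "Flows": {
      "Flows": []
    }
  }
}
```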

Let’s take a closer look at what’s happening here. The first three components are Windows event logs I’m choosing to monitor:
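Reconstructed as a sketch, they look like the following. The Id values are arbitrary labels that the flows section refers back to, and Levels is set to 7 here to capture error, warning, and information messages (explained below):

```json
[
  {
    "Id": "ApplicationEventLog",
    "FullName": "AWS.EC2.Windows.CloudWatch.EventLog.EventLogInputComponent,AWS.EC2.Windows.CloudWatch",
    "Parameters": {
      "LogName": "Application",
      "Levels": "7"
    }
  },
  {
    "Id": "SystemEventLog",
    "FullName": "AWS.EC2.Windows.CloudWatch.EventLog.EventLogInputComponent,AWS.EC2.Windows.CloudWatch",
    "Parameters": {
      "LogName": "System",
      "Levels": "7"
    }
  },
  {
    "Id": "DSCEventLog",
    "FullName": "AWS.EC2.Windows.CloudWatch.EventLog.EventLogInputComponent,AWS.EC2.Windows.CloudWatch",
    "Parameters": {
      "LogName": "Microsoft-Windows-DSC/Operational",
      "Levels": "7"
    }
  }
]
```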

You’ll notice I’ve included the Desired State Configuration (DSC) event logs, as DSC is my configuration management tool of choice when it comes to Windows. When defining a Windows event log, a level needs to be specified, indicating the verbosity of the output. The values are as follows:

1 – Only error messages uploaded.
2 – Only warning messages uploaded.
4 – Only information messages uploaded.

You can add values together to include more than one type of message. For example, 3 means that error messages (1) and warning messages (2) get uploaded. A value of 7 means that error messages (1), warning messages (2), and information messages (4) get uploaded. For those familiar with Linux permissions, this probably looks very familiar! 🙂

To monitor other Windows event logs, you can create additional components in the JSON template. The value of “LogName” can be found by viewing the properties of the event log, as shown below:

This screenshot shows the event log’s properties, where the full log name used for “LogName” can be found.

The next two components monitor the two logs that are relevant to the bootstrap process:
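A sketch of these two components follows. The directory paths and filters match default install locations; the timestamp formats are my best guess at the two log formats and, as discussed below, may take some trial and error:

```json
[
  {
    "Id": "CfnInitLog",
    "FullName": "AWS.EC2.Windows.CloudWatch.CustomLog.CustomLogInputComponent,AWS.EC2.Windows.CloudWatch",
    "Parameters": {
      "LogDirectoryPath": "C:\\cfn\\log",
      "TimestampFormat": "yyyy-MM-dd HH:mm:ss,fff",
      "Encoding": "UTF-8",
      "Filter": "cfn-init.log",
      "CultureName": "en-US",
      "TimeZoneKind": "Local"
    }
  },
  {
    "Id": "EC2ConfigLog",
    "FullName": "AWS.EC2.Windows.CloudWatch.CustomLog.CustomLogInputComponent,AWS.EC2.Windows.CloudWatch",
    "Parameters": {
      "LogDirectoryPath": "C:\\Program Files\\Amazon\\Ec2ConfigService\\Logs",
      "TimestampFormat": "yyyy-MM-ddTHH:mm:ss.fffZ",
      "Encoding": "ASCII",
      "Filter": "EC2ConfigLog.txt",
      "CultureName": "en-US",
      "TimeZoneKind": "UTC"
    }
  }
]
```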

Once again, a lot of this is self-explanatory. The “LogDirectoryPath” specifies the absolute directory path to the relevant log file, and the filter specifies the log filename to be monitored. The tricky thing here was getting the “TimestampFormat” parameter correct; I used an article on MSDN plus trial and error to work this out. Additionally, it’s worth noting that cfn-init.log’s timestamps are in the local time of the EC2 instance, while EC2ConfigLog.txt uses UTC. Getting this right ensures you have the correct timestamps in CloudWatch.

Next, we need to define the log groups in CloudWatch that will hold the log streams. I’ve got three separate Log Groups defined:
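Each Log Group is a CloudWatchLogsOutput component (sketch below). AccessKey and SecretKey are left blank because the IAM role attached to the instance supplies the credentials; apart from cfninit-Log-Group, which appears again later, the group names and region are examples:

```json
[
  {
    "Id": "CloudWatchWindowsEventLogs",
    "FullName": "AWS.EC2.Windows.CloudWatch.CloudWatchLogsOutput,AWS.EC2.Windows.CloudWatch",
    "Parameters": {
      "AccessKey": "",
      "SecretKey": "",
      "Region": "ap-southeast-2",
      "LogGroup": "WindowsEventLogs-Log-Group",
      "LogStream": "{instance_id}"
    }
  },
  {
    "Id": "CloudWatchCfnInit",
    "FullName": "AWS.EC2.Windows.CloudWatch.CloudWatchLogsOutput,AWS.EC2.Windows.CloudWatch",
    "Parameters": {
      "AccessKey": "",
      "SecretKey": "",
      "Region": "ap-southeast-2",
      "LogGroup": "cfninit-Log-Group",
      "LogStream": "{instance_id}"
    }
  },
  {
    "Id": "CloudWatchEc2ConfigLog",
    "FullName": "AWS.EC2.Windows.CloudWatch.CloudWatchLogsOutput,AWS.EC2.Windows.CloudWatch",
    "Parameters": {
      "AccessKey": "",
      "SecretKey": "",
      "Region": "ap-southeast-2",
      "LogGroup": "Ec2ConfigLog-Log-Group",
      "LogStream": "{instance_id}"
    }
  }
]
```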

You’ll also notice that the Log Streams are named after the instance ID. Each instance that is launched will create a separate log stream in each log group that can be identified by its instance ID.

Finally, the flows are defined:
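Each flow pairs one or more input component Ids with an output component Id; using the example Ids from the sketches above, the section looks like this:

```json
"Flows": {
  "Flows": [
    "(ApplicationEventLog,SystemEventLog,DSCEventLog),CloudWatchWindowsEventLogs",
    "CfnInitLog,CloudWatchCfnInit",
    "EC2ConfigLog,CloudWatchEc2ConfigLog"
  ]
}
```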

This section specifies which logs are assigned to which Log Group. I’ve put all the WindowsEventLogs in a single Log Group, as it’s easy to search based on the event log name. It’s not as easy to differentiate between cfn-init.log and EC2ConfigLog.txt entries, so I’ve split them out into their own Log Groups.

So how do we get this customised CloudWatch config file into place? My preferred method is to upload the file along with the set-cloudwatch.ps1 script to a bucket in S3, then pull them down and run the PowerShell script as part of the bootstrap process. I’ve included a subset of my standard CloudFormation template below, showing the monitoring config key that’s part of the ConfigSet.
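A cut-down sketch of that section is below. The bucket URL and the other configSet entries are placeholders; the key point is that cfn-init processes the files section before commands, so the customised config file is already in place when set-cloudwatch.ps1 enables the plugin.

```json
"Metadata": {
  "AWS::CloudFormation::Init": {
    "configSets": {
      "config": ["monitoring", "application"]
    },
    "monitoring": {
      "files": {
        "C:\\cfn\\scripts\\set-cloudwatch.ps1": {
          "source": "https://s3-ap-southeast-2.amazonaws.com/my-bootstrap-bucket/set-cloudwatch.ps1"
        },
        "C:\\Program Files\\Amazon\\Ec2ConfigService\\Settings\\AWS.EC2.Windows.CloudWatch.json": {
          "source": "https://s3-ap-southeast-2.amazonaws.com/my-bootstrap-bucket/AWS.EC2.Windows.CloudWatch.json"
        }
      },
      "commands": {
        "01-enable-cloudwatch": {
          "command": "powershell.exe -ExecutionPolicy Bypass -File C:\\cfn\\scripts\\set-cloudwatch.ps1",
          "waitAfterCompletion": "0"
        }
      }
    }
  }
}
```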

What does this look like in the end? Here we can see the log groups specified have been created:

This screenshot shows the log groups created in CloudWatch.

If we drill further down into the cfninit-Log-Group, we can see the instance ID of the recently provisioned host. Finally, if we click on the Instance ID, we can see the cfn-init.log file in all its glory. Yippee!

This screenshot shows the contents of the cfn-init.log stream for the newly provisioned instance.

Hmmm, it looks like my provisioning failed because a file doesn’t exist. Bootstrap monitoring has clearly served its purpose! Now all that’s left to do is tear down the infrastructure, remediate the issue, and reprovision!

The next step to reducing the amount of repetitive tasks in the infrastructure as code development process is a custom pipeline to orchestrate the provisioning and teardown workflow… More on that in another blog!

AWS CloudFormation AWS::RDS::OptionGroup Unknown option: Mirroring

Amazon recently announced Multi-AZ support for SQL Server in Sydney, which provides high availability for SQL RDS instances using SQL Server mirroring technology. In an effort to make life simpler for myself, I figured I’d write a CloudFormation template for future provisioning requests; however, it wasn’t as straightforward as I’d expected.

I began by trying to guess my way through the JSON resources, based on what I already knew from MySQL deployments. I figured the MultiAZ property was still relevant, so I hacked together a template and attempted to provision the stack, which failed with the following error:

CREATE_FAILED        |  Invalid Parameter Combination: MultiAZ property cannot be used with SQL Server DB instances, use the Mirroring option in an option group associated with the DB instance instead.

The CloudFormation output clearly provides some guidance on the correct parameters required to enable mirroring in SQL Server. I had a bit of trouble tracking down documentation for the mirroring option, but after crawling the web for sample templates, I managed to put together the correct CloudFormation template, which can be seen below.
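A cut-down sketch of the relevant resources follows. The Mirroring option name comes straight from the error output above; the engine version, instance class, and credentials are placeholder values:

```json
"Resources": {
  "SQLMirroringOptionGroup": {
    "Type": "AWS::RDS::OptionGroup",
    "Properties": {
      "EngineName": "sqlserver-ee",
      "MajorEngineVersion": "11.00",
      "OptionGroupDescription": "Option group to enable SQL Server mirroring",
      "OptionConfigurations": [
        { "OptionName": "Mirroring" }
      ]
    }
  },
  "SQLServerInstance": {
    "Type": "AWS::RDS::DBInstance",
    "Properties": {
      "Engine": "sqlserver-ee",
      "EngineVersion": "11.00.5058.0.v1",
      "LicenseModel": "license-included",
      "DBInstanceClass": "db.r3.xlarge",
      "AllocatedStorage": "200",
      "MasterUsername": "sqladmin",
      "MasterUserPassword": "NotMyRealPassword1",
      "OptionGroupName": { "Ref": "SQLMirroringOptionGroup" }
    }
  }
}
```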

Excited and thinking I had finalised my template, I attempted to create the stack in ap-southeast-2 (Sydney, Australia), only for it to fail with a different error this time…

CREATE_FAILED        | Unknown option: Mirroring

Finding this output strange, I attempted to run the CloudFormation template in eu-west-1, which completed successfully.

With no options left, I contacted AWS Support, who indicated that this is a known issue in the ap-southeast-2 region. The issue is also evident when attempting to create an option group in the GUI, where the option dropdown is greyed out, as shown below.

What it should look like:

What it currently looks like:

The suggested workaround is to manually create the SQL RDS instance in the GUI, which provides the mirroring option in the deployment wizard. Although the limitation is being treated as a priority, there’s no ETA for a resolution at the moment.

Hopefully this assists someone out there banging their head against the wall trying to get this to work!

Leveraging Cloud Storage for the Enterprise: Microsoft StorSimple – Part 1

Originally posted on Bobbie’s blog @ www.thecloudguy.info

It’s no secret that one of the biggest pain points for enterprises today is the rapid growth of unstructured data. The ability to manage, protect and archive an organisation’s most valuable assets is arguably one of the biggest strains on IT department budgets.

The advent of cloud technology has many organisations looking for a way to leverage Pay-as-You-Go cloud storage offerings to assist in the data life-cycle process. The difficulty with these offerings is that data is stored as objects rather than on file systems such as NFS and CIFS, meaning integration with existing business processes and solutions isn’t straightforward.

Cloud Storage Gateways

Cloud Storage Gateways resolve integration issues by bridging the gap between commonly used storage protocols such as iSCSI/CIFS/NFS and cloud-based object storage. Storage Gateways take the form of a network appliance, server or virtual machine that resides on-premises and typically provide storage capacity for frequently used data.

As data ages and is accessed less frequently, it is moved into cloud storage, which can cost considerably less than traditional on-premises storage offerings. Additional features, such as backup technology to protect large volumes that can no longer be protected using traditional means, are also integrated into Cloud Storage Gateways.

Microsoft StorSimple

Microsoft have a competitive hybrid cloud storage offering called Microsoft StorSimple that addresses a wide range of existing business pain points, such as backup and Disaster Recovery.

Microsoft StorSimple is a physical on-premises storage system that uses three tiers of storage: SSD, SAS, and cloud storage. A number of models are offered based on storage and performance requirements, however StorSimple’s ability to leverage cloud storage as a cold tier significantly reduces its on-premises footprint compared to other storage offerings.

Some of the main features of StorSimple include:

  • Storage tiering – StorSimple dynamically moves data between the tiers of disk based on how active data is, providing efficient data storage and maximum performance for heavily used data.
  • iSCSI protocol support – StorSimple volumes are provisioned as block storage, allowing them to be used for file, database, or hypervisor attached storage.
  • Encryption in-flight and at rest in the cloud.
  • High capacity – The largest StorSimple appliance can currently store up to 500TB of deduplicated data (including cloud storage) in eight rack units.
  • Snapshot backups – Traditionally, snapshots were not considered a reliable form of backup due to their reliance on the source physical storage appliance, however StorSimple snapshots are stored in geographically redundant storage accounts in Microsoft Azure, meaning six copies of data are stored across two geographically separate regions.
  • Single pane management – All StorSimple devices in an organisation, regardless of location, can be managed from the same interface in the Azure portal.
  • Near instant DR – In the event of a disaster, a backup can be cloned and mounted on a virtual or physical StorSimple device and brought online. Only a fraction of the volume needs to reside on the target StorSimple for the volume to come online.
  • Virtual StorSimple – Virtual StorSimple devices can be provisioned in Azure to provide DR and test capabilities for volumes that were previously, or are currently, hosted on-premises.
  • Deduplication and compression – Microsoft StorSimple is able to minimise its disk footprint by using global deduplication and compression across on-premises and cloud storage.
  • Highly available physical storage architecture with dual components, to prevent a single point of failure.

StorSimple in Action

Azure Portal Dashboard and Device Management

All StorSimple devices are managed from the familiar Azure Portal. The sample below shows five StorSimple devices with four being virtual and residing in Azure, and one a physical StorSimple 8100 running on-premises.

StorSimple Manager in Azure Portal

Each device can be selected and configured via the Portal, as shown below, where I’m managing the network interface configuration for a physical StorSimple 8100 device.

Firstly, we have the configuration of the WAN interface, which is used to communicate with the cloud storage in Azure:

Managing WAN interface for StorSimple

Secondly, I can manage the iSCSI interface used to connect storage to local servers (note that the StorSimple 8000 series offers multiple redundant 10GbE iSCSI interfaces; however, the lab used for this blog post only has 1GbE switching infrastructure).

Managing iSCSI interface for StorSimple

Storage Provisioning

Compared to traditional storage systems, provisioning storage is incredibly simple (no pun intended!). Once a device is selected, the administrator navigates to the Volume Containers menu and chooses to add a Container (shown below).

Provisioning Storage on StorSimple

A Volume Container is used to group Volumes that share a storage account, bandwidth, and encryption key.

Best practice suggests a geographically redundant storage account is used in Azure to ensure data is highly available in the event of a regional disaster. Bandwidth can be throttled to ensure the WAN link is not saturated when tiering to cloud storage. If cloud storage encryption is used, an encryption key must be specified.

Once confirmed, the Volume Container is created and appears in the list of Containers for the device as shown below.

Creating a Volume Container on a StorSimple

A Volume can then be added to the Container:

Adding a Volume to a Container on a StorSimple

Notice that a usage type of “Tiered Volume” or “Archive Volume” can be selected which allows the StorSimple appliance to better judge where to initially place the data that is being moved to the Volume.

This can be handy for organisations looking to migrate stale data, kept only for compliance purposes, into the cloud. Also note that the capacity specified is always thin provisioned.

After confirming the basic settings, iSCSI initiators (servers) are specified that are allowed to access the volume. Once this is completed, the volume appears in the Volume Container page.

New Volume in Container on StorSimple

The Volume can now be attached to a host via the iSCSI protocol. Assuming that iSCSI connectivity is already configured, we log onto the server and perform a scan for new disks, which discovers the Volume recently provisioned as highlighted below.

StorSimple Volume available to Windows Host

This Volume can now be brought online, initialised, and assigned a drive letter, then used as a drive on the server, as shown below. Pretty simple stuff!

StorSimple Volume mounted as drive on server

One of the biggest benefits of StorSimple is its ability to provision traditional block storage which can be leveraged by familiar operating systems. Many other platforms offer storage in NAS form, requiring administrators to learn and manage another platform.

Data Protection and Backup

Now that we’ve provisioned a file system, how do we protect that data? Snapshots are used to protect data on a StorSimple in an efficient and reliable manner. Due to global deduplication, snapshots consume minimal storage in the cloud while still providing reliable protection due to Azure’s geographically redundant storage.

StorSimple backup policies and data protection are configured in the Azure Portal. Administrators navigate to the backup policies section of the device and add a new policy. Multiple Volumes can be grouped together within a policy to take crash-consistent snapshots across them.

Defining a backup policy on a StorSimple

A schedule is then defined. A policy can only have one schedule; however, multiple policies can be defined for a Volume.

For example, a daily backup policy can be used to perform daily snapshots of a Volume and retain them for a short period of time, while a monthly backup policy can take a snapshot of the same Volume once a month and retain those snapshots long term for compliance purposes.

Additionally, a snapshot can be stored on either local storage for rapid restores or cloud storage for resiliency.

Defining backup schedule on a StorSimple

Once the schedule is defined, it appears in the backup policies tab.

Backup policies tab for a StorSimple

Although the first cloud snapshot can take some time, as all Volume data that resides on-premises needs to be copied to the cloud, all subsequent snapshots are quick, as only the changed data is deduped and copied to cloud storage.

Below is a view from the backup catalog, after a backup is complete.

StorSimple backup catalog view

Well, that’s it from me for now! Stay tuned for part two, where I will dive deeper into disaster recovery scenarios, data mobility and performance monitoring.

Originally posted on Bobbie’s blog @ www.thecloudguy.info