Azure MFA: Architecture Selection Case Study

I've been working with a customer on designing a new Azure Multi-Factor Authentication (MFA) service to replace an existing two-factor authentication (2FA) service based on RSA Authenticator version 7.

Now, in the past few years Azure MFA solutions have typically been architected from the detail up, i.e. a 'bottom up' approach to design: which apps are we enforcing MFA on? Which token are we going to use – phone call, SMS, smart phone app? Is it a one-way or two-way message? And so on.

A customer typically knew quite quickly which MFA 'architecture' was required: the 'cloud' version of Azure MFA was really only capable of securing Azure Active Directory authenticated applications, while the 'on-prem' (local data centre or private cloud) version using Azure MFA Server (the server software Microsoft acquired with PhoneFactor) was the one used to secure 'on-prem' directory integrated applications.  There wasn't really a need to look at the 'top down' architecture.

In aid of that 'bottom up' detailed approach, my colleague Lucian posted a very handy 'cheat sheet' last year comparing the various architectures and the features they support, which you can find here: https://blog.kloud.com.au/2016/06/03/azure-multi-factor-authentication-mfa-cheat-sheet

 

New Azure MFA ‘Cloud Service’ Features

In the last few months however, Microsoft have been bulking up the Azure MFA ‘cloud’ option with new integration support for on-premise AD FS (provided with Windows Server 2016) and now on-premise Radius applications (with the recent announcement of the ‘public preview’ of the NPS Extension last month).

(On a side note, what is also interesting – and potentially reveals wider trends in token 'popularity' – is that the Azure 'Cloud' option still does not support OATH (i.e. third party) tokens or two-way SMS (i.e. reply with a 'Y' to authenticate).)

These new features have therefore forced the consideration of the primarily ‘cloud service’ architecture for both Radius and AD FS ‘on prem’ apps.

 

“It’s all about the Apps”

Now, in my experience, many organizations share similar application architectures that they like to secure with multi-factor authentication.  They broadly fit into the following categories:

1. Network Gateway Applications that use Radius or SDI authentication protocols, such as network VPN clients and application presentation virtualisation technologies such as Citrix and Remote App

2. SaaS Applications that choose to use local directory credentials (such as Active Directory) using Federation technologies such as AD FS (which support SAML or WS-Federation protocols), and

3. SaaS applications that use remote (or ‘cloud’) directory credentials for authentication such as Azure Active Directory.

Applications that are traditionally accessed via only the corporate network are being phased out for ones that exist either purely in the Cloud (SaaS) or exist in a hybrid ‘on-prem’ / ‘cloud’ architecture.

These newer application architectures allow access from untrusted networks (read: the Internet), and those access points are used by both trusted devices (read: corporate workstations or the 'Standard Operating Environment') and untrusted devices (read: BYOD or 'nefarious' devices).

In order to secure these newer points of access, 2FA or MFA solution architectures have had to adapt (or die).

What hasn't changed, however, is that a customer reviewing their choice of 2FA or MFA vendors will always want a low number of MFA vendors (read: one), and expects that MFA provider to support all of their applications.  This keeps both user training and operational costs low.  Many are also fed up with 'point solutions', i.e. securing only one or two applications and requiring a different 2FA or MFA solution per application.

 

Customer Case Study

So, with that background in mind, this section now goes through the requirements in detail to really flesh them out before making the right architectural decision.

 

Vendor Selection

Vendor selection took place prior to my working with this customer; however, it was agreed that Azure MFA and Microsoft were the 'right' vendor to replace RSA, primarily based on:

  • EMS (Enterprise Mobility + Security) licensing was in place, so the customer could take advantage of Azure AD Premium licensing for their user base.  This meant we would use the 'Per User' charge model for Azure MFA (and not the alternative 'Per Authentication' charge model, i.e. being charged for each Azure MFA token delivered).
  • Tight integration with existing Microsoft services including Office 365, local Active Directory and AD FS authentication services.
  • Re-use of strong IT department skills in the use of Azure AD features.

 

 Step 1: App Requirements Gathering

The customer I’ve been working with has two ‘types’ of applications:

1. Network Gateway Applications – Cisco VPN using an ASA appliance and SDI protocol, and Citrix NetScaler using Radius protocol.

2. SaaS Applications using local Directory (AD) credentials via the use of AD FS (on Server 2008 currently migrating to Server 2012 R2) using both SAML & WS-Federation protocols.

They wanted a service that could replace their RSA service integrating with their VPN & Citrix services, but also 'extend' that solution to integrate with AD FS as well.  They currently don't use 2FA or MFA with their AD FS authenticated applications (which include Office 365).

They did not want to extend 2FA services to Office 365 primarily as that would incur the use of static ‘app passwords’ for their Outlook 2010 desktop version.

 

Step 2:  User Service Migration Requirements

The move from RSA to Azure MFA was also going to involve the following changes to the way users consumed two-factor services:

  1. Retire the use of ‘physical’ RSA tokens but preserve a similar smart phone ‘soft token’ delivery capability
  2. Support two ‘token’ options going forward:  ‘soft token’ ie. use of a smart phone application or SMS received tokens
  3. Modify some applications to use the local AD password instead of the RSA ‘PIN’ as a ‘what you know’ factor
  4. Avoid the IT Service Desk for ‘soft token’ registration.  RSA required the supply of a static number to the Service Desk who would then enable the service per that user.  Azure MFA uses a ‘rotating number’ for ‘soft token’ registrations (using the Microsoft Authenticator application).  This process can only be performed on the smart phone itself.

Having mapped out these requirements, I then had to find the architecture that best met them (in light of the new 'Cloud' Azure MFA features):

 

Step 3: Choosing the right Azure MFA architecture

I therefore had a unique situation, whereby I had to present an architectural selection: whether to use the Azure MFA on-premise Server solution, or Azure MFA Cloud services.  Now, both options technically use the Azure MFA 'Cloud' to deliver the tokens, but for the sake of simplicity, it boils down to two choices:

  1. Keep the service “mostly” on premise (Solution #1), or
  2. Keep the service “mostly” ‘in the cloud’ (Solution #2)

The next section goes through the ‘on-premise’ and ‘cloud’ requirements of both options, including specific requirements that came out of a solution workshop.

 

Solution Option #1 – Keep it ‘On Prem’

New on-premise server hardware and services required:

  • One or two Azure MFA Servers on Windows Server integrating with local (or remote) NPS services, which performs Radius authentication for three customer applications
  • On-premise database storing user token selection preferences and mobile phone number storage requiring backup and restoration procedures
  • One or two Windows Server (IIS) hosted web servers hosting the Azure MFA User Portal and Mobile App web service
  • Use of existing reverse proxy publishing capability to publish the user portal and mobile app web services to the Internet under a custom web site FQDN.  This published mobile app website is used for Microsoft Authenticator mobile app registrations and potential user self-selection of factor (e.g. choosing between SMS and mobile app).

New Azure MFA Cloud services required:

  • Users of the Azure MFA service must exist in local Active Directory as well as Azure Active Directory
  • Azure MFA Premium license assigned to user account stored in Azure Active Directory

[Diagram: Solution Option #1 architecture]

Advantages:

  • If future requirements dictate that Office 365 services use MFA, then AD FS version 3 (Windows Server 2012 R2) integrates directly with the on-premise Azure MFA Server.  Only AD FS version 4 (Windows Server 2016) can integrate directly with the cloud-based Azure MFA.
  • The ability to allow all MFA-integrated authentications through in case Internet services (HTTPS) to the Azure cloud are unavailable.  This is configurable via the Azure MFA Server setting "When Internet is not accessible:", set to 'Approve'.

 

Disadvantages:

  • On-premise MFA Servers requiring uptime & maintenance (such as patching etc.)
  • Have to host the on-premise user portal and mobile app websites and publish them to the Internet using existing customer capability for user self-service (if required).  This includes on-premise IIS web servers to host mobile app registration and user factor selection options (choosing between SMS and mobile app etc.)
  • Disaster Recovery planning and implementation to protect the local Azure MFA Servers database for user token selection and mobile phone number storage (although mobile phone numbers can be retrieved from local Active Directory as an import, assuming they are present and accurate).
  • SSL certificates used to secure the on-premise Azure MFA self-service portal need to be issued by an authority already trusted by mobile devices such as Android and Apple devices. Android devices, for example, do not support installing custom certificates and require an SSL certificate from an already trusted vendor (such as Thawte)

 

Solution Option #2 – Go the ‘Cloud’!

New on-prem server hardware and services required:

  • One or two Windows Servers hosting local NPS services, which perform Radius authentication for three customer applications.  These can be existing Windows Servers that host other software but do not currently use local NPS for Radius authentication (assuming they also fit the security and network location requirements)
  • New Windows Server 2016 server farm operating ADFS version 4, replacing the existing ADFS v3 farm.

New Azure MFA Cloud services required:

  • Users of the Azure MFA service must exist in local Active Directory as well as Azure Active Directory
  • User token selection preferences and mobile phone numbers stored in the Azure Active Directory cloud directory
  • Azure MFA Premium license assigned to user account stored in Azure Active Directory
  • Use of the Azure hosted website 'myapps.microsoft.com' for Microsoft Authenticator mobile app registrations and potential user self-selection of factor (e.g. choosing between SMS and mobile app).
  • Configuring Azure MFA policies to avoid enabling MFA for other Azure hosted services such as Office 365.

[Diagram: Solution Option #2 architecture]

Advantages:

  • All MFA services are public cloud based with little maintenance required from the customer’s IT department apart from uptime for on-premise NPS servers and AD FS servers (which they’re currently already doing)
  • Potential to reuse existing Windows NPS server infrastructure (would have to review existing RSA Radius servers for compatibility with the Azure MFA NPS extension, i.e. Windows Server versions, cutover plans – see the sketch after this list)
  • Azure MFA user self-service portal (for users to register their own Microsoft soft token) is hosted in cloud, requiring no on-premise web servers, certificates or reverse proxy infrastructure.
  • No local disaster recovery planning and configuration required. NPS services are stateless apart from IP addressing configurations.  User token selections and mobile phone numbers are stored in Azure Active Directory with inherent recovery options.
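
As referenced in the list above, enabling the Azure MFA NPS extension on an existing NPS server is (broadly) an MSI install followed by a configuration script. A hedged sketch only – the install path and script name below are taken from Microsoft's public preview documentation at the time, so verify them against the current download before relying on them:

# Run in an elevated PowerShell session on the NPS server, after installing the NPS extension MSI
cd "C:\Program Files\Microsoft\AzureMfa\Config"
.\AzureMfaNpsExtnConfigSetup.ps1
# The script prompts for an Azure AD sign-in and your tenant ID, generates a certificate
# for the NPS server, and registers it against the tenant to secure calls to the Azure MFA service.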

 

Disadvantages:

  • Does not support AD FS version 3 (Windows Server 2012 R2) for future MFA integration with AD FS enabled SaaS apps such as Office 365 or other third party applications (i.e. those that use AD FS so users can sign in with local AD credentials). These applications require AD FS version 4 (Windows Server 2016), which supports the Azure MFA extension (similar to the NPS extension for Radius)
  • The Radius NPS extension and the Windows Server 2016 AD FS Azure MFA integration do not currently support automatically approving authentications should the Internet connection to the Azure cloud go offline (i.e. the Azure MFA service cannot be reached over HTTPS) – however, this may be because…
  • The Radius NPS extension is still in ‘public preview’.  Support from Microsoft at this time is limited if there are any issues with it.  It is expected that this NPS extension will go into general release shortly however.

 

Conclusion and Architecture Selection

After the workshop, it was generally agreed that Option #2 fit the customer’s on-going IT strategic direction of “Cloud First”.

It was agreed that the key driver was replacing the existing RSA service integrating with Radius protocol applications in the short term, with AD FS integration viewed as very much 'optional', given that Office 365 is not seen as requiring two-factor services at this stage.

This meant that AD FS services were not going to be upgraded to Windows Server 2016 to allow integration with Option #2 services (particularly since the current upgrade to Windows Server 2012 R2 needs to be completed first).

The decision was to take Option #2 into the detailed design stage, and I'm sure I'll post future blog posts on any production 'gotchas' regarding the Radius NPS extension for Azure MFA.

During the workshop, the customer was still deciding whether to allow a user to select their own token 'type', but agreed that if they did, they wanted to limit it to three choices: one-way SMS (code delivered via SMS), phone call (i.e. press pound/hash to continue) or the Microsoft Authenticator app.  Since these features are available in both architectures (albeit with different UX), this wasn't a factor in the architecture choice.

The current Option #2 limitation around the lack of automatic approval of authentications should the Internet service go down was disappointing to the customer; however, at this stage it was going to be managed with an 'outage process' in case they lost their Internet service. The workaround of having a second NPS server without the Azure MFA extension was going to be considered as part of that process in the detailed design phase.


Introduction to MIM Advanced Workflows with MIMWAL

Introduction

Microsoft late last year introduced the ‘MIMWAL’, or to say it in full: (inhales) ‘Microsoft Identity Manager Workflow Activity Library’ – an open source project that extends the default workflows & functions that come with MIM.

Personally I’ve been using a version of MIMWAL for a number of years, as have my colleagues, in working on MIM projects with Microsoft Consulting.   This is the first time however it’s been available publicly to all MIM customers, so I thought it’d be a good idea to introduce how to source it, install it and work with it.

Microsoft (I believe for legal reasons) don't host a compiled version of MIMWAL; instead they host the source code on GitHub for customers to source, compile and potentially extend. The front page of Microsoft's MIMWAL GitHub library can be found here: http://microsoft.github.io/MIMWAL/

Compile and Deploy

Now, the official deployment page is fine (github) but I personally found Matthew’s blog to be an excellent process to use (ithinkthereforeidam.com).  Ordinarily, when it comes to installing complex software, I usually combine multiple public and private sources and write my own process but this blog is so well done I couldn’t fault it.

…however, some minor notes and comments about the overall process:

  • I found that I needed to copy the gacutil.exe and sn.exe utilities extracted from the old FIM patch into the 'Solution Output' folder as well.  The process mentions they need to be in 'src\Scripts' (Step 6), but they also need to be in the 'Solution Output' folder, which you can see in the last screenshot of that Explorer folder in Step 8 (of the 'Configure Build/Developer Computer' process).
  • I found the slowest tasks in the entire process were sourcing and installing Visual Studio, and extracting the required FIM files from the patch download.  I'd suggest keeping a saved Windows Server VM somewhere once you've completed these tasks so you don't have to repeat them if you want to compile a later version of MIMWAL in the future (preferably with MIM installed so you can perform the verification as well).
  • Be sure to download the ‘AMD 64’ version of the FIM patch file if you’re installing MIMWAL onto a Windows Server 64-bit O/S (which pretty much everyone is).  I had forgotten that old 64 bit patches used to be titled after the AMD 64-bit chipset, and I instead wasted time looking for the newer ‘x64’ title of the patch which doesn’t exist for this FIM patch.

 

‘Bread and Butter’ MIMWAL Workflows

I’ll go through two examples of MIMWAL based Action Workflows here that I use for almost every FIM/MIM implementation.

These action workflows have been part of previous versions of the Workflow Activity Library, and you can find them among the MIMWAL Action Workflow templates.

I’ll now run through real world examples in using both Workflow templates.

 

Update Resource Workflow

I use the Update Resource MIMWAL action workflow all the time to link two different objects together – often linking a user object with a new, custom 'location' object.

For new users, I execute this MIMWAL workflow when a user first ‘Transitions In’ to a Set whose dynamic membership is “User has Location Code”.

For users changing location, I also execute this workflow using a Request-based MPR that fires when the Synchronization Engine changes the "Location Code" for a user.

This workflow looks like the following:

[Screenshot: Update Resource workflow activity configuration]

The XPath Filter is:  /Location[LocationCode = ‘[//Target/LocationCode]’]

When you target the Workflow at the User object, it will use the Location Code stored in the User object to find the equivalent Location object and store it in a temporary ‘Query’ object (referenced by calling [//Queries]):

[Screenshot: Update Resource activity – value expressions reading from the [//Queries] result]

The full value expression used above, for example, sending the value of the ‘City’ attribute stored in the Location object into the User object is:

IIF(IsPresent([//Queries/Location/City]),[//Queries/Location/City],Null())

This custom expression determines whether there is a value stored in the '[//Queries]' object (i.e. a copy of the Location object found earlier by the query); if there is, it sends that value to the City attribute of the user object, i.e. the 'target' of the workflow.  If there is no value, it sends a 'null' value to wipe out the existing one (in case a user changes location and the new location doesn't have a value for one of the attributes).

It is also a good idea (not seen in this example) to send the Location’s Location Code to the User object and store it in a ‘Reference’ attribute (‘LocationReference’).  That way in future, you can directly access the Location object attributes via the User object using an example XPath:  [//Person/LocationReference/City].
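
For that reference-attribute tip, a hedged sketch following the same IIF/IsPresent pattern as the City expression above (it assumes a custom 'LocationReference' reference attribute bound to the Person object – treat the attribute name as illustrative):

Target: [//Target/LocationReference]

Value Expression: IIF(IsPresent([//Queries/Location]),[//Queries/Location],Null())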

 

Generate Unique Value from AD (e.g. for sAMAccountName, CN, mailnickname)

I’ve previously worked in complex Active Directory and Exchange environments, where there can often be a lot of conflict when it comes to the following attributes:

  • sAMAccountName (used progressively less and less these days)
  • User Principal Name (used progressively more and more these days, although communicated to the end user as ’email address’)
  • CN (or 'common name'), which forms part of the LDAP Distinguished Name (DN) value.  Side note: this is the attribute most commonly mistaken by admins for the 'Display Name' when they view it in AD Users & Computers.
  • Mailnickname (used by some Exchange environments to generate a primary SMTP address or ‘mail’ attribute values)

All AD environments require a unique sAMAccountName for any AD account to be created (otherwise you'll get a MIM export error into AD if there's already an account with that value).  They also require a unique CN value within the destination OU, otherwise the object cannot be created.  CN conflicts become much more likely if you export all user accounts for a large organization to the same OU.

UPNs are generally unique if you copy a person’s email address, but sometimes not – sometimes it’s best to combine a unique mailnickname, append a suffix and send that value to the UPN value.  Again, it depends on the structure and naming of your AD, and the applications that integrate with it (Exchange, Office 365 etc.).

Note: the default MIMWAL Generate Unique Value template assumes the FIM Service account has the permissions required to perform LDAP lookups against the LDAP path you specify.  There are ways to enhance the MIMWAL to add in an authentication username/password field in case there is an ‘air gap’ between the FIM server’s joined domain and the target AD you’re querying (a future blog post).

In this example of using the 'Generate Unique Value' MIMWAL workflow, I tend to execute it as part of a multi-step workflow, such as the one below (Step 2 of 3):

[Screenshot: the multi-step workflow, with the Generate Unique Value activity as Step 2 of 3]

I use the workflow to query LDAP for existing accounts, and then send the generated value to the [//WorkflowData/AccountName] attribute.

The LDAP filter used in this example looks at all existing sAMAccountNames across the entire domain to look for an existing account:   (&(objectClass=user)(objectCategory=person)(sAMAccountName=[//Value]))

The workflow will also query the FIM Service database for existing user accounts (that may not have been provisioned yet to AD) using the XPath filter:  /Person[AccountName = ‘[//Value]’]

The Uniqueness Key Seed in this example is '2', which essentially means that if you cannot resolve a conflict using other attribute values (such as a user's middle name, or more letters of a first or last name), you can use this 'seed' number to break the conflict as a last resort.  The number increments by 1 for each conflict, so if there's already a 'michael.pearn' and a 'michael.pearn2', for example, the next one to test will be 'michael.pearn3', and so on.

[Screenshot: Generate Unique Value activity – LDAP and XPath query settings]

The second half of the workflow shows the rules used to generate sAMAccountName values, in the order in which they are tried to break a conflict.  In this (very simple) example, I use an employee's ID number to generate an AD account name.  If there is already an account for that ID number, the workflow will generate a new account name with the string '-2' appended:

Value Expression 1 (highest priority): NormalizeString([//Target/EmployeeID])

Value Expression 2 (lowest priority):  NormalizeString([//Target/EmployeeID] + “-” + [//UniquenessKey])

NOTE: The function 'NormalizeString' is a new MIMWAL function that is also used to strip out any diacritic characters.  More information can be found here: https://github.com/Microsoft/MIMWAL/wiki/NormalizeString-Function

[Screenshot: Generate Unique Value activity – value expressions]

Microsoft have posted other examples of Value Expressions to use that you could follow here: https://github.com/Microsoft/MIMWAL/wiki/Generate-Unique-Value-Activity

My preference is to use as many value expressions as you can to break the conflict before having to fall back to the uniqueness key.  Note: sAMAccountName has a 20-character limit, so the 'Left' function is often used to trim the number of characters you take from a person's name, e.g. the left 8 characters of a first name combined with the left 11 characters of a last name (not forgetting to save a character for the seed value deadlock breaker!).
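
For illustration, a hedged sketch of that trimming pattern written as value expressions (it assumes the standard Left() and LowerCase() functions available to MIMWAL/FIM workflows, and FirstName/LastName attributes on the target user – adjust the names to suit your schema):

Value Expression 1 (highest priority): LowerCase(Left([//Target/FirstName], 8) + "." + Left([//Target/LastName], 11))

Value Expression 2 (lowest priority): LowerCase(Left([//Target/FirstName], 8) + "." + Left([//Target/LastName], 10) + [//UniquenessKey])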

Once the workflow step has executed, I then send the value to the AD Sync Rule, using [//WorkflowData/AccountName] to feed the outbound 'AccountName –> sAMAccountName' AD attribute flow:

[Screenshot: outbound AD sync rule flow for AccountName –> sAMAccountName]

 

More ideas for using MIMWAL

In my research on MIMWAL, I’ve found some very useful links to sample complex workflow chains that use the MIMWAL ‘building block’ action workflows and combine them to do complex tasks.

Some of those ideas, from Microsoft's own MSDN blogs, can be found here: https://blogs.msdn.microsoft.com/connector_space/2016/01/15/the-mimwal-custom-workflow-activity-library/

These include:

  • Create Employee IDs
  • Create Home Directories
  • Create Admin Accounts

I particularly like the idea of using the ‘Create Employee ID’ example workflow, something that I’ve only previously done outside of FIM/MIM, for example with a SQL Trigger that updates a SQL database with a unique number.

 

 

Setting up your SP 2013 Web App for MIM SP1 & Kerberos SSO

I confess: getting a Microsoft product based website working with Kerberos and Single Sign On (i.e. without authentication prompts from a domain joined workstation or server) feels somewhat like a 'black art' to me.

I'm generally OK with registering SPNs, binding SSL certificates, working with load balancing IPs etc., but when it comes to the final Internet Explorer test and it fails with an NTLM style auth prompt, it's enough to send me into a deep rage (or depression, or both).

So, recently, I’ve had a chance to review the latest guidance on getting the Microsoft Identity Manager (MIM) SP1 Portal setup on Windows Server 2012 R2 and SharePoint Foundation 2013 SP1 for both of the following customer requirements:

  • SSL (port 443)
  • Single Sign On from domain joined workstations / servers

The official MIM guidance here is a good place to start if you're building out a lab (https://docs.microsoft.com/en-us/microsoft-identity-manager/deploy-use/prepare-server-sharepoint).  There's a major flaw in this guidance for SSL & Kerberos SSO, however: it'll work, but you'll still get an NTLM style auth prompt if you configure the SharePoint Web Application initially under port 82 (following that guidance strictly, as I did) and then, in the words of the article, "Initially, SSL will not be configured. Be sure to configure SSL or equivalent before enabling access to this portal."

Unfortunately, this article doesn’t elaborate on how to configure Kerberos and SSL post FIM portal installation, and to then get SSO working across it.

To further my understanding of the root cause, I built out two MIM servers in the same AD:

  • MIM server #1 FIM portal installed onto the Web Application on port 82, with SSL configured post installation with SSL bindings in IIS Manager and a new ‘Intranet’ Alternate Access Mapping configured in the SharePoint Central Administration
  • MIM server #2: FIM Portal installed onto a Web Application built on port 443 (no additional Alternate Access Mappings specified) and SSL bindings configured in IIS Manager.

After completion, I found MIM Server #1 was working with Kerberos and SSO under port 82, but each time I accessed it using the SSL URL I configured post installation, I would get the NTLM style auth. prompt regardless of workstation or server used to access it.

With MIM server #2, I built the web application purely into port 443 using this command:

New-SPWebApplication -Name "MIM Portal" -ApplicationPool "MIMAppPool" -ApplicationPoolAccount $dbManagedAccount -AuthenticationMethod "Kerberos" -SecureSocketsLayer:$true -Port 443 -URL https://<snip>.mimportal.com.au

[Screenshot: New-SPWebApplication command output]

The key switches are:

  • -SecureSocketsLayer:$true
  • -Port 443
  • -URL (with URL starting with https://)
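
One Kerberos prerequisite worth calling out, since the rest of this post assumes it is already in place: the HTTP SPN for the portal FQDN needs to be registered against the application pool account (the account behind $dbManagedAccount above), otherwise Kerberos cannot be negotiated at all. A hedged sketch, with a placeholder FQDN and account name:

# -S checks for duplicate SPNs before adding; substitute your portal FQDN and the
# actual service account used for the MIM Portal application pool
setspn -S HTTP/mim.mimportal.com.au DOMAIN\svc-mimpool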

I then configured SSL after this SharePoint Web Application command in IIS Manager with a binding similar to this:

[Screenshot: IIS HTTPS site binding for the MIM Portal]

A crucial way to check it's configured properly is to test the MIM Portal FQDN (without the /identitymanagement path) you're intending to use after you create the SharePoint Web Application and bind the SSL certificate in IIS Manager, but BEFORE you install the FIM Service and Portal.

So in summary: test the base portal URL (https://<your MIM portal FQDN>/) first and verify it works with SSO, then install the FIM Portal to get the full https://<your MIM portal FQDN>/identitymanagement URL working.

The first test should display a generic 'Team Site' in your browser, without an authentication prompt, from a domain joined workstation or server if it's working correctly.

The other item to note is that I've seen other guidance saying this won't work from a browser locally on the MIM server – something I haven't seen in any of my tests.  All my test results are consistent whether I use a browser from a domain joined workstation, a remote domain joined server or the domain joined MIM server itself; there's no difference in SSO behaviour in my experience.  Be sure to add the MIM portal URL to the 'Local intranet' zone as well for your testing.

Also, I never had to configure 'Require Kerberos = True' in the web.config, which used to be part of the guidance for FIM and previous versions of SharePoint.  This might work as well, but it wouldn't explain the port 82/443 differences for MIM Server #1 (i.e. why would it behave differently on 443 versus 82?).

I've seen other MIM expert peers configure their MIM sites using custom PowerShell installations of SharePoint Foundation to put the MIM Portal under port 80 (overriding SharePoint Foundation 2013's default takeover of port 80 during its wizard based installation).  I'm sure that might be a valid strategy as well, and SSO may then also work with SSL after further configuration, but I personally can't attest to that.

Good luck!


Avoiding Windows service accounts with static passwords using GMSAs

One of the benefits of an Active Directory (AD) running with only Windows Server 2012 domain controllers is the use of ‘Group Managed Service Accounts’ (GMSAs).

GMSAs can essentially execute applications and services similar to an Active Directory user account running as a ‘service account’.  GMSAs store their 120 character length passwords using the Key Distribution Service (KDS) on Windows Server 2012 DCs and periodically refresh these passwords for extra security (and that refresh time is configurable).

This essentially provides the following benefits:

  1. Eliminates the need for administrators to store static service account passwords in a 'password vault'
  2. Increased security, as the password is refreshed automatically and that refresh interval is configurable (you can tell it to refresh the password every day if you want to)
  3. The password is not known even to administrators, so there is no chance for attackers to hijack the GMSA account and 'hide their tracks' by logging in as that account on other Windows Servers or applications
  4. An extremely long character password which would require a lot of computing power & time to break

There is still overhead in using GMSA versus a traditional AD user account:

  1. Not all applications or services support GMSA, so if an application does not document its supportability, you will need to test it in a lab
  2. Increased overhead in the upfront configuration and testing versus a simple AD user account creation
  3. GMSA bugs (see Appendix)

I recently had some time to develop & run a PowerShell script under Task Scheduler, but I wanted to use GMSA to run the job under a service account whose password would not be known to any administrator and would refresh automatically (every 30 days or so).

There are quite a few blogs out there on GMSA, including this excellent PFE blog from MS from 2012 and the official TechNet library.

My blog is really a 'beginner's guide' to GMSA, working with it in a simple Task Scheduler scenario.  I had some interesting learnings using GMSA for the first time that I thought would prove useful, plus some sample commands in other blogs are not 100% accurate.

This blog will run through the following steps:

  1. Create a GMSA and link it to two Windows Servers
  2. ‘Install’ the GMSA on the Windows Servers and test it
  3. Create a Task Scheduler job and have it execute under the GMSA
  4. Force a GMSA password refresh and verify the Task Scheduler job still executes

An appendix at the end will briefly discuss issues I’m still having though running a GMSA in conjunction with an Active Directory security group (i.e. using an AD Group instead of direct server memberships to the GMSA object).

A GMSA essentially shares many attributes with a computer account in Active Directory, but it still operates as a distinct AD object class.  Its use is therefore limited to a handful of Windows applications and services.  The following apps and services appear to be able to run under a GMSA, but I'd check and test first to confirm support:

  • A Windows Service
  • An IIS Application Pool
  • SQL 2012
  • ADFS 3.0 (although the creation and use of GMSA using ADFS 3.0 is quite ‘wizard driven’ and invisible to admins)
  • Task Scheduler jobs

This blog will create a GMSA manually, and allow two Windows Servers to retrieve the password to that single GMSA and use it to operate two Task Schedule jobs, one per each server.

Step 1: Create your KDS root key & Prep Environment

A KDS root key is required to work with GMSA.  If you’re in a shared lab, this may already have been generated.  You can check with the PowerShell command (run under ‘Run As Administrator’ with Domain Admin rights):

Get-KdsRootKey

If you get output similar to the following, you may skip this step for the entire forest:

[Screenshot: Get-KdsRootKey output showing an existing root key]

If there is no KDS root key present (or it has expired), the command to create the KDS root key for the entire AD forest (of which all GMSA derive their passwords from) is as follows:

Add-KdsRootKey -EffectiveImmediately

The 'EffectiveImmediately' switch is documented as still needing up to 10 hours before the key can be used, to allow for Domain Controller replication; however, you can speed up the process (if you're in a lab) by following this link.
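
For reference, the commonly cited lab workaround is to backdate the key's effective time so it is usable straight away. A hedged sketch – only do this in a lab, never in production:

# Backdate the KDS root key by 10 hours so it is immediately usable (lab only)
Add-KdsRootKey -EffectiveTime ((Get-Date).AddHours(-10))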

The next few steps will assume you have the following configured:

  • Domain Admins rights
  • PowerShell loaded with ‘Run as Administrator’
  • Active Directory PowerShell module loaded with command:
    • import-module activedirectory

 

Step 2: Create a GMSA and link it to two (or more) Windows Servers

This step creates the GMSA object in AD, and links two Windows Servers to be able to retrieve (and therefore login) as that GMSA on those servers to execute the Task Schedule job.

The following commands will create the GMSA and allow those two servers to retrieve its password:

$server1 = Get-ADComputer <Server1 NETBIOS name>

$server2 = Get-ADComputer <Server2 NETBIOS name>

New-ADServiceAccount -name gmsa-pwdexpiry -DNSHostName gmsa-pwdexpiry.domain.lab -PrincipalsAllowedToRetrieveManagedPassword $server1,$server2

Get-ADServiceAccount -Identity gmsa-pwdexpiry -Properties PrincipalsAllowedToRetrieveManagedPassword

You should get an output similar to the following:

[Screenshot: Get-ADServiceAccount output showing PrincipalsAllowedToRetrieveManagedPassword]

The important verification step is to ensure the ‘PrincipalsAllowed…’ value contains all LDAP paths to the Windows Servers who wish to use the GMSA (the ones specified as variables).

The GMSA object will get added by default to the ‘Managed Service Accounts’ container object in the root of the domain (unless you specify the ‘-path’ switch to tell it to install it to a custom OU).

[Screenshot: the GMSA object in the 'Managed Service Accounts' container]

Notes:

  1. To reiterate, many blogs point out that you can use the switch 'PrincipalsAllowedToRetrieveManagedPassword' (almost the longest switch name I've ever encountered!) to specify an AD group name.  I'm having issues using that switch to specify and work with an AD group instead of direct computer account memberships on the GMSA.  I run through those issues in the Appendix.
  2. A lot of blogs state you can just specify the server NETBIOS names for the 'principals' switch; however, I've found you need to retrieve the AD computer objects first using the 'Get-ADComputer' commands shown above
  3. I did not specify a Service Principal Name (SPN) as my Task Scheduler job does not require one, however be sure to do so if you’re executing an application or service requiring one
  4. I accepted the default password refresh interval of 30 days rather than specifying a custom interval (viewable in the attribute value 'msDS-ManagedPasswordInterval').  Custom refresh intervals can only be specified during GMSA creation from what I've read (a topic for a future blog – see the sketch after this list).
  5. Be sure to specify a ‘comma’ between the two computer account variables without a space
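
On the interval point in note 4, a hedged sketch of what specifying a custom interval at creation time looks like (the 'ManagedPasswordIntervalInDays' parameter is the one I understand controls it – verify against the cmdlet help before relying on it):

# Create the GMSA with a 7-day password refresh interval (can only be set at creation time)
New-ADServiceAccount -Name gmsa-pwdexpiry -DNSHostName gmsa-pwdexpiry.domain.lab -ManagedPasswordIntervalInDays 7 -PrincipalsAllowedToRetrieveManagedPassword $server1,$server2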

OPTIONAL Step 2A: Adding or Removing Computers on the GMSA

If you've created the GMSA but forgot to add a server account, the guidance from Microsoft on modifying the computer account membership of a GMSA is a little confusing.  In my testing I found you cannot really add or remove individual computers on the GMSA without re-specifying every computer in the membership list.

You can use this command to update an existing GMSA, but you will still need to specify EVERY computer that should be able to retrieve the password for that GMSA.

For example, if I wanted to add a third server to use the GMSA I would still need to re-add all the existing servers using the ‘Set-ADServiceAccount’ command:

$server1 = Get-ADComputer <Server1 NETBIOS name>

$server2 = Get-ADComputer <Server2 NETBIOS name>

$server3 = Get-ADComputer <Server3 NETBIOS name>

Set-ADServiceAccount -Identity gmsa-pwdexpiry -PrincipalsAllowedToRetrieveManagedPassword $server1,$server2,$server3

(Also another reason why I want to work with an AD group used instead!)

Step 3: ‘Install’ the Service Account

According to Microsoft TechNet, the ‘Install-ADServiceAccount’ “makes the required changes locally that the service account password can be periodically reset by the computer”.

I'm not 100% sure what these local changes on the Windows Server are, but after you run the command, the Windows Server will have permission to reset the password of the GMSA.

You run this command on a Windows Server (who should already be in the list of ‘PrincipalsAllowed…’ computer stored in the GMSA):

Install-ADServiceAccount gmsa-pwdexpiry

[Screenshot: Install-ADServiceAccount output]

After you run this command, verify that both the ‘PrincipalsAllowed…’ switch and ‘Install’ commands are properly configured for this Windows Server:

Test-ADServiceAccount gmsa-pwdexpiry

[Screenshot: Test-ADServiceAccount returning True]

A value of ‘True’ for the Test command means that this server can now use the GMSA to execute the Task Scheduler.  A value of ‘False’ means that either the Windows Server was not added to the ‘Principals’ list (using either ‘New-ADServiceAccount’ or ‘Set-ADServiceAccount’) or the ‘Install-ADServiceAccount’ command did not execute properly.

Finally, in order to execute Task Scheduler jobs, be sure also to grant the GMSA the 'Log on as a batch job' right in the local security policy (or via GPO):

[Screenshots: 'Log on as a batch job' user rights assignment in the local security policy]

Without this last step, the GMSA account will properly login to the Windows Server but the Task Scheduler job will not execute as the GMSA will not have the permission to do so.  If the Windows Server is a Domain Controller, then you will need to use a GPO (either ‘Default Domain Controller’ GPO or a new GPO).

Step 4:  Create the Task Schedule Job to run under GMSA

Windows Task Scheduler (at least on Windows Server 2012) does not allow you to specify a GMSA using the GUI.  Instead, you have to create the Task Scheduler job using PowerShell: the GUI will ask you for a password when you go to save the job (a password which you will never have!).

The following four commands will instead create the Task Schedule job to execute an example PowerShell script and specifies the GMSA object to run under (using the $principal object):

$action = New-ScheduledTaskAction powershell.exe -Argument "-file c:\Scripts\Script.ps1" -WorkingDirectory "C:\WINDOWS\system32\WindowsPowerShell\v1.0"

$trigger = New-ScheduledTaskTrigger -At 12:00 -Daily

$principal = New-ScheduledTaskPrincipal -UserID domain.lab\gmsa-pwdexpiry$ -LogonType Password -RunLevel highest

Register-ScheduledTask myAdminTask -Action $action -Trigger $trigger -Principal $principal

[Screenshots: Register-ScheduledTask output and the resulting job in Task Scheduler]

Note:

  1. Be sure to replace the ‘domain.lab’ with the FQDN of your domain and other variables such as script path & name
  2. It’s optional to use the switch: ‘-RunLevel highest’.  This just sets the job to ‘Run with highest privileges’.
  3. Be sure to specify a '$' symbol after the GMSA name for '-UserID'.  I also had to specify the domain's FQDN instead of its NETBIOS name.

Step 5: Kick the tyres! (aka test test test)

Yes, when you’re using GMSA you need to be confident that you’re leaving something that is going to work even when the password expires.

Some common tasks that I like to perform to verify the GMSA is running properly include:

Force a GMSA password change:

You can force the GMSA to reset its password by running the command:

Reset-ADServiceAccountPassword gmsa-pwdexpiry

You can then verify the time date of the last password set by running the command:

Get-ADServiceAccount -Identity gmsa-pwdexpiry -Properties passwordlastset

The value will be next to the ‘PasswordLastSet’ field:

[Screenshot: PasswordLastSet value returned by Get-ADServiceAccount]

After forcing a password reset, I would initiate a Task Schedule job execution and be sure that it operates without failure.

Verify Last Login Time

You can also verify that the GMSA is logging in properly to the server by checking the 'LastLogonDate' value:

Get-ADServiceAccount -Identity gmsa-pwdexpiry -Properties LastLogonDate

 View all Properties

Finally, if you’re curious as to what else that object stores, then this is the best method to review all values of the GMSA:

 Get-ADServiceAccount -Identity gmsa-pwdexpiry -Properties *

I would not recommend using ADSIEdit to review most GMSA attributes as I find that GUI is limited in showing the correct values for those objects, e.g. this is what happens when you view the ‘principals…’ value using ADSIEdit (called msDS-GroupMSAMembership in ADSI):

[Screenshot: msDS-GroupMSAMembership as displayed in ADSIEdit]

Appendix:  Why can’t I use an AD group with the switch: PrincipalsAllowedTo..?

Simply: you can! Just a word of warning: I've been having intermittent issues in my lab when using AD groups, so I decided to base this blog purely on direct computer account memberships on the GMSA, as I've not had an issue with that approach.

I find that the commands 'Install-ADServiceAccount' and 'Test-ADServiceAccount' sometimes fail when I use group memberships.  Feel free to try it, however – it may be due to issues in my lab.  In preparing this post, I could not provide a screenshot of the issues as they'd mysteriously resolved themselves overnight (the worst kind of bug, an intermittent one!).

You can easily run the command to create a GMSA with a security group membership (e.g. ‘pwdexpiry’) as the sole ‘PrincipalsAllowed…’ object:

[Screenshot: New-ADServiceAccount run with an AD security group as the sole principal]
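
Since the screenshot doesn't reproduce well here, a hedged sketch of that group-based variant (assuming a security group called 'pwdexpiry' whose members are the relevant computer accounts):

# Use an AD security group, rather than individual computer accounts, as the principal
$group = Get-ADGroup pwdexpiry
New-ADServiceAccount -Name gmsa-pwdexpiry -DNSHostName gmsa-pwdexpiry.domain.lab -PrincipalsAllowedToRetrieveManagedPassword $group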

Then try running the 'Install-ADServiceAccount' and 'Test-ADServiceAccount' commands on the Windows Servers whose computer accounts are members of that group.

Good luck!

Michael


Filtering images across a custom FIM / MIM ECMA import MA

A recent customer had a special request when I was designing and coding a new ECMA 2.2 based Management Agent (MA) or “Connector” for Microsoft Forefront Identity Manager (FIM).

(On a sidenote: FIM’s latest release is now Microsoft Identity Manager or “MIM”, but my customer hadn’t upgraded to the latest version).

Kloud previously were engaged to write a new ECMA based MA for Gallagher 7.5 (a door security card system) to facilitate the provisioning of access and removal of access tied to an HR system.

Whilst the majority of the ECMA was ‘export’ based, ie. FIM controlled most of the Gallagher data, one of the attributes we were importing back from this security card system was a person’s picture that was printed on these cards.

Requirements

It seems that in the early days of the Gallagher system (maybe before digital cameras were invented?), they used to upload a static logo (similar to a WiFi symbol) in place of a person’s face.  It was only recently they changed their internal processes to upload the actual profile picture of someone rather than this logo.

The system has been upgraded a number of times, but the data migrated each time without anyone going back to update the existing people’s profile pictures.

This picture would then be physically printed on their security cards; for people whose cards showed their actual faces, the customer wanted the same picture to appear in Outlook and SharePoint.

The special request was that they wanted me to ‘filter out’ images that were just logos, and only import profile pictures into FIM from Gallagher (and then exported out of FIM into Active Directory and SharePoint).

There were many concerns with this request:

  • We had limited budget and time, so removing the offending logos manually was going to be very costly and difficult (not to mention very tiring for that person across 10,000 identities!)
  • Gallagher stores the picture in its database as a ‘byte’ value (rather than the picture filename used for the import).  That value format is exposed as well across the Gallagher web service API for that picture attribute.
  • Gallagher uses a 'cropping system' to ensure that only a 240 x 160 pixel image is selected from the (much larger) logo source file.  Moving the 'crop window' up, down, left or right would change the byte value stored in Gallagher (I know – I tested almost 20 different combinations!)
  • The logo file itself had multiple file versions, some of which had been cropped prior to uploading into Gallagher.

Coding

My colleague Boris pointed me to an open source Image Comparison DLL written by Jakob Krarup (which you can find here).

It’s called ‘XNA.FileComparison’ and it works superbly well.  Basically this code allows you to use Histogram values embedded within a picture to compare two different pictures and then calculate a ‘Percentage Different’ value between the two.

One of the methods included in this code (PercentageDifference()) is an ability to compare two picture based objects in C# and return a ‘percentage difference’ value which you can use to determine if the picture is a logo or a human (by comparing each image imports into the Connector Space to a reference logo picture stored on the FIM server).

To implement it, I did the following:

  1. Downloaded the sample ‘XNA.FileComparison’ executable (.exe) and ran a basic comparison between some source images and the reference logo image, and looked at the percentage difference values that the PercentageDifference() method would be returning.  This gave me an idea of how well the method was comparing the pictures.
  2. Downloaded the source Visual Studio solution (.SLN) file and re-compiled it for 64-bit systems (the compiled DLL version on the website only works on x86 architectures)
  3. Added the DLL as a Project reference to a newly created Management Agent Extension, whose source code you can find below

In my Management Agent code, I  then used this PercentageDifference() method to compare each Connector Space image against a Reference image (located in the Extensions folder of my FIM Synchronization Service).   The threshold value the method returned then determined whether to allow the image into the Metaverse (and if necessary copy it to the ‘Allowed’ folder) or block it from reaching the Metaverse (and if necessary copy it to the ‘Filtered’ folder).

I also exported each image’s respective threshold value to a file called “thresholds.txt” in each of the two different folders:  ‘Allowed’ and ‘Filtered’.

Each of the following options was configurable in an XML file:

  • Export folder locations for Allowed & Filtered pictures
  • Threshold filter percentage
  • A ‘do you want to export images?’ Boolean Export value (True/False), allowing you to turn off the image export on the Production FIM synchronization server once a suitable threshold value was found (e.g. 75%).

A sample XML that configures this option functionality can be seen below:

 

Testing and Results

To test the method, I would run a Full Import on the Gallagher MA to get all pictures values into the Connector Space.  Then I would run multiple ‘Full Synchronizations’ on the MA to get both ‘filtered’ and ‘allowed’ pictures into the two folder locations (whose locations are specified in the XML).

After each 'Full Synchronization' we reviewed all threshold values (thresholds.txt) in each folder and used the 'large icons' view in Windows Explorer to check that all people's faces ended up in the 'Allowed' folder and all logo-type images ended up in the 'Filtered' folder.  I made sure to delete all pictures and the thresholds.txt in each folder between runs so I didn't get confused.  If a profile picture ended up in the 'Filtered' folder or a logo ended up in the 'Allowed' folder, I'd modify the threshold value in the XML and run another Full Synchronization.

Generally, the percentage difference for most ‘Allowed’ images was around 90-95% (i.e. the person’s face value was 90-95% different than the reference logo image).

What was interesting was that some allowed images got down as low as 75% (i.e. only 75% different from the logo), so we set our production threshold filter to 70%.  The reason some people's pictures were (percentage-wise) "closer" to the logo was that their profile pictures had a pure white background, and the logo itself was mostly white in colour.

The highest ‘difference’ value for logo images was as high as 63% (the difference between a person’s logo image and the reference logo image was 63%, meaning it was a very “bad” logo image – usually heavily cropped showing more white space than usual).

So setting the filter threshold of 70% fit roughly halfway between 63% and 75%.  This ended up in a 100% success rate across about 6000 images which isn’t too shabby.

If, in the future, there were people's faces that were less than 70% different from the logo (and so failed the threshold and were unexpectedly filtered out), the customer had the choice of lowering the threshold value below 70% in the Management Agent configuration XML, or of using a different picture.

Some Notes re: Code

Here are some ‘quirks’ related to my environment which you’ll see in the MA Extension code:

  • A small percentage of people in Gallagher did not have an Active Directory account (whose name I used for the image export filename), so in those cases I used a large random number for the filename instead (I was in a hurry!)
  • I'm writing to a custom Gallagher Event Viewer log name, so all the logs are saved to that custom Application log (in case you're trying to find them in the generic 'Application' Event Viewer log)
  • Hard coding of ‘thresholds.txt’ as a file name and the location of the Options XML (beware if you’re using a D:\ drive or other letter for the installation path of the Synchronization Service!)

Management Agent Extension Code


PowerShell Status Reporting on AAD Connect

Recently, I had a customer request the ability to quickly report on the status of two AAD Connect servers.

Since these two servers operate independently, it is up to the administrator to ensure the servers are healthy and they are operating in the correct configuration modes with respect to each other.

Typically, if you're going to spend money operating two AAD Connect servers, it makes sense that both have their import cycles enabled, but only one runs in 'Normal' mode (i.e. exporting) and the other in 'Staging' mode (i.e. not exporting, but ready to take over if needed).

This customer had a full import & full sync. time of almost two full days (!), so it was essential the second staging mode AAD Connect server was operating correctly (in Staging mode and with its cycle enabled) to take over operations.

Since AAD Connect is based on the architecture of the Synchronization Engine of Microsoft Forefront Identity Manager (formerly known as MIIS), clustering is not an option.

The 'Get-ADSyncScheduler' AAD Connect PowerShell command is well documented by Microsoft, and we've posted a few articles on using it recently on this blog as well.

My customer had a few requirements:

  • Be able to quickly gather the status of both AAD Connect servers once an administrator has logged into at least one of them
  • Pool together the status of both servers' 'Staging' mode and their cycle status (either enabled or disabled)
  • Warn administrators if two servers are operating in ‘normal’ mode or are otherwise mis-configured

On the third point, if you attempted to bring a second AAD Connect server out of 'Staging' mode, there's nothing on the server or in Azure authentication that prevents you from doing so.  Microsoft strongly warns you during the installation process to be wary of other AAD Connect servers and their staging mode status.

I briefly tested dropping a second server out of Staging Mode in a test environment, resulting in two AAD Connect servers operating in 'Normal' (import/export) mode, and whilst I didn't see any immediate issue, I strongly recommend not doing this.  I also only had a test Office 365 tenancy with a handful of objects, so it wasn't a true reflection of what could happen in a production environment with more features (like near real time password hash sync) and more objects.  I honestly thought I'd run into a blocking message preventing that configuration.

After developing the script, I went down the path of using string matching to determine the results of the ‘Get-ADSyncScheduler’ command.  This had the following impacts:

  • In order to simplify the script, I wanted the objects for ‘Staging’ and ‘Cycle’ to have three results: ‘null’, ‘true’ or ‘false’.
  • In order to filter the results of the ‘get-ADSyncScheduler’ command, I converted it into a string, then performed a string matching query against the whole string for the results of the Staging and Cycle enabled options.
  • Instead of string matching, a command like the following would return the actual value of a property (such as 'SyncCycleEnabled' or 'StagingModeEnabled') and could have been used instead, as sketched after this list:
    • Get-ADSyncScheduler | select -ExpandProperty SyncCycleEnabled
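
For completeness, a hedged sketch of that property-based alternative (it leans on the StagingModeEnabled and SyncCycleEnabled properties that the string constants in the script below also target):

# Map the boolean properties onto the same "true"/"false" strings the script uses
$local = Get-ADSyncScheduler
$localStagingStatus = if ($local.StagingModeEnabled) { "true" } else { "false" }
$localCycleStatus   = if ($local.SyncCycleEnabled)   { "true" } else { "false" }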

Once you have the results of the status of the Staging and Cycle properties for both servers(true/false/null), you can then report on them collectively to either indicate an ‘OK’ status, a ‘Warning’ status or an ‘Alert’ status.

I broke up those reporting categories into the following:

  • OK Status – One ‘Normal’ mode AAD Connect server with Cycle enabled, one ‘Staging’ AAD Connect server with Cycle enabled
  • Warning Status – One ‘Normal’ mode AAD Connect server with cycle enabled, with the other AAD Connect server with its cycle disabled (but still configured to be in Staging Mode)
  • Offline Alert Status – the remote AAD Connect server cannot be contacted, i.e. there's a null result when the text is searched for the status
  • Alert Status – Both servers can be contacted but no AAD connect servers are operating in normal mode (ie. both are in ‘Staging mode’) or two servers are operating in normal mode.

 

This script was installed onto both of the AAD Connect servers with a shortcut provided on the 'All Users' desktop.  Both copies of the script were then modified with the server name of their respective remote AAD Connect server, i.e. 'Server B' in 'Server A's' script and vice versa, at this line:

$remoteAAD = Invoke-Command -ComputerName <remote server name here> -ScriptBlock { Get-ADSyncScheduler }

This script could be enhanced with:

  • Running on an administrator's workstation instead, with both AAD Connect servers treated as remote (and the script updated and tested for compatibility when running the AAD Connect PowerShell commandlets remotely)
  • Email alerting, if it were run under Windows Task Scheduler (see the sketch below)
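
For the email-alerting enhancement just mentioned, a minimal hedged sketch using Send-MailMessage (the SMTP server and addresses are placeholders; call it from whichever Warning/Alert branches below you care about):

Send-MailMessage -SmtpServer "smtp.yourcompany.com" -From "aadconnect@yourcompany.com" -To "admins@yourcompany.com" `
    -Subject "AAD Connect status from $env:computername" `
    -Body "Local staging: $localStagingStatus, cycle: $localCycleStatus. Remote staging: $remoteStagingStatus, cycle: $remoteCycleStatus."
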
A warning: the main script below has not been tested against two servers operating in normal mode, or against an offline server, so some of the alerts are 'theoretical' at this stage.  Let me know in the comments if you find any bugs.

 

cls

# Set variables and constants

# Initialise to the string "null" so the offline checks below can match when a server cannot be contacted
$localStagingStatus = "null"
$remoteStagingStatus = "null"
$localCycleStatus = "null"
$remoteCycleStatus = "null"
$debug = $false   # set to $true to print the raw status values

$StagingTrue = """StagingModeEnabled"":true"
$SyncCycleEnabledTrue = """SyncCycleEnabled"":true"
$StagingFalse = """StagingModeEnabled"":false"
$SyncCycleEnabledFalse = """SyncCycleEnabled"":false"

# Review local AAD Scheduler details

$localAAD = Get-ADSyncScheduler
$localAADstr = $localAAD.ToString()

if ($localAADstr -match $StagingTrue) {write-host -ForegroundColor DarkYellow "Staging mode ENABLED locally"
$localStagingStatus = "true" }

if ($localAADstr -match $SyncCycleEnabledTrue) {write-host -ForegroundColor DarkYellow "Sync Cycle ENABLED locally"
$localCycleStatus = "true" }

if ($localAADstr -match $StagingFalse) {write-host -ForegroundColor DarkYellow "Staging mode DISABLED locally"
$localStagingStatus = "false" }

if ($localAADstr -match $SyncCycleEnabledFalse) {write-host -ForegroundColor DarkYellow "Sync Cycle DISABLED locally"
$localCycleStatus = "false" }

# Connect to the remote AAD Connect server (replace 'servername' with the name of the remote AAD Connect server)

$remoteAAD = Invoke-Command -ComputerName servername -ScriptBlock { Get-ADSyncScheduler } -ErrorAction SilentlyContinue
$remoteAADstr = ""
if ($remoteAAD) { $remoteAADstr = $remoteAAD.ToString() }

if ($remoteAADstr -match $StagingTrue) {write-host -ForegroundColor DarkYellow "Staging mode ENABLED remotely"
$remoteStagingStatus = "true"}

if ($remoteAADstr -match $StagingFalse) {write-host -ForegroundColor DarkYellow "Staging mode DISABLED remotely"
$remoteStagingStatus = "false"}

if ($remoteAADstr -match $SyncCycleEnabledTrue) {write-host -ForegroundColor DarkYellow "Sync Cycle ENABLED remotely"
$remoteCycleStatus = "true"}

if ($remoteAADstr -match $SyncCycleEnabledFalse) {write-host -ForegroundColor DarkYellow "Sync Cycle DISABLED remotely"
$remoteCycleStatus = "false"}

if ($debug) {
write-host "local staging status:" $localStagingStatus
write-host "local cycle status:" $localCycleStatus
write-host "remote staging status:" $remoteStagingStatus
write-host "remote cycle status:" $remoteCycleStatus
}

# Interpret results

write-host "---------------------------------------------------------------"
write-host "Summary of Results from AAD Connect server:" $env:computername
write-host "---------------------------------------------------------------"

# OK

if ($localStagingStatus -eq "true" -and $localCycleStatus -eq "true" -and $remoteStagingStatus -eq "false" -and $remoteCycleStatus -eq "true") { write-host -foregroundcolor Green "OPERATIONAL STATUS: OK. Local server operating in ACTIVE STANDBY mode. Remote server operating in active production mode."}
if ($localStagingStatus -eq "false" -and $localCycleStatus -eq "true" -and $remoteStagingStatus -eq "true" -and $remoteCycleStatus -eq "true") { write-host -foregroundcolor Green "OPERATIONAL STATUS: OK. Local server operating in ACTIVE PRODUCTION mode. Remote server operating in active standby mode."}

# Warning

if ($localStagingStatus -eq "true" -and $localCycleStatus -eq "false" -and $remoteStagingStatus -eq "false" -and $remoteCycleStatus -eq "true") { write-host -foregroundcolor Yellow "OPERATIONAL STATUS: Warning. Local server operating in OFFLINE STANDBY mode. Remote server operating in ACTIVE PRODUCTION mode."}
if ($localStagingStatus -eq "false" -and $localCycleStatus -eq "true" -and $remoteStagingStatus -eq "null" -and $remoteCycleStatus -eq "null") { write-host -foregroundcolor Yellow "OPERATIONAL STATUS: Warning. Local server operating in ACTIVE PRODUCTION mode. Remote server cannot be contacted, could be OFFLINE"}

# Offline Alert

if ($localStagingStatus -eq "true" -and $remoteStagingStatus -eq "null" -and $remoteCycleStatus -eq "null") { write-host -foregroundcolor Yellow "OPERATIONAL STATUS: Alert. Local server operating in STANDBY mode. Remote server cannot be contacted, could be OFFLINE"}

# Major Alert, confirmed configuration issue

if ($localCycleStatus -eq "false" -and $remoteCycleStatus -eq "false") {write-host -foregroundcolor Red "OPERATIONAL STATUS: Both servers have their cycles disabled. Review immediately."}
if ($localStagingStatus -eq "true" -and $remoteStagingStatus -eq "true") { write-host -foregroundcolor Red "OPERATIONAL STATUS: Both servers are in Staging mode. Review immediately."}
if ($localStagingStatus -eq "false" -and $localCycleStatus -eq "true" -and $remoteCycleStatus -eq "true" -and $remoteStagingStatus -eq "false") { write-host -foregroundcolor Red "OPERATIONAL STATUS: Alert. Both servers are in ACTIVE PRODUCTION mode. This violates Microsoft best practice and could cause replication problems. Review immediately."}

 

 

 

Configuring Proxy for Azure AD Connect V1.1.105.0 and above

My colleague David Ross has written a previous blog about configuring proxy server settings to allow Azure AD Sync (the previous name of Azure AD Connect) to use a proxy server.

Starting with version 1.1.105.0, Azure AD Connect has completely changed the configuration steps required to allow the Azure AD Connect configuration wizard and Sync Engine to use a proxy.

I ran into a specific proxy failure scenario that I thought I’d share to provide further help.

My Azure AD Connect (v.1.1.110.0) installation reached the following failure at the end of the initial installation wizard:

Installfailure1

The trace log just stated the following:

Apply Configuration Page: Failed to configure directory extension (True). Details: System.Management.Automation.CmdletInvocationException: user_realm_discovery_failed: User realm discovery failed ---> Microsoft.IdentityManagement.PowerShell.ObjectModel.SynchronizationConfigurationValidationException: user_realm_discovery_failed: User realm discovery failed


In this environment, I had the following environmental components:

  • The AAD Connect software was going to operate under a service account
  • All Internet connectivity was through a proxy server which required authentication
  • Windows Server 2012 R2 platform
  • Two factor authentication was enabled for O365 Admin accounts

Previously, in order to get authentication working for O365, I set the proxy server settings in Internet Explorer.  I tested browsing and it appeared fine.  I also had to add the required Office 365 sign-in URLs to Internet Explorer’s ‘Trusted Sites’ to allow the new forms based authentication (which allowed the second factor to be entered) to work properly with the Azure AD Connect wizard.

So even though my Internet proxy appeared to be working under my admin account, and Office 365 was authenticating properly during the O365 ‘User Sign-In’ screen, I was still receiving a ‘User Realm Discovery’ error message at the end of the installation.

This is when I turned to online help and I found this Microsoft article on the way Azure AD Connect now handles proxy authentication.  It can be found here and is by and large an excellent guide.

Following Microsoft’s guidance, I ran the following proxy connectivity command and verified my proxy server was not blocking my access:

Invoke-WebRequest -Uri https://adminwebservice.microsoftonline.com/ProvisioningService.svc

Installfailure2

So that appeared to be fine and was not causing my issue.  Reading further, the guidance in the article had stated at the start that my ‘machine.config’ file had to be properly configured.  When I re-read that, I wondered aloud “what file?”.  Digging deeper into the guidance, I ran into this step.

It appears that Azure AD Connect now uses Modern Authentication to connect to Office 365 during the final part of the configuration wizard, and that the ‘machine.config’ file has to be modified with your proxy server settings for it to complete properly.

Since the environment here uses a proxy which requires authentication, I added the following to the end of the file:

C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Config\machine.config

All of the new required text sits within the ‘<system.net>’ tags.   NOTE:  The guidance from Microsoft states that the new code has to be ‘at the end of the file’, but be sure to place it BEFORE the closing ‘</configuration>’ tag:

Installfailure4
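For reference, a sketch of the change is below.  The ‘<system.net>’ block follows the format in Microsoft’s guidance; the proxy address is a placeholder and ‘useDefaultCredentials’ is set to ‘true’ because this proxy requires authentication.  The PowerShell simply patches the block in before the closing ‘</configuration>’ tag (back up the file first and run from an elevated prompt):

# Sketch only: insert the <system.net> proxy block into machine.config
# just before the closing </configuration> tag.
# http://proxy.customer.com:8080 is a placeholder - substitute your own proxy address.
$machineConfig = "C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Config\machine.config"

$proxyBlock = @"
    <system.net>
        <defaultProxy enabled="true" useDefaultCredentials="true">
            <proxy
                usesystemdefault="true"
                proxyaddress="http://proxy.customer.com:8080"
                bypassonlocal="true"
            />
        </defaultProxy>
    </system.net>
"@

# Back up the original file, then write the block in before the closing tag
Copy-Item $machineConfig "$machineConfig.bak"
(Get-Content $machineConfig -Raw) -replace '</configuration>', "$proxyBlock`r`n</configuration>" |
    Set-Content $machineConfig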

I saved the file, and then clicked the ‘Retry’ button on my original ‘user realm discovery failure’ message (thankfully not having to attempt a completely new install of Azure AD Connect) and the problem was resolved.

Hope this helps!

 

AAD Connect: Custom AAD Attributes & Scheduler PowerShell

Following on from the posts from my esteemed colleagues Lucian and Josh, I thought I’d post my experiences working with the latest version (v1.1.110.0), specifically in two areas:

  1. Working with the AAD Connect Scheduler, which is now based in PowerShell and whose configuration is now stored in AAD, using the ‘Set-ADSyncScheduler’ commands
  2. Working with ‘extension attributes’ using the Directory Extensions feature of AAD Connect

Both of these features are new to the latest version of AAD Connect.

Working with new AAD Connect Scheduler (PowerShell ‘Set’ and ‘Get’ commands)

The official Microsoft link to working with the ‘Set-ADSyncScheduler’ command can be found here.  I thought I’d expand on my workings with this utility, as a few of the settings cannot be modified (even though they’re exposed using the ‘Get-ADSyncScheduler’ command).

Previous versions of AAD Connect used the in-built Windows Server Task Scheduler to execute the AAD Connect EXE which triggered the Synchronization Service to run on its (previously default) schedule of 3 hours.  For example, to disable the schedule previously, it was generally accepted to ‘disable’ the Task Scheduler job itself.

With the move to PowerShell (and the storage of the AAD Connect schedule in Azure AD), the commands to disable or enable the schedule are now PowerShell commands.

To verify the settings of the AAD Connect Scheduler, type:

Get-ADSyncScheduler

pic1 - getADSyncScheduler

The above picture tells us:

  1. The AAD Connect server is in ‘Staging mode’ and will not export into the local AD or cloud AAD directories
  2. The ‘SyncCycleEnabled’ value is ‘False’ meaning the Synchronization Service is effectively disabled unless you run a manual sync process (and you can find those instructions in Lucian‘s post)
  3. The server will cycle every 30 minutes (which is the default value), and that value has not been overwritten by a ‘customized’ sync cycle (the ‘customized’ value is blank which means an administrator has not run the command at all).

If the ‘SyncCycleEnabled’ value is set to ‘False’, this tells us that the scheduler will not run unless you initiate a manual AAD Connect ‘delta’ or ‘initial’ cycle yourself.

  • To enable the schedule, type:

Set-ADSyncScheduler -SyncCycleEnabled $True

  • To disable the schedule, type:

Set-ADSyncScheduler -SyncCycleEnabled $False

 

Another parameter that can be modified with ‘Set-ADSyncScheduler’ is ‘NextSyncCyclePolicyType’:

  • Change the next automated sync cycle to be a ‘Delta’ cycle:

Set-ADSyncScheduler -NextSyncCyclePolicyType Delta

pic3 - Set policy type

  • Change the next automated sync cycle to be a ‘Full Import’ (referred to as ‘Initial’) cycle:
Set-ADSyncScheduler -NextSyncCyclePolicyType Initial

 

The start time setting which I was trying to customize for a customer (ie. ‘NextSyncCycleStartTimeInUTC’) does not appear to be modifiable (at least in this version):

pic2 - no ability to set UTC time start

Since the default schedule is 30 minutes (down from 3 hours), this isn’t as critical a setting so at this stage you’ll have to advise your customers or management that the start time cannot be modified.  My customer previously had the 3 hour cycle starting ‘on the hour’ to line up with other identity management processes which executed at specific times during the day.

You are also prevented from moving the Server out of or into Staging Mode (ie. ‘StagingModeEnabled’ from ‘True’ to ‘False’ or reverse):

pic4 - cant set staging mode

Like previous versions, you will need to run the AAD Connect wizard to modify this setting:

pic5 - configure staging mode.JPG

If you try to set a time that’s quicker than ‘every 30 minutes’ using the command:

Set-ADSyncScheduler -CustomizedSyncCycleInterval 00:25:00

pic7 - setting not supported time

It will list it as a ‘customized’ value, but it will not change the ‘effective’ time, ie. the cycle will not run every 25 minutes and the ‘allowed’ value will not change from 30 minutes.

pic8 - setting not supported time

Working with ‘extensionAttributes’ using the Directory Extensions feature of AAD Connect

The article which explains the AAD Connect ‘Directory Extensions’ feature can be found here.

This feature provides the ability to specify custom attributes (sometimes called ‘extended’ attributes) that a customer (or an application) has added to the schema of their local Active Directory.

In this example, a customer requested to copy three custom attributes and one ‘extensionAttribute’ (the latter being part of the default AD schema) into Azure AD so they could be accessed by the Graph API:

pic6 - custom1

NOTE: For privacy and security reasons, for the picture above, I’ve hidden the company name from the start of two of the attributes.

However, according to the Microsoft representatives I am working with, the current limitation is:

  • Extension attributes 1-15 are not exposed in the Graph API as yet.

So adding Extension Attribute 3 (in this example) to the Directory Extensions feature exposes that attribute under the following attribute name in Azure AD (from the Microsoft Azure article):

The attributes are prefixed with extension_{AppClientId}_. The AppClientId will have the same value for all attributes in your Azure AD directory.

So we should expect to see the local AD extensionAttribute3 exposed to the Graph API as:

‘extension_AppClientID_extensionAttribute3’

The others would be named:

‘extension_AppClientID_division’

‘extension_AppClientID_OrgUnitCode’

‘extension_AppClientID_Status’
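As a rough illustration of how those names are composed (the GUID below is a made-up placeholder for your tenant’s AppClientId, which typically appears in the attribute name with its dashes removed):

# Illustration only: composing the attribute name exposed to the Graph API.
# The GUID is a made-up placeholder for the tenant's AppClientId (dashes removed).
$appClientId    = "9d98ed114c094a0ba6e966cba3f1e1a2"
$localAttribute = "extensionAttribute3"
"extension_{0}_{1}" -f $appClientId, $localAttribute
# Returns: extension_9d98ed114c094a0ba6e966cba3f1e1a2_extensionAttribute3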

Direct Access on Azure, Why? Can? How?

Direct Access on Azure?

A customer recently requested Kloud to assist them in implementing a Windows Server 2012 R2 based Direct Access (DA) service, as their workforce had recently moved to a Windows 8 client platform.  What did surprise me was that they requested it be one of the first solutions to be hosted on their Microsoft Azure service.

Direct Access, for those unfamiliar with the technology, is essentially an ‘always on’ VPN style connection that provides a user access to a corporate network from any basic Internet network connection without any user interaction.  The connection is established prior to a user even logging into their laptop or tablet, so services such as Group Policy mapped drives and login scripts will execute just like a user logging into an internal corporate network.  This technology was introduced with Windows 7 Enterprise edition and continues with Windows 8 Enterprise edition.  Windows 10 appears to have this code as well (at least judging by the current technical preview and TechNet forum responses).

One of the first items to note is that Direct Access isn’t supported by Microsoft, at this stage in its life, on the Azure platform.  After implementing it successfully however for this customer’s requirements, I thought I would share some learnings about why this solution worked for this customer and how we implemented it, but also some items to note about why it may not be suitable for your environment.

This article can provide guidance if you’re considering a single DA server solution, however you might need to look for other advice if you’re looking at a multi-site, or multi-DA server solution requiring load balancing or high availability.  Primarily the guidance around IPv6 and static IP addressing that I address here may change if you look at these architectures.


 

Why?

My customer had the following business and technical requirements:

  • They own a large fleet of Windows 8 laptops that ‘live’ outside the corporate network and rarely, if ever, connect to the corporate network, and therefore never communicate with internal systems such as Active Directory for Group Policy updates or internal anti-virus systems.  The customer wanted to ensure their laptop fleet could still be managed by these systems to ensure compliance and consistency in user interface ‘lockdown’ (using Group Policy) to aid support teams in troubleshooting and security updates.
  • My customer wanted to remove their existing third-party SSL VPN solution to access internal services and recoup the licensing cost with this removal.  The Direct Access solution had already been ‘paid for’ in effect as the customer already had a Microsoft Enterprise Agreement.
  • The existing SSL VPN solution forced all Internet access (‘forced tunnelling’) during the session through the corporate network, costing the customer in ISP Internet download fees, especially for users working from home.
  • My customer did not have the budget to publish all existing internally accessible services to the Internet using publishing technologies such as Microsoft Web Application Proxy (WAP), for example.  This would require designing and implementing a WAP architecture, and then testing each service individually over that publishing platform.

Can’t or shouldn’t?

Microsoft Azure can host a Direct Access service, and for the most part it works quite well, but here are the underlying technologies that in my testing refuse to work with the Azure platform:

  • ‘Manage out’ – this term refers to the ability of servers or clients on the corporate network to establish a connection (that is, to create the network packet) directly to the Direct Access client that is connected only to the Internet.  There is no official guidance from Microsoft about why there is a limitation, however in my testing I found that it is related to IPv6 and the lack of ‘IPv6 Broadcast’ address ability.  I didn’t get time to run Wireshark across it (plus my version of Wireshark wasn’t IPv6 aware!) so if anyone has found a workaround to get this working in Azure, shoot me an email! (michael.pearn@kloud.com.au).
  • Teredo – there are two types of Direct Access architectures on Windows Server 2012 R2: IP-HTTPS (that is, an HTTPS tunnel is established between the client and the Direct Access server first, then all IPv6 communication occurs across this encrypted tunnel) or the use of ‘Teredo’, which is an IPv6 over IPv4 encapsulation technology.  Microsoft explains the various architectures best here: http://technet.microsoft.com/en-us/library/gg315307.aspx (although this article was written in 2010 in the context of the now retired UAG product). Teredo however requires TWO network cards, and since Azure Virtual Servers only support one network card per server, Teredo cannot be used as an option on Azure.  In all of my testing on Azure, I used IP-HTTPS.

The following is a good reason not (at least for now) to put Direct Access on Azure:

  • There is no native way, using Direct Access configuration, to deny a Direct Access client from reaching any server or FQDN on the internal network (ie. a ‘black list’), if that connection can be established from the Direct Access server.  For example, if a DA client attempts to download large files (such as large image or CAD drawings) from servers reachable from the Direct Access server, then unless those servers are hosted in Azure as well, all downloads will occur over the Azure VPN site-to-site connection.  This can prove very costly in terms of Azure fees.  My customer used their VPN hardware (which had a firewall service) to establish a ‘black list’ of IP-addressed sites that were still on-premise, to prevent Direct Access clients reaching these services across the Azure VPN (although communicating to your end users why they can’t reach these services can be difficult, as they’ll just get a 401 web error or a mapped drive connection failure).

How?

The key to getting a working Direct Access solution on the Azure platform is primarily configuring the environment with the following items:

  • Ensure all Direct Access servers use a Static Private IP address.   The new Azure Portal can easily assign a Static IP address, but only if the Virtual Machine is built into a custom network first.  If a Virtual Server is built using the default Azure internal networking (10.x.x.x), the server can be rebuilt into a custom network instead, however the server object itself has to be deleted and rebuilt.  If the Virtual Disk is kept during the deletion process, then the new server can just use the existing VHD during the install of the new object.  Browse to the new Azure Portal (https://portal.azure.com) and use the web interface to first configure a private IP address that matches the network in use (see example picture below).  The ‘old’ Azure portal (https://manage.windowsazure.com) cannot set a static IP address directly through the web interface and PowerShell has to be used instead locally on the server (see the sketch after this list).  A connection to the Azure service (via the Internet) is needed to set a static IP address and more instructions can be found here: http://msdn.microsoft.com/en-us/library/azure/dn630228.aspx.  To use the Azure PowerShell commands, the Azure PowerShell cmdlets have to be installed first and an Internet connection has to be present for PowerShell to connect to the Azure service to allocate a static IP address.  More instructions on getting the Azure PowerShell module can be found here: http://azure.microsoft.com/en-us/documentation/articles/install-configure-powershell/
    AzureStaticIPaddress
  • Be sure to install all of the latest Hotfixes for Windows 8/8.1 clients and Server 2012 (with or without R2).  This article is excellent to follow the list of required updates:  http://support.microsoft.com/kb/2883952
  • Install at least one Domain Controller in Azure (this is a good article to follow: http://msdn.microsoft.com/en-us/library/azure/jj156090.aspx).  The Direct Access server has to be domain joined, and all Direct Access configuration (for the DA clients and the DA server) is stored as Group Policy objects.  Also, the Direct Access server itself performs all DNS queries for Direct Access clients on their behalf.  If a Domain Controller is local, then all DNS queries will be contained to the Azure network and not be forced to go over the Azure VPN site-to-site connection to the corporate network.  Do not forget to configure your AD Sites and Services to ensure the Domain Controller in Azure is contained within its own AD site, so that AD replication does not occur too often across the VPN connection (plus you don’t want your on-premise clients using the Azure DC for authentications).
  • When configuring Direct Access, at least as far as a single Direct Access server solution is required, do not modify the default DNS settings that are assigned by the configuration wizard.  It isn’t well documented or explained, but essentially the Direct Access server runs a local DNS64 service which effectively becomes the DNS server for all Direct Access clients (for all internal sites, not the Internet sites).  The DA configuration wizard assigns the IP address of the DA server itself and provides the IPv6 address of the server to the DA Client GPO for DNS queries.  The DA server will serve all DNS requests for addresses ending in the ‘DNS Suffix’ pool of FQDNs specified in the DA settings.  If you have a ‘split-brain’ DNS architecture, for example you have ‘customer.com’ addresses on Internet DNS servers but you also overrule these records with an internal Primary zone (or stub records) for ‘customer.com’ for certain sites, then if you include ‘customer.com’ in the Direct Access DNS Suffix settings, the client will only use internal DNS servers (at least the DNS servers that the Direct Access server can reach) for resolution of these names.  As with all things, test all of your corporate websites during the testing phase to ensure no conflicts.  One of the services that could cause conflicts would be solutions such as AD FS, which are generally architected with the exact same internal/external FQDN (e.g. adfs.customer.com) – the internal FQDN generally points clients to use integrated authentication, the external FQDN generally points to forms based authentication.  For my customer, re-directing all ‘customer.com’ requests to the internal corporate network, including AD FS, worked without issue.
  • If you’re using Hyper-V Windows 8 clients in your testing, be aware that in my experience the testing experience is ‘patchy’ at best.  I did not have time to get a Windows 8 client VHD imported into Azure (there are no native Windows 8 templates to use in Azure) so I used a local Hyper-V Windows 8 client in my testing and used the Offline Domain Join plus Group Policy option (there was no point to point network connection between my Azure DA server and my Hyper-V test client).  My experience was that the Hyper-V DA client would connect to the DA FQDN, but the DNS service provided by the DA server was often very intermittent.  This article is excellent to follow to get the Direct Access Client Settings Group Policy onto the offline Hyper-V client for testing:  http://technet.microsoft.com/en-us/library/jj574150.aspx.  A physical client should be used (if possible) for all Direct Access testing.
  • CNAME records can be used instead of using the native Azure published ‘service.cloudapp.net’ FQDN for the DA service itself.  My client successfully used a vanity CNAME (e.g. directaccess.customer.com) pointing to their ‘service.cloudapp.net’ name, with a matching wildcard certificate that used the ‘vanity’ name in the subject field (e.g. *.customer.com).
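As referenced in the first point above, a sketch of setting the static IP address with the classic Azure Service Management PowerShell cmdlets (current at the time of writing) is below; the cloud service name, VM name and IP address are placeholders:

# Sketch only (classic Azure Service Management PowerShell module).
# 'MyCloudService', 'MyDAServer' and 10.1.0.10 are placeholders.
Add-AzureAccount
Get-AzureVM -ServiceName "MyCloudService" -Name "MyDAServer" |
    Set-AzureStaticVNetIP -IPAddress "10.1.0.10" |
    Update-AzureVM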

Tips?

The following are some general points to follow to help get a DA service running on Azure:

  • If you’re using Windows 8.1 for testing, and you’re finding the Direct Access Client Settings GPO is not reaching your test clients, then check the WMI filter to ensure they are not being excluded from targeting.  In my customer’s environment, the filter’s version check only allowed ‘6.2%’ clients (ie. Windows 8, not Windows 8.1).  Be sure the WMI filter includes ‘like 6.3%’, otherwise Windows 8.1 clients will not receive the Direct Access policy correctly:

 

Select * from Win32_OperatingSystem WHERE (ProductType = 3) OR ((Version LIKE '6.2%' OR Version LIKE '6.3%') AND (OperatingSystemSKU = 4 OR OperatingSystemSKU = 27 OR OperatingSystemSKU = 72 OR OperatingSystemSKU = 84)) OR (Version LIKE '6.1%' AND (OperatingSystemSKU = 4 OR OperatingSystemSKU = 27 OR OperatingSystemSKU = 70 OR OperatingSystemSKU = 1 OR OperatingSystemSKU = 28 OR OperatingSystemSKU = 71))
 
  • Once you’ve configured the server and distributed the Direct Access Group Policy objects to your target clients, use the Direct Access Troubleshooter utility (http://www.microsoft.com/en-au/download/details.aspx?id=41938), see picture below.  This tool is essential to rapidly determine where failures might be occurring in your configuration.  It is limited, however, in that it does not report on certificate failures (eg. FQDN to subject mis-matches, expired certificates etc.) or on errors related to DNS name lookups (for which I provide a bit more guidance below).

DirectAccessTroubleshooter

  • Verify that the client can establish an IP-HTTPS tunnel to your Direct Access server – if it cannot, then the problem is most likely the certificate you’ve used in the Client section of your DA configuration, or access to the Direct Access FQDN you have published the DA service under.  If the tunnel has established correctly, then you should see connection objects appear in the Windows Firewall location (in both Main Mode and Quick Mode), see picture below.  If you cannot see any objects, then test the Direct Access FQDN using a web browser – standard testing behaviour is to see no SSL certificate errors and an HTTP ‘404’ error.

SAssociations

  • It may be easier to get the solution working first with the ‘self-signed’ certificate option provided by Microsoft first (the certificate is actually copied to the client using the Direct Access Client GPO), then move the solution to a third party, customer owned certificate.  This will rule out certificate problems first.
  • If the IP-HTTPS tunnel is established, and you see positive connection numbers appear for clients in the Direct Access console, but the clients still cannot reach internal services over their respective FQDNs, then the problem is most likely the DNS configuration of your solution.  The key to testing this is using the following commands:
    • Open a command prompt (cmd.exe) with administrator credentials
    • Type ‘nslookup’
    • Type ‘server’, a space and then the IPv6 address of the Direct Access server, for example ‘server <ipv6 address>’.  This forces the session to use the server you have specified for all future queries you type
    • Type the FQDN of your local Active Directory e.g. ‘latte.internal’.  You should successfully see a response coming back from NSLookup with an IPv6 address of a domain controller.  That IPv6 address is the translated IPv4 address that the DNS64 service is providing to the DA client (and not an IPv6 address bound to the network card of the domain controller, in case you’re confused).
  • If you cannot get any positive response for the DNS service on the Direct Access server, check:
    • The IPv6 address of the Direct Access server (see picture below) should match the exact IPv6 address that is in the NRPT policy of the Direct Access client.

DAIPv6

  • To verify the IPv6 address is configured correctly on the Direct Access client in the NRPT policy, the following two methods can be used:
    • 1. On the Direct Access Client, open Powershell.exe with Administrator privileges and type ‘Get-DnsClientNrptPolicy‘:

 

NRPTpolicy1

  • 2. On the Direct Access Client, open the Registry (Regedit.exe) and browse to the following registry location:

NRPTpolicy2

  • Using either method above, the IPv6 address listed in the NRPT policy HAS TO match the IPv6 address of your Direct Access server.  If these addresses do not match, then verify the Direct Access Client GPO settings have reached your clients successfully.  The major reason for assigning a static IP address in Azure to the DA server is so that the IPv6 address allocated by the Azure service remains the same, even if the DA server is shut down (and its IP address de-allocated from the VM) from the Azure portal.  Alternatively, modify the NRPT registry entries above directly to match the IPv6 address used on the DA server (but only do this during the test phase).  It is best practice to ensure clients update their DA GPO settings regularly.  All changes to DA settings should be delivered by GPOs where possible.
  • Finally, ensure that the Firewall service is enabled on your Direct Access clients for the ‘private profile’ (at least).  The firewall service is used to establish the Direct Access connection.  If the Firewall service is disabled on the Private Profile, then the solution will not establish a connection to the Direct Access service.  The firewall service can be disabled without issue on the domain or public profiles (however that is not generally recommended).

privatefirewall
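A quick way to check the profile state on a client (Windows 8 and above) is something like the following:

# Quick check of the Windows Firewall profile states on a DA client
Get-NetFirewallProfile | Select-Object Name, Enabled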

I hope this article has proved useful to you and your organisation in reviewing or implementing Direct Access on Azure.

PowerShell Detection Method for SCCM 2012 Application Compliance management

Microsoft System Center Configuration Manager (SCCM) 2012 has a very powerful Application Detection and Delivery model, separate from the existing ‘package and program delivery model’ of previous versions of SCCM & SMS.

The power of this new model is not having to ‘daisy chain’ packages and executables together to achieve a desired outcome.  Using SCCM’s Detection Model reduces the burden in managing a Windows client base in terms of keeping its baseline configuration the same across every client in the Organisation.

I recently assisted a Kloud customer to configure a script delivery application that used the Application delivery model and the ‘Detection Method’ to ensure files reached their local Windows 8 folder destinations successfully.  The script simply copies the files where they need to go and the Detection Method then determines the success of that script. If SCCM does not detect the files in their correct destination locations, it re-runs the script.

Benefits in using SCCM 2012 Application and Detection Method Delivery

Using this Application and Detection method provided Kloud’s customer with the following business benefits:

  • Increased reliability of delivering Office template files to a Windows 8 machine and therefore reduced TCO in delivering software to authorised workstations.  If the application files were corrupted or deleted during installation or post-installation (for example a user turning their workstation off during an install), then SCCM detects these files are missing and re-runs the installation
  • Upgrades are made easier, as it does not depend on any Windows 8 workstation having to run a previous installation or ‘package’.  The ‘Detection Method’ of the Application object determines if the correct file version is there (or not) and if necessary re-runs the script to deliver the files.  The ‘Detection Method’ also runs after every install, to guarantee that a client is 100% compliant with that application delivery.
  • Uses SCCM client agent behaviour including BITS, restart handling, use of the ‘Software Center’ application for user initiated installs and Application package version handling – for example, if a single file is updated in the Application source and re-delivered to the Distribution Point, the SCCM client detects that a single file has changed and will only download the changed file, saving bandwidth (and download charges) from the Distribution Point

Customer Technical Requirements

Kloud’s customer had the following technical requirements:

1. My customer wanted to use an SCCM Application and Detection Rule to distribute ten Office 2010 template files to Windows 8 workstations (managed with the SCCM client)

2. They wanted to be able to drop new Office 2010 template files at any stage into the SCCM source application folder, distribute the application and the SCCM clients download and install those new templates with minimum interference to end users.

3. They also wanted the minimum number of objects in SCCM to manage the application, and wanted the application to ‘self heal’ if a user deleted any of the template files.

4. All code had to be written in PowerShell for ease of support.

Limitations of Native Detection Methods

SCCM 2012 has a great native Detection Rules method for MSI files and file system executables (see the native Detection Rule image below).

NativeDetectionRules

However we quickly worked out its limitations with this native Detection Rule model, namely for the ‘File System’ setting type:

1. Environment variables for user accounts, such as %username% and %userprofile% are not supported

2. File versioning can only work with Windows executables (ie. .EXE) and not metadata embedded in files, for example Word files.

SCCM comes with the ability to run Powershell, VBScript or JScript as part of its Detection Model, and it is documented with VBScript examples at this location:

TechNet Link

Taking these examples, the critical table for getting the Detection Model working correctly (and for improving your understanding of how your script works in terms of ‘error code’, ‘stdout’ and ‘stderror’) is the following, kindly reproduced from Microsoft from the TechNet Link above:

Script exit code    Data read from STDOUT    Data read from STDERR    Script result    Application detection state
0                   Empty                    Empty                    Success          Not installed
0                   Empty                    Not empty                Failure          Unknown
0                   Not empty                Empty                    Success          Installed
0                   Not empty                Not empty                Success          Installed
Non-zero value      Empty                    Empty                    Failure          Unknown
Non-zero value      Empty                    Not empty                Failure          Unknown
Non-zero value      Not empty                Empty                    Failure          Unknown
Non-zero value      Not empty                Not empty                Failure          Unknown

This table tells us that the key to achieving an Application delivery ‘success’ or ‘failure’ using our PowerShell Detection script boils down to achieving one of the rows that returns a definitive detection state (‘Installed’ or ‘Not installed’) – any other result (i.e. “Unknown” for the “Application detection state”) will simply result in the application not delivering to the client.

The critical part of any Detection Model script is to ensure an error code of ‘0’ is always the result, regardless of whether the application has installed successfully or has failed. The next critical step is the PowerShell equivalent of populating the ‘stdout’ object. Other script authors may choose to test the ‘stderror’ object as well in their scripts, but I found it unnecessary and preferred to ‘keep it simple’.

After ensuring my script achieved an exit code of ‘0’, I then concentrated on my script either populating the ‘stdout’ object or not populating it – I essentially ignored the ‘stderror’ object completely and ensured my script ran ‘error free’. At all times, for example, I used ‘test-path’ to first test whether a file or folder exists before attempting to grab its metadata properties. If I didn’t use ‘test-path’, then the script would error if a file or folder was not found and it would end up in an “unknown” detection state.

I therefore concentrated solely on my script achieving the ‘Installed’ or ‘Not installed’ rows of the table above.

Microsoft provides examples of VBScript code to populate the ‘stdout’ (and ‘stderror’) objects in the TechNet link above – however my method involves just piping a single PowerShell ‘write-host’ command if the detection script determines the application has been delivered successfully.  This satisfies populating the ‘stdout’ object and therefore achieves detection success.
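As a minimal sketch of that pattern (the file paths below are placeholders, not the customer’s real template locations):

# Minimal sketch of the detection pattern: the script always exits with code 0,
# and writes to stdout only when every expected file is present.
$expectedFiles = @(
    "C:\Program Files (x86)\Customer\Template1.dotx",
    "C:\Program Files (x86)\Customer\Template2.dotx"
)
$missing = $expectedFiles | Where-Object { -not (Test-Path $_) }
if (-not $missing) { Write-Host "all files accounted for!" }
exit 0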

Limitations in using Scripts for Detection

There were two issues in getting a Detection Method working properly: an issue related to the way SCCM delivers files to the local client (specifically upgrades) and an issue with the way Office template files are used.

One of the issues we have is that Word and Excel typically change a template file (however small the change!) when either application is loaded, by changing either its ‘Date Modified’ timestamp or the file length in bytes (or both). Therefore, a detection method that determines whether a file has been delivered successfully to the workstation should avoid using a file’s length in bytes or its modified timestamp.

The other issue we found is that SCCM has a habit of changing the ‘Date Modified’ timestamp of all files it delivers when it detects an ‘upgrade’ of the source files for that application. It typically does not touch the timestamp of the source files if it delivers a brand new install to a client that has never received the software, however if a single file in the source folder is changed for that application, then SCCM tries to use a previous version of the application in the cache (C:\windows\ccmcache) and only downloads the new file that has changed. This results in all files having their ‘Date Modified’ timestamp changing (except for the brand new file). Therefore determining if that application has delivered successfully using ‘Date Modified’ timestamps is not recommended. The key to seeing this process in action is looking at the file properties in the C:\windows\ccmcache\<sccm code> folder for that application, particularly before and after a file is updated in the original source SCCM application folder.

Ultimately, for Kloud’s customer, we used a file’s Metadata to determine the file version and whether the application has been delivered successfully or not. In this example, we used the ‘Company’ metadata field of the Word and Excel template file (found under a file’s ‘Properties’):

Metadata1

I used this Scripting Guy’s TechNet Blog to form the basis of retrieving a file’s metadata using a PowerShell function, and then using that information pulled from the file to determine a good attribute to scan for, in terms of file version control.

One of the limitations I found was that this function (through no fault of its author: Ed Wilson!) does not return ‘Version number’, so we used the ‘Company’ field instead. If someone has worked out a different PowerShell method to retrieve that ‘Version number’ metadata attribute, then feel free to tell me in the comments section below!

The next step in getting this PowerShell script to work correctly is ensuring that only ‘error code = 0’ is returned when the script is executed.  Any other error code will break the delivery of that application to the client. The next step is ensuring that a ‘write-host’ is executed only if the script detects that all files are installed – in this example, only when all 10 files in my array ‘Path’ are 100% detected will a ‘write-host’ be sent to the SCCM client, telling SCCM that the application has been successfully delivered. If I were to copy that PowerShell script locally, run it and not detect all files on that machine, then the script would not display anything in the PowerShell window. This tells the SCCM client that the delivery has failed.  If the script ran locally and a single ‘write-host’ of ‘all files accounted for!’ was shown on the screen, this tells me the Detection is working.

The sample code for our Detection Method can be found below (all filenames and paths have been changed from my customer’s script for example purposes):


# Authors: Michael Pearn & Ed Wilson [MSFT]
Function Get-FileMetaData
{
  <#
   .Synopsis
    This function gets file metadata and returns it as a custom PS Object
 #Requires -Version 2.0
 #>
 Param([string[]]$folder)
 foreach($sFolder in $folder)
  {
   $a = 0
   $objShell = New-Object -ComObject Shell.Application
   $objFolder = $objShell.namespace($sFolder) 

   foreach ($File in $objFolder.items())
    {
     $FileMetaData = New-Object PSOBJECT
      for ($a ; $a  -le 266; $a++)
       {
         if($objFolder.getDetailsOf($File, $a))
           {
             $hash += @{$($objFolder.getDetailsOf($objFolder.items, $a))  =
                   $($objFolder.getDetailsOf($File, $a)) }
            $FileMetaData | Add-Member $hash
            $hash.clear()
           } #end if
       } #end for
     $a=0
     $FileMetaData
    } #end foreach $file
  } #end foreach $sfolder
} #end Get-FileMetaData

$TemplateVersions = "5.0.2"

$wordStandards = "C:\Program Files (x86)\Customer\Customer Word Standards"
$wordTemplates = "C:\Program Files (x86)\Microsoft Office\Templates"
$wordTheme = "C:\Program Files (x86)\Microsoft Office\Document Themes 14\Theme Colors"
$excelAddins = "C:\Program Files (x86)\Customer\Customer Excel Addins"
$xlRibbon = "C:\Program Files (x86)\Microsoft Office\Office14\ADDINS"
$PPTribbon = "C:\Program Files (x86)\Customer\PowerPoint Templates"
$PPTtemplates = "C:\Program Files (x86)\Microsoft Office\Templates\Customer"

$strFile1 = "Bridge Template.xlsm"
$strFile2 = "Excel Ribbon.xlam"
$strFile3 = "NormalEmail.dotm"
$strFile4 = "PPT Ribbon.ppam"
$strFile5 = "Client Pitch.potx"
$strFile6 = "Client Presentation.potx"
$strFile7 = "Client Report.potx"
$strFile8 = "Blank.potx"
$strFile9 = "Blocks.dotx"
$strFile10 = "Normal.dotm"

$Path = @()
$Collection = @()

$Path += "$excelAddins\$strfile1"
$Path += "$xlRibbon\$strfile2"
$Path += "$PPTribbon\$strfile3"
$Path += "$PPTtemplates\$strfile4"
$Path += "$PPTtemplates\$strfile5"
$Path += "$PPTtemplates\$strfile6"
$Path += "$wordStandards\$strfile7"
$Path += "$excelAddins\$strfile8"
$Path += "$xlRibbon\$strfile9"
$Path += "$PPTribbon\$strfile10"

if (Test-Path $wordStandards) {
$fileMD = Get-FileMetaData -folder $wordStandards
$collection += $fileMD | select path, company
}
if (Test-Path $wordTemplates) {
$fileMD = Get-FileMetaData -folder $wordTemplates
$collection += $fileMD | select path, company
}
if (Test-Path $wordTheme) {
$fileMD = Get-FileMetaData -folder $wordTheme
$collection += $fileMD | select path, company
}
if (Test-Path $excelAddins) {
$fileMD = Get-FileMetaData -folder $excelAddins
$collection += $fileMD | select path, company
}
if (Test-Path $xlRibbon) {
$fileMD = Get-FileMetaData -folder $xlRibbon
$collection += $fileMD | select path, company
}
if (Test-Path $PPTribbon) {
$fileMD = Get-FileMetaData -folder $PPTribbon
$collection += $fileMD | select path, company
}
if (Test-Path $PPTtemplates) {
$fileMD = Get-FileMetaData -folder $PPTtemplates
$collection += $fileMD | select path, company
}
$OKCounter = 0
for ($i=0; $i -lt $Path.length; $i++) {
     foreach ($obj in $collection) {
     If ($Path[$i] -eq $obj.path -and $obj.company -eq $TemplateVersions) {$OKCounter++}
     }
}
if ($OKCounter -eq $path.length) {
write-host "all files accounted for!"
}


I then pasted this code into the Detection Model of the application, resulting in something similar to the following image:

DetectionModel

If the application has delivered successfully (that is, the script results in ‘Exit Code = 0’ and a ‘write-host’ of “all files accounted for!” piped to the ‘stdout’ object), then entries similar to the following (note the ‘Matched exit code 0 to a Success entry’ and ‘Discovered application’ lines) should appear in the local SCCM client log: C:\Windows\CCM\Logs\AppEnforce.log:


<![LOG[    Looking for exit code 0 in exit codes table...]LOG]!><time="12:29:13.852-600" date="08-08-2014" component="AppEnforce" context="" type="1" thread="2144" file="appexcnlib.cpp:505">
<![LOG[    Matched exit code 0 to a Success entry in exit codes table.]LOG]!><time="12:29:13.853-600" date="08-08-2014" component="AppEnforce" context="" type="1" thread="2144" file="appexcnlib.cpp:584">
<![LOG[    Performing detection of app deployment type User Install - Prod - Office 2010 Templates 5.0.2(ScopeId_92919E2B-F457-4BBD-82FF-0765C1E1E696/DeploymentType_0f69fa14-549d-4397-8a0b-004f0d0e85e7, revision 4) for user.]LOG]!><time="12:29:13.861-600" date="08-08-2014" component="AppEnforce" context="" type="1" thread="2144" file="appprovider.cpp:2079">
<![LOG[+++ Discovered application [AppDT Id: ScopeId_92919E2B-F457-4BBD-82FF-0765C1E1E696/DeploymentType_0f69fa14-549d-4397-8a0b-004f0d0e85e7, Revision: 4]]LOG]!><time="12:29:16.977-600" type="1" date="08-08-2014" file="scripthandler.cpp:491" thread="2144" context="" component="AppEnforce">
<![LOG[++++++ App enforcement completed (10 seconds) for App DT "User Install - Prod - Office 2010 Templates 5.0.2" [ScopeId_92919E2B-F457-4BBD-82FF-0765C1E1E696/DeploymentType_0f69fa14-549d-4397-8a0b-004f0d0e85e7], Revision: 4, User SID: S-1-5-21-1938088289-184369731-1547471778-5113] ++++++]LOG]!><time="12:29:16.977-600" date="08-08-2014" component="AppEnforce" context="" type="1" thread="2144" file="appprovider.cpp:2366">


We should also see a status of ‘Installed’ in the ‘Software Center’ application (part of the SCCM client):

SoftwareCenter

Hope this helps with using SCCM application and Detection Method scripting! Any questions, please comment on my post below and I’ll endeavour to get back to you.