Azure AD Identity and Access Management & Features

I’ve been using Azure AD Identity for quite a while now, so I thought it would be good to share a summary of its features and gather some feedback.

Azure AD Identity

Azure Active Directory: A comprehensive identity and access management cloud solution for your employees, partners, and customers. It combines directory services, advanced identity governance, application access management, and a rich standards-based platform for developers.

Identity and access management License option: Azure Active Directory Premium P2 (E5), P1 (E3)

“Identity as the Foundation of Enterprise Mobility”

Identity and access management

Protect at the front door: innovative and advanced risk-based conditional access, protecting your data against user mistakes and detecting attacks before they cause damage.

Identity and access management in the cloud:

  • 1000s of apps, 1 identity: Provide one persona to the workforce for SSO to 1000s of cloud and on-premises apps.
  • Enable business without borders: Stay productive with universal access to every app and collaboration capability.
  • Manage access at scale: Manage identities and access at scale in the cloud and on-premises, with advanced user lifecycle management and advanced identity monitoring tools.
  • Cloud-powered protection: Ensure user and admin accountability with better security and governance

Azure AD portal:

From the Azure AD portal you can configure users and groups, configure identity for SaaS applications, publish on-prem applications with Application Proxy, and manage licenses, password reset (including notifications and authentication methods) and company branding. You can also define whether users can register or consent to applications, whether users can invite external contacts, whether guests can invite external contacts, whether users can register devices with Azure AD, whether MFA is required, and whether to use pass-through authentication or federated authentication.

Azure AD application integration:

Three types of application integration:

  • LOB applications: using Azure AD for authentication
  • SaaS applications: configure SSO
  • Azure AD Application Proxy: publish on-prem applications to the internet through Azure AD Application Proxy.

Inbound/outbound user provisioning to SaaS apps

User experience with integrated apps: users reach their apps through the Access Panel at https://myapps.microsoft.com. To load custom branding, append your organization’s domain: https://myapps.microsoft.com/company_domain_name. From My Apps, users can change their password, edit password reset settings, manage MFA, view account details, view and launch apps, and self-manage groups. Admins can also configure apps as self-service so that users add apps by themselves.

Authentication (Front End & Back End) & Reporting (reporting access & alerts, reporting API, MFA)

Front End Authentication 

Back End Authentication 

Pass-through authentication (front end):

  • Traffic to the backend app NOT authenticated in Azure AD
  • Useful for NDES, CRLs, etc
  • Still has benefits of not exposing backend apps to http based attacks

Pass-through authentication (back end):

  • Does not try and authenticate to the backend
  • Useful with forms based applications
  • Auth headers returned to client
  • Can be used with front-end pre-authentication

Pre-Authentication

  • Users must authenticate to AAD to access backend app
  • Allows ability to plug into AAD control plane
  • Can also be extended to provide true SSO to the backend app

Kerberos/IWA

  • Must use pre-authentication on front end
  • Allows for an SSO experience from AAD to the app
  • Support for SPNego (i.e. non AD Kerberos)

 

Azure AD Connect health

Monitor & Report on ADFS, AAD Sync, ADDS. Advanced logs for configuration troubleshooting.

Azure Identity protection (Azure AD premium P2)

  • AIP dashboard is a consolidated view to examine suspicious user activities and configuration vulnerabilities
  • Remediation recommendations
  • Risk Severity calculation
  • Risk-based policies for protection for future threats

If a user is at risk, we can either block the user or trigger MFA automatically.

AIP can help identify spoofing attacks, leaked credentials, suspicious sign-in activities, infected devices and configuration vulnerabilities. For example, when a user signs in from an unfamiliar location, we can trigger a password reset, or we can use the user risk condition to allow access to corporate resources only after a password change, or block access straight away. Alternatively, we can configure the alert to send an approval request to an admin.
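As a rough illustration of how a risk-based policy maps risk to a control, here is a minimal Python sketch. The risk levels, thresholds and action names are my own assumptions for illustration only; this is not Azure AD's actual policy engine or API.

```python
from enum import IntEnum


class Risk(IntEnum):
    """Hypothetical risk levels, ordered so they can be compared."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def sign_in_action(sign_in_risk: Risk, user_risk: Risk) -> str:
    """Pick a control for a sign-in. Thresholds are illustrative assumptions."""
    if user_risk >= Risk.HIGH:
        return "block"                    # block access straight away
    if user_risk >= Risk.MEDIUM:
        return "require_password_change"  # allow access only after a password change
    if sign_in_risk >= Risk.MEDIUM:
        return "require_mfa"              # e.g. sign-in from an unfamiliar location
    return "allow"
```

In a real tenant the thresholds and controls are configured in the Identity Protection policies rather than in code.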

Identity protection risk types and reports generated:

Azure AD privileged Identity Management

For example, say I am on leave for two days and I want a colleague to be global admin for only those two days. If I come back from leave and forget to remove the global admin permission from that colleague, they will still be a global admin. This puts the company at risk, because either global admin’s password could potentially be compromised.

With just-in-time administrative access, we can grant “global admin” access that lasts only two days.

Securing Privileged access: just in Time administration

  • Assume breach of existing AD forests may have occurred
  • Provide privileged access through a workflow
  • Access is limited in time and audited
  • Administrative account not used when reading mail/etc.

Result = limited in time & capability
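The idea of time-limited access can be sketched in a few lines of Python. The `RoleActivation` class and its fields are hypothetical names of my own, not the Privileged Identity Management API; the point is only that the grant carries its own expiry, so nobody has to remember to remove it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RoleActivation:
    """Hypothetical record of a just-in-time role grant."""
    user: str
    role: str
    starts: datetime
    duration: timedelta

    def is_active(self, now: datetime) -> bool:
        # The grant is valid only inside [starts, starts + duration).
        return self.starts <= now < self.starts + self.duration


# Example: a colleague gets "global admin" for two days only.
grant = RoleActivation(
    user="colleague@example.com",
    role="global admin",
    starts=datetime(2017, 6, 1, 9, 0),
    duration=timedelta(days=2),
)
```

One day in, `grant.is_active(...)` is true; on day three it is false without any manual clean-up.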

 

 

 

Resolving unable to access App published with Barracuda WAF over Azure Express Route

Recently, one of our customers reported that they couldn’t access any of their UAT apps from their Melbourne office, although access worked fine from other offices. When they tried to reach the UAT app domains, they got the error below: “The requested service is temporarily unavailable. It is either overloaded or under maintenance. Please try later.”

WAF error

Because of the UAT environment’s IP restrictions on the WAF, it is normal for me to get this error message: our Kloud office’s public IPs are not in the WAFs’ whitelist. The error proved that the web traffic did hit the WAFs. Pinging the URL hostname returned the correct IP, so there was no DNS problem and the traffic was going to the correct WAF farm (the customer has a couple of other WAF farms in other countries). So we could focus the troubleshooting on the AU WAFs.

I pulled all the WAF access logs and planned to go through them to verify whether the web traffic was hitting the AU WAFs or going somewhere else. I searched the logs for the public IPs provided by the customer; no results were returned for the last 7 days.

Search Result 1

Interesting. Did it mean no traffic from the Melbourne office came in? I did another search based on my own public IPs, and it clearly returned a couple of access log entries related to my testing: correct time, correct client IP, correct WAF domain hostname, method GET and status 503, which is correct because my office IP is restricted.

Search Result 2

Since the customer mentioned that all other offices had no problem accessing the UAT app environment, I asked them for one public IP from another office. We tested again and verified that people in the India office could successfully open the web app, and I could see their web traffic in the WAF logs as well. I believed that when Melbourne staff browsed to the web app, the traffic should go to the same WAF farm, because the DNS hostname resolved to the same IP whether in Melbourne or in India.

The question is what exactly happened and what was the root cause? :/

In order to capture another good example, I noted down the time and asked the customer to browse the website again. This time I did an access log search based on the time instead of Melbourne public IPs. I got a couple of results returned with some unknown IPs.
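The trick here was to search by time instead of by the expected source IP. A minimal Python sketch of that idea follows; the record layout, timestamps and IP addresses are made-up sample data (documentation address ranges), not the real Barracuda log format.

```python
from datetime import datetime, timedelta

# Hypothetical, simplified access log records.
ACCESS_LOG = [
    {"time": datetime(2017, 1, 10, 10, 0, 5), "client_ip": "203.0.113.10", "status": 200},
    {"time": datetime(2017, 1, 10, 10, 15, 30), "client_ip": "198.51.100.7", "status": 503},
    {"time": datetime(2017, 1, 10, 11, 40, 0), "client_ip": "203.0.113.25", "status": 200},
]


def entries_near(entries, when, window=timedelta(minutes=2)):
    """Return entries logged within `window` of `when`, whatever their source IP."""
    return [e for e in entries if abs(e["time"] - when) <= window]


# The customer browsed the site at about 10:15, so look at everything around then.
hits = entries_near(ACCESS_LOG, datetime(2017, 1, 10, 10, 15, 0))
```

Any source IP that shows up in `hits` but is not one you expected is worth a closer look.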

Search result 3

I googled the unknown IPs and they turned out to be Microsoft Australian data centre IPs. At this point I suspected a routing or NAT issue in the customer network. I contacted the customer and provided the unknown IPs. The customer investigated and advised that they were the public IPs of their Azure Express Route interfaces. It made sense now: the customer hadn’t whitelisted their new Azure public IPs, so when web traffic came from these unknown source IPs, the WAF didn’t recognise them and blocked them, just like my own traffic. Once I added the new Azure IPs to the app’s IP whitelist, all the access issues were resolved.
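The whitelist behaviour is easy to reason about with Python's standard `ipaddress` module. This is only a sketch of the matching logic, assuming the whitelist is a list of IPs or CIDR ranges; the addresses are documentation ranges I made up, not the customer's real IPs or the WAF's own configuration format.

```python
import ipaddress


def is_allowed(client_ip: str, allowlist) -> bool:
    """True if client_ip falls inside any network (or single IP) in the allowlist."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(net) for net in allowlist)


# Hypothetical whitelist holding only the office's original public range.
allowlist = ["203.0.113.0/24"]

# Traffic now leaves via a different public IP (e.g. an Express Route interface).
new_source_ip = "198.51.100.7"
blocked_before = not is_allowed(new_source_ip, allowlist)

# Adding the new range to the whitelist restores access.
allowlist.append("198.51.100.0/28")
allowed_after = is_allowed(new_source_ip, allowlist)
```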

How I prepared for the 70-533 Azure Exam

I have been an IT professional for over 7 years now. During this time, I have seen and experienced many critical changes in the IT infrastructure field. I started as a network engineer at a software company, then moved to an MSP as an infrastructure engineer, looking after servers, firewalls, networks, application deployments, etc. for medium and large financial institutions, before I joined Kloud Solutions and started learning Microsoft Azure. Cloud technology is the most significant shift that the IT industry is experiencing today. As a Microsoft guy, it makes sense for me to start using Azure as a platform to provide solutions to our customers.

Back in August, I had a chat with Gary Needham, one of the Azure guys at work, and figured it was time to develop my Azure skills via exam 70-533 Implementing Microsoft Azure Infrastructure Solutions. As an ops guy with strong hands-on experience, I decided I would learn best by doing labs one by one. With a full-time job, I had limited time to study, so I wanted to make sure that whatever labs I did mapped to the exam objectives. I wrote down my goal: obtain the Azure certification by the end of October. I already had some experience with Azure from working on Azure Express Route, Azure Backup and Azure Site Recovery, so I was confident I could achieve that within 3 months.

After quickly getting my subscription ready, I went into the Azure ARM portal and started deploying Azure resource groups, storage, virtual networks, subnets and so on. I got really excited about the Azure technology. It was a lot of fun, and that enjoyment made the journey an outstanding learning experience.

The Azure quick-start templates on GitHub are a fantastic place to learn Azure templates. In the modern IT world, we standardize infrastructure deployment with JSON templates (infrastructure as code), which brings enormous benefits. For me, this came somewhat naturally since I actually started in IT as a coder at school. It took some trial and error, but I eventually figured out how to use Azure templates to deploy ARM resources. It took me a couple of weeks of lab sessions to deploy Azure resources via templates with either PowerShell or the Azure CLI. I felt this was awesome.

Over the next few weeks, I continued to expand my lab environments bit by bit. I like to use Microsoft’s online documentation. It is up to date and provides best practices and recommendations for deployment and configuration. DSC is a very good example: Microsoft provides very good information about DSC templates, how to build and use them, and how to manage version control in a large DSC environment.

As I went through more and more of the exam objectives, I tried to link the labs I had done to the exam objectives in my mind. That way I could remind myself which objectives I needed to go back and review. This eventually paid off when I took the exam, because it helped me consolidate all the knowledge I had picked up so quickly.
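To give a feel for what such a JSON template looks like, here is a minimal sketch that builds one in Python and prints it. The storage account resource, the parameter name and the API version are my own choices for illustration; this is not one of the quick-start templates.

```python
import json

# A minimal ARM template with one storage account.
# Parameter name and apiVersion are illustrative choices.
template = {
    "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "storageAccountName": {"type": "string"},
    },
    "resources": [
        {
            "type": "Microsoft.Storage/storageAccounts",
            "apiVersion": "2019-06-01",
            "name": "[parameters('storageAccountName')]",
            "location": "[resourceGroup().location]",
            "sku": {"name": "Standard_LRS"},
            "kind": "StorageV2",
            "properties": {},
        }
    ],
}


def render(t: dict) -> str:
    """Serialise the template to the JSON text you would save as a template file."""
    return json.dumps(t, indent=2)


if __name__ == "__main__":
    print(render(template))
```

The printed JSON can be saved to a file and deployed to a resource group with PowerShell or the Azure CLI, which is the workflow described above.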

The online course I used was Udemy’s 70-533 videos. I used this course to test myself from different angles and see whether I had mastered all the knowledge required for the exam.

The practice exam I used was the Microsoft Official Practice exam 70-533. I believe my exam prep went a bit more in depth than most. I didn’t try to memorize the exam questions and their answers; I never rely on memorization. For questions I got wrong, I went back to the lab and to the documentation and tried to understand the whole workflow around that particular exam subject. I am a heavy Googler and I like to google follow-up questions to learn more.

To prepare for the exam, I went through the entire collection of practice questions twice to see whether I was ready. In the first practice exam, I was unsure about lots of questions, and it gave me some objective feedback, such as the limitations of Azure technology compared with on-prem: storage limitations, replication limitations, geo-location limitations, the limit on the number of storage accounts per subscription, and storage speed limitations. All of this knowledge is tested in the context of designing Azure solutions. I followed the explanation links and researched every question I answered incorrectly. At the end of the second attempt, I still had 9 questions incorrect. I noted all the wrong questions and studied hard on those. Once I felt comfortable with all the questions, I booked my exam for a Wednesday morning.

On the exam day, I got up at 6:30 AM, had a decent breakfast and rode the train to the test centre. It took me about an hour to finish all the questions, and I passed exam 70-533 Implementing Microsoft Azure Infrastructure Solutions.

 

Trust the process:

Read the Microsoft Azure online documentation, then practice and implement it in the labs.

 

 

Hopefully this blog can help anyone who is planning to take an Azure exam.

Polycom VVX 310 – Unable to do blind transfer internally

I’ve been working with one SFB customer recently. I met some unique issues and I would like to share the experience of what I did to solve the problem.

Issue Description:

The customer’s Polycom handsets were unable to transfer external calls to a particular internal 4-digit number range xxxx. All the agent phones are VVX 310s and agents sign in via extension and PIN. When a call transfer failed, the caller heard the placid recorded female voice: “We’re sorry, your call cannot be completed as dialled. Please check the number and dial again”. Interestingly, the failures only happened with blind transfers, while supervised transfers worked perfectly. The Polycom handsets could make and receive direct calls without problems. This didn’t make any sense to me.

Investigations:

Firstly, I went through all the SFB dial plans and the gateway routing and transformation rules. The number range was correctly configured and nothing was different from the other ranges.

I upgraded the firmware to the latest V5.5 SFB-enabled version on one of the Polycom handsets. It made no difference; the result was still the same.

I thought the Digitmap settings in the configuration file might be causing the issue, so I logged into the web interface -> settings -> sip -> Digitmap, removed the regex in the Digitmap field, and rebooted the phone. Still no luck: blind transfers to the internal number range xxxx failed again. :/

An interesting thing happened when I tested with the SFB client: I logged in as two users within the number range xxxx and did the same blind transfer, and it worked! When I used the SFB client to transfer to a Polycom handset, it also worked. But it stopped working when I transferred from the Polycom handset.

Since I could hear the telco’s voice announcement, I thought it would be good to run a trace from the Sonus end to see why the transfer failed. From the live trace, I could see the number in the invite was not what I expected. Something went wrong during number normalization: the extension was given the wrong prefix. Where did the wrong prefix come from?

I logged into the SFB control panel to re-check the voice routing. Nothing was wrong with the user dial plan or the user normalization rules. The control panel testing tool gave a different prefix from the one in the live trace. What could possibly be going wrong? Ext xxx1 is mapped to SFB user 1. When I log in to my SFB client as user 1, everything works, but when I log in to the Polycom phone as Ext xxx1, the blind transfer fails when I transfer to the problematic number range xxxx.

All of a sudden, I noticed that the global dial plan had a strange prefix configured, which matched the prefix (+613868) seen in the live trace. So I believe that, for some reason, the Polycom handsets use the global dial plan when doing a blind transfer, while the SFB client uses the user profile dial plan. This may be a bug. It explained the difference in behaviour between the handsets and the desktop clients.
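To show why the same extension can normalize to two different numbers, here is a small Python sketch of first-match normalization rules. The rule patterns, the user dial plan prefix and the 5xxx range are made up for illustration (only the +613868 prefix comes from the trace), and this is a simplification of how SFB dial plans work, not the real engine.

```python
import re


def normalize(dialled: str, rules):
    """Apply the first matching (pattern, translation) rule, like a dial plan does."""
    for pattern, translation in rules:
        if re.fullmatch(pattern, dialled):
            return re.sub(pattern, translation, dialled)
    return None  # no rule matched


# Hypothetical user dial plan: 4-digit extensions get the correct site prefix.
user_plan = [(r"^(\d{4})$", r"+6139000\1")]

# Hypothetical global dial plan: a generic 4-digit rule with the prefix seen in the trace.
global_plan = [(r"^(\d{4})$", r"+613868\1")]

# The fix: add a specific entry for the affected range ahead of the generic rule.
global_plan_fixed = [(r"^(5\d{3})$", r"+6139000\1")] + global_plan
```

With these sample rules, extension 5123 normalizes to +61390005123 under the user dial plan but to +6138685123 under the original global dial plan, which mirrors the mismatch between the control panel test and the live trace.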

Solution Summary:

After I created a new entry for the number range xxxx in the global dial plan and rebooted the Polycom phone, blind transfers started working again. The result looked correct and I verified that the issue was resolved.

 

 

Hopefully it can help someone else who has similar issues.

 

 

Resolving presence not up-to-date & unable to dial-in the conferences via PSTN issues Lync 2013

I’ve been working with one SFB customer recently. I met a unique issue and I would like to share what I did to solve the problem.

Issue Description: After SQL patching on Lync servers, all users’ presence was not up-to-date and people are unable to dial in to the scheduled conference.

Investigation:

When I used the Lync shell to move a test user from SBA pool A to pool B on one FE server and then checked the user’s pool info on SBA pool A, the result still showed the test user under pool A. This indicated that either the FE Lync databases were not syncing with each other properly or there was database corruption.

I checked all the Lync FE servers: all the Lync services were running and everything looked good. I re-tested the conference scenarios; the PSTN conference bridge number was unavailable, while people could still make incoming and outgoing calls.

I decided to go back and check the logs on all the Lync FE servers. On one of them I found “Warning: Revocation status unknown. Cannot contact the revocation server specified in certificate”. Weird. Did this mean there was something wrong with the cert on this FE server? I didn’t see this error on the other FE server, and both FE servers are supposed to use the same certs, so it was not a cert issue. Something was wrong with this FE server itself.

Next, I tried turning off all the Lync services on the problematic FE server to see if it made any difference. Interestingly, once I did that, all users’ presence became up to date and the PSTN conference bridge number became available. I could dial in from my mobile after that. This verified it was a server issue.

Root Cause:

What caused the cert error on the FE server? Which cert was used on this FE server? I manually relaunched the deployment wizard to compare the certs between the 2 FE servers. Then I noticed that the Lync server configuration was not up to date at the database store level. This was a surprise to me because there had been no change to the topology, so I never thought about re-running the deployment wizard after the FE SQL patching. On the other FE server, which was working as expected, I could see green checks on every step of the deployment wizard. Bingo: I believed all the inconsistencies seen by users were related to the inconsistent SQL databases across the two FE servers.

Solution:

Eventually, after the change request was approved by the CAB, re-running the deployment wizard to sync the SQL store and re-assigning the certs to the Lync services resolved the issue.

Hopefully it can help someone else who has similar issues.

Configure SBC to forward calls out from the original SG where incoming call comes in

Issue Description:

Recently I did a project adding additional Telstra SIP trunks to a Sonus production environment. The customer’s Sonus environment has primary SIP trunks with another SIP provider, and the new Telstra SIP trunks are for a newly established small office. After the new trunk was set up, I had one issue: when calls were forwarded from SFB clients to users’ mobile phones, the A party number didn’t pass through. Instead, the number shown on the mobile phones was the pilot number of the primary SIP trunk. A Party Number Pass-through is not supported by the primary SIP trunk provider, but as far as I know it is supported on the new Telstra SIP trunk. So my question was: how do I configure the SBC to forward calls back out through the Telstra signalling group (SG) where the incoming call came in?

 

Investigation:

I made a test call (A Party rang B Party, with B Party set to forward the call externally to C Party) and captured the logs for the whole call forwarding scenario. I could see that the forwarding leg of the call has a SIP “invite” message sent from the mediation server. Its SIP header contains the numbers of Party A, B and C. Screenshot as below:

I could see that the HISTORY-INFO field contains the B Party number during call forwarding. My idea was to create a transformation rule that compares against the History-Info field value: if the value falls in the Telstra SIP trunk number range, the call should be routed out via the Telstra SIP trunk.

Before creating the new rule, I wanted to verify that A Party Number Pass-through was working. I created an optional rule to match the calling address/number with my mobile number (A Party). When I called the Telstra number range, the call came in from the Telstra trunk and went out to ring another mobile (C Party) via the same trunk, with the A party number displayed as the caller number. All good.

I set up the message manipulation rule for the invite message based on the Sonus doc below (the first half of the doc): https://support.sonus.net/display/UXDOC61/Using+HISTORY-INFO+to+set+the+FROM+Number

After that I tried adding an optional transformation rule to match the Telstra outgoing calls. It didn’t work: the call still went out via the primary trunk. The rule can’t be mandatory under the current SFB-to-Telstra routing table, because that would disconnect normal outgoing calls on the Telstra trunk.

Next, I created a mandatory rule to compare the History-Info value, bound it to the Telstra SG and re-tested the call forwarding scenario. The call still went out via the primary trunk. :/

When I looked at the Sonus local system log, I couldn’t see any log entries containing the “SG User” variable. This made me realise that assigning the inbound manipulation rule to the Telstra SG was wrong, because the invite comes from the SFB server. I corrected this setting and it started to work as expected.

Solution Summary:

  1. Create a message manipulation rule “Collect History-Info” on the “invite” message to collect the History-Info value into “SG User Value 1”: Applicable Messages – Selected Messages, Message Selection – Invite, Table Result Type – Mandatory. Refer to the screenshot below:

  2. Assign it to the Inbound Message Manipulation of SFB SG
  3. Create a transformation rule to compare “SG User Value 1” with the Telstra SIP trunk number range:

  4. Create a NEW route that matches this transformation rule.

 

After this, I retested the call forwarding scenario: both the inbound leg and the forwarding leg of the call are routed through the Telstra SIP trunk. The result looks correct and the issue is verified as resolved. 😊
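The routing decision behind these steps can be sketched in Python. This only mirrors the logic (pull the numbers out of History-Info, compare against the Telstra range, pick a route); the header value, host name and number prefixes are invented sample data, and the real work is done by the SBC's message manipulation and transformation tables, not by code like this.

```python
import re


def history_numbers(history_info: str):
    """Extract the numbers from the SIP URIs in a History-Info header value."""
    return re.findall(r"sip:(\+?\d+)@", history_info)


def choose_trunk(history_info: str, telstra_prefix: str) -> str:
    """Route via Telstra if any History-Info number sits in the Telstra range."""
    if any(n.startswith(telstra_prefix) for n in history_numbers(history_info)):
        return "telstra"
    return "primary"


# Hypothetical header from a forwarded call: B Party first, then the forward target.
HEADER = (
    "<sip:+61399990001@sfb.example.com;user=phone>;index=1, "
    "<sip:+61412345678@sfb.example.com;user=phone>;index=1.1"
)
```

With the sample header, a Telstra range of `+6139999` selects the Telstra route, while any other range falls back to the primary trunk.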

Resolving Skype for business 2015 Backup service “ErrorState” issue

I’ve been working with one SFB customer recently. I met some unique issue and I would like to share the experience of what I did to solve the problem.

Issue Description:

When I went through the Lync event logs, I noticed the SFB FE servers were logging lots of LS Backup Service errors with Event 4052, Event 4098 and Event 4071. The error logs said:

“Skype for business Server 2015, backup service users store backup module has backup data that never gets imported by backup pool. Backup data “file:\filestore\2-backupservice-1\backupstore\userservice\PresenceFocus\Data\Backup.zip”

Cause: Import issue in the backup pool. Please check event log of Skype for business Server 2015, Backup service in the backup pool for more information.

Resolution:

Fix import issue in the backup pool”

After reading these errors, I did a health check by running “get-csbackupservicestatus -poolfqdn primarypoolname”. The result showed OverallExportStatus: ErrorState, OverallImportStatus: NormalState.

Running the same cmd against the backup pool, “get-csbackupservicestatus -poolfqdn backuppoolname”, showed OverallExportStatus: ErrorState, OverallImportStatus: ErrorState.

I checked the filestore folder permission settings and they all looked correct: everyone was given read and write access to the folder. So the issue was not related to folder permissions. This made sense, because the backup services had been running fine up to a certain point in time.

Then I did a bit of googling: people suggested solving backup service problems by recreating the backup folder. I stopped the SFB Backup Service, File Transfer Agent Service and Master Replicator Agent Service on the FE servers across both the primary pool and the DR pool, then deleted the folder structure within the backup service folders. After this, I restarted all the stopped services. After a few seconds, the backup folder structures were recreated. I ran “Invoke-csbackupservicesync -poolfqdn primarypoolname” and “Invoke-csbackupservicesync -poolfqdn backuppoolname”, and everything looked fine. But when I ran “get-csbackupservicestatus -poolfqdn poolname” on both pools, I got the same error results as before.

To me, this was not good news. I was sure that something had changed in the environment. I started basic troubleshooting again from the primary site: I browsed to the backup folder on the primary site servers and rechecked the folder permissions, and everything looked good. I tried browsing to the DR folder from the primary site. It succeeded, nothing wrong. :/

Root Cause:

When I moved to the DR servers and tried to browse to the primary site backup folder via the same path, an interesting thing happened: the filestore at that path on the DR servers was totally different from the filestore I had browsed on the primary servers. I did a further ping test and verified that the filestore host name resolved differently at the primary site and the DR site. This meant the filestores of the primary site and the DR site couldn’t talk to each other, and that was the root cause of the backup service error status.
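The ping test boils down to resolving the same host name from each site and comparing the answers. A minimal Python sketch of that check is below; the site labels and IP addresses in the example are made up, and `resolve` has to be run on a server at each site to collect real results.

```python
import socket
from typing import Dict


def resolve(host: str) -> str:
    """Resolve a host name to an IPv4 address from wherever this script runs."""
    return socket.gethostbyname(host)


def resolves_consistently(results: Dict[str, str]) -> bool:
    """True if every site resolved the filestore host name to the same address."""
    return len(set(results.values())) <= 1


# Hypothetical results collected by running resolve() on one server per site.
collected = {"primary": "10.0.0.5", "dr": "10.1.0.5"}
consistent = resolves_consistently(collected)
```

In the situation described above the two sites would report different addresses, so the check returns `False` and points straight at name resolution rather than folder permissions.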

What exactly changed?

I spoke with the customer’s IT team and they advised that originally both the primary filestore and the DR filestore were located on one DFS host. A couple of weeks earlier, the IT team had made some changes to the DFS farm. As a result, the SFB FE servers at the primary site resolved the filestore name to the primary site DFS host, while the SFB FE servers at the DR site resolved it to the DR site DFS host, which is a totally different host. This broke the configuration sync and caused the backup service to fail.

Solutions:

We reconfigured the DFS farm so that all the SFB FE servers across both sites resolve the filestore name to the primary site DFS host. After that, I restarted the backup services and everything started working again.

I ran the health check again with “get-csbackupservicestatus -poolfqdn primarypoolname”: OverallExportStatus: FinalState, OverallImportStatus: NormalState. I ran the health check for the DR site too and the result looked correct. Issue verified as resolved.

I’m posting this because I couldn’t find any reference to this particular environment-related SFB backup service error elsewhere. Hopefully it can help someone else too.