A recent customer had a special request when I was designing and coding a new ECMA 2.2 based Management Agent (MA) or “Connector” for Microsoft Forefront Identity Manager (FIM).
(On a sidenote: FIM’s latest release is now Microsoft Identity Manager or “MIM”, but my customer hadn’t upgraded to the latest version).
Kloud previously were engaged to write a new ECMA based MA for Gallagher 7.5 (a door security card system) to facilitate the provisioning of access and removal of access tied to an HR system.
Whilst the majority of the ECMA was ‘export’ based, ie. FIM controlled most of the Gallagher data, one of the attributes we were importing back from this security card system was a person’s picture that was printed on these cards.
Requirements
It seems that in the early days of the Gallagher system (maybe before digital cameras were invented?), they used to upload a static logo (similar to a WiFi symbol) in place of a person’s face. It was only recently they changed their internal processes to upload the actual profile picture of someone rather than this logo.
The system has been upgraded a number of times, but the data migrated each time without anyone going back to update the existing people’s profile pictures.
This picture would then be physically printed on their security cards, which for people with their faces on their cards, they wanted to appear in Outlook and SharePoint.
The special request was that they wanted me to ‘filter out’ images that were just logos, and only import profile pictures into FIM from Gallagher (and then exported out of FIM into Active Directory and SharePoint).
There were many concerns with this request:
- We had limited budget and time, so removing the offending logos manually was going to be very costly and difficult (not to mention very tiring for that person across 10,000 identities!)
- Gallagher stores the picture in its database as a ‘byte’ value (rather than the picture filename used for the import). That value format is exposed as well across the Gallagher web service API for that picture attribute.
- Gallagher uses a ‘cropping system’ to ensure that only 240 x 160 pixel sized image is selected from the logo source file that was much larger. Moving the ‘crop window’ up, down, left or right would change the byte value stored in Gallagher (I know, I tested it almost 20 different combinations!)
- The logo file itself had multiple file versions, some of which had been cropped prior to uploading into Gallagher.
Coding
My colleague Boris pointed me to an open source Image Comparison DLL written by Jakob Krarup (which you can find here).
It’s called ‘XNA.FileComparison’ and it works superbly well. Basically this code allows you to use Histogram values embedded within a picture to compare two different pictures and then calculate a ‘Percentage Different’ value between the two.
One of the methods included in this code (PercentageDifference()) is an ability to compare two picture based objects in C# and return a ‘percentage difference’ value which you can use to determine if the picture is a logo or a human (by comparing each image imports into the Connector Space to a reference logo picture stored on the FIM server).
To implement it, I did the following:
- Downloaded the sample ‘XNA.FileComparison’ executable (.exe) and ran a basic comparison between some source images and the reference logo image, and looked at the percentage difference values that the PercentageDifference() method would be returning. This gave me an idea of how well the method was comparing the pictures.
- Downloaded the source Visual Studio solution (.SLN) file and re-compiled it for 64-bit systems (the compiled DLL version on the website only works on x86 architectures)
- Added the DLL as a Project reference to a newly created Management Agent Extension, whose source code you can find below
In my Management Agent code, I then used this PercentageDifference() method to compare each Connector Space image against a Reference image (located in the Extensions folder of my FIM Synchronization Service). The threshold value the method returned then determined whether to allow the image into the Metaverse (and if necessary copy it to the ‘Allowed’ folder) or block it from reaching the Metaverse (and if necessary copy it to the ‘Filtered’ folder).
I also exported each image’s respective threshold value to a file called “thresholds.txt” in each of the two different folders: ‘Allowed’ and ‘Filtered’.
Each of the options above were configurable in an XML file such as:
- Export folder locations for Allowed & Filtered pictures
- Threshold filter percentage
- A ‘do you want to export images?’ Boolean Export value (True/False), allowing you to turn off the image export on the Production FIM synchronization server once a suitable threshold value was found (e.g. 75%).
A sample XML that configures this option functionality can be seen below:
Testing and Results
To test the method, I would run a Full Import on the Gallagher MA to get all pictures values into the Connector Space. Then I would run multiple ‘Full Synchronizations’ on the MA to get both ‘filtered’ and ‘allowed’ pictures into the two folder locations (whose locations are specified in the XML).
After each ‘Full Synchronization’ we reviewed all threshold values (thresholds.txt) in each folder and used the ‘large icons’ view in Windows Explorer to ensure all people’s faces ended up in the ‘Allowed’ folder and all logo type images ended up in the ‘Filtered’ folder. I ensured I deleted all pictures and the thresholds.txt in each folder so I didn’t get confused the next run. If a profile picture ended up in the ‘filtered folder’ or a logo ended up in the ‘allowed folder’, I’d modify the threshold value in the XML and run another Full Synchronization attempt.
Generally, the percentage difference for most ‘Allowed’ images was around 90-95% (i.e. the person’s face value was 90-95% different than the reference logo image).
What was interesting was that some allowed images got down as low as only 75% (ie. 75% different compared to the logo), so we set our production threshold filter to be 70%. The reason some people’s picture was (percentage wise) “closer” to the logo, was due to some people’s profile pictures having a pure white background and the logo itself was mostly white in colour.
The highest ‘difference’ value for logo images was as high as 63% (the difference between a person’s logo image and the reference logo image was 63%, meaning it was a very “bad” logo image – usually heavily cropped showing more white space than usual).
So setting the filter threshold of 70% fit roughly halfway between 63% and 75%. This ended up in a 100% success rate across about 6000 images which isn’t too shabby.
If in the future, there were people’s faces that were less than 70% different from the logo (and not meet the threshold so were unexpectedly filtered out), the customer had the choice to update the Management Agent configuration XML to lower the threshold value below 70%, or use a different picture.
Some Notes re: Code
Here are some ‘quirks’ related to my environment which you’ll see in the MA Extension code:
- A small percentage of people in Gallagher customers did not have an Active Directory account (which I used for the image export filename), so in those cases I used a large random number if they didn’t to save the filename for the images (I was in a hurry!)
- I’m writing to a custom Gallagher Event Viewer ID name, which will save all the logs to that custom Application Event Viewer (in case you’re trying to find the logs in the generic ‘Application’ Event Viewer log)
- Hard coding of ‘thresholds.txt’ as a file name and the location of the Options XML (beware if you’re using a D:\ drive or other letter for the installation path of the Synchronization Service!)
Management Agent Extension Code