Recently I’ve been working with one SFB customer recently. I met some unique issue and I would like to share the experience of what I did to solve the problem
Issue Description: After SQL patching on Lync servers, all users’ presence was not up-to-date and people are unable to dial in to the scheduled conference.
Investigation:

when I used Lync shell moving testing user from SBA pool A to pool B on one FE server, but I checked the user pool info on the SBA pool A, the result still showed the testing user is under pool A. This indicates either the FE Lync databases are not syncing with each other properly or there are database corruptions.
I checked all the Lync FE servers, all the Lync services are running. all look good. I re-tested the conference scenarios, the PSTN conference bridge number is unavailable while people can still make incoming/outgoing calls.
I decided to go back to check the logs on all the Lync FE servers, I noticed on one of the Lync FE servers, I got “Warning: Revocation status unknown. Cannot contact the revocation server specified in certificate”, weird, does this mean there was something wrong with the cert on this FE server? No way, I didn’t see this error on the other FE server, both FE servers are supposed to use the same certs, this means it’s not the cert issue. It is something wrong with the FE server.
Next, I decided to turn off all the Lync services on the problematic FE server to see if it made any difference. Interesting thing happened, once I did that, all users’ presence became updated and also the PSTN conference bridge number became available. I could dial in from my mobile after that. it verified it was server issue.
Root Cause:

What caused the FE server having the cert error? Which cert was used on this FE server? I manually relaunched the deployment wizard, wanted to compare the certs between the 2 FE servers. Then I noticed that the Lync server configurations are not up-to-date from the database store level. This was a surprise to me because there was no change on the topology, so I never thought about re-run the deployment wizard after FE SQL patching. On the other FE server which was working as expected, I can see all the green checks on each step of the deployment wizard. Bingo, I believed all the inconsistent issues from users end were related with the inconsistent SQL databases across all the two FE ends.
Solution:

Eventually, after the change request approved by the CAB, re-run the deployment wizard to sync the SQL store and also re-assign the certs to Lync services resolved the issue.
Hopefully it can help someone else who have similar issues.

Category:
Uncategorized