I’ve been working with one SFB customer recently. I met some unique issue and I would like to share the experience of what I did to solve the problem.
When I went through the Lync Event logs, I noticed the SFB FE servers are having lots of LS Backup Service with Event 4052, Event 4098 and Event 4071. Error logs are saying
“Skype for business Server 2015, backup service users store backup module has backup data that never gets imported by backup pool.Backup data “file:\filestore\2-backupservice-1\backupstore\userservice\PresenceFocus\Data\Backup.zip”
Cause: Import issue in the backup pool. Please check event log of Skype for business Server 2015, Backup service in the backup pool for more information.
Fix import issue in the backup pool”
After I read these errors, I did a health check by running “get-csbackupservicestatus -poolfqdn primarypoolname” The result showed: OverallExportStatus: ErrorState, OverallImportStatus: NormalState.
By running the same cmd on the backup pool “get-csbackupservicestatus -poolfqdn backuppoolname”, the result showed: overallExportStatus: ErrorSate, OverallImportStatus: ErrorState.
I Checked the filestore folder permissions settings, it looked all correct, everyone is given access to the folder with read & write permissions. So this issue was not related with folder permission settings. This made sense because the backup services were running all good prior to certain time point.
Then I did a bit of googling: people say to solve the backup service problem by recreating the backup folder. I tried to stop SFB Backup Service, File Transfer Agent Service, Master Replicator Agent Service on the FE servers across both primary pool and DR pool. Deleted the folder structure within the backup service folders. After this, I restarted all the stopped services above. After a few seconds, the new backup folder structures were recreated again. I run “Invoke-csbackupservicesync -poolfqdn primarypoolname” also “Invoke-csbackupservicesync -poolfqdn backuppoolname” Everything looked just fine. But when I run “get-csbackupservicestatus -poolfqdn poolname” on both pools, I get the same error results as previous.
To me, this is not good news. I was sure that something changed from the environment background. I started to do basic troubleshooting again from the primary site, I browse to the backup folder on the primary site servers recheck the folder permissions and everything looked good. I tried to browse to the DR folder at the primary site. It looked successful, nothing wrong.
When I moved to the DR servers and tried to browse to primary site backup folder via the same directory, interesting thing happened, obviously, the filestore with the same directory path name on the DR servers was totally different from the filestore I browsed on the Primary servers. I did further ping test and verified that filestore host name was resolved differently at primary site and DR site. This meant the filestore of primary site and DR site can’t talk with each other, so that’s the root cause of the backup service having error status.
What exactly changed?
I spoke with customer IT team and they advised that originally both primary filestore and DR filestore were located on one DFS host. A couple of weeks ago, the IT team made some changes on the DFS farm, at the end, SFB FE servers at primary site resolved the filestore name against the Primary site DFS host, however, the SFB FE servers at DR site resolved the filestore name against the DR site DFS host, which is a totally different host. This breaks the configuration sync and caused backup service failed.
Reconfigured the DFS farm, all the SFB FE servers across both sites resolved the filestore name against the primary site DFS host. After that, restarted the backup services, everything started working again.
Run health check again “get-csbackupservicestatus -poolfqdn primarypoolname”, the overallexportstatus: FinaState, the overallimportstatus:NormalState. Run health check for DR site, the result looks all correct. Verified the issue resolved.
So posting this as I couldn’t find any reference to this particular environment related SFB backup service error issue elsewhere. Hopefully it can help someone else too.