Thursday, June 15, 2023

A Botched Attempt to Update TFS Server 2018, but with a Good Ending

When updating TFS Server 2018, I left the "install search" check box checked, and it led to a disaster.

The Elasticsearch service that TFS Server 2018 tried to install went berserk and ate up a lot of RAM, so much that another VM on the same host, the one running the SQL server, slowed to a crawl. Then the TFS Server 2018 update failed with a message saying "TF255356: The following error occurred when configuring the Team Foundation databases: TF400711: Error occurred while executing servicing step 'Install Local Contribution Web Platform On Prem' for component Publish Extensions during FinishInstallUpdates". The log file gave the reason: a SQL timeout.

Not knowing what was going on in the background, I stopped the update, restored the database, and retried several times. Each time I got the same error.

I had to try completing the update on a different machine, with a different SQL server. Since this was no longer a minor update, I upgraded TFS Server 2018 to Azure DevOps Server 2022. Everything worked. Then I tried to move the SQL databases back to the old SQL server. As soon as I completed the move, the TFS server, oh no, the Azure DevOps Server 2022, slowed to a crawl again.

It seemed that I had to leave the TFS server, oh wait, the Azure DevOps Server 2022, on the new hosts. So I started to clean up the old hosts and tried to remove any traces of the old TFS Server installations. There were several TFS Server folders in "C:\Program Files". I tried to delete them all, since they no longer showed up in the Windows list of installed applications. But the TFS Server 2018 folder could not be deleted because it was in use by a running process.

I sorted the running processes by memory usage, and saw at the top an Elasticsearch process that ate up a huge amount of RAM. Then I discovered that this process belonged to a Windows service installed by TFS Server 2018. The service was "elasticsearch-service-x64". Its files were in the TFS Server 2018 folder. I stopped the service, and then removed it by running "sc delete elasticsearch-service-x64" in an admin command prompt.
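For anyone who would rather script that hunt than click through Task Manager, here is a minimal sketch in Python of the same steps: rank processes by memory, then build the commands that stop and remove the service. It assumes the process list comes from "tasklist /fo csv /nh", whose fifth column is the memory usage. The function names and the sample rows are made up for illustration; they are not output from my server.

```python
import csv
import io
import subprocess


def read_tasklist():
    """Return raw CSV text from Windows' tasklist (Windows only)."""
    result = subprocess.run(
        ["tasklist", "/fo", "csv", "/nh"], capture_output=True, text=True
    )
    return result.stdout


def parse_tasklist_csv(text):
    """Turn tasklist CSV text into (image_name, pid, mem_kb) tuples."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        name, pid, mem = fields[0], fields[1], fields[4]
        # Memory looks like "6,291,456 K"; keep only the digits so that
        # other thousands separators are handled too.
        digits = "".join(ch for ch in mem if ch.isdigit())
        if pid.isdigit() and digits:
            rows.append((name, int(pid), int(digits)))
    return rows


def top_by_memory(rows, n=5):
    """Return the n rows with the largest memory usage, biggest first."""
    return sorted(rows, key=lambda row: row[2], reverse=True)[:n]


def removal_commands(service_name):
    """Commands to stop and then delete a Windows service (run as admin)."""
    return [["sc", "stop", service_name], ["sc", "delete", service_name]]


# Illustrative sample only; the PIDs and sizes are invented.
SAMPLE = (
    '"elasticsearch-service-x64.exe","4321","Services","0","6,291,456 K"\n'
    '"sqlservr.exe","1500","Services","0","1,048,576 K"\n'
    '"explorer.exe","2200","Console","1","102,400 K"\n'
)

if __name__ == "__main__":
    for name, pid, mem_kb in top_by_memory(parse_tasklist_csv(SAMPLE), n=3):
        print(f"{name} (PID {pid}): {mem_kb} K")
    for command in removal_commands("elasticsearch-service-x64"):
        print(" ".join(command))
```

On a real Windows host, passing read_tasklist() to parse_tasklist_csv in place of SAMPLE should give the live list. The sketch only prints the sc commands rather than running them, since deleting a service is not something to do by accident.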

Now the host has regained all the RAM, and everything is back to normal. The SQL server is no longer running at a snail's pace. And the server is now on the latest version: Azure DevOps Server 2022.0.1.