Lync Dude: Simple Understanding of Lync Windows Fabric & Failover

Howdy,

Over the last 8 months I have seen a scary number of Lync 2013 deployments where Lync runs on an Enterprise pool with only two Front End servers. I even saw a couple with an Enterprise pool containing only one Front End server. Yes, you read that right: only one Front End server.

So I decided to write another post in my “Simple Understanding” series, aimed at explaining the Lync 2013 server architecture, how it utilizes Windows Fabric for high availability, and why you should not deploy a two-node Enterprise Edition pool. I’ll try to use small words and explanations as simple as I can.

I’m also planning to use this article as a guide to share with customers who have an existing Lync deployment or are considering Lync, to help them in their decisions.

In the previous version, Lync 2010, user services were provided by the backend database, which is why Lync clients went into “Limited Functionality” mode when the SQL backend went down. This changed with the Lync 2013 server architecture, where user services moved from the backend databases to the Lync Front End servers.

The main advantages I see in this approach are:

Less dependency on the SQL backend
The pool can scale further by adding more Front End servers
Each user’s data is kept on the Front End servers in the pool

Windows Fabric Deep Dive

Microsoft Best Practices: Windows Fabric is a kind of clustering technology. With Lync 2013, Microsoft recommends using 3 nodes in an Enterprise pool; if you cannot use 3 nodes in a pool, deploy two Standard Edition pools and use pool pairing for high availability.

NOTE: Windows Fabric is configured automatically every time the services start up; as an administrator there is nothing you need to do regarding Windows Fabric. The configuration can be found in the Windows Fabric manifest file at the following path:

C:\Program Files\windows fabric\bin\clustermanifest.current

In my lab I have one Enterprise pool with 3 Front Ends: LYFE01, LYFE02 and LYFE03. Every user enabled for Lync in this pool gets a primary Front End server and two backup servers: a primary backup and a secondary backup.

This can easily be found with the following PowerShell command:

C:\> Get-CsUserPoolInfo -Identity "user"

As you can see in the output of the command above, the user has:

Primary Front End = LYFE02 (keep it in mind)
Primary backup Front End = LYFE01
Secondary backup Front End = LYFE03

Routing Groups

As established so far, each user has a primary Front End. This information is written to the SQL backend databases with one tiny difference: as far as the SQL backend is concerned, each Lync user is a Routing Group. The number of Routing Groups increases as you add more users to the Front End pool. You can find the Routing Groups in the RTC database, in the RoutingGroupAssignment table (use the following query):

SELECT TOP 1000 [RoutingGroupId]
,[RoutingGroupName]
,[ServiceClusterId]
,[FrontEndId]
,[State]
,[FrontEndChangeTime]
,[CurrentUserCount]
,[CurrentEndpointCount]
FROM [rtc].[dbo].[RoutingGroupAssignment]

In my lab I have only one user enabled for Lync, so when checking Routing Groups in the RTC database I see only one Routing Group (it does not matter which SQL server you check, because Lync no longer depends on the backend for user services).

Each Routing Group has a FrontEndId associated with it; this is the primary Front End server for that Routing Group (user). In my lab it has the value “3”.

You can “decode” this value by checking the FrontEnd table, which lists the Front End nodes in the pool (run the following query):

SELECT TOP 1000 [FrontEndId]
,[Fqdn]
FROM [rtc].[dbo].[FrontEnd]

So as you can see, my only Routing Group (user) has FrontEndId “3”, which is LYFE02. That’s the same result we got from the Get-CsUserPoolInfo command earlier in this article.
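The lookup described above (Routing Group to FrontEndId to server name) can be sketched as a single join. This is only an illustration: it uses Python's built-in sqlite3 as a stand-in for the real SQL Server RTC database, keeps just the columns named in the queries above, and every row is made up except the FrontEndId “3” = LYFE02 pairing from my lab. The routing group name and the IDs for LYFE01 and LYFE03 are hypothetical, and real Fqdn values would be full FQDNs.

```python
import sqlite3

# In-memory SQLite stand-in for the RTC database (a real pool uses SQL Server).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FrontEnd (FrontEndId INTEGER PRIMARY KEY, Fqdn TEXT);
CREATE TABLE RoutingGroupAssignment (RoutingGroupName TEXT, FrontEndId INTEGER);
-- Only FrontEndId 3 = LYFE02 comes from the article; the other rows are hypothetical.
INSERT INTO FrontEnd VALUES (1, 'LYFE01'), (2, 'LYFE03'), (3, 'LYFE02');
INSERT INTO RoutingGroupAssignment VALUES ('hypothetical-routing-group', 3);
""")


def primary_front_ends(connection):
    """Return (routing group name, primary Front End name) pairs."""
    query = """
        SELECT rga.RoutingGroupName, fe.Fqdn
        FROM RoutingGroupAssignment AS rga
        JOIN FrontEnd AS fe ON fe.FrontEndId = rga.FrontEndId
    """
    return connection.execute(query).fetchall()


for name, fqdn in primary_front_ends(conn):
    print(f"{name} -> {fqdn}")
```

The same JOIN shape should carry over to the real tables, but I have not verified it against a production RTC database, so treat it as a starting point.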

SIDE NOTE: the Routing Group ID is written to the Active Directory user account in the msRTCSIP-UserRoutingGroupId attribute, in reverse order.
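A small sketch of what “reverse order” could look like in practice. It assumes (this is my assumption, not something confirmed above) that the attribute stores the GUID in the usual little-endian binary layout, where the first three GUID fields are byte-swapped and the rest is unchanged. The GUID below is made up and the helper names are hypothetical.

```python
import uuid


def ad_hex_form(routing_group_id: str) -> str:
    """Hex string of the GUID's little-endian byte layout (assumed AD storage form)."""
    return uuid.UUID(routing_group_id).bytes_le.hex()


def guid_from_ad_hex(ad_hex: str) -> str:
    """Turn the assumed AD hex form back into the usual GUID string."""
    return str(uuid.UUID(bytes_le=bytes.fromhex(ad_hex)))


example = "00112233-4455-6677-8899-aabbccddeeff"  # made-up GUID for illustration
print(ad_hex_form(example))
print(guid_from_ad_hex(ad_hex_form(example)))
```

If the value you see in Active Directory does not match the SQL value after this conversion, the assumption about the byte layout does not hold for your environment.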

How it Works

I would summarize how Windows Fabric works in 4 simple steps:

The first time the Front End service is “Starting”, the Front End server connects to the SQL backend and collects the user information for all the users it is responsible for.
Once the Front End server finishes collecting user information from the SQL backend, it replicates that information to both the primary backup and secondary backup servers.
From this point on, it is the primary Front End’s responsibility to write all new user information to the SQL backend (for example when a user creates a new conference or adds a new user), as well as to replicate it to both the primary backup and secondary backup servers.
The Front End service goes into “Started” status.
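The four steps above can be pictured with a toy model. This is only a sketch of the flow as I described it, not how Windows Fabric is actually implemented: the FrontEnd class, the function names and the dictionary used as the “SQL backend” are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FrontEnd:
    name: str
    status: str = "Stopped"
    user_data: dict = field(default_factory=dict)


def start_primary(primary, backups, backend, owned_users):
    """Steps 1, 2 and 4: load from the backend, replicate, report Started."""
    primary.status = "Starting"
    # Step 1: collect the data of every user this Front End is responsible for.
    primary.user_data = {user: backend[user] for user in owned_users}
    # Step 2: replicate that data to the primary backup and secondary backup.
    for backup in backups:
        backup.user_data.update(primary.user_data)
    # Step 4: the service reports "Started".
    primary.status = "Started"


def write_user_data(primary, backups, backend, user, data):
    """Step 3: the primary writes changes to the backend and to both backups."""
    backend[user] = data
    primary.user_data[user] = data
    for backup in backups:
        backup.user_data[user] = data


backend = {"alice": "contacts-v1"}  # stand-in for the SQL backend
lyfe02, lyfe01, lyfe03 = FrontEnd("LYFE02"), FrontEnd("LYFE01"), FrontEnd("LYFE03")
start_primary(lyfe02, [lyfe01, lyfe03], backend, ["alice"])
write_user_data(lyfe02, [lyfe01, lyfe03], backend, "alice", "contacts-v2")
print(lyfe02.status, lyfe01.user_data, lyfe03.user_data)
```

The point of the model is simply that after startup every change lands in three places on the Front Ends plus the backend, which is what makes the failover described next possible.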

How Windows Fabric works under the hood

Front end Failover

One of the users in my lab (I added more users) has LYFE02 as the primary Front End, LYFE03 as the primary backup and LYFE01 as the secondary backup. So what happens when the primary Front End server (LYFE02) for that user goes down?

The Fabric Pool Manager promotes the primary backup Front End to host the conference directory – Event ID 51037

The Fabric Pool Manager makes some changes to the MCU Factory, including chat, phone conferencing, etc. – Event ID 51035

The Fabric Pool Manager marks the primary Front End as inactive – Event ID 32108

The primary backup Front End, LYFE03, is promoted to be the primary Front End for the user’s Routing Group – LS User Services Event ID 32167

The primary storage service is assigned to a Front End (I noticed it is always the secondary backup Front End, but I’m not sure whether this is always the case) – Lync Storage Services Event ID 32033

User information is not affected, because the primary Front End was always replicating it to the primary backup and secondary backup servers.

If I run Get-CsUserPoolInfo for the same user now, I can see that the primary backup Front End was promoted to primary Front End, and the secondary backup Front End became the primary backup.
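The promotion I observed can be modelled as an ordered list where the failed server drops out and everyone else moves up one place. This is a minimal sketch of that observation only; the function names are hypothetical and the real Fabric Pool Manager logic is certainly more involved.

```python
ROLES = ["primary", "primary backup", "secondary backup"]


def fail_over(replicas, failed):
    """Drop the failed Front End; the remaining ones keep their order and move up."""
    return [server for server in replicas if server != failed]


def describe(replicas):
    """Pair each remaining Front End with the role its position implies."""
    return dict(zip(ROLES, replicas))


# Order taken from my lab user: primary, primary backup, secondary backup.
before = ["LYFE02", "LYFE03", "LYFE01"]
after = fail_over(before, "LYFE02")
print(describe(before))
print(describe(after))
```

Note that after the failover the “secondary backup” role is empty until the failed Front End comes back.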

Important Notes

Now that you understand how Lync utilizes Windows Fabric and how it works, here are some notes to keep in mind:

As mentioned before, Microsoft recommends using 3 Front Ends when deploying an Enterprise pool; if you cannot deploy 3 Front Ends, use two Standard Edition pools with pool pairing
Windows Fabric is a kind of failover cluster, and it needs an odd number of votes (like a witness server) to maintain pool-level quorum

According to TechNet, the following table shows the total number of Front Ends that need to be running in a pool to maintain pool-level quorum:

Lync 2013 server still needs the SQL backend; if for any reason the SQL backend is unavailable, the Lync Front Ends go into survivable mode after 30 minutes.
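To see why an odd number of votes matters, here is a generic majority-vote calculation. It assumes plain “more than half of the votes” quorum, which is my simplification: it works in votes, not Front Ends, and it does not reproduce the TechNet table, so use the table for real planning.

```python
def votes_needed(total_votes: int) -> int:
    """Smallest number of votes that is more than half of the total."""
    return total_votes // 2 + 1


def tolerated_losses(total_votes: int) -> int:
    """How many votes can be lost while a majority is still online."""
    return total_votes - votes_needed(total_votes)


for total in range(2, 8):
    print(f"{total} votes: need {votes_needed(total)}, "
          f"can lose {tolerated_losses(total)}")
```

Under this assumption going from 3 to 4 votes does not let you lose any more voters than before, which is the usual reason cluster technologies want an odd vote count.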

When the minimum number of Front Ends in a pool is not met

If the minimum number of Front Ends in a pool is not met, the Front End services start shutting down after 5 minutes. In a nutshell, the following happens:

LS User Services Event ID 32163: the Pool Manager disconnects from the Fabric Pool Manager due to loss of quorum

LS User Services Event ID 32189: the Pool Fabric disconnects the users (closes routing group connections)

LS User Services Event ID 32170: the Pool Manager tries to connect to the Fabric Pool Manager and fails; make sure 85% of the Front Ends are up and running

LS User Services Event ID 32173: the Lync Front End servers start shutting down after 5 minutes

Two-Node Front End Pool

With two servers in an Enterprise pool, Lync uses the SQL backend server as a witness to maintain pool-level quorum. If you have mirrored SQL, Lync uses the primary SQL server in the mirror as the witness.

To confirm this, check the Windows Fabric “ClusterManifest.current” file and compare it to the one mentioned above in this article (from the 3 Front End pool). You will notice an additional section called “Votes”, where Windows Fabric uses the SQL server as a vote to maintain pool-level quorum.

Primary SQL Backend Outage

If for any reason the primary SQL backend is unavailable (and no SQL mirror is used), the following happens:
Lync clients go into “Limited Functionality” mode, the same as in Lync 2010.
If the SQL backend is not brought back online within 30 minutes, the Lync Front Ends go into survivable mode.
If the SQL backend is unavailable and the minimum number of Front Ends that need to be running in the pool is not met, the Front End services shut down after 5 minutes (check the table above).
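The two-node cases can be walked through with a tiny vote count. The sketch assumes three equal votes (two Front Ends plus the SQL witness) and a simple majority rule; both are my simplifications of the behaviour described above, and the function name is hypothetical.

```python
def two_node_pool_has_quorum(front_ends_online: int, sql_online: bool) -> bool:
    """Two Front End votes plus one SQL witness vote; a majority (2 of 3) is needed."""
    total_votes = 3
    votes_online = front_ends_online + (1 if sql_online else 0)
    return votes_online > total_votes // 2


scenarios = [
    ("both Front Ends up, SQL up", 2, True),
    ("one Front End down, SQL up", 1, True),
    ("both Front Ends up, SQL down", 2, False),
    ("one Front End down, SQL down", 1, False),
]
for label, front_ends, sql in scenarios:
    state = "quorum kept" if two_node_pool_has_quorum(front_ends, sql) else "quorum lost"
    print(f"{label}: {state}")
```

The last scenario is the painful one: a single surviving Front End cannot hold quorum on its own, which is why a two-node pool ties your availability so tightly to the SQL backend.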

The following short video summarizes the cases of a two Front End pool:

If your customer uses a virtual infrastructure like Hyper-V or VMware to host the Lync servers, make sure to spread the Front Ends across the physical hosts, especially when dealing with a large number of Front Ends in a pool. Don’t put all your eggs in one basket: you don’t want to lose a physical host that runs a user’s primary, primary backup and secondary backup Front Ends all at once, because that user will be offline until one of those Front Ends is brought back online.

What you would want to have

What you would NOT want to have

NOTE: with recent improvements in Hyper-V and VMware, engineers can use the “Live Migration” option and their infrastructure resources to make sure that when a physical node goes down, the Front Ends are migrated automatically to an online physical host.
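The placement advice above can be checked with a few lines of code. This is a sketch under stated assumptions: the host names (HOST-A, HOST-B, HOST-C), the user names and the function name are hypothetical, and in practice you would fill the two dictionaries from Get-CsUserPoolInfo output and your hypervisor inventory.

```python
def users_at_risk(replica_sets, vm_to_host):
    """Return users whose Front End replicas all run on one physical host."""
    at_risk = []
    for user, front_ends in replica_sets.items():
        hosts = {vm_to_host[front_end] for front_end in front_ends}
        if len(hosts) == 1:
            at_risk.append(user)
    return at_risk


# Each user's primary, primary backup and secondary backup Front Ends.
replica_sets = {
    "user1": ["LYFE02", "LYFE01", "LYFE03"],
    "user2": ["LYFE02", "LYFE03", "LYFE01"],
}

good_placement = {"LYFE01": "HOST-A", "LYFE02": "HOST-B", "LYFE03": "HOST-C"}
bad_placement = {"LYFE01": "HOST-A", "LYFE02": "HOST-A", "LYFE03": "HOST-A"}

print(users_at_risk(replica_sets, good_placement))
print(users_at_risk(replica_sets, bad_placement))
```

With the bad placement a single host failure takes every replica of both users offline at the same time, which is exactly the situation to avoid.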

This covers most of the points a Lync specialist, or a customer considering deploying Lync, needs to know and take into consideration in the planning phase of the project :)
