I was recently working on an issue where Lync was marking one of the three gateways in a voice policy route as down. This customer uses three gateways, and outbound calls are distributed across all three via round-robin routing. The only thing unique about this particular gateway was that an FXS paging trunk is connected to it. Our support team restarted the Mediation service and saw all the expected event log entries indicating that the gateway was back online, only to see it marked as unreachable again after a period of time. The issue was escalated to me since I did the original implementation, and I suggested that we test a few things:
Issue
- Make a call to the paging trunk and, while that call is established, make another call.
- This resulted in an event log entry stating that an outbound routing attempt had failed.
- We did this 4 more times (the magic number here is 5 failures).
- After those 4 additional failures, Lync marked the gateway as unavailable.
Cause
- The gateway was responding to Lync with a 503 Service Unavailable whenever the FXS port was in use. (Remember: 1xx, 2xx, and 3xx are the good SIP responses; 4xx, 5xx, and 6xx are the bad ones.) If Lync receives 5 permanent failures, it marks the gateway as down. That is OK for outbound calling when other gateways provide the same routing, but paging will no longer work, because the paging trunk is connected to the gateway that is now marked offline.
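The counting behavior described above can be sketched as a toy model in plain PowerShell. It is an illustration only, not Lync's actual health-tracking logic: the function name `Get-GatewayState`, the threshold variable, and the rule that only 5xx responses count are assumptions drawn from what we observed in this case.

```powershell
# Toy model only; Lync's real gateway health tracking is internal and may differ.
$FailureThreshold = 5   # failure count observed in this post

function Get-GatewayState {
    param(
        [int[]]$ResponseCodes,          # final SIP responses from the gateway, in order
        [hashtable]$Translation = @{}   # hypothetical translation map, e.g. @{ 503 = 483 }
    )
    $failures = 0
    foreach ($code in $ResponseCodes) {
        # Apply a translation first, if one matches the received code.
        if ($Translation.ContainsKey($code)) { $code = $Translation[$code] }
        # Assumption: only 5xx responses count against the gateway.
        if ($code -ge 500 -and $code -le 599) { $failures++ }
        if ($failures -ge $FailureThreshold) { return 'Down' }
    }
    return 'Up'
}

$busyResponses = @(503, 503, 503, 503, 503)
Get-GatewayState -ResponseCodes $busyResponses                               # Down
Get-GatewayState -ResponseCodes $busyResponses -Translation @{ 503 = 483 }   # Up
```

The second call shows why rewriting the response code keeps the gateway in service in this model: the translated responses never reach the counter.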
Resolution
To solve this issue I suggested that we use the New-CsSipResponseCodeTranslationRule Lync PowerShell cmdlet to translate the 503 response into a 483. (Strictly speaking, RFC 3261 defines 483 as Too Many Hops, and Busy Here is 486; 483 is the code we used.) By doing this we take a permanent failure like a 5xx and translate it into a temporary failure that Lync does not count towards the 5 failures before a gateway is marked down.
Here is the command that was run:
New-CsSipResponseCodeTranslationRule -Identity "PstnGateway:xx-xxx-gateway1.contoso.corp/Rule503_Paging" -ReceivedResponseCode 503 -TranslatedResponseCode 483
After we implemented the command we ran several tests. We had one user connect to paging and then had another user call the same trunk over and over. Lync never marked the gateway as down. We validated this by reviewing the Lync logs and confirming that we were in fact getting a 483 response back now.
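If you want to double-check the rule itself, something along these lines should work from the Lync Server Management Shell. This is a sketch, assuming the same placeholder gateway FQDN and rule name used in the command above; substitute your own values.

```powershell
# Placeholder identity from this post; replace with your gateway FQDN and rule name.
$ruleId = "PstnGateway:xx-xxx-gateway1.contoso.corp/Rule503_Paging"

# Show the rule and confirm the received and translated codes.
Get-CsSipResponseCodeTranslationRule -Identity $ruleId |
    Select-Object Identity, ReceivedResponseCode, TranslatedResponseCode

# List every translation rule scoped to a PSTN gateway.
Get-CsSipResponseCodeTranslationRule -Filter "PstnGateway:*"

# Roll back if needed. -WhatIf previews the removal without applying it.
Remove-CsSipResponseCodeTranslationRule -Identity $ruleId -WhatIf
```

Drop the -WhatIf switch only when you actually want to remove the rule.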
Hope this helps someone!
