Hi,
In their native form, SIP trunks from VoIP.ms do not have any redundancy. The trunk getting the incoming calls can only be associated with a single POP Server.
I still managed to increase the availability on my SIP phone service. Here is what I did…
For this example, let’s assume that my DID POP Server is Montreal6 (just change it to whatever yours is).
Step 1: In the VoIP.ms portal, I created 3 subaccounts for my PBX (each of them ends up being a different SIP trunk, more on that later). Let’s call them pbx1, pbx2 and pbx3 for this example.
Step 2: In the VoIP.ms portal, in my DID configuration, I set the Main routing to SIP/IAX subaccount pbx1. I also set the “If Unreachable” routing from the Failover section to SIP/IAX subaccount pbx2.
Step 3: SIP trunk configuration on my PBX:
a. I configured Trunk1 to montreal6.voip.ms (the POP matching my DID) port 5060 using subaccount pbx1
b. I configured Trunk2 to montreal6.voip.ms (again, the POP matching my DID) port 5080 (this is the alternative SIP port for VoIP.ms) using subaccount pbx2.
c. I configured Trunk3 to toronto2.voip.ms (I chose any other POP server from any other VoIP.ms location, other than Montreal) using subaccount pbx3
Step 4: In the Inbound call routing rules on my PBX, I included Trunk1, Trunk2 and Trunk3 (in this order) in all rules for external calls.
Step 5: In the Outbound call routing rules on my PBX, I included Trunk1, Trunk2 and Trunk3 (in this order) in all rules for external calls.
Step 6 (bonus): If you have a secondary internet connection (dual WAN), create a policy-based routing rule in your router/firewall that always steers all traffic going to montreal6.voip.ms on port 5080 to the secondary WAN. This will make Trunk2 always go through your secondary internet connection even if the main one is up. (This does not apply to me, I only have a single internet connection)
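To make the trunk layout from Step 3 and the ordering from Steps 4 and 5 a bit more concrete, here is a minimal Python sketch of how a PBX might pick a trunk for an outbound call. It is only an illustration, not the configuration of any particular PBX: the `Trunk` class and `pick_outbound_trunk` function are hypothetical names, and port 5060 for Trunk3 is an assumption (the steps above do not specify it).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Trunk:
    name: str
    host: str
    port: int
    subaccount: str


# Trunks listed in priority order (Trunk1 first, Trunk3 last).
TRUNKS = [
    Trunk("Trunk1", "montreal6.voip.ms", 5060, "pbx1"),
    Trunk("Trunk2", "montreal6.voip.ms", 5080, "pbx2"),
    # Port 5060 is an assumption here; the post does not state Trunk3's port.
    Trunk("Trunk3", "toronto2.voip.ms", 5060, "pbx3"),
]


def pick_outbound_trunk(trunks_up: set[str]) -> Optional[Trunk]:
    """Return the first trunk, in priority order, that is currently up."""
    for trunk in TRUNKS:
        if trunk.name in trunks_up:
            return trunk
    return None  # no trunk is up: the outbound call cannot be placed


# Example: Montreal is down, so only Trunk3 is still up.
chosen = pick_outbound_trunk({"Trunk3"})
print(chosen.name if chosen else "no trunk available")
```

The point of keeping the trunks in one ordered list is that the same order can be reused in every routing rule, so the failover behavior stays predictable.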
Here are a few incident scenarios and how this setup should react:
Scenario 1: POP Montreal6 or the entire Montreal VoIP.ms site goes completely down. Trunk1 and Trunk2 will be down. Trunk3 will remain up and will handle all outbound calls, including emergency calls (911). Inbound calls will not go through, however. This usually only lasts a few minutes, probably while VoIP.ms is performing maintenance on their server.
This scenario happens to me a few times per month: both of my SIP trunks to my main POP server go down for about 3 to 7 minutes each time. When Trunk1 and Trunk2 are disconnected, my outbound calls are routed completely transparently to Trunk3 in Toronto. For me, getting no inbound calls for a few minutes is not a big deal. Even 21 minutes of downtime on incoming calls in a month is still a 99.95% availability rate.
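For anyone who wants to check the 99.95% figure or plug in their own numbers, here is a small Python sketch of the arithmetic. It assumes a 30-day month (43,200 minutes); the function name `availability_percent` is just for illustration.

```python
def availability_percent(downtime_minutes: float, days_in_month: int = 30) -> float:
    """Availability over one month, as a percentage."""
    total_minutes = days_in_month * 24 * 60  # 43,200 minutes for 30 days
    return 100.0 * (1 - downtime_minutes / total_minutes)


# 21 minutes of inbound downtime in a 30-day month
print(round(availability_percent(21), 2))  # 99.95
```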
Fortunately, VoIP.ms never performs maintenance simultaneously at Montreal and Toronto, so I get close to a 100% availability rate on outbound calls. At this point, my single internet connection and single PBX system become my availability bottleneck, not VoIP.ms.
Scenario 2: In case there really is a disaster in Montreal and the VoIP.ms site goes down for a while, a few manual steps are required to restore the Inbound calls (outbound calls are still functional at this point because they are already routed to Toronto on Trunk3). I need to log into my VoIP.ms portal and change:
- My DID POP server to Toronto2
- The Main routing on my DID to SIP/IAX subaccount pbx3.
No settings on my PBX need to be changed, all inbound calls will now be coming on Trunk3. Outbound calls were already being routed on Trunk3 as soon as Trunk1 and Trunk2 went down.
(Since nothing needs to be changed on the PBX, this change can easily be done remotely and recovery is faster.)
At this point, the phone service should be fully restored for the users, so the pressure goes off. However, the PBX does not have any SIP redundancy anymore. At this point, I can take my time to change the redundancy configuration using two other VoIP.ms sites, like connecting Trunk1 and Trunk2 to Toronto2 instead of Montreal6 and connecting Trunk3 to Chicago3 instead of Toronto2. After doing that, I have to change the Main call routing of my DID back to subaccount pbx1.
Scenario 3: My firewall messes up the state table of my SIP trunk connection to montreal6.voip.ms on port 5060 and stops passing the traffic. After a few seconds, the pbx1 subaccount will appear unresponsive to VoIP.ms and the Failover rule on the DID will redirect the inbound calls to the pbx2 subaccount (Trunk2 on the PBX). Around the same time, the PBX should also realize that Trunk1 is down and start routing outbound calls to Trunk2.
Scenario 4: Toronto site or Toronto2 server goes down. If Montreal6 is still up, nothing will happen to the phone service because, in a normal situation, all inbound and outbound calls are going through Trunk1 connected to Montreal6.
Scenario 5: My main internet connection goes down and let’s say that I have a secondary one (dual WAN). After a few seconds, Trunk1 will appear unresponsive from both my PBX and VoIP.ms. My PBX should start using Trunk2 for outbound calls and VoIP.ms should redirect inbound calls to pbx2 subaccount (Failover on DID). Since Trunk2 is already routed to my secondary internet connection by the policy-based routing, the WAN Failover delay from my internet router/firewall should be partially avoided and the phone service should be restored before the entire Internet connectivity does. Once the internet Failover is done to WAN2, Trunk1 should reconnect over the secondary internet connection.
Like I said, this is not a perfect VoIP “high availability” solution for both incoming and outgoing calls like a large corporate SIP provider would offer, but it covers the most likely scenarios that can disrupt the phone service. Moreover, this configuration does not add any recurring costs from VoIP.ms; it only costs the IT staff the time to design and implement it.
After setting up a high availability solution like this one, I always advise running disaster recovery tests that simulate the different scenarios, to confirm that the system really behaves as expected.
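Before running the real tests, it can help to write down what you expect to see in each scenario. Here is a rough Python sketch that models the setup described above (DID on Montreal6 with main routing to pbx1 and “If Unreachable” failover to pbx2, outbound order Trunk1, Trunk2, Trunk3). It is a simplified model under my own assumptions, not a description of how VoIP.ms or any PBX works internally, and the function and parameter names are made up for the example.

```python
from typing import Optional, Tuple


def expected_behavior(
    montreal_up: bool,
    toronto_up: bool,
    trunk1_path_ok: bool = True,
    trunk2_path_ok: bool = True,
) -> Tuple[Optional[str], Optional[str]]:
    """Return (inbound subaccount, outbound trunk) expected for a scenario."""
    # Trunk1 and Trunk2 both depend on the Montreal POP; Trunk3 on Toronto.
    trunk1_up = montreal_up and trunk1_path_ok
    trunk2_up = montreal_up and trunk2_path_ok
    trunk3_up = toronto_up

    # Inbound: main routing is pbx1, DID failover ("If Unreachable") is pbx2.
    # If the DID's POP (Montreal) is down, inbound calls do not go through.
    if trunk1_up:
        inbound = "pbx1"
    elif trunk2_up:
        inbound = "pbx2"
    else:
        inbound = None

    # Outbound: the PBX tries Trunk1, Trunk2, Trunk3 in this order.
    outbound = None
    for name, up in (("Trunk1", trunk1_up), ("Trunk2", trunk2_up), ("Trunk3", trunk3_up)):
        if up:
            outbound = name
            break
    return inbound, outbound


# Scenario 1: Montreal down -> no inbound, outbound via Trunk3
print(expected_behavior(montreal_up=False, toronto_up=True))
# Scenario 3: Trunk1 path broken by the firewall -> pbx2 / Trunk2
print(expected_behavior(True, True, trunk1_path_ok=False))
# Scenario 4: Toronto down -> nothing changes, pbx1 / Trunk1
print(expected_behavior(montreal_up=True, toronto_up=False))
```

During a real test, you can compare what the PBX and the VoIP.ms portal actually show against the pair returned for that scenario.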
Hope this will be helpful to you. 