SLA probe IP-monitoring with a default route withdrawal

When an SRX is your WAN-facing router/firewall, you might want to continuously test your connectivity. That is where RPM probes come in handy. RPM probes are very similar to IP SLA from Cisco but far more limited. On their own they just provide statistics, which is nice but not very helpful. Juniper also has a feature called ip-monitoring that works in conjunction with the RPM probes: it takes the result of a probe and performs an action based on it.

Unfortunately the actions it can take are extremely limited.

  • The first is to bring an interface down, which, depending on the implementation, might or might not recover. E.g. if you bring down lo0 while the RPM probe runs over a physical interface, it will auto-recover; if you bring down the physical interface you use for the WAN, it will break the probe.
  • The second action is to insert a new route into a specific routing table. This is a bit more useful but still of fairly limited use. The typical example would be to add a default route to a secondary SRX as described here.
  • The last option is to set a metric on the newly added route, but that never worked for me.

As you can see, ip-monitoring is very limited and not very flexible.

The issue arises if you start using routing instances for the WAN interfaces. Your internet routing instance would have only 3 routes: the interface /32, the /31 p2p and the static default to your ISP. You would import the (usually static) default route into your internal routing instance. With a direct failure on the p2p link the static route disappears and the import stops, but in an indirect failure scenario the import keeps working and your router starts blackholing traffic. The obvious solution would be to withdraw the route before it is even imported into the internal routing instance, but that is not possible due to the limitations of the ip-monitoring function.

There are three possible solutions in this situation.

  • The first one is to bring the WAN interface down. This makes the default route unavailable, removing it from the routing table and consequently from the route import. The main issue is that this situation will never auto-recover.
  • The second one is to install a bogus interface route in the internal routing instance and write a routing policy that filters the default route from the routing table. This is a bit cumbersome and introduces a non-existent route into the routing, which might have unforeseen consequences.
  • The last and cleanest solution is to introduce a preferred route with a different metric and use a routing policy on the import between the routing instances that denies the import if that specific metric is matched.

Topology

Configuration

  • Configure the RPM probe and IP monitoring

The RPM probe is pretty much bog standard and not very aggressive, so the fail-over is rather slow. The timers and other parameters of the probe should be tuned to better values. Also, for test purposes I am pinging the opposite end of the p2p link to the provider, which is obviously not a good target in a real-life scenario as it defeats the purpose of the whole setup, but in a lab environment it is sufficient.

set services rpm probe wan_test test uplink_test target address 1.1.1.1
set services rpm probe wan_test test uplink_test probe-count 3
set services rpm probe wan_test test uplink_test probe-interval 10
set services rpm probe wan_test test uplink_test test-interval 5
set services rpm probe wan_test test uplink_test source-address 1.1.1.2
set services rpm probe wan_test test uplink_test routing-instance ri_internet
set services rpm probe wan_test test uplink_test thresholds successive-loss 3
set services rpm probe wan_test test uplink_test thresholds total-loss 3
set services rpm probe wan_test test uplink_test destination-interface ge-0/0/5.0
set services rpm probe wan_test test uplink_test next-hop 1.1.1.1
set services ip-monitoring policy uplink_failure_detection match rpm-probe wan_test
set services ip-monitoring policy uplink_failure_detection then preferred-route routing-instances ri_internet route 0.0.0.0/0 next-hop 1.1.1.1
  • Resulting config
rpm {
    probe wan_test {
        test uplink_test {
            target address 1.1.1.1;
            probe-count 3;
            probe-interval 10;
            test-interval 5;
            source-address 1.1.1.2;
            routing-instance ri_internet;
            thresholds {
                successive-loss 3;
                total-loss 3;
            }
            destination-interface ge-0/0/5.0;
            next-hop 1.1.1.1;
        }
    }
}
ip-monitoring {
    policy uplink_failure_detection {
        match {
            rpm-probe wan_test;
        }
        then {
            preferred-route {
                routing-instances ri_internet {
                    route 0.0.0.0/0 {
                        next-hop 1.1.1.1;
                    }
                }
            }
        }
    }
}
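
The timers above are deliberately relaxed. As a rough sketch of a more aggressive variant, the same probe could be tightened along these lines. The values are illustrative assumptions, not tested recommendations, and should be adapted to what the link and the probe target can tolerate:

```
# Illustrative values only (assumptions, not taken from the lab setup)
set services rpm probe wan_test test uplink_test probe-count 5
set services rpm probe wan_test test uplink_test probe-interval 2
set services rpm probe wan_test test uplink_test test-interval 5
set services rpm probe wan_test test uplink_test thresholds successive-loss 3
```

With a 2 second probe interval, three successive losses accumulate in roughly 6 seconds rather than the roughly 30 seconds implied by the 10 second interval used in the lab.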

The action taken by ip-monitoring is to inject a preferred default route into the routing instance ri_internet with the same next-hop. It does this when the RPM probe fails. The difference from the normal static route is that this new route carries metric2 with a value of 0. Normal routes do not use this metric, so it is a great thing to match on with the routing policy.

  • Checking the status
root@kw-core-wan1> show services ip-monitoring status

Policy - uplink_failure_detection (Status: PASS)
 RPM Probes:
 Probe name             Test Name       Address          Status
 ---------------------- --------------- ---------------- ---------
 wan_test               uplink_test     1.1.1.1          PASS
 Route-Action:
 route-instance    route             next-hop         state
 ----------------- ----------------- ---------------- -------------
 ri_internet       0.0.0.0/0         1.1.1.1          NOT-APPLIED
  • The routing table in ri_internet
root@srx> show route table ri_internet.inet.0

ri_internet.inet.0: 3 destinations, 4 routes (3 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

0.0.0.0/0  *[Static/5] 00:03:26, metric 5
            > to 1.1.1.1 via ge-0/0/5.0
1.1.1.1/30 *[Direct/0] 01:23:12
            > via ge-0/0/5.0
1.1.1.2/32 *[Local/0] 01:23:12
            Local via ge-0/0/5.0
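
The [Static/5] default with metric 5 seen in this table is not part of the configuration listed so far. A minimal sketch of how the instance might be defined follows. The instance type and the interface address are assumptions inferred from the routing table, so adjust them to the actual setup:

```
# Assumed: ri_internet is a virtual-router holding the WAN interface
set routing-instances ri_internet instance-type virtual-router
set routing-instances ri_internet interface ge-0/0/5.0
set interfaces ge-0/0/5 unit 0 family inet address 1.1.1.2/30
# Static default to the ISP; metric 5 is what the import policy matches on
set routing-instances ri_internet routing-options static route 0.0.0.0/0 next-hop 1.1.1.1 metric 5
```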
  • Write the import policy

For the import between ri_internet and ri_internal I am using the instance-import function, which allows routes to be imported between instances through a policy. This gives flexibility in what to import and how, without the need to use RIB groups or run a routing protocol in ri_internet.

term t_reject_metric2 {
    from {
        instance ri_internet;
        protocol static;
        metric2 0;
        route-filter 0.0.0.0/0 exact;
    }
    then reject;
}
term t_a_metric_5 {
    from {
        instance ri_internet;
        protocol static;
        metric 5;
        route-filter 0.0.0.0/0 exact;
    }
    then accept;
}

The policy deserves some explanation as it is crucial to the whole setup.

The first term matches the default route to the same next-hop but carrying metric2, with an action of “reject”. So if the preferred default appears in the routing table with a metric2 value of 0, this term matches and no default route is imported into the ri_internal.inet.0 routing table. If that route doesn’t exist, the second term is evaluated, and it only matches the live static default route to the ISP. This scenario auto-recovers and in effect withdraws the default from the internal routing instance when the uplink fails.
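
For completeness, the two terms need to live inside a policy-statement that the internal instance references through instance-import. A sketch in set format follows; the policy name p_import_default is a hypothetical placeholder, everything else mirrors the terms above:

```
# p_import_default is a made-up name used only for illustration
set policy-options policy-statement p_import_default term t_reject_metric2 from instance ri_internet
set policy-options policy-statement p_import_default term t_reject_metric2 from protocol static
set policy-options policy-statement p_import_default term t_reject_metric2 from metric2 0
set policy-options policy-statement p_import_default term t_reject_metric2 from route-filter 0.0.0.0/0 exact
set policy-options policy-statement p_import_default term t_reject_metric2 then reject
set policy-options policy-statement p_import_default term t_a_metric_5 from instance ri_internet
set policy-options policy-statement p_import_default term t_a_metric_5 from protocol static
set policy-options policy-statement p_import_default term t_a_metric_5 from metric 5
set policy-options policy-statement p_import_default term t_a_metric_5 from route-filter 0.0.0.0/0 exact
set policy-options policy-statement p_import_default term t_a_metric_5 then accept
# Attach the policy to the internal instance
set routing-instances ri_internal routing-options instance-import p_import_default
```

The order of the terms matters: the reject term has to come first so that the injected preferred route is caught before the accept term is evaluated.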

Failure 

root@srx> show services ip-monitoring status

Policy - uplink_failure_detection (Status: FAIL)
 RPM Probes:
 Probe name             Test Name       Address          Status
 ---------------------- --------------- ---------------- ---------
 wan_test               uplink_test     1.1.1.1          FAIL
 Route-Action:
 route-instance    route             next-hop         state
 ----------------- ----------------- ---------------- -------------
 ri_internet       0.0.0.0/0         1.1.1.1          APPLIED

The routing table in ri_internet

root@srx> show route table ri_internet.inet.0

ri_internet.inet.0: 3 destinations, 4 routes (3 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

0.0.0.0/0  *[Static/1] 00:03:26, metric2 0
            > to 1.1.1.1 via ge-0/0/5.0
            [Static/5] 01:23:12, metric 5
            > to 1.1.1.1 via ge-0/0/5.0
1.1.1.1/30 *[Direct/0] 01:23:12
            > via ge-0/0/5.0
1.1.1.2/32 *[Local/0] 01:23:12
            Local via ge-0/0/5.0
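
To confirm that the default really is withdrawn from the internal side during the failure, the internal table and the raw probe results can be queried directly. This is only a sketch of the checks: the table name assumes the internal instance is called ri_internal as above, and output is omitted here:

```
show route table ri_internal.inet.0 0.0.0.0/0 exact
show services rpm probe-results owner wan_test test uplink_test
```

While the policy status is FAIL the first command should return no default route, and the route should reappear once the probe passes again.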
