EtherSam (Y.1564) explained

This is the last article (at least for now) in the series about testing methodologies and testing standards. I will still cover some bits and pieces of testing in general, but they won’t be as heavy on theory, as I want to write some “hands-on” scenarios for the combined use of Wireshark and PackEth as well as some multicast scenarios. I will also be doing more Cisco and Juniper work, so it is quite likely I will be blogging some configs and labs. Anyway, enough about future plans; let’s start with the topic at hand.

Introduction

ITU-T Y.1564, more commonly known as EtherSam (a name derived from the standard’s old working name, ITU-T Y.156sam), is a service activation test suite whose goal is to allow rapid link testing when deploying services. The main advantage of this test is that it allows SLAs (Service Level Agreements) to be tested while a new service is being deployed, and it can do so regardless of the physical topology (i.e. it can verify an end-to-end SLA even in a live environment with live traffic flowing through the network).

There are a few serious considerations that make this test suite a bit awkward to use.

The first one is that this is a very new standard (initiated in 2009, published in 2011) and it is still changing, as new drafts are still being issued.

The next, rather serious, problem is that this test suite is for “service activation”, which in plain language means that it is no good for lab testing, as it doesn’t really stress the equipment. The reason is that EtherSam is designed around the idea of rapid deployment of new links/services in telcos (I will write about the disadvantages of this design later).

The last issue is that, as a new standard, it is rather unknown among network engineers, so it takes some education before it can be used.

Traffic parameters

The theory behind this test suite sits somewhere halfway between the RFC2544 and BERT tests, as it tries to take the best of both while achieving results similar to both. Let’s start with the definitions, as they are the most important part. In EtherSam you can configure multiple concurrent services, and each service can have the following 4 parameters:

  • CIR – Committed Information rate
  • CBS – Committed Burst Size
  • EIR – Excess Information rate
  • EBS – Excess Burst Size 

This is not as complicated as it might seem at this point. These values are only used to set the SLA. The CIR defines the minimum amount of traffic within the available bandwidth that must always be delivered. If only a CIR is specified on the links/services, it is good practice to allocate some burst allowance to the CBS, as it allows for a small overshoot when the traffic is bursty. Obviously, one might need more flexibility in how much traffic to pass through (as with over-subscription), where some frame loss is acceptable in exchange for more data being delivered. That is the Excess Information Rate. Once an EIR is in place, the data from the CBS would be counted as part of the EIR, so the CBS setting loses its meaning. If you want a little more flexibility for more bursty traffic, you can specify an EBS on top of the EIR.

Traffic coloring

In the paragraph above I have described two of the three traffic types that exist in EtherSam, which are referred to as green traffic (CIR+CBS) and yellow traffic (EIR+EBS). The standard also defines red traffic, which is traffic that conforms to neither the CIR nor the EIR. Under the EtherSam methodology this traffic should never be passed and should be dropped. This looks like an absolutely trivial and obvious thing, but it has one very serious consequence in deployments with over-subscription in place: you must define the EIR as the “shared” part of your QoS with a specific size allocated to it. Having an arbitrary amount of free-to-grab bandwidth for the tested service will result in failing the test, as passing red traffic is a fail criterion in Y.1564.
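To make the three colors a bit more tangible, here is a minimal sketch of a two-bucket meter in Python. It is color-blind, has no coupling between the buckets, and is only my illustration of the idea; the class name `BandwidthProfileMeter`, its method and the numbers in the usage note are assumptions of mine, not something taken from the standard or from any tester.

```python
class BandwidthProfileMeter:
    """Hypothetical color-blind meter with one green and one yellow token bucket."""

    def __init__(self, cir_bps: float, cbs_bytes: int, eir_bps: float, ebs_bytes: int) -> None:
        self.cir_bps = cir_bps      # Committed Information Rate, bits per second
        self.cbs_bytes = cbs_bytes  # Committed Burst Size, bytes
        self.eir_bps = eir_bps      # Excess Information Rate, bits per second
        self.ebs_bytes = ebs_bytes  # Excess Burst Size, bytes
        # Assumption: both buckets start full at time 0.
        self._green = float(cbs_bytes)
        self._yellow = float(ebs_bytes)
        self._last_arrival_s = 0.0

    def color(self, arrival_s: float, frame_bytes: int) -> str:
        """Return 'green', 'yellow' or 'red' for a frame arriving at arrival_s."""
        elapsed = arrival_s - self._last_arrival_s
        self._last_arrival_s = arrival_s
        # Refill each bucket at its own rate, capped at its burst size.
        self._green = min(self.cbs_bytes, self._green + elapsed * self.cir_bps / 8)
        self._yellow = min(self.ebs_bytes, self._yellow + elapsed * self.eir_bps / 8)
        if frame_bytes <= self._green:
            self._green -= frame_bytes
            return "green"
        if frame_bytes <= self._yellow:
            self._yellow -= frame_bytes
            return "yellow"
        # Neither bucket can cover the frame: this is the traffic that must be dropped.
        return "red"
```

With CIR = EIR = 8 Mbps and CBS = EBS = 3000 bytes, five back-to-back 1500-byte frames come out as two green, two yellow and one red, which is exactly the frame a Y.1564 tester expects the network to police away.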

Traffic profile for EtherSam - coloring

Bandwidth profile parameters – Coupling flag and Color mode

I am describing these two parameters here solely because they are defined in the standard, but I would like to stress that I haven’t seen them implemented in any testing equipment so far. This section will therefore be rather short, and most people can skip it, as it has little to no practical use (at least at the time of writing). These two parameters allow the metering algorithm to be adjusted and thus change the result. They are also valid only in certain scenarios.

  • CF – Coupling flag – can only be set to on or off. It is only useful when introducing a new service in a live environment with extremely bursty traffic. It couples the green and yellow traffic so that unused green capacity can be used by yellow traffic, thus allowing for higher throughput.
  • CM – Color mode – allows two options, color-aware and color-blind. The first requires the tested equipment to re-mark/re-color the traffic streams to adhere to the existing network rules, whereas color-blind mode expects no interference with the coloring.

The Service Configuration test

This is the first test that you can run, and it is meant to test an individual service. The aim is to verify that the CIR/EIR (and optionally CBS/EBS) comply with the configuration. It is a rather simple test, but apart from the obvious CIR/EIR/policing checks it allows for some variability by offering the following options:

  • Fixed frame size or EMIX pattern (1518, 1518, 1024, 64, 64)
  • optional Step Load (25%, 50%, 75%, 100%)
  • optional Burst test for the CBS and EBS (defined in Bytes)
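As a quick illustration of what the EMIX pattern means for the tester, the sketch below computes the mean frame size of the pattern and the resulting frame rate. The function names are mine, and whether a given tester counts the configured rate with or without the 20 bytes of preamble and inter-frame gap is an assumption you need to check, so the overhead is a parameter.

```python
from typing import Sequence

EMIX_PATTERN = (1518, 1518, 1024, 64, 64)  # frame sizes in bytes, as listed above


def emix_mean_size(pattern: Sequence[int]) -> float:
    """Mean frame size in bytes over one repetition of the pattern."""
    return sum(pattern) / len(pattern)


def frames_per_second(rate_bps: float, pattern: Sequence[int],
                      per_frame_overhead_bytes: int = 0) -> float:
    """Frame rate needed to fill rate_bps with the repeating pattern.

    per_frame_overhead_bytes = 0 treats rate_bps as an information rate
    (frame bytes only); use 20 to treat it as a line rate that also
    carries preamble and inter-frame gap.
    """
    mean_bits = (emix_mean_size(pattern) + per_frame_overhead_bytes) * 8
    return rate_bps / mean_bits
```

For the pattern above the mean frame size is 837.6 bytes, so a 100 Mbps information rate works out to roughly 14,900 frames per second.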

If you have multiple services configured, each one will be tested separately, so be careful about the time estimate, as this test is not intended to run for a long time. Especially with ramped services it is important to realize that the total duration of this test will be number of services x number of steps x step time. CBS and EBS are also tested separately, which adds more time to the test. In total this should not take more than 10 minutes, as this test is not supposed to replace long-term tests.
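The time estimate is simple arithmetic, but it is easy to get surprised by it in the field, so here is a small sketch. The function name and the way the burst tests are added per service are my assumptions; the step and burst durations are whatever you configure on your tester.

```python
def configuration_test_seconds(services: int, steps: int, step_seconds: float,
                               burst_tests_per_service: int = 0,
                               burst_seconds: float = 0.0) -> float:
    """Estimated duration of the Service Configuration test in seconds.

    Services run one after another, each with `steps` load steps, plus
    optional separate burst tests (for example one for CBS and one for EBS).
    """
    per_service = steps * step_seconds + burst_tests_per_service * burst_seconds
    return services * per_service
```

For example, 4 services with 4 steps of 30 seconds each already take 480 seconds, and adding two 10-second burst tests per service brings that to 560 seconds, uncomfortably close to the 10-minute mark.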

The Service Performance test

This test is the second (and last) test you can run in Y.1564, and it is there to test all services in one go in order to check that the sum of the CIRs is actually available on the path in question. It is also meant to be a long test, with specified durations of 15 min, 2 hrs and 24 hrs. EMIX and ramped traffic in the services should be available as in the previous test.

I think that this test, due to its simplicity, can replace BERT in many cases while giving better results for service provisioning.
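Before starting a 24-hour run it is worth a sanity check that the configured CIRs can fit on the path at all. A trivial sketch follows; the function name and the example capacities are mine.

```python
from typing import Sequence, Tuple


def cir_sum_fits(cirs_mbps: Sequence[float], path_capacity_mbps: float) -> Tuple[float, bool]:
    """Return the total CIR and whether it fits into the path capacity."""
    total = sum(cirs_mbps)
    return total, total <= path_capacity_mbps
```

Three services with CIRs of 20, 30 and 40 Mbps fit on a 100 Mbps path, while adding a fourth 20 Mbps service would make the Service Performance test fail before it even starts.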

The results and pass/fail criteria

The pass/fail criteria are rather obvious:

  • Fulfilling CIR (or CIR+CBS)
  • Fulfilling EIR (or EIR + EBS)
  • Policing overshoot of traffic > CIR+EIR+EBS
  • Conform to maximal acceptable delay variation (jitter)
  • Conform to maximal acceptable round-trip latency
  • Conform to SLA’s Frame loss (or availability)
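The list above can be turned into a small checker. This is only a sketch of how I read the criteria: the class and field names, the 1% tolerance and the exact comparisons are my assumptions, not the wording of the standard or the logic of any particular tester.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Sla:
    cir_mbps: float
    eir_mbps: float
    max_frame_loss_ratio: float
    max_latency_ms: float
    max_jitter_ms: float


@dataclass
class Measured:
    rx_at_cir_mbps: float           # received rate when offering CIR
    rx_at_cir_plus_eir_mbps: float  # received rate when offering CIR+EIR
    rx_at_overshoot_mbps: float     # received rate when offering more than CIR+EIR
    frame_loss_ratio: float
    latency_ms: float
    jitter_ms: float


def evaluate(sla: Sla, m: Measured, tolerance: float = 0.01) -> Dict[str, bool]:
    """Return one pass/fail flag per criterion (hypothetical interpretation)."""
    limit = sla.cir_mbps + sla.eir_mbps
    return {
        "cir": m.rx_at_cir_mbps >= sla.cir_mbps * (1 - tolerance),
        "eir": m.rx_at_cir_plus_eir_mbps >= limit * (1 - tolerance),
        # Red traffic must not get through: anything above CIR+EIR is a fail.
        "policing": m.rx_at_overshoot_mbps <= limit * (1 + tolerance),
        "jitter": m.jitter_ms <= sla.max_jitter_ms,
        "latency": m.latency_ms <= sla.max_latency_ms,
        "frame_loss": m.frame_loss_ratio <= sla.max_frame_loss_ratio,
    }
```

The service passes only if `all(evaluate(sla, measured).values())` is true; keeping the flags separate makes it obvious which criterion failed.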

These are solid criteria and there is not much you can say against them, but as always there are some considerations that must be taken into account.

The first one is something I have already mentioned: there is no way for Y.1564 to account for a shared “best effort” overshoot above the defined CIR+EIR. This might be a problem in some scenarios, but I think it could be avoided via some hacked configuration of EIR/EBS.

The second is the SLA frame loss, better known in the telco world as availability. If you provide, let’s say, 99.99% availability, it means that on a 100 Mbps stream it would be acceptable to lose over 2000 frames in a single hour (assuming full-size frames; with smaller frames the number is much higher), which I don’t think would be found acceptable in most environments. As far as I know there is no possibility to set the availability to 100% (and no SLA would ever have this number in it anyway). I am not currently aware of any workaround for this, so the only advice is to go through the data in the results table very carefully and set this option as close to what you expect of the test as possible (i.e. in my opinion, under normal circumstances there should be 0% packet loss in a 2-hour test on most systems).

The last thing I would like to mention is that there is no built-in out-of-sequence counting mechanism. This might sound like an unnecessary feature, but in a voice-enabled environment it is actually a very important parameter to observe.
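The frame-loss arithmetic is easy to reproduce. The sketch below assumes that the 100 Mbps is a line rate (so every frame also costs 20 bytes of preamble and inter-frame gap) and, as in the paragraph above, treats the acceptable loss ratio as 1 minus the availability; the function name is mine.

```python
def frames_lost_per_hour(rate_bps: float, frame_bytes: int, availability: float,
                         overhead_bytes: int = 20) -> float:
    """Number of frames that may be lost in one hour without breaking the SLA."""
    frames_per_second = rate_bps / ((frame_bytes + overhead_bytes) * 8)
    loss_ratio = 1.0 - availability
    return frames_per_second * 3600 * loss_ratio
```

With 1518-byte frames at 100 Mbps and 99.99% this gives roughly 2,900 frames per hour; with 64-byte frames it is over 53,000.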
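If your tester (or a capture) gives you per-frame sequence numbers, counting out-of-sequence frames yourself is straightforward. This is a minimal sketch that assumes monotonically increasing sequence numbers without wrap-around; the function name is mine.

```python
from typing import Iterable


def count_out_of_sequence(sequence_numbers: Iterable[int]) -> int:
    """Count frames that arrive with a sequence number lower than one already seen."""
    highest_seen = None
    out_of_sequence = 0
    for seq in sequence_numbers:
        if highest_seen is not None and seq < highest_seen:
            out_of_sequence += 1  # arrived after a later frame
        else:
            highest_seen = seq
    return out_of_sequence
```

For the arrival order 1, 2, 4, 3, 5, 7, 6 the function reports two out-of-sequence frames (3 and 6), even though nothing was lost.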

Conclusion

EtherSam is a rather interesting test suite, but in my opinion it cannot (and was never meant to) replace RFC2544. In some field operations it can partially replace BERT. I have to say I welcome this standard, as it addresses the last bit of testing that, to my knowledge, was not properly covered by any Ethernet/IP testing suite. It obviously has some drawbacks, but I think it has its place in field service activation. Only time will tell whether it will become as widespread as RFC2544, but I certainly hope so.


Bit Error Rate Test (BERT) explained

This article will be rather short in comparison with the others in the mini-series about various Ethernet/IP testing methods, but it is a necessary one, as Bit Error Rate Tests have a long tradition in the telco environment (circuit-based networks) and are still quite valid even in today’s packet networks, at least for some specific cases. So without further delay, let’s start with some theory behind the testing and some practical use, followed by some use cases and best practices.

BERT introduction

As you can guess from the name, this test really checks physical layer traffic for any anomalies. This is a result of the test’s origins, when T1/E1 circuits were tested and each bit in each time slot mattered, as providers were using those circuits up to the limit because bandwidth was scarce. Also, as most of the data being transferred were voice calls, any pattern alterations had quite serious implications for the quality of service. This also led to the (in)famous reliability of five nines, or 99.999%, which basically states that the link/device must be available 99.999% of the time throughout a specified SLA period (normally a month or a year). One must remember that redundancy was rather rare, so the requirements for hardware reliability were really high.

With the move away from circuit-based TDM networks towards packet-based IP networks, the requirements changed. Bandwidth is now abundant in most places, the wide deployment of advanced, feature-rich Ethernet and IP devices provides plenty of options for redundancy and QoS, and packet-switched voice traffic is on the rise. One would think it is not really necessary to consider BERT as a test method any more, but that would be a huge mistake.

Why BERT

There are a few considerations that can make BERT an interesting choice. I will list the ones I think are the most interesting.

  1. It has been designed to run for an extended period of time, which makes it ideal for acceptance testing, which is still often required
  2. BERT is ideal for testing jitter, as that was one of its primary design goals
  3. The different patterns used in BERT can be used for packet optimization testing (I will discuss this later in more detail)
  4. Most BERT tests are smarter than just counting bit errors, so the test can be used for other testing

BERT Physical setup and considerations

On an Ethernet network you cannot run a simple L1 test unless you test just a piece of cable or potentially a hub, as all other devices require some address processing. This makes the test on an Ethernet network different from the one on unframed E1: unlike on E1, we need to set the framing to Ethernet, with the source and destination addresses defined on the tester. Also, as Ethernet must be looped on a logical level, it is not possible to use a simple RJ45 plug with a pair of wires going from TX to RX as you could with E1, and either a hardware or a software loopback reflector is required. Most testers will actually allow you to specify even layers 3 and 4, i.e. IP addresses and UDP ports. The reason is usually so that the management traffic between the tester and the loopbacks can use this channel for internal communication.

Pattern selection options

As this test originates from the telco industry some interesting options are usually presented on the testers. The stream can generate these patterns:

  1. All zeros or all ones – specific patterns that originated in the TDM environment
  2. 0101 pattern and 1010 pattern – patterns that can be easily compressed
  3. PRBS – Pseudo Random Bit Sequence – a deterministic sequence that cannot be compressed/optimized; the details and the calculation can be found on Wikipedia
  4. Inverted PRBS – the same as above but the calculation function is inverted to counter any “optimization” for the PRBS

The thing to remember is that the PRBS is applied to the payload of the frame/packet/datagram, so if there is any sort of optimization present, it will have no effect, as PRBS is by design not compressible. There are various “strengths” of the pseudo-random pattern: the higher the number, the less repetition it includes. Normally you will see two main variants: 2^15-1, which is 32,767 bits long, and 2^23-1, which is 8,388,607 bits long. Obviously, the longer the pattern, the better and more “random” the behavior it emulates.
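To show where the pattern lengths come from and why the pattern resists optimization, here is a small PRBS sketch built on a linear feedback shift register. The tap positions (15, 14) and (23, 18) are the ones commonly used for the 2^15-1 and 2^23-1 patterns; the function names, the seed and the use of `zlib` as a stand-in for a WAN optimizer are my assumptions, and real testers may differ in bit order or inversion.

```python
import zlib
from typing import Sequence, Tuple


def lfsr_step(state: int, order: int, taps: Sequence[int]) -> Tuple[int, int]:
    """Advance the shift register by one bit; taps are 1-based register positions."""
    bit = 0
    for tap in taps:
        bit ^= (state >> (tap - 1)) & 1
    new_state = ((state << 1) | bit) & ((1 << order) - 1)
    return new_state, bit


def prbs_period(order: int, taps: Sequence[int], seed: int = 1) -> int:
    """Count the steps until the register returns to its starting state."""
    state, steps = seed, 0
    while True:
        state, _ = lfsr_step(state, order, taps)
        steps += 1
        if state == seed:
            return steps


def prbs_bytes(order: int, taps: Sequence[int], n_bytes: int, seed: int = 1) -> bytes:
    """Generate n_bytes of payload filled with the PRBS, most significant bit first."""
    state = seed
    out = bytearray()
    for _ in range(n_bytes):
        byte = 0
        for _ in range(8):
            state, bit = lfsr_step(state, order, taps)
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    print(prbs_period(15, (15, 14)))  # 32767
    payload = prbs_bytes(15, (15, 14), 1500)
    alternating = b"\x55" * 1500      # the 0101 pattern
    print(len(zlib.compress(payload)), len(zlib.compress(alternating)))
```

The 0101 payload shrinks to a handful of bytes while the PRBS payload does not shrink at all, which is exactly why PRBS is the pattern to use when there might be an optimizer in the path. Calling `prbs_period(23, (23, 18))` works the same way but takes a few seconds in pure Python.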

Error injecting Options

As this test originated in the telco world, injecting errors was a major thing, but in Ethernet networks it has lost its importance. If you inject even a single bit error into an Ethernet frame, the CRC will be incorrect and the whole frame should be dropped by the first L2 device it passes through, which should always result in a LoF (Loss of Frame)/LoP (Loss of Pattern) alarm.
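The effect of a single injected bit error on the frame check can be demonstrated with Python’s `zlib.crc32`, which uses the same CRC-32 polynomial as the Ethernet FCS. The helper names and the dummy frame contents are mine, and the sketch ignores how the FCS is byte-ordered on the wire.

```python
import zlib


def build_test_frame() -> bytes:
    """A dummy 60-byte Ethernet frame (without FCS); addresses are made up."""
    dst = bytes.fromhex("001122334455")
    src = bytes.fromhex("66778899aabb")
    ethertype = b"\x08\x00"
    payload = bytes(46)  # minimum payload, all zeros
    return dst + src + ethertype + payload


def fcs(frame_without_fcs: bytes) -> int:
    """CRC-32 over the frame, as a 32-bit unsigned value."""
    return zlib.crc32(frame_without_fcs) & 0xFFFFFFFF


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)
```

Whichever single bit you flip, the recomputed CRC no longer matches the original one, so the first switch on the path discards the frame and the tester sees a lost frame rather than a bit error.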

Use cases, Best Practices and Conclusion

The most common use case for BERT in today’s networks is commissioning new links, as you can run a fairly simple test for a long time that will give you a reasonable idea about the link’s quality in terms of frame drops and jitter.

The few recommendations about how to run this test would be as follows:

  • Use the largest pattern you can.
  • Remember that the line rate and L2 rates will be different because of the overheads.
  • Remember that 99.999% availability allows for an outage of about 0.86 s in 24 hours (which can be quite a lot of frames)
  • PRBS cannot be optimized
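Two of the recommendations above are plain arithmetic, sketched here. The function names are mine, and the 20 bytes of per-frame overhead assume the standard preamble plus minimum inter-frame gap with no VLAN tags.

```python
def l2_rate_bps(line_rate_bps: float, frame_bytes: int, overhead_bytes: int = 20) -> float:
    """L2 throughput that fits into a given line rate for one frame size."""
    return line_rate_bps * frame_bytes / (frame_bytes + overhead_bytes)


def allowed_outage_seconds(availability: float, period_seconds: float) -> float:
    """Total outage that still meets the availability over the given period."""
    return (1.0 - availability) * period_seconds
```

On a 1 Gbps line the L2 rate is about 762 Mbps with 64-byte frames and about 987 Mbps with 1518-byte frames, and `allowed_outage_seconds(0.99999, 24 * 3600)` gives roughly 0.86 seconds.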

So as you can see, BERT is a rather simple and straightforward test, and even though it is in many ways superseded by RFC2544 and others (like Y.156sam), it is still a very good test to know, especially if you are in a jitter-sensitive environment, e.g. where VoIP and IPTV are deployed.