RFC2544 Testing explained

In this next article of the mini-series about testing Ethernet/IP networks I will write about one of the most common tests: RFC2544, “Benchmarking Methodology for Network Interconnect Devices”. The purpose of this test is quite often misunderstood even though it is clearly stated in the introduction of the standard itself. So let’s start by clarifying what this test suite is and what it should be used for.

Introduction and considerations

As the standard says right at the beginning, this test suite exists so that customers have a single point of reference when testing the capabilities of network equipment. So the intent of this test is to evaluate a single piece of equipment and provide results that can be easily compared between vendors. This approach has some very obvious advantages and some not so obvious drawbacks, both of which I will try to cover in this article.

The advantages of the test suite can be covered rather quickly as they are for the most part rather obvious. The test suite was designed to provide vendor-independent, comparable tests with clear and easy-to-understand results. The tests themselves measure behaviors/variables that are an absolute “must know” for any new network element being introduced into the network. What the tests cover and how will be detailed below, and I have to say most of the methods are still very valid even though the standard was approved in 1999. The main advantage (or maybe you could say disadvantage) is the popularity of the test, as over the years it has become the most used standard for Ethernet/IP network testing.

The disadvantages of RFC2544 are a bit less obvious but rather serious. The problem I encounter a lot is that the popularity of this test suite leads to two interconnected issues: it is being used in the wrong place (where another test would be more suitable) and its results are being misunderstood or misinterpreted. I hope this article will help to shed some light on the test procedures and the variables entering the test, and subsequently on the expected results.

The other important thing to consider when deciding whether to use this test is that the suite was created to test standalone network elements. Even though it can be used for service activation/acceptance testing, that is not its primary focus and the testing procedures must be adjusted. In some cases it will not be suitable at all (that is why there are specialized test procedures for those).

The last consideration or problem of the RFC2544 suite is that it was created and approved for devices that were around in 1999 (routers, L2 switches, hubs etc.), so it is designed in a way that is quite different from today’s multi-service environment. Also, the intent was to test purely native Ethernet devices (and the now legacy transports such as FDDI and token ring), so using it in a telco environment, where quite large quantities of equipment still use ATM (at least internally), can lead to very interesting results. I will discuss this in a later part of this article.

Physical setup

The first thing to consider when using the RFC2544 test suite is what physical setup you will use and what repercussions that will have on subsequent evaluation and fault-finding. The three main options are:

Reflected scenario (uses a dedicated hw/sw loopback)
Unidirectional scenario (uses one stream and two testers)
Bidirectional scenario (uses two streams from two independent testers)

Each of those scenarios is useful in a different use case.

The most common one for field operations is the reflected scenario, where the tester is at one location and the dedicated reflector/loopback is at the other end of the tested line, wherever that is. The main problem with this scenario is that there is no way to tell in which direction an encountered problem lies, as the uplink is simply the downlink stream with MAC and IP addresses swapped by the loopback unit. This is not a big deal in a lab environment but might be a crucial thing to consider when the loopback is used in the field and is a couple of (or tens of) kilometers away.

The other problem is that the loopback can be created in software on the measured equipment, which adds a layer of uncertainty to the results, as the device itself can (and in most cases will) behave differently than it would with an external hardware loopback.

The last issue with the reflected scenario is that in my experience the loopbacks are not exactly reliable and can themselves introduce unexpected behavior into the measured values.

Unidirectional testing uses one sender and one smart receiver that evaluates the data stream. The usefulness of this setup is rather limited as no real traffic is exclusively unidirectional. It is rarely used unless one wants it for fault-finding, and it can also be useful in asymmetric networks.

The bidirectional scenario is probably the most precise way to perform most testing, as it basically runs one separate stream in each direction, each evaluated at the opposite endpoint. This is commonly called “dual test set” or “dual test” in the tester’s setup. The obvious drawback is that you require either two testers (in the case of small field units) or one correctly wired and configured big tester, which might get tricky. As the supply of testers is limited due to their high price, this scenario is not well suited to field operations: it requires good logistics to move the testers between sites.

Ethernet frames

The Ethernet frame sizes used in this suite are based on the Ethernet standard and are used for multiple tests in the suite. The distribution is not random but tries to cover the most important frame sizes that might be present in an average network. The distribution looks like this: 64, 128, 256, 512, 1024, 1280 and 1518 bytes. It is very important to note that these are frames without the 802.1Q tag. So for tagged traffic the minimal value should be 68B for the traffic to be valid and passed through a correctly behaving device.

There is an interesting discussion point: could a 64B frame with a VLAN tag exist? Seemingly in contradiction with my earlier statement, the answer is yes, as the 802.1Q shim then takes its space from the payload part of the frame. Such a frame should even be passed through and processed correctly as long as no equipment tries to remove the tag. Once that happens and the VLAN tag is removed, the frame becomes a runt (a frame smaller than the minimal required size) and must be dropped on the outbound interface before being sent anywhere.

RFC2544 Throughput test

The throughput test is rather basic and even the name is rather self-explanatory: it measures the maximal amount of data you can pass through a device or link without any frames being dropped. The throughput is measured for the distribution of frame sizes mentioned above, with one trial for each frame size.

The frame size is the first thing that is specified. The second variable, which is in the standard but is not present on any testing equipment I’ve seen, is the packet type. The test stream normally consists of UDP unicast datagrams, but the standard recommends using other packet types as well: specifically broadcast frames, SNMP-like management frames and routing-update-like multicast. As far as I know these recommendations are not being observed and only unicast UDP streams are being used. On some equipment you can set the L4 header to be TCP, but be aware that this TCP doesn’t behave as real TCP would (there are no ACKs being sent and no window mechanism being applied to the data stream).

So what are the steps taken in this test and variables you can use to adjust the test?

The steps are as follows:

Discovery phase (checking the other end of the tested link is reachable)
Learning phase (binary division to evaluate what is the maximal throughput)
Continuous stream of frames of the given size at the speed found in step 2 for at least 1 second
Evaluating speed/drops/pattern changes and graphing them

This seems pretty straightforward, but I would like to stop at step 2, as the way the maximum is determined is quite interesting. The method commonly used is called binary division (a binary search), and I would like to show you how it works here: even though it is a simple concept, it is a bit difficult to find any decent information on it. So let’s assume our equipment can only pass 60Mbps and the negotiated line rate is 100Mbps. The binary division uses the negotiated line speed as the default initial maximum, but if a specific value is set then it is used instead. This is the trialing procedure:

100Mbps – initial maximum – fail (100 > 60Mbps)
50Mbps – 1/2 of interval 0-100Mbps – success (50 < 60Mbps)
75Mbps – 1/2 of interval 50-100Mbps – fail (75 > 60Mbps)
62.5Mbps – 1/2 of interval 50-75Mbps – fail (62.5 > 60Mbps)
56.25Mbps – 1/2 of interval 50-62.5Mbps – success (56.25 < 60Mbps)
59.38Mbps – 1/2 of interval 56.25-62.5Mbps – success (59.38 < 60Mbps)
60.94Mbps – 1/2 of interval 59.38-62.5Mbps – fail (60.94 > 60Mbps)
60.16Mbps – 1/2 of interval 59.38-60.94Mbps – fail (60.16 > 60Mbps)
59.77Mbps – 1/2 of interval 59.38-60.16Mbps – success (59.77 < 60Mbps)
60Mbps – 1/2 of interval 59.77-60.16Mbps – success (60 = 60Mbps)

Step 10 is slightly adjusted for brevity but the procedure should be clear now. Any further increase in TX rate would result in frames being dropped.
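The trialing procedure above can be sketched in a few lines of Python. This is only an illustration of the binary division idea, not how any particular tester implements it: the function name `find_throughput`, the `passes` callback (standing in for one real trial that reports whether any frame was lost) and the `resolution` parameter are all my own assumptions.

```python
def find_throughput(passes, max_rate, resolution=0.1):
    """Binary search for the highest rate (in Mbps) that passes without loss.

    `passes(rate)` is a hypothetical callback standing in for one tester
    trial; it returns True when no frames were dropped at that rate.
    Returns the best passing rate and the list of (rate, result) trials.
    """
    if passes(max_rate):              # the first trial runs at the initial maximum
        return max_rate, [(max_rate, True)]
    trials = [(max_rate, False)]
    low, high = 0.0, max_rate         # low: best known pass, high: lowest known fail
    while high - low > resolution:
        rate = (low + high) / 2       # try the middle of the remaining interval
        ok = passes(rate)
        trials.append((rate, ok))
        if ok:
            low = rate
        else:
            high = rate
    return low, trials


# Simulated device that forwards at most 60 Mbps on a 100 Mbps link.
best, trials = find_throughput(lambda rate: rate <= 60, max_rate=100, resolution=0.25)
for rate, ok in trials:
    print(f"{rate:9.4f} Mbps  {'success' if ok else 'fail'}")
print(f"measured throughput: {best:.2f} Mbps")
```

With a resolution of 0.25 Mbps the simulated run takes ten trials, the same number as the walk-through above. A finer resolution simply adds more halving steps, which is why the accuracy setting has a direct impact on how long the test runs.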

The configurable variables in throughput tests are:

maximal speed (for equipment with mismatched line rate and throughput speed)
accuracy (acceptable variance in the stream)
number of validations (number of trials for each frame size)

The results of the throughput test should be presented as a table and a graph. The advertised throughput should be based on the result of this test with 64B frames.

So this seems like a pretty straightforward test, but it is one of the most misinterpreted tests as well. The problem is that most people involved in evaluating the results forget to count all the overheads that are present in an Ethernet/IP network. What do I mean by this? Well, the speed of any Ethernet link has a face value of 10/100/1000 Mbps, which is the line rate on the physical layer, so if you want to calculate the effective throughput on L2 you must discount all the Ethernet framing. You think it cannot be that much? Well, let’s do the math for the worst-case scenario, the 64B frame, as that is the one that according to the standard should be the compared benchmark.

The Ethernet frame in numbers looks like this: 7B of preamble + 1B of start-of-frame delimiter + 12B for addressing + (optional 4B for the 802.1Q shim) + 2B of type/length + 46B of payload + 4B of CRC + 12B of inter-frame gap, so in total we have 38B of overhead to send 46B of L3 data (without using the 802.1Q tag). So what does this give us? It all depends on what the tester will and will not take into account. Most testers will disregard the 8B of preamble and delimiter and the 12B of inter-frame gap and will count only the logical frame that they can analyze, so out of 84B on the line you will have 64B of measured traffic. So where does that leave us? The expected results are the following re-calculated maximal achievable speeds for the basic frame size distribution on a Fast Ethernet link:

Frame size (B)	Frames/s	Mbps (L2)
64	148809	76.1
128	84459	86.4
256	45289	92.7
512	23496	96.2
768	15862	97.4
1024	11973	98
1280	9615	98.4
1518	8127	98.6
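A minimal sketch of the arithmetic behind such a table, assuming the tester counts only the logical frame (addressing through CRC) while every frame costs an extra 8B of preamble/delimiter and 12B of inter-frame gap on the wire. The function and constant names are mine, chosen for illustration.

```python
PREAMBLE_AND_SFD = 8    # bytes sent before every frame, usually not counted by testers
INTER_FRAME_GAP = 12    # bytes of mandatory idle time after every frame


def max_frame_rate(frame_size, line_rate_bps):
    """Theoretical maximum frames per second for a given frame size in bytes."""
    wire_bits = (frame_size + PREAMBLE_AND_SFD + INTER_FRAME_GAP) * 8
    return line_rate_bps / wire_bits


def l2_throughput_mbps(frame_size, line_rate_bps):
    """Throughput in Mbps when only the frame bytes themselves are counted."""
    return max_frame_rate(frame_size, line_rate_bps) * frame_size * 8 / 1e6


FAST_ETHERNET = 100_000_000  # bits per second
for size in (64, 128, 256, 512, 768, 1024, 1280, 1518):
    fps = max_frame_rate(size, FAST_ETHERNET)
    mbps = l2_throughput_mbps(size, FAST_ETHERNET)
    print(f"{size:5d} B  {fps:10.0f} frames/s  {mbps:6.2f} Mbps")
```

Changing `FAST_ETHERNET` to 10 Mbps or 1 Gbps scales the frame rates accordingly, while the percentage of the line rate that is left for the frames stays the same for a given frame size.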

So as you can see, the efficiency of the throughput grows with the frame size but never reaches 100% due to the previously discussed overheads.

In telco you can still encounter a lot of devices that use ATM, at least internally. The mechanism can be proprietary, but the general idea is described in RFC2684. The reason I am mentioning this is that you can actually spot it while evaluating the throughput test: Ethernet-to-ATM encapsulation is very inefficient with frames of about 100B. An ATM cell has only 48B of payload, so splitting a 100B frame results in 3 ATM cells, of which the last one is effectively empty, leading to almost 30% inefficiency compared to the Ethernet line rate.
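The cell arithmetic can be sketched as follows. This is a deliberately simplified model: it only splits the frame into 48B cell payloads and ignores the AAL5 trailer and the RFC2684 encapsulation header, both of which add further bytes in a real device. The function names are hypothetical.

```python
import math

ATM_CELL_PAYLOAD = 48   # bytes of payload carried by one ATM cell


def atm_cells(frame_size):
    """Number of cells needed to carry a frame of the given size in bytes."""
    return math.ceil(frame_size / ATM_CELL_PAYLOAD)


def padding_ratio(frame_size):
    """Fraction of the allocated cell payload that is padding rather than data."""
    carried = atm_cells(frame_size) * ATM_CELL_PAYLOAD
    return (carried - frame_size) / carried


for size in (64, 96, 100, 128, 1518):
    print(f"{size:5d} B -> {atm_cells(size):3d} cells, "
          f"{padding_ratio(size) * 100:5.1f}% of cell payload wasted")
```

Frame sizes just above a multiple of 48B are the worst case, which is why the dip shows up around 100B and not at the smallest or largest frames.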

RFC2544 Latency test

Latency is the delay between a frame being sent and its reception at the other end of the measured link. There are two main types of latency we can encounter in common situations. The first one is One Way Delay, where the latency is measured precisely on a unidirectional stream. The second and more common one is Round Trip Time, which is the default in most scenarios (like ICMP ping) and, in our case, what you get if you use reflected traffic. It is important to remember that RTT is cumulative and the delay might be different upstream and downstream, so even though fairly advanced calculations are used to achieve the best possible outcome, it should always be taken as an indicative measurement. The latency mechanism described for this test is closer to One Way Delay, but the way it is calculated and what exactly is displayed depends on the tester.

The latency test uses all frame sizes from the standard distribution, each at the maximal speed measured for it in the throughput test. Every trial should run for a minimum of 120 seconds and should be repeated 20 times; the repetition count should be configurable, as this is excessively long for most cases.

There are two modes of measuring the latency, cut-through (bit-forwarding) and store-and-forward. They are well defined in RFC1242 and will usually give different results (because of the way the modes work and when the measurement takes place). This has one major disadvantage: the modes were selected in an environment where Ethernet equipment mostly supported just those two, but 14 years later the world looks quite different, as a third method called “hybrid” is the most common method of forwarding frames in Ethernet networks. This is why I would recommend looking at the cut-through results rather than the store-and-forward ones.

The most important thing to know about latency testing is that the result will always be displayed as an average of all trials for one frame size, with optional additional information (like min, max, deltas, means etc.).

Some testers will also provide you with information about jitter, but be aware that it is not part of RFC2544, even though it is a very important thing to observe, especially in a voice-enabled environment.

RFC2544 Frame loss test

The frame loss test uses the previously determined maximal frame rate from the throughput test, so it should always be run in conjunction with it. It simply tries all frame sizes at the frame rate that was determined and checks what percentage of frames is being lost. The test is run in steps, each subsequent step using a lower rate (by 10%), until the ratio between sent and received frames is 1 (i.e. all frames are received). Obviously, no lost frames at the maximal throughput rate is required for this test to be considered successful.
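The stepping procedure can be sketched like this. The `run_trial` callback and the `simulated_trial` device are assumptions made up for the illustration; a real tester would send an actual stream and count what comes back. To make the loss visible, the example deliberately starts from the 100Mbps line rate against a device that can only forward 60Mbps.

```python
def frame_loss_percent(sent, received):
    """Share of the offered frames that never arrived, in percent."""
    return (sent - received) * 100 / sent


def frame_loss_sweep(run_trial, start_rate, step_percent=10):
    """Lower the offered rate in fixed steps until a trial shows no loss.

    `run_trial(rate)` is a hypothetical callback returning (sent, received).
    Returns a list of (percent of start rate, loss in percent) per step.
    """
    results = []
    for percent in range(100, 0, -step_percent):
        sent, received = run_trial(start_rate * percent / 100)
        loss = frame_loss_percent(sent, received)
        results.append((percent, loss))
        if loss == 0:                 # all frames received, the sweep is done
            break
    return results


def simulated_trial(rate_mbps, capacity_mbps=60.0, frames=100_000):
    """Made-up device that drops everything above its forwarding capacity."""
    if rate_mbps <= capacity_mbps:
        return frames, frames
    return frames, int(frames * capacity_mbps / rate_mbps)


for percent, loss in frame_loss_sweep(simulated_trial, start_rate=100.0):
    print(f"{percent:3d}% of start rate: {loss:6.2f}% frames lost")
```

When the sweep starts from a rate that the throughput test has already confirmed, the very first step should report 0% loss and the loop ends immediately, which is the successful outcome described above.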

RFC2544 Back-to-Back test

The back-to-back test was put in place to test the equipment’s behavior in the presence of bursty traffic, or in other words the operation of its buffers. As in all previous tests, each frame size has its own trial, in which the frames are sent in a burst of a length defined in seconds with minimal inter-frame delay (also known as line rate). The initial burst must be at least 2 seconds long. When the number of received frames is not equal to the number of sent frames, the burst is shortened by one frame per trial until the maximum is found. The trial for each frame size should be repeated 50 times.

The important thing is that, as there are multiple trials, the result will always be presented as an average of those trials. Some testing equipment can provide more information like standard deviation and separate stats on each trial.
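To get a feeling for the numbers involved, here is a small sketch that converts a burst length in seconds into a frame count at line rate and summarizes a set of trial results the way a tester might. The helper names and the sample burst lengths are invented for the example, and the 20B per-frame wire overhead (preamble, delimiter and inter-frame gap) is the same assumption as in the throughput section.

```python
from statistics import mean, stdev

WIRE_OVERHEAD = 20  # bytes per frame: 8B preamble/delimiter + 12B inter-frame gap


def frames_in_burst(duration_s, frame_size, line_rate_bps):
    """Number of whole frames that fit into a line-rate burst of the given length."""
    wire_bits = (frame_size + WIRE_OVERHEAD) * 8
    return int(duration_s * line_rate_bps / wire_bits)


def summarize(burst_lengths):
    """Average and spread of the longest loss-free burst over repeated trials."""
    return {
        "average": mean(burst_lengths),
        "stdev": stdev(burst_lengths),
        "min": min(burst_lengths),
        "max": max(burst_lengths),
    }


# A 2-second burst of 64B frames on Fast Ethernet.
print(frames_in_burst(2, 64, 100_000_000), "frames in the initial burst")

# Invented results of three trials, in frames, for illustration only.
print(summarize([1000, 1200, 1100]))
```

The average alone hides how much the individual trials differ, so the standard deviation and the min/max values are worth checking whenever the tester offers them.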

RFC2544 System recovery test (from overload)

This is a somewhat odd and probably obsolete test, but it is worth mentioning, even though I haven’t seen any testing equipment that has it implemented, as the idea behind it is still very valid. The test subjects the equipment to an overload of 110% of its traffic capacity (from the throughput measurements) for a minimum of 60 seconds, after which the tester drops the TX rate to 50%. The aim of this construction is to measure the time between the switch from 110% to 50% on the tester and the moment it actually receives the 50%. The reason for this is to check that there are no problems with buffering (underflows/overflows/reading/writing etc.) and also to gauge the amount of buffering and backlog processing.

The main issue with this test is that 110% usually makes a very small difference and most equipment can deal with it quite well. The other problem is that equipment that can forward at line rate is basically impossible to test this way (or at least it is very impractical). Also, this test was intended for equipment with no or limited QoS capabilities, which you will not find nowadays outside the SOHO segment (and even there it is rare).

RFC2544 Reset test

This is the last test in the suite. Its purpose is to measure the time from an outage caused by either a hardware or software reset until full service restoration. This is a very useful test as it can expose various race conditions, like what happens if you hammer the interface throughout the start-up sequence. But for the most part everyone is only interested in the fact that the unit will boot (and the approximate time).

Conclusion

As you can see, the tests of this suite are fairly simple but the interpretation can be confusing. I think it is clear that the throughput, frame loss and latency tests can be used for link activation/acceptance but have some rather serious flaws there, as they haven’t been designed for this purpose. In the next article I will discuss the telco-standard BERT testing and the newer service activation standard Y.156sam, also known as EtherSAM, for link activation.