{"id":303,"date":"2018-11-01T00:01:46","date_gmt":"2018-11-01T00:01:46","guid":{"rendered":"https:\/\/www.kuncar.net\/blog\/?p=303"},"modified":"2018-11-01T00:05:13","modified_gmt":"2018-11-01T00:05:13","slug":"mtu-and-tcp-mss-clamping","status":"publish","type":"post","link":"https:\/\/www.kuncar.net\/blog\/2018\/mtu-and-tcp-mss-clamping\/","title":{"rendered":"MTU and TCP MSS clamping"},"content":{"rendered":"<p style=\"text-align: justify;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft size-thumbnail wp-image-213\" src=\"https:\/\/www.kuncar.net\/blog\/wp-content\/uploads\/2018\/02\/juniper_logo2-150x150.png\" alt=\"\" width=\"150\" height=\"150\" srcset=\"https:\/\/www.kuncar.net\/blog\/wp-content\/uploads\/2018\/02\/juniper_logo2-150x150.png 150w, https:\/\/www.kuncar.net\/blog\/wp-content\/uploads\/2018\/02\/juniper_logo2-100x100.png 100w, https:\/\/www.kuncar.net\/blog\/wp-content\/uploads\/2018\/02\/juniper_logo2.png 195w\" sizes=\"auto, (max-width: 150px) 100vw, 150px\" \/><span style=\"font-family: arial, helvetica, sans-serif;\">Over the years I have had many conversations about what MTU is, how it works, and how frame, packet, and datagram sizes relate to each other. Although the topic is actually fairly simple, there seems to be a lot of confusion around it, which is why this article came about.<\/span><\/p>\n<h4><span style=\"font-family: arial, helvetica, sans-serif;\">Ethernet<\/span><\/h4>\n<p style=\"text-align: justify;\"><span style=\"font-family: arial, helvetica, sans-serif;\">The basics are quite simple: the minimal legal Ethernet frame length is 64B without the VLAN shim and 68B with it. There is a corner case of a 64B frame that carries a VLAN tag (it is a valid frame until it reaches the de-capsulation point), but that is not really relevant here. 
In reality most equipment only adds encapsulation on top of the existing frame, so this never really happens outside a test environment.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: arial, helvetica, sans-serif;\">The maximal legal length of a standard Ethernet frame is 1518B including the header and FCS (1522B once an 802.1Q tag is added). It is implied that any additional headers should be taken from the payload space, yes, from those 1500B, but that leads to very high overhead and sometimes quite inefficient data transport. Most devices will actually allow frames that are larger than 1518B. As the size limitations became very obvious, most vendors introduced and adopted jumbo frames. These are non-standard Ethernet frames, usually sized at around 9000B. The size of jumbo frame accepted by a device differs from vendor to vendor and appliance to appliance. Frames that fall between the 1518B legal size and jumbo frames are usually called MiniGiants.<\/span><\/p>\n<h6><span style=\"font-family: arial, helvetica, sans-serif;\">Setting Ethernet (L2) MTU on a Juniper router<\/span><\/h6>\n<pre>ge-0\/0\/0 {\r\n    mtu 1518;\r\n    unit 0 {\r\n        family inet {\r\n            address 10.0.0.0\/31;\r\n        }\r\n    }\r\n}<\/pre>\n<p><span style=\"font-family: arial, helvetica, sans-serif;\">The above sets only the Ethernet MTU, and it is a safe value to use in most cases.<\/span><\/p>\n<h4><span style=\"font-family: arial, helvetica, sans-serif;\">IP<\/span><\/h4>\n<p style=\"text-align: justify;\"><span style=\"font-family: arial, helvetica, sans-serif;\">On standard Ethernet an IP packet, including its headers, occupies between 46B (shorter packets are padded) and 1500B of the frame payload. The smallest possible IPv4 header is 20B and the largest total length of an IP packet is 65535B. This is more or less theoretical, as even jumbo frames usually carry only about 9000B on the IP layer. There are some instances where this number can be even larger. 
The most important thing to note here is that the Ethernet MTU and the IP MTU are two different and separate numbers, though they are interdependent.<\/span><\/p>\n<h6><span style=\"font-family: arial, helvetica, sans-serif;\">Setting IP (L3) MTU on a Juniper router<\/span><\/h6>\n<pre>ge-0\/0\/0 {\r\n    unit 0 {\r\n        family inet {\r\n            mtu 1500;\r\n            address 10.0.0.0\/31;\r\n        }\r\n    }\r\n}<\/pre>\n<p><span style=\"font-family: arial, helvetica, sans-serif;\">The above sets only the IP MTU. This is a safe default to use in most cases.<\/span><\/p>\n<h4><span style=\"font-family: arial, helvetica, sans-serif;\">Issues with fragmentation<\/span><\/h4>\n<p style=\"text-align: justify;\"><span style=\"font-family: arial, helvetica, sans-serif;\">There are a lot of issues caused by an incorrectly set MTU and\/or mismatched L2\/L3 MTUs. In IPv4 this is usually solved by fragmentation on the router that has the smaller MTU on its egress interface. But the fragmentation process is very demanding on the router and results in heightened CPU and buffer utilization, sometimes to the point of dropping the fragmented packet and forcing re-transmission by higher layer protocols. These issues are exacerbated in scenarios with IPsec tunneling, as the checksums calculated on the payload would be invalidated by fragmentation. Thus the DF (do-not-fragment) bit is set to 1. Similar issues can be encountered when using non-native Ethernet WANs with additional encapsulation headers.<\/span><\/p>\n<h4><span style=\"font-family: arial, helvetica, sans-serif;\">Path MTU Discovery<\/span><\/h4>\n<p style=\"text-align: justify;\"><span style=\"font-family: arial, helvetica, sans-serif;\">To find the smallest MTU along the whole path, Path MTU Discovery (PMTUD) can be used. 
It uses the don&#8217;t fragment (DF) flag in the IP header and usually runs on top of ICMP. The mechanism is simple: the sender transmits the largest possible packet towards each next hop in a traceroute-like fashion, and when the packet is too large the hop sends back the ICMP message &#8220;fragmentation needed&#8221;. The sender then decreases the packet size until it gets through to the final hop.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: arial, helvetica, sans-serif;\">This method of MTU discovery is a bit crude and has some disadvantages, but it is quite useful for a first look at the problem and is a good way to start when investigating this type of issue.<\/span><\/p>\n<h4><span style=\"font-family: arial, helvetica, sans-serif;\">TCP<\/span><\/h4>\n<p style=\"text-align: justify;\"><span style=\"font-family: arial, helvetica, sans-serif;\">The TCP header is between 20B and 60B in size, with a maximal segment size of 65415B. In TCP terms the segment size is the size of the payload without any headers (IP or TCP itself). Each side declares its segment size to the other end during the 3-way handshake, but the value can be altered by L4-capable devices in the path of the SYN packet. 
This is a bit complicated to explain, so let&#8217;s see what the 3-way handshake looks like in real life.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignleft size-full wp-image-310\" src=\"https:\/\/www.kuncar.net\/blog\/wp-content\/uploads\/2018\/06\/topo-tcp-mss.png\" alt=\"\" width=\"1023\" height=\"182\" srcset=\"https:\/\/www.kuncar.net\/blog\/wp-content\/uploads\/2018\/06\/topo-tcp-mss.png 1023w, https:\/\/www.kuncar.net\/blog\/wp-content\/uploads\/2018\/06\/topo-tcp-mss-150x27.png 150w, https:\/\/www.kuncar.net\/blog\/wp-content\/uploads\/2018\/06\/topo-tcp-mss-300x53.png 300w, https:\/\/www.kuncar.net\/blog\/wp-content\/uploads\/2018\/06\/topo-tcp-mss-768x137.png 768w\" sizes=\"auto, (max-width: 767px) 89vw, (max-width: 1000px) 54vw, (max-width: 1071px) 543px, 580px\" \/><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: arial, helvetica, sans-serif;\">In our example PC-1 initiates the TCP session towards PC-2. There is an IPsec tunnel between R1 and R2 on which we can set the TCP clamping and see the result on the traffic between the end hosts.<\/span><\/p>\n<p><span style=\"font-family: arial, helvetica, sans-serif;\">The capture below is the 3-way handshake as seen on eth0 of PC-1.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-311 aligncenter\" src=\"https:\/\/www.kuncar.net\/blog\/wp-content\/uploads\/2018\/06\/tcp-mss-pc1-initial.png\" alt=\"\" width=\"1108\" height=\"721\" srcset=\"https:\/\/www.kuncar.net\/blog\/wp-content\/uploads\/2018\/06\/tcp-mss-pc1-initial.png 1108w, https:\/\/www.kuncar.net\/blog\/wp-content\/uploads\/2018\/06\/tcp-mss-pc1-initial-150x98.png 150w, https:\/\/www.kuncar.net\/blog\/wp-content\/uploads\/2018\/06\/tcp-mss-pc1-initial-300x195.png 300w, https:\/\/www.kuncar.net\/blog\/wp-content\/uploads\/2018\/06\/tcp-mss-pc1-initial-768x500.png 768w, https:\/\/www.kuncar.net\/blog\/wp-content\/uploads\/2018\/06\/tcp-mss-pc1-initial-1024x666.png 
1024w\" sizes=\"auto, (max-width: 767px) 89vw, (max-width: 1000px) 54vw, (max-width: 1071px) 543px, 580px\" \/><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: arial, helvetica, sans-serif;\">You can notice two things here: the initial window size is set to 2920B, which fits exactly two TCP segments of the proposed size of 1460B. Why is it 2 segments and not just one or three? Well, 2 is the default segment multiplier as implemented in the Linux TCP\/IP stack, so the smallest window will always contain at least 2 segments.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: arial, helvetica, sans-serif;\">In the tunnel interface setup I have configured TCP MSS clamping in order to alter the value in the SYN packet to 1000B before it enters the IPsec tunnel between R1 and R2. This will signal the maximum segment size to the remote end of the TCP session.<\/span><\/p>\n<p><span style=\"font-family: arial, helvetica, sans-serif;\">This is how to configure the clamp:<\/span><\/p>\n<pre>root@R1# show security flow\r\ntcp-mss {\r\n    ipsec-vpn {\r\n        mss 1000;\r\n    }\r\n}<\/pre>\n<p><span style=\"font-family: arial, helvetica, sans-serif;\">The result on eth0 of PC-2 reflects this, as you can see in the received pcap.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-312 aligncenter\" src=\"https:\/\/www.kuncar.net\/blog\/wp-content\/uploads\/2018\/06\/tcp-mss-pc2-initial.png\" alt=\"\" width=\"1069\" height=\"809\" srcset=\"https:\/\/www.kuncar.net\/blog\/wp-content\/uploads\/2018\/06\/tcp-mss-pc2-initial.png 1069w, https:\/\/www.kuncar.net\/blog\/wp-content\/uploads\/2018\/06\/tcp-mss-pc2-initial-150x114.png 150w, https:\/\/www.kuncar.net\/blog\/wp-content\/uploads\/2018\/06\/tcp-mss-pc2-initial-300x227.png 300w, https:\/\/www.kuncar.net\/blog\/wp-content\/uploads\/2018\/06\/tcp-mss-pc2-initial-768x581.png 768w, 
https:\/\/www.kuncar.net\/blog\/wp-content\/uploads\/2018\/06\/tcp-mss-pc2-initial-1024x775.png 1024w\" sizes=\"auto, (max-width: 767px) 89vw, (max-width: 1000px) 54vw, (max-width: 1071px) 543px, 580px\" \/><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: arial, helvetica, sans-serif;\">The receiving end gets the information about the decreased segment size. But in the SYN ACK response it will send its default segment size (in my case 1460B, which is my interface&#8217;s IP MTU of 1500B with 40B of IP and TCP header overhead deducted). On PC-1 we would see that this value is changed neither by the above-mentioned configuration nor by the TCP handshake process. This seems weird, but in this case it is a configuration issue, as the IPsec tunnel should have the MSS clamping applied symmetrically on both ends of the tunnel.<\/span><\/p>\n<pre>root@R2# show security flow\r\ntcp-mss {\r\n    ipsec-vpn {\r\n        mss 1000;\r\n    }\r\n}<\/pre>\n<p style=\"text-align: justify;\"><span style=\"font-family: arial, helvetica, sans-serif;\">The issue arises when you need to do TCP clamping but don&#8217;t necessarily have control of both ends of the tunnel, yet you need to clamp both the SYN and the returning SYN ACK so that the path has a symmetric setup for the MTU\/MSS. 
The reason for this is that you not only want to avoid fragmentation (or drops in the IPsec case), but an asymmetric setup could also lead to the following traffic pattern in the TCP windowing:<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-313 aligncenter\" src=\"https:\/\/www.kuncar.net\/blog\/wp-content\/uploads\/2018\/06\/tcp-oscillation.jpg\" alt=\"\" width=\"578\" height=\"389\" srcset=\"https:\/\/www.kuncar.net\/blog\/wp-content\/uploads\/2018\/06\/tcp-oscillation.jpg 578w, https:\/\/www.kuncar.net\/blog\/wp-content\/uploads\/2018\/06\/tcp-oscillation-150x101.jpg 150w, https:\/\/www.kuncar.net\/blog\/wp-content\/uploads\/2018\/06\/tcp-oscillation-300x202.jpg 300w\" sizes=\"auto, (max-width: 578px) 100vw, 578px\" \/><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: arial, helvetica, sans-serif;\">This is known as TCP oscillation or TCP saw-tooth, and every peak in this graph is a point where the TCP window is reset, resulting in re-transmissions, extremely poor performance, and instability of the connections across the link. Fortunately this issue can be resolved in multiple ways. On your device it can be achieved with a global TCP MSS configuration.<\/span><\/p>\n<pre>root@R1# show security flow\r\ntcp-mss {\r\n    all-tcp {\r\n        mss 1000;\r\n    }\r\n}<\/pre>\n<p style=\"text-align: justify;\"><span style=\"font-family: arial, helvetica, sans-serif;\">This setting will intercept any TCP SYN or SYN ACK segments and adjust the MSS accordingly. 
This might be a bit too harsh a solution, as it impacts all TCP traffic passing through the device, but it can be useful.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: arial, helvetica, sans-serif;\">There is also a possibility that the upper layer protocols will take care of things for you, or that some external mechanism, like the aforementioned PMTU discovery, will instruct them regardless of the path&#8217;s configuration.<\/span><\/p>\n<h4><span style=\"font-family: arial, helvetica, sans-serif;\">Configuring and testing the MTU and MSS clamping on an IPsec\/VPN tunnel<\/span><\/h4>\n<p style=\"text-align: justify;\"><span style=\"font-family: arial, helvetica, sans-serif;\">First, it is important to state very clearly something that should be obvious by now from the previous parts of the article: the IP and Ethernet MTU are not the same. They must differ by at least 14 bytes, but I would usually set them 18B apart to accommodate a single VLAN (802.1Q) shim. If you set the same value for the L2 and L3 MTU, Junos will actually warn you that it is an invalid configuration:<\/span><\/p>\n<pre>[edit interfaces ge-0\/0\/0 unit 0 family]\r\n'inet'\r\nFamily MTU 1500 is too large relative to device MTU 1500; Protocol overhead should be 14<\/pre>\n<p>So the correct interface config should look something like this:<\/p>\n<pre>ge-0\/0\/0 {\r\n    mtu 1518;\r\n    unit 0 {\r\n        family inet {\r\n            mtu 1500;\r\n            address 10.0.0.0\/31;\r\n        }\r\n    }\r\n}<\/pre>\n<p><span style=\"font-family: arial, helvetica, sans-serif;\">It is always good practice to set both values. 
The remaining question is what to set the MSS to, and where.<\/span><\/p>\n<p><span class=\"grey\" style=\"font-family: arial, helvetica, sans-serif;\">Allowing for the maximum header sizes described in <a href=\"https:\/\/tools.ietf.org\/html\/rfc879\">RFC 879<\/a> (60B of IP header and 60B of TCP header) gives the following conservative formula:<\/span><\/p>\n<pre class=\"newpage\">MSS = MTU - 60 - 60 = MTU - 120<\/pre>\n<p style=\"text-align: justify;\"><span style=\"font-family: arial, helvetica, sans-serif;\">So in our case that would be 1500 - 120 = 1380. This is quite inefficient and assumes that both the IP and TCP headers are the full 60B long, which is not our case as only 20B headers are used in this test, so we can use the more optimistic formula:<\/span><\/p>\n<pre class=\"newpage\">MSS = MTU - 20 - 20 = MTU - 40<\/pre>\n<p style=\"text-align: justify;\"><span style=\"font-family: arial, helvetica, sans-serif;\">This results in an MSS of 1460B, which sounds much more reasonable. So now we need to apply it to the IPsec traffic in both directions and we&#8217;re done.<\/span><\/p>\n<pre>root@R1# show security flow\r\ntcp-mss {\r\n    ipsec-vpn {\r\n        mss 1460;\r\n    }\r\n}<\/pre>\n<h4><span style=\"font-family: arial, helvetica, sans-serif;\">Final note on testing<\/span><\/h4>\n<p style=\"text-align: justify;\"><span style=\"font-family: arial, helvetica, sans-serif;\">There are a couple of things to remember if you will be testing this from Juniper devices.<\/span><\/p>\n<ol class=\"ili-indent\">\n<li><span style=\"font-family: arial, helvetica, sans-serif;\">The MSS setting will not affect ICMP (duh!) 
so if you want to test the MTU size, the st0 unit must be set with the relevant IP MTU.<\/span><\/li>\n<li><span style=\"font-family: arial, helvetica, sans-serif;\">Junos adds headers on top of the size you specify, as the size is really the length of the payload.<\/span><\/li>\n<\/ol>\n<p><span style=\"font-family: arial, helvetica, sans-serif;\">So in effect the following command:<\/span><\/p>\n<pre>ping 10.0.0.1 size 1400\r\n<\/pre>\n<p><span style=\"font-family: arial, helvetica, sans-serif;\">will result in a frame that is 1442B long. The breakdown of the frame is shown below.<\/span><\/p>\n<table style=\"border-collapse: collapse; width: 100%;\" border=\"1\">\n<tbody>\n<tr>\n<td style=\"width: 25%;\">14B Ethernet<\/td>\n<td style=\"width: 25%;\">20B IP<\/td>\n<td style=\"width: 25%;\">8B ICMP<\/td>\n<td style=\"width: 25%;\">1400B payload<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p style=\"text-align: justify;\"><span style=\"font-family: arial, helvetica, sans-serif;\">If there is a PC, ideally a Linux box, that can be used for end-to-end path MTU discovery, the tracepath application is what you could use. There is a great article on <a title=\"tracepath\" href=\"http:\/\/packetlife.net\/blog\/2008\/aug\/18\/path-mtu-discovery\/\">packetlife<\/a> that has some nice pictures and examples.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Over the years I have had many conversations about what MTU is, how it works, and how frame, packet, and datagram sizes relate to each other. 
Although the topic is actually fairly simple, there seems to be a lot of confusion around it, which is why this article came &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/www.kuncar.net\/blog\/2018\/mtu-and-tcp-mss-clamping\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;MTU and TCP MSS clamping&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7,4,5,14],"tags":[31,28,27,29,30],"class_list":["post-303","post","type-post","status-publish","format-standard","hentry","category-juniper","category-linux","category-networks","category-testing","tag-ipsec","tag-mss","tag-mtu","tag-tcp","tag-windowing"],"_links":{"self":[{"href":"https:\/\/www.kuncar.net\/blog\/wp-json\/wp\/v2\/posts\/303","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.kuncar.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.kuncar.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.kuncar.net\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.kuncar.net\/blog\/wp-json\/wp\/v2\/comments?post=303"}],"version-history":[{"count":15,"href":"https:\/\/www.kuncar.net\/blog\/wp-json\/wp\/v2\/posts\/303\/revisions"}],"predecessor-version":[{"id":375,"href":"https:\/\/www.kuncar.net\/blog\/wp-json\/wp\/v2\/posts\/303\/revisions\/375"}],"wp:attachment":[{"href":"https:\/\/www.kuncar.net\/blog\/wp-json\/wp\/v2\/media?parent=303"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.kuncar.net\/blog\/wp-json\/wp\/v2\/categories?post=303"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.kuncar.net\/blog\/wp-json\/wp\/v2\/tags?post=303"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}