"Life is all about sharing. If we are good at something, pass it on." - Mary Berry

Troubleshooting slow network file transfer


2021-06-25

Categories: Networking

We have a Hadoop data cluster. Recently, my colleague noticed that data transfer between servers was slow. Not all of them were slow; only half of them had the problem.

The first tool that came to mind was iperf3. After installing it, I ran iperf3 -s on one server and iperf3 -c from another. The other servers reached more than 900 Mbits/sec, but the slow ones managed only:

[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  11.9 MBytes  99.4 Mbits/sec    0    170 KBytes
[  4]   1.00-2.00   sec  11.1 MBytes  93.3 Mbits/sec    0    204 KBytes
[  4]   2.00-3.00   sec  10.9 MBytes  91.7 Mbits/sec    0    204 KBytes
[  4]   3.00-4.00   sec  8.26 MBytes  69.3 Mbits/sec  102    154 KBytes
[  4]   4.00-5.00   sec  10.9 MBytes  91.8 Mbits/sec    3    157 KBytes
[  4]   5.00-6.00   sec  10.9 MBytes  91.7 Mbits/sec    0    165 KBytes
[  4]   6.00-7.00   sec  10.9 MBytes  91.7 Mbits/sec    0    168 KBytes
[  4]   7.00-8.00   sec  10.9 MBytes  91.7 Mbits/sec    0    173 KBytes
[  4]   8.00-9.00   sec  10.9 MBytes  91.7 Mbits/sec    0    173 KBytes
[  4]   9.00-10.00  sec  10.9 MBytes  91.7 Mbits/sec    0    173 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec   108 MBytes  90.4 Mbits/sec  105             sender
[  4]   0.00-10.00  sec   107 MBytes  89.5 Mbits/sec                  receiver
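
With many servers to test, reading each report by eye gets tedious. The sketch below is one way to pull the sender-side rate out of an iperf3 report and flag anything at or below 100 Mbits/sec. The function names and the 100 Mbits/sec threshold are my own assumptions, not part of iperf3, and the parser only handles rates printed in Mbits/sec as in the output above.

```shell
# Hypothetical helper: print the sender-side bandwidth (in Mbits/sec) found in
# iperf3's human-readable output on stdin. Assumes the summary line ends in
# "sender" and that the rate is printed in Mbits/sec.
iperf3_sender_mbits() {
    awk '/sender/ {
        for (i = 2; i <= NF; i++)
            if ($i == "Mbits/sec") { print $(i - 1); exit }
    }'
}

# Hypothetical helper: succeed (exit status 0) when the given rate is at or
# below 100 Mbits/sec, which hints at a 100 Mbits/sec link somewhere on the path.
looks_like_fast_ethernet() {
    awk -v rate="$1" 'BEGIN { exit !(rate + 0 <= 100) }'
}
```

Piping a saved report into iperf3_sender_mbits and passing the result to looks_like_fast_ethernet would flag the slow servers above, while a server at more than 900 Mbits/sec would not be flagged.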

Hell, why is that?

I wondered whether there was a way to find out which switch port each of these servers was connected to. A quick Google search led me to: https://www.lazysystemadmin.com/2011/09/find-out-which-switch-port-connected.html

# tcpdump -v -i eno1 -s 1500 -c 1 '(ether[20:2]=0x2000 or ether[12:2]=0x88cc)'
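
The filter in that command is terse, so here is the same capture split into named pieces. The variable and function names are my own, chosen only for illustration; the tcpdump options are the ones used above. The switch sends these announcements only periodically, so the command may sit idle for a while before it prints anything.

```shell
# CDP frames use LLC/SNAP encapsulation; bytes 20-21 of the frame carry the
# SNAP protocol ID, which is 0x2000 for CDP.
CDP_MATCH='ether[20:2]=0x2000'

# LLDP frames are identified by EtherType 0x88cc, stored in bytes 12-13 of an
# untagged Ethernet frame.
LLDP_MATCH='ether[12:2]=0x88cc'

# Hypothetical helper: build the capture filter that matches either protocol.
neighbor_filter() {
    printf '(%s or %s)' "$CDP_MATCH" "$LLDP_MATCH"
}

# Hypothetical helper: capture one announcement on the given interface.
#   -v       decode the frame verbosely (prints the Platform and Port-ID fields)
#   -s 1500  capture up to 1500 bytes per frame rather than a truncated header
#   -c 1     stop after the first matching frame
# Needs root privileges, like the original command.
capture_neighbor() {
    tcpdump -v -i "$1" -s 1500 -c 1 "$(neighbor_filter)"
}
```

Running capture_neighbor eno1 as root should be equivalent to the one-liner above.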

Normal server:

        Platform (0x06), value length: 20 bytes: 'cisco WS-C3560G-48TS'
        Port-ID (0x03), value length: 18 bytes: 'GigabitEthernet0/6'

Slow one:

        Platform (0x06), value length: 19 bytes: 'cisco WS-C3560-48TS'
        Port-ID (0x03), value length: 15 bytes: 'FastEthernet0/5'

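Do you see the difference? The slow servers are plugged into FastEthernet ports, and Fast Ethernet tops out at 100 Mbits/sec, which matches the roughly 90 Mbits/sec that iperf3 measured. The normal servers sit on GigabitEthernet ports.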
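
To repeat this check across a whole rack, the Port-ID can be pulled out of the tcpdump output and mapped to a nominal speed. This is only a sketch: both function names are hypothetical, the parser assumes the CDP layout shown above (tcpdump prints LLDP differently), and the interface name gives the maximum speed of the port, not necessarily the negotiated one.

```shell
# Hypothetical helper: extract the Port-ID value from tcpdump's CDP decode on stdin.
# Assumes the "Port-ID (0x03), value length: N bytes: 'NAME'" layout shown above.
cdp_port_id() {
    sed -n "s/.*Port-ID (0x03).*: '\(.*\)'.*/\1/p"
}

# Hypothetical helper: map a Cisco interface name to its nominal speed in Mbits/sec.
# Prints "unknown" and returns non-zero for names it does not recognise.
port_nominal_mbits() {
    case "$1" in
        TenGigabitEthernet*) echo 10000 ;;
        GigabitEthernet*)    echo 1000 ;;
        FastEthernet*)       echo 100 ;;
        *)                   echo unknown; return 1 ;;
    esac
}
```

Feeding the slow server's output through cdp_port_id yields FastEthernet0/5, which port_nominal_mbits maps to 100; the normal server's GigabitEthernet0/6 maps to 1000.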

Tags: iperf3 tcpdump

