1. Investigating the OVS process

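On machine 10.64.16.21, thread 41320 has been sitting at 100% CPU. It belongs to the process started as: ovs-vswitchd --pidfile --detach --log-file. ovs-vswitchd is launched by systemd and runs as a group of threads; in the commands below, 41246 appears to be the ID of the process itself and 41320 the ID of the busy thread. The goal is to find out what this thread is doing that keeps a CPU at 100%.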
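
To see which thread of a multi-threaded process is burning CPU, the per-thread counters under /proc can be read directly. The following is a minimal sketch, assuming Linux procfs; the helper names parse_task_stat and busiest_threads are made up for this note. The counters are cumulative clock ticks since the thread started, so two readings taken a second apart are needed to get a current rate.

```python
import os

def parse_task_stat(stat_line):
    """Return (tid, comm, cpu_ticks) from one /proc/<pid>/task/<tid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split around the first '(' and the last ')'.
    left, _, rest = stat_line.partition("(")
    comm, _, tail = rest.rpartition(")")
    fields = tail.split()
    # fields[0] is the state (field 3 of the file); utime and stime are
    # fields 14 and 15, i.e. indexes 11 and 12 here.
    utime, stime = int(fields[11]), int(fields[12])
    return int(left), comm, utime + stime

def busiest_threads(pid, top=5):
    """List (tid, comm, cpu_ticks) for the threads of pid, busiest first."""
    rows = []
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                rows.append(parse_task_stat(f.read()))
        except OSError:
            continue  # the thread exited between listdir() and open()
    return sorted(rows, key=lambda r: r[2], reverse=True)[:top]
```

Calling busiest_threads(41246) on the affected machine would be the equivalent of sorting top -H output by accumulated CPU time.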

1.1. pmap: inspecting the address space

$ sudo pmap -p 41246 -x
Address           Kbytes     RSS   Dirty Mode  Mapping
0000000000400000    1856    1856       0 r-x-- /usr/local/sbin/ovs-vswitchd
00000000005d0000     256     256     256 rw--- /usr/local/sbin/ovs-vswitchd
0000fff780000000  524288       0       0 rw-s- /dev/hugepages/rtemap_0
0000fff7a0000000  524288       0       0 rw-s- /dev/hugepages/rtemap_1
0000fff7c0000000  524288       0       0 rw-s- /dev/hugepages/rtemap_127
0000fff7e0000000  524288       0       0 rw-s- /dev/hugepages/rtemap_126
0000ffff983d0000      64       0       0 -w-s- /dev/infiniband/uverbs1
0000ffff983e0000      64       0       0 -w-s- /dev/infiniband/uverbs0
0000ffff998d0000      64      64      64 r---- /usr/lib64/libmlx5.so.1.3.16.0
0000ffff99b00000      64      64      64 r---- /usr/lib64/libc-2.17.so
0000ffff99f50000      64      64       0 r-x-- /usr/src/dpdk-stable-17.11.3/arm64-armv8a-linuxapp-gcc/lib/librte_vhost.so.4.1 (deleted)
0000ffff99f60000     128     128     128 rw--- /usr/src/dpdk-stable-17.11.3/arm64-armv8a-linuxapp-gcc/lib/librte_vhost.so.4.1 (deleted)
0000ffff99f80000      64      64       0 r-x-- /usr/src/dpdk-stable-17.11.3/arm64-armv8a-linuxapp-gcc/lib/librte_timer.so.1.1 (deleted)
0000ffff99f90000      64      64      64 rw--- /usr/src/dpdk-stable-17.11.3/arm64-armv8a-linuxapp-gcc/lib/librte_timer.so.1.1 (deleted)
0000ffff99fa0000     128      64       0 r-x-- /usr/src/dpdk-stable-17.11.3/arm64-armv8a-linuxapp-gcc/lib/librte_table.so.3.1 (deleted)
0000ffff99fc0000      64      64      64 rw--- /usr/src/dpdk-stable-17.11.3/arm64-armv8a-linuxapp-gcc/lib/librte_table.so.3.1 (deleted)
0000ffff99fd0000      64      64       0 r-x-- /usr/src/dpdk-stable-17.11.3/arm64-armv8a-linuxapp-gcc/lib/librte_security.so.1.1 (deleted)
0000ffff99fe0000      64      64      64 rw--- /usr/src/dpdk-stable-17.11.3/arm64-armv8a-linuxapp-gcc/lib/librte_security.so.1.1 (deleted)
0000ffff99ff0000      64      64       0 r-x-- /usr/src/dpdk-stable-17.11.3/arm64-armv8a-linuxapp-gcc/lib/librte_sched.so.1.1 (deleted)
0000ffff9a000000      64      64      64 rw--- /usr/src/dpdk-stable-17.11.3/arm64-armv8a-linuxapp-gcc/lib/librte_sched.so.1.1 (deleted)
0000ffff9a010000      64      64       0 r-x-- /usr/src/dpdk-stable-17.11.3/arm64-armv8a-linuxapp-gcc/lib/librte_ring.so.1.1 (deleted)
0000ffff9a020000      64      64      64 rw--- /usr/src/dpdk-stable-17.11.3/arm64-armv8a-linuxapp-gcc/lib/librte_ring.so.1.1 (deleted)
0000ffff9a030000      64      64       0 r-x-- /usr/src/dpdk-stable-17.11.3/arm64-armv8a-linuxapp-gcc/lib/librte_reorder.so.1.1 (deleted)
0000ffff9a040000      64      64      64 rw--- /usr/src/dpdk-stable-17.11.3/arm64-armv8a-linuxapp-gcc/lib/librte_reorder.so.1.1 (deleted)
0000ffff9a050000      64      64       0 r-x-- /usr/src/dpdk-stable-17.11.3/arm64-armv8a-linuxapp-gcc/lib/librte_power.so.1.1 (deleted)
0000ffff9a060000      64      64      64 rw--- /usr/src/dpdk-stable-17.11.3/arm64-armv8a-linuxapp-gcc/lib/librte_power.so.1.1 (deleted)
0000ffff9a080000     128      64       0 r-x-- /usr/src/dpdk-stable-17.11.3/arm64-armv8a-linuxapp-gcc/lib/librte_port.so.3.1 (deleted)
0000ffff9a0a0000      64      64      64 rw--- /usr/src/dpdk-stable-17.11.3/arm64-armv8a-linuxapp-gcc/lib/librte_port.so.3.1 (deleted)
0000ffff9a0b0000      64      64       0 r-x-- /usr/src/dpdk-stable-17.11.3/arm64-armv8a-linuxapp-gcc/lib/librte_pmd_vmxnet3_uio.so.1.1 (deleted)
0000ffff9a0c0000      64      64      64 rw--- /usr/src/dpdk-stable-17.11.3/arm64-armv8a-linuxapp-gcc/lib/librte_pmd_vmxnet3_uio.so.1.1 (deleted)
0000ffff9a0d0000     128     128       0 r-x-- /usr/src/dpdk-stable-17.11.3/arm64-armv8a-linuxapp-gcc/lib/librte_pmd_virtio.so.1.1 (deleted)
0000ffff9a0f0000      64      64      64 rw--- /usr/src/dpdk-stable-17.11.3/arm64-armv8a-linuxapp-gcc/lib/librte_pmd_virtio.so.1.1 (deleted)
0000ffff9a100000      64      64       0 r-x-- /usr/src/dpdk-stable-17.11.3/arm64-armv8a-linuxapp-gcc/lib/librte_pmd_vhost.so.2.1 (deleted)
0000ffff9a110000      64      64      64 rw--- /usr/src/dpdk-stable-17.11.3/arm64-armv8a-linuxapp-gcc/lib/librte_pmd_vhost.so.2.1 (deleted)
0000ffff9a120000     128     128       0 r-x-- /usr/src/dpdk-stable-17.11.3/arm64-armv8a-linuxapp-gcc/lib/librte_pmd_thunderx_nicvf.so.1.1 (deleted)
0000ffff9a140000      64      64      64 rw--- /usr/src/dpdk-stable-17.11.3/arm64-armv8a-linuxapp-gcc/lib/librte_pmd_thunderx_nicvf.so.1.1 (deleted)
0000ffff9a150000      64      64       0 r-x-- /usr/src/dpdk-stable-17.11.3/arm64-armv8a-linuxapp-gcc/lib/librte_pmd_tap.so.1.1 (deleted)
0000ffff9a160000      64      64      64 rw--- /usr/src/dpdk-stable-17.11.3/arm64-armv8a-linuxapp-gcc/lib/librte_pmd_tap.so.1.1 (deleted)
0000ffff9a170000      64      64       0 r-x-- /usr/src/dpdk-stable-17.11.3/arm64-armv8a-linuxapp-gcc/lib/librte_pmd_sw_event.so.1.1 (deleted)
0000ffff9a180000      64      64      64 rw--- /usr/src/dpdk-stable-17.11.3/arm64-armv8a-linuxapp-gcc/lib/librte_pmd_sw_event.so.1.1 (deleted)
0000ffff9a190000      64      64       0 r-x-- /usr/src/dpdk-stable-17.11.3/arm64-armv8a-linuxapp-gcc/lib/librte_pmd_softnic.so.1.1 (deleted)
0000ffff9a1a0000      64      64      64 rw--- /usr/src/dpdk-stable-17.11.3/arm64-armv8a-linuxapp-gcc/lib/librte_pmd_softnic.so.1.1 (deleted)
0000ffff9a1b0000      64      64       0 r-x-- /usr/src/dpdk-stable-17.11.3/arm64-armv8a-linuxapp-gcc/lib/librte_pmd_skeleton_event.so.1.1 (deleted)
0000ffff9a1c0000      64      64      64 rw--- /usr/src/dpdk-stable-17.11.3/arm64-armv8a-linuxapp-gcc/lib/librte_pmd_skeleton_event.so.1.1 (deleted)
0000ffff9a1d0000      64      64       0 r-x-- /usr/src/dpdk-stable-17.11.3/arm64-armv8a-linuxapp-gcc/lib/librte_pmd_ring.so.2.1 (deleted)
0000ffff9a1e0000      64      64      64 rw--- /usr/src/dpdk-stable-17.11.3/arm64-armv8a-linuxapp-gcc/lib/librte_pmd_ring.so.2.1 (deleted)
0000ffff9a1f0000     320     128       0 r-x-- /usr/src/dpdk-stable-17.11.3/arm64-armv8a-linuxapp-gcc/lib/librte_pmd_qede.so.1.1 (deleted)
0000ffff9a240000      64      64      64 rw--- /usr/src/dpdk-stable-17.11.3/arm64-armv8a-linuxapp-gcc/lib/librte_pmd_qede.so.1.1 (deleted)

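Note: mmap-ing a file does not copy the file contents into memory up front. It maps the file into the process's virtual address space. When the process reads or writes that range, the OS works page by page and decides when to actually read from or write to the file. For example, after you write some data to a shared mapping, the page holding it reaches the file only later, when the OS decides to write it back or evict it, and that write may in turn travel through file system -> page cache -> I/O driver -> physical hardware.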
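
The behaviour described in the note can be tried out with a small sketch. It assumes a Unix-like system, where Python's mmap creates a shared mapping by default; the function names and the temporary file are illustrative only. The store through the mapping only touches memory, and flush() asks the OS to write the dirty page back.

```python
import mmap
import os
import tempfile

def write_through_mapping(path, payload):
    """Overwrite the start of `path` through a shared mapping, then read it back."""
    with open(path, "r+b") as f:
        # Length 0 maps the whole file; on Unix the default is a shared mapping.
        with mmap.mmap(f.fileno(), 0) as m:
            m[:len(payload)] = payload   # a plain memory store, no write() call
            m.flush()                    # ask the OS to write dirty pages back now
    with open(path, "rb") as f:
        return f.read(len(payload))

def demo():
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"\0" * mmap.PAGESIZE)   # mmap needs a non-empty file
        return write_through_mapping(path, b"hello")
    finally:
        os.remove(path)
```

The hugepage-backed rtemap_* files in the pmap output are mapped rw-s, that is, shared, which is the same kind of mapping this sketch creates.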

1.2. lsof: listing open files

$ sudo lsof -p 41246
ovs-vswit 41246 root  txt       REG                8,5  14113280 3527027728 /usr/local/sbin/ovs-vswitchd
ovs-vswit 41246 root  mem-R     REG               0,40 536870912     968169 /dev/hugepages/rtemap_0
ovs-vswit 41246 root  mem-R     REG               0,40 536870912     968170 /dev/hugepages/rtemap_1
ovs-vswit 41246 root  mem-R     REG               0,40 536870912     968296 /dev/hugepages/rtemap_127
ovs-vswit 41246 root  mem-R     REG               0,40 536870912     968295 /dev/hugepages/rtemap_126
ovs-vswit 41246 root  mem       CHR            231,193                27211 /dev/infiniband/uverbs1
ovs-vswit 41246 root  mem       CHR            231,192                27210 /dev/infiniband/uverbs0
ovs-vswit 41246 root  mem-w     REG               0,23    208420     789242 /run/.rte_config
ovs-vswit 41246 root    3w      REG                8,5    283688 3527027762 /usr/local/var/log/openvswitch/ovs-vswitchd.log
ovs-vswit 41246 root   15u      CHR             10,196       0t0       2155 /dev/vfio/vfio
ovs-vswit 41246 root   12u  netlink                          0t0     968157 ROUTE
ovs-vswit 41246 root   21u  a_inode               0,13         0       9112 [eventpoll]
ovs-vswit 41246 root   36u      CHR             10,200       0t0       2153 /dev/net/tun
ovs-vswit 41246 root   47u     unix 0xffff8017d901f300       0t0     946033 /usr/local/var/run/openvswitch/ovs-br0.snoop
ovs-vswit 41246 root   51u     unix 0xffff8017d6540000       0t0     953826 /usr/local/var/run/openvswitch/ovs-br0.mgmt
ovs-vswit 41246 root   53u     unix 0xffff8017d6540d80       0t0     953887 /usr/local/var/run/openvswitch/vhost-user0
ovs-vswit 41246 root   58u     unix 0xffff8017d901f780       0t0     946094 /usr/local/var/run/openvswitch/vhost-user1
ovs-vswit 41246 root   59u     unix 0xffff8017d6ee9480       0t0     953199 /usr/local/var/run/openvswitch/vhost-user2
ovs-vswit 41246 root   60u     unix 0xffff8017d6ee5e80       0t0     953211 /usr/local/var/run/openvswitch/vhost-user3

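Notes:

  • The open files include regular files, kernel devices, pipes, unix sockets and netlink sockets.
  • There is a log file in the list, which is a good entry point for the investigation.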
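
Much of what lsof prints comes from the symlinks under /proc/<pid>/fd. The sketch below, assuming Linux procfs, reads those links and sorts them into rough kinds; classify_fd_target and open_fds are hypothetical helper names. It cannot tell a unix socket from a netlink socket, because both show up as socket:[inode]; lsof consults further kernel tables for that.

```python
import os

def classify_fd_target(target):
    """Classify the string that readlink() returns for a /proc/<pid>/fd entry."""
    if target.startswith("socket:["):
        return "socket"
    if target.startswith("pipe:["):
        return "pipe"
    if target.startswith("anon_inode:"):
        return "anon_inode"      # e.g. eventpoll, eventfd, timerfd
    if target.startswith("/dev/"):
        return "device"
    return "file"

def open_fds(pid):
    """Return {fd: (kind, target)}; needs permission to read /proc/<pid>/fd."""
    fd_dir = f"/proc/{pid}/fd"
    result = {}
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(f"{fd_dir}/{name}")
        except OSError:
            continue  # the fd was closed while we were listing
        result[int(name)] = (classify_fd_target(target), target)
    return result
```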

1.3. strace: tracing system calls at run time

$ sudo strace -p 41246
strace: Process 41246 attached
ppoll([{fd=11, events=POLLIN}, {fd=49, events=POLLIN}, {fd=10, events=POLLIN}, {fd=9, events=POLLIN}, {fd=12, events=POLLIN}, {fd=51, events=POLLIN}, {fd=36, events=POLLIN}, {fd=20, events=POLLIN}, {fd=7, events=POLLIN}, {fd=30, events=POLLIN}, {fd=47, events=POLLIN}], 11, {3, 310381239}, NULL, 0) = 1 ([{fd=30, revents=POLLIN}], left {3, 247386583})
getrusage(0x1 /* RUSAGE_??? */, {ru_utime={41, 121831}, ru_stime={45, 446739}, ...}) = 0
read(30, "\0", 512)                     = 1
recvfrom(11, 0x2409328, 264, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(10, 0xffffee67ea18, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
read(36, 0x24fbe36, 1518)               = -1 EAGAIN (Resource temporarily unavailable)
read(49, 0x24fbe36, 1518)               = -1 EAGAIN (Resource temporarily unavailable)
accept(51, 0xffffee68f7d0, 0xffffee68f7cc) = -1 EAGAIN (Resource temporarily unavailable)
accept(47, 0xffffee68f7d0, 0xffffee68f7cc) = -1 EAGAIN (Resource temporarily unavailable)
accept(9, 0xffffee68fac0, 0xffffee68fabc) = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(20, 0xffffee67ea68, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(12, 0xffffee67ea58, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(12, 0xffffee67ea58, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(12, 0xffffee67ea58, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
ppoll([{fd=11, events=POLLIN}, {fd=49, events=POLLIN}, {fd=10, events=POLLIN}, {fd=9, events=POLLIN}, {fd=12, events=POLLIN}, {fd=51, events=POLLIN}, {fd=36, events=POLLIN}, {fd=20, events=POLLIN}, {fd=7, events=POLLIN}, {fd=30, events=POLLIN}, {fd=47, events=POLLIN}], 11, {3, 247000000}, NULL, 0) = 1 ([{fd=30, revents=POLLIN}], left {2, 746806149})
getrusage(0x1 /* RUSAGE_??? */, {ru_utime={41, 122010}, ru_stime={45, 446739}, ...}) = 0
read(30, "\0", 512)                     = 1
recvfrom(11, 0x2409328, 264, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(10, 0xffffee67ea18, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
read(36, 0x24fbe36, 1518)               = -1 EAGAIN (Resource temporarily unavailable)
read(49, 0x24fbe36, 1518)               = -1 EAGAIN (Resource temporarily unavailable)
accept(51, 0xffffee68f7d0, 0xffffee68f7cc) = -1 EAGAIN (Resource temporarily unavailable)
accept(47, 0xffffee68f7d0, 0xffffee68f7cc) = -1 EAGAIN (Resource temporarily unavailable)
accept(9, 0xffffee68fac0, 0xffffee68fabc) = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(20, 0xffffee67ea68, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(12, 0xffffee67ea58, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(12, 0xffffee67ea58, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(12, 0xffffee67ea58, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
ppoll([{fd=11, events=POLLIN}, {fd=49, events=POLLIN}, {fd=10, events=POLLIN}, {fd=9, events=POLLIN}, {fd=12, events=POLLIN}, {fd=51, events=POLLIN}, {fd=36, events=POLLIN}, {fd=20, events=POLLIN}, {fd=7, events=POLLIN}, {fd=30, events=POLLIN}, {fd=47, events=POLLIN}], 11, {2, 746000000}, NULL, 0) = 1 ([{fd=30, revents=POLLIN}], left {2, 246780821})
getrusage(0x1 /* RUSAGE_??? */, {ru_utime={41, 122186}, ru_stime={45, 446739}, ...}) = 0
read(30, "\0", 512)                     = 1
recvfrom(11, 0x2409328, 264, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(10, 0xffffee67ea18, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
read(36, 0x24fbe36, 1518)               = -1 EAGAIN (Resource temporarily unavailable)
read(49, 0x24fbe36, 1518)               = -1 EAGAIN (Resource temporarily unavailable)
accept(51, 0xffffee68f7d0, 0xffffee68f7cc) = -1 EAGAIN (Resource temporarily unavailable)
accept(47, 0xffffee68f7d0, 0xffffee68f7cc) = -1 EAGAIN (Resource temporarily unavailable)
accept(9, 0xffffee68fac0, 0xffffee68fabc) = -1 EAGAIN (Resource temporarily unavailable)

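Notes:

  • ppoll is to poll what pselect is to select: besides a nanosecond-resolution timeout, it takes a signal mask that is installed atomically for the duration of the call. That does not make the call immune to signals; it lets the caller decide, without a race, which signals may interrupt the wait.
  • read, recvfrom and recvmsg are similar; the latter two are for sockets and take extra flags, and recvmsg also supports scatter/gather buffers and ancillary data.
  • What looked abnormal at first: the read/recv calls after ppoll get no data. On a closer look, ppoll reports only fd 30 as readable, and read(30, ...) does return 1 byte; the other calls are non-blocking attempts on the remaining fds, so EAGAIN is the expected result. Comparing each timeout with its "left" value also shows the thread sleeping in ppoll for about 0.5 s per iteration, so the traced thread is mostly idle. Note too that strace -p 41246 follows only the thread with that ID, not the busy thread 41320. Is this the cause of the 100% CPU? It does not look like it; the load should come from user-space packet I/O.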
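
The pattern in the trace, one descriptor reported ready followed by a sweep of non-blocking reads that mostly fail with EAGAIN, can be reproduced in a few lines. This is a minimal sketch using two pipes in place of the real OVS descriptors; it assumes a Unix-like system where select.poll is available, and poll_then_sweep is a made-up name.

```python
import os
import select

def poll_then_sweep():
    """Poll two pipes of which only one has data, then try to read both."""
    ready_r, ready_w = os.pipe()
    idle_r, idle_w = os.pipe()
    try:
        for fd in (ready_r, idle_r):
            os.set_blocking(fd, False)
        os.write(ready_w, b"\0")             # make exactly one fd readable

        poller = select.poll()
        poller.register(ready_r, select.POLLIN)
        poller.register(idle_r, select.POLLIN)
        reported = [fd for fd, _ in poller.poll(1000)]

        results = {}
        for name, fd in (("ready", ready_r), ("idle", idle_r)):
            try:
                results[name] = os.read(fd, 512)
            except BlockingIOError:          # errno EAGAIN: nothing to read right now
                results[name] = "EAGAIN"
        return reported == [ready_r], results
    finally:
        for fd in (ready_r, ready_w, idle_r, idle_w):
            os.close(fd)
```

Only the pipe that was written to is reported by poll, and the read on the other one fails with EAGAIN, just like the recvmsg/accept calls in the strace output.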

1.4. pstack: capturing the call stack

$ sudo pstack 41320
#0  0x000000000053e91c in netdev_dpdk_vhost_rxq_recv (rxq=<optimized out>, batch=<optimized out>) at lib/netdev-dpdk.c:1943
#1  0x0000000000491d98 in netdev_rxq_recv (rx=<optimized out>, batch=0xffff767fe2c0, batch@entry=0xffff767fe300) at lib/netdev.c:701
#2  0x000000000046dd0c in dp_netdev_process_rxq_port (pmd=pmd@entry=0xffff94180010, rxq=0x24f4620, port_no=4) at lib/dpif-netdev.c:3279
#3  0x000000000046e0a8 in pmd_thread_main (f_=0xffff94180010) at lib/dpif-netdev.c:4146
#4  0x00000000004e452c in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:348
#5  0x0000ffff99c47bb0 in start_thread () from /lib64/libpthread.so.0
#6  0x0000ffff99a6b4c0 in thread_start () from /lib64/libc.so.6

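Note: a single snapshot is not conclusive. All it shows is that the thread is in the PMD receive path (pmd_thread_main -> dp_netdev_process_rxq_port -> netdev_rxq_recv -> netdev_dpdk_vhost_rxq_recv).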

1.5. perf top: finding the hot spot

$ sudo perf top
Samples: 360K of event 'cycles:ppp', Event count (approx.): 42373494756
Overhead  Shared Object            Symbol
  40.02%  ovs-vswitchd             [.] dp_netdev_process_rxq_port
  25.65%  [vdso]                   [.] monotonic
   9.31%  ovs-vswitchd             [.] netdev_dpdk_vhost_rxq_recv
   6.33%  ovs-vswitchd             [.] pmd_thread_main
   4.65%  ovs-vswitchd             [.] netdev_rxq_recv
   3.47%  ovs-vswitchd             [.] time_timespec__
   2.21%  ovs-vswitchd             [.] time_usec
   1.67%  libpthread-2.17.so       [.] __pthread_once
   1.65%  libc-2.17.so             [.] __clock_gettime
   1.15%  [vdso]                   [.] __kernel_clock_gettime
# 9382 is the ID of this thread
# the process was probably restarted, so the ID changed to 9382
$ sudo perf top -p 9382
Samples: 193K of event 'cycles:ppp', Event count (approx.): 38607322931
Overhead  Shared Object        Symbol
  55.02%  ovs-vswitchd         [.] dp_netdev_process_rxq_port
  21.43%  librte_vhost.so.4.1  [.] rte_vhost_dequeue_burst
  10.78%  ovs-vswitchd         [.] netdev_dpdk_vhost_rxq_recv
   3.43%  ovs-vswitchd         [.] pmd_thread_main
   2.92%  librte_vhost.so.4.1  [.] get_device
   2.82%  [vdso]               [.] monotonic
   1.94%  ovs-vswitchd         [.] netdev_rxq_recv
   0.41%  ovs-vswitchd         [.] rte_vhost_dequeue_burst@plt
   0.31%  ovs-vswitchd         [.] time_timespec__
   0.26%  ovs-vswitchd         [.] time_usec
   0.21%  libc-2.17.so         [.] __clock_gettime
   0.12%  libpthread-2.17.so   [.] __pthread_once
   0.11%  [vdso]               [.] __kernel_clock_gettime
   0.03%  ovs-vswitchd         [.] clock_gettime@plt
   0.02%  ovs-vswitchd         [.] pthread_once@plt
   0.02%  [kernel]             [k] _raw_spin_unlock_irqrestore
   0.01%  libpthread-2.17.so   [.] pthread_mutex_unlock
   0.01%  [kernel]             [k] sys_futex
   0.01%  [kernel]             [k] futex_wake
   0.01%  libc-2.17.so         [.] _int_free
   0.01%  ovs-vswitchd         [.] single_threaded
   0.01%  libc-2.17.so         [.] __poll

Note: in both outputs above, dp_netdev_process_rxq_port ranks first. Its annotated assembly looks like this:

Percent│       sub    x2, x2, x3   
       │       add    x2, x2, x1    
       │       str    x2, [x0]  
       │ cc:   ldr    x2, [x29,#88]     
       │       str    x0, [x19,#520]
       │       add    x0, x20, #0x20
       │       sub    x1, x1, x2 
       │     → bl     non_atomic_ullong_add
       │       ldr    w0, [x19,#312]
       │     ↓ cbnz   w0, 1d0
  0.19 │ e8:   str    xzr, [x19,#248]  
       │       mov    w0, w23  
       │       ldp    x19, x20, [sp,#16]
  0.72 │       ldp    x21, x22, [sp,#32]   
  0.08 │       ldp    x23, x24, [sp,#48]  
       │       ldr    x25, [sp,#64]     
  0.77 │       ldp    x29, x30, [sp],#384   
       │     ← ret      
  0.09 │108:   ldr    x0, [x23,#16]
       │       cmp    x0, x22
       │     ↓ b.ne   1b0       
 39.26 │       mrs    x2, cntvct_el0  
  0.08 │       ldr    x0, [x29,#104]   
  3.68 │       str    x2, [x19,#512]     
       │     ↓ cbz    x0, 134       
       │       ldp    x1, x3, [x0]   
       │       sub    x1, x1, x3    
       │       add    x1, x1, x2 
       │       str    x1, [x0]    
  0.14 │134:   str    x0, [x19,#520]  
  7.96 │       cmp    w21, #0x5f
       │       mov    w23, #0x0                       // #0  
       │       ccmp   w21, #0xb, #0x4, ne   
 36.43 │     ↑ b.eq   e8     
       │       adrp   x22, 5dc000 <rl.7549+0x20>
       │       add    x22, x22, #0x138
       │       ldr    w0, [x22,#36]
       │       cmp    w0, #0x1
       │     ↑ b.ls   e8    
       │       ldr    x0, [x20,#8]  
       │     → bl     netdev_rxq_get_name   
       │       mov    x20, x0   
       │       mov    w0, w21    
       │     → bl     ovs_strerror 
       │       mov    x4, x20    
       │       mov    x5, x0                  
       │       add    x2, x22, #0x160         
       │       mov    x0, x22            
       │       mov    w1, #0x2                        // #2

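Note that two instructions account for most of the samples: mrs x2, cntvct_el0 (39.26%) and b.eq e8 (36.43%). According to the ARM architecture manual, cntvct_el0 is the 64-bit, read-only Virtual Timer Count register. Sampled profiles can attribute cost to an instruction next to the one that actually stalled, so these percentages are better read as "around here" than as exact per-instruction costs.
For reference, see https://patchwork.kernel.org/patch/9290801/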

$ sudo perf stat -e cycles,stalled-cycles-frontend,stalled-cycles-backend,branch-misses,cache-references,cache-misses -p 9382
^C
 Performance counter stats for process id '9382':
     9,670,958,313      cycles
       324,264,597      stalled-cycles-frontend   #    3.35% frontend cycles idle
     6,355,584,509      stalled-cycles-backend    #   65.72% backend cycles idle
         5,210,609      branch-misses
     4,424,834,814      cache-references
        48,121,592      cache-misses              #    1.088 % of all cache refs
       3.867804140 seconds time elapsed
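
The percentages in the perf stat output are simple ratios of the raw counters, and dividing cycles by elapsed time gives one more useful figure. The sketch below recomputes them from the numbers above; perf_ratios is a made-up helper name, and reading the average clock rate as core utilization is an assumption, since the CPU model and its clock are not recorded in this note.

```python
def perf_ratios(cycles, stalled_frontend, stalled_backend,
                cache_refs, cache_misses, elapsed_s):
    """Derive the percentages perf prints, plus the average cycle rate."""
    return {
        "frontend_idle_pct": 100.0 * stalled_frontend / cycles,
        "backend_idle_pct": 100.0 * stalled_backend / cycles,
        "cache_miss_pct": 100.0 * cache_misses / cache_refs,
        "avg_ghz": cycles / elapsed_s / 1e9,
    }

# Counter values copied from the perf stat run above.
ratios = perf_ratios(9_670_958_313, 324_264_597, 6_355_584_509,
                     4_424_834_814, 48_121_592, 3.867804140)
for name, value in ratios.items():
    print(f"{name}: {value:.3f}")
```

The cycle rate comes out at about 2.5 billion cycles per second of wall time, which is consistent with roughly one core running continuously if the core clock is around 2.5 GHz.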

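stalled-cycles-backend is high here, which may point at the data cache, although the measured cache-miss ratio is only about 1.1%. A quick refresher on data caches:

  • A cache has two write policies:
    • write through: a write updates the cache and the next level of memory at the same time.
    • write back: the next level is not updated immediately; the block is written out when it is evicted. This is more complex but performs better.
  • Writes can miss in the cache too, in which case a cache line has to be allocated. A common question: on a write miss, once the cache line is allocated, does the data still have to be read into that line first? Answer: in general, yes. The cache operates on whole cache lines, for example 64 bytes, and a typical store updates only part of a line, so the 64 bytes are read first, the relevant part is updated, and the whole line is written back later.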
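
The difference between the two write policies can be made concrete with a toy model. This is only a sketch of a direct-mapped cache with 64-byte lines that allocates on a write miss under both policies (real write-through caches are often no-write-allocate); the class ToyCache and its counters are invented for illustration and do not model any real CPU.

```python
LINE = 64  # bytes per cache line

class ToyCache:
    """Direct-mapped, write-allocate cache that only counts traffic."""

    def __init__(self, n_lines, write_back):
        self.n_lines = n_lines
        self.write_back = write_back
        self.slots = {}       # slot index -> (line number, dirty flag)
        self.fills = 0        # lines read from the next level
        self.mem_writes = 0   # writes sent to the next level

    def write(self, addr):
        line = addr // LINE
        slot = line % self.n_lines
        held = self.slots.get(slot)
        if held is None or held[0] != line:
            # Write miss: a dirty victim goes out first, then the new line
            # is read in, because the store updates only part of it.
            if held is not None and held[1]:
                self.mem_writes += 1
            self.fills += 1
        if self.write_back:
            self.slots[slot] = (line, True)    # defer the memory update
        else:
            self.slots[slot] = (line, False)
            self.mem_writes += 1               # write through immediately

    def flush(self):
        """Write out every dirty line, as eviction eventually would."""
        for slot, (line, dirty) in list(self.slots.items()):
            if dirty:
                self.mem_writes += 1
                self.slots[slot] = (line, False)
```

Eight 8-byte stores to the same line cost one fill in both modes, but eight next-level writes with write through against a single one (at flush or eviction) with write back.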

1.6. The situation has changed

All ovs-vswitchd threads are now at 100% CPU; yesterday it was only one thread...

  • strace shows mostly futex system calls
  • ifconfig shows no traffic either; ip -s addr can be used for the same check
    $ sudo pstack 27680
    Thread 1 (process 27680):
    #0  0x0000ffffa29a048c in monotonic () at arch/arm64/kernel/vdso/gettimeofday.S:241
    #1  0x0000ffffa15eeb28 in clock_gettime () from /lib64/libc.so.6
    #2  0x000000000050f99c in xclock_gettime (ts=0xffff7969e3c0, id=<optimized out>) at lib/timeval.c:503
    #3  time_timespec__ (c=0x690e70 <monotonic_clock>, ts=0xffff7969e3c0) at lib/timeval.c:155
    #4  0x000000000050fbf4 in time_usec__ (c=0x690e70 <monotonic_clock>) at lib/timeval.c:246
    #5  time_usec () at lib/timeval.c:247
    #6  0x000000000046e1dc in pmd_thread_ctx_time_update (pmd=0xffff796a0010) at lib/dpif-netdev.c:777
    #7  pmd_thread_main (f_=0xffff796a0010) at lib/dpif-netdev.c:4156
    #8  0x00000000004e452c in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:348
    #9  0x0000ffffa17b7bb0 in start_thread () from /lib64/libpthread.so.0
    #10 0x0000ffffa15db4c0 in thread_start () from /lib64/libc.so.6
    bai@CentOS-21 ~/tmp
    $ sudo pstack 27680
    Thread 1 (process 27680):
    #0  0x0000000000491d90 in netdev_rxq_recv (rx=0xfff7ed43a100, batch=0xffff7969e2c0, batch@entry=0xffff7969e300) at lib/netdev.c:701
    #1  0x000000000046dd0c in dp_netdev_process_rxq_port (pmd=pmd@entry=0xffff796a0010, rxq=0x1ea85bd0, port_no=4) at lib/dpif-netdev.c:3279
    #2  0x000000000046e0a8 in pmd_thread_main (f_=0xffff796a0010) at lib/dpif-netdev.c:4146
    #3  0x00000000004e452c in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:348
    #4  0x0000ffffa17b7bb0 in start_thread () from /lib64/libpthread.so.0
    #5  0x0000ffffa15db4c0 in thread_start () from /lib64/libc.so.6
    bai@CentOS-21 ~/tmp
    $ sudo pstack 27680
    Thread 1 (process 27680):
    #0  0x0000ffffa1ad5420 in get_device () from /usr/src/dpdk-stable-17.11.3/arm64-armv8a-linuxapp-gcc/lib/librte_vhost.so.4.1
    #1  0x0000ffffa1adab60 in rte_vhost_dequeue_burst () from /usr/src/dpdk-stable-17.11.3/arm64-armv8a-linuxapp-gcc/lib/librte_vhost.so.4.1
    #2  0x000000000053e740 in netdev_dpdk_vhost_rxq_recv (rxq=<optimized out>, batch=0xffff7969e2c0) at lib/netdev-dpdk.c:1918
    #3  0x0000000000491d98 in netdev_rxq_recv (rx=<optimized out>, batch=0xffff7969e2c0, batch@entry=0xffff7969e300) at lib/netdev.c:701
    #4  0x000000000046dd0c in dp_netdev_process_rxq_port (pmd=pmd@entry=0xffff796a0010, rxq=0x1eb3c0e0, port_no=5) at lib/dpif-netdev.c:3279
    #5  0x000000000046e0a8 in pmd_thread_main (f_=0xffff796a0010) at lib/dpif-netdev.c:4146
    #6  0x00000000004e452c in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:348
    #7  0x0000ffffa17b7bb0 in start_thread () from /lib64/libpthread.so.0
    #8  0x0000ffffa15db4c0 in thread_start () from /lib64/libc.so.6
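
Taking several pstack snapshots and counting which functions keep showing up is a crude sampling profiler. The sketch below does the counting; it assumes the gdb-style frame format shown above, and the name inclusive_counts is made up for this note. A function is counted once per snapshot if it appears at any depth.

```python
import re
from collections import Counter

# Matches "#3  0x... in func (" as well as inlined frames like "#5  func (".
FRAME = re.compile(r"^\s*#\d+\s+(?:0x[0-9a-f]+ in )?(\w+) \(")

def inclusive_counts(samples):
    """Count, per function, how many stack snapshots contain it."""
    counts = Counter()
    for sample in samples:
        seen = set()
        for line in sample.splitlines():
            m = FRAME.match(line)
            if m:
                seen.add(m.group(1))
        counts.update(seen)   # each function counts once per snapshot
    return counts
```

Fed with the three snapshots above, pmd_thread_main is present in all of them, while the innermost frame moves between time keeping and the vhost receive path. That is what a thread looks like when it spins in a polling loop.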
    

2. Postscript

Later, after reading up on OVS-DPDK, I learned that DPDK PMD threads run in poll mode, so they are expected to use 100% CPU.
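
Why a poll-mode thread shows 100% CPU even with no traffic can be seen in a toy version of such a loop. This is only a sketch of the idea, not the OVS or DPDK code: pmd_loop is a made-up name, Python deques stand in for receive queues, and the burst size of 32 is an illustrative default.

```python
from collections import deque

def pmd_loop(rx_queues, iterations, burst=32):
    """Poll every queue on every iteration without ever sleeping.

    Returns (packets_received, empty_polls).
    """
    packets = empty_polls = 0
    for _ in range(iterations):
        for q in rx_queues:
            n = min(burst, len(q))
            if n == 0:
                empty_polls += 1   # nothing arrived, but the CPU time is spent anyway
                continue
            for _ in range(n):
                q.popleft()        # "process" one packet
            packets += n
    return packets, empty_polls
```

With empty queues every poll comes back empty, yet the loop never blocks. Latency is traded for CPU time, and high CPU usage by itself says nothing about load.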
