In the Linux kernel, the following vulnerability has been resolved:

net/mlx5: Reload only IB representors upon lag disable/enable

On lag disable, the bond IB device, along with all of its representors, is
destroyed, and then the slaves' representors get reloaded. If a slave IB
representor fails to load, the eswitch error flow unloads all
representors, including the ethernet representors, whose netdevs get
detached and removed from the lag bond. This flow is incorrect, as the lag
driver is not responsible for loading or unloading ethernet representors.

Furthermore, the flow described above begins by holding the lag lock to
prevent bond changes during the disable flow. However, when it reaches the
detachment of the ethernet representors from lag, the lag lock is required again,
triggering the following deadlock:

Call trace:
  __switch_to+0xf4/0x148
  __schedule+0x2c8/0x7d0
  schedule+0x50/0xe0
  schedule_preempt_disabled+0x18/0x28
  __mutex_lock.isra.13+0x2b8/0x570
  __mutex_lock_slowpath+0x1c/0x28
  mutex_lock+0x4c/0x68
  mlx5_lag_remove_netdev+0x3c/0x1a0 [mlx5_core]
  mlx5e_uplink_rep_disable+0x70/0xa0 [mlx5_core]
  mlx5e_detach_netdev+0x6c/0xb0 [mlx5_core]
  mlx5e_netdev_change_profile+0x44/0x138 [mlx5_core]
  mlx5e_netdev_attach_nic_profile+0x28/0x38 [mlx5_core]
  mlx5e_vport_rep_unload+0x184/0x1b8 [mlx5_core]
  mlx5_esw_offloads_rep_load+0xd8/0xe0 [mlx5_core]
  mlx5_eswitch_reload_reps+0x74/0xd0 [mlx5_core]
  mlx5_disable_lag+0x130/0x138 [mlx5_core]
  mlx5_lag_disable_change+0x6c/0x70 [mlx5_core] // hold ldev->lock
  mlx5_devlink_eswitch_mode_set+0xc0/0x410 [mlx5_core]
  devlink_nl_cmd_eswitch_set_doit+0xdc/0x180
  genl_family_rcv_msg_doit.isra.17+0xe8/0x138
  genl_rcv_msg+0xe4/0x220
  netlink_rcv_skb+0x44/0x108
  genl_rcv+0x40/0x58
  netlink_unicast+0x198/0x268
  netlink_sendmsg+0x1d4/0x418
  sock_sendmsg+0x54/0x60
  __sys_sendto+0xf4/0x120
  __arm64_sys_sendto+0x30/0x40
  el0_svc_common+0x8c/0x120
  do_el0_svc+0x30/0xa0
  el0_svc+0x20/0x30
  el0_sync_handler+0x90/0xb8
  el0_sync+0x160/0x180

Thus,
upon lag enable/disable, load and unload only the IB representors of the
slaves, preventing the deadlock mentioned above. While at it, refactor the
mlx5_esw_offloads_rep_load() function to have a static helper for
its internal logic, in symmetry with the representor unload design.
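
The locking problem described above can be illustrated with a small user-space model. This is only a sketch, not the mlx5 driver code: the `Lag` class and the functions `reload_reps` and `disable_lag` are hypothetical names that loosely mirror the roles of `ldev->lock`, `mlx5_eswitch_reload_reps()` and `mlx5_disable_lag()`. A non-reentrant lock stands in for the kernel mutex, and a short acquire timeout is assumed so the model reports the deadlock instead of hanging.

```python
import threading


class Lag:
    """Toy model of the lag device; `lock` plays the role of ldev->lock."""

    def __init__(self):
        self.lock = threading.Lock()  # non-reentrant, like a kernel mutex
        self.log = []

    def remove_netdev(self):
        # Stand-in for the step that removes a netdev from the lag bond.
        # It needs the lag lock; the timeout only exists so that this
        # sketch can report the deadlock rather than block forever.
        if not self.lock.acquire(timeout=0.1):
            raise RuntimeError("deadlock: lag lock is already held")
        try:
            self.log.append("netdev removed from lag")
        finally:
            self.lock.release()


def reload_reps(lag, rep_types, ib_load_ok):
    """Reload representors; on IB load failure, unload every type in scope."""
    if ib_load_ok:
        lag.log.append("loaded ib rep")
        return True
    # Error flow: unload all representor types this caller is handling.
    for rep_type in rep_types:
        if rep_type == "eth":
            lag.remove_netdev()  # detaching an ethernet rep needs the lag lock
        lag.log.append(f"unloaded {rep_type} rep")
    return False


def disable_lag(lag, rep_types, ib_load_ok):
    """Lag disable flow: the lag lock is held for the whole operation."""
    with lag.lock:
        return reload_reps(lag, rep_types, ib_load_ok)
```

In this model, `disable_lag(Lag(), ["eth", "ib"], ib_load_ok=False)` raises the deadlock error because the error flow re-enters the lag lock through the ethernet representor path, while `disable_lag(Lag(), ["ib"], ib_load_ok=False)` completes: restricting the reload to IB representors keeps the error flow away from the code that takes the lock a second time.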

OS | OS Version | Architecture | Package | Package Version | Filename |
---|---|---|---|---|---|
ubuntu | 22.04 | noarch | linux | < any | UNKNOWN |
ubuntu | 24.04 | noarch | linux | < any | UNKNOWN |
ubuntu | 22.04 | noarch | linux-aws | < any | UNKNOWN |
ubuntu | 24.04 | noarch | linux-aws | < any | UNKNOWN |
ubuntu | 20.04 | noarch | linux-aws-5.15 | < any | UNKNOWN |
ubuntu | 22.04 | noarch | linux-aws-6.5 | < any | UNKNOWN |
ubuntu | 22.04 | noarch | linux-azure | < any | UNKNOWN |
ubuntu | 24.04 | noarch | linux-azure | < any | UNKNOWN |
ubuntu | 20.04 | noarch | linux-azure-5.15 | < any | UNKNOWN |
ubuntu | 22.04 | noarch | linux-azure-6.5 | < any | UNKNOWN |

git.kernel.org/linus/0f06228d4a2dcc1fca5b3ddb0eefa09c05b102c4 (6.10-rc1)
git.kernel.org/stable/c/0f06228d4a2dcc1fca5b3ddb0eefa09c05b102c4
git.kernel.org/stable/c/0f320f28f54b1b269a755be2e3fb3695e0b80b07
git.kernel.org/stable/c/e93fc8d959e56092e2eca1e5511c2d2f0ad6807a
git.kernel.org/stable/c/f03c714a0fdd1f93101a929d0e727c28a66383fc
launchpad.net/bugs/cve/CVE-2024-38557
nvd.nist.gov/vuln/detail/CVE-2024-38557
security-tracker.debian.org/tracker/CVE-2024-38557
www.cve.org/CVERecord?id=CVE-2024-38557