{"id":"CVE-2024-35931","summary":"drm/amdgpu: Skip do PCI error slot reset during RAS recovery","details":"In the Linux kernel, the following vulnerability has been resolved:\n\ndrm/amdgpu: Skip do PCI error slot reset during RAS recovery\n\nWhy:\n    The PCI error slot reset may be triggered after injecting a UE into the UMC multiple times; this\n    caused a system hang.\n    [  557.371857] amdgpu 0000:af:00.0: amdgpu: GPU reset succeeded, trying to resume\n    [  557.373718] [drm] PCIE GART of 512M enabled.\n    [  557.373722] [drm] PTB located at 0x0000031FED700000\n    [  557.373788] [drm] VRAM is lost due to GPU reset!\n    [  557.373789] [drm] PSP is resuming...\n    [  557.547012] mlx5_core 0000:55:00.0: mlx5_pci_err_detected Device state = 1 pci_status: 0. Exit, result = 3, need reset\n    [  557.547067] [drm] PCI error: detected callback, state(1)!!\n    [  557.547069] [drm] No support for XGMI hive yet...\n    [  557.548125] mlx5_core 0000:55:00.0: mlx5_pci_slot_reset Device state = 1 pci_status: 0. Enter\n    [  557.607763] mlx5_core 0000:55:00.0: wait vital counter value 0x16b5b after 1 iterations\n    [  557.607777] mlx5_core 0000:55:00.0: mlx5_pci_slot_reset Device state = 1 pci_status: 1. 
Exit, err = 0, result = 5, recovered\n    [  557.610492] [drm] PCI error: slot reset callback!!\n    ...\n    [  560.689382] amdgpu 0000:3f:00.0: amdgpu: GPU reset(2) succeeded!\n    [  560.689546] amdgpu 0000:5a:00.0: amdgpu: GPU reset(2) succeeded!\n    [  560.689562] general protection fault, probably for non-canonical address 0x5f080b54534f611f: 0000 [#1] SMP NOPTI\n    [  560.701008] CPU: 16 PID: 2361 Comm: kworker/u448:9 Tainted: G           OE     5.15.0-91-generic #101-Ubuntu\n    [  560.712057] Hardware name: Microsoft C278A/C278A, BIOS C2789.5.BS.1C11.AG.1 11/08/2023\n    [  560.720959] Workqueue: amdgpu-reset-hive amdgpu_ras_do_recovery [amdgpu]\n    [  560.728887] RIP: 0010:amdgpu_device_gpu_recover.cold+0xbf1/0xcf5 [amdgpu]\n    [  560.736891] Code: ff 41 89 c6 e9 1b ff ff ff 44 0f b6 45 b0 e9 4f ff ff ff be 01 00 00 00 4c 89 e7 e8 76 c9 8b ff 44 0f b6 45 b0 e9 3c fd ff ff \u003c48\u003e 83 ba 18 02 00 00 00 0f 84 6a f8 ff ff 48 8d 7a 78 be 01 00 00\n    [  560.757967] RSP: 0018:ffa0000032e53d80 EFLAGS: 00010202\n    [  560.763848] RAX: ffa00000001dfd10 RBX: ffa0000000197090 RCX: ffa0000032e53db0\n    [  560.771856] RDX: 5f080b54534f5f07 RSI: 0000000000000000 RDI: ff11000128100010\n    [  560.779867] RBP: ffa0000032e53df0 R08: 0000000000000000 R09: ffffffffffe77f08\n    [  560.787879] R10: 0000000000ffff0a R11: 0000000000000001 R12: 0000000000000000\n    [  560.795889] R13: ffa0000032e53e00 R14: 0000000000000000 R15: 0000000000000000\n    [  560.803889] FS:  0000000000000000(0000) GS:ff11007e7e800000(0000) knlGS:0000000000000000\n    [  560.812973] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033\n    [  560.819422] CR2: 000055a04c118e68 CR3: 0000000007410005 CR4: 0000000000771ee0\n    [  560.827433] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000\n    [  560.835433] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400\n    [  560.843444] PKRU: 55555554\n    [  560.846480] Call Trace:\n    [  560.849225]  
\u003cTASK\u003e\n    [  560.851580]  ? show_trace_log_lvl+0x1d6/0x2ea\n    [  560.856488]  ? show_trace_log_lvl+0x1d6/0x2ea\n    [  560.861379]  ? amdgpu_ras_do_recovery+0x1b2/0x210 [amdgpu]\n    [  560.867778]  ? show_regs.part.0+0x23/0x29\n    [  560.872293]  ? __die_body.cold+0x8/0xd\n    [  560.876502]  ? die_addr+0x3e/0x60\n    [  560.880238]  ? exc_general_protection+0x1c5/0x410\n    [  560.885532]  ? asm_exc_general_protection+0x27/0x30\n    [  560.891025]  ? amdgpu_device_gpu_recover.cold+0xbf1/0xcf5 [amdgpu]\n    [  560.898323]  amdgpu_ras_do_recovery+0x1b2/0x210 [amdgpu]\n    [  560.904520]  process_one_work+0x228/0x3d0\nHow:\n    In RAS recovery, a mode-1 reset is issued from RAS fatal error handling, and all the nodes in a hive\n    are expected to be reset. There is no need to issue another mode-1 reset during this procedure.","modified":"2026-04-02T11:52:42.685144Z","published":"2024-05-19T10:10:39.706Z","related":["SUSE-SU-2024:2135-1","SUSE-SU-2024:2203-1","SUSE-SU-2024:2973-1","SUSE-SU-2025:20008-1","SUSE-SU-2025:20028-1"],"database_specific":{"osv_generated_from":"https://github.com/CVEProject/cvelistV5/tree/main/cves/2024/35xxx/CVE-2024-35931.json","cna_assigner":"Linux"},"references":[{"type":"WEB","url":"https://git.kernel.org/stable/c/395ca1031acf89d8ecb26127c544a71688d96f35"},{"type":"WEB","url":"https://git.kernel.org/stable/c/601429cca96b4af3be44172c3b64e4228515dbe1"},{"type":"ADVISORY","url":"https://github.com/CVEProject/cvelistV5/tree/main/cves/2024/35xxx/CVE-2024-35931.json"},{"type":"ADVISORY","url":"https://nvd.nist.gov/vuln/detail/CVE-2024-35931"},{"type":"PACKAGE","url":"https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git"}],"affected":[{"ranges":[{"type":"GIT","repo":"https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git","events":[{"introduced":"d38ceaf99ed015f2a0b9af3499791bd3a3daae21"},{"fixed":"395ca1031acf89d8ecb26127c544a71688d96f35"},{"fixed":"601429cca96b4af3be44172c3b64e4228515dbe1"}]}],"database_specific":{"source":"htt
ps://storage.googleapis.com/cve-osv-conversion/osv-output/CVE-2024-35931.json"}}],"schema_version":"1.7.5"}