{"id":"PYSEC-2026-145","details":"vLLM is an inference and serving engine for large language models (LLMs). From 0.18.0 to before 0.20.0, the extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (repetition_penalty, frequency_penalty, or presence_penalty). A single request with a penalty parameter (e.g., \"repetition_penalty\": 1.1) is sufficient to crash the server. This vulnerability is fixed in 0.20.0.","aliases":["CVE-2026-44223","GHSA-83vm-p52w-f9pw"],"modified":"2026-05-20T09:19:21.596358Z","published":"2026-05-12T20:16:43.293Z","references":[{"type":"ADVISORY","url":"https://github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw"},{"type":"FIX","url":"https://github.com/vllm-project/vllm/pull/38610"}],"affected":[{"package":{"name":"vllm","ecosystem":"PyPI","purl":"pkg:pypi/vllm"},"ranges":[{"type":"ECOSYSTEM","events":[{"introduced":"0.18.0"},{"fixed":"0.20.0"}]}],"versions":["0.18.0","0.18.1","0.19.0","0.19.1"],"database_specific":{"source":"https://github.com/pypa/advisory-database/blob/main/vulns/vllm/PYSEC-2026-145.yaml"}}],"schema_version":"1.7.5","severity":[{"type":"CVSS_V3","score":"CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"}]}