vllm.env_override
_update_scheduler_patched
(Re)initializes the scheduler member. When initializing the scheduler, no CUBIN files should be generated (to avoid biasing any benchmarks and pessimizing fusion decisions).
Source code in vllm/env_override.py
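As a rough illustration of the pattern described above, the following sketch rebuilds a scheduler member while a CUBIN-related flag is temporarily switched off, and installs the replacement method by monkey patching. It is not the code in vllm/env_override.py: `CONFIG`, `patch_config`, `Scheduler`, `Graph`, and `_update_scheduler_sketch` are hypothetical stand-ins invented for this example.

```python
from contextlib import contextmanager

# Hypothetical stand-in for a compiler config; not vLLM's or PyTorch's real config.
CONFIG = {"store_cubin": True}


@contextmanager
def patch_config(key, value):
    """Temporarily override CONFIG[key], restoring the old value on exit."""
    old = CONFIG[key]
    CONFIG[key] = value
    try:
        yield
    finally:
        CONFIG[key] = old


class Scheduler:
    """Hypothetical scheduler that records the flag value seen at construction."""

    def __init__(self, operations):
        self.operations = list(operations)
        self.saw_store_cubin = CONFIG["store_cubin"]


class Graph:
    """Hypothetical owner of a `scheduler` member."""

    def __init__(self, operations):
        self.operations = operations
        self.scheduler = None

    def _update_scheduler(self):
        # Original behaviour: build the scheduler with whatever flags are set.
        self.scheduler = Scheduler(self.operations)


def _update_scheduler_sketch(self):
    # (Re)initialize the scheduler with CUBIN output disabled, then restore the flag.
    with patch_config("store_cubin", False):
        self.scheduler = Scheduler(self.operations)


# Install the replacement on the class (monkey patch).
Graph._update_scheduler = _update_scheduler_sketch

graph = Graph(["op_a", "op_b"])
graph._update_scheduler()
```

Restoring the flag in a `finally` clause keeps the override scoped to scheduler construction, so later compilation steps still see the original setting.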
memory_plan_reuse_patched
Source code in vllm/env_override.py
should_partition_patched
Return True if the inductor graph should be partitioned at this node.
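To make the idea of a per-node partition predicate concrete, here is a minimal sketch of how such a predicate could drive graph splitting. The criteria used (device and an op denylist) are assumptions chosen for illustration, not the conditions the real function checks; `Node`, `PARTITION_OPS`, `should_partition_sketch`, and `split_graph` are all hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical graph node; the real inductor node type is richer."""
    name: str
    device: str
    op: str


# Hypothetical set of ops that must run outside a fused partition.
PARTITION_OPS = {"custom_attention"}


def should_partition_sketch(node: Node) -> bool:
    """Return True if the graph should be partitioned at this node (assumed rules)."""
    if node.device != "cuda":
        return True
    return node.op in PARTITION_OPS


def split_graph(nodes):
    """Group consecutive nodes; each partitioning node gets a group of its own."""
    partitions, current = [], []
    for node in nodes:
        if should_partition_sketch(node):
            if current:
                partitions.append(current)
                current = []
            partitions.append([node])
        else:
            current.append(node)
    if current:
        partitions.append(current)
    return partitions


nodes = [
    Node("a", "cuda", "matmul"),
    Node("b", "cuda", "relu"),
    Node("c", "cuda", "custom_attention"),
    Node("d", "cuda", "add"),
]
parts = [[n.name for n in p] for p in split_graph(nodes)]
```

With these inputs the graph is cut around node `c`, leaving `a` and `b` together in one partition and `d` in another.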