vllm.lora.ops.triton_ops.utils ¶
_LORA_A_PTR_DICT module-attribute ¶
_LORA_B_PTR_DICT module-attribute ¶
_get_lora_a_ptr ¶
_LORA_A_PTR_DICT collects the required information during profile_run. After that, it remains constant, and all subsequent usage goes through this lookup table (LUT). Refer to: https://github.com/triton-lang/triton/blob/release/3.1.x/python/tutorials/08-grouped-gemm.py
Source code in vllm/lora/ops/triton_ops/utils.py
_get_lora_b_ptr ¶
_LORA_B_PTR_DICT collects the required information during profile_run. After that, it remains constant, and all subsequent usage goes through this lookup table (LUT). Refer to: https://github.com/triton-lang/triton/blob/release/3.1.x/python/tutorials/08-grouped-gemm.py
Source code in vllm/lora/ops/triton_ops/utils.py
get_lora_op_configs cached ¶
get_lora_op_configs(
op_type: str,
max_loras: int,
batch: int,
hidden_size: int,
rank: int,
num_slices: int,
add_inputs: bool | None = None,
) -> dict[str, int | None]