vllm.attention.layers.chunked_local_attention ¶
ChunkedLocalAttention ¶
Bases: Attention
Source code in vllm/attention/layers/chunked_local_attention.py
__init__ ¶
```python
__init__(
    num_heads: int,
    head_size: int,
    scale: float,
    attention_chunk_size: int,
    num_kv_heads: int | None = None,
    alibi_slopes: list[float] | None = None,
    cache_config: CacheConfig | None = None,
    quant_config: QuantizationConfig | None = None,
    kv_sharing_target_layer_name: str | None = None,
    prefix: str = "",
)
```
Source code in vllm/attention/layers/chunked_local_attention.py
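A minimal construction sketch that follows directly from the signature above. All numeric values are illustrative assumptions rather than vLLM defaults, and the `prefix` is a hypothetical layer path:

```python
from vllm.attention.layers.chunked_local_attention import ChunkedLocalAttention

# Illustrative values only: 32 query heads of size 128, with 8 KV heads
# (grouped-query attention) and an 8192-token local-attention chunk.
attn = ChunkedLocalAttention(
    num_heads=32,
    head_size=128,
    scale=128 ** -0.5,  # the conventional 1/sqrt(head_size) scaling
    attention_chunk_size=8192,
    num_kv_heads=8,
    prefix="model.layers.0.self_attn.attn",  # hypothetical layer path
)
```

Parameters left at their defaults (`cache_config`, `quant_config`, `alibi_slopes`, `kv_sharing_target_layer_name`) are typically filled in by the model loader rather than passed by hand.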
create_chunked_local_attention_backend cached ¶
```python
create_chunked_local_attention_backend(
    underlying_attn_backend: AttentionBackend,
    attention_chunk_size: int,
    block_size: int,
) -> type[AttentionBackend]
```
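The `cached` marker indicates the factory memoizes on its arguments, and the return annotation shows it produces a backend class rather than an instance. Below is a hypothetical sketch of that shape using `functools.lru_cache`, assuming the underlying backend argument is a backend class that can be subclassed; the real vLLM implementation wraps considerably more (metadata builders, chunked-prefill bookkeeping) than shown here:

```python
from functools import lru_cache


class AttentionBackend:
    """Hypothetical stand-in for vLLM's AttentionBackend base class."""


@lru_cache
def create_chunked_local_attention_backend(
    underlying_attn_backend: type[AttentionBackend],
    attention_chunk_size: int,
    block_size: int,
) -> type[AttentionBackend]:
    # One wrapper class per (backend, chunk_size, block_size) combination;
    # lru_cache guarantees repeated calls return the identical type.
    class ChunkedLocalAttentionBackend(underlying_attn_backend):
        chunk_size = attention_chunk_size
        page_size = block_size

    ChunkedLocalAttentionBackend.__qualname__ = (
        f"ChunkedLocal{underlying_attn_backend.__name__}"
    )
    return ChunkedLocalAttentionBackend


# Repeated calls with the same arguments hit the cache and yield one type:
a = create_chunked_local_attention_backend(AttentionBackend, 8192, 16)
b = create_chunked_local_attention_backend(AttentionBackend, 8192, 16)
assert a is b
```

Caching the class (rather than building a fresh one per call) matters because backend types are compared by identity elsewhere; memoization keeps every attention layer with the same configuration on the same wrapper type.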