kv cache

[ICML2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

Liu, Zirui, et al. "Kivi: A tuning-free asymmetric 2bit quantization for kv cache." ICML 2024 (Poster). https://icml.cc/virtual/2024/poster/34318

Abstract: Efficiently serving large language models (LLMs) requires batching many requests together to reduce the cost per request. Yet, the key-value (KV) cache, which stores atte..