llm Quantization

[ICLR 2026] When Reasoning Meets Compression: Understanding the Effects of LLMs Compression on Large Reasoning Models
https://arxiv.org/abs/2504.02010
Compression methods, including quantization, distillation, and pruning, improve the computational efficiency of large reasoning models (LRMs). However, existing studies either fail to sufficiently compare all three compression methods on LRMs or lack …

[ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
Liu, Zirui, et al. "KIVI: A tuning-free asymmetric 2bit quantization for KV cache." ICML 2024 (Poster).
https://icml.cc/virtual/2024/poster/34318
Abstract: Efficiently serving large language models (LLMs) requires batching many requests together to reduce the cost per request. Yet, the key-value (KV) cache, which stores attention …
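
The "asymmetric" in KIVI's title refers to treating the two halves of the KV cache differently: key outliers cluster per channel while value outliers cluster per token, so keys are quantized along the channel axis and values along the token axis. Below is a minimal, illustrative PyTorch sketch of asymmetric 2-bit min-max quantization in that spirit. The tensor layout, function names, and ungrouped statistics are assumptions for clarity, not the authors' implementation (KIVI additionally quantizes in small groups and keeps a recent full-precision residual window).

```python
# Illustrative sketch only -- not the KIVI authors' code.
import torch

def quantize_2bit(x: torch.Tensor, dim: int):
    """Asymmetric 2-bit min-max quantization, with statistics reduced along `dim`.

    Returns integer codes plus the scale/zero-point needed to dequantize.
    """
    xmin = x.amin(dim=dim, keepdim=True)
    xmax = x.amax(dim=dim, keepdim=True)
    scale = (xmax - xmin).clamp(min=1e-8) / 3.0       # 2 bits -> 4 levels (0..3)
    q = ((x - xmin) / scale).round().clamp(0, 3).to(torch.uint8)
    return q, scale, xmin

def dequantize(q: torch.Tensor, scale: torch.Tensor, xmin: torch.Tensor):
    return q.float() * scale + xmin

# Toy single-head KV cache with shape (tokens, channels) -- an assumed layout.
K = torch.randn(128, 64)
V = torch.randn(128, 64)

# Per-channel keys: reduce over the token axis, one (scale, zero) per channel.
qK, sK, zK = quantize_2bit(K, dim=0)
# Per-token values: reduce over the channel axis, one (scale, zero) per token.
qV, sV, zV = quantize_2bit(V, dim=1)

print("key reconstruction error:  ", (dequantize(qK, sK, zK) - K).abs().mean().item())
print("value reconstruction error:", (dequantize(qV, sV, zV) - V).abs().mean().item())
```

At 2 bits a shared scale across the whole tensor would be destroyed by outliers; aligning the quantization axis with where the outliers live is what keeps the reconstruction error tolerable.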