
Quantization

[ECCV2024] Post-training Quantization for Text-to-Image Diffusion Models with Progressive Calibration and Activation Relaxing
https://arxiv.org/abs/2311.06322 (ECCV 2024)
High computational overhead is a troublesome problem for diffusion models. Recent studies have leveraged post-training quantization (PTQ) to compress diffusion models. However, most of them only focus on unconditional models, leaving the q..
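For context, here is a minimal sketch of the basic PTQ building block that methods like this refine: uniform asymmetric quantization with min-max activation calibration. This is a generic baseline under my own assumptions, not the paper's progressive calibration or activation relaxing.

```python
import torch

def calibrate_scale(activations: torch.Tensor, n_bits: int = 8):
    """Uniform asymmetric quantization parameters from observed activation range."""
    qmax = 2 ** n_bits - 1
    lo, hi = activations.min(), activations.max()
    scale = (hi - lo).clamp(min=1e-8) / qmax
    zero_point = torch.round(-lo / scale)
    return scale, zero_point

def fake_quantize(x: torch.Tensor, scale, zero_point, n_bits: int = 8):
    """Quantize then dequantize, simulating low-precision inference error."""
    qmax = 2 ** n_bits - 1
    q = torch.clamp(torch.round(x / scale + zero_point), 0, qmax)
    return (q - zero_point) * scale

# Calibration: run a few inputs through the FP model, record activation
# ranges, then freeze (scale, zero_point) for inference.
acts = torch.randn(1024) * 3.0          # stand-in for recorded activations
s, z = calibrate_scale(acts)
err = (acts - fake_quantize(acts, s, z)).abs().max()
print(err)                               # bounded by roughly scale / 2
```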
[ACM SAC 2025] Advanced Knowledge Transfer: Refined Feature Distillation for Zero-Shot Quantization in Edge Computing
https://arxiv.org/abs/2412.19125 (Accepted)
Abstract: Prior work in zero-shot quantization (ZSQ, also called data-free quantization) has focused on generating high-quality data from the full-precision (FP) model. However, when training a quantized model at low bit widths (high compression ratios), the quantized model has limited capacity to absorb information, so data generation alone does not yield proper training. To address this limitation, this paper proposes AKT (Advanced Knowledge Transfer), a meth..
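To make the knowledge-transfer setup concrete, below is a sketch of a generic feature-distillation loss between the FP teacher and the quantized student; AKT's specific refinement of which feature information to transfer is the paper's contribution, and `extract_features` here is a hypothetical helper, not the paper's API.

```python
import torch
import torch.nn.functional as F

def feature_distillation_loss(fp_feats, q_feats):
    """MSE between intermediate features of the frozen FP teacher
    and the trainable quantized student, summed over layers."""
    return sum(F.mse_loss(q, t.detach()) for q, t in zip(q_feats, fp_feats))

# Hypothetical training step on generator-produced fake data `x_syn`:
#   fp_feats = fp_model.extract_features(x_syn)     # teacher, frozen
#   q_feats  = quant_model.extract_features(x_syn)  # student, trainable
#   loss = feature_distillation_loss(fp_feats, q_feats)
#   loss.backward(); optimizer.step()
```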
[ICML2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
Liu, Zirui, et al. "KIVI: A tuning-free asymmetric 2bit quantization for KV cache." ICML 2024 (Poster). https://icml.cc/virtual/2024/poster/34318
Abstract: Efficiently serving large language models (LLMs) requires batching many requests together to reduce the cost per request. Yet, the key-value (KV) cache, which stores atte..
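KIVI quantizes the key cache per-channel (statistics shared across tokens, since key outliers cluster in channels) and the value cache per-token. A simplified sketch of asymmetric low-bit quantization along a chosen axis follows; the actual method additionally keeps a small recent window in full precision and works group-wise, which this sketch omits.

```python
import torch

def asym_quant(x: torch.Tensor, dim: int, n_bits: int = 2):
    """Asymmetric quantization with per-slice (min, scale) along `dim`."""
    qmax = 2 ** n_bits - 1
    mn = x.amin(dim=dim, keepdim=True)
    mx = x.amax(dim=dim, keepdim=True)
    scale = (mx - mn).clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round((x - mn) / scale), 0, qmax).to(torch.uint8)
    return q, scale, mn

def dequant(q, scale, mn):
    return q.float() * scale + mn

# Hypothetical cache layout: (tokens, heads, head_dim)
k = torch.randn(16, 8, 64)
v = torch.randn(16, 8, 64)
k_q = asym_quant(k, dim=0)    # reduce over tokens  -> per-channel params
v_q = asym_quant(v, dim=-1)   # reduce over channels -> per-token params
```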
[CVPR2023] Adaptive Data-Free Quantization
https://openaccess.thecvf.com/content/CVPR2023/html/Qian_Adaptive_Data-Free_Quantization_CVPR_2023_paper.html
Abstract: In data-free quantization, fake data samples are often generated to recover the performance of the quantized model. However, existing methods generate these samples against the full-precision model P alone, independently of the quantized model, so it has not been verified that the generated samples are actually effective for the quantized model. Moreover, a generalization error remains, and it is unclear whether such samples adapt well across different quantization bit widths. (Quantization at 3..
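For reference, the standard data-free generation baseline these methods build on synthesizes inputs whose feature statistics match the FP model's stored batch-norm running statistics. The sketch below shows that baseline, not this paper's adaptive, quantized-model-aware variant.

```python
import torch
import torch.nn as nn

def bn_stat_loss(model: nn.Module, x: torch.Tensor):
    """Penalize mismatch between the features induced by x and each
    BatchNorm2d layer's running mean/variance (DeepInversion-style)."""
    losses, hooks = [], []

    def make_hook(bn):
        def hook(_, inputs, __):
            feat = inputs[0]
            mu = feat.mean(dim=(0, 2, 3))
            var = feat.var(dim=(0, 2, 3), unbiased=False)
            losses.append(((mu - bn.running_mean) ** 2).mean()
                          + ((var - bn.running_var) ** 2).mean())
        return hook

    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            hooks.append(m.register_forward_hook(make_hook(m)))
    model(x)
    for h in hooks:
        h.remove()
    return sum(losses)

# x = torch.randn(64, 3, 224, 224, requires_grad=True)  # optimized directly
# loss = bn_stat_loss(fp_model.eval(), x); loss.backward()
```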
[CVPR2023] Hard Sample Matters a Lot in Zero-Shot Quantization
https://openaccess.thecvf.com/content/CVPR2023/html/Li_Hard_Sample_Matters_a_Lot_in_Zero-Shot_Quantization_CVPR_2023_paper.html
Huantong Li, Xiangmiao Wu, Fanbing Lv, Daihai Liao, Thomas H. Li, Yonggang Zhang, Bo Han, Mingkui Tan; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (..
[Low-power Computer Vision 2022] A Survey of Quantization Methods for Efficient Neural Network Inference
https://arxiv.org/abs/2103.13630 (Low-power Computer Vision, 2022)
This chapter provides approaches to the problem of quantizing the numerical values in deep Neural Network computations, covering the advantages/disadvantages of..
Abstract: In AI, performance advances in neural network models have run into limits on memory and computational resources. These limits..
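The basic mapping this survey covers is uniform affine quantization of a real value x to an integer grid, q = round(x / S) + Z with dequantization x̂ = S · (q − Z). A quick worked example with hypothetical numbers:

```python
# Uniform affine quantization: q = round(x / S) + Z,  x_hat = S * (q - Z)
r_min, r_max, n_bits = -1.5, 2.5, 8            # hypothetical tensor range
S = (r_max - r_min) / (2 ** n_bits - 1)        # scale ~ 0.01569
Z = round(-r_min / S)                          # zero-point = 96

x = 0.73
q = max(0, min(2 ** n_bits - 1, round(x / S) + Z))  # -> 143
x_hat = S * (q - Z)                                  # -> ~0.7373
print(S, Z, q, x_hat, abs(x - x_hat) <= S / 2)      # error bounded by S/2
```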