
Quantization

[ECCV2024] Post-training Quantization for Text-to-Image Diffusion Models with Progressive Calibration and Activation Relaxing
https://arxiv.org/abs/2311.06322 (ECCV 2024)
High computational overhead is a troublesome problem for diffusion models. Recent studies have leveraged post-training quantization (PTQ) to compress diffusion models. However, most of them only focus on unconditional models, leaving the q..
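For context, here is a minimal sketch of the basic PTQ building block that methods like this refine: uniform asymmetric quantization with min-max activation calibration. This is a generic baseline under my own assumptions, not the paper's progressive calibration or activation relaxing.

```python
import torch

def calibrate_scale(activations: torch.Tensor, n_bits: int = 8):
    """Uniform asymmetric quantization parameters from observed activation range."""
    qmax = 2 ** n_bits - 1
    lo, hi = activations.min(), activations.max()
    scale = (hi - lo).clamp(min=1e-8) / qmax
    zero_point = torch.round(-lo / scale)
    return scale, zero_point

def fake_quantize(x: torch.Tensor, scale, zero_point, n_bits: int = 8):
    """Quantize then dequantize, simulating low-precision inference error."""
    qmax = 2 ** n_bits - 1
    q = torch.clamp(torch.round(x / scale + zero_point), 0, qmax)
    return (q - zero_point) * scale

# Calibration: run a few inputs through the FP model, record activation
# ranges, then freeze (scale, zero_point) for inference.
acts = torch.randn(1024) * 3.0          # stand-in for recorded activations
s, z = calibrate_scale(acts)
err = (acts - fake_quantize(acts, s, z)).abs().max()
print(err)                               # bounded by roughly scale / 2
```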
[ACM SAC 2025] Advanced Knowledge Transfer: Refined Feature Distillation for Zero-Shot Quantization in Edge Computing
https://arxiv.org/abs/2412.19125 (Accepted)
Abstract: Prior work in zero-shot quantization (ZSQ, also called data-free quantization) has focused on generating high-quality data from the full-precision (FP) model. However, when training a quantized model at low bit widths (high compression ratios), the quantized model has limited capacity to absorb information, so data generation alone does not yield proper training. To address this limitation, this paper proposes AKT (Advanced Knowledge Transfer), a meth..
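To make the knowledge-transfer setup concrete, below is a sketch of a generic feature-distillation loss between the FP teacher and the quantized student; AKT's specific refinement of which feature information to transfer is the paper's contribution, and `extract_features` here is a hypothetical helper, not the paper's API.

```python
import torch
import torch.nn.functional as F

def feature_distillation_loss(fp_feats, q_feats):
    """MSE between intermediate features of the frozen FP teacher
    and the trainable quantized student, summed over layers."""
    return sum(F.mse_loss(q, t.detach()) for q, t in zip(q_feats, fp_feats))

# Hypothetical training step on generator-produced fake data `x_syn`:
#   fp_feats = fp_model.extract_features(x_syn)     # teacher, frozen
#   q_feats  = quant_model.extract_features(x_syn)  # student, trainable
#   loss = feature_distillation_loss(fp_feats, q_feats)
#   loss.backward(); optimizer.step()
```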
[ICML2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
Liu, Zirui, et al. "KIVI: A tuning-free asymmetric 2bit quantization for KV cache." ICML 2024 (Poster). https://icml.cc/virtual/2024/poster/34318
Abstract: Efficiently serving large language models (LLMs) requires batching many requests together to reduce the cost per request. Yet, the key-value (KV) cache, which stores atte..
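KIVI quantizes the key cache per-channel (statistics shared across tokens, since key outliers cluster in channels) and the value cache per-token. A simplified sketch of asymmetric low-bit quantization along a chosen axis follows; the actual method additionally keeps a small recent window in full precision and works group-wise, which this sketch omits.

```python
import torch

def asym_quant(x: torch.Tensor, dim: int, n_bits: int = 2):
    """Asymmetric quantization with per-slice (min, scale) along `dim`."""
    qmax = 2 ** n_bits - 1
    mn = x.amin(dim=dim, keepdim=True)
    mx = x.amax(dim=dim, keepdim=True)
    scale = (mx - mn).clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round((x - mn) / scale), 0, qmax).to(torch.uint8)
    return q, scale, mn

def dequant(q, scale, mn):
    return q.float() * scale + mn

# Hypothetical cache layout: (tokens, heads, head_dim)
k = torch.randn(16, 8, 64)
v = torch.randn(16, 8, 64)
k_q = asym_quant(k, dim=0)    # reduce over tokens  -> per-channel params
v_q = asym_quant(v, dim=-1)   # reduce over channels -> per-token params
```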
[CVPR2023] Adaptive Data-Free Quantization
https://openaccess.thecvf.com/content/CVPR2023/html/Qian_Adaptive_Data-Free_Quantization_CVPR_2023_paper.html
Abstract: In data-free quantization, fake data samples are often generated to recover the performance of the quantized model. However, existing methods generate these samples against the full-precision model P alone, independently of the quantized model, so it has not been verified that the generated samples are actually effective for the quantized model. Moreover, a generalization error remains, and it is unclear whether such samples adapt well across different quantization bit widths. (Quantization at 3..
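For reference, the standard data-free generation baseline these methods build on synthesizes inputs whose feature statistics match the FP model's stored batch-norm running statistics. The sketch below shows that baseline, not this paper's adaptive, quantized-model-aware variant.

```python
import torch
import torch.nn as nn

def bn_stat_loss(model: nn.Module, x: torch.Tensor):
    """Penalize mismatch between the features induced by x and each
    BatchNorm2d layer's running mean/variance (DeepInversion-style)."""
    losses, hooks = [], []

    def make_hook(bn):
        def hook(_, inputs, __):
            feat = inputs[0]
            mu = feat.mean(dim=(0, 2, 3))
            var = feat.var(dim=(0, 2, 3), unbiased=False)
            losses.append(((mu - bn.running_mean) ** 2).mean()
                          + ((var - bn.running_var) ** 2).mean())
        return hook

    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            hooks.append(m.register_forward_hook(make_hook(m)))
    model(x)
    for h in hooks:
        h.remove()
    return sum(losses)

# x = torch.randn(64, 3, 224, 224, requires_grad=True)  # optimized directly
# loss = bn_stat_loss(fp_model.eval(), x); loss.backward()
```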
[CVPR2023] Hard Sample Matters a Lot in Zero-Shot Quantization
https://openaccess.thecvf.com/content/CVPR2023/html/Li_Hard_Sample_Matters_a_Lot_in_Zero-Shot_Quantization_CVPR_2023_paper.html
Huantong Li, Xiangmiao Wu, Fanbing Lv, Daihai Liao, Thomas H. Li, Yonggang Zhang, Bo Han, Mingkui Tan; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (..
[Low-power Computer Vision 2022] A Survey of Quantization Methods for Efficient Neural Network Inference
https://arxiv.org/abs/2103.13630 (Low-power Computer Vision, 2022)
This chapter provides approaches to the problem of quantizing the numerical values in deep Neural Network computations, covering the advantages/disadvantages of..
Abstract: In AI, performance advances in neural network models have run into limits on memory and computational resources. These limits..
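The basic mapping this survey covers is uniform affine quantization of a real value x to an integer grid, q = round(x / S) + Z with dequantization x̂ = S · (q − Z). A quick worked example with hypothetical numbers:

```python
# Uniform affine quantization: q = round(x / S) + Z,  x_hat = S * (q - Z)
r_min, r_max, n_bits = -1.5, 2.5, 8            # hypothetical tensor range
S = (r_max - r_min) / (2 ** n_bits - 1)        # scale ~ 0.01569
Z = round(-r_min / S)                          # zero-point = 96

x = 0.73
q = max(0, min(2 ** n_bits - 1, round(x / S) + Z))  # -> 143
x_hat = S * (q - Z)                                  # -> ~0.7373
print(S, Z, q, x_hat, abs(x - x_hat) <= S / 2)      # error bounded by S/2
```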