OPAL: Outlier-Preserved Microscaling Quantization Accelerator for Generative Large Language Models | Proceedings of the 61st ACM/IEEE Design Automation Conference
Transformer-based large language models (LLMs) have achieved great success as model sizes have grown. LLM size grows by 240× every two years, outpacing hardware progress and making model inference increasingly costly. Model quantization is ...