NanoQuant: Efficient Sub-1-Bit Quantization of Large Language Models
arXiv:2602.06694v1

Abstract: Weight-only quantization has become a standard approach for efficiently serving large language models (LLMs). However, existing methods fail to efficiently...
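For context on the baseline the abstract refers to, the following is a minimal illustrative sketch of conventional weight-only group quantization (symmetric, 4-bit, per-group scales). This is a generic textbook scheme, not the paper's sub-1-bit method; the function names, group size, and bit width are assumptions for illustration only.

```python
import numpy as np

def quantize_weights(w, bits=4, group_size=8):
    """Generic symmetric weight-only quantization (illustrative, not NanoQuant).

    Splits the flattened weights into groups, computes one scale per group,
    and rounds each weight to a signed integer grid of 2**bits levels.
    """
    groups = w.reshape(-1, group_size)
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for 4-bit
    scale = np.abs(groups).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)        # avoid division by zero
    q = np.clip(np.round(groups / scale), -qmax - 1, qmax)
    return q.astype(np.int8), scale

def dequantize(q, scale):
    """Recover approximate weights from integer codes and per-group scales."""
    return q * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(64).astype(np.float32)
q, s = quantize_weights(w)
w_hat = dequantize(q, s)
# Per-group reconstruction error is bounded by half a quantization step.
assert np.all(np.abs(w_hat - w.reshape(-1, 8)) <= s / 2 + 1e-6)
```

At 4 bits this stores each weight as a signed integer plus a shared per-group scale; sub-1-bit schemes like the one the abstract announces must go further, e.g. by sharing codes across weights, which is beyond this sketch.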