EntroLLM: Entropy Encoded Weight Compression for Efficient Large Language Model Inference on Edge Devices
arXiv:2505.02380v4
Announce Type: replace
Abstract: Large Language Models (LLMs) achieve strong performance across tasks, but face storage and compute challenges on edge devices. We propose...