Harmonia: Algorithm-Hardware Co-Design for Memory- and Compute-Efficient BFP-based LLM Inference
arXiv:2602.04595v2 Announce Type: replace

Abstract: Large Language Models (LLMs) are powerful but incur high memory and computation costs. Quantization is an effective solution, with INT...
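The abstract is truncated here, but the title's central idea, block floating point (BFP), can be illustrated. In BFP, values are grouped into small blocks that share a single power-of-two exponent, so each element stores only a low-bit mantissa. The sketch below is a minimal, generic BFP quantize/dequantize round trip in NumPy; the function name, block size, and mantissa width are illustrative assumptions and do not reflect Harmonia's actual scheme.

```python
import numpy as np

def bfp_quantize(x, block_size=8, mantissa_bits=4):
    """Generic block-floating-point round trip (illustrative sketch).

    Each block of `block_size` values shares one power-of-two exponent,
    derived from the block's largest magnitude; individual values keep
    only a signed `mantissa_bits`-bit mantissa.
    """
    x = np.asarray(x, dtype=np.float64)
    pad = (-len(x)) % block_size
    blocks = np.pad(x, (0, pad)).reshape(-1, block_size)

    # Shared exponent per block, taken from the largest magnitude.
    max_abs = np.abs(blocks).max(axis=1, keepdims=True)
    safe = np.where(max_abs > 0, max_abs, 1.0)  # avoid log2(0)
    exp = np.floor(np.log2(safe))

    # One power-of-two scale per block; mantissas are small signed ints.
    scale = 2.0 ** (exp - (mantissa_bits - 1))
    lo, hi = -(2 ** (mantissa_bits - 1)), 2 ** (mantissa_bits - 1) - 1
    # Note: the block maximum may saturate slightly at `hi` after rounding.
    mant = np.clip(np.round(blocks / scale), lo, hi)

    # Dequantize back to floating point for inspection.
    return (mant * scale).reshape(-1)[:len(x)]
```

Because the scale is a power of two, dequantization is a bit shift in hardware, which is what makes BFP attractive for efficient inference accelerators.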