Characterizing LLM Inference Energy-Performance Tradeoffs across Workloads and GPU Scaling
arXiv:2501.08219v4 Announce Type: replace

Abstract: LLM inference exhibits substantial variability across queries and execution phases, yet inference configurations are often applied uniformly. We present a...