Beyond Speedup -- Utilizing KV Cache for Sampling and Reasoning
arXiv:2601.20326v1

Abstract: KV caches, typically used only to speed up autoregressive decoding, encode contextual information that can be reused for downstream tasks...
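
For readers unfamiliar with the mechanism the abstract starts from, here is a minimal sketch of a KV cache in single-head attention, written in plain Python. It is not the paper's method: the names `KVCache`, `decode_with_cache`, and `decode_without_cache`, the projection matrices, and the toy inputs are all hypothetical and chosen only for illustration. It shows the speedup use only: each step stores its key and value once, so later steps do not recompute them for the prefix.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def project(x, W):
    # Vector-matrix product: x has length d_in, W is d_in rows by d_out columns.
    return [sum(xi * wi for xi, wi in zip(x, col)) for col in zip(*W)]

def attend(q, keys, values):
    # Scaled dot-product attention of one query over all stored keys/values.
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

class KVCache:
    # Hypothetical container: one key and one value vector per past position.
    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

def decode_with_cache(tokens, Wq, Wk, Wv):
    # Each step projects only the newest token and reuses cached keys/values.
    cache = KVCache()
    outputs = []
    for x in tokens:
        cache.append(project(x, Wk), project(x, Wv))
        outputs.append(attend(project(x, Wq), cache.keys, cache.values))
    return outputs, cache

def decode_without_cache(tokens, Wq, Wk, Wv):
    # Baseline: every step re-projects the whole prefix (quadratic work).
    outputs = []
    for t in range(len(tokens)):
        prefix = tokens[: t + 1]
        keys = [project(x, Wk) for x in prefix]
        values = [project(x, Wv) for x in prefix]
        outputs.append(attend(project(tokens[t], Wq), keys, values))
    return outputs

# Toy inputs (assumed for illustration): three 2-d token embeddings.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Wq = [[0.5, -0.2], [0.1, 0.3]]
Wk = [[0.4, 0.0], [-0.3, 0.6]]
Wv = [[1.0, 2.0], [3.0, 4.0]]

cached_out, cache = decode_with_cache(tokens, Wq, Wk, Wv)
```

After decoding, `cache.keys` and `cache.values` hold one vector per input position. These stored per-position tensors are the kind of object the abstract says can carry contextual information beyond their role in speeding up decoding.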