Time series tokens and text prompt tokens don't need equal treatment in LLMs: compressing redundant time series patterns and reducing prompt tokens at deeper layers can speed up inference by up to 7.68× without hurting performance.
This paper shows that time series tokens and prompt tokens in language models have different information patterns, so treating them equally wastes computation. The authors develop a compression method that removes redundant frequency patterns from time series data and gradually drops prompt tokens deeper in the model, achieving up to 7.68× faster inference while maintaining or improving accuracy.
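The two ideas can be pictured with a minimal sketch, which is an illustration and not the authors' implementation. It assumes that "redundant frequency patterns" can be approximated by discarding all but the strongest Fourier components, and that prompt tokens are dropped on a linear schedule across layers. The function names `compress_frequencies` and `prompt_keep_schedule`, the top-k magnitude rule, and the linear decay are all hypothetical choices; the summary does not say how frequencies are selected or how quickly tokens are removed.

```python
import numpy as np


def compress_frequencies(series, keep):
    """Keep only the `keep` largest-magnitude rFFT coefficients of a 1-D series.

    Assumption: redundancy is modeled as low-energy frequency components.
    """
    spectrum = np.fft.rfft(series)
    # Indices of the strongest frequency components.
    top = np.argsort(np.abs(spectrum))[-keep:]
    pruned = np.zeros_like(spectrum)
    pruned[top] = spectrum[top]
    # Transform back to the time domain at the original length.
    return np.fft.irfft(pruned, n=len(series))


def prompt_keep_schedule(num_prompt_tokens, num_layers, final_ratio):
    """Prompt tokens retained at each layer, shrinking linearly with depth.

    Assumption: a linear schedule from 100% at the first layer down to
    `final_ratio` at the last layer; at least one token is always kept.
    """
    ratios = np.linspace(1.0, final_ratio, num_layers)
    return [max(1, int(round(num_prompt_tokens * r))) for r in ratios]


if __name__ == "__main__":
    t = np.arange(64)
    # Two exact sinusoids: only two frequency bins carry energy.
    x = np.sin(2 * np.pi * 3 * t / 64) + 0.5 * np.sin(2 * np.pi * 7 * t / 64)
    x_hat = compress_frequencies(x, keep=2)
    print("max reconstruction error:", np.max(np.abs(x - x_hat)))
    print("tokens kept per layer:", prompt_keep_schedule(100, 5, 0.2))
```

In this toy setup the signal is built from two sinusoids, so keeping two frequency components is enough to reconstruct it; real series would trade reconstruction error against the number of components kept. The schedule only reports how many prompt tokens survive at each layer; which tokens to keep would need a separate importance criterion that the summary does not describe.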