Analysing The Impact of Sequence Composition on Language Model Pre-Training
Pre-training sequence composition plays a critical role in language model performance. Traditional causal masking […]
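One composition choice this line of work contrasts with plain causal masking is intra-document masking: when several documents are packed into one training sequence, tokens are blocked from attending across document boundaries. The excerpt above is truncated, so the sketch below is only an illustration of that general idea, not the paper's exact setup; the function name and the use of NumPy are my own choices.

```python
import numpy as np

def intra_document_causal_mask(doc_ids):
    """Boolean attention mask for a packed sequence.

    doc_ids[i] is the document index of token i. Position i may attend
    to position j only if j <= i (causal) AND both tokens belong to the
    same packed document (no cross-document attention).
    """
    doc_ids = np.asarray(doc_ids)
    n = doc_ids.shape[0]
    causal = np.tril(np.ones((n, n), dtype=bool))       # j <= i
    same_doc = doc_ids[:, None] == doc_ids[None, :]     # same document
    return causal & same_doc

# Two documents packed into one sequence of length 5.
mask = intra_document_causal_mask([0, 0, 0, 1, 1])
```

With plain causal masking, token 3 could attend to token 2; here that entry is masked out because the two tokens come from different packed documents.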
A Simple and Effective L2 Norm-Based Strategy for KV Cache Compression
Deploying large language models (LLMs) is challenging due to the high memory demands […]
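The excerpt is truncated, but the title suggests scoring cached key-value entries by the L2 norm of their key vectors and evicting part of the cache based on that score. The sketch below is one plausible reading of such a strategy, assuming that entries with the smallest key norms are the ones retained; the function name, the `keep_ratio` parameter, and the per-head shapes are my own assumptions, not the paper's API.

```python
import numpy as np

def compress_kv_cache(keys, values, keep_ratio=0.5):
    """Keep the cache entries whose key vectors have the smallest L2 norm.

    keys, values: arrays of shape (seq_len, head_dim), one attention head.
    Assumed heuristic: low-norm keys tend to attract the most attention,
    so they are retained while the rest of the cache is evicted.
    """
    seq_len = keys.shape[0]
    n_keep = max(1, int(seq_len * keep_ratio))
    norms = np.linalg.norm(keys, axis=-1)          # one L2 norm per cached key
    keep = np.sort(np.argsort(norms)[:n_keep])     # lowest-norm entries, original order
    return keys[keep], values[keep]

# Toy cache: 4 entries whose key norms are 1.0, 3.0, 2.0, 0.5.
keys = np.diag([1.0, 3.0, 2.0, 0.5])
values = np.arange(16.0).reshape(4, 4)
k_small, v_small = compress_kv_cache(keys, values, keep_ratio=0.5)
```

Because eviction only needs the key norms, the score is cheap to maintain incrementally as new tokens are appended to the cache.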