Steering Knowledge Selection Behaviours in LLMs

Large language models (LLMs) often face conflicts between their stored parametric knowledge and contextual information, which can lead to outdated or incorrect responses. Analyzing LLMs’ internal activations, we find that signals of these context-memory knowledge conflicts can be detected in mid-layer activations. To address this, we introduce SpARE, a training-free representation engineering approach that uses pre-trained sparse auto-encoders (SAEs) to steer knowledge selection during inference. By editing specific internal activations, SpARE effectively manages knowledge conflicts, improving accuracy on open-domain question-answering tasks by 10% over existing representation engineering methods and 15% over contrastive decoding.
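To make the mechanism concrete, here is a minimal sketch of SAE-based activation steering in PyTorch. It is illustrative only: the toy encoder and decoder stand in for a pre-trained SAE, and the feature indices and scaling rule are placeholders, not SpARE's actual editing procedure.

```python
import torch
import torch.nn as nn

d_model, n_features = 64, 512
torch.manual_seed(0)

# Toy stand-ins for a pre-trained sparse auto-encoder (illustrative only;
# in practice the SAE is pre-trained on the LLM's own activations).
sae_encode = nn.Sequential(nn.Linear(d_model, n_features), nn.ReLU())
sae_decode = nn.Linear(n_features, d_model, bias=False)

def steer(hidden, feature_ids, scale=4.0):
    """Edit a mid-layer activation in SAE feature space, then map back.

    Only the delta between the edited and original reconstructions is
    added, so content outside the selected features is preserved."""
    z = sae_encode(hidden)              # sparse feature activations
    z_edit = z.clone()
    z_edit[..., feature_ids] *= scale   # strengthen the target behaviour
    return hidden + sae_decode(z_edit) - sae_decode(z)

hidden = torch.randn(d_model)           # one residual-stream activation
steered = steer(hidden, feature_ids=[3, 17, 42])
```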

Are We Done with MMLU?

Our analysis uncovers significant issues with the Massive Multitask Language Understanding (MMLU) benchmark, which is widely used to assess LLMs. We find numerous ground-truth errors, with 57% of the questions in the Virology subset containing inaccuracies, obscuring the true capabilities of models. To address this, we introduce a new error taxonomy and create MMLU-Redux, a refined subset of 3,000 manually re-annotated questions across 30 subjects. Our experiments with MMLU-Redux reveal notable discrepancies with previously reported model performance metrics, underscoring the need to revise MMLU's flawed questions. We invite the community to contribute further annotations to enhance the reliability of this important benchmark.

Enhancing AI Model Robustness with Natural Language Explanations

In this paper, we explore how natural language explanations (NLEs) can improve the robustness of large language models (LLMs) on tasks such as natural language inference and paraphrase detection. By prompting LLMs with a mix of human-written and model-generated NLEs, we observe notable improvements in handling adversarial inputs. Our findings indicate that this method consistently outperforms traditional approaches, offering a more effective way to enhance model accuracy in challenging scenarios.
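As a rough illustration of the prompting setup, the snippet below assembles a few-shot natural language inference prompt in which each demonstration carries an explanation alongside its label. The demonstrations and template are invented for illustration; they are not the prompts used in the paper.

```python
# Hypothetical few-shot NLI demonstrations with explanations attached.
demos = [
    {
        "premise": "A man is playing a guitar on stage.",
        "hypothesis": "A person is performing music.",
        "label": "entailment",
        "explanation": "Playing a guitar on stage is a form of performing music.",
    },
]

def build_prompt(premise: str, hypothesis: str) -> str:
    """Concatenate label+explanation demonstrations, then the query."""
    parts = []
    for d in demos:
        parts.append(
            f"Premise: {d['premise']}\n"
            f"Hypothesis: {d['hypothesis']}\n"
            f"Label: {d['label']}\n"
            f"Explanation: {d['explanation']}\n"
        )
    parts.append(f"Premise: {premise}\nHypothesis: {hypothesis}\nLabel:")
    return "\n".join(parts)

print(build_prompt("Two dogs run in a park.", "Some animals are outdoors."))
```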

Probing the Emergence of Cross-lingual Alignment during LLM Training

Multilingual LLMs excel at zero-shot cross-lingual transfer, most likely by aligning languages without supervision from parallel sentences. This study uses intrinsic probing to measure the overlap between the neurons that encode linguistic features across languages, and correlates this overlap with transfer performance. Examining BLOOM checkpoints across training steps and model scales, we identify a strong link between neuron overlap and downstream performance. The findings also reveal phases of pre-training in which alignment and multilingual abilities degrade, offering new insights into multilingual training dynamics.
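One simple way to quantify neuron overlap is a Jaccard score over the top-k neurons each language relies on for a given feature. The sketch below assumes per-neuron importance scores are already available (in practice they would come from intrinsic probes) and is a simplification of the paper's methodology.

```python
import numpy as np

def top_k_neurons(importance: np.ndarray, k: int = 50) -> set:
    """Indices of the k neurons most important for a linguistic feature."""
    return set(np.argsort(importance)[-k:].tolist())

def neuron_overlap(imp_lang_a: np.ndarray, imp_lang_b: np.ndarray, k: int = 50) -> float:
    """Jaccard overlap between the neuron sets two languages use to
    encode the same feature (e.g., grammatical number or gender)."""
    a, b = top_k_neurons(imp_lang_a, k), top_k_neurons(imp_lang_b, k)
    return len(a & b) / len(a | b)

# Random importances stand in for probe-derived scores (illustration only).
rng = np.random.default_rng(0)
hidden_size = 1024
print(neuron_overlap(rng.random(hidden_size), rng.random(hidden_size)))
```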

SPARSEFIT: Few-shot Prompting with Sparse Fine-tuning for Jointly Generating Predictions and Natural Language Explanations

This work introduces SparseFit, a sparse few-shot fine-tuning strategy for generating natural language explanations (NLEs) with pre-trained language models (PLMs). SparseFit uses discrete prompts to jointly generate predictions and NLEs while fine-tuning only 6.8% of the model's parameters, making it more efficient than full fine-tuning. Tested on three T5 model sizes and four datasets, SparseFit achieves competitive task performance and NLE quality, outperforming other parameter-efficient fine-tuning (PEFT) methods on average in both predictive accuracy and explanation quality.
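For intuition, here is what sparse fine-tuning looks like in code: freeze the whole model, then unfreeze a small fixed subset of parameters. The subset chosen below (the layer-norm weights of a T5 model) is a common choice for sparse fine-tuning in general, not necessarily the subset SparseFit selects.

```python
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("t5-base")

# Freeze everything, then unfreeze one small parameter subset.
for param in model.parameters():
    param.requires_grad = False
for name, param in model.named_parameters():
    if "layer_norm" in name:  # illustrative subset; SparseFit's may differ
        param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"fine-tuning {100 * trainable / total:.3f}% of parameters")
```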

Analysing The Impact of Sequence Composition on Language Model Pre-Training

Pre-training sequence composition plays a critical role in language model performance. With conventional causal masking over packed sequences, tokens can attend to unrelated preceding documents, introducing distracting signals that hinder learning. Intra-document causal masking, which conditions each token only on preceding tokens from the same document, addresses this issue and improves results. In addition, the BM25Chunk method, which packs related documents into the same sequence via retrieval, improves in-context learning, knowledge retention, and context utilization, all while maintaining efficiency.
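A compact sketch of intra-document causal masking: given the document id of each token in a packed training sequence, build a mask that allows attention only to earlier tokens from the same document. The helper below is illustrative, not the paper's implementation.

```python
import torch

def intra_document_causal_mask(doc_ids) -> torch.Tensor:
    """Boolean attention mask for one packed sequence.

    doc_ids[i] is the document token i came from; token i may attend to
    token j only if j <= i and both tokens share a document."""
    doc_ids = torch.as_tensor(doc_ids)
    n = doc_ids.numel()
    causal = torch.tril(torch.ones(n, n, dtype=torch.bool))
    same_doc = doc_ids.unsqueeze(0) == doc_ids.unsqueeze(1)
    return causal & same_doc

# Two documents packed into one six-token sequence: the upper-left and
# lower-right 3x3 blocks are causal; cross-document attention is blocked.
print(intra_document_causal_mask([0, 0, 0, 1, 1, 1]).int())
```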