Noiser: Bounded Input Perturbations for Attributing Large Language Models

Feature attribution (FA) methods are common post-hoc approaches that explain how Large Language Models (LLMs) make predictions. Generating faithful at...

September 9, 2025 1 min read

Full paper · available on arxiv.org

Read paper

Feature attribution (FA) methods are common post-hoc approaches that explain how Large Language Models (LLMs) make predictions. Generating faithful attributions that reflect the actual inner behavior of the model is crucial. We introduce Noiser, a perturbation-based FA method that imposes bounded noise on each input embedding and measures the robustness of the model against partially noised input to obtain input attributions.

We also propose an answerability metric that employs an instructed judge model to assess the extent to which highly scored tokens suffice to recover the predicted output. Through comprehensive evaluation across six LLMs and three tasks, we demonstrate that Noiser consistently outperforms existing gradient-based, attention-based, and perturbation-based FA methods.

Keep reading.

November 2025

GRADA: Graph-based Reranker against Adversarial Documents Attack

Retrieval Augmented Generation (RAG) frameworks improve the accuracy of large language models (LLMs) by integrating external knowledge from retrieved ...

November 2025

FLARE: Faithful Logic-Aided Reasoning and Exploration

We introduce Faithful Logic-Aided Reasoning and Exploration (FLARE), a novel interpretable approach for traversing the problem space using task decomp...

November 2025

DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations

Large Language Models (LLMs) often hallucinate, producing unfaithful or factually incorrect outputs by misrepresenting the provided context. We propos...

Start the conversation

Talk to a senior consultant.

30 minutes. Bring a problem you’re stuck on — we’ll tell you what we’d do next.

Book a consultation →