An Analysis of Decoding Methods for LLM-based Agents for Faithful Multi-Hop Question Answering


August 26, 2025

Full paper · available on arxiv.org


Large Language Models (LLMs) frequently produce factually inaccurate outputs, a phenomenon known as hallucination, which limits their accuracy in knowledge-intensive NLP tasks. Retrieval-augmented generation and agentic frameworks such as Reasoning and Acting (ReAct) can address this by giving the model access to external knowledge. However, LLMs often fail to remain faithful to the retrieved information.
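To make the agentic retrieval idea concrete, here is a minimal sketch of a ReAct-style loop, in which a policy alternates between actions and observations until it decides to answer. Everything in it is a hypothetical stand-in rather than the setup used in the paper: `toy_policy` replaces the LLM, `lookup` and the `KB` dictionary replace a real retrieval tool, and the free-text "thought" step that ReAct interleaves with actions is omitted for brevity.

```python
# Toy knowledge base standing in for an external retrieval tool (hypothetical data).
KB = {
    "Film A director": "Jane Doe",
    "Jane Doe birthplace": "Lyon",
}


def lookup(query):
    """Hypothetical search tool: return the stored fact or a miss marker."""
    return KB.get(query, "no result")


def toy_policy(observations):
    """Hypothetical stand-in for the LLM: choose the next action from past observations."""
    if not observations:
        # Hop 1: find the director.
        return "search", "Film A director"
    if len(observations) == 1:
        # Hop 2: use the first observation to build the next query.
        return "search", f"{observations[0]} birthplace"
    # Enough evidence gathered: answer with the latest observation.
    return "finish", observations[-1]


def react_loop(policy, tool, max_steps=5):
    """Alternate between acting and observing until the policy finishes."""
    observations = []
    trace = []
    for _ in range(max_steps):
        action, argument = policy(observations)
        trace.append((action, argument))
        if action == "finish":
            return argument, trace
        observations.append(tool(argument))
    # Step budget exhausted without a final answer.
    return None, trace


answer, trace = react_loop(toy_policy, lookup)
print(answer)
for step in trace:
    print(step)
```

The two `search` steps correspond to the two hops of a multi-hop question such as "Where was the director of Film A born?". Faithfulness in this setting means the final answer is taken from the observations rather than from whatever the model would have said without them.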

We present a systematic analysis of how combining the ReAct framework with decoding strategies (i.e., DeCoRe, DoLa, and CAD) can influence the faithfulness of LLM-generated answers. Our results show that combining an agentic framework for knowledge retrieval with decoding methods can increase accuracy on Multi-Hop Question Answering tasks, with F1 on HotpotQA rising from 19.5 to 32.6.
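As an illustration of what a contrastive decoding strategy does at a single generation step, the sketch below assumes that CAD refers to context-aware decoding and uses its commonly cited form: next-token logits computed with the retrieved context are amplified against logits computed without it, as `(1 + alpha) * with_context - alpha * without_context`. The function names, the toy vocabulary, and the logit values are assumptions made for this example; the abstract does not specify the exact formulation or hyperparameters used.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def contrast_logits(with_context, without_context, alpha=0.5):
    """Assumed CAD-style adjustment: boost tokens that the context makes more likely.

    alpha = 0 leaves the context-conditioned logits unchanged; larger values
    penalise tokens the model already favoured without the context.
    """
    if len(with_context) != len(without_context):
        raise ValueError("logit vectors must have the same length")
    return [
        (1 + alpha) * c - alpha * u
        for c, u in zip(with_context, without_context)
    ]


# Toy next-token candidates and logits (hypothetical values).
vocab = ["Paris", "London", "Rome"]
with_ctx = [2.0, 2.2, 0.1]     # model sees the retrieved passage
without_ctx = [0.5, 2.5, 0.1]  # model relies on parametric memory only

adjusted = contrast_logits(with_ctx, without_ctx, alpha=1.0)
probs = softmax(adjusted)

# Greedy choice before and after the adjustment.
plain_choice = vocab[max(range(len(vocab)), key=with_ctx.__getitem__)]
cad_choice = vocab[max(range(len(vocab)), key=adjusted.__getitem__)]
print(plain_choice, cad_choice)
```

In this toy case the model's prior preference for "London" still wins when the context is merely appended, while the contrast shifts the greedy choice to "Paris", the token the context supports. In a ReAct setting, the observations returned by the retrieval tool would play the role of the context.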

