In this paper, we explore how natural language explanations (NLEs) can improve the robustness of large language models (LLMs) on tasks such as natural language inference and paraphrase detection. By prompting LLMs with a mix of human-written and model-generated NLEs, we observe notable improvements in handling adversarial inputs. Our findings indicate that this method consistently outperforms traditional approaches, offering a more effective way to enhance model accuracy in challenging scenarios.
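The prompting setup described above might be assembled roughly as follows. This is a minimal illustrative sketch, not the paper's implementation: the function name, field names, and example data are all hypothetical, and the model call itself is omitted.

```python
# Hypothetical sketch: build a few-shot NLI prompt where each
# demonstration carries a natural language explanation (NLE),
# mixing human-written and model-generated ones. All names and
# example data are illustrative, not taken from the paper.

def build_nli_prompt(examples, query_premise, query_hypothesis):
    """Assemble a few-shot prompt with one NLE per demonstration."""
    parts = []
    for ex in examples:
        parts.append(
            f"Premise: {ex['premise']}\n"
            f"Hypothesis: {ex['hypothesis']}\n"
            f"Explanation ({ex['nle_source']}): {ex['nle']}\n"
            f"Label: {ex['label']}\n"
        )
    # The query carries no label; the model is asked to continue
    # with an explanation and then a label.
    parts.append(
        f"Premise: {query_premise}\n"
        f"Hypothesis: {query_hypothesis}\n"
        "Explanation:"
    )
    return "\n".join(parts)

examples = [
    {"premise": "A man is playing a guitar.",
     "hypothesis": "A person is making music.",
     "nle": "Playing a guitar is a way of making music.",
     "nle_source": "human", "label": "entailment"},
    {"premise": "Two dogs run on the beach.",
     "hypothesis": "The animals are asleep.",
     "nle": "Running and sleeping cannot happen at the same time.",
     "nle_source": "model", "label": "contradiction"},
]

prompt = build_nli_prompt(
    examples, "A child eats an apple.", "A kid is eating fruit."
)
```

The resulting string would then be sent to the LLM, whose continuation supplies both an explanation and a predicted label for the query pair.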
© 2024 Miniml Ltd. All rights reserved.