Miniml’s leadership includes active academics — University of Edinburgh professors, an Alan Turing Institute Fellow, and an ELLIS Scholar. We work at the boundary of machine learning research and enterprise systems, then carry the methods that hold up under scrutiny into production. Research that ships.
Technical work from Miniml researchers and collaborators — new methods, evaluations, and findings from frontier model development. Each entry links to a short summary on this site, with the full paper one click away.
Retrieval Augmented Generation (RAG) frameworks improve the accuracy of large language models (LLMs) by integrating external knowledge from retrieved ...
We introduce Faithful Logic-Aided Reasoning and Exploration (FLARE), a novel interpretable approach for traversing the problem space using task decomp...
Large Language Models (LLMs) often hallucinate, producing unfaithful or factually incorrect outputs by misrepresenting the provided context. We propos...
Neurosymbolic (NeSy) predictors combine neural perception with symbolic reasoning to solve tasks like visual reasoning. However, standard NeSy predict...
We introduce MMLongBench, the first benchmark covering a diverse set of long-context vision-language tasks to evaluate long-context vision-language mo...
The ubiquitous independence assumption among symbolic concepts in neurosymbolic (NeSy) predictors is a convenient simplification that speeds up probab...
Feature attribution (FA) methods are common post-hoc approaches that explain how Large Language Models (LLMs) make predictions. Generating faithful at...
Large Language Models (LLMs) frequently produce factually inaccurate outputs, a phenomenon known as hallucination, which limits their accuracy in knowle...
Autoregressive language models rely on a Key-Value (KV) Cache to avoid re-computing past hidden states during generation. As model sizes and context l...
Generating accurate and concise textual summaries from multimodal documents is challenging, especially when dealing with visually complex content like...
We construct evaluation tasks where extending the reasoning length of Large Reasoning Models (LRMs) deteriorates performance, exhibiting an inverse sc...
Understanding time from visual representations is a fundamental cognitive skill, yet it remains a challenge for multimodal large language models (MLLM...
We present SynDARin, a methodology for synthesizing high-quality reasoning datasets in low-resource languages. Our approach combines template-based ge...
We introduce Adaptive Computation Modules (ACMs) that enable fine-grained conditional computation for more efficient neural network inference. Our app...
We propose a self-training approach that enables large language models to learn tool use without requiring human demonstrations. Our method uses self-...
We provide theoretical and empirical analysis of when proxy rewards can improve sample efficiency in preference learning. Our work establishes conditi...
We challenge conventional wisdom about complex query answering by demonstrating that many supposedly complex queries can be solved with surprisingly s...
We develop a comprehensive auditing framework to detect behavioral shifts in language models across different contexts and time periods. Our approach ...
We investigate the cross-lingual transferability of backdoor attacks in instruction-tuned large language models. Our findings reveal that backdoors ca...
Knowledge Selection in LLMs. We investigate how large language models (LLMs) select and utilise knowledge when generating responses. Our analysis reveals that LLMs exhibit systematic biases in knowledge selection, often favouring certain…
We introduce Mixtures of In-Context Learners (MiCL), a novel approach that combines multiple in-context learning strategies to improve few-shot perfor...
Low-rank lottery tickets: finding efficient low-rank neural networks via matrix differential equations. Neural networks deliver exceptional performance but can be impractical for applications with limited hardware or energy resources due to their…
Robust low-rank training via approximate orthonormal constraints. As models and datasets grow, pruning techniques using low-rank matrix factorizations have become popular for reducing resource demands while maintaining accuracy. However, we find that…
Analyzing Flaws in MMLU. Our analysis uncovers significant issues with the Massive Multitask Language Understanding (MMLU) benchmark, which is widely used to assess LLMs. We found numerous ground truth errors, with 57% of…
Enhancing AI Model Robustness with Natural Language Explanations. In this paper, we explore how natural language explanations (NLEs) can improve the robustness of large language models (LLMs) in tasks like natural language…
Probing the Emergence of Cross-lingual Alignment during LLM Training. Multilingual LLMs excel at zero-shot cross-lingual transfer, likely by aligning languages without parallel sentence supervision. This study uses intrinsic probing to analyze neuron…
Using Natural Language Explanations to Improve Robustness of In-context Learning. This work explores improving the robustness of LLMs against adversarial inputs by augmenting in-context learning (ICL) with natural language explanations (NLEs). Prompting…
SPARSEFIT: Few-shot Prompting with Sparse Fine-tuning for Jointly Generating Predictions and Natural Language Explanations. This work introduces SparseFit, a sparse few-shot fine-tuning strategy for generating natural language explanations (NLEs) with pre-trained language…
Analysing The Impact of Sequence Composition on Language Model Pre-Training. Pre-training sequence composition plays a critical role in language model performance. Traditional causal masking can introduce distractions from unrelated documents, hindering effectiveness.…
A Simple and Effective L2 Norm-Based Strategy for KV Cache Compression. Deploying large language models (LLMs) is challenging due to the high memory demands of the Key-Value (KV) cache, especially with longer…
Most enterprise problems do not need a new method. They need known methods applied with discipline. Research enters our work where it genuinely moves the result, and the bar for that is high.
If your hardest problem needs more than off-the-shelf tools — or you simply want a second opinion grounded in current research — talk to us. A 30-minute conversation with a senior consultant who can tell you what is genuinely solvable today, and what it would take to ship it.