Research - Miniml

PosterSum: A Multimodal Benchmark for Scientific Poster Summarization

Generating accurate and concise textual summaries from multimodal documents is challenging, especially when dealing with visually complex content like…

Inverse Scaling in Test-Time Compute

We construct evaluation tasks where extending the reasoning length of Large Reasoning Models (LRMs) deteriorates performance, exhibiting an inverse sc…

Lost in Time: Clock and Calendar Understanding Challenges in Multimodal LLMs

Understanding time from visual representations is a fundamental cognitive skill, yet it remains a challenge for multimodal large language models (MLLM…

SynDARin: Synthesising Datasets for Automated Reasoning in Low-Resource Languages

We present SynDARin, a methodology for synthesizing high-quality reasoning datasets in low-resource languages. Our approach combines template-based ge…

Adaptive Computation Modules: Granular Conditional Computation for Efficient Inference

We introduce Adaptive Computation Modules (ACMs) that enable fine-grained conditional computation for more efficient neural network inference. Our app…

Self-Training Large Language Models for Tool-Use Without Demonstrations

We propose a self-training approach that enables large language models to learn tool use without requiring human demonstrations. Our method uses self-…

When Can Proxies Improve the Sample Complexity of Preference Learning?

We provide theoretical and empirical analysis of when proxy rewards can improve sample efficiency in preference learning. Our work establishes conditi…

Is Complex Query Answering Really Complex?

We challenge conventional wisdom about complex query answering by demonstrating that many supposedly complex queries can be solved with surprisingly s…

An Auditing Test to Detect Behavioral Shift in Language Models

We develop a comprehensive auditing framework to detect behavioral shifts in language models across different contexts and time periods. Our approach …
