MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly


We introduce MMLongBench, the first benchmark to cover a diverse set of long-context vision-language tasks, designed to evaluate long-context vision-language models (LCVLMs) effectively and thoroughly. MMLongBench comprises 13,331 examples spanning five categories of downstream tasks, including Visual RAG and Many-Shot ICL.

All examples are delivered at five standardized input lengths (8K–128K tokens). By benchmarking 46 closed-source and open-source LCVLMs, we provide a comprehensive analysis of current models' long-context vision-language ability. Our results show that both closed-source and open-source models struggle on long-context vision-language tasks, indicating substantial room for future improvement.
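Delivering every example at fixed lengths implies mapping each raw context onto one of five buckets. Below is a minimal sketch, assuming the lengths double from 8K to 128K tokens; the bucket values and the helper name are illustrative assumptions, not the benchmark's actual code:

```python
# Assumed standardized lengths: doubling from 8K to 128K tokens.
# (Illustrative only; MMLongBench's actual construction may differ.)
STANDARD_LENGTHS = (8_192, 16_384, 32_768, 65_536, 131_072)

def target_length(token_count: int) -> int:
    """Return the smallest standardized length that fits an example,
    capping at the largest bucket (128K)."""
    for length in STANDARD_LENGTHS:
        if token_count <= length:
            return length
    return STANDARD_LENGTHS[-1]
```

For example, an example with a 10,000-token context would be padded or filled out to the 16K bucket, while anything beyond 128K would be capped at the largest bucket.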
