Robust low-rank training via approximate orthonormal constraints

As models and datasets grow, pruning techniques using low-rank matrix factorizations have become popular for reducing resource demands while maintaining accuracy. However, we find that these methods often degrade robustness against adversarial attacks due to exploding singular values in the low-rank matrices.
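A standard Lipschitz argument (not specific to this paper) makes the link between singular values and adversarial robustness explicit: for a feed-forward network $f$ with weight matrices $W_1, \dots, W_L$ and 1-Lipschitz activations, an input perturbation $\delta$ can change the output by at most

$$\|f(x + \delta) - f(x)\| \;\le\; \Big(\prod_{i=1}^{L} \sigma_{\max}(W_i)\Big)\, \|\delta\|,$$

so any factorization that inflates the largest singular values $\sigma_{\max}(W_i)$ loosens this bound and makes the network more sensitive to adversarial perturbations.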

To address this, we propose a robust low-rank training algorithm that keeps weights on the low-rank manifold while enforcing approximate orthonormal constraints. This approach reduces both training and inference costs while improving model conditioning and adversarial robustness, all without compromising accuracy. Our theoretical analysis and experiments confirm that robust low-rank networks closely approximate the performance of their full-rank counterparts whenever effective low-rank sub-networks are available.
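To make the general idea concrete, here is a minimal sketch of a factorized layer with a soft orthonormality penalty, assuming a PyTorch-style implementation; the class name `LowRankLinear`, the method `orthonormality_penalty`, and the penalty weight are illustrative choices, not the paper's exact algorithm (which maintains the factors on the low-rank manifold during training rather than only penalizing deviations).

```python
import torch
import torch.nn as nn


class LowRankLinear(nn.Module):
    """Linear layer with weights factorized as W = U @ diag(s) @ V.T (rank r).

    A soft penalty keeps U and V approximately orthonormal, so the layer's
    singular values are controlled by the entries of `s` and the layer stays
    well conditioned. Illustrative sketch only.
    """

    def __init__(self, in_features: int, out_features: int, rank: int):
        super().__init__()
        self.U = nn.Parameter(torch.randn(out_features, rank) / out_features ** 0.5)
        self.s = nn.Parameter(torch.ones(rank))
        self.V = nn.Parameter(torch.randn(in_features, rank) / in_features ** 0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x @ (U diag(s) V^T)^T, computed in the rank-r subspace:
        # project onto V, scale by s, then map back through U.
        return ((x @ self.V) * self.s) @ self.U.T

    def orthonormality_penalty(self) -> torch.Tensor:
        # ||U^T U - I||_F^2 + ||V^T V - I||_F^2: zero iff the columns of
        # U and V are orthonormal.
        eye = torch.eye(self.s.numel(), device=self.s.device)
        return ((self.U.T @ self.U - eye) ** 2).sum() + \
               ((self.V.T @ self.V - eye) ** 2).sum()


# Usage: add the penalty to the task loss so training keeps the factors
# close to orthonormal frames (the hyperparameter 1e-2 is arbitrary here).
layer = LowRankLinear(in_features=512, out_features=256, rank=32)
x = torch.randn(8, 512)
y = layer(x)
loss = y.pow(2).mean() + 1e-2 * layer.orthonormality_penalty()
loss.backward()
```

With orthonormal $U$ and $V$, the singular values of the layer are exactly the entries of $s$, which is what keeps the factorization from amplifying perturbations.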
