Multilingual LLMs excel at zero-shot cross-lingual transfer, likely because they align languages without supervision from parallel sentences. This study uses intrinsic probing to measure the overlap between neurons that encode linguistic features across languages, and correlates that overlap with transfer performance. By examining BLOOM checkpoints across training steps and model scales, a strong link between neuron overlap and downstream performance is identified. The findings also reveal phases of pre-training in which alignment and multilingual abilities degrade, offering new insight into multilingual training dynamics.
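The overlap-to-performance correlation described above can be sketched as follows. This is a minimal illustration, not the paper's actual method: the probe importance scores, language count, neuron count, top-k cutoff, and transfer scores are all synthetic placeholders, and the overlap measure shown (Jaccard similarity of each language's top-k neurons) is one plausible choice of overlap statistic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: probe importance scores for 5 languages over
# 512 neurons (synthetic stand-ins, not real BLOOM probe outputs).
n_langs, n_neurons, top_k = 5, 512, 50
importance = rng.random((n_langs, n_neurons))

def top_neurons(scores, k):
    """Indices of the k neurons with the highest probe importance."""
    return set(np.argsort(scores)[-k:])

def jaccard(a, b):
    """Jaccard overlap between two neuron index sets."""
    return len(a & b) / len(a | b)

def spearman(x, y):
    """Spearman rank correlation (assumes no ties)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float(rx @ ry / np.sqrt((rx @ rx) * (ry @ ry)))

top = [top_neurons(importance[i], top_k) for i in range(n_langs)]

# Pairwise neuron overlap between a pivot language (index 0)
# and every other language.
overlaps = [jaccard(top[0], top[i]) for i in range(1, n_langs)]

# Hypothetical zero-shot transfer scores for the same language pairs.
transfer = rng.random(n_langs - 1)

rho = spearman(np.array(overlaps), transfer)
print(f"Spearman rho between overlap and transfer: {rho:.3f}")
```

In the study itself this statistic would be computed per checkpoint and per model scale, so the correlation can be tracked over the course of pre-training.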
©2024 Miniml Ltd. All rights reserved