Model merging at scale: what the latest benchmarks say about base-model quality and expert count
Recent large-scale merging results suggest that stronger base models and larger model sizes make merging easier, and that merging more expert checkpoints can improve zero-shot generalization — but the gains flatten across methods at larger scales, so method choice matters less than base quality and expert count.