How We Compared mergekit, TIES, DARE, and SLERP
Search results for this question scatter answers across mergekit repository docs, community merge recipes, and individual method papers. This article consolidates those into a single decision framework keyed to the two variables that actually determine which method wins: how many models you're merging and how much interference exists between their task vectors.
mergekit is the orchestration layer — it is not a competing algorithm but the toolkit that exposes SLERP, TIES, DARE, MoE-style merging, and multi-stage workflows under one YAML-driven interface. The methods are the algorithms; mergekit is how you run them. Conflating the two is the most common source of confusion in this space.
| Criterion | SLERP | TIES | DARE | mergekit |
|---|---|---|---|---|
| Model count | 2 models only | 2+ models | 2+ models | All of the above |
| Interference handling | None (naive blend) | Sign-consensus pruning | Delta-drop + rescaling | Method-dependent |
| Compatibility required | Same architecture | Same architecture | Same architecture | Same architecture |
| Merge complexity | Low | Medium | Medium–High | Low to High (workflow-dependent) |
| Primary use case | Simple interpolation | Multi-task conflict resolution | Sparsity-aware interference reduction | Orchestration + method chaining |
Decision criteria that matter in 2026
Four variables drive method fit: model count, sign conflicts between fine-tuned delta vectors, density settings when using sparsification-aware methods, and whether the target merge is a simple parameter blend or a sign-aware pruning operation.
mergekit's Merge Method Guide covers all four dimensions. TIES resolves interference by "trimming low-magnitude changes in fine-tuned model's values and then resolves sign disagreements across the models being merged", meaning it actively prunes parameter updates that conflict directionally before aggregating. DARE's main control is density, configurable as a decimal (e.g., density: 0.6 in mergekit-multi recipes), which governs what fraction of delta parameters survive the random drop before rescaling. SLERP has no pruning stage; it spherically interpolates between two weight sets directly.
| Variable | SLERP | TIES | DARE |
|---|---|---|---|
| Model count target | Exactly 2 | ≥2 | ≥2 |
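| Sign-conflict resolution | No | Yes (sign consensus) | Indirect (random drop reduces overlap; sign consensus only in the dare_ties variant) |
| Density/sparsity control | No | Yes (magnitude-based trim via density) | Yes (random drop via density) |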
| Merge type | Simple blend | Sign-aware merge | Pruning + rescaling |
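One rough way to gauge the "sign conflicts between fine-tuned delta vectors" variable is to count how often two task vectors disagree in direction. The sketch below is illustrative only: `task_vector` and `sign_conflict_rate` are hypothetical helper names, and the weights are assumed to be flattened into plain Python lists rather than real model tensors.

```python
def task_vector(finetuned, base):
    """Delta between a fine-tuned model's weights and its base (flat lists)."""
    if len(finetuned) != len(base):
        raise ValueError("weight vectors must have the same length")
    return [f - b for f, b in zip(finetuned, base)]


def sign_conflict_rate(delta_a, delta_b, eps=1e-8):
    """Fraction of jointly active parameters whose deltas have opposite signs.

    A parameter counts as active only when both deltas exceed `eps` in
    magnitude, so untouched weights do not dilute the rate.
    """
    if len(delta_a) != len(delta_b):
        raise ValueError("delta vectors must have the same length")
    active = 0
    conflicts = 0
    for a, b in zip(delta_a, delta_b):
        if abs(a) <= eps or abs(b) <= eps:
            continue  # at least one model left this parameter unchanged
        active += 1
        if (a > 0) != (b > 0):
            conflicts += 1
    return conflicts / active if active else 0.0
```

A low rate on a two-model merge points toward a plain blend; a high rate suggests a sign-aware method is worth the extra configuration. Where the cutoff sits is a judgment call that the source material does not specify.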
What we excluded from the comparison
This article does not compare hardware minimums, deployment versions of Llama 3.1 70B or Qwen2.5 checkpoints, or container orchestration options. Those are operational concerns orthogonal to algorithm selection.
mergekit's multi-stage workflow system (mergekit-multi) "topologically sorts your merge configurations to determine the correct order of execution" and can "cache intermediate results for faster re-runs" — useful operational features, but not algorithm-selection criteria.
Pro Tip: If you're evaluating merge methods, separate algorithm choice from infrastructure choice. Whether you're running on a single A100 or an NVIDIA H100 cluster does not change whether TIES or SLERP is the right method for your task vector interference profile.
At a glance: when each method fits best
At a Glance: Two similar models with low task conflict → SLERP. Multiple models or overlapping task domains → TIES. High interference + need for sparsity control → DARE. Complex multi-stage pipelines → mergekit-multi with TIES or DARE.
SLERP is the right tool when you have exactly two models and their capabilities don't strongly conflict. TIES is the safer default the moment you add a third model or know the fine-tunes pull in different directions. DARE adds a sparsification step on top of the merge that actively drops and rescales delta parameters — it earns its place when interference is high enough that even sign-consensus merging leaves too much noise in the combined weights.
Two-model interpolation with SLERP
SLERP ("spherical linear interpolation") operates on the surface of a high-dimensional hypersphere, preserving the norm of weight vectors through the blend. Research on geodesic merging on the Fisher–Rao manifold characterizes these as "two-model geodesics" and notes that SLERP "preserves norm on a hypersphere and often outperforms linear interpolation for two models". The EXTEND paper is explicit: "SLERP is tailored for the combination of two models."
| SLERP fit | Practical reading |
|---|---|
| Cleanest two-model blend | Best when you are interpolating between exactly two compatible models |
| Model-count limit | Breaks down as model count rises because sequential application becomes order-sensitive |
That constraint is structural, not incidental. SLERP's formulation assumes a single interpolation parameter t ∈ [0, 1] between two endpoints. Extending it to three or more models requires sequential application — which introduces order sensitivity and loses the sign-conflict awareness that TIES and DARE were designed to provide.
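To make the single-parameter formulation and the order-sensitivity point concrete, here is a minimal sketch of SLERP on flat weight vectors using only the standard library. It follows the textbook formula and is not claimed to match mergekit's implementation; the three toy vectors are invented for illustration.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two flat vectors, t in [0, 1]."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    if norm0 < eps or norm1 < eps:
        # Angle is undefined for a zero vector: fall back to linear blend.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    cos_omega = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_omega = max(-1.0, min(1.0, cos_omega))  # guard against rounding
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if sin_omega < eps:
        # Nearly colinear vectors: the spherical formula is unstable here.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s0 = math.sin((1 - t) * omega) / sin_omega
    s1 = math.sin(t * omega) / sin_omega
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]


if __name__ == "__main__":
    a, b, c = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
    # Sequential application to three models: the result depends on order.
    print(slerp(0.5, slerp(0.5, a, b), c))
    print(slerp(0.5, slerp(0.5, b, c), a))
```

The two printed vectors differ because whichever model enters last keeps the largest share of the blend, which is the order sensitivity described above.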
Multi-model merges with TIES
TIES is explicitly designed for merging more than two models. Its paper frames the problem as combining "multiple task-specific models into a single multitask model without performing additional training." The mechanism that makes this work at scale is the "sign consensus algorithm" implemented in mergekit: for each parameter, TIES determines the dominant sign direction across all merged models and zeros out parameters that disagree, then aggregates the survivors. This prevents opposing fine-tune updates from averaging each other into uselessness.
| TIES vs SLERP | Practical reading |
|---|---|
| Sign conflicts | TIES handles sign conflicts directly; SLERP does not |
| Multi-task interference | TIES is better suited to multi-task interference than a naive interpolation |
The practical implication: when you merge a Mistral 7B coding fine-tune with a reasoning fine-tune and an instruction-following fine-tune, TIES actively arbitrates directional conflicts rather than blindly averaging them.
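The trim, sign-election, and aggregation steps can be sketched on toy data as follows. This is a simplified illustration operating on flat lists of deltas (fine-tuned minus base), not mergekit's code; `ties_merge` and its `density` argument are names chosen for this example.

```python
def ties_merge(deltas, density=0.5):
    """Merge flat task vectors: trim by magnitude, elect a sign, average agreers.

    deltas:  list of equal-length lists, one per fine-tuned model
    density: fraction of each delta kept by magnitude (ties may keep more)
    """
    n = len(deltas[0])
    if any(len(d) != n for d in deltas):
        raise ValueError("all deltas must have the same length")
    k = max(1, int(round(density * n)))

    # Step 1: trim. Zero everything below each model's k-th largest magnitude.
    trimmed = []
    for d in deltas:
        threshold = sorted((abs(x) for x in d), reverse=True)[k - 1]
        trimmed.append([x if abs(x) >= threshold else 0.0 for x in d])

    merged = []
    for i in range(n):
        column = [t[i] for t in trimmed]
        # Step 2: elect sign. The direction with more total magnitude wins.
        total = sum(column)
        if total == 0:
            merged.append(0.0)
            continue
        sign = 1.0 if total > 0 else -1.0
        # Step 3: average only the entries that agree with the elected sign.
        agreeing = [x for x in column if x * sign > 0]
        merged.append(sum(agreeing) / len(agreeing))
    return merged


if __name__ == "__main__":
    coding = [0.9, -0.1, 0.4, 0.0]
    reasoning = [-0.6, 0.8, 0.1, 0.05]
    instruct = [0.7, 0.6, -0.02, 0.0]
    print(ties_merge([coding, reasoning, instruct], density=0.5))
```

In the first position, the negative update from the second model is excluded rather than averaged in, so the two agreeing models keep their full magnitude. The merged delta would then be scaled and added back to the base weights.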
Multi-model pruning and density-aware merges with DARE
DARE ("Drop And REscale") "randomly drops delta parameters and rescales the remaining ones to preserve expectations." This is a fundamentally different operation from SLERP's interpolation or TIES's sign-consensus aggregation. DARE operates on the delta weights (the difference between a fine-tuned model and its base) rather than the full weight tensors, which means it can sparsify the contribution of each donor model before merging.
| DARE behavior | Practical reading |
|---|---|
| Pruning and reweighting | DARE is pruning-and-reweighting, not interpolation |
| Where it wins | It outperforms naive interpolation when interference is high and you need sparsity control |
In mergekit-multi YAML recipes, the density parameter (e.g., density: 0.6) controls what fraction of each model's delta parameters survive the drop phase. This lets practitioners tune the interference reduction independently from the merge weights, a control knob that SLERP does not expose. mergekit's ties method also accepts density, but there it sets the fraction of deltas kept by magnitude-based trimming rather than by random dropout.
DARE is not universally better than SLERP. It is better specifically when the merged models carry enough parameter-level interference that a 30–50% delta dropout actually reduces conflict without gutting the capability signal.
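The drop-and-rescale step itself is small enough to sketch directly. The version below works on a flat list of delta values and assumes `density` is the keep probability, matching how the parameter is described above; `dare` is a hypothetical function name, not a mergekit API.

```python
import random


def dare(delta, density, rng=None):
    """Randomly drop delta entries, then rescale survivors by 1 / density.

    Each entry is kept with probability `density`. Dividing survivors by
    `density` keeps the expected value of every entry equal to its original.
    """
    if not 0.0 < density <= 1.0:
        raise ValueError("density must be in (0, 1]")
    rng = rng or random.Random()
    return [x / density if rng.random() < density else 0.0 for x in delta]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    sparse = dare([1.0] * 10_000, density=0.6, rng=rng)
    kept = sum(1 for v in sparse if v != 0.0) / len(sparse)
    print(f"kept fraction: {kept:.3f}, mean after rescale: {sum(sparse) / len(sparse):.3f}")
```

Without the division by `density`, the mean of the sparsified delta would shrink toward `density` times the original, which is why the rescale step matters.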
mergekit as the orchestration layer
mergekit does not compete with SLERP, TIES, or DARE — it implements all of them. The README describes "many merge methods," "Multi-Stage Merging (mergekit-multi)," and "Raw PyTorch Model Merging (mergekit-pytorch)" under one toolkit. The Merge Method Guide lists Linear, SLERP, TIES, DARE, and additional methods in a single reference.
| Capability | What it means in practice |
|---|---|
| SLERP support | Two-model spherical interpolation via YAML config |
| TIES support | Sign-consensus multi-model merging |
| DARE support | Density-parameterized delta pruning + rescaling |
| MoE-style merging | Route layers to different expert sources |
| mergekit-multi | Chain merge operations, handle dependencies, cache intermediates |
| mergekit-pytorch | Direct PyTorch tensor manipulation for custom methods |
Why mergekit matters for resource-constrained merges
mergekit is "engineered to facilitate the straightforward application of both current and forthcoming model merging techniques" and explicitly supports "GPU or CPU execution" with "lazy loading of tensors for low memory use." For practitioners without access to multi-GPU nodes, CPU out-of-core merging means a 70B-parameter merge is feasible on a high-RAM workstation — it's slow, but it completes without an NVIDIA H100 in the loop.
Pro Tip: When your VRAM budget can't fit two copies of your model simultaneously, run mergekit with CPU execution and lazy tensor loading. Each Llama 3.1 70B checkpoint in bfloat16 holds roughly 140 GB of weights (about 70B parameters at 2 bytes each), but mergekit's out-of-core path can stream tensors layer by layer instead of requiring that full allocation at once.
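The ~140 GB figure is simple arithmetic: parameter count times bytes per parameter. A small sketch, assuming a nominal 70 billion parameters and decimal gigabytes; the helper name and dtype table are illustrative and cover only the dtypes listed.

```python
# Bytes per parameter for a few common dtypes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def checkpoint_gb(num_params, dtype="bfloat16"):
    """Approximate size of a checkpoint's weights in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    one_model = checkpoint_gb(70_000_000_000, "bfloat16")
    print(f"one 70B bf16 checkpoint: {one_model:.0f} GB")
    # Holding two inputs plus the output fully in memory would need about
    # three times that, which is the case lazy loading is meant to avoid.
    print(f"two inputs + one output, all resident: {3 * one_model:.0f} GB")
```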
Where mergekit adds value beyond a single method
Calling SLERP or TIES directly as standalone scripts works for single-operation merges. The value of mergekit's multi-stage workflow system emerges when the merge itself is a pipeline: a DARE merge to reduce interference, followed by a SLERP blend of the result with a base model, followed by a final TIES merge with a third specialist.
| Feature | Standalone method call | mergekit-multi |
|---|---|---|
| Multi-step pipelines | Manual scripting | YAML-defined, dependency-resolved |
| Intermediate caching | Manual | Automatic |
| Method composition | Ad hoc | First-class workflow feature |
| Custom method extensions | Requires fork | Import via module discovery |
| Reproducibility | Script-dependent | Config-file portable |
mergekit-multi can "chain multiple merge operations together," "use outputs from previous merges as inputs to subsequent ones," and "automatically handle dependencies between merge steps." YAML-driven recipes make this reproducible and shareable on the Hugging Face Hub without custom scripting.
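The dependency-ordering idea can be illustrated with Python's standard `graphlib` module. This is a conceptual sketch of topological ordering for the three-stage pipeline described earlier, not mergekit-multi's internals; the stage names are hypothetical.

```python
from graphlib import TopologicalSorter

# Each stage maps to the stages whose outputs it consumes (hypothetical names).
STAGES = {
    "dare_stage": [],               # DARE merge of the raw fine-tunes
    "slerp_stage": ["dare_stage"],  # SLERP blend of that result with the base
    "ties_stage": ["slerp_stage"],  # final TIES merge with a third specialist
}


def execution_order(stage_inputs):
    """Return stage names ordered so every stage runs after its inputs."""
    return list(TopologicalSorter(stage_inputs).static_order())


if __name__ == "__main__":
    for step, name in enumerate(execution_order(STAGES), start=1):
        print(step, name)
```

If two stages depended on each other, `static_order()` would raise `graphlib.CycleError`, which is the kind of configuration mistake a dependency-resolving runner can catch before any tensors are loaded.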
Method-by-method strengths and failure modes
| Method | Strengths | Failure modes | Ideal use case |
|---|---|---|---|
| SLERP | Norm-preserving blend, low complexity, clean two-model interpolation | No sign-conflict handling; capability washout with ≥3 models or dissimilar tasks | Fine-tune A + fine-tune B on the same task domain |
| TIES | Sign-consensus interference reduction, multi-model capable, retains individual strengths | Trim is magnitude-based only, with no random drop or rescale step; the majority sign can override a minority model's updates | Multi-task merge with overlapping or conflicting fine-tune objectives |
| DARE | Density-tunable delta pruning, reduces interference at parameter level, composable with TIES | Over-pruning risk at low density; added hyperparameter to tune | High-interference multi-model merges where sign-consensus alone isn't enough |
TIES and DARE both target "resolve interference when merging multiple models" as their design objective. The mechanism differs: TIES uses sign consensus to decide which parameters survive, while DARE uses stochastic dropout of delta parameters before rescaling. Neither is a simple weighted average.
When SLERP is the wrong choice
Watch Out: SLERP applied sequentially across three or more fine-tuned Mistral 7B or Qwen2.5 checkpoints with distinct task specializations will average opposing task vectors together. The result is a model that loses the sharpest edges of each fine-tune without TIES's sign arbitration to protect them. This "capability washout" appears in evals as mediocre-but-broad performance — the model functions but no longer excels at any of the constituent tasks.
SLERP is "tailored for the combination of two models" by design, not by convention. When you scale to three or more dissimilar fine-tunes, the sequential interpolation path introduces order-dependence and progressively dilutes signal from earlier-merged models.
When TIES is the safer default
TIES is the right default whenever sign conflicts are plausible — which is any multi-model merge where the fine-tunes were trained on different datasets or with different loss objectives. "Resolves sign disagreements across the models being merged" is the core guarantee, and it holds for any number of input models, not just pairs.
Pro Tip: When merging three or more models where you suspect task overlap (e.g., two reasoning fine-tunes plus a coding fine-tune of Llama 3.1), default to TIES before trying DARE. TIES trims deltas deterministically by magnitude rather than dropping them at random, which makes your first merge attempt easier to reproduce and reason about. Add DARE on top only if TIES evals show residual interference.
When DARE earns its place
DARE "randomly drops delta parameters and rescales the remaining ones to preserve expectations" — the rescaling step is critical, because naive dropout without rescaling would shrink the effective magnitude of each model's contribution. The density control (density: 0.6 to retain 60% of each model's deltas) lets you explicitly trade contribution breadth for interference reduction.
Watch Out: Setting density below 0.4 risks over-pruning: you may drop parameter updates that carry real task signal, not just interference noise. Start at density: 0.7 and lower incrementally while monitoring task-specific evals. A model that scores well on MMLU after an aggressive DARE merge may have lost meaningful GSM8K or HumanEval capability that the delta dropout discarded.
Decision matrix for merge choice
| Scenario | Model count | Interference severity | Recommended method | mergekit config entry |
|---|---|---|---|---|
| Two similar fine-tunes, same domain | 2 | Low | SLERP | merge_method: slerp |
| Multiple fine-tunes, overlapping tasks | 3–6 | Medium | TIES | merge_method: ties |
| Multiple fine-tunes, high conflict | 3–6 | High | DARE (+ optional TIES) | merge_method: dare_ties |
| Sequential / multi-stage merge pipeline | Any | Any | mergekit-multi workflow | YAML pipeline with method per stage |
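The first three rows of the matrix can be written as a small lookup function. This is a sketch of the article's heuristic, not a mergekit feature. The handling of cases the table does not list (for example, two models with high interference) is an assumption that follows the "TIES when conflicts are plausible" guidance.

```python
def recommend_method(model_count, interference):
    """Map model count and interference level to a merge_method string.

    interference: "low", "medium", or "high" (a subjective rating).
    Returns one of "slerp", "ties", or "dare_ties".
    """
    if model_count < 2:
        raise ValueError("merging needs at least two models")
    if interference not in ("low", "medium", "high"):
        raise ValueError("interference must be 'low', 'medium', or 'high'")
    if model_count == 2 and interference == "low":
        return "slerp"
    if interference == "high":
        return "dare_ties"
    # Assumption: any remaining case (3+ models, or 2 with medium conflict)
    # falls back to the sign-aware default.
    return "ties"


if __name__ == "__main__":
    print(recommend_method(2, "low"))
    print(recommend_method(4, "medium"))
    print(recommend_method(4, "high"))
```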
Choose SLERP when
SLERP is correct when you have exactly two models that share a base architecture, the fine-tunes target the same or adjacent domains, and you want the lowest-complexity merge path. Its norm-preserving interpolation "preserves norm on a hypersphere" and often outperforms linear interpolation in the two-model regime. If your merge is base-model + one fine-tune with a single t parameter to tune, SLERP is the right call.
Choose TIES when
TIES is correct when model count exceeds two, when the fine-tunes were trained on different tasks or datasets, or when you have any reason to believe their delta parameters pull in conflicting directions. The "sign consensus algorithm" that TIES applies directly addresses the structural failure mode of SLERP at multi-model scale. TIES over SLERP is not a universal upgrade — it's a fit-for-purpose choice when "sign disagreements across the models being merged" are the problem.
Choose DARE when
DARE is correct when TIES alone doesn't sufficiently suppress interference, or when you specifically need to control the sparsity of each model's delta contribution. If your TIES merge still shows benchmark regressions on held-out tasks, DARE's density parameter gives you a lever to reduce parameter-level noise that sign-consensus alone can't eliminate. DARE is not universally better than TIES: its random drop makes results depend on both the density you choose and the sampled mask, which adds tuning and variance that magnitude-based trimming avoids.
Benchmark signals to validate your choice
No single canonical 2026 benchmark table exists comparing SLERP, TIES, and DARE head-to-head across MMLU, GSM8K, and HumanEval on a standardized model family. The mergekit paper establishes that "model merging facilitates the creation of multitask models without the need for additional training" but does not publish a unified leaderboard. Method-specific papers report results on their own experimental setups, which use different base models and fine-tune pairs.
The practical consequence: you cannot rely on published benchmark deltas to validate your specific merge. You must run your own evals.
| Eval target | What it measures post-merge | Why it matters for merge validation |
|---|---|---|
| MMLU (5-shot) | General knowledge breadth | Detects capability washout from aggressive merging |
| GSM8K | Step-by-step math reasoning | Sensitive to sign-conflict degradation in reasoning fine-tunes |
| HumanEval | Code generation correctness | Identifies when a coding fine-tune's delta got over-pruned by DARE |
| Domain-specific held-out set | Task your merge was built for | Ground truth for whether the merge achieved its objective |
What to measure in your own evals
Run evals on the individual source models before merging to establish per-task baselines. Then run the same eval suite on the merged model. The delta — positive or negative on each task — is your signal. mergekit's design goal of enabling "multitask models without the need for additional training" only holds if the merged model retains meaningful scores across all constituent tasks.
Prioritize task-specific evals over aggregate benchmarks. A merged Mistral 7B that scores 2 points higher on MMLU but 8 points lower on GSM8K has not been improved — it has been transformed into a different model with a different capability profile.
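A per-task comparison like the one described can be scripted in a few lines. The sketch below assumes you already have scores as plain dictionaries keyed by task name. The numbers are made up to mirror the "+2 MMLU, -8 GSM8K" example, and all function names are hypothetical.

```python
def per_task_baseline(source_scores):
    """Best score per task across the individual source models."""
    tasks = source_scores[0].keys()
    return {task: max(scores[task] for scores in source_scores) for task in tasks}


def eval_deltas(baseline, merged):
    """Merged-model score minus the baseline, per task."""
    return {task: merged[task] - baseline[task] for task in baseline}


def regressions(deltas, tolerance=1.0):
    """Tasks where the merged model fell more than `tolerance` points."""
    return sorted(task for task, d in deltas.items() if d < -tolerance)


if __name__ == "__main__":
    # Illustrative numbers only, not measured results.
    sources = [
        {"mmlu": 62.0, "gsm8k": 40.0},  # reasoning fine-tune
        {"mmlu": 60.0, "gsm8k": 31.0},  # coding fine-tune
    ]
    merged = {"mmlu": 64.0, "gsm8k": 32.0}
    deltas = eval_deltas(per_task_baseline(sources), merged)
    print(deltas)
    print("regressed:", regressions(deltas))
```

Comparing against the best source model per task is a strict baseline; comparing against the mean of the sources would be a looser alternative.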
How to read benchmark regressions
Pro Tip: A benchmark gain on one task after a TIES or DARE merge does not confirm the merge succeeded. Because both methods "resolve interference when merging multiple models" by selectively suppressing parameters, they can increase performance on the "winning" task while discarding signal from the "losing" task's deltas. Always evaluate the full task suite, not just the task you optimized for.
A regression on a disjoint task (coding drops while reasoning improves) is the canonical signal that sign-consensus or delta-dropping went too far in one direction. DARE's density parameter is your primary lever for rebalancing; TIES's task weight parameters serve a similar function.
People also ask about model merging
| Question | Answer | Source |
|---|---|---|
| What is the difference between SLERP and TIES? | SLERP spherically interpolates between exactly two models with no interference handling. TIES applies sign-consensus across ≥2 models to resolve directional conflicts between delta parameters. | arXiv:2603.04972, arXiv:2306.01708 |
| Can TIES merge more than two models? | Yes. TIES is designed specifically for "combining multiple task-specific models into a single multitask model." | arXiv:2306.01708 |
| Is DARE better than SLERP? | DARE is better than SLERP for multi-model merges with high interference. For simple two-model blends, SLERP is lower complexity with no hyperparameter to tune. Neither dominates universally. | arXiv:2403.13257, arXiv:2408.03092 |
| Which merge method is best for multiple models? | TIES is the safer default for ≥3 models. Add DARE when sign-consensus alone doesn't suppress interference sufficiently. | arXiv:2306.01708 |
What is mergekit used for in practice?
mergekit is "Tools for merging pretrained large language models" — specifically the toolkit that practitioners use to run SLERP, TIES, DARE, and multi-stage merge pipelines on local hardware without writing custom PyTorch merging scripts.
Pro Tip: mergekit's primary practical value for resource-constrained teams is removing the infrastructure overhead of model merging. You define the merge recipe in YAML, point mergekit at the source model directories or Hugging Face Hub paths, and run it — on CPU if necessary. No training loop, no CUDA kernel tuning, no bespoke gradient checkpointing.
Does mergekit support CPU-only merging?
The mergekit README explicitly lists "GPU or CPU execution" and "lazy loading of tensors for low memory use" as supported operational modes. CPU execution is real and documented.
Watch Out: CPU support does not eliminate model-size constraints. Merging two 70B-parameter models in bfloat16 still requires enough system RAM to hold the tensors being processed at any given moment. Lazy loading reduces peak RAM by streaming layer-by-layer, but it does not make the merge free. A machine with 64 GB RAM will struggle with 70B merges even with out-of-core loading. CPU merges are also typically much slower than GPU merges for large models.
Sources and references
Canonical references for merge methods
| Source | URL | What it covers |
|---|---|---|
| mergekit README | github.com/arcee-ai/mergekit | Toolkit overview, CPU/GPU support, feature list |
| mergekit Merge Method Guide | docs/merge_methods.md | SLERP, TIES, DARE, Linear, and additional method documentation |
| mergekit-multi docs | docs/multimerge.md | Multi-stage workflow chaining, dependency handling, density recipes |
| mergekit custom method guide | docs/create_a_merge_method.md | Extensibility via module discovery |
| TIES-Merging paper | arXiv:2306.01708 | Original TIES method, sign-consensus algorithm, multi-model interference framing |
| Arcee MergeKit paper | arXiv:2403.13257 | mergekit system design, DARE reference, multitask merge framing |
| DARE reference | arXiv:2403.13257 | Drop And REscale method description |
| DARE review coverage | arXiv:2603.09938v2 | 2026 survey covering DARE delta-drop and rescaling behavior |
| SLERP geodesic merging | arXiv:2603.04972v1 | Fisher–Rao manifold framing, two-model geodesic characterization |
| EXTEND paper | arXiv:2408.03092 | "SLERP is tailored for the combination of two models" — direct citation |