AxiomLogicaSearch

mergekit vs TIES vs DARE vs SLERP: which model-merging method should you use in 2026?

At small scale, SLERP is clean for two-model interpolation and TIES/DARE handle multi-model interference better, while mergekit is the orchestration layer that exposes them all — but the best choice changes with model count, compatibility, and whether you want a simple blend or sign-aware pruning.

How We Compared mergekit, TIES, DARE, and SLERP

The current SERP for this question fragments answers across mergekit repository docs, community merge recipes, and individual method papers. This article consolidates those into a single decision framework keyed to the two variables that actually determine which method wins: how many models you're merging and how much interference exists between their task vectors.

mergekit is the orchestration layer — it is not a competing algorithm but the toolkit that exposes SLERP, TIES, DARE, MoE-style merging, and multi-stage workflows under one YAML-driven interface. The methods are the algorithms; mergekit is how you run them. Conflating the two is the most common source of confusion in this space.

| Criterion | SLERP | TIES | DARE | mergekit |
|---|---|---|---|---|
| Model count | 2 models only | 2+ models | 2+ models | All of the above |
| Interference handling | None (direct blend) | Magnitude trim + sign consensus | Random delta drop + rescaling | Method-dependent |
| Compatibility required | Same architecture | Same architecture, shared base model | Same architecture, shared base model | Same architecture |
| Merge complexity | Low | Medium | Medium–High | Low to High (workflow-dependent) |
| Primary use case | Simple interpolation | Multi-task conflict resolution | Sparsity-aware interference reduction | Orchestration + method chaining |

Decision criteria that matter in 2026

Four variables drive method fit: model count, sign conflicts between fine-tuned delta vectors, density settings when using sparsification-aware methods, and whether the target merge is a simple parameter blend or a sign-aware pruning operation.

mergekit's merge method guide covers all four dimensions. TIES resolves interference by "trimming low-magnitude changes in fine-tuned model's values and then resolves sign disagreements across the models being merged", meaning it first prunes small parameter updates and then drops the ones that conflict directionally before aggregating. In mergekit, that trim is controlled by a density value: the fraction of each model's delta parameters kept by magnitude. DARE uses the same density setting, configurable as a decimal (e.g., density: 0.6 in mergekit-multi recipes), but applies it differently: it randomly drops deltas so that roughly that fraction survives, then rescales the survivors. SLERP has no pruning stage; it spherically interpolates between two weight sets directly.

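| Variable | SLERP | TIES | DARE |
|---|---|---|---|
| Model count target | Exactly 2 | ≥2 | ≥2 |
| Sign-conflict resolution | No | Yes (sign consensus) | Only when paired with TIES (dare_ties) |
| Density/sparsity control | No | Yes (density sets the magnitude-trim fraction) | Yes (density sets the random keep probability) |
| Merge type | Simple blend | Sign-aware merge | Pruning + rescaling |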

What we excluded from the comparison

This article does not compare hardware minimums, deployment versions of Llama 3.1 70B or Qwen2.5 checkpoints, or container orchestration options. Those are operational concerns orthogonal to algorithm selection.

mergekit's multi-stage workflow system (mergekit-multi) "topologically sorts your merge configurations to determine the correct order of execution" and can "cache intermediate results for faster re-runs" — useful operational features, but not algorithm-selection criteria.

Pro Tip: If you're evaluating merge methods, separate algorithm choice from infrastructure choice. Whether you're running on a single A100 or an NVIDIA H100 cluster does not change whether TIES or SLERP is the right method for your task vector interference profile.


At a glance: when each method fits best

At a Glance: Two similar models with low task conflict → SLERP. Multiple models or overlapping task domains → TIES. High interference + need for sparsity control → DARE. Complex multi-stage pipelines → mergekit-multi with TIES or DARE.

SLERP is the right tool when you have exactly two models and their capabilities don't strongly conflict. TIES is the safer default the moment you add a third model or know the fine-tunes pull in different directions. DARE adds a sparsification step before the deltas are combined, randomly dropping and rescaling delta parameters. It earns its place when interference is high enough that even sign-consensus merging leaves too much noise in the combined weights.

Two-model interpolation with SLERP

SLERP ("spherical linear interpolation") operates on the surface of a high-dimensional hypersphere, preserving the norm of weight vectors through the blend. Research on geodesic merging on the Fisher–Rao manifold characterizes these as "two-model geodesics" and notes that SLERP "preserves norm on a hypersphere and often outperforms linear interpolation for two models". The EXTEND paper is explicit: "SLERP is tailored for the combination of two models."

| SLERP fit | Practical reading |
|---|---|
| Cleanest two-model blend | Best when you are interpolating between exactly two compatible models |
| Model-count limit | Breaks down as model count rises because sequential application becomes order-sensitive |

That constraint is structural, not incidental. SLERP's formulation assumes a single interpolation parameter t ∈ [0, 1] between two endpoints. Extending it to three or more models requires sequential application — which introduces order sensitivity and loses the sign-conflict awareness that TIES and DARE were designed to provide.
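To make the two-endpoint formulation concrete, here is a minimal Python sketch of SLERP over two flattened weight tensors. It is an illustration, not mergekit's code: the function name, the plain-list representation, and the fallback to linear interpolation when the vectors are nearly parallel (where sin Ω approaches zero) are our assumptions. Note that t is the only blend control, which is why a third model has no natural place in the formula.

```python
import math
from typing import List


def slerp(v0: List[float], v1: List[float], t: float, dot_threshold: float = 0.9995) -> List[float]:
    """Spherical linear interpolation between two flattened weight tensors.

    Illustrative sketch: falls back to linear interpolation when either vector
    is zero or the two are nearly (anti)parallel, where sin(omega) is close to 0.
    """
    if len(v0) != len(v1):
        raise ValueError("SLERP needs two tensors of the same shape")
    n0 = math.sqrt(sum(a * a for a in v0))
    n1 = math.sqrt(sum(b * b for b in v1))
    if n0 == 0.0 or n1 == 0.0:
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    # Cosine of the angle omega between the two weight vectors.
    cos_omega = sum(a * b for a, b in zip(v0, v1)) / (n0 * n1)
    cos_omega = max(-1.0, min(1.0, cos_omega))
    if abs(cos_omega) > dot_threshold:
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    omega = math.acos(cos_omega)
    s0 = math.sin((1 - t) * omega) / math.sin(omega)
    s1 = math.sin(t * omega) / math.sin(omega)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]


# Two orthogonal unit vectors blended halfway.
print([round(x, 4) for x in slerp([1.0, 0.0], [0.0, 1.0], 0.5)])  # [0.7071, 0.7071]
```

For these two unit vectors, linear interpolation at t = 0.5 would return [0.5, 0.5], with a norm of about 0.707; SLERP keeps the midpoint on the unit circle. Norm preservation in this sense holds when the two endpoints have equal norms.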

Multi-model merges with TIES

TIES is explicitly designed for merging more than two models. Its paper frames the problem as combining "multiple task-specific models into a single multitask model without performing additional training." The mechanism that makes this work at scale is the "sign consensus algorithm" implemented in mergekit: after trimming each model's low-magnitude deltas, TIES determines the dominant sign direction for each parameter across all merged models, zeros out the updates that disagree with it, and averages the survivors. This prevents opposing fine-tune updates from averaging each other into uselessness.
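The trim, elect, and disjoint-mean steps can be sketched in plain Python. This is a simplified illustration under stated assumptions, not mergekit's implementation: it uses one flat list per model, a single shared density for the magnitude trim, equal model weights, and no normalization options. The names ties_merge and trim_top_k are hypothetical.

```python
import math
from typing import List


def trim_top_k(delta: List[float], density: float) -> List[float]:
    """Keep the `density` fraction of entries with the largest magnitude; zero the rest."""
    k = int(density * len(delta))
    ranked = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
    keep = set(ranked[:k])
    return [d if i in keep else 0.0 for i, d in enumerate(delta)]


def ties_merge(base: List[float], finetuned: List[List[float]], density: float = 0.5) -> List[float]:
    # 1. Task vectors: each fine-tune's difference from the shared base.
    deltas = [[w - b for w, b in zip(model, base)] for model in finetuned]
    # 2. Trim: drop low-magnitude updates from every task vector.
    trimmed = [trim_top_k(d, density) for d in deltas]
    merged: List[float] = []
    for i, b in enumerate(base):
        column = [t[i] for t in trimmed]
        total = sum(column)
        if total == 0:
            merged.append(b)  # no surviving update for this parameter
            continue
        # 3. Elect sign: the direction carrying the most total mass wins.
        elected = math.copysign(1.0, total)
        # 4. Disjoint mean: average only the updates that agree with the elected sign.
        agreeing = [v for v in column if v != 0 and math.copysign(1.0, v) == elected]
        merged.append(b + sum(agreeing) / len(agreeing))
    return merged


print(ties_merge([0.0] * 4, [[0.5, -0.2, 0.0, 0.1], [0.3, 0.4, -0.6, 0.05], [-0.1, 0.3, 0.2, 0.0]]))
```

In the toy call, the second parameter receives surviving updates of -0.2, +0.4, and +0.3. The positive side carries more mass, so the -0.2 update is discarded and the two positive updates are averaged rather than being diluted by the negative one.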

| TIES vs SLERP | Practical reading |
|---|---|
| Sign conflicts | TIES handles sign conflicts directly; SLERP does not |
| Multi-task interference | TIES is better suited to multi-task interference than a naive interpolation |

The practical implication: when you merge a Mistral 7B coding fine-tune with a reasoning fine-tune and an instruction-following fine-tune, TIES actively arbitrates directional conflicts rather than blindly averaging them.

Multi-model pruning and density-aware merges with DARE

DARE ("Drop And REscale") "randomly drops delta parameters and rescales the remaining ones to preserve expectations." This is a fundamentally different operation from SLERP's interpolation or TIES's sign-consensus aggregation. DARE operates on the delta weights (the difference between a fine-tuned model and its base) rather than the full weight tensors, which means it can sparsify the contribution of each donor model before merging.
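A minimal sketch of the drop-and-rescale step on a single task vector, assuming one flat list of deltas and Python's random module as the source of randomness (both illustrative choices, not mergekit internals). A merge would then add each donor's rescaled deltas back onto the base, either linearly or after a TIES-style sign election.

```python
import random
from typing import List


def drop_and_rescale(delta: List[float], density: float, rng: random.Random) -> List[float]:
    """Sketch of DARE on one task vector (fine-tuned weights minus base weights)."""
    if not 0.0 < density <= 1.0:
        raise ValueError("density must be in (0, 1]")
    # Keep each entry with probability `density`, then divide survivors by
    # `density` so every entry keeps its original expected value.
    return [d / density if rng.random() < density else 0.0 for d in delta]


rng = random.Random(0)
print(drop_and_rescale([0.2, -0.4, 0.1, 0.3], density=0.5, rng=rng))
```

Each surviving entry comes back doubled at density 0.5; dropped entries become 0.0. Fixing the seed makes a run reproducible, which matters when comparing evals across density settings.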

| DARE behavior | Practical reading |
|---|---|
| Pruning and reweighting | DARE is pruning-and-reweighting, not interpolation |
| Where it wins | It outperforms naive interpolation when interference is high and you need sparsity control |

In mergekit recipes, including mergekit-multi YAML files, the density parameter (e.g., density: 0.6) controls what fraction of each model's delta parameters survive the drop phase. This lets practitioners tune interference reduction independently from the merge weights. SLERP has no equivalent knob; TIES in mergekit also accepts density, but applies it as a deterministic magnitude cut rather than a random drop followed by rescaling.

DARE is not universally better than SLERP. It is better specifically when the merged models carry enough parameter-level interference that a 30–50% delta dropout (density 0.5 to 0.7) actually reduces conflict without gutting the capability signal.


mergekit as the orchestration layer

mergekit does not compete with SLERP, TIES, or DARE — it implements all of them. The README describes "many merge methods," "Multi-Stage Merging (mergekit-multi)," and "Raw PyTorch Model Merging (mergekit-pytorch)" under one toolkit. The Merge Method Guide lists Linear, SLERP, TIES, DARE, and additional methods in a single reference.

| Capability | What it means in practice |
|---|---|
| SLERP support | Two-model spherical interpolation via YAML config |
| TIES support | Sign-consensus multi-model merging |
| DARE support | Density-parameterized delta pruning + rescaling |
| MoE-style merging | Assemble several dense models into a mixture-of-experts model (mergekit-moe) |
| mergekit-multi | Chain merge operations, handle dependencies, cache intermediates |
| mergekit-pytorch | Merge raw PyTorch model weights outside the Hugging Face Transformers architectures mergekit supports natively |

Why mergekit matters for resource-constrained merges

mergekit is "engineered to facilitate the straightforward application of both current and forthcoming model merging techniques" and explicitly supports "GPU or CPU execution" with "lazy loading of tensors for low memory use." For practitioners without access to multi-GPU nodes, CPU out-of-core merging means a 70B-parameter merge is feasible on a high-RAM workstation — it's slow, but it completes without an NVIDIA H100 in the loop.

Pro Tip: When your VRAM budget can't fit two copies of your model simultaneously, run mergekit with CPU execution and lazy tensor loading. A single Llama 3.1 70B checkpoint in bfloat16 is about 140 GB (70 billion parameters × 2 bytes), so eagerly loading two source models plus the merged output would need roughly three times that. mergekit's lazy loading reads tensors on demand, so peak memory tracks the tensors being merged at any given moment rather than the full checkpoints.
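The arithmetic behind that tip can be written as a rough estimator. It is a back-of-envelope sketch with stated assumptions: 2 bytes per bfloat16 parameter, a nominal 70B parameter count, an idealized lazy path that holds one tensor from each source plus its merged result, and, as the largest single tensor, a 128,256 × 8,192 embedding matrix assumed from Llama 3.1 70B's published configuration. Real runs add framework overhead on top of these figures.

```python
BYTES_PER_BF16 = 2


def gigabytes(num_params: float, bytes_per_param: int = BYTES_PER_BF16) -> float:
    """Memory for `num_params` parameters, in GB (1 GB = 10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


def eager_peak_gb(params_per_model: float, num_sources: int) -> float:
    """Every source checkpoint plus the merged output resident at once."""
    return gigabytes(params_per_model) * (num_sources + 1)


def lazy_peak_gb(largest_tensor_params: float, num_sources: int) -> float:
    """Idealized lazy path: one tensor per source plus its merged result."""
    return gigabytes(largest_tensor_params) * (num_sources + 1)


checkpoint = 70e9             # nominal parameter count
embedding = 128_256 * 8_192   # assumed largest tensor (vocab x hidden)
print(f"one checkpoint:   {gigabytes(checkpoint):.0f} GB")
print(f"eager, 2 sources: {eager_peak_gb(checkpoint, 2):.0f} GB")
print(f"lazy, 2 sources:  {lazy_peak_gb(embedding, 2):.1f} GB")
```

Under these assumptions the script reports 140 GB, 420 GB, and about 6.3 GB. The lazy figure is a floor rather than a promise, but the gap between it and the eager figure is what makes a high-RAM workstation viable.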

Where mergekit adds value beyond a single method

Calling SLERP or TIES directly as standalone scripts works for single-operation merges. The value of mergekit's multi-stage workflow system emerges when the merge itself is a pipeline: a DARE merge to reduce interference, followed by a SLERP blend of the result with a base model, followed by a final TIES merge with a third specialist.

| Feature | Standalone method call | mergekit-multi |
|---|---|---|
| Multi-step pipelines | Manual scripting | YAML-defined, dependency-resolved |
| Intermediate caching | Manual | Automatic |
| Method composition | Ad hoc | First-class workflow feature |
| Custom method extensions | Requires fork | Import via module discovery |
| Reproducibility | Script-dependent | Config-file portable |

mergekit-multi can "chain multiple merge operations together," "use outputs from previous merges as inputs to subsequent ones," and "automatically handle dependencies between merge steps." YAML-driven recipes make this reproducible and shareable on the Hugging Face Hub without custom scripting.
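The dependency handling described above can be illustrated with Python's standard-library graphlib. This is a conceptual sketch, not mergekit-multi's implementation: the stage names are hypothetical, run_stage stands in for an actual merge, and the cache is an in-memory dict rather than files on disk. It shows the two ideas the documentation names: execute stages in topological order, and skip any stage whose output is already cached.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

# Hypothetical pipeline mirroring the DARE -> SLERP -> TIES example in this section.
# Each key is a stage; its value is the set of stages whose outputs it consumes.
PIPELINE: Dict[str, Set[str]] = {
    "dare_specialists": set(),
    "slerp_with_base": {"dare_specialists"},
    "ties_final": {"slerp_with_base"},
}


def execution_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Order stages so each runs after its inputs; raises graphlib.CycleError on cycles."""
    return list(TopologicalSorter(deps).static_order())


def run_pipeline(
    deps: Dict[str, Set[str]],
    run_stage: Callable[[str, Dict[str, object]], object],
    cache: Dict[str, object],
) -> Dict[str, object]:
    for stage in execution_order(deps):
        if stage in cache:
            continue  # reuse an intermediate result from a previous run
        inputs = {dep: cache[dep] for dep in deps[stage]}
        cache[stage] = run_stage(stage, inputs)
    return cache


print(execution_order(PIPELINE))
```

Passing a cache that already holds "dare_specialists" makes a re-run execute only the two downstream stages, which is the behavior the "cache intermediate results for faster re-runs" feature describes.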


Method-by-method strengths and failure modes

| Method | Strengths | Failure modes | Ideal use case |
|---|---|---|---|
| SLERP | Norm-preserving blend, low complexity, clean two-model interpolation | No sign-conflict handling; capability washout with ≥3 models or dissimilar tasks | Fine-tune A + fine-tune B on the same task domain |
| TIES | Sign-consensus interference reduction, multi-model capable, retains individual strengths | Magnitude trim can discard small but task-relevant deltas; requires a shared base model | Multi-task merge with overlapping or conflicting fine-tune objectives |
| DARE | Density-tunable delta pruning, reduces interference at parameter level, composable with TIES | Over-pruning risk at low density; random drop adds run-to-run variation | High-interference multi-model merges where sign consensus alone isn't enough |

TIES and DARE share the same design objective: to "resolve interference when merging multiple models." The mechanisms differ. TIES uses magnitude trimming and sign consensus to decide which parameters survive, while DARE randomly drops delta parameters and rescales the rest. Neither is a simple weighted average.

When SLERP is the wrong choice

Watch Out: SLERP applied sequentially across three or more fine-tuned Mistral 7B or Qwen2.5 checkpoints with distinct task specializations will average opposing task vectors together. The result is a model that loses the sharpest edges of each fine-tune without TIES's sign arbitration to protect them. This "capability washout" appears in evals as mediocre-but-broad performance — the model functions but no longer excels at any of the constituent tasks.

SLERP is "tailored for the combination of two models" by design, not by convention. When you scale to three or more dissimilar fine-tunes, the sequential interpolation path introduces order-dependence and progressively dilutes signal from earlier-merged models.

When TIES is the safer default

TIES is the right default whenever sign conflicts are plausible — which is any multi-model merge where the fine-tunes were trained on different datasets or with different loss objectives. "Resolves sign disagreements across the models being merged" is the core guarantee, and it holds for any number of input models, not just pairs.

Pro Tip: When merging three or more models where you suspect task overlap (e.g., two reasoning fine-tunes plus a coding fine-tune of Llama 3.1), default to TIES before trying DARE. TIES's magnitude trim is deterministic, so your first merge attempt has one less source of variation to untangle when reading evals. Move to dare_ties only if TIES evals show residual interference.

When DARE earns its place

DARE "randomly drops delta parameters and rescales the remaining ones to preserve expectations" — the rescaling step is critical, because naive dropout without rescaling would shrink the effective magnitude of each model's contribution. The density control (density: 0.6 to retain 60% of each model's deltas) lets you explicitly trade contribution breadth for interference reduction.
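The rescaling argument is a one-line expectation. Writing p for density, δᵢ for one delta parameter, and mᵢ for its keep/drop indicator (our notation, chosen for illustration):

```latex
\begin{aligned}
\tilde{\delta}_i &= \frac{m_i}{p}\,\delta_i, \qquad m_i \sim \mathrm{Bernoulli}(p) \\
\mathbb{E}[\tilde{\delta}_i] &= p \cdot \frac{\delta_i}{p} + (1 - p) \cdot 0 = \delta_i \\
\mathbb{E}[m_i\,\delta_i] &= p\,\delta_i \quad \text{(without rescaling)}
\end{aligned}
```

With density: 0.6, dropping without rescaling would leave each model's expected contribution at 60% of its original size; dividing survivors by 0.6 restores it on average.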

Watch Out: Setting density below 0.4 risks over-pruning: you may drop parameter updates that carry real task signal, not just interference noise. Start at density: 0.7 and lower incrementally while monitoring task-specific evals. A model that scores well on MMLU after an aggressive DARE merge may have lost meaningful GSM8K or HumanEval capability that the delta dropout discarded.


Decision matrix for merge choice

| Scenario | Model count | Interference severity | Recommended method | mergekit config entry |
|---|---|---|---|---|
| Two similar fine-tunes, same domain | 2 | Low | SLERP | merge_method: slerp |
| Multiple fine-tunes, overlapping tasks | 3–6 | Medium | TIES | merge_method: ties |
| Multiple fine-tunes, high conflict | 3–6 | High | DARE with TIES sign consensus (dare_linear skips the sign step) | merge_method: dare_ties |
| Sequential / multi-stage merge pipeline | Any | Any | mergekit-multi workflow | YAML pipeline with method per stage |
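The matrix can be encoded as a small helper for scripting merge experiments. This is a sketch, not part of mergekit: the function name and the three interference labels are hypothetical, and "mergekit-multi" is returned as a workflow label, not a merge_method value. Cases outside the matrix (for example two models with medium interference, or more than six models) fall through to ties or dare_ties, which is our assumption based on TIES being the safer default whenever sign conflicts are plausible.

```python
VALID_LEVELS = ("low", "medium", "high")


def recommend_merge_method(model_count: int, interference: str, multi_stage: bool = False) -> str:
    """Map the decision matrix to a mergekit merge_method value (or a workflow label)."""
    if interference not in VALID_LEVELS:
        raise ValueError(f"interference must be one of {VALID_LEVELS}")
    if model_count < 2:
        raise ValueError("a merge needs at least two models")
    if multi_stage:
        return "mergekit-multi"  # choose a method per stage inside the workflow
    if model_count == 2 and interference == "low":
        return "slerp"
    if interference == "high":
        return "dare_ties"
    # Assumption: anything else (medium interference, or 3+ models) defaults to TIES.
    return "ties"


print(recommend_merge_method(4, "medium"))  # ties
```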

Choose SLERP when

SLERP is correct when you have exactly two models that share a base architecture, the fine-tunes target the same or adjacent domains, and you want the lowest-complexity merge path. Its interpolation "preserves norm on a hypersphere" and often outperforms linear interpolation in the two-model regime. If your merge is base model + one fine-tune with a single t parameter to tune, SLERP is the right call.

Choose TIES when

TIES is correct when model count exceeds two, when the fine-tunes were trained on different tasks or datasets, or when you have any reason to believe their delta parameters pull in conflicting directions. The "sign consensus algorithm" that TIES applies directly addresses the structural failure mode of SLERP at multi-model scale. TIES over SLERP is not a universal upgrade — it's a fit-for-purpose choice when "sign disagreements across the models being merged" are the problem.

Choose DARE when

DARE is correct when TIES alone doesn't sufficiently suppress interference, or when you specifically need to control the sparsity of each model's delta contribution. If your TIES merge still shows benchmark regressions on held-out tasks, DARE's density parameter gives you a lever to reduce parameter-level noise that sign consensus alone can't eliminate. DARE is not universally better than TIES: its drop is random, and because survivors are rescaled by 1/density, low density settings amplify whatever deltas remain.


Benchmark signals to validate your choice

No single canonical 2026 benchmark table exists comparing SLERP, TIES, and DARE head-to-head across MMLU, GSM8K, and HumanEval on a standardized model family. The mergekit paper establishes that "model merging facilitates the creation of multitask models without the need for additional training" but does not publish a unified leaderboard. Method-specific papers report results on their own experimental setups, which use different base models and fine-tune pairs.

The practical consequence: you cannot rely on published benchmark deltas to validate your specific merge. You must run your own evals.

| Eval target | What it measures post-merge | Why it matters for merge validation |
|---|---|---|
| MMLU (5-shot) | General knowledge breadth | Detects capability washout from aggressive merging |
| GSM8K | Step-by-step math reasoning | Sensitive to sign-conflict degradation in reasoning fine-tunes |
| HumanEval | Code generation correctness | Identifies when a coding fine-tune's delta got over-pruned by DARE |
| Domain-specific held-out set | Task your merge was built for | Ground truth for whether the merge achieved its objective |

What to measure in your own evals

Run evals on the individual source models before merging to establish per-task baselines. Then run the same eval suite on the merged model. The delta — positive or negative on each task — is your signal. mergekit's design goal of enabling "multitask models without the need for additional training" only holds if the merged model retains meaningful scores across all constituent tasks.

Prioritize task-specific evals over aggregate benchmarks. A merged Mistral 7B that scores 2 points higher on MMLU but 8 points lower on GSM8K has not been improved — it has been transformed into a different model with a different capability profile.
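A minimal sketch of that before/after comparison, assuming scores are already collected per task. The function name, the choice to compare against the best source model on each task, and the 1-point tolerance are illustrative assumptions; the scores in the example are hypothetical placeholders that mirror the MMLU/GSM8K scenario above, not measured results.

```python
from typing import Dict, List, Tuple


def merge_deltas(
    source_scores: Dict[str, Dict[str, float]],
    merged_scores: Dict[str, float],
    tolerance: float = 1.0,
) -> Tuple[Dict[str, float], List[str]]:
    """Compare a merged model against the best source model on every task.

    source_scores maps model name -> {task: score}; merged_scores maps task -> score.
    Every task in merged_scores must appear in at least one source model.
    Returns per-task deltas and the tasks that regressed by more than `tolerance` points.
    """
    deltas: Dict[str, float] = {}
    for task, merged in merged_scores.items():
        baseline = max(scores[task] for scores in source_scores.values() if task in scores)
        deltas[task] = merged - baseline
    regressions = [task for task, d in deltas.items() if d < -tolerance]
    return deltas, regressions


# Hypothetical placeholder scores, not real benchmark results.
sources = {
    "reasoning_ft": {"MMLU": 60.0, "GSM8K": 50.0},
    "coding_ft": {"MMLU": 58.0, "GSM8K": 35.0},
}
print(merge_deltas(sources, {"MMLU": 62.0, "GSM8K": 42.0}))
```

Comparing against the best source per task is deliberately strict; comparing against the specialist that was trained for each task is an equally reasonable baseline.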

How to read benchmark regressions

Pro Tip: A benchmark gain on one task after a TIES or DARE merge does not confirm the merge succeeded. Because both methods "resolve interference when merging multiple models" by selectively suppressing parameters, they can increase performance on the "winning" task while discarding signal from the "losing" task's deltas. Always evaluate the full task suite, not just the task you optimized for.

A regression on a disjoint task (coding drops while reasoning improves) is the canonical signal that sign consensus or delta dropping went too far in one direction. In mergekit, the per-model density and weight parameters are your levers for rebalancing under both ties and dare_ties.


People also ask about model merging

| Question | Answer | Source |
|---|---|---|
| What is the difference between SLERP and TIES? | SLERP spherically interpolates between exactly two models with no interference handling. TIES applies sign-consensus across ≥2 models to resolve directional conflicts between delta parameters. | arXiv:2603.04972, arXiv:2306.01708 |
| Can TIES merge more than two models? | Yes. TIES is designed specifically for "combining multiple task-specific models into a single multitask model." | arXiv:2306.01708 |
| Is DARE better than SLERP? | DARE is better than SLERP for multi-model merges with high interference. For simple two-model blends, SLERP is lower complexity with no hyperparameter to tune. Neither dominates universally. | arXiv:2403.13257, arXiv:2408.03092 |
| Which merge method is best for multiple models? | TIES is the safer default for ≥3 models. Add DARE when sign-consensus alone doesn't suppress interference sufficiently. | arXiv:2306.01708 |

What is mergekit used for in practice?

mergekit is "Tools for merging pretrained large language models" — specifically the toolkit that practitioners use to run SLERP, TIES, DARE, and multi-stage merge pipelines on local hardware without writing custom PyTorch merging scripts.

Pro Tip: mergekit's primary practical value for resource-constrained teams is removing the infrastructure overhead of model merging. You define the merge recipe in YAML, point mergekit at the source model directories or Hugging Face Hub paths, and run it — on CPU if necessary. No training loop, no CUDA kernel tuning, no bespoke gradient checkpointing.
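As an illustration of that workflow, the sketch below renders a minimal TIES recipe in the layout mergekit's README uses (a models list with per-model density and weight, then merge_method, base_model, and dtype) and builds the mergekit-yaml command line without running it. The model IDs and output path are hypothetical placeholders, and the helper names are ours. Omitting --cuda keeps execution on the CPU.

```python
from typing import List


def ties_config_yaml(base_model: str, models: List[str], density: float = 0.5, weight: float = 1.0) -> str:
    """Render a minimal TIES recipe as YAML text (one shared density/weight for brevity)."""
    lines = ["models:"]
    for model in models:
        lines += [
            f"  - model: {model}",
            "    parameters:",
            f"      density: {density}",
            f"      weight: {weight}",
        ]
    lines += [
        "merge_method: ties",
        f"base_model: {base_model}",
        "parameters:",
        "  normalize: true",
        "dtype: bfloat16",
    ]
    return "\n".join(lines) + "\n"


def mergekit_command(config_path: str, out_dir: str, use_gpu: bool = False) -> List[str]:
    """Argument list for the mergekit-yaml CLI; add --cuda only when a GPU is available."""
    cmd = ["mergekit-yaml", config_path, out_dir]
    if use_gpu:
        cmd.append("--cuda")
    return cmd


print(ties_config_yaml("example-org/base-7b", ["example-org/math-ft", "example-org/code-ft"]))
print(mergekit_command("ties.yml", "./merged"))
```

To run it, write the YAML text to ties.yml and pass the list to subprocess.run(cmd, check=True) on a machine where mergekit is installed.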

Does mergekit support CPU-only merging?

The mergekit README explicitly lists "GPU or CPU execution" and "lazy loading of tensors for low memory use" as supported operational modes. CPU execution is real and documented.

Watch Out: CPU support does not eliminate model-size constraints. Merging two 70B-parameter models in bfloat16 still requires enough system RAM to hold the tensors being processed at any given moment. Lazy loading reduces peak RAM by streaming tensors on demand, but it does not make the merge free, and a machine with 64 GB RAM may still struggle with 70B merges even with out-of-core loading. CPU merges are also much slower than GPU merges for large models.

Sources and references

Canonical references for merge methods

| Source | URL / identifier | What it covers |
|---|---|---|
| mergekit README | github.com/arcee-ai/mergekit | Toolkit overview, CPU/GPU support, feature list |
| mergekit Merge Method Guide | docs/merge_methods.md | SLERP, TIES, DARE, Linear, and additional method documentation |
| mergekit-multi docs | docs/multimerge.md | Multi-stage workflow chaining, dependency handling, density recipes |
| mergekit custom method guide | docs/create_a_merge_method.md | Extensibility via module discovery |
| TIES-Merging paper | arXiv:2306.01708 | Original TIES method, sign-consensus algorithm, multi-model interference framing |
| Arcee MergeKit paper | arXiv:2403.13257 | mergekit system design, DARE reference, multitask merge framing |
| DARE reference | arXiv:2403.13257 | Drop And REscale method description as summarized in the MergeKit paper (DARE itself was introduced in arXiv:2311.03099) |
| DARE review coverage | arXiv:2603.09938v2 | 2026 survey covering DARE delta-drop and rescaling behavior |
| SLERP geodesic merging | arXiv:2603.04972v1 | Fisher–Rao manifold framing, two-model geodesic characterization |
| EXTEND paper | arXiv:2408.03092 | "SLERP is tailored for the combination of two models" (direct citation) |
