mergekit vs TIES vs DARE vs SLERP: which model-merging method should you use in 2026?

At small scale, SLERP is clean for two-model interpolation and TIES/DARE handle multi-model interference better, while mergekit is the orchestration layer that exposes them all — but the best choice changes with model count, compatibility, and whether you want a simple blend or sign-aware pruning.

By AxiomLogica Editorial

May 18, 2026 · 17 min read

Reviewed by Editorial

How We Compared mergekit, TIES, DARE, and SLERP

Answers to this question are scattered across mergekit's repository docs, community merge recipes, and individual method papers. This article consolidates them into a single decision framework keyed to the two variables that actually determine which method wins: how many models you're merging and how much interference exists between their task vectors.

mergekit is the orchestration layer — it is not a competing algorithm but the toolkit that exposes SLERP, TIES, DARE, MoE-style merging, and multi-stage workflows under one YAML-driven interface. The methods are the algorithms; mergekit is how you run them. Conflating the two is the most common source of confusion in this space.

Criterion	SLERP	TIES	DARE	mergekit
Model count	2 models only	2+ models	2+ models	All of the above
Interference handling	None (naive blend)	Sign-consensus pruning	Delta-drop + rescaling	Method-dependent
Compatibility required	Same architecture	Same architecture	Same architecture	Same architecture
Merge complexity	Low	Medium	Medium–High	Low to High (workflow-dependent)
Primary use case	Simple interpolation	Multi-task conflict resolution	Sparsity-aware interference reduction	Orchestration + method chaining

Decision criteria that matter in 2026

Four variables drive method fit: model count, sign conflicts between fine-tuned delta vectors, density settings when using sparsification-aware methods, and whether the target merge is a simple parameter blend or a sign-aware pruning operation.

mergekit's Merge Method Guide covers all four dimensions. TIES resolves interference by "trimming low-magnitude changes in fine-tuned model's values and then resolves sign disagreements across the models being merged": it prunes small parameter updates, then drops the ones that conflict directionally before aggregating. DARE reaches sparsity a different way: its density setting, configurable as a decimal (e.g., density: 0.6 in mergekit-multi recipes), governs what fraction of delta parameters survive a random drop before rescaling. SLERP has no pruning stage; it spherically interpolates between two weight sets directly.

Variable	SLERP	TIES	DARE
Model count target	Exactly 2	≥2	≥2
Sign-conflict resolution	No	Yes (sign consensus)	Only when paired with sign consensus (dare_ties)
Density/sparsity control	No	Yes (magnitude-based trim)	Yes (random drop + rescale)
Merge type	Simple blend	Sign-aware merge	Pruning + rescaling

What we excluded from the comparison

This article does not compare hardware minimums, deployment versions of Llama 3.1 70B or Qwen2.5 checkpoints, or container orchestration options. Those are operational concerns orthogonal to algorithm selection.

mergekit's multi-stage workflow system (mergekit-multi) "topologically sorts your merge configurations to determine the correct order of execution" and can "cache intermediate results for faster re-runs" — useful operational features, but not algorithm-selection criteria.

Pro Tip: If you're evaluating merge methods, separate algorithm choice from infrastructure choice. Whether you're running on a single A100 or an NVIDIA H100 cluster does not change whether TIES or SLERP is the right method for your task vector interference profile.

At a glance: when each method fits best

At a Glance: Two similar models with low task conflict → SLERP. Multiple models or overlapping task domains → TIES. High interference + need for sparsity control → DARE. Complex multi-stage pipelines → mergekit-multi with TIES or DARE.

SLERP is the right tool when you have exactly two models and their capabilities don't strongly conflict. TIES is the safer default the moment you add a third model or know the fine-tunes pull in different directions. DARE adds a sparsification step on top of the merge that actively drops and rescales delta parameters — it earns its place when interference is high enough that even sign-consensus merging leaves too much noise in the combined weights.

Two-model interpolation with SLERP

SLERP ("spherical linear interpolation") operates on the surface of a high-dimensional hypersphere, preserving the norm of weight vectors through the blend. Research on geodesic merging on the Fisher–Rao manifold characterizes these as "two-model geodesics" and notes that SLERP "preserves norm on a hypersphere and often outperforms linear interpolation for two models". The EXTEND paper is explicit: "SLERP is tailored for the combination of two models."

SLERP fit	Practical reading
Cleanest two-model blend	Best when you are interpolating between exactly two compatible models
Model-count limit	Breaks down as model count rises because sequential application becomes order-sensitive

That constraint is structural, not incidental. SLERP's formulation assumes a single interpolation parameter t ∈ [0, 1] between two endpoints. Extending it to three or more models requires sequential application — which introduces order sensitivity and loses the sign-conflict awareness that TIES and DARE were designed to provide.
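The single-parameter interpolation described above can be sketched in a few lines. This is a minimal illustration on flat Python lists, not mergekit's implementation; the function name `slerp`, the `eps` cutoff, and the two toy vectors are assumptions made for the example, and both inputs are assumed to be nonzero.

```python
import math

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two nonzero flat weight vectors."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    # Angle between the two vectors, clamped to guard against rounding error.
    cos_omega = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    omega = math.acos(max(-1.0, min(1.0, cos_omega)))
    sin_omega = math.sin(omega)
    if abs(sin_omega) < eps:
        # Nearly parallel vectors: fall back to plain linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    w0 = math.sin((1 - t) * omega) / sin_omega
    w1 = math.sin(t * omega) / sin_omega
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

# Toy unit vectors standing in for two models' weights (hypothetical values).
a, b = [1.0, 0.0], [0.0, 1.0]
midpoint = slerp(0.5, a, b)
linear_midpoint = [0.5 * x + 0.5 * y for x, y in zip(a, b)]
print(norm(midpoint), norm(linear_midpoint))
```

For these two unit vectors the SLERP midpoint stays on the unit circle, while the linear midpoint shrinks toward the origin. That shrinkage is the effect norm preservation is meant to avoid.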

Multi-model merges with TIES

TIES is explicitly designed for merging more than two models. Its paper frames the problem as combining "multiple task-specific models into a single multitask model without performing additional training." The mechanism that makes this work at scale is the "sign consensus algorithm" implemented in mergekit: for each parameter, TIES determines the dominant sign direction across all merged models, zeros out the updates that disagree, then aggregates the survivors. This prevents opposing fine-tune updates from averaging each other into uselessness.

TIES vs SLERP	Practical reading
Sign conflicts	TIES handles sign conflicts directly; SLERP does not
Multi-task interference	TIES is better suited to multi-task interference than a naive interpolation

The practical implication: when you merge a Mistral 7B coding fine-tune with a reasoning fine-tune and an instruction-following fine-tune, TIES actively arbitrates directional conflicts rather than blindly averaging them.
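As a rough illustration of that arbitration, the sketch below runs a TIES-style trim, sign election, and disjoint mean over flat Python lists. It is a simplified reading of the method, not mergekit's code: the function name `ties_merge`, the toy weights, and the `density=0.5` setting are hypothetical, and per-model merge weights are left out.

```python
def ties_merge(base, finetuned, density=0.2):
    """TIES-style merge of flat weight lists: trim, elect sign, disjoint mean."""
    # Task vectors: what each fine-tune changed relative to the base.
    deltas = [[w - b for w, b in zip(model, base)] for model in finetuned]

    # 1. Trim: keep only the top `density` fraction of each delta by magnitude.
    trimmed = []
    for delta in deltas:
        keep = max(1, round(density * len(delta)))
        threshold = sorted((abs(d) for d in delta), reverse=True)[keep - 1]
        trimmed.append([d if abs(d) >= threshold else 0.0 for d in delta])

    merged = []
    for i, b in enumerate(base):
        column = [delta[i] for delta in trimmed]
        # 2. Elect sign: the direction with the larger total magnitude wins.
        elected = 1.0 if sum(column) >= 0 else -1.0
        # 3. Disjoint mean: average only the deltas that agree with that sign.
        agreeing = [d for d in column if d * elected > 0]
        merged.append(b + (sum(agreeing) / len(agreeing) if agreeing else 0.0))
    return merged

# Hypothetical four-parameter "models" sharing one base.
base = [0.0, 0.0, 0.0, 0.0]
coder = [0.9, 0.5, 0.01, 0.0]
reasoner = [-0.3, 0.4, 0.02, 0.0]
instruct = [0.6, -0.01, 0.7, 0.0]

merged = ties_merge(base, [coder, reasoner, instruct], density=0.5)
naive_first = (coder[0] + reasoner[0] + instruct[0]) / 3
print(merged, naive_first)
```

On the first parameter the reasoning fine-tune pulls negative while the other two pull positive. The sketch drops the dissenting update and averages the two that agree, whereas a plain mean lets the negative value drag the result down.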

Multi-model pruning and density-aware merges with DARE

DARE ("Drop And REscale") "randomly drops delta parameters and rescales the remaining ones to preserve expectations." This is a fundamentally different operation from SLERP's interpolation or TIES's sign-consensus aggregation. DARE operates on the delta weights (the difference between a fine-tuned model and its base) rather than the full weight tensors, which means it can sparsify the contribution of each donor model before merging.
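The drop-and-rescale step can be checked numerically with a small sketch. The function name `dare`, the seeded generator, and the constant toy delta are assumptions for illustration; real implementations operate on tensors, not Python lists.

```python
import random

def dare(delta, density, rng):
    """Keep each delta entry with probability `density`; rescale survivors by 1/density."""
    if not 0.0 < density <= 1.0:
        raise ValueError("density must be in (0, 1]")
    return [d / density if rng.random() < density else 0.0 for d in delta]

rng = random.Random(0)          # seeded so the run is repeatable
delta = [0.01] * 100_000        # hypothetical task vector with constant entries
sparse = dare(delta, density=0.6, rng=rng)

kept_fraction = sum(1 for d in sparse if d != 0.0) / len(sparse)
mean_before = sum(delta) / len(delta)
mean_after = sum(sparse) / len(sparse)
print(kept_fraction, mean_before, mean_after)
```

Roughly 60% of the entries survive, and because each survivor is divided by the density, the mean of the sparse delta stays close to the mean of the original. That is the sense in which rescaling preserves expectations.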

DARE behavior	Practical reading
Pruning and reweighting	DARE is pruning-and-reweighting, not interpolation
Where it wins	It outperforms naive interpolation when interference is high and you need sparsity control

In mergekit-multi YAML recipes, the density parameter (e.g., density: 0.6) controls what fraction of each model's delta parameters survives the drop phase. This lets practitioners tune the interference reduction independently from the merge weights. SLERP has no equivalent knob; TIES also accepts a density setting, but applies it as a magnitude-based trim rather than a random drop.
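To make the density knob concrete, here is a hedged sketch of a DARE-plus-TIES recipe built as a Python dict and printed as JSON, which YAML parsers generally accept. The field names follow mergekit's config schema as best understood here and should be checked against the Merge Method Guide; the model identifiers are hypothetical placeholders, and `check_densities` is a helper invented for this example.

```python
import json

# Hypothetical model identifiers; replace with real checkpoints sharing one base.
config = {
    "merge_method": "dare_ties",
    "base_model": "example-org/base-7b",
    "models": [
        {"model": "example-org/coder-7b",
         "parameters": {"density": 0.6, "weight": 0.5}},
        {"model": "example-org/reasoner-7b",
         "parameters": {"density": 0.6, "weight": 0.5}},
    ],
    "dtype": "bfloat16",
}

def check_densities(cfg):
    """Raise if any per-model density falls outside (0, 1]."""
    for entry in cfg["models"]:
        density = entry["parameters"]["density"]
        if not 0.0 < density <= 1.0:
            raise ValueError(f"density {density} out of range for {entry['model']}")
    return True

check_densities(config)
print(json.dumps(config, indent=2))
```

Keeping density and weight as separate per-model entries mirrors the point above: one controls how much of each delta survives, the other how strongly the survivor counts in the merge.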

DARE is not universally better than SLERP. It is better specifically when the merged models carry enough parameter-level interference that a 30–50% delta dropout actually reduces conflict without gutting the capability signal.

mergekit as the orchestration layer

mergekit does not compete with SLERP, TIES, or DARE — it implements all of them. The README describes "many merge methods," "Multi-Stage Merging (mergekit-multi)," and "Raw PyTorch Model Merging (mergekit-pytorch)" under one toolkit. The Merge Method Guide lists Linear, SLERP, TIES, DARE, and additional methods in a single reference.

Capability	What it means in practice
SLERP support	Two-model spherical interpolation via YAML config
TIES support	Sign-consensus multi-model merging
DARE support	Density-parameterized delta pruning + rescaling
MoE-style merging	Route layers to different expert sources
mergekit-multi	Chain merge operations, handle dependencies, cache intermediates
mergekit-pytorch	Direct PyTorch tensor manipulation for custom methods

Why mergekit matters for resource-constrained merges

mergekit is "engineered to facilitate the straightforward application of both current and forthcoming model merging techniques" and explicitly supports "GPU or CPU execution" with "lazy loading of tensors for low memory use." For practitioners without access to multi-GPU nodes, CPU out-of-core merging means a 70B-parameter merge is feasible on a high-RAM workstation — it's slow, but it completes without an NVIDIA H100 in the loop.

Pro Tip: When your VRAM budget can't fit two copies of your model simultaneously, run mergekit with CPU execution and lazy tensor loading. Each Llama 3.1 70B checkpoint in bfloat16 holds roughly 140 GB of weights (about 70B parameters at 2 bytes each), but mergekit's out-of-core path can stream tensors layer by layer without requiring that full allocation at once.
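The memory figure is simple arithmetic, sketched here under the assumption of a round 70 billion parameters and 2 bytes per bfloat16 value. It counts weights only and ignores any working buffers.

```python
def weight_gigabytes(num_params, bytes_per_param):
    """Size of a checkpoint's weights in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9

PARAMS_70B = 70_000_000_000   # rounded parameter count (assumption)
BF16_BYTES = 2                # bfloat16 stores each value in 2 bytes

per_checkpoint_gb = weight_gigabytes(PARAMS_70B, BF16_BYTES)
two_inputs_gb = 2 * per_checkpoint_gb
print(per_checkpoint_gb, two_inputs_gb)  # 140.0 280.0
```

Holding both input checkpoints fully in memory would therefore need about 280 GB before the output is counted, which is why lazy, layer-by-layer loading matters on a single workstation.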

Where mergekit adds value beyond a single method

Calling SLERP or TIES directly as standalone scripts works for single-operation merges. The value of mergekit's multi-stage workflow system emerges when the merge itself is a pipeline: a DARE merge to reduce interference, followed by a SLERP blend of the result with a base model, followed by a final TIES merge with a third specialist.

Feature	Standalone method call	mergekit-multi
Multi-step pipelines	Manual scripting	YAML-defined, dependency-resolved
Intermediate caching	Manual	Automatic
Method composition	Ad hoc	First-class workflow feature
Custom method extensions	Requires fork	Import via module discovery
Reproducibility	Script-dependent	Config-file portable

mergekit-multi can "chain multiple merge operations together," "use outputs from previous merges as inputs to subsequent ones," and "automatically handle dependencies between merge steps." YAML-driven recipes make this reproducible and shareable on the Hugging Face Hub without custom scripting.
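The dependency handling and caching described here can be pictured with Python's standard `graphlib` module. This is an illustration of the idea only, not mergekit-multi's actual code: the stage names, the `run_stage` callback, and the dict-based cache are all hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key is a merge stage, each value the stages it needs.
stages = {
    "dare_specialists": set(),
    "slerp_with_base": {"dare_specialists"},
    "ties_final": {"slerp_with_base"},
}

def run_pipeline(stages, run_stage, cache=None):
    """Run stages in dependency order, skipping any already present in the cache."""
    cache = {} if cache is None else cache
    for name in TopologicalSorter(stages).static_order():
        if name not in cache:
            inputs = {dep: cache[dep] for dep in stages.get(name, ())}
            cache[name] = run_stage(name, inputs)
    return cache

executed = []

def fake_merge(name, inputs):
    # Stand-in for a real merge: record the call and return a label.
    executed.append(name)
    return f"{name}(" + ",".join(sorted(inputs)) + ")"

order = list(TopologicalSorter(stages).static_order())
results = run_pipeline(stages, fake_merge)
# Re-running with the populated cache performs no further merges.
run_pipeline(stages, fake_merge, cache=results)
print(order, executed)
```

Declaring dependencies as data, then sorting them, is what lets a config file replace hand-ordered scripts. The cache check is the part that makes re-runs cheap.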

Method-by-method strengths and failure modes

Method	Strengths	Failure modes	Ideal use case
SLERP	Norm-preserving blend, low complexity, clean two-model interpolation	No sign-conflict handling; capability washout with ≥3 models or dissimilar tasks	Fine-tune A + fine-tune B on the same task domain
TIES	Sign-consensus interference reduction, multi-model capable, retains individual strengths	Sign election can discard minority-direction signal; trim is magnitude-based only	Multi-task merge with overlapping or conflicting fine-tune objectives
DARE	Density-tunable delta pruning, reduces interference at parameter level, composable with TIES	Over-pruning risk at low density; added hyperparameter to tune	High-interference multi-model merges where sign-consensus alone isn't enough

TIES and DARE share one design objective: to "resolve interference when merging multiple models." The mechanism differs: TIES uses sign consensus to decide which parameters survive, while DARE randomly drops delta parameters and rescales the rest. Neither is a simple weighted average.

When SLERP is the wrong choice

Watch Out: SLERP applied sequentially across three or more fine-tuned Mistral 7B or Qwen2.5 checkpoints with distinct task specializations will average opposing task vectors together. The result is a model that loses the sharpest edges of each fine-tune without TIES's sign arbitration to protect them. This "capability washout" appears in evals as mediocre-but-broad performance — the model functions but no longer excels at any of the constituent tasks.

SLERP is "tailored for the combination of two models" by design, not by convention. When you scale to three or more dissimilar fine-tunes, the sequential interpolation path introduces order-dependence and progressively dilutes signal from earlier-merged models.
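A toy calculation makes the order-dependence visible. The sketch below chains a minimal `slerp` over three orthogonal unit vectors standing in for three fine-tunes; the vectors, the names, and the fixed t = 0.5 are assumptions chosen so the numbers are easy to follow.

```python
import math

def slerp(t, v0, v1):
    """Spherical interpolation between two nonzero, non-parallel vectors."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    cos_omega = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    omega = math.acos(max(-1.0, min(1.0, cos_omega)))
    w0 = math.sin((1 - t) * omega) / math.sin(omega)
    w1 = math.sin(t * omega) / math.sin(omega)
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]

# Hypothetical task directions for three fine-tunes.
coder, reasoner, chat = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]

# Same three models, same t, two different merge orders.
chat_last = slerp(0.5, slerp(0.5, coder, reasoner), chat)
reasoner_last = slerp(0.5, slerp(0.5, coder, chat), reasoner)
print(chat_last, reasoner_last)
```

The two results differ, and in each case the model merged last keeps the largest component (about 0.71 against 0.5 for the two merged first). Sequential SLERP gives no order-independent way to weight three models equally.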

When TIES is the safer default

TIES is the right default whenever sign conflicts are plausible — which is any multi-model merge where the fine-tunes were trained on different datasets or with different loss objectives. "Resolves sign disagreements across the models being merged" is the core guarantee, and it holds for any number of input models, not just pairs.

Pro Tip: When merging three or more models where you suspect task overlap (e.g., two reasoning fine-tunes plus a coding fine-tune of Llama 3.1), default to TIES before trying DARE. TIES's magnitude-based trim is deterministic, so the same config always produces the same merge, which makes a first attempt easier to reason about than DARE's random drop. Add DARE on top only if TIES evals show residual interference.

When DARE earns its place

DARE "randomly drops delta parameters and rescales the remaining ones to preserve expectations" — the rescaling step is critical, because naive dropout without rescaling would shrink the effective magnitude of each model's contribution. The density control (density: 0.6 to retain 60% of each model's deltas) lets you explicitly trade contribution breadth for interference reduction.

Watch Out: Setting density below 0.4 risks over-pruning: you may drop parameter updates that carry real task signal, not just interference noise. Start at density: 0.7 and lower incrementally while monitoring task-specific evals. A model that scores well on MMLU after an aggressive DARE merge may have lost meaningful GSM8K or HumanEval capability that the delta dropout discarded.

Decision matrix for merge choice

Scenario	Model count	Interference severity	Recommended method	mergekit config entry
Two similar fine-tunes, same domain	2	Low	SLERP	`merge_method: slerp`
Multiple fine-tunes, overlapping tasks	3–6	Medium	TIES	`merge_method: ties`
Multiple fine-tunes, high conflict	3–6	High	DARE (+ optional TIES)	`merge_method: dare_ties`
Sequential / multi-stage merge pipeline	Any	Any	mergekit-multi workflow	YAML pipeline with method per stage
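The matrix above can be read as a small lookup function. This sketch encodes the four rows directly; how it treats cases the table does not list (for example two models with high interference) is an assumption, as are the function and argument names.

```python
def recommend_merge_method(model_count, interference, multi_stage=False):
    """Map the decision matrix to a mergekit `merge_method` value.

    `interference` is one of "low", "medium", "high".
    """
    if model_count < 2:
        raise ValueError("merging needs at least two models")
    if interference not in {"low", "medium", "high"}:
        raise ValueError("interference must be 'low', 'medium', or 'high'")
    if multi_stage:
        # Pipelines pick a method per stage rather than one global method.
        return "mergekit-multi workflow"
    if model_count == 2 and interference == "low":
        return "slerp"
    if interference == "high":
        return "dare_ties"
    # Assumption: anything else defaults to TIES, the article's safer default.
    return "ties"

print(recommend_merge_method(2, "low"))      # slerp
print(recommend_merge_method(4, "medium"))   # ties
print(recommend_merge_method(4, "high"))     # dare_ties
```

A function like this is only a starting point; the eval results should decide whether the first recommendation stands.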

Choose SLERP when

SLERP is correct when you have exactly two models that share a base architecture, the fine-tunes target the same or adjacent domains, and you want the lowest-complexity merge path. Its norm-preserving interpolation "preserves norm on a hypersphere" and often outperforms linear interpolation in the two-model regime. If your merge is base-model + one fine-tune with a single t parameter to tune, SLERP is the right call.

Choose TIES when

TIES is correct when model count exceeds two, when the fine-tunes were trained on different tasks or datasets, or when you have any reason to believe their delta parameters pull in conflicting directions. The "sign consensus algorithm" that TIES applies directly addresses the structural failure mode of SLERP at multi-model scale. TIES over SLERP is not a universal upgrade — it's a fit-for-purpose choice when "sign disagreements across the models being merged" are the problem.

Choose DARE when

DARE is correct when TIES alone doesn't sufficiently suppress interference, or when you specifically need to control the sparsity of each model's delta contribution. If your TIES merge still shows benchmark regressions on held-out tasks, DARE's density parameter gives you a lever to reduce parameter-level noise that sign-consensus alone can't eliminate. DARE is not universally better than TIES: its random drop adds a stochastic element, and its density value has to be tuned against your own evals.

Benchmark signals to validate your choice

No single canonical 2026 benchmark table exists comparing SLERP, TIES, and DARE head-to-head across MMLU, GSM8K, and HumanEval on a standardized model family. The mergekit paper establishes that "model merging facilitates the creation of multitask models without the need for additional training" but does not publish a unified leaderboard. Method-specific papers report results on their own experimental setups, which use different base models and fine-tune pairs.

The practical consequence: you cannot rely on published benchmark deltas to validate your specific merge. You must run your own evals.

Eval target	What it measures post-merge	Why it matters for merge validation
MMLU (5-shot)	General knowledge breadth	Detects capability washout from aggressive merging
GSM8K	Step-by-step math reasoning	Sensitive to sign-conflict degradation in reasoning fine-tunes
HumanEval	Code generation correctness	Identifies when a coding fine-tune's delta got over-pruned by DARE
Domain-specific held-out set	Task your merge was built for	Ground truth for whether the merge achieved its objective

What to measure in your own evals

Run evals on the individual source models before merging to establish per-task baselines. Then run the same eval suite on the merged model. The delta — positive or negative on each task — is your signal. mergekit's design goal of enabling "multitask models without the need for additional training" only holds if the merged model retains meaningful scores across all constituent tasks.

Prioritize task-specific evals over aggregate benchmarks. A merged Mistral 7B that scores 2 points higher on MMLU but 8 points lower on GSM8K has not been improved — it has been transformed into a different model with a different capability profile.
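The before-and-after comparison can be sketched as a per-task delta report. All scores below are made-up placeholders that mirror the 2-point gain and 8-point loss in the example above, not measured results; the function names and the 1-point tolerance are likewise assumptions.

```python
def task_deltas(baseline, merged):
    """Per-task score change of the merged model against the source baseline."""
    return {task: round(merged[task] - baseline[task], 2) for task in baseline}

def regressions(deltas, tolerance=1.0):
    """Tasks whose score dropped by more than `tolerance` points."""
    return sorted(task for task, change in deltas.items() if change < -tolerance)

# Placeholder scores (not real results): best source-model score per task,
# then the merged model's score on the same eval suite.
baseline = {"mmlu": 60.0, "gsm8k": 50.0, "humaneval": 30.0}
merged = {"mmlu": 62.0, "gsm8k": 42.0, "humaneval": 30.5}

deltas = task_deltas(baseline, merged)
print(deltas)
print(regressions(deltas))
```

Reporting every task side by side keeps a headline gain on one benchmark from hiding a loss on another.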

How to read benchmark regressions

Pro Tip: A benchmark gain on one task after a TIES or DARE merge does not confirm the merge succeeded. Because both methods "resolve interference when merging multiple models" by selectively suppressing parameters, they can increase performance on the "winning" task while discarding signal from the "losing" task's deltas. Always evaluate the full task suite, not just the task you optimized for.

A regression on a disjoint task (coding drops while reasoning improves) is the canonical signal that sign-consensus or delta-dropping went too far in one direction. DARE's density parameter is your primary lever for rebalancing; TIES's task weight parameters serve a similar function.

Question	Answer	Source
What is the difference between SLERP and TIES?	SLERP spherically interpolates between exactly two models with no interference handling. TIES applies sign-consensus across ≥2 models to resolve directional conflicts between delta parameters.	arXiv:2603.04972, arXiv:2306.01708
Can TIES merge more than two models?	Yes. TIES is designed specifically for "combining multiple task-specific models into a single multitask model."	arXiv:2306.01708
Is DARE better than SLERP?	DARE is better than SLERP for multi-model merges with high interference. For simple two-model blends, SLERP is lower complexity with no hyperparameter to tune. Neither dominates universally.	arXiv:2403.13257, arXiv:2408.03092
Which merge method is best for multiple models?	TIES is the safer default for ≥3 models. Add DARE when sign-consensus alone doesn't suppress interference sufficiently.	arXiv:2306.01708

Sources and references

Canonical references for merge methods

Source	URL	What it covers
mergekit README	github.com/arcee-ai/mergekit	Toolkit overview, CPU/GPU support, feature list
mergekit Merge Method Guide	docs/merge_methods.md	SLERP, TIES, DARE, Linear, and additional method documentation
mergekit-multi docs	docs/multimerge.md	Multi-stage workflow chaining, dependency handling, density recipes
mergekit custom method guide	docs/create_a_merge_method.md	Extensibility via module discovery
TIES-Merging paper	arXiv:2306.01708	Original TIES method, sign-consensus algorithm, multi-model interference framing
Arcee MergeKit paper	arXiv:2403.13257	mergekit system design, DARE reference, multitask merge framing
DARE reference	arXiv:2403.13257	Drop And REscale method description
DARE review coverage	arXiv:2603.09938v2	2026 survey covering DARE delta-drop and rescaling behavior
SLERP geodesic merging	arXiv:2603.04972v1	Fisher–Rao manifold framing, two-model geodesic characterization
EXTEND paper	arXiv:2408.03092	"SLERP is tailored for the combination of two models" — direct citation

