
AI & ML

Everything about AI and machine learning: the latest articles and advances in the field.

All articles

How Megatron-LM handles tensor, pipeline, and sequence parallelism for large transformer training

Megatron-LM’s design composes tensor parallelism, pipeline parallelism, data parallelism, expert parallelism, and context/sequence parallelism inside Megatron Core so large transformers can be partitioned across GPUs without changing the model’s mathematical behavior — but the trade-off is added communication, scheduling complexity, and a need to balance activation recomputation against throughput.

25 min read
LLaMA Factory vs TRL for instruction tuning in 2026: when to choose each stack

LLaMA Factory packages a broader turnkey training surface — 100+ models, multiple fine-tuning and preference-tuning methods, and a zero-code UI/CLI — while TRL stays closer to the Hugging Face ecosystem and is better when you want a lighter, library-first SFT/PPO/DPO workflow; the right choice depends on how much orchestration you want to absorb yourself.

22 min read
How Qwen3-Coder-Next constructs tool chat templates for agentic SFT

Qwen-style tool templates encode tool calls and tool responses as explicit structured chat turns, which lets agentic SFT learn when to emit function calls versus natural language — but that same rigid structure makes tokenization, message ordering, and role boundaries critical to correctness.

24 min read
How to run multi-node fine-tuning with Axolotl using FSDP2 or torchrun over InfiniBand

Axolotl’s multi-node path works either through Accelerate/FSDP2 config or torchrun rendezvous, and for InfiniBand the docs explicitly recommend torchrun with NCCL_IB_DISABLE=0 and tuned NCCL_SOCKET_IFNAME/NCCL_BUFFSIZE settings — but every node must share the same Axolotl commit and config, and the launcher choice changes how you debug NCCL and rendezvous failures.

19 min read
SimPO paper explained: what changes when you drop the reference-log-ratio term

SimPO replaces the reference-log-ratio term with a reference-free reward, and the released repo reports stronger results than DPO variants on AlpacaEval 2, MT-Bench, and Arena-Hard. The authors also caution that performance depends heavily on learning-rate and beta tuning, so the method is not plug-and-play.

22 min read
OpenAI text-embedding-3-small vs BGE, E5, Voyage, Cohere, and Qwen3 Embedding for retrieval

In 2026, the main differentiators are not just benchmark averages but retrieval quality, multilingual coverage, dimensionality, and operational constraints — OpenAI text-embedding-3-small is the cost-effective default, Voyage is positioned for top retrieval accuracy, and BGE-M3 is the common self-hosted multilingual pick, but model choice is sticky because re-embedding an existing corpus is expensive.

22 min read
Inside ORPO: why monolithic preference optimization removes the reference model

ORPO’s monolithic objective folds supervised and preference learning into a single optimization path, removing the separate reference model used by DPO-style methods — which simplifies the training stack and can reduce orchestration overhead, but shifts more of the stability burden onto loss design and tuning.

23 min read
Should teams fine-tune with LoRA or buy a managed custom-model platform?

The economic break-even for self-managed LoRA usually depends less on adapter training cost than on ongoing platform labor, governance, and model-lifecycle overhead, so the cheapest per-token path can still be the most expensive operating model once staffing and reliability are counted.

21 min read
