Liger-Kernel is a collection of Triton kernels developed by LinkedIn, designed specifically for training large language models (LLMs). The project can increase multi-GPU training throughput by roughly 20% and reduce memory usage by about 60%. The name "Liger" stands for "LinkedIn GPU Efficient Runtime," reflecting its core goal of an efficient GPU runtime.
The project supports various mainstream large language model architectures, including Llama, Mistral, Mixtral, Gemma, Qwen2, and Phi3.
The project implements various optimized kernel operations:

- `LigerRMSNorm`: RMS Normalization
- `LigerLayerNorm`: Layer Normalization
- `liger_rotary_pos_emb`: Rotary Position Embedding (RoPE)
- `LigerSwiGLUMLP`: SwiGLU Activation Function
- `LigerGEGLUMLP`: GeGLU Activation Function
- `LigerCrossEntropyLoss`: Cross-Entropy Loss
- `LigerFusedLinearCrossEntropyLoss`: Fused Linear Cross-Entropy Loss

It also supports various alignment and preference optimization loss functions:

- `LigerFusedLinearDPOLoss`: DPO Loss
- `LigerFusedLinearORPOLoss`: ORPO Loss
- `LigerFusedLinearCPOLoss`: CPO Loss
- `LigerFusedLinearSimPOLoss`: SimPO Loss
- `LigerFusedLinearKTOLoss`: KTO Loss

The simplest way to use Liger-Kernel is the `AutoLigerKernelForCausalLM` class, which loads a model and applies the matching Liger kernels automatically when the model type is supported:

```python
from liger_kernel.transformers import AutoLigerKernelForCausalLM

model = AutoLigerKernelForCausalLM.from_pretrained("path/to/some/model")
```
Alternatively, you can patch a specific model architecture with its `apply_liger_kernel_to_*` function before loading the model, and selectively enable or disable individual kernels:

```python
import transformers
from liger_kernel.transformers import apply_liger_kernel_to_llama

# Apply all default Liger kernels to Llama models
apply_liger_kernel_to_llama()

# Or selectively enable/disable individual kernels
apply_liger_kernel_to_llama(
    rope=True,
    swiglu=True,
    cross_entropy=True,
    fused_linear_cross_entropy=False,
    rms_norm=False
)

model = transformers.AutoModelForCausalLM.from_pretrained("path/to/llama/model")
```
Individual kernels can also be used directly as standalone modules. For example, `LigerFusedLinearCrossEntropyLoss` takes the linear layer's weight, the input activations, and the targets, fusing the projection and the cross-entropy computation:

```python
import torch
import torch.nn as nn
from liger_kernel.transformers import LigerFusedLinearCrossEntropyLoss

model = nn.Linear(128, 256).cuda()
loss_fn = LigerFusedLinearCrossEntropyLoss()

input = torch.randn(4, 128, requires_grad=True, device="cuda")
target = torch.randint(256, (4,), device="cuda")

# The loss receives the linear layer's weight and the pre-projection activations
loss = loss_fn(model.weight, input, target)
loss.backward()
```
The alignment and preference optimization losses live in the `liger_kernel.chunked_loss` module and follow the same calling pattern:

```python
from liger_kernel.chunked_loss import LigerFusedLinearORPOLoss

orpo_loss = LigerFusedLinearORPOLoss()
# lm_head, x, and target are assumed to be the model's output projection,
# the hidden states, and the labels, respectively.
y = orpo_loss(lm_head.weight, x, target)
```
Liger-Kernel can be installed from PyPI (stable or nightly) or from source:

```bash
# Stable release
pip install liger-kernel

# Nightly build
pip install liger-kernel-nightly

# Install from source
git clone https://github.com/linkedin/Liger-Kernel.git
cd Liger-Kernel
pip install -e .

# Editable install with development dependencies
pip install -e ".[dev]"
```
The dependency requirements are:

- CUDA: `torch >= 2.1.2`, `triton >= 2.3.0`
- ROCm: `torch >= 2.5.0`, `triton >= 3.0.0`
- `transformers >= 4.x`: only needed when using the transformers model patching API

Under the project's benchmark conditions, test results show roughly a 20% improvement in multi-GPU training throughput and about a 60% reduction in memory usage.
Liger-Kernel has been integrated into several mainstream training frameworks, including the Hugging Face Trainer, TRL, Axolotl, and LLaMA-Factory.
Kernel fusion: by fusing multiple operations into a single Triton kernel, intermediate results stay in registers instead of round-tripping through GPU global memory, which reduces memory traffic and kernel launch overhead and improves computational efficiency (see the sketch below).
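To illustrate the idea, here is a simplified sketch (not Liger's actual implementation) of a Triton kernel that fuses the SiLU activation and the gating multiply of a SwiGLU-style MLP into one kernel, so the intermediate activation never touches global memory. The function and kernel names are hypothetical, and the inputs are assumed to be contiguous CUDA tensors of the same shape:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def fused_silu_mul_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    # SiLU and the gating multiply happen in registers; the intermediate
    # activation is never written to global memory.
    out = x * tl.sigmoid(x) * y
    tl.store(out_ptr + offsets, out, mask=mask)

def fused_silu_mul(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    fused_silu_mul_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```

An unfused version would launch two kernels and write the full SiLU output to memory in between; fusing them removes that extra read/write pass.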
Chunked computation: for memory-intensive operations such as the fused linear cross-entropy and the alignment losses, large computations are broken into smaller chunks so that only one chunk's intermediate results (for example, a slice of the logits) is materialized at a time, reducing peak memory usage (see the sketch below).
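The following is a simplified illustration of the chunking idea in plain PyTorch, not Liger's actual kernel. Instead of building the full `(batch, vocab)` logits tensor for a linear-plus-cross-entropy computation, the batch is processed in chunks, so peak memory scales with the chunk size rather than the full batch. The function name and `chunk_size` parameter are illustrative:

```python
import torch
import torch.nn.functional as F

def chunked_linear_cross_entropy(weight: torch.Tensor,
                                 hidden: torch.Tensor,
                                 target: torch.Tensor,
                                 chunk_size: int = 1024) -> torch.Tensor:
    """Mean cross-entropy over the batch without materializing all logits at once."""
    total = hidden.new_zeros(())
    n = hidden.shape[0]
    for start in range(0, n, chunk_size):
        end = min(start + chunk_size, n)
        # Only this chunk's logits exist in memory at any given time.
        logits = hidden[start:end] @ weight.t()
        total = total + F.cross_entropy(logits, target[start:end], reduction="sum")
    return total / n
```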
In-place operations: wherever possible, results are written back into existing buffers rather than allocated into new tensors, further reducing memory overhead, as in the example below.
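For example (illustrative only, not Liger code), an out-of-place activation allocates a new tensor, while its in-place variant reuses the existing buffer:

```python
import torch

x = torch.randn(4096, 4096, device="cuda")

y = torch.relu(x)  # out-of-place: allocates a second buffer for the result
x.relu_()          # in-place: overwrites x's storage, no extra allocation
```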
Liger-Kernel represents a significant advancement in large language model training optimization. With carefully designed Triton kernels, memory optimization techniques, and broad model support, it provides researchers and engineers with a powerful and easy-to-use tool that can significantly improve training efficiency and reduce computational costs. The project's open-source nature and active community support make it an important resource in the LLM training field.