Apache-2.0 · C++ · 27.0k stars · dmlc · Last Updated: 2025-06-14

XGBoost (eXtreme Gradient Boosting)

Project Overview

XGBoost (eXtreme Gradient Boosting) is an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable. It implements machine learning algorithms under the Gradient Boosting framework and provides parallel tree boosting (also known as GBDT or GBM) that solves many data science problems quickly and accurately.
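
A minimal quick-start sketch using the Python package's scikit-learn-style wrapper; the synthetic dataset and parameter values below are illustrative assumptions, not recommendations from the project:

```python
# Train a small gradient-boosted classifier on synthetic data.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))            # 500 samples, 10 features
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # synthetic binary target

model = xgb.XGBClassifier(
    n_estimators=100,    # number of boosting rounds (trees)
    max_depth=4,         # maximum depth of each tree
    learning_rate=0.1,   # shrinkage applied to each tree's contribution
)
model.fit(X, y)
print(model.predict(X[:5]))
```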

Background

Gradient boosting is a powerful machine learning technique widely used in prediction tasks. XGBoost emerged to address limitations of earlier gradient boosting implementations, such as slow training, poor scalability, and limited flexibility. It significantly improves on them by introducing the following optimizations (several are illustrated in the sketch after this list):

  • Regularization: XGBoost uses L1 and L2 regularization to prevent overfitting and improve the generalization ability of the model.
  • Sparsity Awareness: XGBoost can automatically handle missing values without the need for data preprocessing.
  • Parallel Processing: XGBoost supports parallel computation, which can leverage multi-core CPUs and distributed computing clusters to accelerate the training process.
  • Cache Optimization: XGBoost optimizes data access patterns, improving cache hit rates and thus speeding up training.
  • Scalability: XGBoost can handle large-scale datasets and supports multiple programming languages and platforms.
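
A hedged sketch of how several of these optimizations surface in the Python API; the synthetic data and parameter values are illustrative assumptions:

```python
# Regularization, sparsity awareness, and parallel training in one model.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 8))
X[rng.random(X.shape) < 0.1] = np.nan    # ~10% missing values, left as NaN
y = rng.integers(0, 2, size=1000)

model = xgb.XGBClassifier(
    reg_alpha=0.1,     # L1 regularization on leaf weights
    reg_lambda=1.0,    # L2 regularization on leaf weights
    n_jobs=-1,         # parallel tree construction on all CPU cores
    # missing=np.nan is the default: sparsity-aware split finding learns
    # a default direction for missing values at each split.
)
model.fit(X, y)        # no imputation needed before training
```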

Core Features

  • Efficiency: XGBoost has excellent computational efficiency and can quickly train high-performance models.
  • Flexibility: XGBoost supports various loss functions, evaluation metrics, and regularization methods, allowing it to flexibly adapt to different prediction tasks.
  • Portability: XGBoost runs on various operating systems and hardware platforms, including Windows, Linux, and macOS, with optional GPU acceleration.
  • Scalability: XGBoost can handle large-scale datasets and supports distributed computing.
  • Regularization: L1 and L2 regularization can prevent overfitting and improve the generalization ability of the model.
  • Sparsity Awareness: Automatically handles missing values without the need for data preprocessing.
  • Cross-Validation: Built-in cross-validation functionality for easy evaluation of model performance (demonstrated, together with model persistence and feature importance, in the sketch after this list).
  • Model Saving and Loading: Trained models can be saved to disk and loaded when needed.
  • Feature Importance Evaluation: Can evaluate the contribution of each feature to the model's predictions.
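
A brief sketch of the built-in cross-validation, model persistence, and feature-importance features named above; the file name and synthetic data are illustrative assumptions:

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(7)
X = rng.normal(size=(300, 5))
y = (X[:, 2] > 0).astype(int)

# Built-in cross-validation on the native DMatrix data structure.
dtrain = xgb.DMatrix(X, label=y)
cv_results = xgb.cv(
    params={"objective": "binary:logistic", "max_depth": 3},
    dtrain=dtrain,
    num_boost_round=50,
    nfold=5,
    metrics="logloss",
)
print(cv_results.tail(1))   # mean/std of train and test log-loss

# Train, save to disk, and reload.
booster = xgb.train({"objective": "binary:logistic"}, dtrain, num_boost_round=50)
booster.save_model("model.json")   # JSON is one of the supported formats
restored = xgb.Booster()
restored.load_model("model.json")

# Per-feature contribution scores ("weight" counts splits using each feature).
print(restored.get_score(importance_type="weight"))
```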

Application Scenarios

XGBoost is widely used in various machine learning tasks, including:

  • Classification: Predicting the category to which a sample belongs, such as spam detection or image recognition.
  • Regression: Predicting continuous values, such as house prices or stock prices (see the regression sketch after this list).
  • Ranking: Ranking search results or recommended items.
  • Recommendation Systems: Recommending products or services that users may be interested in based on their historical behavior.
  • Fraud Detection: Detecting credit card fraud, online fraud, etc.
  • Risk Assessment: Assessing loan default risk, insurance claim risk, etc.
  • Natural Language Processing: Text classification, sentiment analysis, machine translation, etc.
  • Computer Vision: Image classification, object detection, image segmentation, etc.
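
A hedged sketch of one such scenario, a regression task in the spirit of price prediction; the synthetic features and target are illustrative placeholders:

```python
import numpy as np
import xgboost as xgb
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(800, 6))    # stand-ins for features like size, age, location
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=800)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
reg = xgb.XGBRegressor(n_estimators=200, learning_rate=0.05, max_depth=4)
reg.fit(X_train, y_train)

pred = reg.predict(X_test)
print("test RMSE:", mean_squared_error(y_test, pred) ** 0.5)
```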

XGBoost has achieved excellent results in many machine learning competitions, notably on Kaggle, and has become one of the preferred algorithms for data scientists and machine learning engineers.

For full details, refer to the official repository: https://github.com/dmlc/xgboost