An open-source bilingual (Chinese-English) conversational large language model jointly released by Zhipu AI and the KEG Lab of Tsinghua University, supporting tool calling, code execution, agent tasks, and more.
ChatGLM3 Project Detailed Introduction
Project Overview
ChatGLM3 is a conversational pre-trained model jointly released by Zhipu AI and the KEG Laboratory of Tsinghua University. ChatGLM3-6B is an open-source model in the ChatGLM3 series. While retaining the strengths of the previous two generations, such as fluent conversation and a low deployment barrier, ChatGLM3-6B introduces several important new features and improvements.
Project Address: https://github.com/THUDM/ChatGLM3
Core Features
1. More Powerful Base Model
The base model of ChatGLM3-6B, ChatGLM3-6B-Base, is trained on more diverse data, with more training steps and an improved training strategy. Evaluations on datasets covering semantics, mathematics, reasoning, code, and knowledge show that ChatGLM3-6B-Base delivers the strongest performance among base models under 10B parameters.
2. More Complete Function Support
- New Prompt Format: Adopts a newly designed Prompt format to support more flexible dialogue interaction.
- Function Call: Natively supports function calling, allowing the model to actively invoke external tools (see the sketch after this list).
- Code Interpreter: Supports executing code in the Jupyter environment and obtaining results.
- Agent Tasks: Supports complex agent task scenarios.
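The function-call flow below is a minimal sketch adapted from the repository's tool-calling demo. The tool schema (get_weather) and the stubbed result are illustrative assumptions, and the exact message format may vary between model revisions.

import json
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True, device='cuda').eval()

# Hypothetical tool schema, for illustration only.
tools = [{
    "name": "get_weather",
    "description": "Query the current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string", "description": "City name"}},
        "required": ["city"],
    },
}]

# Seed the history with a system message that carries the tool schemas.
history = [{
    "role": "system",
    "content": "Answer the following questions as best as you can. You have access to the following tools:",
    "tools": tools,
}]

response, history = model.chat(tokenizer, "What is the weather in Beijing?", history=history)

# When the model decides to call a tool, `response` is a dict such as
# {"name": "get_weather", "parameters": {"city": "Beijing"}}.
if isinstance(response, dict):
    result = json.dumps({"city": "Beijing", "weather": "sunny"})  # stubbed tool output
    # Feed the tool result back with role="observation".
    response, history = model.chat(tokenizer, result, history=history, role="observation")

print(response)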
3. More Comprehensive Open-Source Series
Multiple versions are provided to meet different needs (a loading sketch follows the list):
- ChatGLM3-6B: Standard dialogue model, supports 8K context length.
- ChatGLM3-6B-Base: Basic pre-trained model.
- ChatGLM3-6B-32K: Long text dialogue model, supports 32K context.
- ChatGLM3-6B-128K: Ultra-long text understanding model, supports 128K context.
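All variants share the same loading interface; only the Hugging Face model id changes. A minimal sketch, with model ids as published under the THUDM organization:

from transformers import AutoTokenizer, AutoModel

# Swap the model id to select a variant, e.g. the 32K-context model.
model_id = "THUDM/chatglm3-6b-32k"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True, device='cuda').eval()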
Performance
Basic Capability Evaluation
Test results on 8 typical Chinese and English datasets:
Model | GSM8K | MATH | BBH | MMLU | C-Eval | CMMLU | MBPP | AGIEval |
---|---|---|---|---|---|---|---|---|
ChatGLM2-6B-Base | 32.4 | 6.5 | 33.7 | 47.9 | 51.7 | 50.0 | - | - |
ChatGLM3-6B-Base | 72.3 | 25.7 | 66.1 | 61.4 | 69.0 | 67.5 | 52.4 | 53.7 |
Long Text Processing Capability
ChatGLM3-6B-32K was manually evaluated across multiple long-text application scenarios. On average it outperforms the second-generation model by more than 50%, with especially large gains in paper reading, document summarization, and financial report analysis.
Installation and Usage
Environment Preparation
git clone https://github.com/THUDM/ChatGLM3
cd ChatGLM3
pip install -r requirements.txt
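Before loading the model, it can help to confirm that the installed PyTorch build sees a GPU. A minimal check, assuming torch and transformers were pulled in by requirements.txt:

import torch
import transformers

print("transformers:", transformers.__version__)
print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())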
Basic Usage Example
from transformers import AutoTokenizer, AutoModel

# trust_remote_code=True is required: the model ships its own modeling code.
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True, device='cuda')
model = model.eval()  # switch to inference mode

# Single-turn dialogue; `history` carries the conversation state across turns.
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
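The returned history holds the full conversation state, so a multi-turn dialogue simply passes it back in on the next call:

# Follow-up turn: reuse the history returned by the previous call.
response, history = model.chat(tokenizer, "Tell me more about yourself", history=history)
print(response)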
Hardware Requirements
- Standard Loading: Requires approximately 13 GB of GPU memory (FP16 precision).
- Quantized Loading: GPU memory requirements drop substantially with 4-bit quantization (see the loading sketches after this list).
- CPU Inference: Requires approximately 32 GB of RAM.
- Multi-GPU Support: The model can be sharded across multiple GPUs.
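The loading patterns below follow those documented in the repository README; the helper name load_model_on_gpus in utils.py should be verified against the current code:

from transformers import AutoModel

# 4-bit quantized loading: greatly reduced GPU memory footprint.
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).quantize(4).cuda()

# CPU inference: load the weights in float32 (needs roughly 32 GB of RAM).
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).float()

# Multi-GPU: shard the model across GPUs with the repository's helper.
from utils import load_model_on_gpus
model = load_model_on_gpus("THUDM/chatglm3-6b", num_gpus=2)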
Deployment Methods
1. Web Interface Deployment
# Gradio version
python web_demo_gradio.py
# Streamlit version
streamlit run web_demo_streamlit.py
2. Command Line Interaction
python cli_demo.py
3. API Service Deployment
cd openai_api_demo
python api_server.py
Provides OpenAI-compatible API endpoints (a client sketch follows the list), supporting:
- Standard dialogue interface
- Tool calling interface
- Streaming response
- Temperature and top_p parameter control
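With api_server.py running, any OpenAI-style client can talk to the service. A sketch using the openai Python package; the port (8000), the api_key placeholder, and the model name are assumptions to check against openai_api_demo:

from openai import OpenAI

# Base URL and model name are assumptions; see openai_api_demo for the
# values used by the running server.
client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="chatglm3-6b",
    messages=[{"role": "user", "content": "Hello"}],
    temperature=0.8,
    top_p=0.8,
)
print(resp.choices[0].message.content)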
Fine-tuning and Extension
Fine-tuning Support
The project provides a complete fine-tuning suite (a data-format sketch follows the list), supporting:
- Instruction fine-tuning
- Dialogue fine-tuning
- Task-specific fine-tuning
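The fine-tuning suite consumes conversation-style JSON records. Below is a sketch of writing one training sample, with field names taken from the finetune_demo documentation (verify against the current repository):

import json

# One training sample in the conversation format used by finetune_demo
# (field names should be checked against the repository docs).
sample = {
    "conversations": [
        {"role": "user", "content": "Recommend a science-fiction novel."},
        {"role": "assistant", "content": "You might enjoy The Three-Body Problem by Liu Cixin."},
    ]
}

with open("train.jsonl", "w", encoding="utf-8") as f:
    f.write(json.dumps(sample, ensure_ascii=False) + "\n")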
Community Ecosystem
ChatGLM3 is supported by several excellent open-source projects:
Inference Acceleration:
- chatglm.cpp: Quantization acceleration scheme similar to llama.cpp
- ChatGLM3-TPU: TPU accelerated inference
- TensorRT-LLM: NVIDIA GPU high-performance inference
- OpenVINO: Intel device accelerated inference
Fine-tuning Frameworks:
- LLaMA-Factory: Efficient fine-tuning framework
Application Frameworks:
- LangChain-Chatchat: RAG knowledge base project
- BISHENG: Large model application development platform
- RAGFlow: Deep document understanding RAG engine
Comprehensive Demo Functionality
The project provides a comprehensive demo integrating three modes (launch commands follow the list):
- Chat Mode: Standard dialogue interaction
- Tool Mode: Tool calling demonstration
- Code Interpreter Mode: Code execution environment
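The comprehensive demo lives in the composite_demo directory of the repository and, per its README, is launched with Streamlit (directory and entry-point names as documented there):

cd composite_demo
streamlit run main.py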
License and Terms of Use
- Academic Research: Completely open for use.
- Commercial Use: Free commercial use is allowed after filling out a questionnaire and registering.
- Usage Restrictions: Must not be used for purposes that may endanger national or societal interests.
- Security Requirements: Services built on the model must pass security assessment and filing procedures.
Technical Architecture Features
Model Architecture
- Improved version based on the GLM architecture.
- Optimized attention mechanism.
- Better multilingual support.
- Native support for tool calling.
Training Optimization
- More diverse training data.
- More training steps.
- Improved training strategies.
- Optimized for Chinese.
Community Contribution
The project actively embraces the open-source community and cooperates closely with several excellent projects, forming a complete ecosystem. Developers can build a wide range of innovative applications on top of ChatGLM3.