An open-source bilingual (Chinese-English) conversational large language model jointly released by Zhipu AI and the KEG Lab of Tsinghua University, supporting tool calling, code execution, agent tasks, and more.
ChatGLM3 Project Detailed Introduction
Project Overview
ChatGLM3 is a conversational pre-trained model jointly released by Zhipu AI and the KEG Laboratory of Tsinghua University. ChatGLM3-6B is an open-source model in the ChatGLM3 series. While retaining the strengths of the previous two generations, such as fluent conversation and a low deployment barrier, ChatGLM3-6B introduces several important new features and improvements.
Project Address: https://github.com/THUDM/ChatGLM3
Core Features
1. More Powerful Base Model
The base model of ChatGLM3-6B, ChatGLM3-6B-Base, is trained on more diverse data, with more training steps and an improved training strategy. Evaluations on datasets covering semantics, mathematics, reasoning, code, and knowledge show that ChatGLM3-6B-Base delivers the strongest performance among base models under 10B parameters.
2. More Complete Function Support
- New Prompt Format: Adopts a newly designed Prompt format to support more flexible dialogue interaction.
- Function Call: Natively supports function calling, allowing the model to actively invoke external tools (see the sketch after this list).
- Code Interpreter: Supports executing code in the Jupyter environment and obtaining results.
- Agent Tasks: Supports complex agent task scenarios.
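The function-call flow below is a minimal sketch adapted from the repository's tool-calling demo. The tool schema (get_weather) and the stubbed result are illustrative assumptions, and the exact message format may vary between model revisions.

import json
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True, device='cuda').eval()

# Hypothetical tool schema, for illustration only.
tools = [{
    "name": "get_weather",
    "description": "Query the current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string", "description": "City name"}},
        "required": ["city"],
    },
}]

# Seed the history with a system message that carries the tool schemas.
history = [{
    "role": "system",
    "content": "Answer the following questions as best as you can. You have access to the following tools:",
    "tools": tools,
}]

response, history = model.chat(tokenizer, "What is the weather in Beijing?", history=history)

# When the model decides to call a tool, `response` is a dict such as
# {"name": "get_weather", "parameters": {"city": "Beijing"}}.
if isinstance(response, dict):
    result = json.dumps({"city": "Beijing", "weather": "sunny"})  # stubbed tool output
    # Feed the tool result back with role="observation".
    response, history = model.chat(tokenizer, result, history=history, role="observation")

print(response)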
3. More Comprehensive Open-Source Series
Multiple versions are provided to meet different needs (a loading sketch follows the list):
- ChatGLM3-6B: Standard dialogue model, supports 8K context length.
- ChatGLM3-6B-Base: Basic pre-trained model.
- ChatGLM3-6B-32K: Long text dialogue model, supports 32K context.
- ChatGLM3-6B-128K: Ultra-long text understanding model, supports 128K context.
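All variants share the same loading interface; only the Hugging Face model id changes. A minimal sketch, with model ids as published under the THUDM organization:

from transformers import AutoTokenizer, AutoModel

# Swap the model id to select a variant, e.g. the 32K-context model.
model_id = "THUDM/chatglm3-6b-32k"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True, device='cuda').eval()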
Performance
Basic Capability Evaluation
Test results on 8 typical Chinese and English datasets:
Model | GSM8K | MATH | BBH | MMLU | C-Eval | CMMLU | MBPP | AGIEval |
---|---|---|---|---|---|---|---|---|
ChatGLM2-6B-Base | 32.4 | 6.5 | 33.7 | 47.9 | 51.7 | 50.0 | - | - |
ChatGLM3-6B-Base | 72.3 | 25.7 | 66.1 | 61.4 | 69.0 | 67.5 | 52.4 | 53.7 |
Long Text Processing Capability
ChatGLM3-6B-32K was manually evaluated across multiple long-text application scenarios. On average it outperforms the second-generation model by more than 50%, with especially large gains in paper reading, document summarization, and financial report analysis.
Installation and Usage
Environment Preparation
git clone https://github.com/THUDM/ChatGLM3
cd ChatGLM3
pip install -r requirements.txt
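Before loading the model, it can help to confirm that the installed PyTorch build sees a GPU. A minimal check, assuming torch and transformers were pulled in by requirements.txt:

import torch
import transformers

print("transformers:", transformers.__version__)
print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())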
Basic Usage Example
from transformers import AutoTokenizer, AutoModel

# trust_remote_code=True is required: the model ships its own modeling code.
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True, device='cuda')
model = model.eval()  # switch to inference mode

# Single-turn dialogue; `history` carries the conversation state across turns.
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
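The returned history holds the full conversation state, so a multi-turn dialogue simply passes it back in on the next call:

# Follow-up turn: reuse the history returned by the previous call.
response, history = model.chat(tokenizer, "Tell me more about yourself", history=history)
print(response)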
Hardware Requirements
- Standard Loading: Requires approximately 13 GB of GPU memory (FP16 precision).
- Quantized Loading: GPU memory requirements drop substantially with 4-bit quantization (see the loading sketches after this list).
- CPU Inference: Requires approximately 32 GB of RAM.
- Multi-GPU Support: The model can be sharded across multiple GPUs.
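The loading patterns below follow those documented in the repository README; the helper name load_model_on_gpus in utils.py should be verified against the current code:

from transformers import AutoModel

# 4-bit quantized loading: greatly reduced GPU memory footprint.
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).quantize(4).cuda()

# CPU inference: load the weights in float32 (needs roughly 32 GB of RAM).
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).float()

# Multi-GPU: shard the model across GPUs with the repository's helper.
from utils import load_model_on_gpus
model = load_model_on_gpus("THUDM/chatglm3-6b", num_gpus=2)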
Deployment Methods
1. Web Interface Deployment
# Gradio version
python web_demo_gradio.py
# Streamlit version
streamlit run web_demo_streamlit.py
2. Command Line Interaction
python cli_demo.py
3. API Service Deployment
cd openai_api_demo
python api_server.py
Provides OpenAI-compatible API endpoints (a client sketch follows the list), supporting:
- Standard dialogue interface
- Tool calling interface
- Streaming response
- Temperature and top_p parameter control
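With api_server.py running, any OpenAI-style client can talk to the service. A sketch using the openai Python package; the port (8000), the api_key placeholder, and the model name are assumptions to check against openai_api_demo:

from openai import OpenAI

# Base URL and model name are assumptions; see openai_api_demo for the
# values used by the running server.
client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="chatglm3-6b",
    messages=[{"role": "user", "content": "Hello"}],
    temperature=0.8,
    top_p=0.8,
)
print(resp.choices[0].message.content)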
Fine-tuning and Extension
Fine-tuning Support
The project provides a complete fine-tuning suite (a data-format sketch follows the list), supporting:
- Instruction fine-tuning
- Dialogue fine-tuning
- Task-specific fine-tuning
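The fine-tuning suite consumes conversation-style JSON records. Below is a sketch of writing one training sample, with field names taken from the finetune_demo documentation (verify against the current repository):

import json

# One training sample in the conversation format used by finetune_demo
# (field names should be checked against the repository docs).
sample = {
    "conversations": [
        {"role": "user", "content": "Recommend a science-fiction novel."},
        {"role": "assistant", "content": "You might enjoy The Three-Body Problem by Liu Cixin."},
    ]
}

with open("train.jsonl", "w", encoding="utf-8") as f:
    f.write(json.dumps(sample, ensure_ascii=False) + "\n")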
Community Ecosystem
ChatGLM3 is supported by several excellent open-source projects:
Inference Acceleration:
- chatglm.cpp: Quantization acceleration scheme similar to llama.cpp
- ChatGLM3-TPU: TPU accelerated inference
- TensorRT-LLM: NVIDIA GPU high-performance inference
- OpenVINO: Intel device accelerated inference
Fine-tuning Frameworks:
- LLaMA-Factory: Efficient fine-tuning framework
Application Frameworks:
- LangChain-Chatchat: RAG knowledge base project
- BISHENG: Large model application development platform
- RAGFlow: Deep document understanding RAG engine
Comprehensive Demo Functionality
The project provides a comprehensive demo integrating three modes (launch commands follow the list):
- Chat Mode: Standard dialogue interaction
- Tool Mode: Tool calling demonstration
- Code Interpreter Mode: Code execution environment
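The comprehensive demo lives in the composite_demo directory of the repository and, per its README, is launched with Streamlit (directory and entry-point names as documented there):

cd composite_demo
streamlit run main.py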
License and Terms of Use
- Academic Research: Completely open for use.
- Commercial Use: Free commercial use is allowed after filling out a questionnaire and registering.
- Usage Restrictions: Must not be used for purposes that may endanger national or societal interests.
- Security Requirements: Services built on the model must pass security assessment and filing procedures.
Technical Architecture Features
Model Architecture
- Improved version based on the GLM architecture.
- Optimized attention mechanism.
- Better multilingual support.
- Native support for tool calling.
Training Optimization
- More diverse training data.
- More training steps.
- Improved training strategies.
- Optimized for Chinese.
Community Contribution
The project actively embraces the open-source community and cooperates closely with several excellent projects, forming a complete ecosystem. Developers can build a wide range of innovative applications on top of ChatGLM3.