LAION-AI/Open-AssistantPlease refer to the latest official releases for information GitHub Homepage

Open-source, chat-based AI assistant trained using Reinforcement Learning from Human Feedback, aiming to provide free large language model access to everyone.

Apache-2.0Python 37.4kLAION-AI Last Updated: 2024-08-17

Open Assistant Project Detailed Introduction

Project Overview

Open Assistant is an open-source chat AI assistant project developed by the LAION-AI organization. The project aims to provide everyone with access to excellent chat-based large language models (LLMs), creating innovation in language technology through open source.

⚠️ Important Notice: The OpenAssistant project is complete and has ended. The final released dataset can be found at OpenAssistant/oasst2 on HuggingFace.

Project Vision

Open Assistant believes that open-source collaboration can create a revolution in the field of language technology, just as Stable Diffusion has helped the world create art and images in new ways. The ultimate goal of the project is not just to replicate ChatGPT, but to build the assistant of the future, capable of:

Writing emails and cover letters
Performing meaningful work
Using APIs
Dynamically researching information
Supporting personalization and expansion

Technical Approach

Core Technology Stack

The main goal of the project is to have a chatbot that can answer questions and better follow instructions by adapting large language models (LLMs). To this end, the project used the method proposed in the InstructGPT paper, which is based on Reinforcement Learning from Human Feedback (RLHF).

Three-Step Training Method

The project follows the three-step method outlined in the InstructGPT paper:

Step 1: Data Collection

Collect high-quality human-generated instruction-completion samples (prompt + response)
Goal: Over 50,000 samples
Design a crowdsourcing process to collect and review prompts
Avoid training flood attacks/toxic/garbage/personal information data
Incentivize the community through leaderboards, showcasing progress and most active users

Step 2: Ranking Collection

Sample multiple completions for each collected prompt
Randomly present the completions of the prompt to users for ranking (from best to worst)
Handle unreliable or malicious users through crowdsourcing
Collect votes from multiple independent users to measure overall consistency
Use the collected ranking data to train a reward model

Step 3: RLHF Training

Perform Reinforcement Learning from Human Feedback training phase based on prompts and reward models
The resulting model can be used to continue the sampling step for the next iteration

Project Architecture

Development Environment Setup

The project supports a complete Docker stack deployment, including the website, backend, and related dependent services.

Basic startup command:

docker compose --profile ci up --build --attach-dependencies

MacOS M1 chip users need to use:

DB_PLATFORM=linux/x86_64 docker compose ...

Local access:

Main website: http://localhost:3000
Email login link: http://localhost:1080

Development Container Support

The project provides standardized development environment support:

Local VSCode devcontainer
GitHub Codespaces web browser environment
Configuration files are located in the .devcontainer folder

Features

Chat Functionality

The chat frontend is online, and users can log in and start chatting
Supports liking or disliking feedback on assistant responses
Real-time interactive experience

Data Collection

The data collection frontend is online, and users can log in and start performing tasks
Directly help improve Open Assistant capabilities by submitting, ranking, and labeling model prompts and responses
Crowdsourced approach to collecting high-quality data

Inference System

The project includes a complete inference system, supporting:

Local deployment of inference services
Ability to run on consumer-grade hardware
Scalable architecture design

Open Source Features

Community Participation

The project is organized by LAION and individuals around the world who are interested in bringing this technology to everyone
Developers are welcome to contribute code
Detailed contribution guidelines are provided

Project Status

Important Reminder: The OpenAssistant project is complete and has ended. Although the project itself has ended:

The final dataset oasst2 is available on HuggingFace
The code is still open source and accessible
The community can continue to develop based on existing work

Technical Requirements

Hardware Requirements

The project vision is to create a large language model that can run on a single high-end consumer-grade GPU
Supports consumer-grade hardware deployment
Optimized inference performance

Deployment Options

Docker containerized deployment
Local development environment
Cloud deployment support
Independent deployment of inference services

Related Resources

GitHub Repository: https://github.com/LAION-AI/Open-Assistant
Dataset: HuggingFace OpenAssistant/oasst2
Official Website: https://open-assistant.io/
Project Documentation: https://projects.laion.ai/Open-Assistant/
Chat Interface: https://open-assistant.io/chat