stanford-oval/storm

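An AI knowledge-curation system developed at Stanford University that automatically researches a topic and generates long-form, Wikipedia-style reports with citations.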

License: MIT | Language: Python | Repository: stanford-oval/storm | Stars: 27.6k | Last updated: September 30, 2025

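STORM Project in Detail

Project Overview

STORM (Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking) is an open-source AI knowledge-curation system developed by the Stanford Open Virtual Assistant Lab (OVAL). Starting from scratch, it researches a topic through internet search and writes a Wikipedia-like article, complete with citations.

More than 70,000 people have tried STORM's live research preview, a sign of broad interest in the system.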

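Core Features

1. Two-stage article generation

STORM breaks long-form article generation into two stages:

Pre-writing stage: the system conducts internet-based research, collects references, and generates an outline
Writing stage: the system uses the outline and references to produce a full-length article with citations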
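The two stages can be illustrated with a toy sketch. Everything in it is hypothetical: `Reference`, `pre_writing`, `writing`, and the stub search and outline functions stand in for STORM's real retrievers and LLM calls, and choosing citations by keyword overlap is a simplification made only to keep the example self-contained.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    url: str
    snippet: str


def pre_writing(topic, search, make_outline):
    """Stage 1 (hypothetical): research the topic, then draft an outline."""
    references = [Reference(url, snippet) for url, snippet in search(topic)]
    outline = make_outline(topic, references)  # list of section headings
    return references, outline


def writing(outline, references):
    """Stage 2 (hypothetical): write each section, citing references as [n]."""
    sections = []
    for heading in outline:
        words = set(heading.lower().split())
        # 1-based indices of references sharing at least one word with the heading
        cited = [
            i + 1
            for i, ref in enumerate(references)
            if words & set(ref.snippet.lower().split())
        ]
        body = " ".join(f"{references[n - 1].snippet} [{n}]" for n in cited)
        sections.append(f"# {heading}\n{body or '(no supporting reference)'}")
    return "\n\n".join(sections)


# Stubs standing in for a real search engine and an LLM outline generator.
def fake_search(topic):
    return [
        ("https://example.org/a", f"{topic} generates outlines from research"),
        ("https://example.org/b", "citations link claims to sources"),
    ]


def fake_outline(topic, references):
    return ["Outlines", "Citations", "Reception"]


refs, outline = pre_writing("STORM", fake_search, fake_outline)
article = writing(outline, refs)
print(article)
```

Keeping the stages separate means the collected references and the outline exist as inspectable intermediate results before any prose is written.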

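2. Multi-perspective question asking

STORM uses two strategies to increase the depth and breadth of the questions it asks:

Perspective-guided question asking: the system surveys existing articles on similar topics to discover different perspectives, and uses them to steer the question-asking process
Simulated conversation: the system simulates a conversation between a Wikipedia writer and a topic expert, with the expert's answers grounded in internet sources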
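A minimal sketch of how perspectives can steer a simulated writer/expert conversation follows. Two assumptions keep it small: perspectives are taken directly from the section headings of related articles (a simplification; the real system derives perspectives with an LLM), and `toy_ask` / `toy_answer` are hypothetical stand-ins for the LLM-driven writer and the search-grounded expert.

```python
def discover_perspectives(related_outlines):
    """Collect distinct section headings from related articles, in first-seen order.

    Simplification: the headings are used directly as perspectives.
    """
    perspectives = []
    for outline in related_outlines:
        for heading in outline:
            if heading not in perspectives:
                perspectives.append(heading)
    return perspectives


def simulate_conversation(topic, perspective, ask, answer, max_turns=3):
    """Alternate writer questions and expert answers for one perspective."""
    history = []
    for _ in range(max_turns):
        question = ask(topic, perspective, history)
        if question is None:  # the writer has nothing more to ask
            break
        history.append((question, answer(question)))
    return history


def research(topic, related_outlines, ask, answer, max_turns=3):
    """Run one simulated conversation per discovered perspective."""
    return {
        p: simulate_conversation(topic, p, ask, answer, max_turns)
        for p in discover_perspectives(related_outlines)
    }


# Hypothetical stand-ins for the LLM writer and the search-grounded expert.
def toy_ask(topic, perspective, history):
    if len(history) >= 2:  # stop after two questions per perspective
        return None
    return f"Q{len(history) + 1}: what about {topic} from the '{perspective}' angle?"


def toy_answer(question):
    # A real expert would issue web searches and answer from the retrieved sources.
    return f"(grounded answer to: {question})"


conversations = research(
    "STORM",
    [["History", "Design"], ["Design", "Reception"]],
    toy_ask,
    toy_answer,
    max_turns=5,
)
for perspective, turns in conversations.items():
    print(perspective, len(turns))
```

Running one conversation per perspective is what widens coverage: each perspective yields its own line of questioning rather than one generic interview.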

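3. Co-STORM: collaborative extension

Co-STORM extends STORM to support collaborative knowledge curation between humans and AI:

Multiple types of LLM agents: Co-STORM expert agents and a moderator
Dynamic mind map: a continuously updated mind map that organizes the collected information into a hierarchical concept structure
Human-AI collaboration protocol: a turn-management policy that supports smooth collaboration between human users and the AI system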
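The dynamic mind map can be pictured as a tree of concepts with information attached to its nodes. The `ConceptNode` class below is a hypothetical illustration of that idea, not Co-STORM's internal knowledge-base class; it only shows how new information can be filed under a concept path and rendered as a hierarchical outline.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptNode:
    """Hypothetical mind-map node: a concept, its snippets, and its sub-concepts."""
    name: str
    snippets: list = field(default_factory=list)
    children: dict = field(default_factory=dict)

    def insert(self, path, snippet):
        """File `snippet` under the concept path, creating missing nodes on the way."""
        node = self
        for concept in path:
            node = node.children.setdefault(concept, ConceptNode(concept))
        node.snippets.append(snippet)

    def render(self, depth=0):
        """Return the tree as indented lines, one per concept."""
        lines = ["  " * depth + f"{self.name} ({len(self.snippets)} snippets)"]
        for child in self.children.values():
            lines.extend(child.render(depth + 1))
        return lines


root = ConceptNode("STORM")
root.insert(["Method", "Question asking"], "Perspective-guided questions")
root.insert(["Method", "Outline"], "Hierarchical outline generation")
root.insert(["Evaluation"], "Feedback from Wikipedia editors")
root.insert(["Method", "Outline"], "Outline refinement")
print("\n".join(root.render()))
```

Because the tree grows as the discourse proceeds, both the human and the agents can see at a glance which concepts are well covered and which are still thin.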

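Technical Architecture

Supported Components

Language models:

Any language model supported by litellm
Different models can be configured for different pipeline components

Retrieval modules: multiple search engines and retrievers are supported: YouRM, BingSearch, VectorRM, SerperRM, BraveRM, SearXNG, DuckDuckGoSearchRM, TavilySearchRM, GoogleSearch, and AzureAISearch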
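Because retrievers are interchangeable, a new one mainly needs to expose a common search method. The class below is a hypothetical in-memory retriever; its `forward(query_or_queries, exclude_urls)` method and the result keys (`url`, `title`, `description`, `snippets`) are assumptions loosely modeled on the retrievers in knowledge_storm/rm.py, so check that file for the exact interface before writing a real integration. Ranking by word overlap is a placeholder for a real search engine.

```python
from typing import Dict, List, Sequence, Union


class InMemoryRM:
    """Hypothetical retriever over a local corpus (interface assumed, not STORM's)."""

    def __init__(self, corpus: Dict[str, Dict[str, str]], k: int = 3):
        self.corpus = corpus  # url -> {"title": ..., "text": ...}
        self.k = k  # maximum results per query

    def forward(
        self, query_or_queries: Union[str, List[str]], exclude_urls: Sequence[str] = ()
    ) -> List[dict]:
        queries = [query_or_queries] if isinstance(query_or_queries, str) else query_or_queries
        results = []
        for query in queries:
            terms = set(query.lower().split())
            scored = []
            for url, doc in self.corpus.items():
                if url in exclude_urls:
                    continue
                # Placeholder relevance score: number of shared words.
                overlap = len(terms & set(doc["text"].lower().split()))
                if overlap:
                    scored.append((overlap, url, doc))
            scored.sort(key=lambda item: item[0], reverse=True)
            for _, url, doc in scored[: self.k]:
                results.append(
                    {
                        "url": url,
                        "title": doc["title"],
                        "description": doc["text"][:80],
                        "snippets": [doc["text"]],
                    }
                )
        return results


corpus = {
    "https://example.org/storm": {
        "title": "STORM",
        "text": "STORM writes Wikipedia-like articles with citations",
    },
    "https://example.org/rag": {
        "title": "RAG",
        "text": "retrieval augmented generation grounds answers in documents",
    },
}
rm = InMemoryRM(corpus, k=2)
for hit in rm.forward("articles with citations"):
    print(hit["title"], hit["url"])
```

Returning plain dictionaries with a fixed set of keys is what lets the rest of a pipeline stay agnostic about which search backend produced them.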

Embedding models:

Any embedding model supported by litellm
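In a system like this, embeddings are commonly used to find the collected snippets most relevant to a query or a section. The sketch below shows that idea with cosine similarity over toy vectors; the vectors, snippet names, and the `top_k` helper are hypothetical, and a real setup would obtain embeddings from a model served through litellm.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, snippet_vecs, k=2):
    """Return the ids of the k snippets whose embeddings are closest to the query."""
    ranked = sorted(
        snippet_vecs,
        key=lambda sid: cosine(query_vec, snippet_vecs[sid]),
        reverse=True,
    )
    return ranked[:k]


# Toy 3-dimensional vectors standing in for real embedding-model output.
snippet_vecs = {
    "history": [0.9, 0.1, 0.0],
    "method": [0.1, 0.9, 0.2],
    "evaluation": [0.0, 0.3, 0.9],
}
print(top_k([0.2, 0.8, 0.1], snippet_vecs, k=2))
```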

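Modular Design

STORM is highly modular and built on the dspy framework. It consists of four main modules:

Knowledge curation module: collects broad information about a given topic
Outline generation module: organizes the collected information by generating a hierarchical outline
Article generation module: fills in the outline with the collected information to produce the article
Article polishing module: refines the written article and improves its presentation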
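The four modules run as a sequential pipeline, and the `do_research`, `do_generate_outline`, `do_generate_article`, and `do_polish_article` flags of `runner.run` (shown in the basic usage example) select which stages execute. The sketch below is a hypothetical model of that control flow, with stub lambdas in place of the dspy-based, LLM-backed modules; `run_pipeline` and `STAGES` are illustrative names, and passing the input through unchanged when a stage is skipped is a simplification (a real runner would more likely reload previously saved results).

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical mapping from run flags to the four modules, in execution order.
STAGES: List[Tuple[str, str]] = [
    ("do_research", "knowledge_curation"),
    ("do_generate_outline", "outline_generation"),
    ("do_generate_article", "article_generation"),
    ("do_polish_article", "article_polishing"),
]


def run_pipeline(topic: str, modules: Dict[str, Callable], **flags: bool):
    """Run enabled modules in order, feeding each stage's output into the next.

    Flags default to True. A disabled stage passes its input through unchanged
    (a simplification; a real runner would more likely reload saved results).
    """
    state = topic
    executed = []
    for flag, name in STAGES:
        if flags.get(flag, True):
            state = modules[name](state)
            executed.append(name)
    return state, executed


# Stub modules standing in for the LLM-backed implementations.
modules = {
    "knowledge_curation": lambda topic: {"topic": topic, "notes": ["fact A", "fact B"]},
    "outline_generation": lambda kb: {**kb, "outline": ["Intro", "Details"]},
    "article_generation": lambda kb: "\n".join(
        f"{heading}: {note}" for heading, note in zip(kb["outline"], kb["notes"])
    ),
    "article_polishing": lambda draft: "Summary\n" + draft,
}

article, ran = run_pipeline("STORM", modules)
print(ran)
print(article)
```

Treating each module as a replaceable callable is the point of the modular design: any single stage can be swapped or re-run without touching the others.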

Installation and Usage

Quick installation

# Install with pip
pip install knowledge-storm

# Or install from source
git clone https://github.com/stanford-oval/storm.git
cd storm
conda create -n storm python=3.11
conda activate storm
pip install -r requirements.txt

Basic usage example

import os
from knowledge_storm import STORMWikiRunnerArguments, STORMWikiRunner, STORMWikiLMConfigs
from knowledge_storm.lm import LitellmModel
from knowledge_storm.rm import YouRM

# Configure the language models
lm_configs = STORMWikiLMConfigs()
openai_kwargs = {
    'api_key': os.getenv("OPENAI_API_KEY"),
    'temperature': 1.0,
    'top_p': 0.9,
}

# Assign models to components: a cheaper, faster model for conversation
# simulation and question asking, a stronger one for outline and article work
gpt_35 = LitellmModel(model='gpt-3.5-turbo', max_tokens=500, **openai_kwargs)
gpt_4 = LitellmModel(model='gpt-4o', max_tokens=3000, **openai_kwargs)

lm_configs.set_conv_simulator_lm(gpt_35)
lm_configs.set_question_asker_lm(gpt_35)
lm_configs.set_outline_gen_lm(gpt_4)
lm_configs.set_article_gen_lm(gpt_4)
lm_configs.set_article_polish_lm(gpt_4)

# Configure the retrieval module
# Fill in the runner arguments (for example, the output directory);
# see the STORMWikiRunnerArguments class for the available options
engine_args = STORMWikiRunnerArguments(...)
rm = YouRM(ydc_api_key=os.getenv('YDC_API_KEY'), k=engine_args.search_top_k)
runner = STORMWikiRunner(engine_args, lm_configs, rm)

# Run the full pipeline: research, outline, article, polishing
topic = input('Topic: ')
runner.run(
    topic=topic,
    do_research=True,
    do_generate_outline=True,
    do_generate_article=True,
    do_polish_article=True,
)
# Post-process the generated files and print a summary of the run
runner.post_run()
runner.summary()

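Co-STORM usage example

from knowledge_storm.collaborative_storm.engine import CollaborativeStormLMConfigs, RunnerArgument, CoStormRunner

# Configure Co-STORM
lm_config = CollaborativeStormLMConfigs()
# ... configure the individual language models ...

topic = input('Topic: ')
runner_argument = RunnerArgument(topic=topic, ...)
costorm_runner = CoStormRunner(lm_config=lm_config, ...)

# Warm start the system to build a shared conceptual space with the user
costorm_runner.warm_start()

# Step through the collaborative discourse: either let the agents take a turn
conv_turn = costorm_runner.step()
# or inject your own utterance to steer the conversation
costorm_runner.step(user_utterance="YOUR UTTERANCE HERE")

# Reorganize the knowledge base and generate the report
costorm_runner.knowledge_base.reorganize()
article = costorm_runner.generate_report()
print(article)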

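Academic Research and Datasets

Research papers

STORM was published at NAACL 2024 in the paper "Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models". The Co-STORM paper was accepted to the EMNLP 2024 main conference.

Dataset contributions

FreshWiki: a dataset of 100 high-quality Wikipedia articles, focusing on the most-edited pages between February 2022 and September 2023.

WildSeek: a dataset for studying users' interests in complex information-seeking tasks. Each data point pairs a topic with the user's goal for conducting deep search on it.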

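Evaluation and Feedback

Automatic evaluation

STORM outperforms strong retrieval-augmented generation baselines on all automatic metrics, including LM-based evaluation and metrics that compare its output with human-written articles.

Expert evaluation

In a human evaluation with experienced Wikipedia editors, all participants agreed that the system was helpful in their pre-writing stage. Compared with articles generated by an outline-driven retrieval-augmented baseline, more of STORM's articles were judged to be well organized (a 25% absolute increase) and broad in coverage (a 10% increase).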

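Use Cases

Target users

Students: producing research papers and reports with citations
Researchers: compiling comprehensive literature reviews
Content creators: generating structured, in-depth articles
Wikipedia editors: an aid during the pre-writing stage

Limitations

The system cannot produce publication-ready articles, which typically need significant editing, but experienced Wikipedia editors have found it helpful in the pre-writing stage.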

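Project Development

Future directions

The team is actively working on:

Human-in-the-loop features: letting users take part in the knowledge-curation process
Information abstraction: developing abstractions over the curated information to support presentation formats beyond Wikipedia-style reports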

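Open-Source Contributions

The project is fully open source and welcomes community contributions. Pull requests that integrate additional search engines or retrievers into knowledge_storm/rm.py are especially welcome.

Repository: https://github.com/stanford-oval/storm
Live demo: https://storm.genie.stanford.edu/
Project website: https://storm-project.stanford.edu/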

Citation

If you use STORM in your research, please cite the relevant paper:

@inproceedings{shao-etal-2024-assisting,
    title = "Assisting in Writing {W}ikipedia-like Articles From Scratch with Large Language Models",
    author = "Shao, Yijia and Jiang, Yucheng and Kanell, Theodore and Xu, Peter and Khattab, Omar and Lam, Monica",
    booktitle = "Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
    month = jun,
    year = "2024",
    address = "Mexico City, Mexico",
    publisher = "Association for Computational Linguistics",
    pages = "6252--6278",
}

STORM is a notable contribution to AI-assisted knowledge curation, offering practical tools and methods for automating research and writing.