Firecrawl 是一個 API 服務,接收 URL,爬取它,並將其轉換為乾淨的 markdown 或結構化數據。它爬取所有可訪問的子頁面,為每個頁面提供乾淨的數據。無需站點地圖。
在抓取內容之前可以執行各種操作:
curl -X POST https://api.firecrawl.dev/v1/crawl \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer fc-YOUR_API_KEY' \
-d '{
"url": "https://docs.firecrawl.dev",
"limit": 10,
"scrapeOptions": {
"formats": ["markdown", "html"]
}
}'
curl -X POST https://api.firecrawl.dev/v1/scrape \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-d '{
"url": "https://docs.firecrawl.dev",
"formats" : ["markdown", "html"]
}'
curl -X POST https://api.firecrawl.dev/v1/scrape \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-d '{
"url": "https://www.mendable.ai/",
"formats": ["json"],
"jsonOptions": {
"schema": {
"type": "object",
"properties": {
"company_mission": {"type": "string"},
"supports_sso": {"type": "boolean"},
"is_open_source": {"type": "boolean"},
"is_in_yc": {"type": "boolean"}
},
"required": ["company_mission", "supports_sso", "is_open_source", "is_in_yc"]
}
}
}'
pip install firecrawl-py
from firecrawl.firecrawl import FirecrawlApp
from firecrawl.firecrawl import ScrapeOptions
app = FirecrawlApp(api_key="fc-YOUR_API_KEY")
# 抓取網站
scrape_status = app.scrape_url(
'https://firecrawl.dev',
formats=["markdown", "html"]
)
print(scrape_status)
# 爬取網站
crawl_status = app.crawl_url(
'https://firecrawl.dev',
limit=100,
scrape_options=ScrapeOptions(formats=["markdown", "html"]),
poll_interval=30
)
print(crawl_status)
npm install @mendable/firecrawl-js
import FirecrawlApp, { CrawlParams, CrawlStatusResponse } from '@mendable/firecrawl-js';
const app = new FirecrawlApp({apiKey: "fc-YOUR_API_KEY"});
// 抓取網站
const scrapeResponse = await app.scrapeUrl('https://firecrawl.dev', {
formats: ['markdown', 'html'],
});
if (scrapeResponse) {
console.log(scrapeResponse)
}
// 爬取網站
const crawlResponse = await app.crawlUrl('https://firecrawl.dev', {
limit: 100,
scrapeOptions: {
formats: ['markdown', 'html'],
}
} satisfies CrawlParams, true, 30) satisfies CrawlStatusResponse;
用戶在使用 Firecrawl 進行抓取、搜索和爬取時,有責任遵守網站的政策。建議用戶在啟動任何抓取活動之前,遵守適用網站的隱私政策和使用條款。默認情況下,Firecrawl 在爬取時會遵守網站 robots.txt 文件中指定的指令。
該項目目前處於活躍開發狀態,團隊正在將自定義模塊集成到單體倉庫中。雖然尚未完全準備好進行自託管部署,但可以本地運行進行開發和測試。項目擁有活躍的社區和持續的更新,是網頁數據提取領域的領先解決方案。